Yesterday I pleaded, Bug #915: please help! It got posted to Hacker News, where Robert Xiao (nneonneo) did some impressive debugging and found the answer.
The user’s code used mocks to simulate an OSError when trying to make temporary files (source):
with patch('tempfile._TemporaryFileWrapper') as mock_ntf:
mock_ntf.side_effect = OSError()
Inside tempfile.NamedTemporaryFile, the error handling misses the possibility that _TemporaryFileWrapper will fail (source):
(fd, name) = _mkstemp_inner(dir, prefix, suffix, flags, output_type)
try:
file = _io.open(fd, mode, buffering=buffering,
newline=newline, encoding=encoding, errors=errors)
return _TemporaryFileWrapper(file, name, delete)
except BaseException:
_os.unlink(name)
_os.close(fd)
raise
If _TemporaryFileWrapper fails, the file descriptor fd is closed, but the file object referencing it still exists. Eventually, it will be garbage collected, and the file descriptor it references will be closed again.
But file descriptors are just small integers which will be reused. The failure in bug 915 is that the file descriptor did get reused, by SQLite. When the garbage collector eventually reclaimed the file object leaked by NamedTemporaryFile, it closed a file descriptor that SQLite was using. Boom.
There are two improvements to be made here. First, the user code should be mocking public functions, not internal details of the Python stdlib. In fact, the variable is already named mock_ntf as if it had been a mock of NamedTemporaryFile at some point.
NamedTemporaryFile would be a better mock because that is the function being used by the user’s code. Mocking _TemporaryFileWrapper is relying on an internal detail of the standard library.
The other improvement is to close the leak in NamedTemporaryFile. That request is now bpo39318. As it happens, the leak had also been reported as bpo21058 and bpo26385.
Lessons learned:
- Hacker News can be helpful, in spite of the tangents about shell redirection, authorship attribution, and GitHub monoculture.
- There are always people more skilled at debugging. I had no idea you could script gdb.
- Error handling is hard to get right. Edge cases can be really subtle. Bugs can linger for years.
I named Robert Xiao at the top, but lots of people chipped in effort to help get to the bottom of this. ikanobori posted it to Hacker News in the first place. Chris Caron reported the original #915 and stuck with the process as it dragged on. Thanks everybody.
Comments
https://android.googlesource.com/platform/bionic/+/master/docs/fdsan.md
Add a comment: