Memory leak mystery

Sunday 22 November 2009

Here’s a detailed story of finding and fixing a memory leak in a Python C extension, complete with testing mysteries.

A user of coverage.py reported that running it consumed all his memory. No one else had mentioned anything like this, but it's a beta version, and those don't get nearly the attention that regular releases do.

He had helpfully reproduced the problem on a large public test suite by running Django's test suite under coverage.py. I tried running the Django test suite with the latest coverage.py code, and sure enough, the memory ballooned. A plain run consumes about 100Mb, but with coverage.py it was up to 500Mb. I expect running with coverage.py to use a little more memory, since it has to collect its data in memory, but the amount of data should be very small: proportional not to the execution time, but to the number of distinct lines of code executed.

Coverage.py has two different trace modules, one in C and one in Python. Forcing the Python tracer (by using the --timid flag) showed no memory leak, confirming my suspicion that the C code was at fault.
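One cheap way to confirm that memory really is growing, without external tools, is to watch CPython's own count of live allocations while the suspect code runs repeatedly. This is just a sketch of the idea, not what was used at the time; sys.getallocatedblocks arrived in Python 3.4, and the warm-up run and run count are arbitrary choices:

```python
import sys

def allocation_growth(fn, runs=5):
    """Run fn a few times and report the change in CPython's count of
    live allocated memory blocks (sys.getallocatedblocks, Python 3.4+)."""
    fn()  # warm-up run so one-time caches don't count as growth
    before = sys.getallocatedblocks()
    for _ in range(runs):
        fn()
    return sys.getallocatedblocks() - before

hoard = []
leaky = allocation_growth(lambda: hoard.append([0] * 100))  # keeps every list
clean = allocation_growth(lambda: [0] * 100)                # temporary, freed

print(leaky > clean)   # only the reference-keeping version grows
```

A real leak in a C extension shows up the same way: the block count climbs on every iteration and never comes back down.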

I made a small test file to run for a long time:

def once(x):
    if x:
        return 10
    return 11

for i in xrange(10**9):
    once(i)

Running it, nothing untoward happened: no memory leak. OK, so something was different between Django and my test file. Of course, lots of things are different between them; the task was to figure out which difference was the important one.

Poring over the C tracer code, there are plenty of interesting program events that cause stuff to happen internally: calling a function, returning from a function, the first execution in a file, raising an exception, deep recursion, and so on.
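These events are exactly what CPython hands to a trace function. A pure-Python sketch using sys.settrace (the same hook the Python tracer is built on) shows them going by; the data structure here is a simplified stand-in, not coverage.py's real one:

```python
import sys

events_seen = []  # (event, line number) pairs, a toy stand-in for real data

def tracer(frame, event, arg):
    # CPython reports "call", "line", "return", and "exception" events;
    # a coverage tracer mostly cares about "line".
    events_seen.append((event, frame.f_lineno))
    return tracer  # keep tracing inside this frame too

def once(x):
    if x:
        return 10
    return 11

sys.settrace(tracer)
once(1)
sys.settrace(None)

print([event for event, _ in events_seen])
```

Each of those event kinds corresponds to a branch in the C tracer, which is why counting them was a reasonable way to hunt for the leaky one.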

I added some statistics gathering in the tracer code to see which of these might be happening enough to correlate with the memory leak. Exceptions and deep recursion seemed like likely candidates since my test code had neither.

It turned out that the Django code didn't recurse deeply: it never got past the internally-interesting milestone of 100 stack frames. It did throw a lot of exceptions: the full test suite raises 1,862,318 of them. Examining the code, though, there's not much happening there that could leak, and disabling that code proved that it wasn't the issue.

How about first executions in a file? The Django test suite touches 9300 files (or 9300 different ways to refer to files). This is certainly more than my test file so maybe it was the source of the leak. When a new file is visited, the tracer calls back to a Python function to determine if the file is interesting enough to trace. One of the arguments to the callback is the stack frame. This seemed like a good candidate for a leak: mishandling the reference count on a frame could keep it around forever, and frames refer to lots of other things, so tons of memory could leak each of those 9300 times.
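The first-visit callback protocol can be sketched in pure Python; the names and the caching policy here are illustrative, not coverage.py's actual API:

```python
import sys

should_trace_cache = {}   # filename -> decision; consulted after first visit

def should_trace(filename, frame):
    # Called only on the first visit to each file.  The frame is handed
    # over so the decision can inspect context; the caller must not keep
    # an extra reference to it afterward, or the frame (and everything it
    # refers to) lives forever.
    return not filename.startswith("<")

def tracer(frame, event, arg):
    filename = frame.f_code.co_filename
    if filename not in should_trace_cache:          # a "new file" event
        should_trace_cache[filename] = should_trace(filename, frame)
    # Trace this frame only if its file was judged interesting.
    return tracer if should_trace_cache[filename] else None

def demo():
    return 42

sys.settrace(tracer)
demo()
sys.settrace(None)

print(len(should_trace_cache))   # the demo's file has been classified once
```

The suspicion was exactly the comment above: a mishandled frame reference in the C version of this handoff would anchor a whole object graph in memory on each of those 9300 first visits.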

But a few experiments, ending with completely disabling the callback, proved that this code was not to blame either. I was running out of places to look.

The per-line code was about all that was left. It didn’t make any sense for this to be the problem, since my test file runs many many lines just like Django does. But just to rule it out, I commented out the line of C code that adds the line number to the data dictionary. To my surprise, the test suite no longer leaked memory!

Here’s the buggy line (simplified):

PyDict_SetItem(file_dict, PyInt_FromLong(frame->f_lineno), Py_None);

The problem is that PyInt_FromLong returns a new reference, and PyDict_SetItem doesn’t steal the reference. This is a memory leak. (If you don’t know what I’m talking about with this reference stuff, take a peek at A Whirlwind Excursion through Python C Extensions.)

The fix was straightforward; I had to explicitly release my reference on the integer object:

PyObject * this_line = PyInt_FromLong(frame->f_lineno);
PyDict_SetItem(file_dict, this_line, Py_None);
Py_DECREF(this_line);

Running this on the Django tests showed that the problem was fixed! Great!

But back to the remaining mystery: why did my test file, which executed that same buggy line billions of times, not leak? The docs for PyInt_FromLong provided the answer:

The current implementation keeps an array of integer objects for all integers between -5 and 256, when you create an int in that range you actually just get back a reference to the existing object.

Aha! Because my test file was only 10 lines long, the line numbers were all less than 256, so the integer objects returned by PyInt_FromLong were all pre-allocated and held for the lifetime of the process. The buggy code was mishandling the reference count, but it didn’t matter, since no new objects were allocated.
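The cache is easy to observe from Python itself (a CPython implementation detail, not a language guarantee):

```python
# Small ints are pre-allocated once and shared for the life of the process.
a, b = 256, 256
print(a is b)                    # True: one cached object, two names

# Above the cached range, ints built at runtime are fresh objects each time
# (made from strings so the compiler can't fold them into one constant).
c, d = int("257"), int("257")
print(c is d)                    # False: two distinct objects
```

Below the threshold, a lost reference just inflates the count on an object that was never going to be freed anyway; above it, each lost reference strands a brand-new object.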

Finally I had the answer to my question: the difference between my file and the Django source is that Django has files longer than 256 lines.

And in fact, using the buggy tracer code, and adding 300 blank lines to my test file proved the point: once my simple code had line numbers above 256, it leaked like a sieve!

If there’s a lesson in this, it’s that complex systems introduce leaky abstractions in unexpected places. This has unfortunate implications for testing: how do I know if I’ve got test cases that can ferret out potential problems like this? Does white-box testing extend to reading up on the internal details of Python integers? If I add test cases with 300 lines of source, how do I know there isn’t some other effect that only kicks in at 5000?

Be careful out there...


Now I understand exactly what the problem was; even after looking at the fix and observing that the memory leak was gone, I was still unable to clearly formulate the root cause.
Your explanation helped me understand why it has been so difficult to find a public test suite that illustrated the issue.
I am amazed by how quickly you were able to fix this complex issue once you had a test case to reproduce it.
Thank you.
Thanks, fascinating read! Gives me a lot better insight into debugging memory leaks in C extensions.
First rule: Never try to debug memory issues using a standard build of Python! It is extraordinarily difficult for two reasons. The first is the highly optimised Python memory allocator, which obfuscates what is going on, often allocating objects in groups and with the group not being deallocated until all members are deallocated. The second is that most builtin objects keep a freelist: objects can get recycled rather than being freed and reallocated. This makes it harder to determine if they are in use or not.

For Python 2.6+ (including 3) you can build your own debug python that uses the standard malloc/free thereby making it possible for valgrind to track what is going on and you can disable the recycle lists to keep object lifetimes clear for valgrind. Then just run your code under valgrind with the debug Python. It would have caught this case plus any other leaks you have first time.

Here is how:

The comments at the top show how to build a debug Python. The rest then run valgrind (or callgrind). Using "--leak-check=full --leak-resolution=high --show-reachable=yes" will show problems like this one on exit. Using --freelist-vol=500000000 will ensure that memory that is freed won't become eligible for reallocation for a long time and catch any use after free.
As always, Ned, you give a clear explanation with enough detail to explain the problem but not so much detail that the reader's eyes glaze over. And pointers to learn more, too.

QA/testing is always so tricky. Stuff like this helps me know what to worry about and what to test.
@Roger, thanks for these instructions. I'm sure I'll use them the next time I have an issue like this to debug.

@yml, @Eric, @Dan: Thanks!
First, excellent blog post, a very interesting read.
Second: I think this is an implementation issue, not a leaky abstraction. Your code was creating a reference and then not releasing it. It was just masked by a cache.

When checking code for these issues, it would be very useful to have a function called "leak_occurred". Then, add stress-tests to your regular black-box testing.
I'd do a little brainstorm on possible things to enlarge: single code lines, number of files, running time, number of branches, number of functions, and of course, file sizes :)

It's also interesting to note that the detection function may be also written "empirically" (by looking at the memory footprint) to catch more kinds of leaks.
@lorg, I added an automated test to try to catch these sorts of issues, although of course in addition to the fuzziness of what might trigger a leak, there's the complexity of separating normal RAM growth from leaky growth, not to mention the complex question of how Python memory use correlates with OS process RAM size...
Thanks for the insight into this issue. Definitely interesting that it only appears with files >256 lines. We had just begun using coverage as part of our build process and I was very surprised that adding coverage was increasing memory usage by around an order of magnitude. Great to hear that it actually doesn't, and that this was a strange bug.

I'm guessing this fix will eventually show up as 3.2b3 on pypi?

Thanks again for the great tool.
@Wes: 3.2b3 is on PyPI now. I had to finish one improvement that was in-progress before posting it.
Using sys.gettotalrefcount() helps too (you need to have compiled python with PYDEBUG IIRC), I haven't been able to completely automate this yet since it seems very hard to control the exact number of reference counts (just using print with % formatting and a dict makes jumps in the reference counting for example, even if no object stays around), but I have a script that loops: if it stays stable after an initial jump it's ok, if it keeps increasing forever there's a problem.
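A rough version of the loop described above might look like this; the fallback to counting gc-tracked objects on a normal build, and the warm-up and threshold choices, are my own assumptions (sys.gettotalrefcount only exists in a --with-pydebug build):

```python
import gc
import sys

def live_count():
    # sys.gettotalrefcount exists only in a pydebug build of CPython;
    # counting gc-tracked objects is a much coarser fallback.
    if hasattr(sys, "gettotalrefcount"):
        return sys.gettotalrefcount()
    return len(gc.get_objects())

def keeps_growing(fn, warmup=3, runs=10):
    """After a warm-up (letting caches and freelists take their initial
    jump), does every further run of fn still push the count upward?"""
    for _ in range(warmup):
        fn()
    gc.collect()
    before = live_count()
    for _ in range(runs):
        fn()
    gc.collect()
    return live_count() - before >= runs

hoard = []
print(keeps_growing(lambda: hoard.append([])))   # leaky: keeps every list
print(keeps_growing(lambda: [[]]))               # clean: temporaries are freed
```

The "initial jump then stable" pattern the comment describes is the warm-up here: caches filling up look like a leak on the first few runs, so only sustained growth afterward counts.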
