A nasty little bug

Sunday 5 July 2009This is over 15 years old. Be careful.

James Bennett wrote to me the other day to ask for help with a problem using coverage.py. Some code he was measuring behaved very strangely: according to coverage.py, the first half of the function was executing, but the second half wasn’t. And this was happening for a whole group of functions. These were test functions for XML-producing Django views, and each parsed the XML. The call to the XML parsed was the last function executed each time, none of the lines after it were detected as run.

My first thought was that the XML parsing was throwing an exception, the simplest explanation for why execution would stop in the middle of a function. It made sense, since it was a similar operation in each function exhibiting this behavior. And James had already had another problem with this same code where an unexpected exception was throwing off his tests.

But James is a clever boy, and proved that exceptions were not to blame by adding asserts, print statements and so on. The lines were really being executed, but coverage.py didn’t think they were. Something more interesting and unusual was at work. I broke out my old quip:

I learn something new every day, no matter how hard I try.

To understand what happened here, I need to explain a little about how coverage.py works. Python provides a trace feature: you register a function with sys.settrace(), and it is called as code is executed. It is passed an event argument indicating what happened: “call” for entering a new scope, “return” for leaving a scope, “line” for a line of code executed, and “exception” for an exception being raised.

One of the changes I made in coverage.py 3.0 was to write a more sophisticated trace function which uses the nesting of call and return events to keep track of what file we’re running in and whether to record execution in that file.

Back to the bug: after an intensive day of adding extensive logging to the trace function and poring over 1000-line log files, I found a sequence like this:

    line expatbuilder.py 222
     line expatbuilder.py 223
     call pyexpat.c 905
        call expatbuilder.py 892
           line expatbuilder.py 894
           line expatbuilder.py 900
           exception expatbuilder.py 900
           return expatbuilder.py 900
        exception pyexpat.c 905
        exception expatbuilder.py 223
        line expatbuilder.py 225
        line expatbuilder.py 226

The first odd thing to note about this is the third line, where we seem to be entering a new scope at line 905 of pyexpat.c. That’s odd because the trace function doesn’t get called for execution in C files, only in Python files! And then further down, there’s an exception at line 905 in pyexpat.c, but no return from that line. All my experience with trace functions said that “call” and “return” events would always match. If a scope is left due to an exception being thrown, then there would be an “exception” event and a “return” event, like we see a little higher, at line 900 of expatbuilder.py.

Digging into pyexpat.c itself reveals a strange thing: for some reason, this module calls the trace function explicitly, using “pyexpat.c” as a file name. That explains how the trace function could see inside C code: it can’t, this module alone among all of the modules in the Python distribution calls the trace function to report on what it is doing. I have no idea why.

And further, it does it a little wrong: in the other place where the trace function is called, the interpreter itself, exceptions always generate a return event when they leave a scope. Pyexpat doesn’t do this: if an exception comes up through the C code, it will generate an exception event, but no return event to indicate the scope is exiting. My trace function counts on matching call and return events to keep the bookkeeping straight.

What was happening in James’ code is that inside the XML parser, an exception is being thrown (something that happens internally when parsing doctype declarations), which caused a return event to be skipped, which threw off my trace function’s bookkeeping, which made it think that James’ code was actually inside the XML parser module, which is code that shouldn’t be measured (since it is part of the Python standard library, and who wants to measure that), so the second half of James’ code wasn’t measured. Whew. Mystery solved, now what to do about it?

The first order of business was to write a test showing the problem. It took a while to get a small code sample, because without a doctype declaration in the XML chunk, there was no internal exception to throw off the event pairing. The line numbers in the logging files helped me figure out the source of the exception.

I took a few stabs at fixing the problem, because I was hoping for a general approach to the problem of detecting mis-matched call and return pairs, but I didn’t see a way to do that. In the end I took the unfortunate step of checking for “pyexpat.c” in the file name reported to the trace function, and assuming that an exception event from there needed a missing return event to be synthesized. You can see the unfortunate blot in the tracer code for yourself.

For good measure, I wrote a bug against pyexpat, but I’ll be surprised if anything is done about it. In fact, I kind of hope nothing is, since I’ll have to further complicate my workaround if there are newer versions of pyexpat that do the right thing.

OK, problem fixed, time to run my newly-enhanced test suite against all the versions of Python that coverage.py supports, 2.3 through 2.6.

Uh-oh: 2.3 failed.

Yet more digging, and it turns out that Python 2.3 doesn’t properly match call and return trace events: it shows the same problem with Python code that pyexpat does. In Python 2.3, an exception that leaves a scope will fire an exception event but no return event. D’oh!

2.3 is on the trailing edge of my support list, so I’m not willing to twist the code too badly for it, but I want to make it work.

In coverage.py 3.0, there are two trace functions: a fast one in C, and a slow one in Python for installations that can’t build the C extension. I didn’t want to complicate the C extension, so to keep Python 2.3 working, I tweaked the Python trace function. Now Python 2.3 always uses the Python trace function, which is too bad, since the C implementation is much faster. But it’s better than dropping support for Python 2.3 altogether.

This style of development is unfortunately par for the course for coverage.py. It’s a natural outcome of a few goals I have for it:

  • Accuracy: coverage.py’s job is to help developers understand their code. It has to give accurate information, or it’s just adding new mysteries for them to solve. Claiming lines aren’t executed when they are executed is unacceptable.
  • Applicability: if you are building real code in Python, coverage.py should work for you. This means it can’t opt out of older Python versions, or specific modules from the standard library.
  • Convenience: it shouldn’t be a burden to run coverage testing on your code, it should be fast and easy.

I wish I didn’t have to have pyexpat-specific code in the trace function, but I don’t see a way to avoid it. People should be able to measure code coverage of XML parsing code without having to read a footnote somewhere that says it doesn’t work.

Update: Funny how things work out. I had worked on this bug for a day or two, trying all sorts of ways to fix it. I thought I had found the best way, even though it wasn’t very nice. Then I wrote this blog post, and Andre commented on it, saying he thought there had to be a better way. While I was writing a response along the lines of, “no, there really isn’t, and here’s why,” a new idea occurred to me.

Each event is passed the frame object. The essence of a return event is that the next event will be in a different frame. Not only that, but we know what frame it will be in: the parent of the return event. When an exception event happens, there are two possibilities:

  • the exception is being caught, and the next event will be a line event in the same frame.
  • the exception is causing us to leave the scope, and the next event will be a return event in the same frame.

Either way, the event after an exception should be in the same frame. If the return event is incorrectly skipped, the next event will be something happening in the parent of the exception’s frame.

So that’s now how I detect missing return events: when an exception happens, record the parent frame pointer. If the next event happens in that frame, then we’re missing a return, and can fake one. I didn’t do the more straightforward thing of recording the frame the exception happened in, and checking against that, because by the time we get the next wrong event, that frame has been destroyed, and I didn’t like keeping a stale pointer to check against.

This fix is much nicer:

  • There’s no hard-coding of “pyexpat.c” in the code,
  • If pyexpat is ever fixed, this code will still work, and
  • Python 2.3 can still be handled by the C trace function.

Thanks everyone, for being my sounding board! The new nicer fix is viewable on bitbucket.

BTW: I wanted to say, “thanks for being my wooden Indian,” from an idea I read about (I thought) on the c2.com wiki about how if you explain something to someone, they don’t have to respond at all, just the act of talking it through will help you understand it better, so you might as well explain your problem to a wooden Indian. But I can find no trace of it now.

Comments

[gravatar]
Ick.

I guess it's fairly easy to see what happened here: pyexpat.c correctly implemented the trace rules as of 2.3 and wasn't updated when the rules changed in 2.4.

pyexpat has always tried to pretend to be more like regular Python than most C modules (it can appear in tracebacks too) which is I can understand the motivation for but it's so unlike everything else that I don't think it was a good idea.

Cheers,
mwh
[gravatar]
Well, you have three possible trace patterns: call/except or csll/except/return (discarding the details of lines and subcalls for now). This should be possible to solve without special code for either pyexpat or python 2.3 - as you appear to even have the signature (file, line).

So on exception you could set a flag that allows you to emit fake return traces walking up the stack to the file / line that this one is behind. Am I wrong? Why the ugly expat hack instead?

[gravatar]
@Michael: I would like to know more about why pyexpat went down this path. It is literally the only C module in the whole Python tree to fiddle with the trace function.

@Andre: the problem is that if an except is caught, then you'll get an exception event without a return event. There's no way to know if call/except is correct as is, or should be call/except/return. You can't even examine the file and location, because a caught exception will be call/except/line, with the last two in the same file, probably near each other. But a call/except/missingreturn/line could have the same pattern.

Though now that I'm thinking about it, maybe the frame object passed in could be useful...
[gravatar]
I don't know any details on why, I'm afraid. I've just bumped into it before.
[gravatar]
I've never heard of “wooden Indian” for that idea; perhaps that term is too racist to get much traction these days.

I've always known it as “rubber ducking”, as Sverrir points out: “thanks for being the rubber duck”, or “tell it to the duck first”.
[gravatar]
Back when I was studying to be a Mechanical Engineer, a professor used the Wooden Indian line, and he had to explain it as very few people (especially the foreign students) had a clue what he meant or what a Wooden Indian was (carved wood indian holding cigars to indicate a store sold tobacco).

We diverged into a discussion on techno-babble. The professor (whom I forget his name) believed it came from the dawn of the railroads, and first heard it from his grandfather.

Later when I switched to electrical engineering I heard the same maxim in my COBOL class, only it was a rubber duck.
[gravatar]
This is an exciting tale!

I remember my brushes with coverage tracing. One of the more interesting discoveries was that an exception raised inside the tracing function would silently disable tracing for the rest of the execution. Combine this with an unintentional and silently-suppressed RuntimeError due to a stack overflow and you're having fun. The stack overflow, by the way, was caused by a time-saving decision to implement a __len__ method in terms of __iter__ by writing return len(list(self)), without realizing that list(self) will call self.__len__() as an optimisation. Since it was an optimisation, list() caught the RuntimeError silently, discarded it, and went ahead with the slow path without letting us know about little side effects like lost performance and lost coverage tracing.

I don't know what the moral of the tale is. Don't be lazy? Keep your debugging skills sharp?
[gravatar]
Indeed, working at this level of the language makes for some very interesting debugging. Basic properties you thought were etched in stone can quietly morph beneath your feet. Sharp skills are very handy!
[gravatar]
Pardon me for thread resurrecting, but it looks like you can blame Martin for this:

http://mail.python.org/pipermail/xml-sig/2000-December/004214.html

Fred Drake made some comments about it in the context of Stackless compatibility, too:

http://mail.python.org/pipermail/python-dev/2003-March/034056.html

I think the root issue is that when extensions end up calling back into Python code and an exception is raised, the call stack doesn't include the extension calls, which can be confusing. I'm running into a similar issue right now--we have an application that embeds a Python interpreter for scripting, and the scripts must call back into the application as if it were an extension. In some cases, there can be several nested transitions, which makes the tracebacks rather hard to figure out: "But this script function doesn't even CALL that script function!"

Might be worth a writeup for python-ideas, if not python-dev.
[gravatar]
At work (where we have something similar with a python interpreter built into an application with custom 'builtin' modules), we have our own error/stack management at the boundary levels. To do this you must have control of the top level (i.e. embed the interpreter). At each point where the exception reaches back to the C side, we generate the stack text and raise a C++ special exception, which then turns back into python exception at the top of the C++ stack, unwinding all the way to top C++ level, which then puts the pieces back together. For serious fatal exceptions and C++ exceptions, we have those cross the python boundary and keep them in C++. You need to rebuild python with -fexceptions to allof for this to work, otherwise when the C++ exception hits the Python C/API it core dumps.
[gravatar]
Doug, that's almost a word for word description of what we've designed, too.

Add a comment:

Ignore this:
Leave this empty:
Name is required. Either email or web are required. Email won't be displayed and I won't spam you. Your web site won't be indexed by search engines.
Don't put anything here:
Leave this empty:
Comment text is Markdown.