A bug was filed against coverage.py this morning, and digging into it revealed a number of details about Python's inner workings. The bug boiled down to this code:

import copy

class Tricky(object):
    def __init__(self):
        self.special = ["foo"]

    def __getattr__(self, name):
        if name in self.special:
            return "yes"
        raise AttributeError()

t1 = Tricky()
assert t1.foo == "yes"
t2 = copy.copy(t1)
assert t2.foo == "yes"
print "This runs, but isn't covered."

The code runs just fine, but coverage.py claims that the last two lines aren't executed. They clearly are, because the print statement produces output during the run.

It turns out that coverage fails because there's an infinite recursion here, and when the Python interpreter unwinds the recursion, it doesn't report it to the trace function, so its bookkeeping gets out of whack.

But where's the recursion? It's well-known that you have to be careful in __getattr__ not to use an attribute that might be missing. That would cause an infinite recursion. But here, the only attribute used in __getattr__ is self.special, and that's created in __init__, so it should always be present, right?

The answer lies in how copy.copy works. When it copies an object, it doesn't invoke its __init__ method. It makes a new empty object, then copies attributes from the old to the new. In order to implement custom copying, the object can provide functions to do the copying, so the copy module looks for those attributes on the object. This naturally invokes __getattr__.

If we add a bit of logging to __getattr__ like this:

def __getattr__(self, name):
    print name
    if name in self.special:
        return "yes"
    raise AttributeError()

then we see the recursion:

foo
__getnewargs__
__getstate__
__setstate__
special
special
special
special
.. 989 more ..
special
special
special
special
foo

What's happening here is this: the copy module looks for a __setstate__ attribute, which doesn't exist, so __getattr__ is invoked. It tries to access self.special, but that doesn't exist either, because this is a newly created object which hasn't had __init__ invoked to create self.special. Because the attribute doesn't exist, __getattr__ is invoked, and the infinite recursion begins.

The Python interpreter limits the recursion to 1000 (or so) levels, but why don't we see the exception? Because the attribute access is inside the copy module's hasattr(o, "__setstate__"), and hasattr takes any exception to mean, "No, this attribute doesn't exist," returning False. So hasattr swallows the exception, and we never hear about it.

To fix the problem, we have to prevent the recursion due to looking up self.special:

def __getattr__(self, name):
    if name == "special":
        raise AttributeError()
    if name in self.special:
        return "yes"
    raise AttributeError()

Now there's no error due to reaching the recursion limit, and everything works the way it should. The moral of the story is that if you access an attribute in __getattr__, you have to defend against recursion, even if there's "no way" it could be missing from the object.

tagged: » 12 reactions

Comments

[gravatar]
Steve Holden 10:12 PM on 7 Oct 2010

A very subtle case indeed. And you *did* name the class appropriately, at least.

[gravatar]
Marius Gedminas 7:08 AM on 8 Oct 2010

I was once lazy and implemented a sequence-like object like this:

    def __iter__(self):
        ... bunch of yields or something ...
    def __len__(self):
        return len(list(self))
Turns out list() calls len() as an optimization, causing infinite recursion, and then hides any exceptions much like hasattr(). Infinite recursion disables the tracing hook set by sys.settrace().

[gravatar]
n1ywb 8:55 AM on 8 Oct 2010

Thanks for looking into this, Ned. You are clearly a Python guru. I always worry when I open a ticket regarding to different projects that the fingers will just point at each other, I appreciate you taking the time to identify the root cause, even if it's not in coverage.

[gravatar]
Benjamin Peterson 12:58 PM on 8 Oct 2010

Luckily, hasattr() is fixed in Python 3.2, too.

[gravatar]
Casey Duncan 1:04 PM on 8 Oct 2010

You could make a pretty strong argument that this is a bug in Python, caused by the fact that excessive recursion is a RuntimeError and hasattr() swallows all exceptions inheriting from Exception. If recursion exceptions were instead a specific exception class inheriting from BaseException, hasattr would pass it on saving some gray hairs. Here's another similarly dubious example of hasattr's behavior:

>>> class Bad:
...  def __getattr__(self, name): exec("I are not python")
... 
>>> b = Bad()
>>> b.a
Traceback (most recent call last):
  File "", line 1, in 
  File "", line 2, in __getattr__
  File "", line 1
    I are not python
        ^
SyntaxError: invalid syntax
>>> hasattr(b, "a")
False
IIRC, the situation used to be much worse. hasattr used to unconditionally swallow all exceptions, even SystemExit and KeyboardInterrupt. I think if hasattr were added to Python today, the incumbent's behavior wouldn't really fly. Ideally it would just catch AttributeError and pass through everything else. But of course that would break lots of existing code if it were changed to do that today.

Reporting excessive recursion as a specific exception, rather than a generic RuntimeError seems like a doable change though to me.

[gravatar]
Anand Chitipothu 12:24 AM on 12 Oct 2010

I generally use this workaround whenever I implement __getattr__.

def __getattr__(self, name):
    if name.startswith("__"):
        raise AttributeError(name)
    # rest of the implementation here

[gravatar]
Ned Batchelder 5:35 AM on 12 Oct 2010

@Marius: do you know where in the CPython code the trace function is disabled? I'm still wondering if there's anything coverage.py can do to alert the user that something pathological has happened. I went looking in the CPython source myself, and got lost in a maze of twisty passages, all alike.

@Anand: that's a prudent thing to do, but notice it wouldn't have prevented the recursion I encountered.

[gravatar]
Marius Gedminas 2:22 PM on 12 Oct 2010

No, I never looked at CPython sources for this.

You could look at sys.gettrace() at the end: if it still points to your trace function, everything's probably fine, but if it got reset to None, warn the user.

>>> def f(*args, **kw): pass
... 
>>> import sys
>>> sys.settrace(f)
>>> sys.gettrace()
<function f at 0xb77b36f4>
>>> sys.gettrace() is f
True
>>> def r(): r()
... 
>>> r()
Traceback (most recent call last):
  File "<stdin>", line 1, in 
  File "<stdin>", line 1, in r
...

  File "<stdin>", line 1, in r
RuntimeError: maximum recursion depth exceeded
>>> sys.gettrace()
>>> sys.gettrace() is None
True

[gravatar]
Marius Gedminas 2:29 PM on 12 Oct 2010

FWIW the code is in Python/sysmodule.c, in trace_trampoline:

	result = call_trampoline(tstate, callback, frame, what, arg);
	if (result == NULL) {
		PyEval_SetTrace(NULL, NULL);
		Py_XDECREF(frame->f_trace);
		frame->f_trace = NULL;
		return -1;
	}
Any exception that happens in the trace function turns it off. The stack overflow is always triggered in the trace function, for obvious reasons.

[gravatar]
Ned Batchelder 8:48 AM on 13 Oct 2010

@Marius, thanks, checking gettrace is a great idea, I'll be adding that soon, and thanks for the C code reference.

[gravatar]
Michael Foord 5:06 PM on 18 Oct 2010

This problem cropped up in Mock (and thanks to Ned for diagnosing it). Unfortunately although I can use the sys.settrace "trick" in standalone code it doesn't kick the trace function out when I run it in a test! To test the fix I called sys.setrecursionlimit(sys.maxint) - this reliably segfaults Python without the fix in place.

[gravatar]
Sergio Fernández 9:00 AM on 25 Oct 2010

There is an intentional asymmetry between __getattr__ and __setattr__, use the __getattribute__ method for a way to actually get total control. See more at http://pyref.infogami.com/__getattr__

Add a comment:

name
email
Ignore this:
not displayed and no spam.
Leave this empty:
www
not searched.
 
Name and either email or www are required.
Don't put anything here:
Leave this empty:
URLs auto-link and some tags are allowed: <a><b><i><p><br><pre>.