Python mystery #6237: solved

Sunday 19 October 2008This is nearly 16 years old. Be careful.

I was refactoring some code today, and couldn’t figure out why my simple textbook change was making code break, until I relearned a subtlety of Python.

The code looked like this:

def __eq__(self, other):
    # Compare the parts, but be clever when comparing the specs.
    if self.__class__ != other.__class__:
        return False
    return self.spec.make_div_only() == other.spec.make_div_only() and self.chunks == other.chunks

I didn’t like that long last line, so I made the simple change to this:

def __eq__(self, other):
    # Compare the parts, but be clever when comparing the specs.
    if self.__class__ != other.__class__:
        return False
    if self.spec.make_div_only() != other.spec.make_div_only():
        return False
    return self.chunks == other.chunks

and ran my unit tests, and they failed. What!? How could doing a simple boolean refactoring cause breakage? I scratched my head, re-read the lines, questioned my understanding of the short-circuit nature of the and operator, and so on. The usual “am I going crazy?” debugging techniques.

Undoing the refactor made the tests work again, I changed it again to the shorter lines, and the tests failed again. Adding some print statements to see the actual values being compared, I realized that the result of make_div_only is an object (of class Spec), and that object defines its own Spec.__eq__ method to define the meaning of the == operator for its instances.

Then it hit me: the class doesn’t define a __ne__ method. My refactoring changed the operator from == to !=, the first was overridden to provide custom semantics, but the second was not, so simple object inequality was being checked, so make_div_only inequality test was always true, and the method incorrectly always returned False.

The Python docs are clear on this point:

There are no implied relationships among the comparison operators. The truth of x==y does not imply that x!=y is false. Accordingly, when defining __eq__(), one should also define __ne__() so that the operators will behave as expected.

Adding a __ne__ method to my Spec class made everything work properly:

def __ne__(self, other):
    return not self.__eq__(other)

Comments

[gravatar]
Well, it may make sense if you're dutch, but if I had to design this feature, I would make the __ne__ function return not self.__eq__(other) by default, so someone implementing __eq__ does not need to implement __ne__.

Of course, someone who really liked to override __ne__ to mean something different would have no problem doing it, apart from the users of his class coming after him with torches and pitchforks.

So where is the upside of not defaulting __ne__ to the right thing?
[gravatar]
@Andre, that's a good question. My best guess is that once you bring __lt__, __le__, __gt__, and __ge__ into the picture, it's harder to default everything to "make sense". And I don't know if you can default things so that someone defining __ne__ but not __eq__ would also get the right behavior. Probably the logic was, "screw it, this is hard, and if someone is advanced enough to define __eq__, they're advanced enough to take care of these other details."
[gravatar]
Jean-Paul Calderone 2:11 PM on 19 Oct 2008
Hi Ned,

You might want your class's comparison definition to work nicely with subclasses or unrelated classes which can compare themselves to it. If so, take a look at http://jcalderone.livejournal.com/32837.html.
[gravatar]
@Ned, that is a probability, and it's very interesting to see how different languages handle it. For example, Lua has no inequality metamethod, but defines inequality hard-coded as the inverse of equality.

I am pretty sure that one can in fact default things so that the functions are chained correctly (the default for __eq__ could look inside the class if the __ne__ function exists, and if so, return the inverse and __ne__ vice versa).

But even if the implementor did not want to go this extra mile (for whatever reason), he could assume that someone writing a class would implement equality first, and default __ne__ accordingly, thus removing a potential source for errors with little effort.

Same thing goes for __lt__ and the rest. Normally, people may implement __lt__ (and maybe __gt__), but I'd guess that __le__ is very seldom implemented, so it could default to not __gt__ (if it exists) or __lt__ or __eq__.
[gravatar]
@Jean-Paul, thanks for the link to your page, it really helped explain some things I had overlooked, like NotImplemented.
[gravatar]
@andre/ned: I'm not a python weenie, but doesn't python use the __cmp__ operator as the basis for the all comparison operations(eq, ne, lt, gt, le, gt)? Seems like you should just be overriding that instead of __eq__ in this case.
At least, that's how I read http://www.python.org/doc/2.5.2/ref/customization.html

But like I said, I don't "do" python so there's probably something I've overlooked.
[gravatar]
why change actual operations if you're just wrapping?
    if self.spec.make_div_only() == other.spec.make_div_only():
        return self.chunks == other.chunks
    return False
[gravatar]
Broofa: __cmp__ is -- for reasons unknown to me -- frowned upon in favor of the rich comparison methods:

"These are the so-called “rich comparison” methods, and are called for comparison operators in preference to __cmp__() below"

I've been using a class decorator based around this total_ordering decorator to synthesize rich comparison methods given just one.
[gravatar]
@Samus: yes, I could have refactored your way, but when making a ladder of comparisons like this, I prefer to have each clause break out of the function to keep a similar shape. Besides, I wouldn't have figured out mystery #6237!
[gravatar]
@Jacob: whoa! that's a pretty intense recipe. I'll have to remember that. I'm not sure I like calling __lt__ twice to get __ne__, but it is cool that Foord automated the relationships.
[gravatar]
Not sure why you have an if statement at all?
return (self.__class__ == other.__class__ and 
    self.spec.make_div_only() == other.spec.make_div_only() and 
    self.chunks == other.chunks)
Slightly faster, slightly shorter and I believe slightly clearer (as there's no branching in the logic).
[gravatar]
@Tom: I just don't happen to like multi-line statements like that. I like the explicit return False's better. Just a matter of taste I guess.
[gravatar]
Taste is important but there are good reasons in general why boolean expressions beat logic.

Terseness is obvious, readability might vary between individuals, but the main fact is that conditional branches ("if statements") are classic points where mistakes happen, because you have to do one higher level of reasoning about where the program counter is - and classically people make mistakes there. (The problem you exhibit here wouldn't have been fixed this way but other less subtle ones would. The typical result is a return statement that you can't in practice reach...)

But this is a tiny detail :-D not worth fussing about in the large scale of things.
[gravatar]
(and thanks for the article, you undoubtedly saved me from running into the same bug at some point in the future...)
[gravatar]
If I'm reading point #3 in the following list correctly, __ne__ will default to the opposite of __eq__ in python 3.0.

http://docs.python.org/dev/3.0/whatsnew/3.0.html#other-language-changes

Add a comment:

Ignore this:
Leave this empty:
Name is required. Either email or web are required. Email won't be displayed and I won't spam you. Your web site won't be indexed by search engines.
Don't put anything here:
Leave this empty:
Comment text is Markdown.