« | » Main « | »

Explaining descriptors

Tuesday 18 June 2013

I've always found descriptors to be one of the more confusing corners of Python. Partly, I think that's to do with the explanations in the docs, which I find opaque:

In general, a descriptor is an object attribute with “binding behavior”, one whose attribute access has been overridden by methods in the descriptor protocol.

What is binding behavior, and why is it in quotes? This lead sentence hints at overriding attribute access, but doesn't tell me how it happens. It's a tall wall to scale right at the start of the learning process.

The best explanation I've seen of what descriptors do and why you'd want to write them was in Chris Beaumont's lightning talk at Boston Python, Demystifying Descriptors in 5 Minutes. The video quality was not great, but now Chris has written it up: Python Descriptors Demystified.

(Sorry about the quality, we're getting much better... PS: subscribe to our YouTube channel!)

Chris' insight was that instead of defining descriptors, and then showing how you could make properties with them, he flipped that explanation around: explain properties, then show how descriptors are like generalized properties. Read the whole thing: Python Descriptors Demystified.

When explaining things, you have to build from what people already know, a step at a time. I picture a student's understanding being like geography. What they know is a land mass, and when you teach them, you are extending their land out into the unknown ocean. You want to make their island bigger, and there's a particular point out in the ocean you want to encompass.

Some technical descriptions will explain that distant point, and either hope you can build the peninsula yourself, or expect to be able to build backwards toward the mainland. The classic descriptor explanation is like that: provide a definition of the distant concept, and hope students can make the leap.

Chris' explanation is more incremental. Start with what we know, and extend a little bit at a time, with motivations as we go. I love it.

BTW: I made some edits to the Python documentation, but they haven't been adopted: Edits to descriptor howto. Others have also suggested reorganizations of the docs about descriptors: Harmonizing descriptor protocol documentation.

Descriptors are still an advanced feature, and I don't expect everyone to understand and use them. But they are not as complicated as they first seem, and the explanations can do a better job helping people up that learning curve.

51 at MoMath

Sunday 16 June 2013

For my birthday (today), we visited the Museum of Math in New York (yesterday). I've been looking forward to getting there since it opened six months ago, and my reluctant family had to accede to my birthday destination, so all five of us spent the afternoon.

Museum of Math

The museum is a fun place, with lots of interactive exhibits. Some were intriguing but baffling, like a polyhedron exploration device which looked great, but was impossible to control. We rode square-wheeled tricycles, made ourselves into fractal trees, explored cross-sections in the wall of fire, rolled weird shapes to make weird paths, looked at specular holography, and so on.

As is typical with these kinds of high-traffic interactive displays, a number of them were not working, which was disappointing. But overall, it was a lot of fun, and not the same feel as math class at all. One helpful museum worker kept popping up to tell us how to better use the exhibits, and Susan said, "if I had her as a math teacher, I might have learned something in high school!"

It was a great day. If you enjoy mathematical thinking, I heartily recommend the Museum of Math.

A few blocks north, we found the Museum of Sex, but decided not to go in with the kids....

Filter a list into two parts

Tuesday 11 June 2013

A recent discussion on comp.lang.python asked how to Split a list into two parts based on a filter?, and the answers were interesting and insightful.

The question was by Roy Smith. He asked about dividing a list of songs into two lists based on a predicate, like this:

new_songs = [s for s in songs if s.is_new()]
old_songs = [s for s in songs if not s.is_new()]

but without iterating the list twice and calling the predicate twice for each element. He also provided this implementation, which iterates the list only once:

new_songs = []
old_songs = []
for s in songs:
    if s.is_new():
        new_songs.append(s)
    else:
        old_songs.append(s)

which got shortened to:

new_songs, old_songs = [], []
for s in songs: 
    (new_songs if s.is_new() else old_songs).append(s)

This works just fine for a simple list, but suppose the original sequence to be divided was a generator, and you don't want to buffer all the values in memory? Is there a way to do this lazily?

Usually, we go to the itertools module for things like this, but this problem is tricky: we need to get two lazy iterators at once.

Chris Angelico came up with this remarkable machine, with a deque tweak by Tim Chase:

def iterpartition(pred, it):
    """Partition an iterable based on a predicate.

    Returns two iterables, for those with pred False and those True.
    """
    falses, trues = collections.deque(), collections.deque()
    it = iter(it)
    def get_false():
        while True:
            if falses: 
                yield falses.popleft()
            else:
                while True:
                    val = next(it)
                    if pred(val): 
                        trues.append(val)
                    else: 
                        break
                yield val
    def get_true():
        while True:
            if trues: 
                yield trues.popleft()
            else:
                while True:
                    val = next(it)
                    if not pred(val): 
                        falses.append(val)
                    else: 
                        break
                yield val
    return get_false(), get_true()

Here two generators are created as closures, and returned as the result of the function. The generators cooperate to work over the original iterable, calling the predicate only once for each element, and buffering only enough values to find the next one you want.

Peter Otten contributed this gem:

def partition(items, predicate=bool):
    a, b = itertools.tee((predicate(item), item) for item in items)
    return ((item for pred, item in a if not pred),
            (item for pred, item in b if pred))

Here we finally have itertools helping us. This behaves like Chris' code, buffering enough values for either returned iterable to provide the next requested value. But we don't have to do the buffering, itertools.tee does it for us, and the predicate is called only once for each element in the iterable. Itertools.tee gets a generator expression that produces the original elements along with the result of calling the predicate. That is tee'd into two iterables, which are then fed to two generator expressions to select the proper values. Clever.

When I first read the question on the list, I didn't see how to improve on the list-building code. Now I've got two interesting implementations, and I learned a few things!

« | » Main « | »