« | » Main « | »

Recursive dogma

Monday 28 May 2012

This morning in the #python IRC channel, someone needed help with a recursive function:

Questioner: i have a problem with a little recursive function
Questioner: i create a list recursive of length 2**30 but i get a memory problem
Naysayer: recursion in Python is almost always bad news
Questioner: i have probs with high memory because i create too many too large list when i enter a value of 2**30 for N
RecursionFan: recursion is awesome
RecursionFan: it sounds like you're doing it wrong
Naysayer: you should avoid recursion in python
RecursionFan: why?
Naysayer: There's this little thing called the stack
Questioner: yeah i now the depth of my recursion, its just log(N)
Questioner: but can somebody help me so this function can work for parameters of 2**30
RecursionFan: there's this little thing called http://en.wikipedia.org/wiki/Tail_call
Naysayer: yeah, and python doesn't have it
Onlooker: RecursionFan, does python optimise tail-recursion?
nedbat: RecursionFan: Python doesn't do tail-call removal.
Naysayer: Onlooker: No
nedbat: Naysayer: "you should avoid recursion" is a bit strong. You need to understand the limitation.
nedbat: Naysayer: Python is as good at recursion as C is at integer arithmetic.
Naysayer: nedbat: It is good practice to avoid recursion in Python if you don't understand how it works.
nedbat: Naysayer: it is good practice to avoid _____ in Python if you don't understand how it works.

(Names other than mine, nedbat, are changed.) Before anyone's read Questioner's code, or even listened fully to his question, we're off on the usual rant against recursion in Python, and debates about tail-call elimination. As it happens, and as Questioner rightly understood, the problem is not about deep recursion:

def computeHW(N):
    if N == 1:
        return [1]
    else:
        L = computeHW(N/2)
        L.extend([2*i for i in L])
        return L

Questioner wanted to call this function with 2**30, which is quite modest in terms of depth, the stack will only get 30 levels deep. But it's crazy in memory terms: it tries to allocate a list with a billion elements, but not before it's allocated and deallocated many many others, up to that size.

So yes, this function isn't going to work, but not because of tail-call optimization or its lack. We can change the function into a generator that still works recursively, but won't allocate any memory:

def computeHW2(N):
    if N == 1:
        yield 1
    else:
        for i in computeHW2(N/2):
            yield i
        for i in computeHW2(N/2):
            yield 2 * i

Talking more with Questioner, it turns out he didn't want the sequence, he wanted to be able to randomly access elements of his list. We can turn this function into a closed-form solution that computes the n'th element of this list:

def closedHW(N):
    if N == 0:
        return 1
    else:
        bits = N.bit_length()
        return 2 * closedHW(N - 2**(bits-1))

This function is still recursive, and so will use up to 30 stack frames for numbers up to 2**30, but doesn't allocate any lists at all. This function is also in a simple tail-call form, so we can easily make it iterative:

def closedHW_iterative(N):
    hw = 1
    while N:
        bits = N.bit_length()
        N -= 2**(bits-1)
        hw *= 2
    return hw

BTW, this code gets to use one of the few methods on integers, bit_length(), which is new in 2.7.

In the end, we eliminated the recursion, but not for any of the reasons Naysayer was darkly hinting at.

When people say Python is bad at recursion, they are referring to the fact that the stack can't grow beyond 1000 frames. This makes it bad at the kind of recursion that languages like Scheme and Haskell encourage. This school of thought uses recursive procedures for iterative processes, to put it in terms from Structure and Interpretation of Computer Programs. Python is not good for this sort of recursion, because it won't work for more than 1000 iterations, which is far too low a limit.

But truly recursive processes don't often need that much stack, so using recursive procedures for recursive processes is fine in Python. Blanket statements like "recursion is almost always bad in Python" are just simplistic.

Be the guy

Friday 25 May 2012

Max just finished his first year at NYU, studying film. Older students have significant film projects to finish, and younger students work as crew member on these shoots. Max has already finished two of these shoots, and yesterday I was taking him to the bus station so he could work for two weeks on a third one.

We were joking about messing up shots, and getting the director annoyed, and he said his goal was not to have anyone be annoyed at him. So I told him my theory of Being a Good Employee, and that is,

Find out what bothers your boss, and what makes him happy, and do the thing that makes him happy.

That's about it. Kind of obvious, but it turns out there are a lot of other rules of thumb you can follow that won't work as well. And Max gave me his guiding principle on film sets:

I just watch for when something needs to get done, and I try to be The Guy, like the guy who gets it done.

This is a great way to think about work. Don't get hung up on who's job is whose, don't wait to be told what to do, just be The Guy. Funny thing is, it reminds me of the Yiddish term, "mensch," which means "man," but means more than that, it means, "a solid person that you can count on to do the right thing." In a lot of ways, Max's "The Guy" is just like the classic "mensch."

It seems like Max's strategy is working, since he is being asked to crew on other sets.

I'm so proud of him.

Tabblo's last day: May 30

Thursday 24 May 2012

HP has announced the date of Tabblo's shutdown: May 30, next week. It's sad, but it's better than continuing to let it rot on the vine.

I've enhanced my Tabblo Lifeboat downloader, it's faster and more robust, and gets a little bit more information from Tabblo.

If you have stuff on Tabblo that you want to keep, get it now. If you need help, let me know.

Coverage.py v3.5.2

Friday 4 May 2012

Since posting v3.5.2 beta 1 last Sunday, no one has said anything about it, so it must be perfect. The exact same code is now Coverage.py v3.5.2. This release of the foremost code coverage tool for Python includes a number of small fixes. The full details, including links to the tickets that were closed, are in the coverage.py change history.

May all your lines and branches be covered!

« | » Main « | »