Thursday 20 December 2018 — This is six years old. Be careful.
Working on a programming challenge, I was surprised by something. I built a tree structure with a recursive function. Then I tried to use a recursive function to sum up some values across the tree, and was met with a RecursionError. How could I have enough stack depth to build the tree, but not enough to then sum up its values?
Python has a limit on how large its stack can grow, 1000 frames by default. If you recur more than that, a RecursionError will be raised. My recursive summing function seemed simple enough. Here are the relevant methods:
class Leaf:
def __init__(self):
self.val = 0 # will have a value.
def value(self):
return self.val
class Node:
def __init__(self):
self.children = [] # will have nodes added to it.
def value(self):
return sum(c.value() for c in self.children)
My code made a tree about 600 levels deep, meaning the recursive builder function had used 600 stack frames, and Python had no problem with that. Why would value() then overflow the stack?
The answer is that each call to value() uses two stack frames. The line that calls sum() is using a generator comprehension to iterate over the children. In Python 3, all comprehensions (and in Python 2 all except list comprehensions) are actually compiled as nested functions. Executing the generator comprehension calls that hidden nested function, using up an extra stack frame.
It’s roughly as if the code was like this:
def value(self):
def _comprehension():
for c in self.children:
yield c.value()
return sum(_comprehension())
Here we can see the two function calls that use the two frames: _comprehension() and then value().
Comprehensions do this so that the variables set in the comprehension don’t leak out into the surrounding code. It works great, but it costs us a stack frame per invocation.
That explains the difference between the builder and the summer: the summer is using two stack frames for each level of the tree. I’m glad I could fix this, but sad that the code is not as nice as using a comprehension:
class Node:
...
def value(self):
total = 0
for c in self.children:
total += c.value()
return total
Oh well.
Update: Jonathan Slenders suggested using a recursive generator to flatten the sequence of nodes, then summing the flat sequence:
class Leaf:
...
def values(self):
yield self.val
class Node:
...
def values(self):
for c in self.children:
yield from c.values()
def value(self):
return sum(self.values())
This is clever, and solves the problem. My real code had a mixture of two different nodes, one using sum() the other using max(), so it wouldn’t have worked for me. But it’s nice for when it does.
Comments
I understand the main point is to figure out what causes the recursion error, however, about fixing the problem, is this way (using a loop) better than setting a larger recursion limit?
I have just learned that it can be set by `sys.setrecursionlimit`, but I don't know whether it's common (or safe) to do this...
https://gist.github.com/iksteen/7948168640bdd67017c8
I’ve tried it on a couple of platforms, both 2.7 and 3.x, 32 and 64 bit, linux and mac. The actual limit before getting a segfault varies between ~15000 and 24000. So 10000 or a more conservative limit of 5000 should be ok.
Can anyone test this is on Windows?
Is there anything I'm missing that you've done?
I had the problem, that the RecursionError goes silently away, but still limits the recursion.
https://python-forum.io/Thread-Cheat-against-recursion-limit
Is my conclusion about 'yield from' right?
You can prevent the RecursionError, but you are still limited in recursion.
PS: I work always with IPython. There the default recursion limit is set to 3000. So, I tested my code with this value.
Add a comment: