![]() | Ned Batchelder : Blog | Code | Text | Site Enumerating trees » Home : Blog : February 2008 |
Enumerating treesMonday 11 February 2008 I was hacking around with a program that needed tree-structured data, and I wanted to generate all possible trees of a certain size. I searched around and found nothing useful. After a few experiments involving Python generators, postfix expressions, and recursive trees, I've got some code to enumerate binary trees. I started with postfix expressions because they are easy to work with, and are equivalent to trees, in that trees are like infix notation expressions, and they are equivalent to postfix. After some puzzling on it, I saw the recursive light, and came up with this to generate all possible postfix expressions: # Enumerate postfix expressions The nice thing here is the generator is a pretty clear expression of the enumeration technique: as you are building an expression:
The nature of recursive generators requires the for/yield structure: # When recurring in generators, you have to loop to pull the values, Running the code shows all the expressions of two operators on five values, and the counts of expressions of one operator over varying numbers of values: A B + C + D + I still wanted to find other work on this stuff, and I figured anyone who had successfully enumerated trees would also have done a count like I did, so I did a search on the last two counts (208012 742900). I was directed to a few interesting pages, in particular the wikipedia page on Catalan numbers, which confirmed that my count of trees is correct, and led to the theory of Dyck languages, which is too much math for my purposes, and led to a few interesting papers that I couldn't understand. To make trees, my first impulse was to use my expression generator and then execute the expressions to construct trees, but that seemed unnecessary, so I adapted the expression code to create trees directly: # Enumerate binary trees Which produces: (((A + B) + C) + D) One tricky thing about this generator: it avoids trees which are structurally different, but with associate operators produces the same result, so it won't return both ((A+B)+C) and (A+(B+C)). If you want both of those trees, remove the check at line 25 for right associativity. The number of trees grows very large as the number of leaves becomes big enough to be interesting. I guess I'll have to dig into application-specific ways to limit the choices. | |
Comments
Is this hyper-geekiness because Damien is now some kind of über geek in *his* blog? Honestly, I wish that Damien would go back to talking about liquid poop and you would talk about video games and Legos (tm)
Seriously, recursion is the tool of the devil. I am pretty sure that you have found reason #2 for recursion to exist in programming languages (#1 being a gee-whiz Fibonacci calculator that we all did in CS 100).
All recursion ever did for me was blow the stack.
Andrew:
Then you're missing out. There are LOTS of problems that are best solved with recursion. For instance, it isn't just enumerating trees... pretty much ANYTHING having to do with trees is best implemented using recursion. Blowing the stack with recursion isn't a problem if you're using it to walk a tree, really it's only a problem when you use recursion to do iteration (and then only in languages like Java or Python that don't handle recursion "properly", but that's a given).
This could be useful for writing fuzzers (and unit-testing) for tree-like structures.
Not too long ago I wrote about a possible compiler fuzzer, so it would make sense to implement one with a similar technique.
I always check Sloane's On-Line Encyclopedia of Integer Sequences for this sort of thing:
2, 5, 14, 42, 132, 429, 1430, 4862, 16796, 58786, 208012, 742900
@lorg: yes, I imagine there are lots of testing possibilities here.
@Iain: Sloane's looks a little barebones, but could be very useful.
Andrew,
Where have you been, other than Florida? Ned's always been Uber Geeky, regardless of Damien and his sofa.
But I'm with you on recursion, especially the eternal sort. So last century.
Ned, cool stuff. I played with it in Iron Python for a bit.
By the way, if you're looking for real Über geeks, go rent The King Of Kong.
I got an email from Sanghyeon Seo pointing out that Donald Knuth is writing about this very topic in ACP vol 4, available as a fascicle: http://www-cs-faculty.stanford.edu/~knuth/fasc4a.ps
A quick look through there showed a few very dense algorithms to generate trees that seem much more efficient than mine, but I don't understand them!
Add a comment: