![]() | Ned Batchelder : Blog | Code | Text | Site Iterator merge in Python » Home : Blog : October 2006 |
Iterator merge in PythonTuesday 3 October 2006 Although I've been working in Python for a long time, every once in a while I come across a chunk of code that makes me stop and think, and admire its elegance and Pythonic nature. Raymond Hettinger's recipe for merging multiple sorted iterators is one of those pieces of code. It's only 29 lines including the supporting stuff, but it opened my eyes in these ways:
There's nothing like a succinct thought-provoking snippet to start your day.
tagged:
python» 6 reactions | |
Comments
Not impressed with this one. Far too much work for such a triffle thing. It does too much of the grunt work when Python already includes most of the bells and whistles. Simplicity trumps all else.
def merge(*iters):
iters = list(iter(i) for i in iters)
while iters:
for i in iters:
try:
yield i.next()
except StopIteration, e:
iters.remove(i)
print sorted(merge([1,2,3,5], [4,5]))
Calvin, yours is shorter, but misses an important point in Raymond's: yours pulls the entire contents of the iterators into memory, while his does not. Imagine trying to merge 5 sorted 2Gb files together. His will use very little memory (only 5 lines of text need to be stored at once), while yours will consume 10Gb.
One more thing: Calvin's version of course has use where memory use is not a concern, and can be simplified further, using only standard modules:
from itertools import chain
print sorted(chain([1,2,3,4], [4,5]))
I don't want to brag or anything, but I wrote something very close to Raymond a few day before he entered his version in the cookbook :
http://aspn.activestate.com/ASPN/Cookbook/Python/Recipe/466302
It include all the nifty stuff about using heapq and calling next() on iterators. Well, yeah, I guess I'm bragging :) At least Raymond's version is generic and not mixed with file management.
How embarrasing! I misread the recipe originally. I did not notice the nifty trick of pushing both the value and the next function, and somehow interpreted it as pushing all values into the heap immediately. This is why I mistakenly saw nothing in the recipe that even cared or took advantage of the iterables being pre-sorted, and thus shot it down.
I conceed my disagreement.
Neat. It's worth noting the similarity of the (val, next-method) approach to the "streams" model in SICP which is fascinating (required?) reading.
Add a comment: