|Ned Batchelder : Blog | Code | Text | Site|
Filter a list into two parts
» Home : Blog : June 2013
A recent discussion on comp.lang.python asked how to Split a list into two parts based on a filter?, and the answers were interesting and insightful.
The question was by Roy Smith. He asked about dividing a list of songs into two lists based on a predicate, like this:
but without iterating the list twice and calling the predicate twice for each element. He also provided this implementation, which iterates the list only once:
which got shortened to:
This works just fine for a simple list, but suppose the original sequence to be divided was a generator, and you don't want to buffer all the values in memory? Is there a way to do this lazily?
Usually, we go to the itertools module for things like this, but this problem is tricky: we need to get two lazy iterators at once.
Chris Angelico came up with this remarkable machine, with a deque tweak by Tim Chase:
Here two generators are created as closures, and returned as the result of the function. The generators cooperate to work over the original iterable, calling the predicate only once for each element, and buffering only enough values to find the next one you want.
Peter Otten contributed this gem:
Here we finally have itertools helping us. This behaves like Chris' code, buffering enough values for either returned iterable to provide the next requested value. But we don't have to do the buffering, itertools.tee does it for us, and the predicate is called only once for each element in the iterable. Itertools.tee gets a generator expression that produces the original elements along with the result of calling the predicate. That is tee'd into two iterables, which are then fed to two generator expressions to select the proper values. Clever.
When I first read the question on the list, I didn't see how to improve on the list-building code. Now I've got two interesting implementations, and I learned a few things!
tagged: python» 14 reactions