Tuesday 11 June 2013 — This is ten years old. Be careful.
A recent discussion on comp.lang.python asked how to Split a list into two parts based on a filter?, and the answers were interesting and insightful.
The question was by Roy Smith. He asked about dividing a list of songs into two lists based on a predicate, like this:
new_songs = [s for s in songs if s.is_new()]
old_songs = [s for s in songs if not s.is_new()]
but without iterating the list twice and calling the predicate twice for each element. He also provided this implementation, which iterates the list only once:
new_songs = []
old_songs = []
for s in songs:
if s.is_new():
new_songs.append(s)
else:
old_songs.append(s)
which got shortened to:
new_songs, old_songs = [], []
for s in songs:
(new_songs if s.is_new() else old_songs).append(s)
This works just fine for a simple list, but suppose the original sequence to be divided was a generator, and you don’t want to buffer all the values in memory? Is there a way to do this lazily?
Usually, we go to the itertools module for things like this, but this problem is tricky: we need to get two lazy iterators at once.
Chris Angelico came up with this remarkable machine, with a deque tweak by Tim Chase:
def iterpartition(pred, it):
"""Partition an iterable based on a predicate.
Returns two iterables, for those with pred False and those True.
"""
falses, trues = collections.deque(), collections.deque()
it = iter(it)
def get_false():
while True:
if falses:
yield falses.popleft()
else:
while True:
val = next(it)
if pred(val):
trues.append(val)
else:
break
yield val
def get_true():
while True:
if trues:
yield trues.popleft()
else:
while True:
val = next(it)
if not pred(val):
falses.append(val)
else:
break
yield val
return get_false(), get_true()
Here two generators are created as closures, and returned as the result of the function. The generators cooperate to work over the original iterable, calling the predicate only once for each element, and buffering only enough values to find the next one you want.
Peter Otten contributed this gem:
def partition(items, predicate=bool):
a, b = itertools.tee((predicate(item), item) for item in items)
return ((item for pred, item in a if not pred),
(item for pred, item in b if pred))
Here we finally have itertools helping us. This behaves like Chris’ code, buffering enough values for either returned iterable to provide the next requested value. But we don’t have to do the buffering, itertools.tee does it for us, and the predicate is called only once for each element in the iterable. Itertools.tee gets a generator expression that produces the original elements along with the result of calling the predicate. That is tee’d into two iterables, which are then fed to two generator expressions to select the proper values. Clever.
When I first read the question on the list, I didn’t see how to improve on the list-building code. Now I’ve got two interesting implementations, and I learned a few things!
Comments
test = predicate(data)
If test:
yield (data, None)
else:
yield (None, data)
http://docs.python.org/dev/library/itertools.html#itertools-recipes
It looks like this, which I think is also more readable:
I haven't touched at all here on how different usages might skew the preference for one solution over another.
BC
The function partitionn below will partition items under the same restrictions as mentioned but into N parts given a predicate that returns an integer 0 .. n-1 when acting on any item. (Note that in the printed output of generator objects I changed the angle brackets to square brackets so it would not be consumed by the comment editor).
http://stackoverflow.com/questions/1359232/generate-two-lists-at-once
This was the best answer I got:
those_words = [w for w in words if w in ("foo", "bar")]
other_words = list(set(words) - set(those_words))
Add a comment: