Monday 26 February 2018 — This is almost seven years old. Be careful.
In the #python IRC channel today, someone asked:
Does anyone know of any libraries that could convert ‘1-5,7,9,10-13’ to [1,2,3,4,5,7,9,10,11,12,13] ?
This seemed like an interesting challenge, and people started offering code. This was mine. It’s ungainly and surprising, and I wouldn’t want to keep it, so I call it a gargoyle:
[
    i for p in s.split(',')
    for a, _, b in [p.partition('-')]
    for i in range(int(a), int(b or a)+1)
]
There are a few things going on here. First, this is a list comprehension, but with three nested loops. A simple list comprehension has this form:
result = [ EXPR for NAMES in ITERABLE ]
which is the same as this code:
result = []
for NAMES in ITERABLE:
    result.append(EXPR)
For many people, the list comprehension seems kind of backwards: the expression comes first, before the loop that produces its values. Then the multi-loop form can seem like another surprise:
result = [ EXPR for NAMES1 in ITERABLE1 for NAMES2 in ITERABLE2 ]
which is equivalent to:
result = []
for NAMES1 in ITERABLE1:
    for NAMES2 in ITERABLE2:
        result.append(EXPR)
The first time I ever tried to write a double list comprehension, I thought the loops should go in the other order, in keeping with the Yoda-style of EXPR coming first. They don’t.
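A quick way to convince yourself of the loop order is to compare a two-loop comprehension against the explicit nested loops. This is only a small sketch; the names `letters` and `digits` are made up for the illustration.

```python
letters = ['a', 'b']
digits = [1, 2, 3]

# Comprehension: the loops read left to right, outermost first.
pairs = [(x, n) for x in letters for n in digits]

# Explicit loops, nested in the same left-to-right order.
expected = []
for x in letters:
    for n in digits:
        expected.append((x, n))

print(pairs == expected)
print(pairs[:3])
```

Because the first `for` is the outer loop, a later loop can use a name bound by an earlier one, which is what the gargoyle relies on.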
Back to the gargoyle. It’s a triply-nested loop (this time formatted a little differently):
[
    i
    for p in s.split(',')
    for a, _, b in [p.partition('-')]
    for i in range(int(a), int(b or a)+1)
]
The first loop splits the number ranges on comma, to produce the individual chunks: ‘1-5’, ‘7’, ‘9’, ‘10-13’. This seems like an obvious first step.
The next loop is the most surprising. Strings in Python have a .partition method, which is super-handy and under-used. It takes a separator, and produces three values: the part of the string before the separator, the separator itself, and the part of the string after the separator. The best thing about .partition is that it always produces three values, even if the separator doesn’t appear in the string. In that case, the first value is the whole string, and the other two values are empty strings:
>>> '1-5'.partition('-')
('1', '-', '5')
>>> '12'.partition('-')
('12', '', '')
>>>
This means we can always assign the result to three names. Super-handy.
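To make the "always three values" point concrete, here is a small sketch contrasting .partition with .split. The variable names (`results`, `split_failed`) are just for illustration.

```python
results = []
for chunk in ['1-5', '12']:
    # partition always returns a 3-tuple, so this unpacking never fails.
    first, sep, last = chunk.partition('-')
    results.append((first, sep, last))
print(results)

# split returns a list whose length depends on the input, so a fixed
# two-name unpacking breaks when the separator is missing.
split_failed = False
try:
    first, last = '12'.split('-')
except ValueError:
    split_failed = True
print(split_failed)
```

With .split you would need to check the length of the result before unpacking; .partition lets you skip that check.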
But list comprehensions can’t have assignments in them, so what to do? Recently, the Python-Ideas mailing list had a thread about adding assignments to list comprehensions, which can simplify complicated comprehensions. In that thread, Stephan Houben pointed out that you can already get the same effect with a cute trick:
for x in [17]:
    do_something()
# has the same effect as:
x = 17
do_something()
We can explicitly make a one-element list, and “iterate” over it to assign its value to a name. In my gargoyle, we get a and b as the two numbers in the chunk.
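Here is the same trick inside a comprehension, as a minimal sketch with invented data (`words`). The one-element list gives us a name we can reuse in both the filter and the result.

```python
words = ['alpha', 'be', 'gamma']

# "for n in [len(w)]" runs exactly once per word, binding n to len(w),
# so n is available to the condition and to the result expression.
long_words = [(w, n) for w in words for n in [len(w)] if n > 2]
print(long_words)
```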
The third loop is where we actually generate numbers. We’ll use range(), and we always want to start from the first number in the chunk. If there was a second number in the chunk (b), then we want to iterate up to and including it, so we need range(a, b+1). If there was no second number, then we act just as if there were a second number, the same as the first number. That is, “7” should behave just like “7-7”. If there was no second number, then .partition will have set b to be the empty string. So “b or a” will give us the second number we need.
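The "b or a" step can be tried in isolation. The helper below, `chunk_to_range`, is a hypothetical name for this sketch and is not part of the gargoyle itself.

```python
def chunk_to_range(chunk):
    """Hypothetical helper: turn '7' or '10-13' into a range object."""
    a, _, b = chunk.partition('-')
    # b is '' when there is no dash; '' is falsy, so "b or a" falls back to a.
    return range(int(a), int(b or a) + 1)

print(list(chunk_to_range('7')))
print(list(chunk_to_range('7-7')))
print(list(chunk_to_range('10-13')))
```

Note that this only handles non-negative numbers: a chunk like '-3' partitions into an empty first part, and int('') raises ValueError.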
To help a little more, here is the gargoyle:
s = '1-5,7,9,10-13'
result = [
    i
    for p in s.split(',')
    for a, _, b in [p.partition('-')]
    for i in range(int(a), int(b or a)+1)
]
and here is the same idea written out as explicit loops:
s = '1-5,7,9,10-13'
result = []
for p in s.split(','):
    a, _, b = p.partition('-')
    for i in range(int(a), int(b or a)+1):
        result.append(i)
I wouldn’t recommend the list comprehension approach in real code, but it’s fun to make gargoyles sometimes.
BTW, one of the reasons the original question caught my eye is that coverage.py has a function that does the opposite.
Comments
In this case, though, we don't really need a fixed number and we can avoid the application of boolean operators to strings (I detest use of “truthy” and “falsey” values other than True and False); we just need either one or two values and to use the first and the last (which might also be the first) of them:
[
    i for p in s.split(',')
    for a in [p.split('-', 1)]
    for i in range(int(a[0]), int(a[-1])+1)
]
I usually use generator expressions when I need an assignment step. The left-hand side and right-hand side of the equals sign get separated this way, which can definitely be a little confusing at times. I find them fairly clear most of the time though.
Assignment expressions (the walrus operator, new in Python 3.8) have been added to Python since this blog post was published.
They allow for a different version of this gargoyle:
An extra assignment is needed because tuple unpacking isn’t supported in assignment expressions.
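The commenter's snippet is not reproduced above, so what follows is only a sketch of how a walrus version could look, not their code. It assumes Python 3.8 or later. The name `parts` is invented here, and the `if` clause exists purely to hold the assignment: a 3-tuple is always truthy, so it never filters anything out. The walrus cannot go directly in the `range(...)` call, because assignment expressions are not allowed in a comprehension's iterable expression.

```python
s = '1-5,7,9,10-13'
result = [
    i
    for p in s.split(',')
    # Bind the whole 3-tuple; it can't be unpacked into a, _, b by a walrus.
    if (parts := p.partition('-'))
    for i in range(int(parts[0]), int(parts[2] or parts[0]) + 1)
]
print(result)
```

Since the tuple can't be unpacked in place, the pieces have to be pulled out by index (or bound with further assignments), which is the extra step the comment refers to.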