A Python gargoyle

Monday 26 February 2018

In the #python IRC channel today, someone asked:

Does anyone know of any libraries that could convert ‘1-5,7,9,10-13’ to [1,2,3,4,5,7,9,10,11,12,13] ?

This seemed like an interesting challenge, and people started offering code. This was mine. It’s ungainly and surprising, and I wouldn’t want to keep it, so I call it a gargoyle:

[
    i for p in s.split(',')
    for a, _, b in [p.partition('-')]
    for i in range(int(a), int(b or a)+1)
]

There are a few things going on here. First, this is a list comprehension, but with three nested loops. A simple list comprehension has this form:

result = [ EXPR for NAMES in ITERABLE ]

which is the same as this code:

result = []
for NAMES in ITERABLE:
    result.append(EXPR)
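As a concrete illustration, here is a small sketch of the two forms side by side. The names `words`, `lengths`, and `lengths_loop` are made up for this example:

```python
# Hypothetical data, just for illustration.
words = ['gargoyle', 'python', 'irc']

# Comprehension form: EXPR is len(w), NAMES is w, ITERABLE is words.
lengths = [len(w) for w in words]

# The same thing written as an explicit loop.
lengths_loop = []
for w in words:
    lengths_loop.append(len(w))

assert lengths == lengths_loop == [8, 6, 3]
```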

For many, the list comprehension seems kind of backwards, where the expression comes first, before the loop produces it. Then the multi-loop form can seem like another surprise:

result = [ EXPR for NAMES1 in ITERABLE1 for NAMES2 in ITERABLE2 ]

which is equivalent to:

result = []
for NAMES1 in ITERABLE1:
    for NAMES2 in ITERABLE2:
        result.append(EXPR)

The first time I ever tried to write a double list comprehension, I thought the loops should go in the other order, in keeping with the Yoda-style of EXPR coming first. They don’t.
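A tiny sketch makes the ordering easier to see. The data and names here (`rows`, `chars`) are invented for the example:

```python
# Hypothetical example data.
rows = ['ab', 'cd']

# The "for" clauses read left to right, outermost first,
# in the same order they would nest as statements.
chars = [ch for row in rows for ch in row]

chars_loop = []
for row in rows:        # first "for" in the comprehension
    for ch in row:      # second "for" in the comprehension
        chars_loop.append(ch)

assert chars == chars_loop == ['a', 'b', 'c', 'd']
```

Writing the clauses in the other order would use `row` before the loop that defines it.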

Back to the gargoyle. It’s a triply-nested loop (this time formatted a little differently):

[
    i 
    for p in s.split(',')
        for a, _, b in [p.partition('-')]
            for i in range(int(a), int(b or a)+1)
]

The first loop splits the number ranges on comma, to produce the individual chunks: ‘1-5’, ‘7’, ‘9’, ‘10-13’. This seems like an obvious first step.

The next loop is the most surprising. Strings in Python have a .partition method, which is super-handy and under-used. It takes a separator, and produces three values: the part of the string before the separator, the separator itself, and the part of the string after the separator. The best thing about .partition is that it always produces three values, even if the separator doesn’t appear in the string. In that case, the first value is the whole string, and the other two values are empty strings:

>>> '1-5'.partition('-')
('1', '-', '5')
>>> '12'.partition('-')
('12', '', '')
>>>

This means we can always assign the result to three names. Super-handy.
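Here is a small sketch of that three-name assignment, contrasted with .split, which returns a variable number of pieces. The helper name `split_chunk` is hypothetical:

```python
def split_chunk(chunk):
    """Return (first, second) for a chunk like '1-5' or '7'.

    second is '' when the chunk has no dash.
    (split_chunk is a made-up helper name for this sketch.)
    """
    first, _, second = chunk.partition('-')
    return first, second

assert split_chunk('1-5') == ('1', '5')
assert split_chunk('12') == ('12', '')

# .split() gives only one piece when there is no dash,
# so unpacking into two names fails.
try:
    first, second = '12'.split('-')
except ValueError:
    unpack_failed = True
else:
    unpack_failed = False
assert unpack_failed
```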

But list comprehensions can’t have assignments in them, so what to do? Recently, the Python-Ideas mailing list had a thread about adding assignments to list comprehensions, which can simplify complicated comprehensions. In that thread, Stephan Houben pointed out that you can already get the same effect with a cute trick:

for x in [17]:
    do_something()

# has the same effect as:

x = 17
do_something()

We can explicitly make a one-element list, and “iterate” over it to assign its value to a name. In my gargoyle, we get a and b as the two numbers in the chunk.
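To see the trick in isolation, here is a minimal sketch that uses it inside a comprehension to name an intermediate value. The names `nums`, `sq`, and `pairs` are invented for the example:

```python
# Hypothetical example: compute each square once and use it twice.
nums = [1, 2, 3]

pairs = [
    (sq, sq + 1)
    for n in nums
    for sq in [n * n]   # one-element list: acts like "sq = n * n"
]

# The same thing with a real assignment in an explicit loop.
pairs_loop = []
for n in nums:
    sq = n * n
    pairs_loop.append((sq, sq + 1))

assert pairs == pairs_loop == [(1, 2), (4, 5), (9, 10)]
```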

The third loop is where we actually generate numbers. We’ll use range(), and we always want to start from the first number in the chunk. If there was a second number in the chunk (b), then we want to iterate up to and including it, so we need range(a, b+1). If there was no second number, we act as if the second number were the same as the first. That is, “7” should behave just like “7-7”. In that case, .partition will have set b to the empty string, which is falsy, so “b or a” evaluates to a and gives us the second number we need.
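The handling of a single chunk can be sketched on its own like this. The helper name `chunk_to_range` is hypothetical, and it assumes the chunk holds non-negative integers:

```python
def chunk_to_range(chunk):
    """Hypothetical helper: expand one chunk such as '10-13' or '7'."""
    a, _, b = chunk.partition('-')
    # '' is falsy, so "b or a" falls back to a when there is no dash.
    return list(range(int(a), int(b or a) + 1))

assert chunk_to_range('10-13') == [10, 11, 12, 13]
assert chunk_to_range('7') == [7]
assert chunk_to_range('7-7') == [7]
```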

To help a little more, here is the gargoyle:

s = '1-5,7,9,10-13'
result = [
    i 
    for p in s.split(',')
        for a, _, b in [p.partition('-')]
            for i in range(int(a), int(b or a)+1)
]

and here is the same idea written out as explicit loops:

s = '1-5,7,9,10-13'

result = []
for p in s.split(','):
    a, _, b = p.partition('-')
    for i in range(int(a), int(b or a)+1):
        result.append(i)
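As a quick check that the two forms agree, here is a sketch that wraps each one in a function and compares the results. The function names `expand_comprehension` and `expand_loops` are made up for this example:

```python
def expand_comprehension(s):
    # The gargoyle, wrapped in a function.
    return [
        i
        for p in s.split(',')
        for a, _, b in [p.partition('-')]
        for i in range(int(a), int(b or a) + 1)
    ]

def expand_loops(s):
    # The same logic as explicit loops.
    result = []
    for p in s.split(','):
        a, _, b = p.partition('-')
        for i in range(int(a), int(b or a) + 1):
            result.append(i)
    return result

s = '1-5,7,9,10-13'
expected = [1, 2, 3, 4, 5, 7, 9, 10, 11, 12, 13]
assert expand_comprehension(s) == expand_loops(s) == expected
```

Neither version validates its input: an empty string, or a negative number such as '-3', raises ValueError because int('') fails.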

I wouldn’t recommend the list comprehension approach in real code, but it’s fun to make gargoyles sometimes.

BTW, one of the reasons the original question caught my eye is that coverage.py has a function that does the opposite.
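For a sense of what "the opposite" might look like, here is a rough sketch. It is not coverage.py's actual code; the name `collapse_ranges` and its behavior (sorting and de-duplicating the input) are assumptions made for illustration:

```python
def collapse_ranges(nums):
    """Hypothetical inverse: [1, 2, 3, 5] -> '1-3,5'.

    Assumes nums is an iterable of integers; it is sorted and
    de-duplicated before runs are collected.
    """
    chunks = []
    start = prev = None
    for n in sorted(set(nums)):
        if start is None:
            start = prev = n
        elif n == prev + 1:
            prev = n                      # extend the current run
        else:
            chunks.append((start, prev))  # close the finished run
            start = prev = n              # begin a new run
    if start is not None:
        chunks.append((start, prev))
    return ','.join(
        str(a) if a == b else '{}-{}'.format(a, b)
        for a, b in chunks
    )

assert collapse_ranges([1, 2, 3, 5]) == '1-3,5'
assert collapse_ranges([1, 2, 3, 4, 5, 7, 9, 10, 11, 12, 13]) == '1-5,7,9-13'
assert collapse_ranges([]) == ''
```

Note that the round trip doesn't reproduce the original string exactly: 9 and 10-13 are adjacent, so this sketch merges them into 9-13.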

Comments

Ed Davies 9:03 AM on 27 Feb 2018

My first thought on seeing partition was “why not just use split?” but, as you say, partition has the advantage of always giving the same number of outputs.

In this case, though, we don't really need a fixed number and we can avoid the application of boolean operators to strings (I detest use of “truthy” and “falsey” values other than True and False); we just need either one or two values and to use the first and the last (which might also be the first) of them:

[
    i for p in s.split(',')
    for a in [p.split('-', 1)]
    for i in range(int(a[0]), int(a[-1])+1)
]

Ned Batchelder 11:31 AM on 27 Feb 2018

@Ed, nice! :)

Trey Hunner 5:21 PM on 27 Feb 2018

Very clever. Great explanation Ned!

I usually use generator expressions when I need an assignment step. The left-hand side and right-hand side of the equals sign get separated this way, which can definitely be a little confusing at times. I find them fairly clear most of the time though.

partitions = (
    group.partition('-')
    for group in string.split(',')
)
pairs = (
    ((a, b) if b else (a, a))
    for (a, _, b) in partitions
)
nums = [
    num
    for start, stop in pairs
    for num in range(int(start), int(stop)+1)
]

stealth_ 3:50 AM on 2 Mar 2018

def ranger(s):
    for i in s.split(','):
        if '-' in i:
            a, b = i.split('-')
            a, b = int(a), int(b)
            while a <= b:
                yield a
                a += 1
        else:
            yield int(i)

Mike McCaffrey 5:50 PM on 2 Mar 2018

Thanks Ned! I always learn something from your excellent articles and talks.

Peter Bengtsson 3:15 PM on 7 Mar 2018

There's an even less-gargoyle version that is arguably even easier to read:

result = []
for p in s.split(','):
    if '-' in p:
        start, end = p.split('-')
        for i in range(int(start), int(end) + 1):
            result.append(i)
    else:
        result.append(int(p))

Not saying it's better but it reads quite nicely.

Gerry Jenkins 12:00 AM on 10 Mar 2018

I like this generator function for clear code:

def seq_gen(s):
  for item in s.split(','):
      if '-' in item:
          a, b = map(int, item.split('-'))
          for i in range(a,b+1):
              yield i
      else:
          yield int(item)

BoppreH 11:27 PM on 12 Mar 2018

Meanwhile, in meta-programming land, sins are being committed:

import re
eval('['+re.sub(r'(\d+)-(\d+)', r'*range(\1, \2+1)', s)+']')
        
