Kay Rhodes wrote a post about simplistic vs. useful sorting: Alphabetical != ASCIIbetical (cute name). In it, he points to Dave Koelle’s Alphanum algorithm, which says to split the string to be sorted into numeric and non-numeric chunks, then sort so that the numeric chunks are treated as numbers. This makes “z2” sort after “z100”, for example.
I looked at the code Dave provided (in Java, C++, or Perl), and all of it is much longer than I expected: the C++ is 40 or so lines. A comment in there says it’s easier in a pattern language like Perl, but the Perl is still 20 iterative lines.
Return an int if possible, or `s` unchanged.
Turn a string into a list of string and number chunks.
["z", 23, "a"]
return [ tryint(c) for c in re.split('([0-9]+)', s) ]
Sort a list in the way that humans expect.
A few helpful Python features make this more compact: re.split provides just the function we want for chunking the string, sort takes a key function for computing sort keys from data, the key to sort on can itself be a list, and comparing two lists compares lexicographically among the elements of the list.
Each of these features in and of itself may have only occasional use, but here they conspire to help me write code nearly as expressive as the English description in my first paragraph. And there are enough of those features that they often help make expressive code like that.
In Python I can express my self better, faster, and without all the bureaucracy involved with other languages. That also makes for faster coding.
>>> from itertools import groupby
>>> s = 'zx1a23456qwert'
>>> [int("".join(g)) if k else "".join(g) for k,g in groupby(s, lambda x: x in '0123456789')]
['zx', 1, 'a', 23456, 'qwert']
Better not to use an exception, since neither of the two branches of tryint are really exceptions, and exception handling is slow.
How about something simpleminded like
if s in ['0','1','2','3','4','5','6','7','8','9']:
if ord(s) > 48 and ord(s) < 58:
... if s.isdigit():
....... return int(s)
....... return s
... return [int(c) if c.isdigit() else c for c in re.split('([0-9]+)', s)]
return [tryfloat(c) for c in re.split('(-?[0-9\.]+)',s)]
def sort_nicely( l ):
""" Sort the given list in the way that humans expect.
convert = lambda text: int(text) if text.isdigit() else text
alphanum = lambda key: [ convert(c) for c in re.split('([0-9]+)', key) ]
l.sort( key=alphanum_key )
I don't think the regexp is a cleaner solution, have a look at my longer version that I posted here: http://weblog.masukomi.org/2007/12/10/alphabetical-asciibetical/comments/1199#comment-1199
The guts of which is:
return [int("".join(chars)) if digits else "".join(chars)
for digits,chars in groupby(string, str.isdigit)]
asciisorted = sorted(unsorted[:])
alphanumsorted = sorted(unsorted[:], key=splitondigits)
I had to read-up on groupby , whereas probably like you, I know the regexp syntax, but this seems a good fit for the groupby function.
Otherwise 'abc' will sort after 'DEF'
____key = re.split(r"(\d+)", s)
____key[1::2] = map(int, key[1::2])
____""" Sort the given list in the way that humans expect."""
____return sorted(l, key=alphanum_key)
def nsort(l): return sorted(l, key=lambda s: [map(int, e[1::2]) for e in re.split(r'(\d+)', s)])
Ned, the python version does have another advantage over the (original) Java version: It works. The Java version throws NumberFormatExceptions if the numbers get to big (e.g. x999999999 > x8888888888). I have sent a changed Java version that uses the fact that bigger numbers are usually longer (taking care of trailing zeroes) to Dave Koelle.
def nsort(l) return sorted(l, key=lambda a:zip(re.split("(\\d+)", a)[0::2], map(int, re.split("(\\d+)", a)[1::2])))
def nsort(l) return sorted(l, key=lambda a.lower()):zip(re.split("(\\d+)", a)[0::2], map(int, re.split("(\\d+)", a)[1::2])))
You've posted 13 months after the original article, making grandiose statements, all without code. I hope your not on the Ruby development team ;-)
Add a comment: