It’s December, which means Advent of Code is running again. It provides a new two-part puzzle every day until Christmas. They are a lot of fun, and usually are algorithmic in nature.
One of the things I like about the puzzles is they often lend themselves to writing unusual but general-purpose helpers. As I have said before, abstraction of iteration is a powerful and under-used feature of Python, so I enjoy exploring it when the opportunity arises.
For yesterday’s puzzle I needed to find the one unusual value in an otherwise uniform list. This is the kind of thing that might be in itertools if itertools had about ten times more functions than it does now. Here was my definition of the needed function:
def oddity(iterable, key=None):
    """
    Find the element that is different.

    The iterable has at most one element different than the others. If a
    `key` function is provided, it is a function used to extract a comparison
    key from each element, otherwise the elements themselves are compared.

    Two values are returned: the common comparison key, and the different
    element.

    If all of the elements are equal, then the returned different element is
    None. If there is more than one different element, an error is raised.
    """
The challenge I set for myself was to implement this function in as general and useful a way as possible. The iterable might not be a list, it could be a generator, or some other iterable. There are edge cases to consider, like if there are more than two different values.
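As a concrete baseline, here is a minimal sketch of one way to meet that definition. It is an illustration, not necessarily what the code in the repo does, and it makes a few assumptions the definition leaves open: the error is a ValueError, an empty iterable is an error, and a two-element iterable with two different keys is treated as ambiguous and also raises.

```python
from collections import Counter


def oddity(iterable, key=None):
    """Return (common_key, odd_element), or (common_key, None) if all match.

    Sketch only: stores every element and requires hashable keys.
    """
    if key is None:
        key = lambda v: v

    items = list(iterable)  # works for lists, generators, any iterable
    counts = Counter(key(v) for v in items)

    if not counts:
        # Assumption: the definition says nothing about empty input.
        raise ValueError("empty iterable")
    if len(counts) == 1:
        (common,) = counts
        return common, None
    if len(counts) > 2:
        raise ValueError("more than two distinct keys")

    # Exactly two distinct keys, ordered from most to least frequent.
    (common, n_common), (odd_key, n_odd) = counts.most_common()
    if n_odd != 1:
        raise ValueError("more than one different element")
    if n_common == 1:
        # Assumption: two elements with different keys is ambiguous.
        raise ValueError("cannot tell which element is the odd one")

    odd = next(v for v in items if key(v) == odd_key)
    return common, odd
```

For example, `oddity([1, 1, 2, 1])` returns `(1, 2)`, and `oddity(["a", "B", "A"], key=str.lower)` returns `("a", "B")`.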
If you want to take a look, my code is on GitHub (with tests, natch.) Fair warning: that repo has my solutions to all of the Advent of Code problems so far this year.
One problem with my implementation: it stores all the values from the iterable. For the actual Advent of Code puzzle, that was fine: it only had to deal with fewer than 10 values. But how would you change the code so that it didn’t store them all?
My code also assumes that the comparison values are hashable. What if you didn’t want to require that?
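One possible answer to both questions, sketched under assumptions: since a valid input has at most two distinct keys, it is enough to remember two keys, a count for each, and the first element seen for each. Comparing with `==` instead of hashing removes the hashability requirement. The name `oddity_streaming` is made up for this sketch, and the choice of ValueError for empty or ambiguous input is an assumption.

```python
def oddity_streaming(iterable, key=None):
    """Like oddity(), but keeps O(1) state and only compares keys with ==."""
    if key is None:
        key = lambda v: v

    # Each slot is [key, count, first_element]; there are never more than two.
    slots = []
    for value in iterable:
        k = key(value)
        for slot in slots:
            if slot[0] == k:
                slot[1] += 1
                break
        else:
            if len(slots) == 2:
                # A third distinct key: nothing later can make this valid.
                raise ValueError("more than two distinct keys")
            slots.append([k, 1, value])

    if not slots:
        raise ValueError("empty iterable")  # assumption: undefined in the spec
    if len(slots) == 1:
        return slots[0][0], None

    # Put the more frequent key first; sort on the count only, so the keys
    # themselves never need to be orderable.
    slots.sort(key=lambda s: s[1], reverse=True)
    (common, n_common, _), (_, n_odd, odd) = slots
    if n_odd != 1:
        raise ValueError("more than one different element")
    if n_common == 1:
        raise ValueError("cannot tell which element is the odd one")
    return common, odd
```

For example, `oddity_streaming([[1], [1], [2], [1]])` returns `([1], [2])` even though lists are not hashable. The trade-off is that the whole iterable is still consumed before a successful return, because a second odd element could show up at any point.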
Suppose the iterable could be infinite? This changes the definition somewhat. You can’t detect the case of all the values being the same, since there’s no such thing as “all” the values. And you can’t detect having more than two distinct values, since you’d have to read values forever on the possibility that it might happen. How would you change the code to handle infinite iterables?
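One way to approach the infinite case, as a sketch with a relaxed contract: return as soon as the answer is determined, meaning two distinct keys have been seen and one of them has appeared at least twice. The name `first_oddity` is hypothetical. Under this contract the function only promises to report the first oddity; it says nothing about values it never reads, and it will loop forever on an infinite iterable whose values are all the same.

```python
import itertools


def first_oddity(iterable, key=None):
    """Return (common_key, odd_element) as soon as both are determined."""
    if key is None:
        key = lambda v: v

    # Each slot is [key, count, first_element]; at most two slots.
    slots = []
    for value in iterable:
        k = key(value)
        for slot in slots:
            if slot[0] == k:
                slot[1] += 1
                break
        else:
            if len(slots) == 2:
                # Three distinct keys within the prefix read so far.
                raise ValueError("more than two distinct keys")
            slots.append([k, 1, value])

        # Two keys and at least three values: one key has repeated, so the
        # other one (seen exactly once) is the odd one. Stop reading here.
        if len(slots) == 2 and slots[0][1] + slots[1][1] >= 3:
            common, odd = sorted(slots, key=lambda s: s[1], reverse=True)
            return common[0], odd[2]

    # Only reachable when the iterable turned out to be finite.
    if not slots:
        raise ValueError("empty iterable")
    if len(slots) == 1:
        return slots[0][0], None
    raise ValueError("cannot tell which element is the odd one")


# An endless stream of 5s with a single 7 in it.
stream = itertools.chain([5, 5, 5, 7], itertools.repeat(5))
print(first_oddity(stream))  # (5, 7)
```

Because the function returns after reading the minimum number of values, it also does the least work on long finite inputs, at the cost of never raising for a second odd element further along.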
These are the kinds of considerations you have to take into account to write truly general-purpose itertools functions. It’s an interesting programming exercise to work through how each version would differ.
BTW: it might be that there is a way to implement my oddity function with a clever combination of things already in itertools. If so, let me know!
Maybe this would work? One downside that comes to mind is that the function traverses the iterable too slowly, causing most pairs of values to be compared twice since the 'sliding window' advances one element at a time. This could be fixed by using e.g. itertools.islice to advance two elements at a time, at the expense of having to write extra code which deals with iterables of inconvenient lengths (3*n + 1 elements).
The last sentence, however, limits the implementation somewhat: "If there is more than one different element, an error is raised." This means that the function always has to traverse the entire sequence, which may take a long time, or never finish if the sequence is infinite.
Ironically, for day 7 I used collections.Counter instead of itertools:
@en zyme: you are probably right that specifying an exception is overkill. In fact, when I was planning this blog post, I meant to talk about the value of leaving behavior undefined, and I forgot to!
@Serhiy, @Nick, @Oliver: thanks for your implementations. It's fascinating to see the variety of approaches :)
When iterable is finite, the break inside the loop isn't necessary for correctness. It terminates early once three different keys have been seen, since nothing after that can restore the conditions for a non-exceptional return.