Iter-tools for puzzles: oddity

Friday 8 December 2017This is nearly seven years old. Be careful.

It’s December, which means Advent of Code is running again. It provides a new two-part puzzle every day until Christmas. They are a lot of fun, and usually are algorithmic in nature.

One of the things I like about the puzzles is they often lend themselves to writing unusual but general-purpose helpers. As I have said before, abstraction of iteration is a powerful and under-used feature of Python, so I enjoy exploring it when the opportunity arises.

For yesterday’s puzzle I needed to find the one unusual value in an otherwise uniform list. This is the kind of thing that might be in itertools if itertools had about ten times more functions than it does now. Here was my definition of the needed function:

def oddity(iterable, key=None):
    """
    Find the element that is different.

    The iterable has at most one element different than the others. If a
    `key` function is provided, it is a function used to extract a comparison
    key from each element, otherwise the elements themselves are compared.

    Two values are returned: the common comparison key, and the different
    element.

    If all of the elements are equal, then the returned different element is
    None.  If there is more than one different element, an error is raised.

    """

The challenge I set for myself was to implement this function in as general and useful a way as possible. The iterable might not be a list, it could be a generator, or some other iterable. There are edge cases to consider, like if there are more than two different values.

If you want to take a look, My code is on GitHub (with tests, natch.) Fair warning: that repo has my solutions to all of the Advent of Code problems so far this year.

One problem with my implementation: it stores all the values from the iterable. For the actual Advent of Code puzzle, that was fine, it only had to deal with less than 10 values. But how would you change the code so that it didn’t store them all?

My code also assumes that the comparison values are hashable. What if you didn’t want to require that?

Suppose the iterable could be infinite? This changes the definition somewhat. You can’t detect the case of all the values being the same, since there’s no such thing as “all” the values. And you can’t detect having more than two distinct values, since you’d have to read values forever on the possibility that it might happen. How would you change the code to handle infinite iterables?

These are the kind of considerations you have to take into account to write truly general-purpose itertools functions. It’s an interesting programming exercise to work through each version would differ.

BTW: it might be that there is a way to implement my oddity function with a clever combination of things already in itertools. If so, let me know!

Comments

[gravatar]
Serhiy Storchaka 8:46 PM on 8 Dec 2017
def oddity(iterable, key=None):
    it = iter(iterable)
    if key is None:
        key = lambda v: v
    different_value = None
    different_key = None
    different_found = False
    common_value = next(it)
    common_key = key(common_value)
    common_repeated = False
    for v in it:
        k = key(v)
        if k != common_key:
            if not different_found:
                different_value = v
                different_key = k
                different_found = True
                continue
            if common_repeated or k != different_key:
                raise ValueError
            common_value, different_value = different_value, common_value
            common_key, different_key = different_key, common_key
        common_repeated = True
    return common_key, different_value
[gravatar]
I'm not too experienced with Python, but my understanding is that you just need to iterate the iterable until a triplet is found in which one element differs from the others. To avoid that the (potentially expensive) key function is called more than once on any element of the sequence, the code stores those values separately.

Maybe this would work?
def oddity(iterable, key=None):
    key = key or (lambda x: x)
    it = iter(iterable)
    try:
        a, b, c = next(it), next(it), next(it)
        u, v, w = key(a), key(b), key(c)
        while u == v and v == w:
            a, b, c = b, c, next(it)
            u, v, w = v, w, key(c)
        if u == v:
            return c
        return a if v == w else b
    except StopIteration:
        return None
One downside which comes to my mind is that the function traverses the function too slowly, causing most pairs of values to be compared twice since the 'sliding window' advances one element at a time. This could be fixed by using e.g. itertools.islice to advance two elements at a time, at the expense of having to write extra code which deals with iterables of inconvenient lengths (3*n + 1 elements).
[gravatar]
For what it's worth, I think it's sensible that the documentation explains that the function asserts: "The iterable has at most one element different than the others.".

The last sentence however limits the implementation somewhat: "If there is more than one different element, an error is raised.". This means that the function always has to traverse the entire sequence at once - which may be a long time, in case it's an infinite sequence.
[gravatar]
"If there is more than one different element, an error is raised." why bother? just return the first oddity. this is especially true if the iterable is big. no need to grind thru gigs of value just to confirm whether ot not it's one oddity or more. even more true for infinite generators. it seems far more useful to return just the None or the value. (+ ideally the index/key so the user could continue to search or monitor)
[gravatar]
I ended up using iterators for most of the other days https://github.com/osteele/notebooks/blob/master/Advent%20of%20Code%202017.ipynb. (Spoilers, of course.) This simplifed lots of code that wanted to "watch" different states of an iteration. If the iteration yields each line, different consumers can decide which data to count, accumulate, wait for, etc.

Ironically, for day 7 I used collections.Counter instead of itertools:
weights = [inclusive_weight(c) for c in children(d)]
if len(set(weights)) > 1:
    bad, good = (t[0] for t in sorted(Counter(weights).items(), key=lambda t:t[1]))
    ...
[gravatar]
My take is below. It jumps through some hoops to give sensible answers to short or empty lists and also to inputs like [None, None, 22, None]. An open question is what to return if there are two elements that are different. In that case, I arbitrarily chose the second as the oddity.
def oddity(iterable, key=None):
    # type: (Iterable[T], Optional[Callable[T, K]) -> Tuple[Optional[T], Optional[T]]
    """find the oddball value
    
    iterable: an iterable of a type implementing equality, or mappable
            to a type implementing equality via key
    key: if provided, a function mapping iterable's values to comparable values
    
    Returns:
        tuple of two values where the first (or its key-mapped counterpart) appears
        exactly once in iterable and the second is the value that appears in all other
        positions in interable. If there is no value meeting one of those criteria, the
        tuple slot is None.
    """
    def zipper():
        for item in iterable:
            yield (item, item if (key is None) else key(item))
    t1 = t2 = t3 = k1 = k2 = k3 = None
    items = zipper()
    sentinal = object()
    try:
        t1, k1 = items.next()
        t2, k2 = items.next()
        t3, k3 = items.next()
    except StopIteration:
        return (None, t1) if (k1 == k2) else (t1, t2)
    if k1 == k2:
        t_norm, k_norm = t1, k1
        t_odd, k_odd = (sentinal, sentinal) if (k3 == k_norm) else (t3, k3)
    elif k1 == k3:
        t_norm, k_norm = t1, k1
        t_odd, k_odd = t2, k2
    elif k2 == k3:
        t_norm, k_norm = t2, k2
        t_odd, k_odd = t1, k1
    else:
        raise Exception("more than one oddity!")
    for t, k in items:
        if k == k_norm:
            continue
        if k_odd is not sentinal:
            raise Exception("more than one oddity!")
        t_odd, k_odd = t, k
    return (t_odd if (t_odd is not sentinal) else None, t_norm if (t_norm is not sentinal) else None)
[gravatar]
@Frerich, I like your implementation. There is something charming about "a, b, c = next(it), next(it), next(it)". BTW, in Python you can use "while u == v == w:"

@en zyme: you are probably right that specifying an exception is overkill. In fact, when I was planning this blog post, I meant to talk about the value of leaving behavior undefined, and I forgot to!

@Serhiy, @Nick, @Oliver: thanks for your implementations. It's fascinating to see the variety of approaches :)
[gravatar]
The following isn't the most efficient implementation (it's more expensive to store values in dicts and tuples, as this implementation does, than to bind them straight to variables), but it looks easy to adapt if the spec changes, and I like the clarity.

When iterable is finite, the break inside the loop isn't necessary for correctness. It terminates early once three different keys have been seen, since nothing after that can restore the conditions for a non-exceptional return.
def oddity(iterable, key=None):
    if key is None:
        key = lambda v: v
    once = {}   # {key: value} of keys seen only once
    dups = []   # keys seen more than once
    for v in iterable:
        k = key(v)
        if k in once:
            del once[k]
            dups.append(k)
        elif k not in dups:
            once[k] = v
        if len(once) + len(dups) > 2:
            break
    if not dups:
        raise ValueError("No duplicated values")
    if len(dups) > 1:
        raise ValueError("Too many duplicated values")
    if len(once) > 1:
        raise ValueError("Too many distinct values")
    return dups[0], list(once.values() or (None,))[0]
[gravatar]
There's always an absurdly long combination of itertools objects doing the job:
def oddity(iterable, key=None):
     key = key or (lambda x: x)
     try:   
        return next( ((a,key(b)) if key(b)==key(c) else (b,key(a)) if key(a)==key(c) else (c, key(a))) 
          for a,b,c in zip_longest(*[chain(*tee(iterable,3))]*3) 
          if not (key(a) == key(b) == key(c)))
     except StopIteration:
         return None
It works for infinite iterators, and return None when there's no oddity. For an infinite iterator with no oddity it runs forever. It doesn't work if the iterator has more than one oddity (but it's not really compatible with infinite iterators anyway).

Add a comment:

Ignore this:
Leave this empty:
Name is required. Either email or web are required. Email won't be displayed and I won't spam you. Your web site won't be indexed by search engines.
Don't put anything here:
Leave this empty:
Comment text is Markdown.