Coverage.py podcast

Sunday 6 August 2017

I was a guest on the Podcast.__init__ podcast this week: Coverage.py with Ned Batchelder. We talk about coverage.py, how I got started on it, why it's good, why it's not good, how it works, and so on.

I've only listened to a small part of it. Getting ideas out verbally is different than in writing: there's no chance to go back and edit. To me, I sound a bit halting, because I was trying to get the words and sentences right in my head before I said them. I hope it sounds OK to others. Also, I think I said "dudes" and "guy" when I could have chosen more neutral words, sorry about that.

Tobias has been doing a great job with this podcast. It's not easy to consistently put out good content (I hate that word) on a regular schedule. He's been finding interesting people and giving them a good place to talk about their Python world, whatever that means to them.

BTW, this is my second time on Podcast.__init__, and it seems I never mentioned the first time in this blog, so here it is, from two years ago: Episode 5: Ned Batchelder. The focus then was the user group I organize, Boston Python, so a completely different topic.

And in the unlikely case that you want yet more of my dulcet tones, I was also on the Python Test podcast, mentioned in this blog post: The Value of Unit Tests.

Look around you

Sunday 23 July 2017

I've been trying my Instagram experiment for a year now. I've really liked doing it: it gives me a prompt for looking around me and seeing what there is to see.

One of the things that surprised me when I looked back is how many pictures I took in a very small area, one that I would have thought of as uninteresting: the few blocks between where I swim in the morning, and where I work. Of the 197 pictures I took in the last year, 38 of them are from that neighborhood:

[A grid of 38 photos from that neighborhood]

I'm not saying these are all masterpieces. But I wouldn't have thought to take them at all if I hadn't been explicitly looking for something interesting to shoot.

Look around you: not much is truly uninteresting.

Finding fuzzy floats

Sunday 9 July 2017

For a 2D geometry project I needed to find things based on 2D points. Conceptually, I wanted to have a dict that used pairs of floats as the keys. This would work, except that floats are inexact, and so have difficulty with equality checking. The "same" point might be computed in two different ways, giving slightly different values, but I want them to match each other in this dictionary.

I found a solution, though I'm surprised I didn't find it described elsewhere.

The challenges have nothing to do with the two-dimensional aspect, and everything to do with using floats, so for the purposes of explaining, I'll simplify the problem to one-dimensional points. I'll have a class with a single float in it to represent my points.

First let's look at what happens with a simple dict:

>>> from collections import namedtuple
>>> Pt = namedtuple("Pt", "x")
>>>
>>> d = {}
>>> d[Pt(1.0)] = "hello"
>>> d[Pt(1.0)]
'hello'
>>> d[Pt(1.000000000001)]
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
KeyError: Pt(x=1.000000000001)

As long as our floats are precisely equal, the dict works fine. But as soon as we get two floats that are meant to represent the same value, but are actually slightly unequal, then the dict is useless to us. So a simple dict is no good.

To get the program running at all, I used a dead-simple structure that I knew would work: a list of key/value pairs. By defining __eq__ on my Pt class to compare with some fuzz, I could find matches that were slightly unequal:

>>> import math
>>> class Pt(namedtuple("Pt", "x")):
...     def __eq__(self, other):
...         return math.isclose(self.x, other.x)
...
>>> def get_match(pairs, pt):
...     for pt2, val in pairs:
...         if pt2 == pt:
...             return val
...     return None
...
>>> pairs = [
...     (Pt(1.0), "hello"),
...     (Pt(2.0), "goodbye"),
... ]
>>>
>>> get_match(pairs, Pt(2.0))
'goodbye'
>>> get_match(pairs, Pt(2.000000000001))
'goodbye'

This works, because now we are using an inexact closeness test to find the match. But we have an O(n) algorithm, which isn't great. Also, there's no way to define __hash__ to match that __eq__ test, so our points are no longer hashable.

Trying to make things near each other be equal naturally brings rounding to mind. Maybe that could work? Let's define __eq__ and __hash__ based on rounding the value:

>>> class Pt(namedtuple("Pt", "x")):
...     def __eq__(self, other):
...         return round(self.x, 6) == round(other.x, 6)
...     def __hash__(self):
...         return hash(round(self.x, 6))
...
>>> d = {}
>>> d[Pt(1.0)] = "apple"
>>> d[Pt(1.0)]
'apple'
>>> d[Pt(1.00000001)]
'apple'

Nice! We have matches based on values being close enough, and we can use a dict to get O(1) behavior. But:

>>> d[Pt(1.000000499999)] = "cat"
>>> d[Pt(1.000000500001)]
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
KeyError: Pt(x=1.000000500001)

Because rounding has sharp edges (ironically!), there will always be pairs of numbers arbitrarily close together that will round away from each other.

So rounding was close, but not good enough. We can't guarantee that we won't have these pairs of numbers that are in just the wrong place so that rounding will give the wrong answer.

I thought about other data structures: bisection in a sorted list sounded good (O(log n)), but that requires testing for less-than, and I wasn't sure how the fuzziness would affect it. Reading about other geometric algorithms led to things like k-d trees, which seem exotic and powerful, but I didn't see how to put them to use here.

Then early one morning when I wasn't even trying to solve the problem, I had an idea: round the values two different ways. The problem with rounding was that it failed for values that straddled the rounding boundary. We can round twice, the second time with half the rounding window added, so the two are half out of phase. That way, if the first rounding separates the values, the second rounding won't.

We can't use double-rounding to produce a canonical value for comparisons and hashes, so instead we make a new data structure. We'll use a dict to map points to values, just as we did originally. A second dict will map rounded points to the original points. When we look up a point, if it isn't in the first dict, then round it two ways, and see if either is in the second dict. If it is, then we know what original point to use in the first dict.

That's a lot of words. Here's some code:

from collections import namedtuple

class Pt(namedtuple("Pt", "x")):
    # no need for special __eq__ or __hash__
    pass

class PointMap:
    def __init__(self):
        self._items = {}
        self._rounded = {}

    ROUND_DIGITS = 6
    JITTERS = [0, 0.5 * 10 ** -ROUND_DIGITS]

    def _round(self, pt, jitter):
        return Pt(round(pt.x + jitter, ndigits=self.ROUND_DIGITS))

    def __getitem__(self, pt):
        val = self._items.get(pt)
        if val is not None:
            return val

        for jitter in self.JITTERS:
            pt_rnd = self._round(pt, jitter)
            pt0 = self._rounded.get(pt_rnd)
            if pt0 is not None:
                return self._items[pt0]

        raise KeyError(pt)

    def __setitem__(self, pt, val):
        self._items[pt] = val
        for jitter in self.JITTERS:
            pt_rnd = self._round(pt, jitter)
            self._rounded[pt_rnd] = pt

This is now O(1), wonderful! I was surprised that I couldn't find something like this already explained somewhere. Perhaps finding values like I need to do is unusual? Or this is well-known, but under some name I couldn't find? Or is it broken in some way I can't see?

Triangular Fibonacci numbers

Saturday 17 June 2017

Yesterday in my post about 55, I repeated Wikipedia's claim that 55 is the largest number that is both triangular and in the Fibonacci sequence. Chris Emerson commented to ask for a proof. After a moment's thought, I realized I had no idea how to prove it.

The proof is in On Triangular Fibonacci Numbers, a dense 10-page excursion into number theory I don't understand.

While I couldn't follow the proof, I can partially test the claim empirically, which leads to fun with Python and itertools, something which is much more in my wheelhouse.

I started by defining generators for triangular numbers and Fibonacci numbers:

import itertools

def tri():
    """Generate an infinite sequence of triangular numbers."""
    n = 0
    for i in itertools.count(start=1):
        n += i
        yield n
        
print(list(itertools.islice(tri(), 50)))
[1, 3, 6, 10, 15, 21, 28, 36, 45, 55, 66, 78, 91, 105, 120, 136, 153, 171,
 190, 210, 231, 253, 276, 300, 325, 351, 378, 406, 435, 465, 496, 528, 561,
 595, 630, 666, 703, 741, 780, 820, 861, 903, 946, 990, 1035, 1081, 1128,
 1176, 1225, 1275]
def fib():
    """Generate an infinite sequence of Fibonacci numbers."""
    a, b = 1, 1
    while True:
        yield a
        b, a = a, a+b
        
print(list(itertools.islice(fib(), 50)))
[1, 2, 3, 5, 8, 13, 21, 34, 55, 89, 144, 233, 377, 610, 987, 1597, 2584,
 4181, 6765, 10946, 17711, 28657, 46368, 75025, 121393, 196418, 317811,
 514229, 832040, 1346269, 2178309, 3524578, 5702887, 9227465, 14930352,
 24157817, 39088169, 63245986, 102334155, 165580141, 267914296, 433494437,
 701408733, 1134903170, 1836311903, 2971215073, 4807526976, 7778742049,
 12586269025, 20365011074]

The Fibonacci sequence grows much faster!

My first impulse was to make two sets of the numbers in the sequences, and intersect them, but building a very large set took too long. So instead I wrote a function that took advantage of the ever-increasing nature of the sequences to look for equal elements in two monotonic sequences:

def find_same(s1, s2):
    """Find equal elements in two monotonic sequences."""
    try:
        i1, i2 = iter(s1), iter(s2)
        n1, n2 = next(i1), next(i2)
        while True:
            while n1 < n2:
                n1 = next(i1)
            if n1 == n2:
                yield n1
                n1 = next(i1)
            while n2 < n1:
                n2 = next(i2)
            if n1 == n2:
                yield n1
                n1 = next(i1)
    except StopIteration:
        return

And a function to cut off an infinite sequence once a certain ceiling had been reached:

def upto(s, n):
    """Stop a monotonic sequence once it gets to n."""
    for i in s:
        if i > n:
            return
        yield i

Now I could ask, what values less than a quadrillion are in both the triangular numbers and the Fibonacci sequence?:

>>> list(find_same(upto(fib(), 1e15), upto(tri(), 1e15)))
[1, 3, 21, 55]

This doesn't prove the claim for all numbers, but it shows that 55 is the largest number under a quadrillion that is in both sequences.

Another way to do this is to take advantage of the asymmetry in growth rate. The Fibonacci sequence up to a quadrillion has only 72 elements. Make that a set, then examine the triangular numbers up to a quadrillion and keep the ones that are in the Fibonacci set. And I'm certain there are other techniques too.
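
That set-based approach is only a few lines. Here's a sketch, with a smaller ceiling so it runs quickly, repeating fib, tri, and upto from above so it's standalone:

```python
import itertools

def tri():
    """Generate an infinite sequence of triangular numbers."""
    n = 0
    for i in itertools.count(start=1):
        n += i
        yield n

def fib():
    """Generate an infinite sequence of Fibonacci numbers."""
    a, b = 1, 1
    while True:
        yield a
        b, a = a, a + b

def upto(s, n):
    """Stop a monotonic sequence once it gets to n."""
    for i in s:
        if i > n:
            return
        yield i

# The Fibonacci numbers up to the ceiling make a small set; then keep
# the triangular numbers that appear in it.
fib_set = set(upto(fib(), 10**6))
both = [t for t in upto(tri(), 10**6) if t in fib_set]
print(both)     # -> [1, 3, 21, 55]
```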

I can't explain why, but composable generators please me. This was fun. :)

Re-ruling .rst

Friday 12 May 2017

Sometimes, you need a small job done, and you can write a small Python program, and it does just what you need, and it pleases you.

I have some Markdown files to convert to ReStructured Text. Pandoc does a really good job. But it chooses a different order for heading punctuation than our house style, and I didn't see a way to control it.

But it was easy to write a small thing to do the small thing:

import re
import sys

# The order we want our heading rules.
GOOD_RULES = '#*=-.~'

# A rule is any line of all the same non-word character, 3 or more.
RULE_RX = r"^([^\w\d])\1\1+$"

def rerule_file(f):
    rules = {}
    for line in f:
        line = line.rstrip()
        rule_m = re.search(RULE_RX, line)
        if rule_m:
            if line[0] not in rules:
                rules[line[0]] = GOOD_RULES[len(rules)]
            line = rules[line[0]] * len(line)
        print(line)

rerule_file(sys.stdin)

If you aren't conversant in .rst: there's no fixed order to which punctuation means which level heading. The first rule encountered is heading 1, the next style found is heading 2, and so on.
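
Here's a quick check of the function above (repeated so it runs standalone, with io.StringIO standing in for stdin, and the output captured so we can look at it):

```python
import io
import re
from contextlib import redirect_stdout

# The order we want our heading rules.
GOOD_RULES = '#*=-.~'

# A rule is any line of all the same non-word character, 3 or more.
RULE_RX = r"^([^\w\d])\1\1+$"

def rerule_file(f):
    rules = {}
    for line in f:
        line = line.rstrip()
        rule_m = re.search(RULE_RX, line)
        if rule_m:
            if line[0] not in rules:
                rules[line[0]] = GOOD_RULES[len(rules)]
            line = rules[line[0]] * len(line)
        print(line)

sample = "Title\n=====\n\nSection\n-------\n"
out = io.StringIO()
with redirect_stdout(out):
    rerule_file(io.StringIO(sample))

# '=' was the first rule seen, so it becomes '#'; '-' was second, so '*'.
print(out.getvalue())
```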

There might be other ways to do this, but this makes me happy.

Shell = Maybe

Monday 24 April 2017

A common Python help question: how do I get Python to run this complicated command line program? Often, the answer involves details of how shells work. I tried my hand at explaining what a shell does, why you want to avoid them, how to avoid them from Python, and why you might want to use one: Shell = Maybe.

Text-mode menu bar indicators

Monday 17 April 2017

I recently upgraded my Mac operating system, and decided to try out a new feature: automatically hiding the menu bar. This gives me back another sliver of vertical space. But it has a drawback: I no longer have the time, battery life, and speaker volume indicators available at a glance.

I went looking for a thing that I figured must exist: a Mac app that would display that information in a dock icon. I already have a dock clock. I found a dock battery indicator, though it tried so hard to be cute and pictorial, I couldn't tell what it was telling me.

Asking around, I got a recommendation for GeekTool. It lets you draw a panel on your desktop, and then draw in the panel with the output of a script. Now the ball was back in my court: I could build my own thing.

I'd long ago moved the dock to the left side of the screen (again, to use all the vertical space for my own stuff.) This left a small rectangle of desktop visible at the upper left and lower left, even with maximized windows. I drew a panel in the upper left of the desktop, and set it to run this script every five seconds:

#!/usr/bin/env python3.6

import datetime
import re
import subprocess

def block_eighths(eighths):
    """Return the Unicode string for a block of so many eighths."""
    assert 0 <= eighths <= 8
    if eighths == 0:
        return "\u2003"
    else:
        return chr(0x2590 - eighths)

def gauge(percent):
    """Return a two-char string drawing a 16-part gauge."""
    slices = round(percent / (100 / 16))
    b1 = block_eighths(min(slices, 8))
    b2 = block_eighths(max(slices - 8, 0))
    return b1 + b2

now = datetime.datetime.now()
print(f"{now:%-I:%M\n%-m/%-d}")

batt = subprocess.check_output(["pmset", "-g", "batt"]).decode('utf8').splitlines()
m = re.search(r"\d+%", batt[1])
if m:
    level = m.group(0)
    batt_percent = int(level[:-1])
else:
    level = "??%"
    batt_percent = 0     # so the gauge below still draws something
if "discharging" in batt[1]:
    arrow = "\u25bc"        # BLACK DOWN-POINTING TRIANGLE
elif "charging" in batt[1]:
    arrow = "\u25b3"        # WHITE UP-POINTING TRIANGLE
else:
    arrow = ""

print(level + arrow)
print(gauge(batt_percent) + "\u2578")   # BOX DRAWINGS HEAVY LEFT

vol = subprocess.check_output(["osascript", "-e", "get volume settings"]).decode('utf8')
m = re.search(r"^output volume:(\d+), .* muted:(\w+)", vol)
if m:
    level, muted = m.groups()
    if muted == 'true':
        level = "\u20e5"        # COMBINING REVERSE SOLIDUS OVERLAY
    print(f"\u24cb{level}")     # CIRCLED LATIN CAPITAL LETTER V

# For debugging: output the raw data, but pushed out of view.
print(f"{'':30}{batt}")
print(f"{'':30}{vol}")
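
The gauge packs sixteen levels into two characters: each cell is one of the Unicode left-block characters, which fill in eighths. A quick sanity check of that math, copying the two functions from the script:

```python
def block_eighths(eighths):
    """Return the Unicode string for a block of so many eighths."""
    assert 0 <= eighths <= 8
    if eighths == 0:
        return "\u2003"
    else:
        return chr(0x2590 - eighths)

def gauge(percent):
    """Return a two-char string drawing a 16-part gauge."""
    slices = round(percent / (100 / 16))
    b1 = block_eighths(min(slices, 8))
    b2 = block_eighths(max(slices - 8, 0))
    return b1 + b2

# 50% fills the first cell completely; the second is an em space.
assert gauge(50) == "\u2588\u2003"
# 75% is twelve sixteenths: a full block, then a left half block.
assert gauge(75) == "\u2588\u258c"
```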

This displays the time, date, battery level (both numerically and as a crudely drawn battery gauge), whether the battery is charging or discharging, and the volume level:

All that information, crammed into a tiny corner

BTW, if you are looking for Python esoterica, there are a few little-known things going on in this line:

print(f"{now:%-I:%M\n%-m/%-d}")
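
Unpacking it: the format spec after the colon is handed to datetime's strftime, the spec itself can contain a literal newline, and %-I, %-m, and %-d are the glibc/BSD "no padding" strftime extensions (they aren't portable to Windows). A small sketch of the same thing, with a fixed date so the output is predictable:

```python
import datetime

now = datetime.datetime(2017, 4, 17, 9, 5)
# The f-string format spec is passed through datetime.__format__ to
# strftime; the %- codes suppress zero-padding on glibc/BSD platforms.
print(f"{now:%-I:%M\n%-m/%-d}")   # two lines: 9:05 and 4/17
```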

Finding Unicode characters to represent things was a bit of a challenge. I couldn't find exactly what I need for the right tip of the battery gauge, but it works well enough.

GeekTool also does web pages, though in a quick experiment I couldn't make it do something useful, so I stuck with text mode. There also seem to be old forks of GeekTool that offer text colors, which could be cool, but it started to feel a bit off the path.

This works great for what it does.

Clean-text bookmarklet

Saturday 8 April 2017

I love the text-based web. I love that people can speak their minds, express opinions, encourage each other, and create a lively world of words. This also means they are free to design their text in, shall we say, expressive ways. Those ways are not always ideal for actually reading the words.

Today I really liked Tiberius Hefflin's Part of That World, about the need to recognize non-code contributions in open source projects. You should read it, it is good and true.

But when I first got to the page, I saw this:

Screenshot of gray text on black background, slightly letterspaced

To start with the positive, this text has an elegance to it. It gives a peaceful quiet impression. It pairs perfectly with the mermaid illustration on the page. But I find it hard to read. This typeface is too weak to be light-on-dark, and letterspacing is almost always a bad idea for body text. It isn't even white-on-black, it's 70% white on black, so the letters seem to be hiding in the dark.

I don't mean to pick on this page. It's a well-designed page. There's clearly a mood being created here, and it's been established well. There are many pages online that veer much farther from the usual than this.

My solution for pages like this is a bookmarklet to strip away idiosyncrasies in text layout. It changes text to almost-black on white, it removes letterspacing and shadows, and changes full-justified text to left-justified. When I use the bookmarklet on Part of That World, it looks like this:

Screenshot of cleaned-up text

You might prefer the original. That's fine, to each their own. You might feel like the personality has been bleached from this text. To some extent, that's true. But I saw the original, and can choose between them. This helped me to read the words, and not get snagged on the design of the page.

This is the bookmarklet: Clean text.

This is the JavaScript code in the bookmarklet, formatted and tweaked so you can read it:

javascript:(function () {
    var newSS = document.createElement('link'),
        styles = (
            '* { ' +
                'background: #fff; color: #111; ' +
                'letter-spacing: 0; text-shadow: none; hyphens: none; ' +
            '}' +
            ':link, :link * { color: #0000EE; } ' +
            ':visited, :visited * { color: #551A8B; }'
        ).replace(/;/g,' !important;');
    newSS.rel = 'stylesheet';
    newSS.href = 'data:text/css,' + escape(styles);
    document.getElementsByTagName('head')[0].appendChild(newSS);
    var els = document.getElementsByTagName('*');
    for (var i = 0, el; el = els[i]; i++) {
        if (getComputedStyle(el).textAlign === 'justify') {
            el.style.textAlign = 'left';
        }
    }
})();

There are other solutions to eccentrically designed pages. You could read blogs in a single aggregating RSS reader. But then everything is completely homogenized, and you don't even get a chance to experience the design as the author intended. Writers could flock (and are flocking) to sites like Medium, which again homogenizes the design.

By the way, full disclosure: I don't like the design of my own site, the page you are (probably) currently reading. I have been working on a re-design on and off for months. Maybe eventually it will be finished. The text will be serif, and larger, with a responsive layout and fewer distractions. Some day.

IronPython is weird

Wednesday 15 March 2017

Have you fully understood how Python 2 and Python 3 deal with bytes and Unicode? Have you watched Pragmatic Unicode (also known as the Unicode Sandwich, or unipain) forwards and backwards? You're a Unicode expert! Nothing surprises you any more.

Until you try IronPython...

Turns out IronPython 2.7.7 has str as unicode!

C:\Users\Ned>"\Program Files\IronPython 2.7\ipy.exe"
IronPython 2.7.7 (2.7.7.0) on .NET 4.0.30319.42000 (32-bit)
Type "help", "copyright", "credits" or "license" for more information.
>>> "abc"
'abc'
>>> type("abc")
<type 'str'>
>>> u"abc"
'abc'
>>> type(u"abc")
<type 'str'>
>>> str is unicode
True
>>> str is bytes
False

String literals work kind of like they do in Python 2: \u escapes are recognized in u"" strings but not in "" strings, yet both produce the same type:

>>> "abc\u1234"
'abc\\u1234'
>>> u"abc\u1234"
u'abc\u1234'

Notice that the repr of this str/unicode type will use a u-prefix if any character is non-ASCII, but if the string is all ASCII, the prefix is omitted.

OK, so how do we get a true byte string? I guess we could encode a unicode string? WRONG. Encoding a unicode string produces another unicode string with the encoded byte values as code points!:

>>> u"abc\u1234".encode("utf8")
u'abc\xe1\x88\xb4'
>>> type(_)
<type 'str'>

Surely we could at least read the bytes from a file with mode "rb"? WRONG.

>>> type(open("foo.py", "rb").read())
<type 'str'>
>>> type(open("foo.py", "rb").read()) is unicode
True

On top of all this, I couldn't find docs that explain that this happens. The IronPython docs just say, "Since IronPython is a implementation of Python 2.7, any Python documentation is useful when using IronPython," and then links to the python.org documentation.

A decade-old article on InfoQ, The IronPython, Unicode, and Fragmentation Debate, discusses this decision, and points out correctly that it's due to needing to mesh well with the underlying .NET semantics. It seems very odd not to have documented it some place. Getting coverage.py working even minimally on IronPython was an afternoon's work of discovering each of these oddnesses empirically.

Also, that article quotes Guido van Rossum (from a comment on Calvin Spealman's blog):

You realize that Jython has exactly the same str==unicode issue, right? I've endorsed this approach for both versions from the start. So I don't know what you are so bent out of shape about.

I guess things have changed with Jython in the intervening ten years, because it doesn't behave that way now:

$ jython
Jython 2.7.1b3 (default:df42d5d6be04, Feb 3 2016, 03:22:46)
[Java HotSpot(TM) 64-Bit Server VM (Oracle Corporation)] on java1.8.0_31
Type "help", "copyright", "credits" or "license" for more information.
>>> 'abc'
'abc'
>>> type(_)
<type 'str'>
>>> str is unicode
False
>>> type("abc")
<type 'str'>
>>> type(u"abc")
<type 'unicode'>
>>> u"abc".encode("ascii")
'abc'
>>> u"abc"
u'abc'

If you want to support IronPython, be prepared to rethink how you deal with bytes and Unicode. I haven't run the whole coverage.py test suite on IronPython, so I don't know if other oddities are lurking there.

Rubik's algorithms

Sunday 26 February 2017

Recently, a nephew asked about how to solve a Rubik's Cube. I couldn't sit down with him to show him what I knew, so I looked around the web for explanations. I was surprised by two things: first, that all the pages offering solutions seemed to offer the same one, even down to the colors discussed: "Start by making a white cross, ..., finally, finish the yellow side."

Second, that the techniques (or "algorithms") were often given without explanation. They're presented as something to memorize.

My own solving technique uses a few algorithms constructed in a certain way that I describe in Two-Part Rubik's Algorithms. I wrote them up as a resource I hope my nephew will be able to use.

A Rubik's Cube with two edges flipped

BTW, that page makes use of Conrad Rider's impressive TwistySim library.

Https

Saturday 25 February 2017

Someone posted a link to my latest blog post on /r/Python, but somehow got an https link for it. That's odd: my site doesn't even properly serve content over https. People were confused by the broken link.

I should say, my site didn't even serve content over https, because now it does. I'd been meaning to enable https, and force its use, for a long time. This broken link pushed it to the top of the list.

Let's Encrypt is the certificate authority of choice these days, because they are free and automatable. And people say they make it easy, but I have to say, I would not have classified this as easy. I'm sure it's easier than it used to be, but it's still a confusing maze of choices, with decision points you are expected to navigate.

Actually getting everything installed requires sudo, or without sudo, using third-party tools, with instructions from obscure blog posts. There's clearly still room for improvement.

Once you have the certificate in place, you need to redirect your http site to https. Then you have to fix the http references in your site. Protocol-relative (or schema-less) URLs are handy here.

It's all done now, the entire site should always be https. I'm glad I finally got the kick in the pants to do it. If you find something wrong, let me know.
