
What files should coverage measure?

Wednesday 28 March 2012

Maybe this is crazy, but I'm looking for advice.

Conceptually, coverage.py is pretty simple. First, using the sys.settrace facility in Python, record every line that is executed. Then, after the program is done, report on those lines, and especially on lines that could have been executed but were not.
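To make that concrete, here is a minimal sketch of the idea. It is not coverage.py's actual implementation, and the names `executed`, `tracer`, and `sample` are invented for illustration. A trace function installed with sys.settrace records each file name and line number as it runs, and afterwards we can see which lines of `sample` were hit.

```python
import sys

executed = set()  # (filename, line number) pairs seen while tracing


def tracer(frame, event, arg):
    """Record every 'line' event, then keep tracing this frame."""
    if event == "line":
        executed.add((frame.f_code.co_filename, frame.f_lineno))
    return tracer


def sample(n):
    if n > 0:
        return "positive"
    return "not positive"


sys.settrace(tracer)
try:
    result = sample(1)
finally:
    sys.settrace(None)

# Report line offsets relative to the "def sample" line.
first = sample.__code__.co_firstlineno
lines_run = sorted(
    lineno - first
    for name, lineno in executed
    if name == sample.__code__.co_filename and first <= lineno <= first + 3
)
print(lines_run)
```

The final return statement (offset 3) never shows up in `lines_run`, and that is exactly the kind of line a coverage report wants to point out.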

Of course, the reality is more difficult. During execution, to record the line, we have to find the file name, which we get from the stack frame. Later, we look for that file by name to create the report. Sometimes, the file isn't a Python file!

One reason this can happen is if the file was actually created by a tool, and the tool provides the original source file as the reported name. For example, Jinja compiles .html files to Python code, and when the code is running, it claims to be "mytemplate.html". When coverage.py tries to report on the file, it can't parse it as Python, and things go wrong.
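The mechanism is easy to reproduce with the built-in compile function, which accepts any string as the file name. This is only an illustration of how generated code can claim a non-Python file name; it is not how Jinja itself is written, and the source string is a made-up stand-in.

```python
# Python source that a tool might have generated from a template.
source = "value = 6 * 7\n"

# The second argument is the file name the code object will report.
code = compile(source, "mytemplate.html", "exec")

namespace = {}
exec(code, namespace)

# A tracer reading frame.f_code.co_filename would see this name.
print(code.co_filename)
print(namespace["value"])
```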

Originally, this error would be reported to the user. There's a -i switch that shuts off all errors like this, but it seemed dumb for coverage.py to get confused by something like this. So I changed it to not trace files named "*.html".

Of course, the world is more varied than that, and I got a report from someone with Jinja2 files named "*.jinja2", which now trip the error. So I need a more general solution.

I figure there are a couple of possibilities:

  1. Don't measure files at all if they have an extension that isn't ".py". This will let us measure extension-less files, and .py files, and will ignore all the rest, on the theory that any other extension implies that we won't be able to parse it later anyway.
  2. Measure all files, but during reporting, if a file can't be parsed, ignore the error if it has an extension that isn't ".py".
  3. (Shudder) Make a configuration option about what extensions to measure, or which to ignore.
  4. Make "ignore errors" the default, as some people want. But if a file is missing for some reason, it's important to know, because it will throw off the reporting, and that shouldn't happen silently.
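For what it's worth, the first possibility might look something like this. It is a sketch only: `should_measure` is a hypothetical helper name, not an existing coverage.py function.

```python
import os.path


def should_measure(filename):
    """Option 1: measure ".py" files and extension-less files only."""
    extension = os.path.splitext(filename)[1]
    return extension in ("", ".py")


for name in ["app.py", "scripts/run", "mytemplate.html", "page.jinja2"]:
    print(name, should_measure(name))
```

The second possibility would apply the same extension test later, at reporting time, to decide whether a parse failure is worth mentioning.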

Do people ever name their Python source files something other than "*.py"? Are there weird ecosystems like this that I'll only hear about if I make one of these changes?

Breaking out of two loops at once

Sunday 25 March 2012

This is a question that crops up often:

I have two nested loops, and inside, how can I break out of both loops at once?

Python doesn't offer a way to break out of two (or more) loops at once, so the naive approach looks like this:

done = False
for x in range(10):
    for y in range(20):
        if some_condition(x, y):
            done = True
            break
        do_something(x, y)
    if done:
        break

This works, but seems unfortunate. A lot of noise here concerns the breaking out of the loop, rather than the work itself.

The sophisticated approach is to get rid of, or at least hide away, the double loop. Looked at another way, this code is really iterating over one sequence of things, a sequence of pairs. Using Python generators, we can neatly encapsulate the pair-ness, and get back to one loop:

def pairs_range(limit1, limit2):
    """Produce all pairs in (0..`limit1`-1, 0..`limit2`-1)"""
    for i1 in range(limit1):
        for i2 in range(limit2):
            yield i1, i2

for x, y in pairs_range(10, 20):
    if some_condition(x, y):
        break
    do_something(x, y)

Now our code is nicely focused on the work at hand, and the mechanics of the double loop needed to produce a sequence of pairs are encapsulated in pairs_range.

Naturally, pairs_range could become more complex: more interesting ranges, not just pairs but triples, and so on. Adapt to your own needs.
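For plain ranges like these, the standard library's itertools.product already produces the same sequence, and it extends to triples without another hand-written generator. A small sketch, where the condition `x + y + z == 6` is just a stand-in for some_condition and the `visited` list stands in for do_something:

```python
import itertools

visited = []
for x, y, z in itertools.product(range(3), range(4), range(5)):
    if x + y + z == 6:  # stand-in for some_condition(x, y, z)
        break
    visited.append((x, y, z))  # stand-in for do_something(x, y, z)

print((x, y, z), len(visited))
```

A single break is enough here, because there is only one loop to leave.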

As with any language, you can approach Python as if it were C/Java/Javascript with different syntax, and many people do at first, relying on concepts they already know. Once you scratch the surface, Python provides rich features that take you off that track. Iteration is one of the first places you can find your Python wings.

Happy belated pi day

Friday 16 March 2012

Pi day (two days ago) passed without notice here, but then Eric Johnson posted a comment on last year's pi day post:

Ancient Egyptians may have thought Pi was 256/81: Approximations of π.

256/81 is about 3.16049382716049382716, which is approx 0.6% above the value of Pi. 22/7 is only approx 0.04% above Pi, so the ancient Egyptians weren't particularly accurate, but the numerator and denominator they chose are interesting for another reason.

256/81 can be expressed as 2^8 / 3^4, which can be expressed as 2^2^3/3^2^2, which of course is a palindrome.

Posted on A.E. Pi day, 2012 (A.E. = Ancient Egyptian)

I had never heard any of this before, and was delighted.
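The arithmetic in the comment is easy to check. Python's ** operator is right-associative, so the exponent towers can be written exactly as they appear, and a small helper (`percent_error`, a name made up here) gives the signed error of each fraction relative to pi:

```python
import math
from fractions import Fraction

egyptian = Fraction(256, 81)

# 2**2**3 means 2**(2**3) == 2**8, and 3**2**2 means 3**(2**2) == 3**4.
towers_match = egyptian == Fraction(2 ** 2 ** 3, 3 ** 2 ** 2)


def percent_error(approx):
    """Signed error of approx relative to pi, in percent."""
    return (float(approx) - math.pi) / math.pi * 100


print(towers_match)
print(round(percent_error(egyptian), 2))
print(round(percent_error(Fraction(22, 7)), 2))
```

Both fractions come out slightly larger than pi, with 22/7 much the closer of the two.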

Poking around on the Wikipedia page about approximating pi, I found this interesting tidbit: there are points in the Mandelbrot set whose iteration escape counts provide arbitrarily accurate estimates to pi! Will the wonders never cease?
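Here is a rough sketch of one commonly cited version of that result, as I understand it: take the point c = -0.75 + εi, just above the "neck" of the set, count the iterations of z → z² + c until |z| exceeds 2, and multiply the count by ε. The choice of point, the escape radius of 2, and the names `escape_count` and `eps` are assumptions of this sketch rather than details from the Wikipedia page.

```python
import math


def escape_count(c, max_iter=1_000_000):
    """Number of iterations of z -> z*z + c before abs(z) exceeds 2."""
    z = 0j
    for n in range(max_iter):
        if abs(z) > 2:
            return n
        z = z * z + c
    return max_iter


eps = 0.001
estimate = escape_count(complex(-0.75, eps)) * eps
print(estimate, math.pi)
```

Making eps smaller should tighten the estimate, at the cost of proportionally more iterations.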

Happy belated Pi Day!

Pragmatic unicode

Thursday 15 March 2012

Last week was PyCon 2012, and I had a blast as always. I gave a talk entitled Pragmatic Unicode, or, How Do I Stop the Pain?

I chose the topic because I thought it would appeal to many Python developers, and because I knew all about it. Turns out I didn't! But it was great learning more details as I went. And then I filled in a few more tidbits by chatting with Martin v. Löwis at PyCon.

Part of the fun of this talk was finding the Unicode characters to decorate it with, and then building the credits slide at the end on the plane. It's all built with Cog to avoid cut-and-paste nightmares. Look at the HTML source of the actual presentation if you're interested in the Cog twistiness.

Of course, Unicode is a much bigger topic than this, but 25 minutes is what it is. Enjoy: the video, slides, and full text are all there.

Ten years

Wednesday 7 March 2012

This blog started ten years ago today, with a post about My first job ever. It's strange to think about those ten years. At the time, it seemed late to be starting a blog, but now having a blog going back ten years makes it seem like one of the ancients.

I wrote far more frequently then than I do now, partly because of the novelty of it, partly because of time pressures, and partly because Twitter gets the shorter tossed-off ideas now. But I still value having a place to express myself when the universe moves me to.

If you haven't been a long-time reader, the most unusual post here was about dinner at the White House, though by far the most popular post was the animated CSS Homer. Of course I find much else in the archives that I would like to point out to you, but won't.

When I started this ten years ago, I didn't know what would come of it. As a side project, there were no requirements on it, and I could take it wherever I felt like taking it. It's still that way: I don't know what topics will find their way here in the next year or ten, and I'm interested to find out.
