
Photographs and time

Thursday 30 June 2011

A few recent interesting photography things caught my eye, all having to do with photographs and time.

Dear Photograph has a simple idea: take an old photograph, and take a picture of it in its original setting. Then write a message about it. Many of the results are quite touching, as people revisit the past and express their feelings about seeing the old in a new setting. Something about reaffirming that the "back then" is actually "right here" is very powerful.

Cinegraphs are almost still photos: they have just a small accent of motion to make them pop to life. They have the beautifully composed quality of still photos, but with that little spark that motion adds.

Peter Funch makes interesting composite photos. He photographs the same city location over a long interval, then composites similar subjects into one photograph, so that a busy street corner seems to be populated only by yawners, or people wearing red, or children, or manila-folder-bearers.

Of course, a simple snapshot carries with it the notion of time, since it stops time when the shutter is clicked, so that we can examine it later. That's the whole point. Here are my wife Susan and me on our honeymoon 27 years ago. Happy anniversary, Susie!

Coverage.py v3.5

Wednesday 29 June 2011

Coverage.py v3.5 for realz is now available. 3.5 adds two long-requested features: convenient navigation in the HTML report to find flagged sections of code, and better control over partial branch warnings.

More details are in the beta blog post from a few weeks ago. Not much has changed since then. I only heard from one user of the beta, but that user was Guido. As a result, the HTML navigation works properly in more browsers, and behaves better when the current chunk is completely off the screen. Enjoy!

Vi Hart etc

Monday 27 June 2011

I happened upon a charming origami proof of the Pythagorean theorem. It's just one of many math videos made by the also-charming Vi Hart, who seems to have boundless energy for math and its more whimsical sides. For example, see her rant about why Pi is (still) Wrong, complete with pie.

Vi is the daughter of George Hart, a geometer I have long admired. George is now the chief of content at the Museum of Math, which is opening next year in New York City.

Links from all of these pages lead to all sorts of wonders. I would have put more of them in this post, but I couldn't choose, and spent too much time looking at them myself...

Developer getting older

Thursday 16 June 2011

Lots of people this week are talking about Peter Knego's analysis of the correlation between age and reputation on Stack Overflow. His conclusion is that developers get better and scarcer with age.

As it happens, today is my 49th birthday, and so I was eager to see his charts, and have them counter the common notion that skill is negatively correlated with age. When I got there, I was pleased to see the rise in average reputation as developers got older, and dismayed to see that his chart ends at 49! One more year and I fall off the end of the world! As a college friend of mine put it, "I can feel the hot breath of 50 on my neck..."

Looking at the raw data, though, you can see why his graph ends at 49: he only included ages with at least 100 developers. In fact, if this experiment is repeated next year, the graph will extend farther, both because all of the developers in this graph will have aged a year, and because more people will have joined Stack Overflow. So I'm safe: in fact, I will always be at the leading edge! Welcome to the vanguard!

PS: the comments on Peter's post suggest all sorts of ways that this data is wrong, misinterpreted, measuring the wrong thing, skewed, not useful, and so on. Yes, sure, of course. Lighten up!

Running coverage on your tests

Wednesday 15 June 2011

Here's a true story from the #python IRC channel, a frequently-asked question played out for real. I'm nedbat, the other nick has been changed to protect the innocent:

other: there's no way to automagically omit coveraging the actual test modules is there?

nedbat: you can add an omit line to the .coveragerc, but I've found useful info from coverage on my tests.

other: What kind of useful info?

nedbat: useful info: two test methods with the same name (by accident) so one was never called.

other: hah oh boy, one of the test modules I was ignoring sure enough was a test that wasn't being run :). Perhaps I better not, I hadn't thought of that before.

nedbat: yay coverage! :)

other: yay coverage!

Moral of the story: if you have enough tests to run coverage, then your tests are a serious part of your product. You should run coverage on them too. It will help.
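To make the duplicate-name problem concrete, here's a small sketch (in Python 3; the class and test names are made up for illustration). When two methods in a class share a name, the second definition silently replaces the first, so the test loader only ever sees one of them. Coverage measured on the test file would show the body of the first method as never executed.

```python
import unittest


class TestArithmetic(unittest.TestCase):
    def test_add(self):
        # This body never runs: the definition below replaces it.
        self.assertEqual(1 + 1, 2)

    def test_add(self):  # same name by accident
        self.assertEqual(2 + 2, 4)


# The loader finds a single test method, not two.
names = unittest.TestLoader().getTestCaseNames(TestArithmetic)
print(names)
```

Nothing fails and nothing warns, which is why a coverage report on the tests themselves is a handy way to notice it.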

Long-running restartable worker

Monday 13 June 2011

I had to write a program that would analyze a large amount of data. In fact, too much data to actually analyze all of. So I resorted to random sampling of the data, but even so, it was going to take a long time. For various reasons, the simplistic program I started with would stop running, and I'd lose the progress I made on crunching through the mountain of data.

You'd think I would have started with a restartable program so that I wouldn't have to worry about interruptions, but I guess I'm not that smart, so I had to get there iteratively.

The result worked well, and for the next time I need a program that can pick up where it left off and make progress against an unreasonable goal, here's the skeleton of what I ended up with:

import os, os.path, random, shutil, sys
import cPickle as pickle


class Work(object):
    """The state of the computation so far."""

    def __init__(self):
        self.items = []
        self.results = Something_To_Hold_Results()

    def initialize(self):
        self.items = Get_All_The_Possible_Items()
        random.shuffle(self.items)

    def do_work(self, nitems):
        for _ in xrange(nitems):
            item = self.items.pop()
            Process_An_Item_And_Update_Results(item)
        Display_Results_So_Far()
        

def main(argv):
    pname = "work.pickle"
    bname = "work.pickle.bak"
    if os.path.exists(pname):
        # A pickle exists! Restore the Work from
        # it so we can make progress.
        with open(pname, 'rb') as pfile:
            work = pickle.load(pfile)
    else:
        # This must be the first time we've been run.
        # Start from the beginning.
        work = Work()
        work.initialize()

    while True:
        # Process 25 items, then checkpoint our progress.
        work.do_work(25)
        if os.path.exists(pname):
            # Move the old pickle so we can't lose it.
            shutil.move(pname, bname)
        with open(pname, 'wb') as pfile:
            pickle.dump(work, pfile, -1)


if __name__ == '__main__':
    main(sys.argv[1:])

The "methods" in the Strange_Camel_Case are pseudo-code where the actual particulars would get filled in. The Work object is pickled every once in a while, and when the program starts, it reconstitutes the Work object from the pickle so that it can pick up where it left off.

The program will run forever, and display results every so often. I just let it keep running until it seemed like the random sampling had gotten me good convergence on the extrapolation to the truth. Another use of this skeleton might need a real end condition.
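One gap in a move-then-write checkpoint is that an interruption between the two steps leaves no current pickle, only the backup. Here's a minimal sketch of one alternative, written for Python 3 rather than the Python 2 of the skeleton above: write the pickle to a temporary file, then rename it over the real one. It also shows one possible end condition. The names save_checkpoint, load_checkpoint, and run are hypothetical, and the "work" is just summing numbers as a stand-in for real processing.

```python
import os
import pickle
import random
import tempfile


def save_checkpoint(state, path):
    """Pickle state to a temp file, then rename it over path in one step."""
    directory = os.path.dirname(os.path.abspath(path))
    fd, tmp_path = tempfile.mkstemp(dir=directory, suffix=".tmp")
    try:
        with os.fdopen(fd, "wb") as tmp_file:
            pickle.dump(state, tmp_file, pickle.HIGHEST_PROTOCOL)
        # On POSIX, os.replace is atomic within one filesystem, so the
        # checkpoint on disk is always either the old one or the new one.
        os.replace(tmp_path, path)
    except BaseException:
        if os.path.exists(tmp_path):
            os.remove(tmp_path)
        raise


def load_checkpoint(path, make_initial_state):
    """Return the saved state if there is one, else a fresh state."""
    if os.path.exists(path):
        with open(path, "rb") as pfile:
            return pickle.load(pfile)
    return make_initial_state()


def run(path, all_items, batch_size=25, max_batches=None):
    """Process shuffled items in batches, checkpointing after each batch."""
    def fresh():
        items = list(all_items)
        random.shuffle(items)
        return {"items": items, "total": 0, "count": 0}

    state = load_checkpoint(path, fresh)
    batches = 0
    # End condition: stop when items run out or the batch budget is spent.
    while state["items"] and (max_batches is None or batches < max_batches):
        for _ in range(min(batch_size, len(state["items"]))):
            item = state["items"].pop()
            state["total"] += item  # stand-in for the real per-item work
            state["count"] += 1
        save_checkpoint(state, path)
        batches += 1
    return state
```

Calling run("work.pickle", range(1000), max_batches=2) and then calling run again with the same path picks up with the items that were left, the same way the skeleton does.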

Books for Ben?

Thursday 9 June 2011

My son Ben is 13 and very creative, but doesn't like reading books. I want to find him some new books to try, ones that will appeal to him. He's liked some books: for example, he loved Percy Jackson and the Olympians. He's long been fascinated with Dante's Inferno, and at a young age, Genesis caught his attention. He plays a lot of video games, and lately has been talking a lot about Celtic mythology.

He read almost all of Harry Potter and a few of the Artemis Fowl books. He liked the Wimpy Kid series, but wouldn't consider it now that he's older. So I'm looking for more that will appeal to him. Graphic novels are no problem, but I feel like there are standard prose books that he would like if only we could find them. He has a dark, intense sensibility, and there must be stuff out there to match.

Ideas?

Coverage.py v3.5 beta 1

Sunday 5 June 2011

Coverage.py v3.5 beta 1 is available now. 3.5 adds two long-requested features: convenient navigation in the HTML report to find flagged sections of code, and better control over partial branch warnings.

In the HTML report, there are hotkeys to navigate within your source code. '1' takes you to the first highlighted region, and 'j' and 'k' move up and down through them. Click the keyboard icon in the upper right of the report to see the complete list of hotkeys.

HTML reporting should now be faster in the common case of re-generating the report after making incremental changes to your code. Files which haven't changed won't be regenerated, speeding the entire process.

Coverage.py's branch coverage has been a bit simplistic. For example, it would complain that a "while True:" was partial, because it never finished the loop. Now a few simple constructs like that are understood properly right out of the box. In addition, there is a "no branch" pragma that you can use to mark your own intentionally-partial branches.
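As a rough illustration of the pragma, here's a sketch in which a for loop is expected never to run to completion. The function is made up for this example. The comment form is "# pragma: no branch", placed on the line whose partial branch is intentional.

```python
def find_first_negative(numbers):
    """Return the first negative number.

    Assumption for this example: callers always pass data that
    contains at least one negative number, so the loop never
    finishes on its own.
    """
    for n in numbers:  # pragma: no branch
        if n < 0:
            return n


print(find_first_negative([3, -1, 2]))
```

Without the pragma, branch coverage would flag the for line as partial, because the path from the loop to the end of the function is never taken.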

There have been lots of other changes, so take it for a spin. Drop me a line if you find any problems.

BTW: I've created a coveragepy-announce mailing list for new version announcements. Subscribe if you'd like an email when I release new versions.

Filenames with accents

Wednesday 1 June 2011

I'm working on projects for Threepress, and they have a good, extensive test suite. I was surprised when a test failed on Ubuntu that had always passed on their Macs.

The test in question was trying to open a file by name. No big deal, right? Well, in this case, the filename had an accented character, so it was a big deal. Getting to the bottom of it, I learned some new things about Python and Unicode.

On the disk is a file named lé.txt. On the Mac, this file can be opened by name; on Ubuntu, it cannot. Looking into it, the filename we're using and the filename it has are different:

>>> fname = u"l\u00e9.txt".encode('utf8')
>>> fname
'l\xc3\xa9.txt'
>>> os.listdir(".")
['le\xcc\x81.txt']

On the Mac, that filename will open that file:

>>> open(fname)
<open file 'lé.txt', mode 'r' at 0x1004250c0>

On Ubuntu, not so much:

>>> open(fname)
Traceback (most recent call last):
    File "<stdin>", line 1, in <module>
IOError: [Errno 2] No such file or directory: 'l\xc3\xa9.txt'

What's with the two different strings that seem to both represent the same text? Wasn't Unicode supposed to get us out of character set hell by having everyone agree on how to store text? Turns out it doesn't make everything simple: there are still multiple ways to store one string.

In this case, the accented é is represented by two different UTF-8 byte strings: '\xc3\xa9' and 'e\xcc\x81'. In pure Unicode terms, the first is a single code point, U+00E9, or LATIN SMALL LETTER E WITH ACUTE. The second is two code points: U+0065 (LATIN SMALL LETTER E) and U+0301 (COMBINING ACUTE ACCENT). Turns out Unicode has both a single combined code point for accented é, and also two code points that together can mean accented é.
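The two spellings can be inspected directly. This is a small sketch in Python 3 (where str is already Unicode, unlike the Python 2 sessions shown here); the variable names are mine.

```python
import unicodedata

composed = "\u00e9"     # one code point
decomposed = "e\u0301"  # two code points: base letter plus combining accent

for label, text in (("composed", composed), ("decomposed", decomposed)):
    names = [unicodedata.name(ch) for ch in text]
    print(label, len(text), text.encode("utf-8"), names)

# They look identical on screen but are different strings.
print(composed == decomposed)  # False
```

The UTF-8 bytes printed for each match the two byte strings above: b'\xc3\xa9' for the composed form and b'e\xcc\x81' for the decomposed one.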

This demonstrates a complicated Unicode concept known as equivalence and normalization. Unicode defines complex rules that make it so that our two strings are "equivalent".

On the Mac, trying to open the file with either string works; on Ubuntu, you have to use the same form as is stored on disk. So to open the file reliably, we have to try a number of different Unicode normalization forms.

Python provides the unicodedata.normalize function which can perform the normalizations for us:

>>> import unicodedata
>>> fname = u"l\u00e9.txt"
>>> unicodedata.normalize("NFD", fname)
u'le\u0301.txt'

Unfortunately, you can't be sure in what normalization form a filename might be. The Mac likes to create them in decomposed form, but Ubuntu seems to prefer composed form. Seems like a fool-proof file opener would need to try the four different normalization forms (NFD, NFC, NFKD, NFKC) to be sure to open a file with non-ASCII characters in it, but that also seems like a huge pain. Is it really true I have to jump through those hoops to open these files?
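For what it's worth, the try-every-form opener isn't much code. Here's a minimal sketch in Python 3, assuming a UTF-8 filesystem encoding. The name open_any_form is hypothetical, and it is only meant for reading, since a write mode would happily create a new file under the first name tried.

```python
import os
import tempfile
import unicodedata

FORMS = ("NFC", "NFD", "NFKC", "NFKD")


def open_any_form(path, mode="r", **kwargs):
    """Open path for reading, trying it as given and then each
    Unicode normalization form of it, in that order."""
    candidates = [path] + [unicodedata.normalize(form, path) for form in FORMS]
    tried = set()
    for candidate in candidates:
        if candidate in tried:
            continue  # several forms often produce the same string
        tried.add(candidate)
        try:
            return open(candidate, mode, **kwargs)
        except FileNotFoundError:
            continue
    raise FileNotFoundError(path)


# Usage: the file is stored in decomposed form, but asked for in composed form.
folder = tempfile.mkdtemp()
with open(os.path.join(folder, "le\u0301.txt"), "w") as f:
    f.write("hello")

with open_any_form(os.path.join(folder, "l\u00e9.txt")) as f:
    print(f.read())
```

This normalizes the whole path, directories included, which is a simplification. It also can't help if the name on disk isn't in any of the four forms, so it's a sketch of the idea rather than a real answer to the question.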
