
Coverage.py v3.1 beta 1: Python 3.x and Cobertura output

Sunday 27 September 2009

A beta of Coverage.py 3.1 is available. Coverage.py is a tool for measuring code coverage of Python programs, usually during testing. The big feature of 3.1 is that Python 3.1 is now supported.

Kits are available as source or as Windows installers from the coverage.py page on PyPI, and code is also available from the repository on bitbucket.

Significant changes in coverage.py since v3.0.1:

  • Python 3.1 is supported. The same source kit works on both 2.x and 3.x.
  • The "coverage" command now uses a sub-command syntax similar to source control systems. This will make new feature additions easier.
  • Coverage results can be reported as a Cobertura-compatible XML file. Use the new "coverage xml" command. I'm looking for users who use Hudson or Sonar to ensure that this is working properly in all cases.
  • Some users reported incorrect results due to using DecoratorTools, which fiddles destructively with the settrace function. TurboGears is a major example of code that wasn't measured properly. The new --timid switch makes coverage.py operate simply enough that DecoratorTools doesn't interfere with its operation.
  • HTML coverage reports now have syntax-colored Python source.

Please try 3.1b1 and let me know what you think. Feedback is welcome in any way you like, but particularly good are tickets on bitbucket, or email on the testing-in-python mailing list.

Line continuations from tokenize.generate_tokens

Thursday 24 September 2009

OK, this is really geeky, but I wish I had found it on the interwebs, so I'm putting it here for the next guy.

tokenize.generate_tokens is a very useful function in the Python standard library: it tokenizes Python source code, generating a stream of tokens. I used it to add syntax coloring to the HTML reporting in coverage.py.

But it has a flaw, which the docs hint at:

The line passed (the last tuple item) is the logical line; continuation lines are included.

If you've continued a source line with a backslash:

def my_function(arguments):
    a = very_long_function(arguments) + \
        another_really_long_function(arguments) + \
        so_that_we_have_to_wrap_the_line_with_backslashes()

then generate_tokens doesn't ever give you a token with that backslash as the text. If you're trying to recreate the Python source from the tokens, the backslashes will be missing.
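Here's a quick way to see the problem for yourself. This is only a minimal sketch: the two-line source string is made up for the demonstration, and it uses io.StringIO to stand in for a real file.

```python
import io
import tokenize

# A tiny made-up source snippet with one backslash continuation.
source = "a = 1 + \\\n    2\n"

# generate_tokens wants a readline callable, so wrap the string.
tokens = list(tokenize.generate_tokens(io.StringIO(source).readline))
token_texts = [tok[1] for tok in tokens]

# The source contains a backslash, but none of the token texts do.
backslash_in_source = "\\" in source
backslash_in_tokens = any("\\" in text for text in token_texts)

print(token_texts)
print(backslash_in_source, backslash_in_tokens)
```

Joining the token texts back together can never reproduce the backslash, because no token carries it.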

Googling this problem turns up some muttering about how something ought to be done about it, but no solutions. It turned out not to be too hard to wrap the token generator to insert the needed backslashes:

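import token

def phys_tokens(toks):
    """Return all physical tokens, even line continuations.
    
    tokenize.generate_tokens() doesn't return a token for the backslash
    that continues lines.  This wrapper provides those tokens so that we
    can re-create a faithful representation of the original source.
    
    Returns the same values as generate_tokens()
    
    """
    last_line = None
    last_lineno = -1
    for ttype, ttext, (slin, scol), (elin, ecol), ltext in toks:
        if last_lineno != elin:
            # This token ends on a different line than the previous one.
            # If the previous line ended with a backslash, tokenize dropped it.
            if last_line and last_line[-2:] == "\\\n":
                # A string token that spans the line break already has the
                # backslash in its own text, so don't add another.
                if ttype != token.STRING:
                    # Column of the backslash on the previous physical line.
                    ccol = len(last_line.split("\n")[-2]) - 1
                    # 99999 is a made-up token type for the continuation.
                    yield (
                        99999, "\\\n",
                        (slin, ccol), (slin, ccol+2),
                        last_line
                        )
            last_line = ltext
        yield ttype, ttext, (slin, scol), (elin, ecol), ltext
        last_lineno = elin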

Use it by passing it the generate_tokens generator:

import tokenize

tokgen = tokenize.generate_tokens(source_file.readline)
physgen = phys_tokens(tokgen)
for ttype, ttext, (slin, scol), (elin, ecol), ltext in physgen:
    # Blah blah, process tokens as usual
    pass

Accidental haikus

Wednesday 16 September 2009

Jonathan Feinberg has made a neat hack: Haiku Finder. It uses NLTK to parse English text, looking for sentences that happen to fit the syllabic pattern required of haikus. It seems to work really well. I ran it on my longer text pieces, and it found these:

These are personal
    tools, meaning they do just what
I want them to do.

People who visit
    the page in their browser will
see the new entry.

But this powerful
    feature of C++ is missing
in those languages.

I know this sounds like
    coddling, or bending over
backward, and it is.

And maybe you don't
    want to put effort into
improving your log.

It sounds simple, but
    there are right ways and wrong ways
to go about it.

Ask them to tell you
    what they're thinking as they look
for the solution.

If you get it wrong,
    the object will be freed out
from under you: crash!

This is a macro
    that creates the initial
fields in the structure.

It seemed like I was
    buried in that dark harsh towel
cyclone for ages.

Note that it did a great job, but didn't know that "C++" is a three-syllable word, not one syllable.

If you want to try this, you'll have to install NLTK, which is a large package. It requires the punkt dataset, so you have to install that from the NLTK page after the code is installed. The whole process is automated, but perhaps more involved than you expected, so be forewarned.

You may remember Feinberg as the creator of Wordle, so I expect we'll see more inspired language-related hacks from him...

Why numbering should start at zero

Thursday 10 September 2009

My son Max is taking a computer class in high school, and described a class exercise involving a deck of cards numbered 0 to 51. I asked him if there was any discussion in class about why it wasn't 1 to 52. He said the teacher told them that if they learned C++, they'd understand why it started from zero.

I guess the teacher meant pointer arithmetic makes it obvious, but I remembered a more scholarly exposition: Edsger Dijkstra wrote a brief paper called Why Numbering Should Start at Zero. He lays out a mathematical explanation that has nothing directly to do with pointers.
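As I remember the paper, the heart of the argument is the half-open range: describe a run of integers as a <= i < b, and then N items starting at zero are simply 0 <= i < N. Python's range follows the same convention, so here is a small sketch of the properties that fall out of it. The half_open helper is just a name I made up for the illustration, and the deck is the 0-to-51 one from Max's class.

```python
def half_open(a, b):
    """Return the integers i with a <= i < b."""
    return list(range(a, b))

N = 52
deck = half_open(0, N)          # cards numbered 0 to 51

# The length of a half-open range is just the difference of its bounds.
assert len(half_open(0, N)) == N - 0
assert len(half_open(13, 26)) == 26 - 13

# Adjacent ranges share a bound, with no overlap and no gap.
assert half_open(0, 26) + half_open(26, 52) == deck

# The empty range needs no special case: lower bound equals upper bound.
assert half_open(5, 5) == []

# With zero-based numbering, the index of an item is the count of items before it.
assert len(deck[:17]) == 17 and deck[17] == 17
```

Start the numbering at 1 instead and the bounds become 1 <= i < N+1, which is where the stray +1s and -1s creep in.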

I'm impressed by Dijkstra because of his methodical approach to even the seemingly most trivial detail of computing. He was willing to stop, think through, and explain why something should be done a certain way, no matter how small.

Earlier today over lunch, we discussed the amazing foresight of the AT&T engineers who laid out the North American area codes. We marvelled at the care they took with designing a solution to a problem that wouldn't hit for a decade or so. How often do we see that kind of care and attention to detail in day-to-day work?

Both examples made me stop and admire true professionals, engineers building carefully, making sure to get it right no matter how far off the consequences. Thanks, guys. We should each try to channel your spirit in our own work.

Xenocode and multiple IE's

Saturday 5 September 2009

One of the banes of a web developer's existence is the need to test their site in Internet Explorer, not just once, but in multiple versions of Internet Explorer. These days, IE 6, 7, and 8 are pretty much required. Because of their tight integration with Windows, it's difficult to run all three side by side.

There are installers that claim to run them independently, but we've definitely seen a side-by-side IE6 behave differently than IE6 on a true IE6-only machine.

A technology that's been making the rounds of our office is Xenocode, which is a new kind of virtualization: application-level virtualization.

In the heat of a last-minute debugging session, a co-worker insisted I visit their browser sandbox page and click on IE7. My machine has a pristine IE6 install which I have resisted upgrading so that we can have "real" IE6 available. I didn't understand what this Xenocode thing was going to do, so I was nervous.

I clicked on IE7, and after it downloaded, it ran, and I had an IE7 window running on my machine. When I started to enter the URL to test, it auto-completed for me! A little spooky for an application that is supposed to be running in a sandbox.

Still curious about how this worked, I used Process Explorer to see what was going on. I expected to see xenocode.exe or something, but it claimed to be iexplore.exe, and it claimed to be in "C:\Program Files\Internet Explorer". Scary. It seemed like it had upgraded me to IE7, but that's exactly what it claimed not to do.

Just to experiment, I ran a Xenocode application that I didn't already have installed on my machine, a poker client. Sure enough, the Poker Stars application ran, and Process Explorer claimed it was running out of "C:\Program Files\Poker Stars". Except that directory doesn't exist on my machine!

Xenocode virtualizes the operating system services seen by the application, and provides a virtual view of the filesystem, registry, and so on. This lie is complete enough that even third-party OS utilities believe it, and report that the code is running from Program Files directories that don't exist.

So far, it looks like it really works. It's a great way to get multiple IE support on one machine. I don't know why you'd need to run other kinds of applications virtually, but it sure seems like impressive technology.

Threading emails

Wednesday 2 September 2009

Trent Mick wrote to me some time ago asking for a feature on this blog: could I make it so that email notifications of blog comments would thread together nicely?

The email subject lines from my notifications look like this:

A comment on "Weird URL data encoding" from Richard Schwartz

I use Thunderbird for email, and don't thread my inbox, so I never considered threading. Trent sent along information from a friend which said that "References:" headers were the key that would make a set of emails into a single thread.

I hacked for a little while, and could not get them to thread. I created a fake message id from the blog post and had all comment notifications have a References header with the id in it. No threading. I added unique Message-ID headers to each comment, then made subsequent comments have all previous message ids in a References header. No threading.
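For the record, the second attempt looked roughly like the sketch below. This is not the blog's actual code: the function name, the example.com addresses, and the message body are placeholders, and the In-Reply-To header is an extra I've added here because it is the other standard threading header. It uses the standard library's email package.

```python
from email.message import EmailMessage
from email.utils import make_msgid

def build_notifications(post_title, commenters, domain="blog.example.com"):
    """Build one notification per comment, chained with References headers."""
    # A fake message id standing in for the blog post itself.
    root_id = make_msgid(domain=domain)
    previous_ids = [root_id]
    messages = []
    for name in commenters:
        msg_id = make_msgid(domain=domain)
        msg = EmailMessage()
        msg["Subject"] = 'A comment on "%s" from %s' % (post_title, name)
        msg["From"] = "blog@example.com"      # placeholder address
        msg["To"] = "owner@example.com"       # placeholder address
        msg["Message-ID"] = msg_id
        # Every earlier id, oldest first, separated by spaces.
        msg["References"] = " ".join(previous_ids)
        # Not part of the original experiment: point at the latest ancestor.
        msg["In-Reply-To"] = previous_ids[-1]
        msg.set_content("(comment text goes here)")
        messages.append(msg)
        previous_ids.append(msg_id)
    return messages

notes = build_notifications("Weird URL data encoding", ["Alice", "Bob", "Carol"])
for note in notes:
    print(note["Subject"], "|", len(str(note["References"]).split()), "references")
```

Each notification gets its own Message-ID and carries the ids of everything before it, which is the shape a References-based threader would expect to see.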

I tried the same in Gmail, and nothing seemed to thread the messages together. Googling around, it seemed others had come to the conclusion that only the subject line matters. Apparently if two messages have the same subject (plus or minus some "Re:" prefixes), then they are in the same thread.

But what is the actual algorithm? I know that there can be differences in the subject lines ("Re:" and all). What are these mail clients doing to decide that two messages are in a thread?

I like having the author name in the subject line; it makes the Inbox listing richer. But it's also what's keeping these messages from threading. Is there a way to get the best of both worlds?
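To make the conflict concrete, here is a guess at what subject-only grouping might look like. I don't know what any particular mail client really does, so treat the thread_key function and its prefix list as assumptions made up for this sketch.

```python
import re

# Leading "Re:", "Fw:" and "Fwd:" prefixes, possibly repeated.
_PREFIXES = re.compile(r"^\s*((re|fwd?)\s*:\s*)+", re.IGNORECASE)

def thread_key(subject):
    """Reduce a subject line to a key; equal keys would share a thread."""
    return _PREFIXES.sub("", subject).strip().lower()

# Replies collapse onto the original subject...
same = thread_key("Re: RE: Lunch on Friday?") == thread_key("Lunch on Friday?")

# ...but two notifications for the same post differ by author name.
first = thread_key('A comment on "Weird URL data encoding" from Richard Schwartz')
second = thread_key('A comment on "Weird URL data encoding" from Trent Mick')
different = first != second

print(same, different)
```

Under a scheme like this, the only way to get the notifications into one thread is to give them identical subjects, which means giving up the author name there.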

I know I've seen threads in Thunderbird where the subject line changes completely mid-thread. Is that because they have In-Reply-To headers? Comment notifications aren't replies to each other, but maybe that's a way to force threading?
