
Movie title stills collection

Wednesday 29 July 2009

Christian Annyas is doing a great job collecting movie titles on his Movie Title Stills Collection. The movie titles are a wonderful time capsule of design aesthetics throughout the 20th (and early 21st) century, with each decade displaying its own characteristic look.

The site itself is really nicely designed, too, with a terrific movie-promotion feel to it. Christian links to two other movie title sites, and it's interesting to see the contrast. Steven Hill's Movie Title Screen Page has a welcome-to-1996 feel to it, with bits of content all over the page, mixed with ads in an incoherent pile. And Movie Title Screenshots has many more images (5700!), but in a page that feels like an undistinguished gamer's-forum warehouse.

In the deeper end of the pool, The Art of the Title Sequence doesn't just collect images: it provides the entire sequence, often with commentary, interviews with the creators, or preliminary work.

Lastly, we can't talk about movies and design without including Mark Simonson's Typecasting, about "the use (and misuse) of period typography in movies", an obsessively detailed look at the type that flies by on props, both accurate and inaccurate. Mark's been adding to the series as his snark moves him.

Back from Paris

Tuesday 28 July 2009

Susan and I took Max and Ben to Paris last week. If you're interested, here's a tabblo of photos from the week:

Tabblo: Paris 2009

Here are some other random observations...

I was impressed at how courteous the drivers were to pedestrians if you were crossing at the right place and time. The drivers always stopped to defer to those on foot. On the other hand, if you were not in a crosswalk, or the light was not in your favor, you took your life in your hands.

Smart cars and motorcycles and bicycles are much more common than in the US, naturally. We were tempted to rent the public Vélib' bicycles, but you need a European-system credit card, and anyway, then we'd be bicycling in the Paris traffic, a daunting prospect.

Small things about the cityscape were different. For example, the street signs are attached to the corners of buildings rather than hung over corners or streets. I knew this before. But I hadn't realized that traffic lights are hung on the near side of intersections rather than the far side. This means the first driver in line can't see the main signal, so they hang a smaller light at driver's-eye level, pointed at the first driver, so that he knows when to start.

The Metro worked great for us, and even here there are small differences: on some lines, the doors don't open unless you work a lever or button to open them when the train stops. Also, each platform only has a single line running through it, so the trains have no identification on the front. If you're on the platform for the 4 train, then only number-4 trains will come along, so no need for them to identify themselves as 4's.

Also in the Metro, we once came upon a 9-piece band playing their music, including two accordions and a cello. They were playing up a storm, but the Parisians didn't seem to think it was so unusual, so I guess it wasn't.

I know it seems cliched to say it, but the Eiffel Tower is really really big. And as large as it seems to me now, I can't imagine how the natives dealt with it in 1889 when it was built. We're used to being up high, what with skyscrapers and airplanes. But back then, it was the tallest building in the world, almost twice as tall as the runner-up, the Washington Monument. It's well-known that Parisians considered it ugly, but how did they make sense of its sheer size?

The Tour de France ended the day after we left, and the final eight laps of downtown Paris carried the riders right past our hotel on Rue de Rivoli (or as I heard a woman on the Metro call it, Rue de Ravioli). I watched the final on the internet, and finally began to understand something about the intricacies of the race.

The catacombs were every bit as spooky as you'd imagine, with thousands of bones piled in abandoned quarry tunnels. The oddest thing about it, though, may be that when you emerge from the depths, you are in what looks like a nurse's office, with a bench and a defibrillator machine, and two skulls nonchalantly sitting on top of the first-aid kit. Then you walk outside onto an ordinary side street with no idea how to get back to where you entered the crypts.

I enjoyed exploring a new cityscape very much, and though we won't be going back to Paris soon (it's expensive!), I'd like to be able to hold on to that spirit of adventure in other ways. It was fun seeing what most appealed to the rest of my family: Susan (treats, flowers, beauty, history), Max (capturing video, speaking French), and Ben (climbing, exploring, rides).

Coverage.py on Python 3.x

Monday 13 July 2009

Last week I went through the exercise of making coverage.py compatible with Python 3.1. I learned a few things along the way.

At first, I wanted to create code that would work on both 2.x and 3.x, but while that is possible to an extent, there are syntactic differences that make it impossible to do completely. Then I toyed with the idea of a preprocessor-like tool that would let me have 2.x lines and 3.x lines together in one file, but it seemed like more trouble than it was worth.

In the end I went back to the standard 2to3 tool. I had thought that this tool was meant to be a starting point for creating a 3.x codebase, and I started using it that way. But a recommendation I read somewhere suggested using it not once at the start of the project, but as a build step to create your 3.x kit from your 2.x sources. This is what I ended up doing.

2to3 is impressive: it runs over your source files, changing code to work under 3.x. It doesn't always do the best thing, but I never saw it do the wrong thing.

My process is to copy my whole source tree into a "three" directory, then run 2to3, then run the unit tests. After fixing a problem, repeat the process. This seems like an odd way to run with an interpreted language, but works really well, and lets me keep one code base. It doesn't run as-is on 2.x and 3.x, but it's one set of files that produces code that runs on both.
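As a rough sketch of that copy-convert-test loop (not the actual script I use): the directory names and the test command here are placeholders, and it assumes the 2to3 command-line tool is on the PATH, which is only true for Python versions that still ship it.

```python
import os
import shutil
import subprocess


def conversion_commands(dest, test_command):
    """Return the commands to run, in order, once the tree has been copied."""
    return [
        # --write edits files in place; --nobackups skips the .bak copies.
        ["2to3", "--write", "--nobackups", dest],
        # Then run the test suite against the converted tree.
        list(test_command),
    ]


def build_three(src, dest, test_command):
    """Copy src to dest from scratch, convert it with 2to3, and run the tests."""
    if os.path.exists(dest):
        shutil.rmtree(dest)       # always start from a fresh copy of the 2.x tree
    shutil.copytree(src, dest)
    for command in conversion_commands(dest, test_command):
        subprocess.check_call(command)   # stop at the first failing step


# Hypothetical usage; "run_tests.py" is a made-up name:
# build_three(".", "../three", ["python3", "run_tests.py"])
```

Because the "three" directory is rebuilt every time, fixes always go into the 2.x sources and never into the generated code.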

The bulk of the changes I had to make were in the tests rather than in the coverage.py code itself. Coverage.py's tests consist of many small snippets of code, often in strings, so 2to3 wasn't able to fix it all up for me. In these snippets, I had often used print statements where any statement would do, so I ended up converting a lot of these to assignments. Where I really did want printing I added parentheses to make them compatible between 2.x and 3.x.

Here are some other differences I had to accommodate:

  • There's no setuptools on 3.x. The one feature I really used from it was its auto-generated coverage script, so I wrote a simple script and conditionalized setup.py. Unfortunately, it means I can't use "coverage" as a command name under 3.x, but have to run it as "python /blah/Python3.1/Scripts/coverage etc".
  • 3.x no longer has os.popen4, so I wrote a helper function to run commands, with different implementations for 2.x and 3.x.
  • filter() no longer returns a list in 3.x (it returns an iterator), but it was easy to replace.
  • exec is no longer a statement, so those tests had to be conditionalized by version.
  • Variables in comprehensions are local to the comprehension in 3.x, whereas they are available outside the comprehension in 2.x. This ended up making a difference because of the way coverage tracing doesn't start until the next call, and in 3.x, the new scope for comprehensions means it is traced as a call.
  • Bytes vs. strings: this took a few go-rounds to get right, and wasn't helped by the fact that the 3.x docs say write() takes a string. It doesn't: what it expects depends on how the file was opened. In binary mode, write() expects a bytes argument; in text mode, it expects a string. This makes perfect sense, and is part of the new logical goodness in 3.x, but I learned about it the hard way. I dealt with it by moving around some encode() and decode() calls, and still might have it a little wrong, but it works, so I don't think so.
  • Comparison special functions: __cmp__() is gone in 3.x. This is too bad, since now I have to implement the comparison as __lt__() and __eq__(). But once I do that, the 2.x code doesn't work properly, since it wants all six functions defined. So where I used to have __cmp__(), now I have __lt__(), __le__(), __eq__(), __ne__(), __gt__(), and __ge__(). Is there a simpler way to define custom comparisons that work on both 2.x and 3.x?
  • The module containing the built-in functions has changed. In 2.x, it's __builtin__; in 3.x, it's builtins.
  • An obscure difference: at one point in my tests I was appending to the PYTHONPATH environment variable, and doing it repeatedly, adding the same entries over and over again. In Python 2.x, this worked fine. In 3.x, once the variable got longer than some limit, it was ignored, and my tests failed. I hadn't meant to append repeatedly like that, so I fixed the tests not to, but I don't know why 3.x minded when 2.x didn't.
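
A few of these differences are easier to see in code. This is only an illustrative sketch, not code from coverage.py: the Version class and the write_both_ways helper are made up for the example. For the comparison question, one option is functools.total_ordering, which only exists in later Pythons (2.7 and 3.2 onward), so it wouldn't help with the older versions coverage.py supports.

```python
import functools
import os
import tempfile

# The module of built-in functions: __builtin__ in 2.x, builtins in 3.x.
try:
    import builtins
except ImportError:
    import __builtin__ as builtins


def write_both_ways(directory):
    """Write the same text through a text-mode file and a binary-mode file."""
    text = "hello\n"
    text_path = os.path.join(directory, "text.txt")
    binary_path = os.path.join(directory, "binary.txt")
    with open(text_path, "w") as f:
        f.write(text)                     # text mode: write() wants a str
    with open(binary_path, "wb") as f:
        f.write(text.encode("utf-8"))     # binary mode: write() wants bytes
    return text_path, binary_path


@functools.total_ordering
class Version(object):
    """Hypothetical class: total_ordering fills in <=, >, and >= from < and ==."""

    def __init__(self, number):
        self.number = number

    def __eq__(self, other):
        return self.number == other.number

    def __ne__(self, other):
        # 2.x does not derive != from ==, so it still has to be spelled out.
        return not self == other

    def __lt__(self, other):
        return self.number < other.number


directory = tempfile.mkdtemp()
text_path, binary_path = write_both_ways(directory)
```

In 3.x, passing a str to the binary-mode file's write() raises a TypeError, which is how the difference tends to announce itself.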

After all of these changes, now I have code that passes all its unit tests on 3.x. I still haven't tackled packaging kits for 3.x, but that's next.

Palin on the loose

Sunday 12 July 2009

I'd been staying away from reading political blogs since the election hubbub died down, but Sarah Palin's resignation drew me back in. I wanted to know what her supporters would make of her latest unconventional move.

It was interesting to see that they were as divided as anyone else over what they thought it meant: was she going to run for president or not, and did the move fatally cripple that effort?

I predict that she does intend to run for president, but it won't go anywhere, and she will find a home in the conservative media. She has always seemed more Rush than Reagan to me anyway. She's adored by her fans, and has a strong following in the rightmost parts of the GOP. But she seems to have no ability or interest in bridging to the more moderate elements, much less attracting undecided and independent voters.

She does not seem to have used her time since the election to improve her weak points. If she wants to successfully run for president, she's going to have to figure out how to do a thoughtful interview. When she speaks, she often seems confused and wandering. Through all of the Palin record, the thing that scares me most about her is the Katie Couric interview where she couldn't name a Supreme Court ruling she disagreed with, or a newspaper that she read. Her latest press event was her resignation, where even on her own terms on her lawn she seemed to meander around some confusing ideas: I'm leaving office but I'm not a quitter?

She's going to have to learn to drive home a message. She's good at rallies of the faithful, but more as a t-shirt cannon than a thought leader. Lobbing slogans into the cheering crowd is very different than persuading skeptics and bringing people to your point of view.

As an early indication of the marketability of the Palin brand, Vulnerable GOPs want Palin to stay home, believing that having her on the stump with them would hurt their chances.

Whatever happens, Palin makes for interesting political theater. She and the media are drawn to each other like a moth to a flame, though I'm not sure which is which. It'll be interesting to see what happens.

Coverage.py v3.0.1

Monday 6 July 2009

I've just released Coverage.py v3.0.1, a bugfix release of my code coverage measurement package for Python code. Since releasing 3.0 three weeks ago, a few serious bugs have surfaced, and this release fixes them:

  • Removed the recursion limit in the tracer function. Previously, code that ran more than 500 frames deep would crash.
  • Fixed a bizarre problem involving pyexpat, whereby lines following XML parser invocations could be overlooked.
  • On Python 2.3, coverage.py could mis-measure code with exceptions being raised. This is now fixed.
  • The coverage.py code itself will now not be measured by coverage.py, and no coverage modules will be mentioned in the nose --with-cover plugin.
  • When running source files, coverage.py now opens them in universal newline mode just like Python does. This lets it run Windows files on Mac, for example.

As always, let me know how it is or is not working for you.

A nasty little bug

Sunday 5 July 2009

James Bennett wrote to me the other day to ask for help with a problem using coverage.py. Some code he was measuring behaved very strangely: according to coverage.py, the first half of the function was executing, but the second half wasn't. And this was happening for a whole group of functions. These were test functions for XML-producing Django views, and each parsed the XML. The call to the XML parser was the last line reported as executed each time; none of the lines after it were detected as run.

My first thought was that the XML parsing was throwing an exception, the simplest explanation for why execution would stop in the middle of a function. It made sense, since it was a similar operation in each function exhibiting this behavior. And James had already had another problem with this same code where an unexpected exception was throwing off his tests.

But James is a clever boy, and proved that exceptions were not to blame by adding asserts, print statements and so on. The lines were really being executed, but coverage.py didn't think they were. Something more interesting and unusual was at work. I broke out my old quip:

I learn something new every day, no matter how hard I try.

To understand what happened here, I need to explain a little about how coverage.py works. Python provides a trace feature: you register a function with sys.settrace(), and it is called as code is executed. It is passed an event argument indicating what happened: "call" for entering a new scope, "return" for leaving a scope, "line" for a line of code executed, and "exception" for an exception being raised.
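
To make those events concrete, here is a minimal sketch of a trace function that just records what it is told. It is not coverage.py's tracer, and the sample() and risky() functions are invented for the demonstration.

```python
import sys

events = []


def tracer(frame, event, arg):
    """Record each event with the function name and line number it came from."""
    events.append((event, frame.f_code.co_name, frame.f_lineno))
    # Returning the tracer keeps line-level tracing on inside this scope.
    return tracer


def risky():
    raise ValueError("boom")


def sample():
    try:
        risky()
    except ValueError:
        pass
    return 1


sys.settrace(tracer)
try:
    sample()
finally:
    sys.settrace(None)    # always switch tracing back off

for event, name, lineno in events:
    print("%-9s %s:%d" % (event, name, lineno))
```

The printed list starts with a "call" for sample() and ends with its "return". In between, the exception leaving risky() shows up as an "exception" event followed by a "return" event for the same function.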

One of the changes I made in coverage.py 3.0 was to write a more sophisticated trace function which uses the nesting of call and return events to keep track of what file we're running in and whether to record execution in that file.

Back to the bug: after an intensive day of adding extensive logging to the trace function and poring over 1000-line log files, I found a sequence like this:

     line expatbuilder.py 222
     line expatbuilder.py 223
     call pyexpat.c 905
        call expatbuilder.py 892
           line expatbuilder.py 894
           line expatbuilder.py 900
           exception expatbuilder.py 900
           return expatbuilder.py 900
        exception pyexpat.c 905
        exception expatbuilder.py 223
        line expatbuilder.py 225
        line expatbuilder.py 226

The first odd thing to note about this is the third line, where we seem to be entering a new scope at line 905 of pyexpat.c. That's odd because the trace function doesn't get called for execution in C files, only in Python files! And then further down, there's an exception at line 905 in pyexpat.c, but no return from that line. All my experience with trace functions said that "call" and "return" events would always match. If a scope is left due to an exception being thrown, then there would be an "exception" event and a "return" event, like we see a little higher, at line 900 of expatbuilder.py.

Digging into pyexpat.c itself reveals a strange thing: for some reason, this module calls the trace function explicitly, using "pyexpat.c" as a file name. That explains how the trace function could seem to see inside C code. It can't: this module, alone among all of the modules in the Python distribution, calls the trace function to report on what it is doing. I have no idea why.

And further, it does it a little wrong: in the other place where the trace function is called, the interpreter itself, exceptions always generate a return event when they leave a scope. Pyexpat doesn't do this: if an exception comes up through the C code, it will generate an exception event, but no return event to indicate the scope is exiting. My trace function counts on matching call and return events to keep the bookkeeping straight.

What was happening in James' code is that inside the XML parser, an exception is being thrown (something that happens internally when parsing doctype declarations), which caused a return event to be skipped, which threw off my trace function's bookkeeping, which made it think that James' code was actually inside the XML parser module, which is code that shouldn't be measured (since it is part of the Python standard library, and who wants to measure that), so the second half of James' code wasn't measured. Whew. Mystery solved, now what to do about it?

The first order of business was to write a test showing the problem. It took a while to get a small code sample, because without a doctype declaration in the XML chunk, there was no internal exception to throw off the event pairing. The line numbers in the logging files helped me figure out the source of the exception.

I took a few stabs at fixing the problem, because I was hoping for a general approach to the problem of detecting mis-matched call and return pairs, but I didn't see a way to do that. In the end I took the unfortunate step of checking for "pyexpat.c" in the file name reported to the trace function, and assuming that an exception event from there needed a missing return event to be synthesized. You can see the unfortunate blot in the tracer code for yourself.

For good measure, I wrote a bug against pyexpat, but I'll be surprised if anything is done about it. In fact, I kind of hope nothing is, since I'll have to further complicate my workaround if there are newer versions of pyexpat that do the right thing.

OK, problem fixed, time to run my newly-enhanced test suite against all the versions of Python that coverage.py supports, 2.3 through 2.6.

Uh-oh: 2.3 failed.

Yet more digging, and it turns out that Python 2.3 doesn't properly match call and return trace events: it shows the same problem with Python code that pyexpat does. In Python 2.3, an exception that leaves a scope will fire an exception event but no return event. D'oh!

2.3 is on the trailing edge of my support list, so I'm not willing to twist the code too badly for it, but I want to make it work.

In coverage.py 3.0, there are two trace functions: a fast one in C, and a slow one in Python for installations that can't build the C extension. I didn't want to complicate the C extension, so to keep Python 2.3 working, I tweaked the Python trace function. Now Python 2.3 always uses the Python trace function, which is too bad, since the C implementation is much faster. But it's better than dropping support for Python 2.3 altogether.

This style of development is unfortunately par for the course for coverage.py. It's a natural outcome of a few goals I have for it:

  • Accuracy: coverage.py's job is to help developers understand their code. It has to give accurate information, or it's just adding new mysteries for them to solve. Claiming lines aren't executed when they are executed is unacceptable.
  • Applicability: if you are building real code in Python, coverage.py should work for you. This means it can't opt out of older Python versions, or specific modules from the standard library.
  • Convenience: it shouldn't be a burden to run coverage testing on your code, it should be fast and easy.

I wish I didn't have to have pyexpat-specific code in the trace function, but I don't see a way to avoid it. People should be able to measure code coverage of XML parsing code without having to read a footnote somewhere that says it doesn't work.

Update: Funny how things work out. I had worked on this bug for a day or two, trying all sorts of ways to fix it. I thought I had found the best way, even though it wasn't very nice. Then I wrote this blog post, and Andre commented on it, saying he thought there had to be a better way. While I was writing a response along the lines of, "no, there really isn't, and here's why," a new idea occurred to me.

Each event is passed the frame object. The essence of a return event is that the next event will be in a different frame. Not only that, but we know what frame it will be in: the parent of the frame that is returning. When an exception event happens, there are two possibilities:

  • the exception is being caught, and the next event will be a line event in the same frame.
  • the exception is causing us to leave the scope, and the next event will be a return event in the same frame.

Either way, the event after an exception should be in the same frame. If the return event is incorrectly skipped, the next event will be something happening in the parent of the exception's frame.

So that's now how I detect missing return events: when an exception happens, record the parent frame pointer. If the next event happens in that frame, then we're missing a return, and can fake one. I didn't do the more straightforward thing of recording the frame the exception happened in, and checking against that, because by the time we get the next wrong event, that frame has been destroyed, and I didn't like keeping a stale pointer to check against.
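
Here is a simplified sketch of that check, replaying an event sequence shaped like the pyexpat log above. It is not the real tracer (which is in C): the Frame and Tracker classes are stand-ins invented for the example, and the only thing the fake frames carry is a parent link like f_back on a real frame.

```python
class Frame(object):
    """Hypothetical stand-in for a frame object; only f_back matters here."""

    def __init__(self, name, f_back=None):
        self.name = name
        self.f_back = f_back


class Tracker(object):
    """Tracks call depth, faking a return when one is skipped after an exception."""

    def __init__(self):
        self.depth = 0
        self.parent_after_exception = None
        self.faked_returns = 0

    def trace(self, frame, event):
        if self.parent_after_exception is not None:
            if frame is self.parent_after_exception:
                # The event after an exception should be in the same frame.
                # It is in the parent instead, so a return was skipped.
                self.depth -= 1
                self.faked_returns += 1
            self.parent_after_exception = None
        if event == "call":
            self.depth += 1
        elif event == "return":
            self.depth -= 1
        elif event == "exception":
            # Remember the parent, which is still alive, not the failing frame.
            self.parent_after_exception = frame.f_back


outer = Frame("expatbuilder caller")
c_code = Frame("pyexpat.c", outer)
handler = Frame("expatbuilder handler", c_code)

tracker = Tracker()
sequence = [
    (outer, "call"), (outer, "line"),
    (c_code, "call"),
    (handler, "call"), (handler, "line"),
    (handler, "exception"), (handler, "return"),
    (c_code, "exception"),      # pyexpat never sends the matching return
    (outer, "exception"), (outer, "line"),
    (outer, "return"),
]
for frame, event in sequence:
    tracker.trace(frame, event)

print(tracker.depth, tracker.faked_returns)
```

Without the check, the depth would end at 1 instead of 0, which is the kind of bookkeeping drift that made the tracer think James' code was still inside the XML parser.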

This fix is much nicer:

  • There's no hard-coding of "pyexpat.c" in the code,
  • If pyexpat is ever fixed, this code will still work, and
  • Python 2.3 can still be handled by the C trace function.

Thanks everyone, for being my sounding board! The new nicer fix is viewable on bitbucket.

BTW: I wanted to say, "thanks for being my wooden Indian," from an idea I read about (I thought) on the c2.com wiki about how if you explain something to someone, they don't have to respond at all, just the act of talking it through will help you understand it better, so you might as well explain your problem to a wooden Indian. But I can find no trace of it now.
