|Ned Batchelder : Blog | Code | Text | Site|
I'm casual friends with a yoga student and teacher on Facebook, and he recently posted a video. It's a yoga teacher demonstrating two extremes of bad form, and the happy medium:
I almost didn't click on the video at first, because there's a lot of stuff that goes by on Facebook that I don't look at. But my friend described it as funny, and funny is a prime motivator for clicking.
And the video is funny, because David Swenson does a good job mocking his bad students. But the thing that stuck with me was his "happy medium" demonstration, and just how smooth and relaxed he was while moving far more than most people do.
His motions have stuck with me, and I've been trying to be aware of tension and relaxation in my own muscles. I know it's not why Ross posted the video in the first place, but you never know where you might find inspiration.
One simple thing: I try to take my hands off the steering wheel when I come to a stop at a red light and just breathe deeply. Lord knows one place we could use more relaxation is in the car...
Everyone likes to complain about Python packaging, me included. It's no fun, and it's confusing, and it seems to be in a constant state of flux, and the languages and the distros can't agree on who's in charge, etc, etc.
I'll add just one small observation: setuptools is widely used as a way to install Python packages. Look at its latest version: 0.6c11. In case it's not familiar to you, that "c" in there means this is the eleventh release candidate for version 0.6. What!? Do we really believe that a release engineer is closely monitoring the health of this package, and is about to release 0.6, but needs this code tried first? And that this is their eleventh attempt at building an acceptable 0.6 kit?
This is crazy. According to PyPI, the first release candidate of 0.6 was posted 3 years 4 months ago, and the first beta was posted 14 months before that. What are we waiting for? Considering how widely adopted this package is, and how dependent we are on it working properly, and how little it has changed, this latest code should have just been called 1.1.
I've long held that it's a plague on open source that so many projects insist on staying in the zeros. But setuptools takes it even further than most with this insane charade that somehow these are release candidates.
It's just one more small reason I say, Welcome to Distribute.
Coverage.py v3.2 beta 3 is available, with a fix for the debilitating memory leak introduced in 3.2 beta 1. If you tried using the latest coverage.py and found it consumed all your RAM, this one will work much better.
Other changes since beta 1:
Here's a detailed story of finding and fixing a memory leak in a Python C extension, complete with testing mysteries.
A user of coverage.py reported that running it consumed all his memory. No one else had mentioned anything like this, but it's a beta version, and those don't get nearly as much attention as regular releases.
He had helpfully reproduced the problem on a large public code base by running Django's test suite under coverage. I tried running the Django test suite with the latest coverage code, and sure enough, the memory ballooned. A plain run consumes about 100Mb, but with coverage it was up to 500Mb. I expect running with coverage.py to use a little more memory, since it has to collect its data in memory, but the amount of data should be very small: proportional not to the execution time, but to the number of distinct lines of code executed.
Coverage.py has two different trace modules, one in C and one in Python. Forcing the Python tracer (by using the --timid flag) showed no memory leak, confirming my suspicion that the C code was at fault.
I made a small test file to run for a long time:
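The original listing didn't survive here, but it was a file along these lines: a sketch, not Ned's actual script, just a handful of distinct lines executed an enormous number of times.

```python
# Reconstruction of the kind of test file described above (hypothetical,
# not the original): a tiny script, so only a few distinct lines, each
# executed a huge number of times.
import random

def f():
    a = random.random()
    if a > 0.5:
        x = 1
    else:
        x = 0
    return x

for _ in range(1000000):
    f()
```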
Running it, I saw nothing untoward: no memory leak. OK, so something was different between Django and my test file. Of course, lots of things are different between them; the task was to figure out which difference was the important one.
Poring over the C tracer code, I saw plenty of interesting program events that cause stuff to happen internally: calling a function, returning from a function, the first execution in a file, raising an exception, deep recursion, and so on.
I added some statistics gathering in the tracer code to see which of these might be happening enough to correlate with the memory leak. Exceptions and deep recursion seemed like likely candidates since my test code had neither.
It turned out that the Django code didn't recurse deeply: it never got past the internally-interesting milestone of 100 stack frames. It did throw a lot of exceptions: the full test suite raises 1,862,318 of them. Examining the code, though, there's not much happening there that could leak, and disabling that code proved that it wasn't the issue.
How about first executions in a file? The Django test suite touches 9300 files (or 9300 different ways to refer to files). This is certainly more than my test file so maybe it was the source of the leak. When a new file is visited, the tracer calls back to a Python function to determine if the file is interesting enough to trace. One of the arguments to the callback is the stack frame. This seemed like a good candidate for a leak: mishandling the reference count on a frame could keep it around forever, and frames refer to lots of other things, so tons of memory could leak each of those 9300 times.
But a few experiments, ending with completely disabling the callback, proved that this code was not to blame either. I was running out of places to look.
The per-line code was about all that was left. It didn't make any sense for this to be the problem, since my test file runs many many lines just like Django does. But just to rule it out, I commented out the line of C code that adds the line number to the data dictionary. To my surprise, the test suite no longer leaked memory!
Here's the buggy line (simplified):
PyDict_SetItem(file_dict, PyInt_FromLong(frame->f_lineno), Py_None);
The problem is that PyInt_FromLong returns a new reference, and PyDict_SetItem doesn't steal the reference. This is a memory leak. (If you don't know what I'm talking about with this reference stuff, take a peek at A Whirlwind Excursion through Python C Extensions.)
The fix was straightforward; I had to explicitly release my reference on the integer object:
PyObject * this_line = PyInt_FromLong(frame->f_lineno);
PyDict_SetItem(file_dict, this_line, Py_None);
Py_DECREF(this_line);
Running this on the Django tests showed that the problem was fixed! Great!
But back to the remaining mystery: why did my test file, which executed that same buggy line billions of times, not leak? The docs for PyInt_FromLong provided the answer:

"The current implementation keeps an array of integer objects for all integers between -5 and 256, when you create an int in that range you actually just get back a reference to the existing object."
Aha! Because my test file was only 10 lines long, the line numbers were all less than 256, so the integer objects returned by PyInt_FromLong were all pre-allocated and held for the lifetime of the process. The buggy code was mishandling the reference count, but it didn't matter, since no new objects were allocated.
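That cache is easy to observe from the Python side. This is a CPython implementation detail, not a language guarantee, and the int("...") calls are there only so the compiler can't fold everything into shared constants:

```python
# CPython preallocates the integers -5 through 256 and hands out the same
# object every time a computation produces one of them.  Values outside
# that range get a freshly allocated object each time.
small_a = int("200") + int("55")    # 255: inside the cached range
small_b = int("54") + int("201")    # 255 again
big_a = int("200") + int("57")      # 257: outside the cached range
big_b = int("56") + int("201")      # 257 again

print(small_a is small_b)   # the one cached 255 object
print(big_a is big_b)       # two distinct 257 objects
```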
Finally I had the answer to my question: the difference between my file and the Django source is that Django has files longer than 256 lines.
And in fact, using the buggy tracer code and adding 300 blank lines to my test file proved the point: once my simple code had line numbers above 256, it leaked like a sieve!
If there's a lesson in this, it's that complex systems introduce leaky abstractions in unexpected places. This has unfortunate implications for testing: how do I know if I've got test cases that can ferret out potential problems like this? Does white-box testing extend to reading up on the internal details of Python integers? If I add test cases with 300 lines of source, how do I know there isn't some other effect that only kicks in at 5000?
Be careful out there...
I've noticed in the office, people often refer to meetings with just an adjective phrase, letting "meeting" be implied: "I have to go, I have a 2:00". Why don't we carry this to its logical extreme? Let's just say things like,
After conversion using ditaa, the above file becomes:
I haven't tried it, so I don't know where the parsing falls down, but the samples in the doc sure look better than I thought they would...
Coverage.py v3.2 beta 1 is available, and it's got a big new feature: branch coverage. It's been a long time coming, but I'm pretty pleased with the results. I'm very interested to hear whether this is useful, and what could be improved.
Coverage.py now supports branch coverage measurement. Where a line in your program could jump to more than one next line, coverage.py tracks which of those destinations are actually visited, and flags lines that haven't visited all of their possible destinations.
def my_partial_fn(x):       # line 1
    if x:                   # line 2
        y = 10              # line 3
    return y                # line 4
In this code, the if on line 2 could branch to either line 3 or line 4. Statement coverage would show all lines of the function as executed. But the if is always true, so line 2 never jumps to line 4. Even though line 4 is executed, coverage.py knows that it was never because of a branch from line 2.
Branch coverage would flag this code as not fully covered because of the missing jump from line 2 to line 4.
How to measure branch coverage
To measure branch coverage, run coverage.py with the --branch flag:
coverage run --branch myprog.py
When you report on the results with "coverage report" or "coverage html", the percentage of branch possibilities taken will be included in the percentage covered total for each file. The coverage percentage for a file is the actual executions divided by the execution opportunities. Each line in the file is an execution opportunity, as is each branch destination.
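As a worked example of that arithmetic (the numbers here are made up, not from a real report):

```python
# Hypothetical file: 10 executable lines, two of which are branch lines
# with two possible destinations each, giving 4 branch opportunities.
# Suppose 9 lines ran and 3 of the 4 branch destinations were taken.
executed_lines = 9
total_lines = 10
taken_branches = 3
possible_branches = 4

covered = executed_lines + taken_branches        # 12 actual executions
opportunities = total_lines + possible_branches  # 14 opportunities
percent = 100.0 * covered / opportunities        # about 85.7%
```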
Currently, only HTML reports give information about which lines had missing branches. Lines that were missing some branches are shown in yellow, with an annotation at the far right showing branch destination line numbers that were not exercised.
How it works
When measuring branches, coverage.py collects pairs of line numbers, a source and destination for each transition from one line to another. Static analysis of the compiled bytecode provides a list of possible transitions. Comparing the measured to the possible indicates missing branches.
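The dynamic half of that can be sketched in pure Python with sys.settrace. Coverage.py's real tracer is C code and far more careful; the names here are invented for illustration:

```python
import sys

arcs = set()        # (source_line, dest_line) pairs actually executed
last_line = [None]  # previous line number seen

def tracer(frame, event, arg):
    # Record each transition from one line to the next in traced code.
    if event == "call":
        last_line[0] = None  # don't connect lines across a call boundary
    elif event == "line":
        if last_line[0] is not None:
            arcs.add((last_line[0], frame.f_lineno))
        last_line[0] = frame.f_lineno
    return tracer

def demo(x):
    if x:
        y = 1
    else:
        y = 2
    return y

sys.settrace(tracer)
demo(True)
sys.settrace(None)

# arcs now holds the two transitions taken on the True path.  Comparing
# pairs like these against the statically-possible transitions is what
# reveals missing branches.
```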
The idea of tracking how lines follow each other came from C. Titus Brown. Thanks, Titus!
Some Python constructs are difficult to measure properly. For example, an infinite loop will be marked as partially executed:
while True:                 # line 1
    if some_condition():    # line 2
        break
    body_of_loop()          # line 4

keep_working()              # line 6
Because the loop never terminates naturally (jumping from line 1 to 6), coverage.py thinks the branch is partially executed.
Currently, if you exclude code from coverage testing, a branch into that code will still be considered executable, and may result in the branch being flagged.
A few other unexpected cases are described in branches.html, which also shows how partial branches are displayed in the HTML report.
Currently, the only way to initiate branch coverage is with the command-line interface. In particular, the nose coverage plugin has no way to use it.
One interesting side effect of tracking line transitions: we know where some exceptions happened because a transition happens that wasn't predicted by the static analysis. Currently, I'm not doing anything with this information. Any ideas?
I'm sure this will be controversial: an infographic summarizing the differences between the political Left and the political Right:
(keep clicking through to get to a full-size readable version).
Probably both sides will find things to complain about, though I suspect the right will be more upset. The terms used for the right seem negative to me, or is that just because I'm on the left?
Interesting that they made a "US" version, which simply has the colors reversed! I knew our settling on blue for left was a recent innovation, but I hadn't realized that the rest of the world long ago settled on the opposite.
OK, this is really astounding: first, it's a Lego kit box made out of Lego, not sure why it hasn't been done before. But then he opens the box, and inside is a pop-up temple:
Continuous integration is a great idea: you configure a server to periodically pull your source code, build it, run the tests, run lint, measure coverage, and so on. Then it graphs everything, stores the results for examination, and so on.
I'd been trying to figure out how to use the Hudson CI server with Python, and the few times I tried to get my mind around it, it just wasn't clicking. I happened to mention my mental block to Joe Heck, and a few days later, he produced Setting up a python CI server with Hudson. It's a great step-by-step how-to covering everything you need to get Hudson going for a Python project.
Running through his guide finally cleared the last misconception for me: continuous integration isn't a build tool or a test runner. You don't run Hudson on your development machine. Sounds silly, but something needed to clear it up, and this was it.
For Halloween, Ben dressed as Link from the Legend of Zelda video games. We made a shield for him in a hurry, with corrugated cardboard and colored electrical tape. It was pretty good for an hour's work. But hchiu completely outdid himself on a similar task: Life-Sized Link, 831 pieces printed on 197 sheets of paper, cut, folded, and glued into a life-sized statue. OMG!!
BTW, there's an entire subculture of Nintendo papercraft with some very dedicated participants...
This is just genius; I love all this:
I've seen clever decorations of Mac laptops incorporating the Apple logo before, but this Magritte-inspired sticker is one of the best. Their other stickers, for walls as well as laptops, are also good, with a nice dash of whimsy.
I had an idea the other day for a way to measure a car's tire pressure that wouldn't require any action on the part of the driver. Think of it as a remote passive tire gauge. The problem is, I don't know if it would work.
The idea is to put a plate in the ground at gas pumps. The plate would have sensors that measure two things: the weight of the car, and the area of the tires' footprint. Dividing the weight by the area gives pounds per square inch, the inflation of the tires. For more useful information, the plate can be divided in four to measure each tire independently.
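The arithmetic is the easy part. In the ideal case, ignoring the stiffness of the tire itself, the numbers (hypothetical here) work out like this:

```python
# Hypothetical sedan: 3200 lb total, weight split evenly over four tires,
# each with a measured contact patch of 25 square inches.
total_weight_lb = 3200.0
footprint_sq_in = 25.0

weight_per_tire = total_weight_lb / 4              # 800 lb on each tire
pressure_psi = weight_per_tire / footprint_sq_in   # 800 / 25 = 32 psi
```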
Will it work? I know the footprint of the tire increases with weight. Imagine putting a large load on top of your car: it will settle lower, flattening more of the tire against the ground, increasing the footprint at the same time that the weight increases because of the load.
But is the relationship direct enough to make accurate measurement possible? One co-worker argued that low-profile tires are much wider than high-profile tires, and so have larger footprints, even with the same inflation pressure. I think it's possible that the structural support of the tire itself (that is, the rubber) makes the measurement useless. A large portion of the tire's response to the weight of the car is not related to the air in the tire. Imagine a completely flat tire: it doesn't have a footprint large enough for the division to come out to 5psi, for example.
So, is this a useful idea? Does it get us any part of the way to alerting drivers that their tires are low? I think it would be useful to tell people that without them having to check the pressure themselves.
PS: if anyone out there builds this thing and becomes fabulously wealthy, I expect a cut!