|Ned Batchelder : Blog | Code | Text | Site|
I fixed five bugs in coverage.py 4.0.2 to produce coverage.py 4.0.3. XML reports now have correct <source> elements in complex cases, and a mysterious problem that seriously borked really unusual cases has been fixed.
I fixed three more bugs in coverage.py 4.0.1 to produce coverage.py 4.0.2. Non-ASCII characters in program source and filenames should work better, and if you are setting your own trace function, that works again... :)
My day job is working on Open edX. It's large, and our requirements files are getting unruly. In particular, our requirements file for installing our other GitHub repos has grown very long in the tooth.
First, we have a mix of -e installs and non-e installs. -e means, check out the git working tree, and then install it as the code. This makes it easy to use the code: you can skip the discipline of writing and properly maintaining a setup.py. Just changing the SHA in the github URL should bring in new code.
We also have inconsistent use of "#egg" syntax in the URLs, and we don't always include the version number, and when we do, we use one of three different syntaxes for it.
Worse, we'd developed a cargo-cult mentality about the mysteries of what pip might do. No one had confidence about the different behavior to expect from the different syntaxes. Sometimes updated code was being installed, and sometimes not.
I did an experiment where I made a simple package with just a version number in it (version_dummy), and I tried installing it in various ways. I found that I had to include a version number in the hash fragment at the end of the URL to get it to update properly. Then another engineer did a similar experiment and came to the opposite conclusion, that just changing the SHA would be enough.
As bad as cargo-culting is, this was even worse: two experiments designed to answer the same question, with different results! It was time to get serious.
An important property of science is reproducibility: another investigator should be able to run your experiment to see if they get the same results. On top of that, I knew I'd want to re-run my own experiment many times as I thought of new twists to try.
So I wrote a shell script that automated the installation and verification of versions. You can run it yourself: create a new virtualenv, then run the script.
I asked in the #pypa IRC channel for help with my mystery, and they had the clue I needed to get to the bottom of why we got two different answers. I had a git URL that looked like this:
He had a URL like this:
These look similar enough that they should behave the same, right? The difference is that mine has an underscore in the name, and his does not. My suffix ('#egg=version_dummy') is being parsed inside pip as if the package name was "version" and the version was "dummy"! This meant that updating the SHA wouldn't install new code, because pip thought it knew what version it would get ("dummy"), and that's the version it already had, so why install it?
Writing my experiment.sh script gave me a good place to try out different scenarios of updating my version_dummy from version 1.0 to 2.0.
Things I learned:
If anyone has more information, I'm really interested.
For Susan's birthday today, a cake about biking through fields of lavender:
This is a dream bike ride for Susan, which she has not yet made real. This cake will have to do for now. It's just a chocolate cake mix, baked into a rectangle pan and two mixing bowls. The bottoms of the mixing bowls became the hillocks. The road is Hershey bars, but she is biking off-road on the scree of crumbled cookies.
After more than 20 months, coverage.py 4.0 is available. This version brings a number of significant changes:
A number of things internal to coverage.py have changed in incompatible ways. Helper tools sometimes build on internal interfaces. If you find that coverage.py 4.0 seems broken in some way, please update to the latest versions of your helper libraries while diagnosing the problem.
If you have any problems, please let me know.
Next up for coverage.py: some form of the often requested feature, "show who tests what". If you are interested in that, get in touch, or comment on the issue.
I've just done a bunch of work on continuous integration for coverage.py. The biggest change is that now I've got Appveyor running Windows tests and building Windows kits.
Appveyor is a good service: the docs are helpful, and the support forum seems to be an obsession with the staff, especially Feodor Fitsner, who have always answered my questions within 12 hours regardless of when I ask them.
Oliver Grisel has a demonstration of building wheels on Appveyor which was very helpful in getting started.
Of course, part of the problem with supporting Windows is that it is unfamiliar to many of us. Appveyor provides the platform on which to run, but we still have to come up with the steps ourselves. And Python is a bit unfamiliar to Appveyor, so the steps include installing Python. It all gets a bit tangled.
The high point in my adventure came with this line:
Explanation: like most CI services, Appveyor is configured with a YAML file. This line is part of the install step before tests are run. It's a Windows command line. Our appveyor.yml file installs a number of versions of Python, because Appveyor doesn't have all the versions we need pre-installed. So each job sets two environment variables: PYTHON is the path to the Python installation directory (for example, "C:\Python35") and TOXENV is the tox environment to use ("py35").
The problem is that tox has a built-in mapping from environment ("py35") to Python directory, and that mapping is wrong if we've had to install custom versions of Python in unusual places. For one thing, we install both 32- and 64-bit versions, in different directories, and Tox doesn't know where to find them.
So this line writes a file called "python3.5.bat" so that when Tox tries to run "python3.5", it will find it. The bat file simply has the actual path to the Python installation in it. The trick with this line was getting all of the escaping right: it's a YAML file containing a Windows command line which runs Python code to write a Windows bat file. "\x25" being the same as "%" definitely saved my sanity.
And getting code like this right is especially tricky because to run it on the CI system, you have to commit it and push it, and wait for the builds. It's like building a ship in a bottle: you can imagine the intricacy you need to create, and you can see the results of your efforts, but you have only a very tenuous set of tools to manipulate the process.
(In fact, as I write this, the Python 2.6 jobs are failing for both coverage.py and python-appveyor-demo, not sure why. It seems like the get-pip.py installation step is failing, but get-pip.py doesn't talk about what it is doing, so I'm not sure what's wrong. Back to the bottle...)
One of the nice things Appveyor does that some other CI systems don't is to hold onto build artifacts so that you can download them directly from Appveyor. This makes building wheels and kits there really convenient. I wrote a script to download all the artifacts from the latest build, so now it's really easy for me to include Windows runs in my coverage measurement, and I can build my own kits instead of having to ask other people to do it for me.
Along the way, I started another tool to help diagnose problems on remote machines: PyDoctor. (I know, there already is a pydoctor, I should probably change the name. Ideas welcome.)
After all the CI work, I feel like I have a vast distributed pinball machine. Every time I commit to Bitbucket:
These services are incredibly useful, but keeping them configured and running smoothly is an art and an adventure in and of itself.
A co-worker mentioned to me the other day that our test runner would run methods with the word "test" in the name, even if they didn't start with "test_". This surprised him and me, so I went looking into exactly how the function names were matched.
What I found was surprising: a single line of code with between three and five errors, depending on how you wanted to count them.
This is the line of code:
Regexes are complicated, and it is easy to make mistakes. I'm not writing this to point fingers, or to label people as stupid. The point is that code is inherently complicated, and scrutiny and review are very very important for keeping things working.
The regex here is using an r"" string, as all regexes should. But notice there are two instances of double backslashes. The r"" string means that the regex will actually have two double backslashes. Each of them therefore means, "match a backslash." So we have a character class (in the square brackets, also called character ranges) with backslash in it twice. That's needless, one is enough.
But looking closer, what's that "b" doing in there? It will actually match a "b" character. Which means that "abtest" will match this pattern, but "bctest" will not. Surely that's a mistake.
Going back in the history of the repo, we see that the line used to be:
That is, the backslashes used to be single rather than double. The doubling happened during a documentation pass: the docs needed the backslashes doubled, and I guess a misguided attempt at symmetry also doubled the backslashes in the code.
But with this older line, we can see that the intent of the backslashes was to get "\b" and "\." into the character set. The "\." wasn't necessary, a dot isn't special in a character set, so just "." would have been fine.
What's the "\b" for? The Python re docs say,
So the intent here was to force the word "test" to be at the boundary of a word. In which case, why include dot or dash in the regex? They would already define the boundary of a word.
But reading further in the re docs:
So the "\b" didn't match word boundaries at all, it matched a backspace character. I'm guessing it never encountered a backspace character in the directory, class, and function names it was being used on.
OK, so that probably explains how the dot and dash got there: the \b wasn't doing its job, and rather than get to the bottom of it, a developer threw in the explicit characters needed.
Let's look at os.sep. That's the platform-specific pathname separator: slash on Unix-like systems, backslash on Windows. String formatting is being used to insert it into the character class, so that the pathname separator will also be OK before the word "test". (Remember: if the \b had worked, we wouldn't need this at all.)
Look more closely at the result of that string formatting though: on Windows, the %s will be replaced with a backslash. The character class will then end with ".\-]". In a character class, backslashes still serve their escaping function, so this is the same as ".-]". That is, the backslash won't be used as a literal backslash at all. On Windows, this os.sep interpolation was pointless. (Keep in mind: this problem was solved when the incorrect backslash doubling happened.)
What's our final tally? I don't know, depends how you count. The line still has bugs ("abtest" vs "bctest"), and a long and checkered past.
Think carefully about what you write. Review it well. Get others to help. If something doesn't seem to be working, figure out why. Be careful out there.
Python's context managers are the general mechnism underlying the "with" statement. They're a nice abstraction of doing a thing, and then later undoing it. The classic example is opening a file and then later closing it:
But context managers can be used for other do-then-later-undo types of behavior. Here's a context manager that changes the current directory, then later changes it back:
Context managers are objects that have __enter__ and __exit__ methods, but here we've used a very handy decorator from contextlib to make a context manager using the yield statement.
Now, suppose you have a context manager that neatly encapsulates your needed behavior, and further suppose that you are writing unit tests, and wish to get this behavior in your setUp and tearDown methods. How do you do it?
You can't use a with statement, because you need the "do" part of the context manager to happen in setUp, and then you need the "later undo" part of it to happen in tearDown. The syntax-indicated scope of the with statement won't work.
You can do it using the context manager protocol directly to perform the actions you need. And unittest has a mechanism better than tearDown: addCleanup takes a callable, and guarantees to call it when the test is done. So both the before-test logic and the after-test logic can be expressed in one place.
Here's how: write a helper function to use a context manager in a setUp function:
Now where you would have used a context manager like this:
you can do this in your setUp function:
Simple and clean.
Notice that @contextlib.contextmanager lets us write a generator, then use a decorator to turn it into a context manager. There's a lot of Python features at work here in a very small space, which is kind of cool. Then we use addCleanup to take a callable as a first-class object to get the clean up we need, which is even more cool.
One caveat about this technique: a context manager's __exit__ method can be called with information about an exception in progress. The mechanism shown here will never do that. I'm not sure it even should, considering how it's being used in a test suite. But just beware.
A seemingly simple change to fix a small bug lead me to some interesting software design choices. I'll try to explain.
In the new beta of coverage.py, I had a regression where the "run --append" option didn't work when there wasn't an existing data file. The problem was code in class CoverageScript in cmdline.py that looked like this:
If there was no .coverage data file, then this code would fail. The fix was really simple: just check if the file exists before trying to combine it:
(Of course, all of these code examples have been simplified from the actual code...)
The problem with this has to do with how the CoverageScript class is tested. It's responsible for dealing with the command-line syntax, and invoking methods on a coverage.Coverage object. To make the testing faster and more focused, test_cmdline.py uses mocking. It doesn't use an actual Coverage object, it uses a mock, and checks that the right methods are being invoked on it.
The test for this bit of code looked like this, using a mocking helper that works from a sketch of methods being invoked:
This test means that "run --append foo.py" will make a Coverage object with no arguments, then call cov.start(), then cov.run_python_file with two arguments, etc.
The problem is that the product code (cmdline.py) will actually call os.path.exists, and maybe call .combine, depending on what it finds. This mocking test can't easily take that into account. The design of cmdline.py was that it was a thin-ish wrapper over the methods on a Coverage object. This made the mocking strategy straightforward. Adding logic in cmdline.py makes the testing more complicated.
OK, second approach: change Coverage.combine() to take a missing_ok=True parameter. Now cmdline.py could tell combine() to not freak out if the file didn't exist, and we could remove the os.path.exists conditional from cmdline.py. The code would look like this:
and the test would now look like this:
Coverage.combine() is part of the public API to coverage.py. Was I really going to extend that supported API for this use case? It would mean documenting, testing, and promising to support that option "forever". There's no nice way to add an unsupported argument to a supported method.
Extending the supported API to simplify my testing seemed like the tail wagging the dog. I'm all for letting testing concerns inform a design. Often the tests are simply proxies for the users of your API, and what makes the testing easier will also make for a better, more modular design.
But this just felt like me being lazy. I didn't want combine() to have a weird option just to save the caller from having to check if the file exists. I imagined explaining this option to someone else, and I didn't want my future self to have to sheepishly admit, "yeah, it made my tests easier..."
What finally turned me back from this choice was the principle of saying "no." Sometimes the best way to keep a product simple and good is to say "no" to extraneous features. Setting aside all the testing concerns, this option on Coverage.combine() just felt extraneous.
Having said "no" to changing the public API, it's back to a conditional in cmdline.py. To make testing CoverageScript easier, I use dependency injection to give the object a function to check for files. CoverageScript already had parameters on the constructor for this purpose, for example to get the stand-in for the Coverage class itself. Now the constructor will look like:
and the test code can provide a mock for _path_exists and check its arguments:
Yes, this makes the testing more involved. But that's my business, and this doesn't change the public interface in ways I didn't like.
When I started writing this blog post, I was absolutely certain I had made the right choice. As I wrote it, I wavered a bit. Would missing_ok=True be so bad to add to the public interface? Maybe not. It's not such a stretch, and a user of the API might plausibly convince me that it's genuinely helpful to them. If that happens, I can reverse all this. That would be ok too. Decisions, decisions...
I think Coverage.py v4.0 is ready. But to be on the safe side, I'm releasing it as a beta because there have been a ton of changes since v3.7.1. Try it: coverage.py 4.0b1.
Changes since 4.0a6:
If you are interested, there is a complete list of changes: CHANGES.txt.
Also available is the latest version of the Django coverage plugin: django_coverage_plugin 0.6. This uses the new plugin support in Coverage.py 4.0 to implement coverage measurement of Django templates.
My exercise is swimming, and it's an important part of my day. I track my distance. Usually I swim a mile or so. To swim a mile in a 25-yard pool, you have to make 36 round trips.
I say this as, "36 laps." The sign at my pool says a mile is 36 laps.
I was listening to the How to Do Everything podcast, and they had a question about whether a lap is once across a pool, or there and back. I smugly thought to myself, "there and back, of course."
To answer the question, they asked Natalie Coughlin, an Olympic swimmer, who said,
What!? How does this make sense? We already have a word for one end to the other, "a length." Are we really going to use both words to mean the same thing, and then have no word for there and back?
In any other sport, a lap takes you from a starting point, out some distance, and then back to where you started. Why should swimming be different? I thought this was supposed to be an erudite sport?
Looking for a higher authority, I consulted the glossary at USA Swimming:
Thanks a lot... This definition both exposes the absurdity, by defining lap to mean precisely "a length," and then throws out there that some people use the word differently (in the useful way), so we really don't know what we're talking about.
Can we do something about this? Can't the universe make just a little more sense?