A long-awaited feature of coverage.py is now available in a rough form:
Who Tests What annotates coverage data with the
name of the test function that ran the code.
To try it out:
- Install coverage.py v5.0a3.
- Add this line literally to the [run] section of your .coveragerc file:
dynamic_context = test_function
- Run your tests.
- The .coverage file is now a SQLite database. There is no change to
reporting yet, so you will need to do your own querying of the SQLite
database to get information out. See below for a description of the schema.
The database can be accessed in any SQLite-compatible way you like. Note
that the schema is not (yet) part of the public API. That is, it is not
guaranteed to stay the same. This is one of the things yet to be
decided. For now though, the database has these tables:
- file: maps full file paths to file_ids: id, path
- context: maps contexts (test function names) to context_ids: id, context
- line: the line execution data: file_id, context_id, lineno
- arc: similar to line, but for branch coverage: file_id, context_id, fromno, tono
It’s not the most convenient, but the information is all there. If you used
branch coverage, then the important data is in the “arc” table, and “line”
is empty. If you didn’t use branch coverage, then “line” has data and
“arc” is empty. For example, using the sqlite3 command-line tool, here’s a
query to see which tests ran a particular line:
sqlite> select
   ...> distinct context.context from arc, file, context
   ...> where arc.file_id = file.id
   ...> and arc.context_id = context.id
   ...> and file.path like '%/xmlreport.py'
   ...> and arc.tono = 122;
BTW, there are also “static contexts” if you are interested in keeping
coverage data from different test runs separate: see
Contexts in the docs for details.
Some things to note and think about:
- The test function name recorded includes the test class if we can
figure it out. Sometimes this isn’t possible. Would it be better to
record the filename and line number?
- Is test_function too fine-grained for some people? Maybe chunking to
the test class or even the test file would be enough?
- Better would be to have test runner plugins that could tell us the
test identifier. Anyone want to help with that?
- What other kinds of dynamic contexts might be useful?
- What would be good ways to report on this data? How are you navigating
the data to get useful information from it?
- How is the performance?
- We could have a “coverage extract” command that would be like the
opposite of “coverage combine”: it could pull out a subset of the data
so a readable report could be made from it.
Please try this out, and let me know how it goes. Thanks.
I was the guest on the most recent episode of the Talk Python To Me podcast:
It was a really fun conversation. I liked at one point being able to say,
“I’ll answer the question, and then I will challenge the premise of the
question.” Michael does a great job running the show, including providing a
transcript of the episode!
We talked mostly about coverage.py, but there was a small digression into
what I do for work, which is the Open edX
team at edX. EdX has thousands of courses
from hundreds of institutions. But the software we write is open source,
and is used by a thousand other sites to do online education. It’s really
gratifying to see them doing the kinds of education that edX will never
The example I gave on the show is UKK Guru,
an Indonesian site running hundreds of small vocational courses that will
mean a lot to its learners.
Sorry to repeat myself! Mostly the show is about coverage.py, it’s good. :)
When I redesigned this site last
year, I chose PT Serif as the body typeface. It is narrow but readable,
has character, but is not quirky. It wasn’t until I had finished the
design and gotten it launched that I noticed something that bothered me.
The curly quotes are designed so that the bulbs are at the same height, and
the tails go up and down from there. It’s not that noticeable on a word,
but on single characters they just look cockeyed to me:
I lived with it for a while, but I couldn’t stop seeing it, so I decided to
fix it. All I had to do was to scooch the closing quotes up a little.
This site is generated with Django, so I wrote some middleware that would
find all of the closing curly quotes in the output, and wrap them with a span:
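REPLACEMENTS = [
    ('”', '<span class="cq">”</span>'),
    ('’', '<span class="cq">’</span>'),
]

# A method on the middleware class.
def process_response(self, request, response):
    # Wrap each closing curly quote in a span that CSS can nudge upward.
    for before, after in REPLACEMENTS:
        response.content = response.content.replace(before, after)
    return response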
Then I added a little CSS to move that span, and the quotes were fixed:
But then I had lunch with David Jonathan Ross,
an accomplished and acclaimed
type designer. When I mentioned this tweak to him, he said, “Why not just
fix the font?”
This was a stark illustration of different experts’ tool sets. I’m
comfortable screwing around with Django, HTML, and CSS. Even though I am a
huge fan of fonts and type technology, and know a lot about both, I have
never modified a font. They have always been immutable to me. But
David is a type designer. Typefaces are the clay in his hands, so his first
reaction was to just change the font.
I was intrigued. We discussed how I might go about that. No shape changes
were required. I just had to move some glyphs vertically.
First I retrieved the four TrueType font files for the four versions of PT Serif.
My HTML pages had this line to get the fonts from Google:
Retrieving that URL got me a small file of CSS @font declarations, with URLs
for the individual font files that I could download.
There were two basic ways I could change the fonts: with a GUI font
editor, or programmatically. There are a handful of GUI font editors
available, some free, some not. They are a bit overwhelming, because you
are suddenly confronted with a lot of fiddly Bézier curves, and it is very
easy to make things look horrible.
Just adjusting the vertical height of a glyph isn’t hard. But the app might
have other ideas. For example, I tried the Glyphs
editor, and when I opened PT Serif, it immediately offered to fix four
problems. I didn’t know there were problems! Do I want to change those
things? I just wanted to move some quotes...
The programmatic option was looking more appealing, and I wanted to
understand the possibilities there anyway. David had recommended
fonttools, a suite of
tools written in Python for manipulating fonts. This sounded like
I first tried using fonttools as a Python library that I could write code
with to change the font. But the examples and docs were not immediately
helpful in understanding the arcane internal details.
So I tried something that David had suggested to me: fonttools provides a
ttx command that converts fonts to and from an XML format. You can then
modify the XML, and convert it back into a font.
I converted PT Serif into a .ttx XML file, and opened it up. I found the
definition of the right curly double-quote glyph:
<TTGlyph name="quotedblright" xMin="57" yMin="440" xMax="401" yMax="699">
<component glyphName="quotesinglbase" x="182" y="585" flags="0x4"/>
<component glyphName="quotesinglbase" x="0" y="585" flags="0x4"/>
PUSHB[ ] /* 1 value pushed */
CALL[ ] /* CallFunction */
IF[ ] /* If */
PUSHB[ ] /* 5 values pushed */
63 6 79 6 2
DELTAP1[ ] /* DeltaExceptionP1 */
PUSHB[ ] /* 7 values pushed */
First, notice that there are instructions there that look a lot like a byte
code of some kind. I had heard that font files had stuff like this in them,
but had never dived into the details. A quick look at the instructions
told me that the things I needed were not there.
The “component” pieces are the useful parts here. Fonts have repetitive
elements. Rather than repeat those curves over and over, the font file
can assemble a glyph from other parts. Here the quotedblright glyph is
made with two copies of the quotesinglbase character, each with an
x and y offset.
Could it be that just changing those offsets, and some other y-referencing
values, will be enough? Based on my CSS tweaks, I knew that I wanted to
raise the glyph by .1em. An “em” is a unit of measurement equal to the font
size, so in a 10-point font, an em is 10 points. I knew that font designs
usually are done on a grid with 1000 units to an em, so all I had to do
was add 100 to the y values.
I changed those first three lines to:
<TTGlyph name="quotedblright" xMin="57" yMin="540" xMax="401" yMax="799">
<component glyphName="quotesinglbase" x="182" y="685" flags="0x4"/>
<component glyphName="quotesinglbase" x="0" y="685" flags="0x4"/>
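To make the arithmetic concrete: 0.1 em on a 1000-unit grid is 100 units, added to each y value. Here is a small sketch of the same edit done with Python's xml.etree.ElementTree rather than by hand. The raise_glyph helper is hypothetical, the 1000-units-per-em grid is an assumption you would want to check against the font, and the snippet is just the three lines shown above with a closing tag added (the instructions are left out). It only handles a composite glyph like this one, not glyphs that carry their own outline points.

```python
import xml.etree.ElementTree as ET

UNITS_PER_EM = 1000  # assumed design grid; a real font may use a different value

# The glyph definition from the post, with the instructions omitted and a
# closing tag added so it parses on its own.
TTX_SNIPPET = """
<TTGlyph name="quotedblright" xMin="57" yMin="440" xMax="401" yMax="699">
  <component glyphName="quotesinglbase" x="182" y="585" flags="0x4"/>
  <component glyphName="quotesinglbase" x="0" y="585" flags="0x4"/>
</TTGlyph>
"""

def raise_glyph(glyph, ems):
    """Shift a composite glyph upward by `ems` (a fraction of the font size)."""
    shift = round(ems * UNITS_PER_EM)
    # The bounding box moves with the glyph...
    for attr in ("yMin", "yMax"):
        glyph.set(attr, str(int(glyph.get(attr)) + shift))
    # ...and so does the y offset of each component.
    for component in glyph.findall("component"):
        component.set("y", str(int(component.get("y")) + shift))
    return glyph

glyph = raise_glyph(ET.fromstring(TTX_SNIPPET), 0.1)
print(ET.tostring(glyph, encoding="unicode"))
```

The x values are left alone, since the quotes only need to move vertically.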
After converting the .ttx to a .woff2 file, changing my CSS to refer to the
new fonts, and uploading everything, it all works! Now my “quotes” are more
I’ve only just barely dipped my toe into these technologies. I’m afraid it’s
a Pandora’s box of tempting traps. For example, the upper-case Q in PT Serif
has a detached tail that I would rather were attached. But changing that
is a whole other level of effort and skill, so probably best left as-is.
The next alpha of Coverage.py 5.0
is ready: 5.0a2.
The big change is that instead of using a JSON-like file for storing the
collected data, we now use a SQLite database. This is in preparation for
new features down the road.
In theory, everything works as it did before. I need you to tell me whether
that’s true or not. Please test this alpha release. Let me know what you find.
If you try it, and it works, let me know!
Email is good.
If you see a problem, do this:
- First create a reproducible scenario, something that I can recreate.
- Try running that scenario with the environment variable
COVERAGE_STORAGE=json defined, which will use the old JSON storage
format. It will be very helpful to know if the results change.
- Write up the issue on GitHub.
Please provide as many details as you can.
The biggest change in behavior is that the data file is now created earlier
than before. If you are running tests simultaneously, this might mean that
you need parallel=true where you didn’t before. Keep an eye out for that.
Some other notes about these changes:
- For now, the old JSON storage code is still in place. It can be enabled
with a COVERAGE_STORAGE=json environment variable.
- But I would rather not keep that code around forever. One of the things
I’m trying to find out with this alpha is whether there’s any reason I
will need to keep it around.
- The database schema is still in flux, and will need to change to support
future features. I’m not sure whether to make the schema part of the
public interface to coverage.py or not. I want people to be able to
experiment with the collected data, but I also want to be able to change
it in the future if I need to.
Please test the code, and let me know what you find.
My weekends often revolve around my son Nat. He’s 28, and autistic. We have
a routine. Usually we walk somewhere around Boston, and we always get a
sweet snack of some kind, usually ice cream at one of the dozen or so
J.P. Licks stores around the city.
Last weekend was no different, but got complicated in a few ways. The
weekend before, I had suggested that we drive a short distance, and then
start walking from there. Nat loves his routines, and is deeply suspicious
of changes (more on this later!), and so he didn’t want to do that. We’ve
long known that advance notice can help with changes like this, so we said
we would do it next weekend. We even wrote it on his weekly calendar,
My idea was that we could drive to Back Bay, walk down the Esplanade along
the river, to Charles St, where there is a J.P. Licks for ice cream.
That would be about two miles of walking. I had a vague plan that
we could take the T back to the car at that point, but I had not made this
The alert reader at this point will raise their eyebrows about “vague
plan,” and you would not be wrong to be concerned...
The day of the different walk rolled around, and the weather called for some
rain, but my weather app said it was a ways off, so we set out. Nat was
fine with the plan to drive to a walk, because it had been on his calendar.
I took a small umbrella to be prepared.
We drove to the Esplanade, and started walking, and a light rain began. No
big deal. I gave Nat the umbrella to use. Nat usually walks ahead of me. I
realized his shirt said “Greater Boston” on the back, and I thought of
getting a picture of him in the rain and labelling it “Wetter Boston.” I
took a few different pictures to entertain myself as we went.
Then the rain got heavier. We were about a mile in at this point. Ice cream
was still ahead of us. We huddled under a shelter and waited. But the
rain wasn’t stopping. The weather app now indicated that the rain was here
Should we push on, or turn back? I ask Nat about it, and of course, he wants
to stick with the plan which by the way includes getting ice cream, so why
even consider anything else?
Now we both need the umbrella, so I put my arm around his shoulder, and
show him how to put his arm around my waist, and we head off. We try to
avoid the deeper puddles, but some are inevitable. Nat is a fast walker
with long legs, but in this tandem configuration, he’s taking smaller steps
than I am. With my arm around him, I can feel how taut his body is. Is it
always like that when he walks? Or because of the rain? Or my arm around
We take breaks under whatever shelters we can, and I finally get this
“Wetter Boston” shot to commemorate the deluge:
We dodge the construction of the new pedestrian walkway over Storrow Drive,
take the old pedestrian walkway over Storrow Drive, and finally make it to
J.P. Licks. We get ice cream, we eat ice cream, mission accomplished?
It’s at this point that things start to go wrong...
Nat has been participating in this plan, but I think the difference in the
plan, and the rain, have been slowly bothering him. And he knows that I am
stressed out about the rain, and the impossible job of staying dry. As
soon as he is done with his ice cream, he begins laughing really loudly in
the J.P. Licks. Loud enough that everyone in the store looks at us.
This laughing is not a new thing. It is not him being really amused by some
internal secret joke. It is an explicit attempt to aggravate me, to push a
stressed situation even further. If it happens in private, sometimes we
can just ignore it, or even join in, as a defense. But a public setting is
a different matter. That laughing is very effective at pushing a stressed
dad to a possible breaking point.
Some back story: on a previous ice cream trip a year ago, Nat had started
laughing like this while in the middle of eating. I saw it for what it
was, and knew that I had to react with a consequence that would get Nat’s
attention. I picked up his half-eaten ice cream, threw it away, and walked
out of the store. He followed me out, and was very upset. He made a lot
of noise on the street, which was horrible and embarrassing, but at least I
had shown him that there are consequences. I had won that battle.
That past incident is why this time he waited until after his ice cream was
done. He’s autistic, he’s not stupid. I can’t throw away ice cream he’s
already eaten. So now I have no leverage. The best thing I can think to do
is to walk out of the store, and keep the umbrella, and maybe the rain will
It doesn’t. Now we are walking down Charles St (a busy tourist destination)
in the rain, me with an umbrella, him trailing behind laughing as loud as
he can. People look at us, I am used to that. He’s being loud enough that
a workman comes out of a store and asks if everything is all right. Nat
ignores him, I bark “Yes!” over my shoulder, and keep walking.
At this point I have to admit that I lost this laughing battle. We stop,
and I get Nat under the umbrella. I point out that it is still raining,
and a long way back to the car. I suggest that we can take Lyft back to
the car. “No!” says Nat. We could take the T. “No!” Mom could come pick us
up? “No!” I try talking about this for ten minutes while we huddle in a
doorway. Nothing will budge him from the original plan.
It could have been that if I called a Lyft, he would have gotten in and been
fine. Or he would have gotten in and been upset, but we would have gotten
there. But he really seemed prepared to dig in and throw a fit about it.
And to be fair, it was a surprise change in the plan. He wanted to walk
back, and we were already wet. All we had to do was walk two more miles in
the rain, huddled under a folding umbrella.
So we walked two more miles in the rain, huddled under a folding umbrella.
I was tired, and wet. I was frustrated and dejected about the laughing
incident. We were late for lunch, so in spite of the ice cream, I was
The afternoon weather was delightful and sunny, observed while resting
indoors. Fun fact: two weeks earlier we had had another surprise
torrential downpour that resulted in me getting completely soaked, so the
novelty was wearing off. Another fun fact: Nat’s middle name is Isaac,
which means, “He will laugh.” Choose names wisely.
Sunday we were going to get haircuts. Our hair cutter is in Brookline,
right across the street from a J.P. Licks. As you have probably guessed,
our routine is to get ice cream after the haircut. Before we went, I said
to Nat, “We can’t get ice cream after the haircut, because you laughed too
much in J.P. Licks yesterday.” My wife Susan was very worried that this
would cause a scene later, but Nat accepted the news somberly.
Nat did great at the hair cutter, no problem at all. When we left the shop,
Nat walked over to the crosswalk. I said, “Where are we going?” He said
J.P. Licks. I said, “No, we can’t go there, do you know why?” He said,
“Because you laughed.” (Nat still confuses me/you pronouns.) “Right, so we
can’t go there.” Again, Nat dealt with this calmly and seriously. I hoped
that the consequence connection would work to prevent future incidents.
At this point, hard-hearted behaviorist that I am, even I felt bad for Nat.
Despite Saturday’s struggles, Sunday had been perfect. But the laughing is
bad, and needs to be countered. I love Nat, and want to have fun with him
and make him happy, but I also need to help him keep his behavior within
I compromised: “We can get a candy bar at CVS.” And so we did, Mounds for
me, Milky Way for him, and a sunny drive home.
• • •
Update: Today is sunny and gorgeous, great for a walk. After
posting this, I turned to Nat, and said, “How about we drive, then walk,
then get ice cream, then take the T back to the car, then drive home?” He
said, “OK,” and we did. Storrow Drive, Charles Street, the exact route I
had tried last weekend.
He checked in with me a few times along the way to make sure he understood
the plan (or was it to make sure that I understood the plan?) But
there was no friction ever. Simple as that. You never quite know what
you’re going to get.
I’m starting to make some progress on Who Tests What.
The first task is to change how coverage.py records the data it collects
during execution. Currently, all of the data is held in memory, and then
written to a JSON file at the end of the process.
But Who Tests What is going to increase the amount of data. If your test
suite has N tests, you will have roughly N times as much data to store.
Keeping it all in memory will become unwieldy. Also, since the data is
more complicated, you’ll want a richer way to access the data.
To solve both these problems, I’m switching over to using SQLite to store
the data. This will give us a way to write the data as it is collected,
rather than buffering it all to write at the end. BTW, there’s a third
side-benefit to this: we would be able to measure processes without having
to control their ending.
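As a rough illustration of that side-benefit, here is a minimal sketch using Python's sqlite3 module. It is not coverage.py's code: the one-table schema and the helper names (open_data_file, record_lines, count_lines) are assumptions made up for the example. Each batch is committed as it is recorded, so a second connection can read the data while the writer is still open.

```python
import os
import sqlite3
import tempfile

def open_data_file(path):
    """Open (or create) a data file with a made-up one-table schema."""
    conn = sqlite3.connect(path)
    conn.execute(
        "create table if not exists line "
        "(file_id integer, context_id integer, lineno integer)"
    )
    return conn

def record_lines(conn, file_id, context_id, linenos):
    """Write one batch of executed lines and commit it right away."""
    with conn:  # commits if the block succeeds, rolls back on error
        conn.executemany(
            "insert into line values (?, ?, ?)",
            [(file_id, context_id, n) for n in linenos],
        )

def count_lines(path):
    """Read the data through a separate connection, as another tool might."""
    reader = sqlite3.connect(path)
    try:
        return reader.execute("select count(*) from line").fetchone()[0]
    finally:
        reader.close()

with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "demo.coverage")
    writer = open_data_file(path)
    record_lines(writer, 1, 1, [10, 11, 12])
    after_first = count_lines(path)   # visible before the writer is done
    record_lines(writer, 1, 1, [20, 21])
    after_second = count_lines(path)
    writer.close()

print(after_first, after_second)
```

With the JSON approach, nothing would be on disk until the final write at the end of the process.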
When running with --parallel, coverage adds the process id and a random
number to the name of the data file, so that many processes can be measured
independently. With JSON storage, we didn’t need to decide on this
filename until the end of the process. With SQLite, we need it at the
beginning. This has required a surprising amount of refactoring. (You can
follow the carnage on the data-sqlite branch.)
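For a sense of what has to be decided up front, here is a tiny sketch of building such a per-process file name. The parallel_data_filename helper and the exact name format are assumptions for illustration, not coverage.py's actual code.

```python
import os
import random

def parallel_data_filename(base=".coverage"):
    """Hypothetical helper: base name plus process id plus a random number."""
    suffix = "%d.%06d" % (os.getpid(), random.randint(0, 999999))
    return "%s.%s" % (base, suffix)

# With SQLite, this name must be chosen before the first write,
# not at the end of the process.
print(parallel_data_filename())
```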
There’s one problem I don’t know how to solve: a process can start coverage
measurement, then fork, and continue measurement in both of the child
processes, as described in issue 56.
With JSON storage, the in-memory data is naturally forked when the
processes fork, and then each copy proceeds on its way. When each process
ends, it writes its data to a file that includes the (new) process id, and
all the data is recorded.
How can I support that use case with SQLite? The file name will be chosen
before the fork, and data will be dribbled into the file as it happens.
After the fork, both child processes will be trying to write to the same
database file, which will not work (SQLite is not good at concurrent writers).
1. Even with SQLite, buffer all the data in memory. This imposes a memory
penalty on everyone just for the rare case of measuring forking processes,
and loses the extra benefit of measuring non-ending processes.
2. Make buffer-it-all be an option. This adds to the complexity of the code,
and will complicate testing. I don’t want to run every test twice, with
buffering and not. Does pytest offer tools for conveniently doing this
only for a subset of tests?
3. Keep JSON storage as an option. This doesn’t have an advantage over #2,
and has all the complications.
4. Somehow detect that two processes are now writing to the same SQLite file,
and separate them then?
5. Use a new process just to own the SQLite database, with coverage talking to
it over IPC. That sounds complicated.
6. Monkeypatch os.fork so we can deal with the split? Yuck.
7. Some other thing I haven’t thought of?
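To make the "detect it and separate them" idea a little more concrete, here is one way it might be sketched: remember the process id that opened the database, and compare it before every write. This is purely an assumption-laden illustration, not a design coverage.py has adopted. The ForkAwareWriter class, its one-table schema, and the file naming are all hypothetical, and the demo fakes a fork by swapping the pid function rather than calling os.fork. Data written before the fork stays in the parent's file, so the child only needs a fresh file for what it records afterwards.

```python
import os
import sqlite3
import tempfile

class ForkAwareWriter:
    """Hypothetical sketch: switch to a new database file after a fork."""

    def __init__(self, base, getpid=os.getpid):
        self._base = base
        self._getpid = getpid  # injectable so the demo can fake a fork
        self._pid = None
        self._conn = None

    def _connection(self):
        pid = self._getpid()
        if pid != self._pid:
            # First use, or we are now in a different process than the one
            # that opened the database: open a file named for this process
            # instead of reusing the inherited connection.
            self._pid = pid
            self._conn = sqlite3.connect("%s.%d" % (self._base, pid))
            self._conn.execute(
                "create table if not exists line "
                "(file_id integer, context_id integer, lineno integer)"
            )
        return self._conn

    def record(self, file_id, context_id, lineno):
        conn = self._connection()
        with conn:  # commit each write as it happens
            conn.execute(
                "insert into line values (?, ?, ?)",
                (file_id, context_id, lineno),
            )

    def close(self):
        if self._conn is not None:
            self._conn.close()
            self._conn = None
            self._pid = None

def rows_in(path):
    """Return the recorded line numbers in a data file."""
    conn = sqlite3.connect(path)
    try:
        return [r[0] for r in conn.execute("select lineno from line order by lineno")]
    finally:
        conn.close()

with tempfile.TemporaryDirectory() as tmp:
    base = os.path.join(tmp, "demo.coverage")
    fake_pid = [100]
    writer = ForkAwareWriter(base, getpid=lambda: fake_pid[0])
    writer.record(1, 1, 10)   # the "parent" writes
    fake_pid[0] = 200         # pretend a fork happened
    writer.record(1, 1, 11)   # the "child" lands in its own file
    writer.close()
    parent_rows = rows_in(base + ".100")
    child_rows = rows_in(base + ".200")

print(parent_rows, child_rows)
```

The cost is a pid check on every write, and the result is one file per process that would still need to be combined afterwards.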
Expect to see an alpha of coverage.py in the next few weeks with SQLite data
storage, and please test it. I’m sure there are other use cases that might
experience some turbulence...