Who tests what is here: Coverage.py 5.0a3

Sunday 7 October 2018

A long-awaited feature of coverage.py is now available in a rough form: Who Tests What annotates coverage data with the name of the test function that ran the code.

To try it out:

  • Install coverage.py v5.0a3.
  • Add this line literally to the [run] section of your .coveragerc file:
        [run]
        dynamic_context = test_function
  • Run your tests.
  • The .coverage file is now a SQLite database. There is no change to reporting yet, so you will need to do your own querying of the SQLite database to get information out. See below for a description of the database schema.

The database can be accessed in any SQLite-compatible way you like. Note that the schema is not (yet) part of the public API. That is, it is not guaranteed to stay the same. This is one of the things yet to be decided. For now though, the database has these tables:

  • file: maps full file paths to file_ids: id, path
  • context: maps contexts (test function names) to context_ids: id, context
  • line: the line execution data: file_id, context_id, lineno
  • arc: similar to line, but for branch coverage: file_id, context_id, fromno, tono

It’s not the most convenient, but the information is all there. If you used branch coverage, then the important data is in the “arc” table, and “line” is empty. If you didn’t use branch coverage, then “line” has data and “arc” is empty. For example, using the sqlite3 command-line tool, here’s a query to see which tests ran a particular line:

sqlite> select
   ...> distinct context.context from arc, file, context
   ...> where arc.file_id = file.id
   ...> and arc.context_id = context.id
   ...> and file.path like '%/xmlreport.py'
   ...> and arc.tono = 122;
context
------------------------------------------------------------
XmlPackageStructureTest.test_package_names
OmitIncludeTestsMixin.test_omit
OmitIncludeTestsMixin.test_include
OmitIncludeTestsMixin.test_omit_2
XmlReportTest.test_filename_format_showing_everything
XmlReportTest.test_no_source
OmitIncludeTestsMixin.test_include_as_string
OmitIncludeTestsMixin.test_omit_and_include
XmlReportTest.test_empty_file_is_100_not_0
OmitIncludeTestsMixin.test_omit_as_string
XmlReportTest.test_nonascii_directory
OmitIncludeTestsMixin.test_nothing_specified
XmlReportTest.test_curdir_source
XmlReportTest.test_deep_source
XmlPackageStructureTest.test_package_depth
XmlPackageStructureTest.test_source_prefix
XmlGoldTest.test_a_xml_2
XmlGoldTest.test_a_xml_1
XmlReportTest.test_filename_format_including_module
XmlReportTest.test_reporting_on_nothing
XmlReportTest.test_filename_format_including_filename
ReportingReturnValueTest.test_xml
OmitIncludeTestsMixin.test_include_2

BTW, there are also “static contexts” if you are interested in keeping coverage data from different test runs separate: see Measurement Contexts in the docs for details.

Some things to note and think about:

  • The test function name recorded includes the test class if we can figure it out. Sometimes this isn’t possible. Would it be better to record the filename and line number?
  • Is test_function too fine-grained for some people? Maybe chunking to the test class or even the test file would be enough?
  • Better would be to have test runner plugins that could tell us the test identifier. Anyone want to help with that?
  • What other kinds of dynamic contexts might be useful?
  • What would be good ways to report on this data? How are you navigating the data to get useful information from it?
  • How is the performance?
  • We could have a “coverage extract” command that would be like the opposite of “coverage combine”: it could pull out a subset of the data so a readable report could be made from it.

Please try this out, and let me know how it goes. Thanks.

Me on Talk Python To Me

Tuesday 25 September 2018

I was the guest on the most recent episode of the Talk Python To Me podcast: #178: Coverage.py. It was a really fun conversation. I liked at one point being able to say, “I’ll answer the question, and then I will challenge the premise of the question.” Michael does a great job running the show, including providing a full transcript of the episode!

We talked mostly about coverage.py, but there was a small digression into what I do for work, which is the Open edX team at edX. EdX has thousands of courses from hundreds of institutions. But the software we write is open source, and is used by a thousand other sites to do online education. It’s really gratifying to see them doing the kinds of education that edX will never do.

The example I gave on the show is UKK Guru, an Indonesian site running hundreds of small vocational courses that will mean a lot to its learners.

Sorry to repeat myself! Mostly the show is about coverage.py, it’s good. :)

Fixing PT Serif

Saturday 15 September 2018

When I redesigned this site last year, I chose PT Serif as the body typeface. It is narrow but readable, has character, but is not quirky. It wasn’t until I had finished the design and gotten it launched that I noticed something that bothered me. The curly quotes are designed so that the bulbs are at the same height, and the tails go up and down from there. It’s not that noticeable on a word, but on single characters they just look cockeyed to me:

Close-up of PT Serif curly quotes on a word and a character

I lived with it for a while, but I couldn’t stop seeing it, so I decided to fix it. All I had to do was to scooch the closing quotes up a little. This site is generated with Django, so I wrote some middleware that would find all of the closing curly quotes in the output, and wrap them with a span:

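from django.utils.deprecation import MiddlewareMixin

# Closing curly quotes as they appear in the rendered HTML, paired with
# a span-wrapped version that CSS can nudge upward.
REPLACEMENTS = [
    (b'&#8221;', b'<span class="cq">&#8221;</span>'),   # right double quote
    (b'&#8217;', b'<span class="cq">&#8217;</span>'),   # right single quote
]

class TweakOutputMiddleware(MiddlewareMixin):
    def process_response(self, request, response):
        # response.content is bytes, so the patterns are bytes too.
        # Streaming responses have no .content to rewrite.
        if not response.streaming:
            for before, after in REPLACEMENTS:
                response.content = response.content.replace(before, after)
        return response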

Then I added a little CSS to move that span, and the quotes were fixed:

.cq {
    position: relative;
    top: -.1em;
}

But then I had lunch with David Jonathan Ross, an accomplished and acclaimed type designer. When I mentioned this tweak to him, he said, “Why not just fix the font?”

This was a stark illustration of different experts’ tool sets. I’m comfortable screwing around with Django, HTML, and CSS. Even though I am a huge fan of fonts and type technology, and know a lot about both, I have never modified a font. They have always been immutable to me. But David is a type designer. Typefaces are the clay in his hands, so his first reaction was to just change the font.

I was intrigued. We discussed how I might go about that. No shape changes were required. I just had to move some glyphs vertically.

First I retrieved the four TrueType font files for the four versions of PT Serif. My HTML pages had this line to get the fonts from Google:

https://fonts.googleapis.com/css?family=PT+Serif:400,400i,700,700i|Source+Sans+Pro:400,400i,700,700i

Retrieving that URL got me a small file of CSS @font-face declarations, with URLs for the individual font files that I could download.
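That step is easy to script. Here is a minimal sketch using only the standard library; font_urls and fetch_css are hypothetical helper names, and the parsing assumes the stylesheet is a series of plain @font-face rules with url(...) sources, as it was here.

```python
import re
import urllib.request

CSS_URL = ("https://fonts.googleapis.com/css?"
           "family=PT+Serif:400,400i,700,700i|Source+Sans+Pro:400,400i,700,700i")

def font_urls(css_text):
    """Return the url(...) targets from the @font-face rules, in order."""
    urls = []
    for block in re.findall(r"@font-face\s*\{(.*?)\}", css_text, re.DOTALL):
        # Accept url(x), url('x') and url("x").
        urls.extend(re.findall(r"url\(\s*['\"]?([^'\")]+)['\"]?\s*\)", block))
    return urls

def fetch_css(url=CSS_URL):
    """Download the stylesheet text (needs network access)."""
    with urllib.request.urlopen(url) as response:
        return response.read().decode("utf-8")

# Usage:
# for u in font_urls(fetch_css()):
#     print(u)
```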

There were two basic ways I could try to change the fonts: with a GUI font editor, or programmatically. There are a handful of GUI font editors available, some free, some not. They are a bit overwhelming, because you are suddenly confronted with a lot of fiddly Bézier curves, and it is very easy to make things look horrible.

Just adjusting the vertical height of a glyph isn’t hard. But the app might have other ideas. For example, I tried the Glyphs editor, and when I opened PT Serif, it immediately offered to fix four problems. I didn’t know there were problems! Do I want to change those things? I just wanted to move some quotes...

The programmatic option was looking more appealing, and I wanted to understand the possibilities there anyway. David had recommended fonttools, a suite of tools written in Python for manipulating fonts. This sounded like home!

I first tried using fonttools as a Python library that I could write code with to change the font. But the examples and docs were not immediately helpful in understanding the arcane internal details.

So I tried something that David had suggested to me: fonttools provides a ttx command that converts fonts to and from an XML format. You can then modify the XML, and convert it back into a font.

I converted PT Serif into a .ttx XML file, and opened it up. I found the definition of the right curly double-quote glyph:

<TTGlyph name="quotedblright" xMin="57" yMin="440" xMax="401" yMax="699">
  <component glyphName="quotesinglbase" x="182" y="585" flags="0x4"/>
  <component glyphName="quotesinglbase" x="0" y="585" flags="0x4"/>
  <instructions>
    <assembly>
      PUSHB[ ] /* 1 value pushed */
      10
      CALL[ ] /* CallFunction */
      IF[ ] /* If */
        PUSHB[ ] /* 5 values pushed */
        63 6 79 6 2
        DELTAP1[ ] /* DeltaExceptionP1 */
        PUSHB[ ] /* 7 values pushed */

First, notice that there are instructions there that look a lot like a byte code of some kind. I had heard that font files had stuff like this in them, but had never dived into the details. A quick look at the instructions told me that the things I needed were not there.

The “component” pieces are the useful parts here. Fonts have repetitive elements. Rather than repeat those curves over and over, the font file can assemble a glyph from other parts. Here the quotedblright glyph is made with two copies of the quotesinglbase character, each with an x and y offset.

Could it be that just changing those offsets, and some other y-referencing values, will be enough? Based on my CSS tweaks, I knew that I wanted to raise the glyph by .1em. An “em” is a unit of measurement equal to the font size, so in a 10-point font, an em is 10 points. I knew that font designs usually are done on a grid with 1000 units to an em, so all I had to do was add 100 to the y values.

I changed those first three lines to:

<TTGlyph name="quotedblright" xMin="57" yMin="540" xMax="401" yMax="799">
  <component glyphName="quotesinglbase" x="182" y="685" flags="0x4"/>
  <component glyphName="quotesinglbase" x="0" y="685" flags="0x4"/>
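Editing the numbers by hand works for one glyph in one file; across four font files the same change can be scripted against the .ttx XML. Below is a minimal sketch using the standard library's ElementTree, assuming the ttx layout shown above; the helper names are hypothetical. Rather than assuming 1000 units per em, it reads the grid size from unitsPerEm in the font's head table (TrueType fonts often use 2048). It moves composite glyphs through their component offsets and simple glyphs through their contour points, along with the yMin/yMax bounding box.

```python
import xml.etree.ElementTree as ET

def units_per_em(ttx_root):
    """Read the design grid size from the <head> table of a parsed .ttx file."""
    return int(ttx_root.find("head/unitsPerEm").get("value"))

def em_to_units(ems, upem):
    """Convert a distance in ems (0.1 for .1em) to whole font units."""
    return round(ems * upem)

def raise_glyphs(ttx_root, glyph_names, dy):
    """Shift the named <TTGlyph> elements up by dy font units, in place.
    Returns the names of the glyphs that were changed."""
    changed = []
    for glyph in ttx_root.iter("TTGlyph"):
        if glyph.get("name") not in glyph_names:
            continue
        # The bounding box moves with the outline.
        for attr in ("yMin", "yMax"):
            if glyph.get(attr) is not None:
                glyph.set(attr, str(int(glyph.get(attr)) + dy))
        # Composite glyphs: move each component's offset. Components
        # placed by point matching have no x/y, so leave those alone.
        for component in glyph.iter("component"):
            if component.get("y") is not None:
                component.set("y", str(int(component.get("y")) + dy))
        # Simple glyphs: move every point of every contour.
        for point in glyph.iter("pt"):
            point.set("y", str(int(point.get("y")) + dy))
        changed.append(glyph.get("name"))
    return changed

# Usage, with hypothetical file names:
# tree = ET.parse("PTSerif-Regular.ttx")
# root = tree.getroot()
# dy = em_to_units(0.1, units_per_em(root))   # 100 when unitsPerEm is 1000
# raise_glyphs(root, {"quotedblright"}, dy)
# tree.write("PTSerif-Regular-raised.ttx", encoding="utf-8", xml_declaration=True)
```

Moving a simple glyph's points also moves every composite glyph that uses it as a component, so shifting a composite like quotedblright through its offsets, as done by hand above, is the more contained change.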

After converting the .ttx to a .woff2 file, changing my CSS to refer to the new fonts, and uploading everything, it all works! Now my “quotes” are more balanced: “q”.

I’ve only just barely dipped my toe into these technologies. I’m afraid it’s a Pandora’s box of tempting traps. For example, the upper-case Q in PT Serif has a detached tail that I would rather were attached. But changing that is a whole other level of effort and skill, so probably best left as-is.

Coverage.py 5.0a2: SQLite storage

Monday 3 September 2018

The next alpha of Coverage.py 5.0 is ready: 5.0a2. The big change is that instead of using a JSON-like file for storing the collected data, we now use a SQLite database. This is in preparation for new features down the road.

In theory, everything works as it did before. I need you to tell me whether that’s true or not. Please test this alpha release. Let me know what you find.

If you try it, and it works, let me know! Email is good.

If you see a problem, do this:

  • First create a reproducible scenario, something that I can recreate.
  • Try running that scenario with the environment variable COVERAGE_STORAGE=json defined, which will use the old JSON storage format. It will be very helpful to know if the results change.
  • Write up the issue on GitHub. Please provide as many details as you can.
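The second step above can be automated with a small driver that runs the same scenario under both storage formats and returns both reports for comparison. This is a minimal sketch using subprocess; the command lines in the usage comment are placeholders for your own scenario, and the helper names are hypothetical. The only coverage.py behavior it relies on is the COVERAGE_STORAGE=json switch described above.

```python
import os
import subprocess

def run_scenario(run_cmd, report_cmd, storage=None):
    """Run run_cmd, then report_cmd; return the report's stdout.
    storage="json" sets COVERAGE_STORAGE; None leaves the default (SQLite)."""
    env = dict(os.environ)
    env.pop("COVERAGE_STORAGE", None)
    if storage is not None:
        env["COVERAGE_STORAGE"] = storage
    subprocess.run(run_cmd, env=env, check=True)
    result = subprocess.run(report_cmd, env=env, check=True,
                            stdout=subprocess.PIPE, universal_newlines=True)
    return result.stdout

def compare_storage(run_cmd, report_cmd):
    """Return (sqlite_report, json_report) for the same scenario."""
    return (run_scenario(run_cmd, report_cmd),
            run_scenario(run_cmd, report_cmd, storage="json"))

# Usage (placeholder commands):
# sqlite_out, json_out = compare_storage(
#     ["coverage", "run", "my_scenario.py"], ["coverage", "report", "-m"])
# print("same" if sqlite_out == json_out else "different!")
```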

The biggest change in behavior is that the data file is now created earlier than before. If you are running tests simultaneously, this might mean that you need parallel=true where you didn’t before. Keep an eye out for that.

Some other notes about these changes:

  • For now, the old JSON storage code is still in place. It can be enabled with a COVERAGE_STORAGE=json environment variable.
  • But I would rather not keep that code around forever. One of the things I’m trying to find out with this alpha is whether there’s any reason I will need to keep it around.
  • The database schema is still in flux, and will need to change to support future features. I’m not sure whether to make the schema part of the public interface to coverage.py or not. I want people to be able to experiment with the collected data, but I also want to be able to change it in the future if I need to.

Please test the code, and let me know what you find.

A complicated weekend

Sunday 19 August 2018

My weekends often revolve around my son Nat. He’s 28, and autistic. We have a routine. Usually we walk somewhere around Boston, and we always get a sweet snack of some kind, usually ice cream at one of the dozen or so J.P. Licks stores around the city.

Last weekend was no different, but got complicated in a few ways. The weekend before, I had suggested that we drive a short distance, and then start walking from there. Nat loves his routines, and is deeply suspicious of changes (more on this later!), and so he didn’t want to do that. We’ve long known that advance notice can help with changes like this, so we said we would do it next weekend. We even wrote it on his weekly calendar, “Different walk.”

My idea was that we could drive to Back Bay, walk down the Esplanade along the river, to Charles St, where there is a J.P. Licks for ice cream. That would be about two miles of walking. I had a vague plan that we could take the T back to the car at that point, but I had not made this explicit.

The alert reader at this point will raise their eyebrows about “vague plan,” and you would not be wrong to be concerned...

The day of the different walk rolled around, and the weather called for some rain, but my weather app said it was a ways off, so we set out. Nat was fine with the plan to drive to a walk, because it had been on his calendar. I took a small umbrella to be prepared.

We drove to the Esplanade, and started walking, and a light rain began. No big deal. I gave Nat the umbrella to use. Nat usually walks ahead of me. I realized his shirt said “Greater Boston” on the back, and I thought of getting a picture of him in the rain and labelling it “Wetter Boston.” I took a few different pictures to entertain myself as we went.

Nat, walking in the rain

Then the rain got heavier. We were about a mile in at this point. Ice cream was still ahead of us. We huddled under a shelter and waited. But the rain wasn’t stopping. The weather app now indicated that the rain was here to stay.

Should we push on, or turn back? I ask Nat about it, and of course, he wants to stick with the plan, which by the way includes getting ice cream, so why even consider anything else?

Now we both need the umbrella, so I put my arm around his shoulder, and show him how to put his arm around my waist, and we head off. We try to avoid the deeper puddles, but some are inevitable. Nat is a fast walker with long legs, but in this tandem configuration, he’s taking smaller steps than I am. With my arm around him, I can feel how taut his body is. Is it always like that when he walks? Or because of the rain? Or my arm around his shoulder?

We take breaks under whatever shelters we can, and I finally get this “Wetter Boston” shot to commemorate the deluge:

Nat, in the rain, overlooking the Longfellow Bridge

We dodge the construction of the new pedestrian walkway over Storrow Drive, take the old pedestrian walkway over Storrow Drive, and finally make it to J.P. Licks. We get ice cream, we eat ice cream, mission accomplished?

It’s at this point that things start to go wrong...

Nat has been participating in this plan, but I think the difference in the plan, and the rain, have been slowly bothering him. And he knows that I am stressed out about the rain, and the impossible job of staying dry. As soon as he is done with his ice cream, he begins laughing really loudly in the J.P. Licks. Loud enough that everyone in the store looks at us.

This laughing is not a new thing. It is not him being really amused by some internal secret joke. It is an explicit attempt to aggravate me, to push a stressed situation even further. If it happens in private, sometimes we can just ignore it, or even join in, as a defense. But a public setting is a different matter. That laughing is very effective at pushing a stressed dad to a possible breaking point.

Some back story: on a previous ice cream trip a year ago, Nat had started laughing like this while in the middle of eating. I saw it for what it was, and knew that I had to react with a consequence that would get Nat’s attention. I picked up his half-eaten ice cream, threw it away, and walked out of the store. He followed me out, and was very upset. He made a lot of noise on the street, which was horrible and embarrassing, but at least I had shown him that there are consequences. I had won that battle.

That past incident is why this time he waited until after his ice cream was done. He’s autistic, he’s not stupid. I can’t throw away ice cream he’s already eaten. So now I have no leverage. The best thing I can think to do is to walk out of the store, and keep the umbrella, and maybe the rain will bother him.

It doesn’t. Now we are walking down Charles St (a busy tourist destination) in the rain, me with an umbrella, him trailing behind laughing as loud as he can. People look at us, I am used to that. He’s being loud enough that a workman comes out of a store and asks if everything is all right. Nat ignores him, I bark “Yes!” over my shoulder, and keep walking.

At this point I have to admit that I lost this laughing battle. We stop, and I get Nat under the umbrella. I point out that it is still raining, and a long way back to the car. I suggest that we can take a Lyft back to the car. “No!” says Nat. We could take the T. “No!” Mom could come pick us up? “No!” I try talking about this for ten minutes while we huddle in a doorway. Nothing will budge him from the original plan.

It could have been that if I called a Lyft, he would have gotten in and been fine. Or he would have gotten in and been upset, but we would have gotten there. But he really seemed prepared to dig in and throw a fit about it. And to be fair, it was a surprise change in the plan. He wanted to walk back, and we were already wet. All we had to do was walk two more miles in the rain, huddled under a folding umbrella.

So we walked two more miles in the rain, huddled under a folding umbrella. I was tired, and wet. I was frustrated and dejected about the laughing incident. We were late for lunch, so in spite of the ice cream, I was hungry.

The afternoon weather was delightful and sunny, observed while resting indoors. Fun fact: two weeks earlier we had had another surprise torrential downpour that resulted in me getting completely soaked, so the novelty was wearing off. Another fun fact: Nat’s middle name is Isaac, which means, “He will laugh.” Choose names wisely.

Sunday we were going to get haircuts. Our hair cutter is in Brookline, right across the street from a J.P. Licks. As you have probably guessed, our routine is to get ice cream after the haircut. Before we went, I said to Nat, “We can’t get ice cream after the haircut, because you laughed too much in J.P. Licks yesterday.” My wife Susan was very worried that this would cause a scene later, but Nat accepted the news somberly.

Nat did great at the hair cutter, no problem at all. When we left the shop, Nat walked over to the crosswalk. I said, “Where are we going?” He said J.P. Licks. I said, “No, we can’t go there, do you know why?” He said, “Because you laughed.” (Nat still confuses me/you pronouns.) “Right, so we can’t go there.” Again, Nat dealt with this calmly and seriously. I hoped that the consequence connection would work to prevent future incidents.

At this point, hard-hearted behaviorist that I am, even I felt bad for Nat. Despite Saturday’s struggles, Sunday had been perfect. But the laughing is bad, and needs to be countered. I love Nat, and want to have fun with him and make him happy, but I also need to help him keep his behavior within acceptable bounds.

I compromised: “We can get a candy bar at CVS.” And so we did, Mounds for me, Milky Way for him, and a sunny drive home.

•    •    •

Update: Today is sunny and gorgeous, great for a walk. After posting this, I turned to Nat, and said, “How about we drive, then walk, then get ice cream, then take the T back to the car, then drive home?” He said, “OK,” and we did. Storrow Drive, Charles Street, the exact route I had tried last weekend.

He checked in with me a few times along the way to make sure he understood the plan (or was it to make sure that I understood the plan?) But there was no friction ever. Simple as that. You never quite know what you’re going to get.

SQLite data storage for coverage.py

Tuesday 14 August 2018

I’m starting to make some progress on Who Tests What. The first task is to change how coverage.py records the data it collects during execution. Currently, all of the data is held in memory, and then written to a JSON file at the end of the process.

But Who Tests What is going to increase the amount of data. If your test suite has N tests, you will have roughly N times as much data to store. Keeping it all in memory will become unwieldy. Also, since the data is more complicated, you’ll want a richer way to access the data.

To solve both these problems, I’m switching over to using SQLite to store the data. This will give us a way to write the data as it is collected, rather than buffering it all to write at the end. BTW, there’s a third side-benefit to this: we would be able to measure processes without having to control their ending.
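The difference between buffering everything and writing as you go shows up even in a toy recorder. The sketch below is not coverage.py's code: the class name LineRecorder, the one-table schema, and the batch size are all assumptions for illustration. It inserts executed lines into SQLite in small batches while the program runs, so the data on disk stays nearly current even if the process never reaches a normal shutdown.

```python
import sqlite3

class LineRecorder:
    """Toy recorder: write (filename, lineno) pairs to SQLite as they arrive."""

    def __init__(self, path, batch_size=100):
        self.conn = sqlite3.connect(path)
        self.conn.execute(
            "create table if not exists line (path text, lineno integer, "
            "unique (path, lineno))")
        self.batch_size = batch_size
        self.pending = []

    def record(self, path, lineno):
        self.pending.append((path, lineno))
        if len(self.pending) >= self.batch_size:
            self.flush()

    def flush(self):
        # "insert or ignore" keeps each executed line only once.
        with self.conn:  # commits the transaction on success
            self.conn.executemany(
                "insert or ignore into line values (?, ?)", self.pending)
        self.pending = []
```

A fully buffered design would hold everything in memory until exit; here at most batch_size records are lost if the process dies abruptly.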

When running with --parallel, coverage adds the process id and a random number to the name of the data file, so that many processes can be measured independently. With JSON storage, we didn’t need to decide on this filename until the end of the process. With SQLite, we need it at the beginning. This has required a surprising amount of refactoring. (You can follow the carnage on the data-sqlite branch.)
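Choosing the name up front can be sketched in a few lines. This is an approximation for illustration, not coverage.py's actual code: the function name parallel_data_filename is hypothetical, and this version includes the host name along with the process id and a random number. What matters is that it runs when measurement starts, because the SQLite file has to be opened before any data is written.

```python
import os
import random
import socket

def parallel_data_filename(base=".coverage"):
    """Build a per-process data file name, e.g. .coverage.myhost.12345.678901."""
    return "{}.{}.{}.{:06d}".format(
        base, socket.gethostname(), os.getpid(), random.randint(0, 999999))
```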

There’s one problem I don’t know how to solve: a process can start coverage measurement, then fork, and continue measurement in both of the child processes, as described in issue 56. With JSON storage, the in-memory data is naturally forked when the processes fork, and then each copy proceeds on its way. When each process ends, it writes its data to a file that includes the (new) process id, and all the data is recorded.

How can I support that use case with SQLite? The file name will be chosen before the fork, and data will be dribbled into the file as it happens. After the fork, both child processes will be trying to write to the same database file, which will not work (SQLite is not good at concurrent access).

Possible solutions:

  1. Even with SQLite, buffer all the data in memory. This imposes a memory penalty on everyone just for the rare case of measuring forking processes, and loses the extra benefit of measuring non-ending processes.
  2. Make buffer-it-all be an option. This adds to the complexity of the code, and will complicate testing. I don’t want to run every test twice, with buffering and not. Does pytest offer tools for conveniently doing this only for a subset of tests?
  3. Keep JSON storage as an option. This doesn’t have an advantage over #2, and has all the complications.
  4. Somehow detect that two processes are now writing to the same SQLite file, and separate them then?
  5. Use a new process just to own the SQLite database, with coverage talking to it over IPC. That sounds complicated.
  6. Monkeypatch os.fork so we can deal with the split? Yuck.
  7. Some other thing I haven’t thought of?
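Option 4 might look something like this: remember which process opened the database, and before each write compare it with os.getpid(). A child created by fork sees a different pid and switches to a file of its own. This is only a sketch of the idea under assumed names (ForkAwareWriter and its file-naming scheme), not what coverage.py does. It respects one rule from SQLite's own documentation: a connection opened before fork() should not be used in the child, so the child opens a fresh connection instead of reusing the inherited one.

```python
import os
import sqlite3

class ForkAwareWriter:
    """Sketch: give each process its own SQLite file, even across os.fork()."""

    def __init__(self, base):
        self.base = base
        self._open()

    def _open(self):
        # Name the file after the current process and open a new connection.
        self.pid = os.getpid()
        self.path = "{}.{}".format(self.base, self.pid)
        self.conn = sqlite3.connect(self.path)
        self.conn.execute(
            "create table if not exists line (path text, lineno integer)")

    def write(self, path, lineno):
        if os.getpid() != self.pid:
            # We are in a forked child: stop using the inherited connection
            # and start a file for this process.
            self._open()
        with self.conn:  # commit each write
            self.conn.execute("insert into line values (?, ?)", (path, lineno))
```

On Python 3.7 and later, os.register_at_fork(after_in_child=...) could replace the per-write pid check with a callback that runs once in each child, which is less intrusive than monkeypatching os.fork; the pid check has the advantage of also noticing forks that do not go through Python's own os.fork.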

Expect to see an alpha of coverage.py in the next few weeks with SQLite data storage, and please test it. I’m sure there are other use cases that might experience some turbulence...
