Quick hack CSV review tool

Tuesday 4 December 2018

Let’s say you are running a conference, and let’s say your Call for Proposals is open, and is saving people’s talk ideas into a spreadsheet.

I am in this situation. Reviewing those proposals is a pain, because there are large paragraphs of text, and spreadsheets are a terrible way to read them. I did the typical software engineer thing: I spent an hour writing a tool to make reading them easier.

The result is csv_review.py. It’s a terminal program that reads a CSV file (the exported proposals). It displays a row at a time on the screen, wrapping text as needed. It has commands for moving around the rows. It collects comments into a second CSV file. That’s it.

There are probably already better ways to do this. Everyone knows that to get answers from the internet, you don’t ask questions, instead you present wrong answers. More people will correct you than will help you. So this tool is my wrong answer to how to review CFP proposals. Correct me!

Coverage.py 5.0a4: the sys.path to hell

Sunday 25 November 2018

Another alpha of Coverage.py 5.0 is available: 5.0a4. This fixes a few problems with the new SQLite-based storage. Please give it a try, especially to experiment with dynamic contexts.

The challenge with this release was something that started as a seemingly simple fix. Coverage.py tries to emulate how Python runs programs, including how the first element of sys.path is set. A few people run coverage with sys.path fully configured, and coverage’s setting of sys.path[0] was breaking their stuff.

The proposal was simple: delete the one line of code that set sys.path[0]. I tried that, and it seemed to work. Fixed!

Not so fast: the Windows builds failed. This started a multi-week adventure of debugging and refactoring. The Windows builds were failing not because of Windows itself, but because on Windows, I don’t use pytest-xdist, which parallelizes tests into worker processes. With xdist, the tests were all passing. Without xdist, a few sys.path-related tests were failing.

It turns out that xdist manipulates sys.path itself, which was masking the fact that I had removed an important step from coverage.py. First thing to do was to adjust my test code so that even with xdist, my tests didn’t get xdist’s path changes.

Then I had to re-think how to adjust sys.path. That required refactoring how I ran the user’s Python code, so that I could apply the path changes a little earlier than I used to. That made me look at how I was testing that layer of code with mocks, and I changed it from explicit dependency injection to implicit mock patching.

A few more little fixes were needed here and there along the way. All told, the “one line” fix ended up being 14 files changed, 587 insertions, 427 deletions.

Careful with negative assertions

Sunday 4 November 2018

A cautionary tale about testing that things are unequal...

We had a test that was kind of like this:

def test_thing():
    data = "I am the data"

But someone refactored the test oh-so-slightly, like this:

def test_thing():
    data = "I am the data"
    modified = modify_another_way(change_the_thing(data)),

Now the test isn’t testing what it should be testing, and will pass even if change_the_thing and modify_another_way both return their argument unchanged. (I’ll explain why below.)

Negative tests (asserting that two things are unequal) is really tricky, because there are infinite things unequal to your value. Your assertion could pass because you accidentally have a different one of those unequal values than you thought.

Better would be to know what unequal value you are producing, and test that you have produced that value, with an equality assertion. Then if something unexpectedly shifts out from under you, you will find out.

Why the test was broken: the refactorer left the trailing comma on the “modified =” line, so “modified” is a 1-element tuple. The comparison is now between a tuple and a string, which are always unequal, even if the first element of the tuple is the same as the string.

Why warnings is mysterious

Saturday 27 October 2018

I recently went through a process I’ve done many times before: I tried to configure the Python warnings module, and was mystified. I set the PYTHONWARNINGS environment variable, and it doesn’t do what it seems like it should. I read the docs again, I think I understand what they are telling me, but the code isn’t doing what it seems like it should be doing. Frustrating.

I had some time today to dig into it, and now I understand better. The docs are misleading and/or incomplete. The module is not designed for maximum utility. Let me explain.

Here’s what the docs tell you: PYTHONWARNINGS (or the -W command-line option) is a comma-separated list of filters. Each filter is a 5-tuple of fields separated by colons. The third and fourth fields are of interest here, they are category and module.

Let’s start with module: it’s the module that caused the warning. The docs say it is a regular expression. This is false! Internally, this string is used as part of a regex match operation, but first it is escaped, so if you include an asterisk in your setting, you will be trying to match module names that have a literal asterisk in them, which is impossible.

OK, so the module string is a literal string, not a regex, but the escaped string is being used as part of re.match, so it should be possible to suppress warnings from an entire package (like backports.*) just by specifying “backports”, right? Nope! After being escaped, a $ is added to the end, so your literal string must be an exact match on the entire module name. Sigh.

Just to add to the confusion, the docs have long included this example, which isn’t even a sensible regex, never mind that regexes aren’t usable here:

error:::mymodule[.*]    # Convert warnings to errors in "mymodule"
                        # and any subpackages of "mymodule"

These concerns about the regex behavior are the topic of bpo 34624, BTW.

On to category: this is actually the class of warning exception being used, so you can say (for example) DeprecationWarning here. In my case, I wanted to suppress the deprecation warnings that pytest raises. Pytest helpfully uses a base class, pytest.PytestDeprecationWarning, so I used that as the category. But this causes an error message at startup:

Invalid -W option ignored: invalid module name: ‘pytest’

Huh? pytest is an importable module! Nope: these category names are imported early in the startup sequence, before your real sys.path is built, so you cannot name third-party modules here...!

There are probably other things about warnings that confuse people. These are the ones I uncovered today after a long rage-fueled debugging session. An important developer skill is an irrational belief that things can be understood, and be made to make sense. Another is knowing when to give up and just accept the confusion. This morning I fully embraced the first.

In my case, I was trying to suppress warnings reported while running tests with pytest. Pytest has its own setting for warnings filters, and it uses its own copy of the warnings.py code for reading them, so that the regexes are not escaped! This is very useful, but could also add to the mystery, since the pytest docs don’t mention the difference.

And since pytest interprets its settings after sys.path has been configured, I can use third-party warning categories there. So this works perfectly:

filterwarnings =

It’s very satisfying to have some mysteries solved.

Who tests what is here: Coverage.py 5.0a3

Sunday 7 October 2018

A long-awaited feature of coverage.py is now available in a rough form: Who Tests What annotates coverage data with the name of the test function that ran the code.

To try it out:

  • Install coverage.py v5.0a3.
  • Add this line literally to the [run] section of your .coveragerc file:
  • [run]
    dynamic_context = test_function
  • Run your tests.
  • The .coverage file is now a SQLite database. There is no change to reporting yet, so you will need to do your own querying of the SQLite database to get information out. See below for a description of the database schema.

The database can be accessed in any SQLite-compatible way you like. Note that the schema is not (yet) part of the public API. That is, it may not be guaranteed to stay the same. This is one of the things yet to be decided. For now though, the database has these tables:

  • file: maps full file paths to file_ids: id, path
  • context: maps contexts (test function names) to contexts_ids: id, context
  • line: the line execution data: file_id, context_id, lineno
  • arc: similar to line, but for branch coverage: file_id, context_id, fromno, tono

It’s not the most convenient, but the information is all there. If you used branch coverage, then the important data is in the “arc” table, and “line” is empty. If you didn’t use branch coverage, then “line” has data and “arc” is empty. For example, using the sqlite3 command-line tool, here’s a query to see which tests ran a particular line:

sqlite> select
   ...> distinct context.context from arc, file, context
   ...> where arc.file_id = file.id
   ...> and arc.context_id = context.id
   ...> and file.path like '%/xmlreport.py'
   ...> and arc.tono = 122;

BTW, there are also “static contexts” if you are interested in keeping coverage data from different test runs separate: see Measurement Contexts in the docs for details.

Some things to note and think about:

  • The test function name recorded includes the test class if we can figure it out. Sometimes this isn’t possible. Would it be better to record the filename and line number?
  • Is test_function too fine-grained for some people? Maybe chunking to the test class or even the test file would be enough?
  • Better would be to have test runner plugins that could tell us the test identifier. Anyone want to help with that?
  • What other kinds of dynamic contexts might be useful?
  • What would be good ways to report on this data? How are you navigating the data to get useful information from it?
  • How is the performance?
  • We could have a “coverage extract” command that would be like the opposite of “coverage combine”: it could pull out a subset of the data so a readable report could be made from it.

Please try this out, and let me know how it goes. Thanks.

Me on Talk Python To Me

Tuesday 25 September 2018

I was the guest on the most recent episode of the Talk Python To Me podcast: #178: Coverage.py. It was a really fun conversation. I liked at one point being able to say, “I’ll answer the question, and then I will challenge the premise of the question.” Michael does a great job running the show, including providing a full transcript of the episode!

We talked mostly about coverage.py, but there was a small digression into what I do for work, which is the Open edX team at edX. EdX has thousands of courses from hundreds of institutions. But the software we write is open source, and is used by a thousand other sites to do online education. It’s really gratifying to see them doing the kinds of education that edX will never do.

The example I gave on the show is UKK Guru, an Indonesian site running hundreds of small vocational courses that will mean a lot to its learners.

Sorry to repeat myself! Mostly the show is about coverage.py, it’s good. :)