Pytest-cov support for who-tests-what

Tuesday 8 October 2019

I’ve added a new option to the pytest-cov coverage plugin for pytest: --cov-context=test will set the dynamic context based on pytest test phases. Each test has a setup, run, and teardown phase. This gives you the best test information in the coverage database:

  • The full test id is used in the context. You have the test file name, and the test class name if you are using class-based tests.
  • Parameterized tests start a new context for each new set of parameter values.
  • Execution is a little faster because coverage.py doesn’t have to poll for test starts.
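
If you want contexts recorded on every run, the options can also go in your pytest configuration instead of on the command line. Here is a minimal sketch, assuming a pytest.ini (adjust for your own layout):

# pytest.ini
[pytest]
addopts = --cov=. --cov-context=test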

For example, here is a repo of simple pytest tests in a number of forms: pytest-gallery. I can run the tests with test contexts being recorded:

$ pytest -v --cov=. --cov-context=test
======================== test session starts =========================
platform darwin -- Python 3.6.9, pytest-5.2.1, py-1.8.0, pluggy-0.12.0 -- /usr/local/virtualenvs/pytest-cov/bin/python3.6
cachedir: .pytest_cache
rootdir: /Users/ned/lab/pytest-gallery
plugins: cov-2.8.1
collected 25 items

test_fixtures.py::test_fixture PASSED                          [  4%]
test_fixtures.py::test_two_fixtures PASSED                     [  8%]
test_fixtures.py::test_with_expensive_data PASSED              [ 12%]
test_fixtures.py::test_with_expensive_data2 PASSED             [ 16%]
test_fixtures.py::test_parametrized_fixture[1] PASSED          [ 20%]
test_fixtures.py::test_parametrized_fixture[2] PASSED          [ 24%]
test_fixtures.py::test_parametrized_fixture[3] PASSED          [ 28%]
test_function.py::test_function1 PASSED                        [ 32%]
test_function.py::test_function2 PASSED                        [ 36%]
test_parametrize.py::test_parametrized[1-101] PASSED           [ 40%]
test_parametrize.py::test_parametrized[2-202] PASSED           [ 44%]
test_parametrize.py::test_parametrized_with_id[one] PASSED     [ 48%]
test_parametrize.py::test_parametrized_with_id[two] PASSED     [ 52%]
test_parametrize.py::test_parametrized_twice[3-1] PASSED       [ 56%]
test_parametrize.py::test_parametrized_twice[3-2] PASSED       [ 60%]
test_parametrize.py::test_parametrized_twice[4-1] PASSED       [ 64%]
test_parametrize.py::test_parametrized_twice[4-2] PASSED       [ 68%]
test_skip.py::test_always_run PASSED                           [ 72%]
test_skip.py::test_never_run SKIPPED                           [ 76%]
test_skip.py::test_always_skip SKIPPED                         [ 80%]
test_skip.py::test_always_skip_string SKIPPED                  [ 84%]
test_unittest.py::MyTestCase::test_method1 PASSED              [ 88%]
test_unittest.py::MyTestCase::test_method2 PASSED              [ 92%]
tests.json::hello PASSED                                       [ 96%]
tests.json::goodbye PASSED                                     [100%]

---------- coverage: platform darwin, python 3.6.9-final-0 -----------
Name                  Stmts   Miss  Cover
-----------------------------------------
conftest.py              16      0   100%
test_fixtures.py         19      0   100%
test_function.py          4      0   100%
test_parametrize.py       8      0   100%
test_skip.py             12      3    75%
test_unittest.py         17      0   100%
-----------------------------------------
TOTAL                    76      3    96%


=================== 22 passed, 3 skipped in 0.18s ====================

Then I can see the contexts that were collected:

$ sqlite3 -csv .coverage "select context from context"
context
""
test_fixtures.py::test_fixture|setup
test_fixtures.py::test_fixture|run
test_fixtures.py::test_two_fixtures|setup
test_fixtures.py::test_two_fixtures|run
test_fixtures.py::test_with_expensive_data|setup
test_fixtures.py::test_with_expensive_data|run
test_fixtures.py::test_with_expensive_data2|run
test_fixtures.py::test_parametrized_fixture[1]|setup
test_fixtures.py::test_parametrized_fixture[1]|run
test_fixtures.py::test_parametrized_fixture[2]|setup
test_fixtures.py::test_parametrized_fixture[2]|run
test_fixtures.py::test_parametrized_fixture[3]|setup
test_fixtures.py::test_parametrized_fixture[3]|run
test_function.py::test_function1|run
test_function.py::test_function2|run
test_parametrize.py::test_parametrized[1-101]|run
test_parametrize.py::test_parametrized[2-202]|run
test_parametrize.py::test_parametrized_with_id[one]|run
test_parametrize.py::test_parametrized_with_id[two]|run
test_parametrize.py::test_parametrized_twice[3-1]|run
test_parametrize.py::test_parametrized_twice[3-2]|run
test_parametrize.py::test_parametrized_twice[4-1]|run
test_parametrize.py::test_parametrized_twice[4-2]|run
test_skip.py::test_always_run|run
test_skip.py::test_always_skip|teardown
test_unittest.py::MyTestCase::test_method1|setup
test_unittest.py::MyTestCase::test_method1|run
test_unittest.py::MyTestCase::test_method2|run
test_unittest.py::MyTestCase::test_method2|teardown
tests.json::hello|run
tests.json::goodbye|run
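
Since the contexts are tied to the measured lines, you can already poke at who-tests-what with ad-hoc SQL. As a sketch (assuming the 5.0 schema, with its file, context, and line_bits tables), this lists the test contexts that executed code in conftest.py:

$ sqlite3 -csv .coverage "
    select distinct context.context
    from line_bits
    join file on file.id = line_bits.file_id
    join context on context.id = line_bits.context_id
    where file.path like '%conftest.py'"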

Version 2.8.0 of pytest-cov (and later) has the new feature. Give it a try. BTW, I also snuck another alpha of coverage.py (5.0a8) in at the same time, to get a needed API right.

Still missing from all this is a really useful way to report on the data. Get in touch if you have needs or ideas.

Sponsor me on GitHub?

Monday 7 October 2019

tl;dr: You can sponsor me on GitHub, but I’m not sure why you would.

In May, GitHub launched GitHub Sponsors, a feature on their site for people to support each other financially. It’s still in beta, but now I’m in the program, so you can sponsor me if you want.

I’m very interested in the question of how the creators of open source software can benefit more from what they create, considering how much value others get from it.

To be honest, I’m not sure GitHub Sponsors is going to make a big difference. It’s another form of what I’ve called an internet tip jar: it focuses on one person giving another person money. Don’t get me wrong: I’m all for enabling interpersonal connections of all sorts. But I don’t think that will scale to improve the situation meaningfully.

I think a significant shift will only come with a change in how businesses give back to open source, since they are the major beneficiaries. See my post about Tidelift and “Corporations and open source, why and how” for more about this.

I’m participating in GitHub Sponsors because I want to try every possible avenue. Since it’s on GitHub, it will get more attention than most tip jars, so maybe it will work out differently. Participating is a good way for me to understand it.

GitHub lets me define tiers of sponsorship, with different incentives, similar to Kickstarter. I don’t know what will motivate people, and I don’t have existing incentives at my fingertips to offer, so I’ve just created three generic tiers ($3, $10, $30 per month). If GitHub Sponsors appeals to you, let me know what I could do with a tier that might attract other people.

The question mark in the title is not because I’m making a request of you. It’s because I’m uncertain whether and why people will become sponsors through GitHub Sponsors. We’ll see what happens.

Coverage.py 5.0a7, and the future of pytest-cov

Tuesday 24 September 2019

Progress continues in the Python coverage world. Two recent things: first, the latest alpha of Coverage.py 5.0 is available: 5.0a7. Second, pytest-cov is supporting coverage.py 5.0, and we’re talking about the future of pytest-cov.

There are two big changes in Coverage.py 5.0a7. First, there is a new reporting command: coverage json produces a JSON file with information similar to the XML report. In coverage.py 4.x, the data storage was a lightly cloaked JSON file. That file was not in a supported format, and in fact, it is gone in 5.0. This command produces a supported JSON format for people who want programmatic access to details of the coverage data. A huge thanks to Matt Bachmann for implementing it.
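
If you just want a quick look at the result, something like this works. It’s a sketch: I’m assuming the default coverage.json output name and the totals section’s percent_covered key; see the JSON report docs for the full format.

$ coverage json
$ python -c "import json; print(json.load(open('coverage.json'))['totals']['percent_covered'])"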

The second big change is to the SQL schema in the 5.x data file, which is a SQLite database. Previously, each line measured produced a row in the “line” table. But this proved too bulky for large projects. Now line numbers are stored in a compact binary form. There is just one row in the “line_bits” table for each file and context measured. This makes it more difficult to use the data with ad-hoc queries. Coverage provides functions for working with the line number bitmaps, but I’m interested in other ideas about how to make the data more usable.
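
As a rough sketch of what working with the bitmaps looks like, here’s how you might expand the stored rows back into line numbers. The helper name is from the coverage.numbits module in the 5.0 alphas; check the docs for the current API.

# lines_per_context.py: an exploratory sketch, not an official example.
import sqlite3

from coverage.numbits import numbits_to_nums

con = sqlite3.connect(".coverage")
rows = con.execute(
    "select file.path, context.context, line_bits.numbits"
    " from line_bits"
    " join file on file.id = line_bits.file_id"
    " join context on context.id = line_bits.context_id"
)
for path, context, numbits in rows:
    # numbits_to_nums unpacks the binary blob into a list of line numbers.
    print(path, context, numbits_to_nums(numbits))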

The pytest-cov changes are to support coverage.py 5.0. Those changes are already on the master branch.

I’m also working on a pull request to add a --cov-contexts=test option so that pytest can announce when tests change, for accurate and detailed dynamic contexts.

Longer-term, I’d like to shrink the size of the pytest-cov plugin. Pytest should be about running tests, not reporting on coverage after the tests are run. Too much of the code, and too many of the bug reports, are due to it trying to take on more than it needs to. The command-line arguments are getting convoluted, for no good reason. I’ve written an issue to get feedback: Proposal: pytest-cov should do less. If you have opinions one way or the other, that would be a good place to talk about them.

Variable fonts

Tuesday 17 September 2019

We’re all used to fonts coming in different weights (normal, bold), or sometimes different widths (normal, condensed, extended). Geometrically, there’s no reason that these variations need to be discrete. It’s a limitation of technology that we’ve been given a few specific weights or widths to choose from.

Over the years there have been a few attempts to make those variation dimensions continuous rather than discrete. Knuth’s Metafont was one, Adobe’s Multiple Master fonts were another. The latest is OpenType’s variable fonts.

In a variable font, the type designer not only decides on the shapes of the glyphs, but on the axes of variability. Weight and width are two obvious ones, but the choice is arbitrary.

One of the great things about variable fonts is that browsers have good support for them. You can use a variable font on a web page, and set the values of the variability dimensions using CSS.

Browser support also means you can play with the variability without any special tools. Nick Sherman’s v-fonts.com is a gallery and playground of variable fonts. Each is displayed with sliders for its axes. You can drag the sliders and see the font change in real time in your browser.

Many of the fonts are gimmicky, either to show off the technology, or because exotic display faces are where variability can be used most broadly. But here are a few of my choices that demonstrate variability to its best advantage:

Antonia Variable includes an optical size axis. Optical size refers to the adjustments that have to be made to shapes to compensate for the size of the font. At tiny sizes, letters have to be wider and their features sturdier for the type to remain legible while still looking like part of the same family. It’s kind of like how babies have the same features as adults, but smaller and plumper.

Sample of Antonia Variable

Bradley DJR Variable is another good example of an optical size axis.

Sample of Bradley DJR Variable

UT Morph is an ultra-geometric display face with two stark axes, positive and negative. This shows how variability can be used to control completely new aspects of a design.

Sample of UT Morph

Recursive has some really interesting axes that use variability in eye-opening ways without being cartoonish: proportion (how monospaced is it), expression (how swoopy is it), and italic (changes a few letter shapes).

Sample of Recursive

Variable fonts are still a new technology, but we’ll see them being used more and more. Don’t expect to see fonts stretching and squashing before your eyes though. Site designers will use variability to make some choices, and you won’t even realize variability was involved. Like all good typography, it won’t draw attention to itself.

Don’t omit tests from coverage

Thursday 29 August 2019

There’s a common idea out there that I want to refute. It’s this: when measuring coverage, you should omit your tests from measurement. Searching GitHub shows that lots of people do this.
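
The configuration usually looks something like this (a sketch; the exact patterns vary from repo to repo):

# .coveragerc
[run]
omit =
    tests/*
    */test_*.py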

This is a bad idea. Your tests are real code, and the whole point of coverage is to give you information about your code. Why wouldn’t you want that information about your tests?

You might say, “but all my tests run all their code, so it’s useless information.” Consider this scenario: you have three tests written, and you need a fourth, similar to the third. You copy/paste the third test, tweak the details, and now you have four tests. Except oops, you forgot to change the name of the test.

Tests are weird: you have to name them, but the names don’t matter. Nothing calls the name directly. It’s really easy to end up with two same-named tests, which means you really only have one of them, because the new one silently overwrites the old. Coverage would alert you to the problem, since the body of the shadowed test never runs.
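
Here’s a tiny illustration, with hypothetical names:

# test_adding.py

def add(a, b):
    return a + b

def test_add():
    assert add(1, 1) == 2

def test_add():     # oops: same name, silently replaces the test above
    assert add(2, 2) == 4

Pytest only collects the second test_add. A coverage report that includes test_adding.py will show the first assert as a line that never ran, which is exactly the clue you need.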

Also, if your test suite is large, you likely have helper code in there as well as straight-up tests. Are you sure you need all that helper code? If you run coverage on the tests (and the helpers), you’d know about some weird clause in there that is never used. That’s odd, why is that? It’s probably useful to know. Maybe it’s a case you no longer need to consider. Maybe your tests aren’t exercising everything you thought.

The only argument against running coverage on tests is that it “artificially” inflates the results. True, it’s much easier to get 100% coverage on a test file than a product file. But so what? Your coverage goal was chosen arbitrarily anyway. Instead of aiming for 90% coverage, you should include your tests and aim for 95% coverage. 90% doesn’t have a magical meaning.

What’s the downside of including tests in coverage? “People will write more tests as a way to get the easy coverage.” Sounds good to me. If your developers are trying to game the stats, they’ll find a way, and you have bigger problems.

True, it makes the reports larger, but if your tests are 100% covered, you can exclude those files from the report with the [report] skip_covered setting.
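
In a .coveragerc, that looks like:

[report]
skip_covered = True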

Your tests are important. You’ve put significant work into them. You want to know everything you can about them. Coverage can help. Don’t omit tests from coverage.

Why your mock doesn’t work

Friday 2 August 2019

Mocking is a powerful technique for isolating tests from undesired interactions among components. But often people find their mock isn’t taking effect, and it’s not clear why. Hopefully this explanation will clear things up.

BTW: it’s really easy to over-use mocking. These are good explanations of alternative approaches:

A quick aside about assignment

Before we get to fancy stuff like mocks, I want to review a little bit about Python assignment. You may already know this, but bear with me. Everything that follows is going to be directly related to this simple example.

Variables in Python are names that refer to values. If we assign a second name, the names don’t refer to each other, they both refer to the same value. If one of the names is then assigned again, the other name isn’t affected:

x = 23      # x refers to the value 23
y = x       # now x and y both refer to the same 23
x = 12      # x refers to 12; y still refers to 23

If this is unfamiliar to you, or you just want to look at more pictures like this, Python Names and Values goes into much more depth about the semantics of Python assignment.

Importing

Let’s say we have a simple module like this:

# mod.py

val = "original"

def update_val():
    global val
    val = "updated"

We want to use val from this module, and also call update_val to change val. There are two ways we could try to do it. At first glance, it seems like they would do the same thing.

The first version imports the names we want, and uses them:

# code1.py

from mod import val, update_val

print(val)
update_val()
print(val)

The second version imports the module, and uses the names as attributes on the module object:

# code2.py

import mod

print(mod.val)
mod.update_val()
print(mod.val)

This seems like a subtle distinction, almost a stylistic choice. But code1.py prints “original original”: the value hasn’t changed! Code2.py does what we expected: it prints “original updated.” Why the difference?

Let’s look at code1.py more closely:

# code1.py

from mod import val, update_val

print(val)
update_val()
print(val)

After “from mod import val”, when we first print val, we have this:

Diagram: code1.py’s name val and mod.py’s name val both refer to the string ‘original’

“from mod import val” means, import mod, and then do the assignment “val = mod.val”. This makes our name val refer to the same object as mod’s name val.

After “update_val()”, when we print val again, our world looks like this:

Diagram: mod.py’s val now refers to ‘updated’, but code1.py’s val still refers to ‘original’

update_val has reassigned mod’s val, but that has no effect on our val. This is the same behavior as our x and y example, but with imports instead of more obvious assignments. In code1.py, “from mod import val” is an assignment from mod.val to val, and works exactly like “y = x” does. Later assignments to mod.val don’t affect our val, just as later assignments to x don’t affect y.

Now let’s look at code2.py again:

# code2.py

import mod

print(mod.val)
mod.update_val()
print(mod.val)

The “import mod” statement means, make my name mod refer to the entire mod module. Accessing mod.val will reach into the mod module, find its val name, and use its value.

Diagram: code2.py’s name mod refers to the mod module, whose val refers to ‘original’

Then after “update_val()”, mod’s name val has been changed:

Diagram: mod.py’s val now refers to ‘updated’; code2.py’s mod still refers to the whole mod module

Now we print mod.val again, and see its updated value, just as we expected.

OK, but what about mocks?

Mocking is a fancy kind of assignment: replace an object (or function) with a different one. We’ll use the mock.patch function in a with statement. It makes a mock object, assigns it to the name given, and then restores the original value at the end of the with statement.
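
Here’s a tiny demonstration of that assign-and-restore behavior, using os.getcwd purely as a stand-in:

from unittest import mock
import os

real_getcwd = os.getcwd

with mock.patch("os.getcwd") as fake_getcwd:
    fake_getcwd.return_value = "/pretend/dir"
    print(os.getcwd())              # "/pretend/dir": the name os.getcwd now refers to the mock

print(os.getcwd is real_getcwd)     # True: the original function has been restored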

Let’s consider this (very roughly sketched) product code and test:

# product.py

from os import listdir

def my_function():
    files = listdir(some_directory)
    # ... use the file names ...

# test.py

from unittest import mock
from product import my_function

def test_it():
    with mock.patch("os.listdir") as listdir:
        listdir.return_value = ['a.txt', 'b.txt', 'c.txt']
        my_function()

After we’ve imported product.py, both the os module and product.py have a name “listdir”, and both names refer to the same listdir() function. The references look like this:

Diagram: the os module’s listdir and product.py’s listdir both refer to the real listdir() function

The mock.patch in our test is really just a fancy assignment to the name “os.listdir”. During the test, the references look like this:

Diagram: os.listdir now refers to the mock, but product.py’s listdir still refers to the real listdir() function

You can see why the mock doesn’t work: we’re mocking something, but it’s not the thing our product code is going to call. This situation is exactly analogous to our code1.py example from earlier.

You might be thinking, “ok, so let’s do that code2.py thing to make it work!” If we do, it will work. Your product code and test will now look like this (the test code is unchanged):

# product.py

import os

def my_function():
    files = os.listdir(some_directory)
    # ... use the file names ...

# test.py

from unittest import mock
from product import my_function

def test_it():
    with mock.patch("os.listdir") as listdir:
        listdir.return_value = ['a.txt', 'b.txt', 'c.txt']
        my_function()

When the test is run, the references look like this:

Diagram: os.listdir refers to the mock, and product.py’s name os refers to the os module, so the product code finds the mock

Because the product code refers to the os module, changing the name in the module is enough to affect the product code.

But there’s still a problem: this will mock that function for any module using it. This might be a more widespread effect than you intended. Perhaps your product code also calls some helpers, which also need to list files. The helpers might end up using your mock (depending on how they imported os.listdir!), which isn’t what you wanted.
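
For example, a helper like this (a hypothetical helpers.py) keeps using the real function even while “os.listdir” is patched, because it grabbed its own reference at import time:

# helpers.py

from os import listdir

def count_files(dirname):
    return len(listdir(dirname))

A helper that did “import os” and called os.listdir() would get the mock instead. Either way, the patch’s reach is determined by other modules’ import styles, not by what you meant to isolate.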

Mock it where it’s used

The best approach to mocking is to mock the object where it is used, not where it is defined. Your product and test code will look like this:

# product.py

from os import listdir

def my_function():
    files = listdir(some_directory)
    # ... use the file names ...

# test.py

from unittest import mock
from product import my_function

def test_it():
    with mock.patch("product.listdir") as listdir:
        listdir.return_value = ['a.txt', 'b.txt', 'c.txt']
        my_function()

The only difference here from our first try is that we mock “product.listdir”, not “os.listdir”. That seems odd, because listdir isn’t defined in product.py. That’s fine: the name “listdir” exists in both the os module and product.py, and both are references to the thing you want to mock. Neither is a more real name than the other.

By mocking where the object is used, we have tighter control over what callers are affected. Since we only want product.py’s behavior to change, we mock the name in product.py. This also makes the test more clearly tied to product.py.

As before, our references look like this once product.py has been fully imported:

Diagram: the os module’s listdir and product.py’s listdir both refer to the real listdir() function

The difference now is how the mock changes things. During the test, our references look like this:

Diagram: product.py’s listdir now refers to the mock, while the os module’s listdir still refers to the real listdir() function

The code in product.py will use the mock, and no other code will. Just what we wanted!

Is this OK?

At this point, you might be concerned: it seems like mocking is kind of delicate. Notice that even in our last example, how we create the mock depends on something as arbitrary as how the product code imported the function. If product.py had used “import os” at the top, we wouldn’t have been able to patch “product.listdir” at all. This is something that could change in a refactoring, but at least mock.patch will fail loudly in that case.

You are right to be concerned: mocking is delicate. It depends on implementation details of the product code to construct the test. There are many reasons to be wary of mocks, and there are other approaches to solving the problems of isolating your product code from problematic dependencies.

If you do use mocks, at least now you know how to make them work, but again, there are other approaches. See the links at the top of this page.
