Coverage 5.0 beta 1

Monday 11 November 2019

I want to finish coverage.py 5.0. It has some big changes, so I need people to try it and tell me if it’s ready. Please install coverage.py 5.0 beta 1 and try it in your environment.

I especially want to hear from you if you tried the earlier alphas of 5.0. There have been some changes in the SQLite database that were needed to make measurement efficient enough for large test suites, but that hinder ad-hoc querying.

If you haven’t taken a look at coverage.py 5.0 yet, the big change is the addition of “contexts.” These can record not just that a line was executed, but something about why it was executed. Any number of contexts can be recorded for a line. They could be different operating systems, or versions of Python, or the name of the test that was running. I think it could enable some really interesting tooling.

If you are interested in recording test names as contexts, the pytest-cov pytest plugin now has a “--cov-context” option to do just that.
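
If you’re building other tooling, the rough shape of it looks something like this sketch: a tool calls switch_context() as each unit of work starts, or you can set dynamic_context = test_function in the configuration file and let coverage.py name contexts itself. The test ids and run_one_test function here are made up for illustration:

import coverage

def run_one_test(name):
    # Stand-in for actually running a test.
    print("running", name)

cov = coverage.Coverage()
cov.start()
for test_id in ["test_login", "test_logout"]:   # hypothetical test ids
    cov.switch_context(test_id)   # lines measured from here on are credited to this context
    run_one_test(test_id)
cov.stop()
cov.save()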

Contexts increase the data requirements, so data storage is now a SQLite file rather than a JSON file. The summary of what’s new in 5.0 is here: Major changes in 5.0.

Please try this. Soon 5.0 will be done, and people will begin installing it unknowingly. I would really like to minimize the turmoil when that happens.

Debugging TensorFlow coverage

Sunday 27 October 2019

It started with a coverage.py issue: Coverage not working for TensorFlow Model call function. A line in the code is executing, but coverage.py marks it as unexecuted. How could that be?

TensorFlow was completely new to me. I knew it had some unusual execution semantics, but I didn’t know what that involved. What could it be doing that would interfere with coverage measurement? Once I had instructions to reproduce the issue, I could see that it was true: a line that clearly printed output during the test run was marked as unexecuted.

The code in question was in a file called routenet_model.py. It had a line like this:

print('******** in call ******************')

It was the only such line in the code, and sure enough, the test output showed the “**** in call ****” text, so the line was definitely running.

The first step was to see who was calling the product code. It seemed like something about a caller was getting in the way, since other code in that file was marked as executed. I added this to get a stack trace at that point:

import inspect
print("\n".join("%30s : %s:%d" % (t[3],t[1],t[2]) for t in inspect.stack()[::-1]))
print('******** in call ******************')

When I re-ran the test, I saw a long stack trace that ended like this (I’ve abbreviated some of the file paths):

              ...     ...
           compute : site-packages/tensorflow/python/ops/map_fn.py:257
          <lambda> : /private/tmp/bug856/demo-routenet/tests/utils/test_utils.py:31
          __call__ : site-packages/tensorflow/python/keras/engine/base_layer.py:634
           wrapper : site-packages/tensorflow/python/autograph/impl/api.py:146
    converted_call : site-packages/tensorflow/python/autograph/impl/api.py:453
          tf__call : /var/folders/j2/gr3cj3jn63s5q8g3bjvw57hm0000gp/T/tmps9vwjn47.py:10
    converted_call : site-packages/tensorflow/python/autograph/impl/api.py:349
 _call_unconverted : site-packages/tensorflow/python/autograph/impl/api.py:258
******** in call ******************

This stack shows the function name, the file path, and the line number in a compact way. It’s a useful enough debugging helper that I have it as a vim abbreviation.

Hmm, interesting: there’s a temporary Python file (tmps9vwjn47.py) in the call stack. That’s definitely unusual. The file is gone by the time the tests are done, so to get the contents, I grab the filename from the stack trace, and copy the contents elsewhere:

import inspect
print("\n".join("%30s : %s:%d" % (t[3],t[1],t[2]) for t in inspect.stack()[::-1]))
with open("/tmp/bug856/weird.py", "w") as fout:
    with open(inspect.stack()[2].filename) as fin:
        fout.write(fin.read())
print('******** in call ******************')

I named the copied file “weird.py” because a temporary Python file is weird any time, but this is where it gets really weird: weird.py is a 528-line Python file, but it doesn’t have the function indicated in the stack trace: there’s nothing named tf__call in it. The stack trace also indicates that line 10 is running, but line 10 is a comment:

1 # Copyright 2018 The TensorFlow Authors. All Rights Reserved.
2 #
3 # Licensed under the Apache License, Version 2.0 (the "License");
4 # you may not use this file except in compliance with the License.
5 # You may obtain a copy of the License at
6 #
7 #     http://www.apache.org/licenses/LICENSE-2.0
8 #
9 # Unless required by applicable law or agreed to in writing, software
10 # distributed under the License is distributed on an "AS IS" BASIS,
11 # WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
12 # See the License for the specific language governing permissions and
13 # limitations under the License.
14 # ==============================================================================
15 """Control flow statements: loops, conditionals, etc."""
16
17 from __future__ import absolute_import
18 from __future__ import division
19 from __future__ import print_function
20
21 ... etc ...

Something is truly weird here. To add to the confusion, I can print the entire FrameInfo from the stack trace, with:

print(repr(inspect.stack()[2]))

and it shows:

FrameInfo(
    frame=<frame at 0x7ff1f0e05080, file '/var/folders/j2/gr3cj3jn63s5q8g3bjvw57hm0000gp/T/tmp1ii_1_na.py', line 14, code tf__call>,
    filename='/var/folders/j2/gr3cj3jn63s5q8g3bjvw57hm0000gp/T/tmp1ii_1_na.py',
    lineno=14,
    function='tf__call',
    code_context=["      print(ag__.converted_call(repr, None, ag__.ConversionOptions(recursive=True, force_conversion=False, optional_features=(), internal_convert_user_code=True), (ag__.converted_call('stack', inspect, ag__.ConversionOptions(recursive=True, force_conversion=False, optional_features=(), internal_convert_user_code=True), (), None)[2],), None))\n"],
    index=0
)

The code_context attribute there shows a plausible line of code, but it doesn’t correspond to the code in the file at all. This is a twist on a long-standing gotcha with Python stack traces. When Python code is running, it has filenames and line numbers in the frames on the call stack, but it doesn’t keep the source of the code it runs. To populate a stack trace with lines of code, it reads the file on disk. The classic problem with this is that the file on disk may have changed since the code started running. So the lines of source in the stack trace might be wrong because they are newer than the actual code that is running.

But that’s not what we’re seeing here. Now the line of code in the stack trace doesn’t match the file on disk at all. It seems to correspond to what is running, but not what is on disk. The reason is that Python uses a module called linecache to read the line of source. As the name implies, linecache caches file contents, so that reading many different lines from a file won’t try to open the same file many times.

What must have happened here is that the file had one program in it, and then was read (and cached) by linecache for some reason. That first program is what is running. Then the file was re-written with a second program. Linecache checks the modification time to invalidate the cache, but if the file was rewritten quickly enough to not have a different modification time, then the stale cache would be used. This is why the stack trace has the correct line of code, even though the file on disk doesn’t.
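
Here’s a minimal sketch of that caching behavior, using a throwaway file (the path is just for illustration):

import linecache

path = "/tmp/linecache_demo.py"   # throwaway file for the demo

# Write a first program; getlines() reads the file and caches its lines.
with open(path, "w") as f:
    f.write("print('first program')\n")
print(linecache.getlines(path))

# Rewrite the file. getlines() returns the cached copy without re-reading,
# so we still see the first program.
with open(path, "w") as f:
    f.write("print('second program')\n")
print(linecache.getlines(path))

# checkcache() compares the cached size and mtime against os.stat().
# A rewrite fast enough to leave both unchanged slips past this check;
# here the size changed, so the cache entry is refreshed.
linecache.checkcache(path)
print(linecache.getlines(path))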

A quick look in the __pycache__ directory in the tmp directory shows a .pyc file, and if I dump it with show_pyc.py, I can see that it has the code I’m interested in. But rather than try to read disassembled bytecode, I can get the source from the stale copy in linecache!

import inspect
print("\n".join("%30s : %s:%d" % (t[3],t[1],t[2]) for t in inspect.stack()[::-1]))
with open("/tmp/bug856/better.py", "w") as fout:
    import linecache
    fout.write("".join(linecache.getlines(inspect.stack()[2].filename)))
print('******** in call ******************')

When I run this, I get a file better.py that makes clear why coverage.py claimed the original line wasn’t executed. Here’s the start of better.py:

1 def create_converted_entity_factory():
2
3   def create_converted_entity(ag__, ag_source_map__, ag_module__):
4
5     def tf__call(self, inputs, training=None):
6       do_return = False
7       retval_ = ag__.UndefinedReturnValue()
8       import inspect, sys
9       print(ag__.converted_call('join', '\n', ag__.ConversionOptions(recursive=True, force_conversion=False, optional_features=(), internal_convert_user_code=True), (('%30s : %s:%d' % (t[3], t[1], t[2]) for t in ag__.converted_call('getouterframes', inspect, ag__.ConversionOptions(recursive=True, force_conversion=False, optional_features=(), internal_convert_user_code=True), (ag__.converted_call('_getframe', sys, ag__.ConversionOptions(recursive=True, force_conversion=False, optional_features=(), internal_convert_user_code=True), (), None),), None)[::-1]),), None))
10       with open('/tmp/bug856/better.py', 'w') as fout:
11         import linecache
12         ag__.converted_call('write', fout, ag__.ConversionOptions(recursive=True, force_conversion=False, optional_features=(), internal_convert_user_code=True), (ag__.converted_call('join', '', ag__.ConversionOptions(recursive=True, force_conversion=False, optional_features=(), internal_convert_user_code=True), (ag__.converted_call('getlines', linecache, ag__.ConversionOptions(recursive=True, force_conversion=False, optional_features=(), internal_convert_user_code=True), (ag__.converted_call('stack', inspect, ag__.ConversionOptions(recursive=True, force_conversion=False, optional_features=(), internal_convert_user_code=True), (), None)[2].filename,), None),), None),), None)
13       print('******** in call ******************')
14
15     ... lines omitted ...
16
17     return tf__call
18   return create_converted_entity

This is the code from our original routenet_model.py (including all the debugging code that I put in there), translated into some kind of annotated form. The reason coverage.py said the product code wasn’t run is because it wasn’t run! A copy of the code was run.

Now I realize something about inspect.stack(): the first frame it shows is your caller. If I had used a stack trace that showed the current frame first, it would have shown that my debugging code was not in the file I thought it was.

It turns out that inspect.stack() is a one-line helper using other things:

def stack(context=1):
    """Return a list of records for the stack above the caller's frame."""
    return getouterframes(sys._getframe(1), context)
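
So a stack one-liner that starts from the current frame looks something like this (same format as before, just built on getouterframes directly):

import inspect, sys
print("\n".join(
    "%30s : %s:%d" % (t[3], t[1], t[2])
    for t in inspect.getouterframes(sys._getframe())[::-1]
))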

Changing my stack trace one-liner to use getouterframes(sys._getframe()) is better, but is still confusing in this case because TensorFlow rewrites function calls, including sys._getframe, so the resulting stack trace ends with:

__call__ : site-packages/tensorflow/python/keras/engine/base_layer.py:634
       wrapper : site-packages/tensorflow/python/autograph/impl/api.py:146
converted_call : site-packages/tensorflow/python/autograph/impl/api.py:453
      tf__call : /var/folders/j2/gr3cj3jn63s5q8g3bjvw57hm0000gp/T/tmpcwhc1y2a.py:10
converted_call : site-packages/tensorflow/python/autograph/impl/api.py:321

Even now, I can’t quite wrap my head around why it comes out that way.

The next step is to decide what to do about this. The converted code has a parameter called ag_source_map__, which is a map from converted code back to source code. This could be used to get the coverage right, perhaps in a plugin, but I need to hear from TensorFlow people to see what would be the best approach. I’ve written a TensorFlow issue to start the conversation.

Pytest-cov support for who-tests-what

Tuesday 8 October 2019

I’ve added a new option to the pytest-cov coverage plugin for pytest: --cov-context=test will set the dynamic context based on pytest test phases. Each test has a setup, run, and teardown phase. This gives you the best test information in the coverage database:

  • The full test id is used in the context. You have the test file name, and the test class name if you are using class-based tests.
  • Parameterized tests start a new context for each new set of parameter values.
  • Execution is a little faster because coverage.py doesn’t have to poll for test starts.

For example, here is a repo of simple pytest tests in a number of forms: pytest-gallery. I can run the tests with test contexts being recorded:

$ pytest -v --cov=. --cov-context=test
======================== test session starts =========================
platform darwin -- Python 3.6.9, pytest-5.2.1, py-1.8.0, pluggy-0.12.0 -- /usr/local/virtualenvs/pytest-cov/bin/python3.6
cachedir: .pytest_cache
rootdir: /Users/ned/lab/pytest-gallery
plugins: cov-2.8.1
collected 25 items

test_fixtures.py::test_fixture PASSED                          [  4%]
test_fixtures.py::test_two_fixtures PASSED                     [  8%]
test_fixtures.py::test_with_expensive_data PASSED              [ 12%]
test_fixtures.py::test_with_expensive_data2 PASSED             [ 16%]
test_fixtures.py::test_parametrized_fixture[1] PASSED          [ 20%]
test_fixtures.py::test_parametrized_fixture[2] PASSED          [ 24%]
test_fixtures.py::test_parametrized_fixture[3] PASSED          [ 28%]
test_function.py::test_function1 PASSED                        [ 32%]
test_function.py::test_function2 PASSED                        [ 36%]
test_parametrize.py::test_parametrized[1-101] PASSED           [ 40%]
test_parametrize.py::test_parametrized[2-202] PASSED           [ 44%]
test_parametrize.py::test_parametrized_with_id[one] PASSED     [ 48%]
test_parametrize.py::test_parametrized_with_id[two] PASSED     [ 52%]
test_parametrize.py::test_parametrized_twice[3-1] PASSED       [ 56%]
test_parametrize.py::test_parametrized_twice[3-2] PASSED       [ 60%]
test_parametrize.py::test_parametrized_twice[4-1] PASSED       [ 64%]
test_parametrize.py::test_parametrized_twice[4-2] PASSED       [ 68%]
test_skip.py::test_always_run PASSED                           [ 72%]
test_skip.py::test_never_run SKIPPED                           [ 76%]
test_skip.py::test_always_skip SKIPPED                         [ 80%]
test_skip.py::test_always_skip_string SKIPPED                  [ 84%]
test_unittest.py::MyTestCase::test_method1 PASSED              [ 88%]
test_unittest.py::MyTestCase::test_method2 PASSED              [ 92%]
tests.json::hello PASSED                                       [ 96%]
tests.json::goodbye PASSED                                     [100%]

---------- coverage: platform darwin, python 3.6.9-final-0 -----------
Name                  Stmts   Miss  Cover
-----------------------------------------
conftest.py              16      0   100%
test_fixtures.py         19      0   100%
test_function.py          4      0   100%
test_parametrize.py       8      0   100%
test_skip.py             12      3    75%
test_unittest.py         17      0   100%
-----------------------------------------
TOTAL                    76      3    96%


=================== 22 passed, 3 skipped in 0.18s ====================

Then I can see the contexts that were collected:

$ sqlite3 -csv .coverage "select context from context"
context
""
test_fixtures.py::test_fixture|setup
test_fixtures.py::test_fixture|run
test_fixtures.py::test_two_fixtures|setup
test_fixtures.py::test_two_fixtures|run
test_fixtures.py::test_with_expensive_data|setup
test_fixtures.py::test_with_expensive_data|run
test_fixtures.py::test_with_expensive_data2|run
test_fixtures.py::test_parametrized_fixture[1]|setup
test_fixtures.py::test_parametrized_fixture[1]|run
test_fixtures.py::test_parametrized_fixture[2]|setup
test_fixtures.py::test_parametrized_fixture[2]|run
test_fixtures.py::test_parametrized_fixture[3]|setup
test_fixtures.py::test_parametrized_fixture[3]|run
test_function.py::test_function1|run
test_function.py::test_function2|run
test_parametrize.py::test_parametrized[1-101]|run
test_parametrize.py::test_parametrized[2-202]|run
test_parametrize.py::test_parametrized_with_id[one]|run
test_parametrize.py::test_parametrized_with_id[two]|run
test_parametrize.py::test_parametrized_twice[3-1]|run
test_parametrize.py::test_parametrized_twice[3-2]|run
test_parametrize.py::test_parametrized_twice[4-1]|run
test_parametrize.py::test_parametrized_twice[4-2]|run
test_skip.py::test_always_run|run
test_skip.py::test_always_skip|teardown
test_unittest.py::MyTestCase::test_method1|setup
test_unittest.py::MyTestCase::test_method1|run
test_unittest.py::MyTestCase::test_method2|run
test_unittest.py::MyTestCase::test_method2|teardown
tests.json::hello|run
tests.json::goodbye|run
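
The data is all there to go further. As a sketch (not a polished recipe), here’s one way to ask which test contexts executed line 5 of test_fixtures.py, using the bitmap helpers that ship with coverage.py 5.0; the file and line number are arbitrary examples:

import sqlite3
from coverage.numbits import register_sqlite_functions

con = sqlite3.connect(".coverage")
register_sqlite_functions(con)   # adds num_in_numbits() and friends as SQL functions
query = """
    select context.context
    from line_bits
    join file on line_bits.file_id = file.id
    join context on line_bits.context_id = context.id
    where file.path like '%test_fixtures.py'
    and num_in_numbits(5, line_bits.numbits)
"""
for (context,) in con.execute(query):
    print(context)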

Version 2.8.0 of pytest-cov (and later) has the new feature. Give it a try. BTW, I also snuck another alpha of coverage.py (5.0a8) in at the same time, to get a needed API right.

Still missing from all this is a really useful way to report on the data. Get in touch if you have needs or ideas.

Sponsor me on GitHub?

Monday 7 October 2019

tl;dr: You can sponsor me on GitHub, but I’m not sure why you would.

In May, GitHub launched GitHub Sponsors, a feature on their site for people to support each other financially. It’s still in beta, but now I’m in the program, so you can sponsor me if you want.

I’m very interested in the question of how the creators of open source software can benefit more from what they create, considering how much value others get from it.

To be honest, I’m not sure GitHub Sponsors is going to make a big difference. It’s another form of what I’ve called an internet tip jar: it focuses on one person giving another person money. Don’t get me wrong: I’m all for enabling interpersonal connections of all sorts. But I don’t think that will scale to improve the situation meaningfully.

I think a significant shift will only come with a change in how businesses give back to open source, since they are the major beneficiaries. See my post about Tidelift and “Corporations and open source, why and how” for more about this.

I’m participating in GitHub Sponsors because I want to try every possible avenue. Since it’s on GitHub, it will get more attention than most tip jars, so maybe it will work out differently. Participating is a good way for me to understand it.

GitHub lets me define tiers of sponsorship, with different incentives, similar to Kickstarter. I don’t know what will motivate people, and I don’t have existing incentives at my fingertips to offer, so I’ve just created three generic tiers ($3, $10, $30 per month). If GitHub Sponsors appeals to you, let me know what I could do with a tier that might attract other people.

The question mark in the title is not because I’m making a request of you. It’s because I’m uncertain whether and why people will become sponsors through GitHub Sponsors. We’ll see what happens.

Coverage.py 5.0a7, and the future of pytest-cov

Tuesday 24 September 2019

Progress continues in the Python coverage world. Two recent things: first, the latest alpha of Coverage.py 5.0 is available: 5.0a7. Second, pytest-cov is supporting coverage.py 5.0, and we’re talking about the future of pytest-cov.

There are two big changes in Coverage.py 5.0a7. First, there is a new reporting command: coverage json produces a JSON file with information similar to the XML report. In coverage.py 4.x, the data storage was a lightly cloaked JSON file. That file was not in a supported format, and in fact, it is gone in 5.0. This command produces a supported JSON format for people who want programmatic access to details of the coverage data. A huge thanks to Matt Bachmann for implementing it.
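
For example, here’s a sketch of consuming the JSON report programmatically. It assumes the default coverage.json output name and the files/summary layout of the report, so check the key names against your version:

import json, subprocess

subprocess.run(["coverage", "json"], check=True)   # writes coverage.json by default

with open("coverage.json") as f:
    data = json.load(f)

for path, info in data["files"].items():
    print(path, info["summary"]["percent_covered"])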

The second big change is to the SQL schema in the 5.x data file, which is a SQLite database. Previously, each line measured produced a row in the “line” table. But this proved too bulky for large projects. Now line numbers are stored in a compact binary form. There is just one row in the “line_bits” table for each file and context measured. This makes it more difficult to use the data with ad-hoc queries. Coverage provides functions for working with the line number bitmaps, but I’m interested in other ideas about how to make the data more usable.
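
To make that concrete, here’s a tiny sketch of the helpers in coverage.numbits:

from coverage.numbits import nums_to_numbits, numbits_to_nums

blob = nums_to_numbits([1, 2, 3, 17])   # the compact bytes stored in line_bits.numbits
print(numbits_to_nums(blob))            # [1, 2, 3, 17]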

The pytest-cov changes are to support coverage.py 5.0. Those changes are already on the master branch.

I’m also working on a pull request to add a --cov-contexts=test option so that pytest can announce when tests change, for accurate and detailed dynamic contexts.

Longer-term, I’d like to shrink the size of the pytest-cov plugin. Pytest should be about running tests, not reporting on coverage after the tests are run. Too much of the code, and too many of the bug reports, are due to it trying to take on more than it needs to. The command-line arguments are getting convoluted, for no good reason. I’ve written an issue to get feedback: Proposal: pytest-cov should do less. If you have opinions one way or the other, that would be a good place to talk about them.

Variable fonts

Tuesday 17 September 2019

We’re all used to fonts coming in different weights (normal, bold), or sometimes different widths (normal, condensed, extended). Geometrically, there’s no reason that these variations need to be discrete. It’s a limitation of technology that we’ve been given a few specific weights or widths to choose from.

Over the years there have been a few attempts to make those variation dimensions continuous rather than discrete. Knuth’s Metafont was one, Adobe’s Multiple Master fonts were another. The latest is OpenType’s variable fonts.

In a variable font, the type designer not only decides on the shapes of the glyphs, but on the axes of variability. Weight and width are two obvious ones, but the choice is arbitrary.

One of the great things about variable fonts is that browsers have good support for them. You can use a variable font on a web page, and set the values of the variability dimensions using CSS.

Browser support also means you can play with the variability without any special tools. Nick Sherman’s v-fonts.com is a gallery and playground of variable fonts. Each is displayed with sliders for its axes. You can drag the sliders and see the font change in real time in your browser.

Many of the fonts are gimmicky, either to show off the technology, or because exotic display faces are where variability can be used most broadly. But here are a few of my choices that demonstrate variability to its best advantage:

Antonia Variable includes an optical size axis. Optical size refers to the adjustments that have to be made to shapes to compensate for the size of the font. At tiny sizes, letters have to be wider and their features sturdier for the type to remain legible while still looking like the same family. It’s kind of like how babies have the same features as adults, but smaller and plumper.

Sample of Antonia Variable

Bradley DJR Variable is another good example of an optical size axis.

Sample of Bradley DJR Variable

UT Morph is an ultra-geometric display face with two stark axes, positive and negative. This shows how variability can be used to control completely new aspects of a design.

Sample of UT Morph

Recursive has some really interesting axes that use variability in eye-opening ways without being cartoonish: proportion (how monospaced is it), expression (how swoopy is it), and italic (changes a few letter shapes).

Sample of Recursive

Variable fonts are still a new technology, but we’ll see them being used more and more. Don’t expect to see fonts stretching and squashing before your eyes though. Site designers will use variability to make some choices, and you won’t even realize variability was involved. Like all good typography, it won’t draw attention to itself.
