Sunday 22 January 2017 — This is almost eight years old. Be careful.

It was the best of times, it was the worst of times...

This week saw the release of three different versions of Coverage.py. This is not what I intended. Clearly something was getting tangled up. It had to do with some tricky exception handling. The story is kind of long and intricate, but has a number of chewy nuggets that fascinate me. Your mileage may vary.

Writing it all out, many of these missteps seem obvious and stupid. If you take nothing else from this, know that everyone makes mistakes, and we are all still trying to figure out the best way to solve some problems.

It started because I wanted to get the test suite running well on Jython. Jython is hard to support in Coverage.py: it can do “coverage run”, but because it doesn’t have the same internals as CPython, it can’t do “coverage report” or any of the other reporting code. Internally, there’s one place in the common reporting code where we detect this, and raise an exception. Before all the changes I’m about to describe, that code looked like this:

for attr in ['co_lnotab', 'co_firstlineno']:
    if not hasattr(self.code, attr):
        raise CoverageException(
            "This implementation of Python doesn't support code analysis.\n"
            "Run coverage.py under CPython for this command."
        )

The CoverageException class is derived from Exception. Inside of Coverage.py, all exceptions raised are derived from CoverageException. This is a good practice for any library. For the coverage command-line tool, it means we can catch CoverageException at the top of main() so that we can print the message without an ugly traceback from the internals of Coverage.py.

The problem with running the test suite under Jython is that this “can’t support code analysis” exception was being raised from hundreds of tests. I wanted to get to zero failures or errors, either by making the tests pass (where the operations were supported on Jython) or skipping the tests (where the operations were unsupported).

There are lots of tests in the Coverage.py test suite that are skipped for all sorts of reasons. But I didn’t want to add decorators or conditionals to hundreds of tests for the Jython case. First, it would be a lot of noise in the tests. Second, it’s not always immediately clear from a test that it is going to touch the analysis code. Lastly and most importantly, if someday in the future I figured out how to do analysis on Jython, or if it grew the features to make the current code work, I didn’t want to have to then remove all that test-skipping noise.

So I wanted to somehow automatically skip tests when this particular exception was raised. The unittest module already has a way to do this: tests are skipped by raising a unittest.SkipTest exception. If the exception raised for “can’t support code analysis” derived from SkipTest, then the tests would be skipped automatically. Genius idea!

So in 4.3.2, the code changed to this (spread across a few files):

from coverage.backunittest import unittest

class StopEverything(unittest.SkipTest):
    """An exception that means everything should stop.

    This derives from SkipTest so that tests that spring this trap will be
    skipped automatically, without a lot of boilerplate all over the place.

    """
    pass

class IncapablePython(CoverageException, StopEverything):
    """An operation is attempted that this version of Python cannot do."""
    pass

...

# Alternative Python implementations don't always provide all the
# attributes on code objects that we need to do the analysis.
for attr in ['co_lnotab', 'co_firstlineno']:
    if not hasattr(self.code, attr):
        raise IncapablePython(
            "This implementation of Python doesn't support code analysis.\n"
            "Run coverage.py under another Python for this command."
        )

It felt a little off to derive a product exception (StopEverything) from a testing exception (SkipTest), but that seemed acceptable. One place in the code, I had to deal specifically with StopEverything. In an inner loop of reporting, we catch exceptions that might happen on individual files being reported. But if this exception happens once, it will happen for all the files, so we wanted to end the report, not show this failure for every file. In pseudo-code, the loop looked like this:

for f in files_to_report:
    try:
        generate_report_for_file(f)
    except StopEverything:
        # Don't report this on single files, it's a systemic problem.
        raise
    except Exception as ex:
        record_exception_for_file(f, ex)

This all seemed to work well: the tests skipped properly, without a ton of noise all over the place. There were no test failures in any supported environment. Ship it!

Uh-oh: very quickly, reports came in that coverage didn’t work on Python 2.6 any more. In retrospect, it was obvious: the whole point of the “from coverage.backunittest” line in the code above was because Python 2.6 doesn’t have unittest.SkipTest. For the Coverage.py tests on 2.6, I install unittest2 to get a backport of things 2.6 is missing, and that gave me SkipTest, but without my test requirements, it doesn’t exist.

So my tests passed on 2.6 because I installed a package that provided what was missing, but in the real world, unittest.SkipTest is truly missing.

This is a conundrum that I don’t have a good answer to:

How can you test your code to be sure it works properly when the testing requirements aren’t installed?

To fix the problem, I changed the definition of StopEverything. Coverage.py 4.3.3 went out the door with this:

class StopEverything(unittest.SkipTest if env.TESTING else object):
    """An exception that means everything should stop."""
    pass

The env.TESTING setting was a pre-existing variable: it’s true if we are running the coverage.py test suite. This also made me uncomfortable: as soon as you start conditionalizing on whether you are running tests or not, you have a very slippery slope. In this case it seemed OK, but it wasn’t: it hid the fact that deriving an exception from object is a dumb thing to do.

So 4.3.3 failed also, and not just on Python 2.6. As soon as an exception was raised inside that reporting loop that I showed above, Python noticed that I was trying to catch a class that doesn’t derive from Exception. Of course, my test suite didn’t catch this, because when I was running my tests, my exception derived from SkipTest.

Changing “object” to “Exception” would fix the problem, but I didn’t like the test of env.TESTING anyway. So for 4.3.4, the code is:

class StopEverything(getattr(unittest, 'SkipTest', Exception)):
    """An exception that means everything should stop."""
    pass

This is better, first because it uses Exception rather than object. But also, it’s duck-typing the base class rather than depending on env.TESTING.

But as I kept working on getting rid of test failures on Jython, I got to this test failure (pseudo-code):

def test_sort_report_by_invalid_option(self):
    msg = "Invalid sorting option: 'Xyzzy'"
    with self.assertRaisesRegex(CoverageException, msg):
        coverage.report(sort='Xyzzy')

This is a reporting operation, so Jython will fail with a StopEverything exception saying, “This implementation of Python doesn’t support code analysis.” StopEverything is a CoverageException, so the assertRaisesRegex will catch it, but it will fail because the messages don’t match.

StopEverything is both a CoverageException and a SkipTest, but the SkipTest is the more important aspect. To fix the problem, I did this, but felt silly:

def test_sort_report_by_invalid_option(self):
    msg = "Invalid sorting option: 'Xyzzy'"
    with self.assertRaisesRegex(CoverageException, msg):
        try:
            coverage.report(sort='Xyzzy')
        except SkipTest:
            raise SkipTest()

I knew this couldn’t be the right solution. Talking it over with some co-workers (OK, I was griping and whining), we came up with the better solution. I realized that CoverageException is used in the code base to mean, “an ordinary problem from inside Coverage.py.” StopEverything is not an ordinary problem. It reminded me of typical mature exception hierarchies, where the main base class, like Exception, isn’t actually the root of the hierarchy. There are always a few special-case classes that derive from a real root higher up.

For example, in Python, the classes Exception, SystemExit, and KeyboardInterrupt all derive from BaseException. This is so “except Exception” won’t interfere with SystemExit and KeyboardInterrupt, two exceptions meant to forcefully end the program.

I needed the same thing here, for the same reason. I want to have a way to catch “all” exceptions without interfering with the exceptions that mean “end now!” I adjusted my exception hierarchy, and now the code looks like this:

class BaseCoverageException(Exception):
    """The base of all Coverage exceptions."""
    pass

class CoverageException(BaseCoverageException):
    """A run-of-the-mill exception specific to coverage.py."""
    pass

class StopEverything(
        BaseCoverageException,
        getattr(unittest, 'SkipTest', Exception)
    ):
    """An exception that means everything should stop."""
    pass

Now I could remove the weird SkipTest dance in that test. The catch clause in my main() function changes from CoverageException to BaseCoverageException, and things work great. The end...?

One of the reasons I write this stuff down is because I’m hoping to get feedback that will improve my solution, or advance my understanding. As I lay out this story, I can imagine points of divergence: places in this narrative where a reader might object and say, “you should blah blah blah.” For example:

“You shouldn’t bother supporting 2.6.” Perhaps not, but that doesn’t change the issues explored here, just makes them less likely.
“You shouldn’t bother supporting Jython.” Ditto.
“You should just have dependencies for the things you need, like unittest2.” Coverage.py has a long-standing tradition of having no dependencies. This is driven by a desire to be available to people porting to new platforms, without having to wait for the dependencies to be ported.
“You should have more realistic integration testing.” I agree. I’m looking for ideas about how to test the scenario of having no test dependencies installed.

That’s my whole tale. Ideas are welcome.

Update: the story continues, but fair warning: metaclasses ahead!

Comments

Ionel Cristian Mărieș 12:11 PM on 23 Jan 2017

It looks to me like the typical case of „test-induced design damage”. Code should not be burdened with test runner workarounds.

I bet that one day someone will complain about coverage making their test runner skip tests.

I think a better way is to handle this in coverage's test suite. Possible solution: wrap all your tests in a decorator that reraises with a SkipException.

Ned Batchelder 12:40 PM on 23 Jan 2017

@Ionel, that's an interesting point. Do you have an example of wrapping the tests that doesn't involve adding a decorator to every test or test class?

Ionel Cristian Mărieș 1:47 PM on 23 Jan 2017

How about this:

@pytest.hookimpl(hookwrapper=True)
def pytest_runtest_call(item):
    result = yield
    if result.excinfo and result.excinfo[0] is CrappoException:
        pytest.skip(str(result.excinfo[1]))

(Ronny approves :D)

Julian Berman 11:08 AM on 24 Jan 2017

For integration testing, I'm increasingly using auxiliary toxenvs which do not install test requirements and which perform a simple smoke test of one known configuration. Then you just run them alongside your toxenvs which run your test suite. We do this for e.g. permutations of our setuptools extras.

Sample:

[toxenv:cli]
deps =
commands =
pip install {toxinidir}
{envbindir}/coverage run someexample
{envbindir}/coverage html

You can get more elaborate obviously with your smoke testing, but yeah, in a basic sense, tox has worked OK for that use case.

It sounds like that in this case even the simple smoke test there would have possibly found your issue?

Leon Matthews 10:16 PM on 26 Jan 2017

Wonderful write-up Ned. Thank you for taking the time to share it with us all. I can certainly relate - we ignore our gut feelings (life-data trained heuristics?) at our peril...

Hartmut Niemann 4:37 PM on 16 Feb 2017

Big thank you!

This python coverage tool is simply great. I had a test suite for some c-python 2.7 code and i was able to produce (beautifully colored) reports in about an hour or so. THANK YOU ALL!

For Jython, I needed a little longer, but a few minutes ago I got the first meaningful coverage list.

On Windows, file path case seems to make a difference, I got zero coverage when I ran 030_src\mytool.py instead of 030_Src\mytool.py

A tale of two exceptions

Comments

Add a comment: