Small choices, big decisions: coverage run --append

Wednesday 5 August 2015

A seemingly simple change to fix a small bug led me to some interesting software design choices. I’ll try to explain.

In the new beta of coverage.py, I had a regression where the “run --append” option didn’t work when there wasn’t an existing data file. The problem was code in class CoverageScript in cmdline.py that looked like this:

if options.append:
    self.coverage.combine(".coverage")
self.coverage.save()

If there was no .coverage data file, then this code would fail. The fix was really simple: just check if the file exists before trying to combine it:

if options.append:
    if os.path.exists(".coverage"):
        self.coverage.combine(".coverage")
self.coverage.save()

(Of course, all of these code examples have been simplified from the actual code...)

The problem with this has to do with how the CoverageScript class is tested. It’s responsible for dealing with the command-line syntax, and invoking methods on a coverage.Coverage object. To make the testing faster and more focused, test_cmdline.py uses mocking. It doesn’t use an actual Coverage object, it uses a mock, and checks that the right methods are being invoked on it.

The test for this bit of code looked like this, using a mocking helper that works from a sketch of methods being invoked:

self.cmd_executes("run --append foo.py", """\
    .Coverage()
    .start()
    .run_python_file('foo.py', ['foo.py'])
    .stop()
    .combine('.coverage')
    .save()
    """, path_exists=True)

This test means that “run --append foo.py” will make a Coverage object with no arguments, then call cov.start(), then cov.run_python_file with two arguments, etc.
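If the sketch style is unfamiliar, here is a rough illustration of the idea behind a helper like cmd_executes. This is not the real helper, just a simplified stand-in built on unittest.mock: run the command against a mocked Coverage class, then compare the calls the mock recorded with the calls you expected.

from unittest import mock

def check_calls(run_command, expected_calls):
    # Hand the code under test a mocked Coverage class, then compare the
    # calls the mock recorded against the expected list.
    fake_coverage_class = mock.MagicMock(name="Coverage")
    run_command(fake_coverage_class)
    assert fake_coverage_class.mock_calls == expected_calls

def fake_run_append(Coverage):
    # A toy stand-in for what cmdline.py does for "run --append foo.py".
    cov = Coverage()
    cov.start()
    cov.run_python_file('foo.py', ['foo.py'])
    cov.stop()
    cov.combine('.coverage')
    cov.save()

check_calls(fake_run_append, [
    mock.call(),
    mock.call().start(),
    mock.call().run_python_file('foo.py', ['foo.py']),
    mock.call().stop(),
    mock.call().combine('.coverage'),
    mock.call().save(),
])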

The problem is that the product code (cmdline.py) will actually call os.path.exists, and maybe call .combine, depending on what it finds. This mocking test can’t easily take that into account. The design of cmdline.py was that it was a thin-ish wrapper over the methods on a Coverage object. This made the mocking strategy straightforward. Adding logic in cmdline.py makes the testing more complicated.

OK, second approach: change Coverage.combine() to take a missing_ok=True parameter. Now cmdline.py could tell combine() to not freak out if the file didn’t exist, and we could remove the os.path.exists conditional from cmdline.py. The code would look like this:

if options.append:
    self.coverage.combine(".coverage", missing_ok=True)
self.coverage.save()

and the test would now look like this:

self.cmd_executes("run --append foo.py", """\
    .Coverage()
    .start()
    .run_python_file('foo.py', ['foo.py'])
    .stop()
    .combine('.coverage', missing_ok=True)
    .save()
    """, path_exists=True)

Coverage.combine() is part of the public API to coverage.py. Was I really going to extend that supported API for this use case? It would mean documenting, testing, and promising to support that option “forever”. There’s no nice way to add an unsupported argument to a supported method.

Extending the supported API to simplify my testing seemed like the tail wagging the dog. I’m all for letting testing concerns inform a design. Often the tests are simply proxies for the users of your API, and what makes the testing easier will also make for a better, more modular design.

But this just felt like me being lazy. I didn’t want combine() to have a weird option just to save the caller from having to check if the file exists. I imagined explaining this option to someone else, and I didn’t want my future self to have to sheepishly admit, “yeah, it made my tests easier...”

What finally turned me back from this choice was the principle of saying “no.” Sometimes the best way to keep a product simple and good is to say “no” to extraneous features. Setting aside all the testing concerns, this option on Coverage.combine() just felt extraneous.

Having said “no” to changing the public API, it’s back to a conditional in cmdline.py. To make testing CoverageScript easier, I use dependency injection to give the object a function to check for files. CoverageScript already had parameters on the constructor for this purpose, for example to get the stand-in for the Coverage class itself. Now the constructor will look like:

class CoverageScript(object):
    """The command-line interface to coverage.py."""

    def __init__(self, ..., _path_exists=None):
        ...
        self.path_exists = _path_exists or os.path.exists

    def do_run(self, options, args):
        ...

        if options.append:
            if self.path_exists(".coverage"):
                self.coverage.combine(".coverage")
        self.coverage.save()

and the test code can provide a mock for _path_exists and check its arguments:

self.cmd_executes("run --append foo.py", """\
    .Coverage()
    .start()
    .run_python_file('foo.py', ['foo.py'])
    .stop()
    .path_exists('.coverage')
    .combine('.coverage')
    .save()
    """, path_exists=True)

Yes, this makes the testing more involved. But that’s my business, and this doesn’t change the public interface in ways I didn’t like.

When I started writing this blog post, I was absolutely certain I had made the right choice. As I wrote it, I wavered a bit. Would missing_ok=True be so bad to add to the public interface? Maybe not. It’s not such a stretch, and a user of the API might plausibly convince me that it’s genuinely helpful to them. If that happens, I can reverse all this. That would be ok too. Decisions, decisions...

Comments

[gravatar]
Have you considered not using a mock at all? This code looks to me (to use the language of Gary Bernhardt) like it falls into the outer “procedural glue” of Coverage, not the functional core on the inside, and that therefore you should simply test it as part of your handful of integration tests where the code runs real I/O against real files.
[gravatar]
Another way to look at it is that, aside from the do_run method and one use of os.path.exists, cmdline.py is functionally pure. It takes the command-line arguments and builds a work plan or outputs useful diagnostics/documentation.

The test is painful to add because os.path.exists exercises a side effect to pull in the enormous super-global variable that is system state. It's painful and out-of-place in the existing tests because the rest don't need it. This object isn't about producing side effects, it's about parsing user input.

Imagine if the data of what work to do were separated from the side-effectful methods that read system state and kick off test jobs. You'd have your existing tests of all the command-line arg parsing that's going to call methods with different args and assert on their return values, and for the new code you'd have a new test file that's going to pass those data objects in, mock the outside world, and assert that those mocks were called correctly (maybe with a few assertions on return values).
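A sketch of that separation (all names made up, not real coverage.py code):

from collections import namedtuple

# Pure data describing the work to do, decided entirely from the arguments.
RunPlan = namedtuple("RunPlan", "program append")

def parse_run_args(args):
    # Pure: testable with plain asserts, no mocks, no filesystem.
    program = [a for a in args if not a.startswith("--")][0]
    return RunPlan(program=program, append="--append" in args)

def execute(plan, path_exists, combine, save):
    # The effectful edge: the side-effecting callables are passed in, so a
    # test can hand it fakes and assert on how they were used.
    if plan.append and path_exists(".coverage"):
        combine(".coverage")
    save()

assert parse_run_args(["--append", "foo.py"]) == RunPlan("foo.py", True)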
[gravatar]
The library "Click" has good support for just such tests as above, e.g. invoking random CLI commands with temporary filesystems etc.

Probably overkill to switch your entire app to a different command-line argument parser for a single test, but thought it might be useful for others...

Also, another way to fix your .combine() call would be to simply create an empty file if there isn't a .coverage already. Also kind of a distasteful hack. Yet another would be to add an "(append=True)" option to .save() -- which might be easier for others using coverage via Python instead of the CLI.
[gravatar]
@brandon, I guess I am not as well-versed in the teachings of Gary as I should be. :) I don't understand what makes this code ill-suited to mocking? I certainly don't understand why its place in the coverage.py world makes it need integration tests. As Peter points out, it's nearly pure-functional.

On a purely practical level, the command-line parsing has a number of combinations to try; covering them all with integration tests would add a lot of time.
[gravatar]
@Peter: I'm not following your description of a better way to approach this. It sounds kind of like what I've already done, so I must be missing something.
[gravatar]
Let me unpack my comment a bit. If we want to test this:
def add_pure(a, b):
  return a + b
We can test by passing in various arguments and asserting on the returned value. "Pure" functions like add_pure depend only on their arguments and change nothing else in the system. It is "referentially transparent": we could cache or replace any call to add_pure with the sum of its arguments.

If we want to test this:
def add_effectful(a, b):
  with open('/tmp/sum', 'w') as f:
    f.write(str(a + b))
We're in a whole new world of testing because add_effectful has "side effects". This side effect writes to the filesystem and is not referentially transparent. Our test can't assert on the return value. In an isolated test we'd have to mock and assert that it was called properly; in an integrated test we'd have to check the effect happened (eg the file now exists with the right contents).
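For example (just a sketch of the two styles, testing add_effectful above):

from unittest import mock

# Isolated test: replace open with a mock and assert on how it was used.
with mock.patch("builtins.open", mock.mock_open()) as m:
    add_effectful(2, 3)
m.assert_called_once_with('/tmp/sum', 'w')
m().write.assert_called_once_with('5')

# Integrated test: let it really write, then check the effect on disk.
add_effectful(2, 3)
with open('/tmp/sum') as f:
    assert f.read() == '5'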

This side effect is really obvious because it crosses the process boundary by writing to the filesystem, but it's still a side effect if it updated a global variable, ran a query (even a SELECT) against the database, or did anything else that another method might detect as a change in state.

The filesystem is really tricky because it feels solid and standard, but it is actually a super-global variable. It is state that is accessible from anywhere in your program, and even from other runs of your program.

I went and skimmed coverage.py and saw that only do_run and one use of os.path.exists have side effects. do_run is effectful because it's calling arbitrary code that could do anything (side effects are transitive - if you call a method with side effects, then to anything that calls you, you have side effects too). os.path.exists is effectful because it's reading external state.

If not for those two bits of code, you could test all of coverage.py in the simple style where you pass args and assert, rather than integrate or set up mocks and expect.

The code you contemplated adding was also effectful, and that seemed to be the thing you were struggling with. I think the whole piece of code would benefit from separating out sections that have side effects.

If you want more examples and a longer explanation I gave a talk on this earlier this year. I used Ruby/ActiveRecord examples, but if you squint it applies equally well to Python/Django ORM. https://push.cx/2015/railsconf.
[gravatar]
@Peter: I see your point, but I'm not quite with you on the conclusion. I can definitely see that os.path.exists is a side-effect, and weighed the pros and cons of using it in this blog post. Sounds like you would have changed the public API to make the testing simpler.

I'm not sure how to remove the rest of the side-effects from CoverageScript. After all, its entire purpose is to actually *do* things, like run your program. I could make it purely functional, in that it would write some sort of other program that would then get executed. That seems like a long way to go to get nice testing. It would essentially move the funky mocking sketch thing I have in test_cmdline.py into cmdline.py itself. What is the value in that?
[gravatar]
To me the missing_ok API feels slightly better, because it doesn't rely on the Look Before You Leap nature of os.path.exists().

In practice the race condition between the existence check and the file maybe disappearing seems unlikely to cause pain.
