Flaws in coverage measurement
Ned Batchelder's Blog, October 2007
Coverage testing is a great way to find out what parts of your code are not tested by your test suite. You turn on coverage.py, then run your tests. At the end, coverage can show you which lines were never executed, either by line number or visually in an annotated source file.
When your test coverage is less than 100%, coverage testing works well: it points you to the lines in your code that are never run, showing the way to new tests to write. The ultimate goal, of course, is to get your test coverage to 100%.
But then you have a problem, because 100% test coverage doesn't really mean much. There are dozens of ways your code or your tests could still be broken, but now you aren't getting any directions. The measurement coverage.py provides is more accurately called statement coverage, because it tells you which statements were executed. Statement coverage testing has taken you to the end of its road, and the bad news is, you aren't at your destination, but you've run out of road.
By way of illustration, here are a few examples of 100% statement coverage of buggy code.
With multiple branches in a function, there may be combinations that aren't tested, even though each individual line is covered by a test:
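A sketch of this kind of bug (the function and names are made up for illustration):

```python
def label(n, upper):
    # A placeholder of None was left in for negative numbers.
    if n < 0:
        text = None
    else:
        text = str(n)
    if upper:
        text = text.upper()  # crashes when text is None
    return text

# These two calls execute every line, so statement coverage is 100%,
# yet the untested combination label(-1, True) raises AttributeError.
assert label(5, True) == "5"
assert label(-1, False) is None
```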
Loops can have similar issues:
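For example, a loop tested only with a non-empty list (a made-up sketch):

```python
def average(nums):
    total = 0
    for n in nums:
        total += n
    # A non-empty list covers every line, but the untested
    # zero-iteration case divides by zero here.
    return total / len(nums)

# 100% statement coverage, and average([]) still blows up.
assert average([1, 2, 3]) == 2
```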
You can often simplify a function by putting complexity into data tables, but there's no way to measure which parts of a data structure were used:
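Something like this hypothetical table: the dict literal counts as one executed statement, so coverage can't say which entries were ever looked up.

```python
DAYS_IN_MONTH = {
    "jan": 31,
    "feb": 30,  # wrong, but no test ever reaches this entry
    "mar": 31,
}

def days_in(month):
    return DAYS_IN_MONTH[month]

# 100% statement coverage, and the "feb" bug goes unnoticed.
assert days_in("jan") == 31
```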
Real code often contains implied conditionals that don't live on a separate line to be measured:
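A sketch of one such hidden conditional (made-up function): the "or" short-circuits, so the division is really a second, invisible branch.

```python
def is_reciprocal_big(x):
    # Typo: the guard should read "x == 0", not "x < 0".
    if x < 0 or 1 / x > 1:
        return True
    return False

# One test takes each branch, so every line is covered...
assert is_reciprocal_big(-1) is True   # first clause short-circuits
assert is_reciprocal_big(2) is False
```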
Although we have 100% coverage, we never found out that due to a typo, the second half of the condition will divide by zero.
Conditionals can also be hidden inside functions that aren't being measured in the first place.
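For instance (a made-up sketch using str.replace):

```python
def normalize_newlines(text):
    # str.replace is effectively an if: "does text contain '\r\n'?"
    # That branch lives inside replace, where coverage can't see it.
    return text.replace("\r\n", "\n")

# 100% coverage, but the "substring is present" path is never taken.
assert normalize_newlines("hello") == "hello"
```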
The replace method here is essentially a big if statement on the condition that the string contains the substring being replaced. Our test never takes that path, but the if is hidden from us, so our coverage testing doesn't help us find the missed coverage.
Just because your tests execute the code doesn't mean they properly test the results.
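A sketch of the problem (the broken "sort" and its weak tests are invented for illustration):

```python
def sort_list(nums):
    # A broken "sort" that merely copies the list.
    return list(nums)

# Every line runs and every assertion passes, because the tests check
# length and membership but never the order.
result = sort_list([3, 1, 2])
assert len(result) == 3
assert set(result) == {1, 2, 3}
```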
Here our "sort" routine passes all the tests, and the coverage is 100%. But, oops, we forgot to check that the list returned is really sorted.
Of course, these examples are absurd. It's easy to see where we went wrong in each of them. Most likely, though, your tests have the same underlying problems, but in ways that are much more difficult to find.
Improved tools could help some of these cases, but not all. Some C-based tools provide branch analysis that could help with the path problems above. But no tool can guarantee there aren't path problems (what if a loop works incorrectly if executed a prime number of times?), and no tool will point out that your tests aren't checking the important things about results.
For more on the problems of coverage testing, the Wikipedia article on Code Coverage has a number of fine jumping-off points. Cem Kaner has a depressingly exhaustive overview of the Measurement of the Extent of Testing. After perusing it, you may wonder why you bother with puny statement coverage testing at all!
Statement coverage testing is a good measure of what isn't being tested in your code. It's a good start for understanding the completeness of your tests. Brian Marick's How to Misuse Code Coverage sums it up best: "Coverage tools are only helpful if they're used to enhance thought, not replace it."