Tuesday 30 October 2007
Coverage testing is a great way to find out what parts of your code are not tested by your test suite. You turn on coverage.py, then run your tests. At the end, coverage can show you which lines were never executed, either by line number or visually in an annotated source file.
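As a quick sketch of what that looks like with the modern coverage.py API (the interface has changed over the years, and run_my_tests here is a hypothetical stand-in for whatever runs your test suite):

import coverage

cov = coverage.Coverage()       # modern API; early releases used module-level functions
cov.start()
run_my_tests()                  # hypothetical: invoke your test suite here
cov.stop()
cov.report(show_missing=True)   # prints a "Missing" column of unexecuted line numbers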
When your test coverage is less than 100%, coverage testing works well: it points you to the lines in your code that are never run, showing the way to new tests to write. The ultimate goal, of course, is to get your test coverage to 100%.
But then you have problems, because 100% test coverage doesn’t really mean much. There are dozens of ways your code or your tests could still be broken, but now you aren’t getting any directions. The measurement coverage.py provides is more accurately called statement coverage, because it tells you which statements were executed. Statement coverage testing has taken you to the end of its road, and the bad news is, you aren’t at your destination, but you’ve run out of road.
By way of illustration, here are a few examples of 100% statement coverage of buggy code.
Combinations of paths
With multiple branches in a function, there may be combinations that aren’t tested, even though each individual line is covered by a test:
def two_branches(a, b):
    if a:
        d = 0
    else:
        d = 2
    if b:
        x = 2/d
    else:
        x = d/2
    return x
# These tests give 100% coverage:
two_branches(False, False) == 1
two_branches(True, False) == 0
two_branches(False, True) == 1
# This test fails with a ZeroDivisionError:
two_branches(True, True)
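For a function this small, one way to flush out path bugs is to try every combination of inputs rather than every line; a minimal sketch:

from itertools import product

# Exercise every combination of the two conditions, not just each branch once.
# (True, True) is the path the line-coverage tests above never take.
for a, b in product([False, True], repeat=2):
    two_branches(a, b)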
Loops can have similar issues:
def loop_paths(a):
    while a:
        x = 1
        a -= 1
    return x
# This test gives 100% coverage:
loop_paths(1) == 1
# This test fails with a NameError:
loop_paths(0)
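The untested path here is the one where the loop body runs zero times. A common heuristic is to exercise zero, one, and many iterations; a sketch:

# Zero, one, and many iterations; the zero case exposes the bug:
for n in [0, 1, 10]:
    loop_paths(n)   # n == 0 raises NameError, since x is never assigned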
Data-driven code
You can often simplify a function by putting complexity into data tables, but there’s no way to measure which parts of a data structure were used:
divisors = {
    'x': 1,
    'y': 0,
}

def data_driven(thing):
    return 2/divisors.get(thing)
# This test gives 100% coverage:
data_driven('x') == 2
# This test fails with a ZeroDivisionError:
data_driven('y')
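One defense is to drive the tests from the same table the code uses, so every entry in the data structure gets exercised; a sketch:

# Loop over the data table itself, so no entry escapes testing:
for thing in divisors:
    data_driven(thing)   # raises ZeroDivisionError when thing == 'y'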
Hidden conditionals
Real code often contains implied conditionals that don’t live on a separate line to be measured:
def implied_conditional(a):
    if (a % 2 == 0) or (a % 0 == 0):
        print("Special case")
    return a+2
# 100% coverage:
implied_conditional(0) == 2
implied_conditional(2) == 4
Although we have 100% coverage, we never found out that, due to a typo, the second condition in the if statement will divide by zero.
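A test with an odd argument would have forced that second condition to be evaluated:

# An odd argument defeats the short-circuit and evaluates a % 0:
implied_conditional(1)   # raises ZeroDivisionError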
Conditionals can also be hidden inside functions that aren’t being measured in the first place.
def fix_url(u):
    # If we're an https url, make it http.
    return u.replace('https://', 'xyzzyWRONG:')
# 100% coverage:
fix_url('http://foo.com') == 'http://foo.com'
The replace method here is essentially a big if statement on the condition that the string contains the substring being replaced. Our test never takes that path, but the if is hidden from us, so our coverage testing doesn’t help us find the missed coverage.
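To see the hidden branch, imagine writing the replacement out by hand. This hypothetical equivalent (assuming the https prefix only appears at the start of the URL) turns the implicit condition into an explicit line that statement coverage could flag as unexecuted:

def fix_url_explicit(u):
    # The conditional that replace() hides from the coverage tool:
    if u.startswith('https://'):
        return 'xyzzyWRONG:' + u[len('https://'):]
    return u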
Incomplete tests
Just because your tests execute the code doesn’t mean they properly test the results.
def my_awesome_sort(l):
    # Magic mumbo-jumbo that will sort the list (NOT!)
    l.reverse()
    return l
# 100% code coverage!
l = [4,2,5,3,1]
type(my_awesome_sort(l)) == list
len(my_awesome_sort(l)) == 5
my_awesome_sort(l)[0] == 1
Here our “sort” routine passes all the tests, and the coverage is 100%. But, oops, we forgot to check that the list returned is really sorted.
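The missing assertion is the one that checks the defining property of the result; a sketch:

# The check the tests above forgot: is the result actually in order?
result = my_awesome_sort([4, 2, 5, 3, 1])
assert all(result[i] <= result[i+1] for i in range(len(result) - 1))   # fails: reverse() is not a sort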
Real world
Of course, these examples are absurd. It’s easy to see where we went wrong in each of them. Most likely, though, your tests have the same underlying problems, in ways that are much more difficult to find.
Improved tools could help some of these cases, but not all. Some C-based tools provide branch analysis that could help with the path problems above. But no tool can guarantee there aren’t path problems (what if a loop works incorrectly if executed a prime number of times?), and no tool will point out that your tests aren’t checking the important things about results.
For more on the problems of coverage testing, the Wikipedia article on Code Coverage has a number of fine jumping-off points. Cem Kaner has a depressingly exhaustive overview of the Measurement of the Extent of Testing. After perusing it, you may wonder why you bother with puny statement coverage testing at all!
Statement coverage testing is a good measure of what isn’t being tested in your code. It’s a good start for understanding the completeness of your tests. Brian Marick’s How to Misuse Code Coverage sums it up best: “Coverage tools are only helpful if they’re used to enhance thought, not replace it.”
Comments
It's also why unit testing has always had an aura of menace and horror about it for me - exhaustive testing can be a bit tedious.
On my project, I put the test cases for foo.py in foo.test.py. I don't know why I do this; it is just what my predecessor did.
In this case, in the HTML index page, clicking on the foo.py line takes you to the foo.test.py coverage instead.
If I rename the test cases to foo_test.py, it functions perfectly, and is quite an eye-opener to the amount of code never executed.
(This bug-report can be filed under "If it hurts when you do that, then don't do that," but I thought I would let you know.)
Thanks for a very useful utility.
True. Moreover, there's no way to measure which variables are used at all. I was concerned that I was spending effort creating and updating a number of unneeded self. variables. Coverage.py told me that I am most certainly executing all code that does so. :-) But it would be nice to eliminate that code if the variables are not used.
No disparagement implied. Great program and great examples of "things that can go wrong." But, in case someone knows, is there a Py tool that will flag unused variables -- and perhaps even unused portions of data structures?
Running a syntax checker (such as PyLint or Flake8) can help you find unused variables.
I use an Emacs plugin, flymake, that highlights lines with syntax errors, including unused variables/imports, bad indentation, and really anything that goes against PEP8 standards.