
Python testing, book and podcast

Sunday 24 January 2016

Two good things in the Python testing world intersected this week.

Harry Percival wrote a great book called Test-Driven Development with Python. I should have written about it long ago. It's a step-by-step example of building real software (Django web applications) using Test-Driven Development.


Harry describes the philosophy, the methods, and the steps of doing real TDD with Django. Even if you aren't using Django, this book shows the way to use TDD for serious projects. I'm not yet a TDD convert, but it was very helpful to see it in action and understand more about it.

The entire book is available to read online if you like. Taking the meta to a whole new level, Harry also has the source for the book, including its tests, on GitHub.

Brian Okken has been running a podcast devoted to Python testing, called Python Test Podcast.


His latest episode is an interview with Harry. People must have thought I was nuts driving to work the other day: I was nodding so much. It was a good conversation. Highly recommended.

Collecting pings from software?

Friday 22 January 2016

Let's say I have a piece of software. In this case, it's some automation for installing and upgrading Open edX. I want to know how it is being used, for example, how many people in the last month used certain versions or switches.

To collect information like that, I can put together a URL in the program, and ping that URL. What's a good simple way to collect that information? What server or service is easy to use and can help me look at the data? Is this something I should use classic marketing web analytics for? Is there a more developer-centric service out there?

This is one of those things that seems easy enough to just do with bit.ly, or a dead-stupid web server with access logs, but I'm guessing there are better ways I don't yet know about.
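For concreteness, here is a minimal sketch of the client side of such a ping. The endpoint `https://example.com/ping`, the parameter names, and the helper names are all placeholders I made up for illustration; nothing here assumes any particular collection service.

```python
from urllib.parse import urlencode
from urllib.request import urlopen

# Hypothetical collection endpoint: substitute whatever server you end up using.
PING_URL = "https://example.com/ping"


def build_ping_url(version, switches):
    """Encode the version and the switches used as query parameters."""
    params = {
        "version": version,
        "switches": ",".join(sorted(switches)),
    }
    return PING_URL + "?" + urlencode(params)


def send_ping(version, switches, timeout=2):
    """Best-effort ping: a failed request must never break the tool itself."""
    try:
        urlopen(build_ping_url(version, switches), timeout=timeout).close()
    except OSError:
        # URLError and socket timeouts are OSError subclasses; ignore them.
        pass
```

A dead-simple web server with access logs would then record one line per call to `send_ping`, with the details in the query string.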

Isolated @memoize

Saturday 16 January 2016

When calling functions that are expensive, and expected to return the same results for the same input, lots of people like using an @memoize decorator. It uses a cache to quickly return the same results if they have been produced before. Here's a simplified one, adapted from a collection of @memoize implementations:

def memoize(func):
    cache = {}

    def memoizer(*args, **kwargs):
        key = str(args) + str(kwargs)
        if key not in cache:
            cache[key] = func(*args, **kwargs)
        return cache[key]

    return memoizer

@memoize
def expensive_fn(a, b):
    return a + b        # Not actually expensive!

This is great, and does what we want: repeated calls to expensive_fn with the same arguments will use the cached values instead of actually invoking the function.

But there's a potential problem: the cache dictionary is effectively a global. Don't be fooled by the fact that it isn't literally a global: it doesn't use the global keyword, and it isn't a module-level variable. But it is global in the sense that there is only one cache dictionary for expensive_fn for the entire process.

Globals can interfere with disciplined testing. One ideal of automated tests in a suite is that each test be isolated from all the others. What happens in test1 shouldn't affect test99. But here, if test1 and test99 both call expensive_fn with arguments (1, 2), then test1 will run the function, but test99 will get the cached value. Worse, if I run the complete suite, test99 gets a cached value, but if I run test99 alone, it runs the function.
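A small sketch makes the leak visible. The `calls` list and the two stand-in test functions are my own additions for illustration; the decorator is the simplified one from above, repeated so the snippet runs on its own.

```python
def memoize(func):
    cache = {}

    def memoizer(*args, **kwargs):
        key = str(args) + str(kwargs)
        if key not in cache:
            cache[key] = func(*args, **kwargs)
        return cache[key]

    return memoizer


calls = []  # Records every time the real function body actually runs.


@memoize
def expensive_fn(a, b):
    calls.append((a, b))
    return a + b


def test1():
    assert expensive_fn(1, 2) == 3


def test99():
    assert expensive_fn(1, 2) == 3


test1()
test99()
print(len(calls))  # 1: test99 got the value that test1 cached
```

Run test99 in a fresh process by itself and the function body runs again, so the same test behaves differently depending on what ran before it.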

This might not be a problem, if expensive_fn is truly a pure function with no side effects. But sometimes that's not the case.

I inherited a project that used @memoize to retrieve some fixed data from a web site. @memoize is great here because it means each resource will be fetched only once, no matter how the program uses them. The test suite used Betamax to fake the network access.

Betamax is great: it automatically monitors network access, and stores a "cassette" for each test case, which is a JSON record of what was requested and returned. The next time the tests are run, the cassette is used, and the network access is faked.

The problem is that test1's cassette will have the network request for the memoized resource, and test99's cassette will not, because it never requested the resource, because @memoize made the request unnecessary. Now if I run test99 by itself, it has no way to get the resource, and the test fails. Test1 and test99 weren't properly isolated, because they shared the global cache of memoized values.

My solution was to use an @memoize that I could clear between tests. Instead of writing my own, I used the lru_cache decorator from functools (or from functools32 if you are still using Python 2.7). It offers a .cache_clear method that clears all the values from the hidden global cache. The method lives on each decorated function, so we have to keep a list of them:

import functools

# A list of all the memoized functions, so that
# `clear_memoized_values` can clear them all.
_memoized_functions = []

def memoize(func):
    """Cache the value returned by a function call."""
    func = functools.lru_cache()(func)
    # Remember the wrapped function so its cache can be cleared later.
    _memoized_functions.append(func)
    return func

def clear_memoized_values():
    """Clear all the values saved by @memoize, to ensure isolated tests."""
    for func in _memoized_functions:
        func.cache_clear()

Now an automatic fixture (for py.test) or a setUp function can clear the cache before each test:

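# For py.test:

import pytest

@pytest.fixture(autouse=True)
def reset_all_memoized_functions():
    """Clears the values cached by @memoize before each test."""
    clear_memoized_values()

# For unittest:

import unittest

class MyTestCaseBase(unittest.TestCase):
    def setUp(self):
        super(MyTestCaseBase, self).setUp()
        clear_memoized_values()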
In truth, it might be better to distinguish between the various reasons for using @memoize. A pure function might be fine to cache between tests: who cares when the value is computed? But other uses clearly should be isolated. @memoize isn't magic: you have to think about what it is doing for you, and when you want to have more control.
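One way that distinction could look, as a rough sketch rather than what the project actually does: two decorators, where only one registers its functions for clearing. The names `memoize_pure`, `memoize_isolated`, and `clear_isolated_values` are hypothetical, and I've used `maxsize=None` (an unbounded cache) as an assumption.

```python
import functools

# Only functions whose caches must not leak between tests are tracked here.
_isolated_functions = []


def memoize_pure(func):
    """Cache for the life of the process: fine for side-effect-free functions."""
    return functools.lru_cache(maxsize=None)(func)


def memoize_isolated(func):
    """Cache, but register the function so tests can clear it."""
    func = functools.lru_cache(maxsize=None)(func)
    _isolated_functions.append(func)
    return func


def clear_isolated_values():
    """Clear only the caches that were registered by @memoize_isolated."""
    for func in _isolated_functions:
        func.cache_clear()
```

A test fixture would then call `clear_isolated_values()`, leaving the pure caches warm across the whole suite.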

Coverage.py 4.1b1

Sunday 10 January 2016

I wasn't planning on any big features for Coverage.py 4.1, but I ended up making a big change. tl;dr: branch coverage is implemented differently, and as a result, your coverage totals are slightly different. Try it: Coverage.py 4.1 beta 1.

Because of Python 3.5's async and await keywords, the existing branch analysis based on bytecode was completely out of gas. The code had long felt jury-rigged, and there were long-standing bugs that seemed impossible to solve. The async features compiled very differently than their synchronous counterparts, and I didn't see a way to bring them into the bytecode fold.

So I ripped out the bytecode analysis and replaced it with AST (Abstract Syntax Tree) analysis. I like it much better: it's based on the structure of the code that people see and can reason about. Four old bugs were fixed as a result, along with the two or three new bugs reported on Python 3.5.
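To give a feel for why the AST is a comfortable place to reason about branches, here is a toy sketch using the standard `ast` module. It is not coverage.py's implementation: it only lists the lines where a few branching constructs start, and the sample source and the `branching_lines` helper are made up for illustration.

```python
import ast

SOURCE = '''
def classify(n):
    if n < 0:
        return "negative"
    for digit in str(n):
        if digit == "0":
            return "has zero"
    return "plain"
'''

# A deliberately small set of constructs that introduce more than one path.
BRANCHING = (ast.If, ast.For, ast.While, ast.Try)


def branching_lines(source):
    """Return sorted (line number, node type) pairs for branching statements."""
    tree = ast.parse(source)
    return sorted(
        (node.lineno, type(node).__name__)
        for node in ast.walk(tree)
        if isinstance(node, BRANCHING)
    )


for lineno, kind in branching_lines(SOURCE):
    print(lineno, kind)
```

The nodes map directly onto the statements a person sees in the file, which is much harder to recover from bytecode.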

As a result though, coverage.py now calculates totals differently, because the full set of possible code paths is different. So your results will likely shift a little bit, especially if you are using branch measurement. They might be higher, they might be lower. For example, class docstrings now count as executable statements (because they are executable statements), and paths through "except" clauses were probably being overlooked before.
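The docstring point is easy to check with the standard `ast` module. This is only a sketch of the idea, with a made-up sample class; it says nothing about how coverage.py itself treats the node.

```python
import ast

SOURCE = '''
class Greeter:
    """Say hello."""
    name = "world"
'''

tree = ast.parse(SOURCE)
class_node = tree.body[0]

# The docstring is the first statement in the class body: an expression
# statement (ast.Expr) with its own line number, like any other statement.
first = class_node.body[0]
print(type(first).__name__, first.lineno)
print(isinstance(first, ast.stmt))
print(ast.get_docstring(class_node))
```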

Please try it out and let me know if you see anything wrong. I won't be surprised if there are complex code structures that are analyzed wrong somehow. Coverage.py 4.1b1: you know you want it...
