Two good things in the Python testing world intersected this week.

Harry Percival wrote a great book called Test-Driven Development with Python. I should have written about it long ago. It's a step-by-step example of building real software (Django web applications) using Test-Driven Development.

Test-Driven Development with Python

Harry describes the philosophy, the methods, and the steps of doing real TDD with Django. Even if you aren't using Django, this book shows the way to use TDD for serious projects. I'm not yet a TDD convert, but it was very helpful to see it in action and understand more about it.

The entire book is available to read online if you like. Taking the meta to a whole new level, Harry also has the source for the book, including its tests, on GitHub.

Brian Okken has been running a podcast devoted to Python testing, called Python Test Podcast.

Python Test Podcast

His latest episode is an interview with Harry. People must have thought I was nuts driving to work the other day, I was nodding so much. It was a good conversation. Highly recommended.

Let's say I have a piece of software. In this case, it's some automation for installing and upgrading Open edX. I want to know how it is being used, for example, how many people in the last month used certain versions or switches.

To collect information like that, I can put together a URL in the program, and ping that URL. What's a good simple way to collect that information? What server or service is easy to use and can help me look at the data? Is this something I should use classic marketing web analytics for? Is there a more developer-centric service out there?

This is one of those things that seems easy enough to just do with bit.ly, or a dead-stupid web server with access logs, but I'm guessing there are better ways I don't yet know about.
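
For concreteness, here's a minimal sketch of the kind of ping I have in mind. The endpoint and parameter names are made up for illustration; the real question is what should be listening on the other end of that URL.

# A minimal sketch of a usage ping, assuming a hypothetical collection
# endpoint.  The URL, parameters, and values here are only illustrative.
from urllib.parse import urlencode
from urllib.request import urlopen

def ping_usage(version, switches):
    """Report which version and switches were used, ignoring any failures."""
    params = urlencode({"version": version, "switches": " ".join(switches)})
    url = "https://stats.example.com/ping?" + params
    try:
        urlopen(url, timeout=2)
    except OSError:
        pass  # Never let the usage ping break the actual installation.

ping_usage("3.1", ["--upgrade"])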

When calling functions that are expensive, and expected to return the same results for the same input, lots of people like using an @memoize decorator. It uses a cache to quickly return the same results if they have been produced before. Here's a simplified one, adapted from a collection of @memoize implementations:

def memoize(func):
    cache = {}

    def memoizer(*args, **kwargs):
        key = str(args) + str(kwargs)
        if key not in cache:
            cache[key] = func(*args, **kwargs)
        return cache[key]

    return memoizer

@memoize
def expensive_fn(a, b):
    return a + b        # Not actually expensive!

This is great, and does what we want: repeated calls to expensive_fn with the same arguments will use the cached values instead of actually invoking the function.

But there's a potential problem: the cache dictionary is a global. Don't be fooled by the fact that it isn't literally a global: it doesn't use the global keyword, and it isn't a module-level variable. But it is global in the sense that there is only one cache dictionary for expensive_fn for the entire process.

Globals can interfere with disciplined testing. One ideal of automated tests in a suite is that each test be isolated from all the others. What happens in test1 shouldn't affect test99. But here, if test1 and test99 both call expensive_fn with arguments (1, 2), then test1 will run the function, but test99 will get the cached value. Worse, if I run the complete suite, test99 gets a cached value, but if I run test99 alone, it runs the function.
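
Here's a sketch of that situation, with a list of calls standing in for whatever real work the function does:

calls = []

@memoize
def expensive_fn(a, b):
    calls.append((a, b))    # Stand-in for the real (possibly side-effecting) work.
    return a + b

def test1():
    assert expensive_fn(1, 2) == 3

def test99():
    before = len(calls)
    assert expensive_fn(1, 2) == 3
    # Run alone, the function executes and len(calls) == before + 1.
    # Run after test1, the cached value is returned and calls is untouched.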

This might not be a problem, if expensive_fn is truly a pure function with no side effects. But sometimes that's not the case.

I inherited a project that used @memoize to retrieve some fixed data from a web site. @memoize is great here because it means each resource will be fetched only once, no matter how the program uses them. The test suite used Betamax to fake the network access.

Betamax is great: it automatically monitors network access, and stores a "cassette" for each test case, which is a JSON record of what was requested and returned. The next time the tests are run, the cassette is used, and the network access is faked.

The problem is that test1's cassette will have the network request for the memoized resource, and test99's cassette will not, because it never requested the resource, because @memoize made the request unnecessary. Now if I run test99 by itself, it has no way to get the resource, and the test fails. Test1 and test99 weren't properly isolated, because they shared the global cache of memoized values.
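
For a picture of what that looks like, here's a rough sketch of a Betamax-backed test; the cassette directory, cassette name, and URL are just placeholders:

import requests
from betamax import Betamax

def test1():
    session = requests.Session()
    with Betamax(session, cassette_library_dir="cassettes").use_cassette("test1"):
        # First run: the request is recorded into cassettes/test1.json.
        # Later runs: the cassette is replayed, and no network access happens.
        resp = session.get("https://example.com/the-fixed-data")
        assert resp.status_code == 200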

My solution was to use an @memoize that I could clear between tests. Instead of writing my own, I used the lru_cache decorator from functools (or from functools32 if you are still using Python 2.7). It offers a .cache_clear function that can be used to clear all the values from the hidden global cache. It's on each decorated function, so we have to keep a list of them:

import functools

# A list of all the memoized functions, so that
# `clear_memoized_values` can clear them all.
_memoized_functions = []

def memoize(func):
    """Cache the value returned by a function call."""
    func = functools.lru_cache()(func)
    _memoized_functions.append(func)
    return func

def clear_memoized_values():
    """Clear all the values saved by @memoize, to ensure isolated tests."""
    for func in _memoized_functions:
        func.cache_clear()

Now an automatic fixture (for py.test) or a setUp function can clear the cache before each test:

# For py.test:

@pytest.fixture(autouse=True)
def reset_all_memoized_functions():
    """Clears the values cached by @memoize before each test."""
    clear_memoized_values()

# For unittest:

class MyTestCaseBase(unittest.TestCase):
    def setUp(self):
        super().setUp()
        clear_memoized_values()

In truth, it might be better to distinguish between the various reasons for using @memoize. A pure function might be fine to cache between tests: who cares when the value is computed? But other uses clearly should be isolated. @memoize isn't magic: you have to think about what it is doing for you, and when you want more control.

I wasn't planning on any big features for Coverage.py 4.1, but I ended up making a big change. tl;dr: branch coverage is implemented differently, and as a result, your coverage totals are slightly different. Try it: Coverage.py 4.1 beta 1.

Because of Python 3.5's async and await keywords, the existing bytecode-based branch analysis was completely out of gas. The code had long felt jury-rigged, and there were long-standing bugs that seemed impossible to solve. The async features compile very differently from their synchronous counterparts, and I didn't see a way to bring them into the bytecode fold.

So I ripped out the bytecode analysis and replaced it with AST (Abstract Syntax Tree) analysis. I like it much better: it's based on the structure of the code that people see and can reason about. Four old bugs were fixed as a result, along with the two or three new bugs reported on Python 3.5.

As a result though, coverage.py now calculates totals differently, because the full set of possible code paths is different. So your results will likely shift a little bit, especially if you are using branch measurement. They might be higher, they might be lower. For example, class docstrings now count as executable statements (because they are executable statements), and paths through "except" clauses probably were being overlooked.
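
As a made-up illustration (not taken from the coverage.py docs), both effects show up in a snippet like this:

class Config:
    """Class docstrings are executable statements, so they now count."""

def read_config(path):
    try:
        with open(path) as f:
            return f.read()
    except IOError:
        # The path through this except clause is a possible branch, so
        # full branch coverage needs a test that triggers the failure.
        return ""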

Please try it out and let me know if you see anything wrong. I won't be surprised if there are complex code structures that are analyzed wrong somehow. Coverage.py 4.1b1: you know you want it...

When I look back on 2015, I'm happy about a number of things, but two that stand out have nothing to do with software or the virtual world.

Twelve years ago, I wrote about learning to juggle five balls, and then an update on how it was going. In the years since, I occasionally tried to improve, but wasn't getting anywhere.

This past year, partly inspired by watching co-workers learning to juggle three balls, I made a new concerted effort, and I've made significant progress. Now I can routinely make ten throws and cleanly catch all five balls. 15 or 20 throws is not unusual, and my record has increased to 30. More importantly, every day that I practice, even for just 10 minutes, now feels like it makes me slightly better.

This video by Niels Duinker helped give me drills. The four-ball tosses were especially helpful, because they let me hear how lopsided my throws were. Practicing those until they evened out really opened the way to move forward. Now it no longer feels like a crisis to have five balls in motion.

My old feeling when trying five balls was frustration. Now I feel the improvement, and it's encouraging and makes it enjoyable to practice, which leads to more improvement. I'm looking forward to more progress in 2016.

My other physical activity is swimming. I set a goal for myself to swim a total of 150 miles in 2015. After a slow start to the year due to blizzards and colds, I caught up in the fall, and ended with a total of 154.02 miles. (The precision is illusory: I counted the distance of summer swims at the beach as very rough guesses.)

This year in the pool I figured out flip turns. I used to think I got dizzy trying to do them. That wasn't the problem; the problems were water up the nose, being too far from the wall, ending up too deep, not breathing enough, and so on. But I watched others, and practiced, and now they feel natural and easy. I like feeling confident in this new skill.

In the coming year, I want to add dolphin kicks to the repertoire, and then maybe consider the butterfly.

I spend a lot of time online, and writing software. That involves many kinds of abstract mental activities. It's great to be able to learn new skills and techniques in that world, and there's no shortage of things to learn. But having physical world challenges is satisfying in a very different way.

Combining traditional Christmas advent calendars with online programming exercises: Advent of Code is a nicely made collection of programming problems.

One of the things I like about these problems is there are always two parts, and you don't see the second part of the problem until you have solved the first part. This usually leads to refactorings or repurposing of your code, which is a valuable exercise in and of itself.

For many of the problems, there are interesting follow-on questions. A common one is, "Write the code that generates the sample inputs, to be sure there's a single answer." For some of the problems, any random input would do, but for others, there's a constraint that has to be met.

I have a collection of approachable problems at Kindling projects. If Advent of Code stays up after Christmas, I'll definitely add it to that page.

Today's puzzle is the first where a simple brute-force approach isn't going to work, and I'll need a cleverer algorithm... Hmmm...

Open source software is great. It's hard to remember that there was a time before Linux, Apache, Python, Ruby, Postgres, etc. These days, when you want to build something, you can start with an enormous amount of high-quality free infrastructure already available to you.

But there's a problem with open source: the work that goes into it is largely unpaid. Most of it happens because individuals give their free time and spare energy. There are exceptions: many companies contribute to open source, and some even fund developers full-time to work on it. But the ecosystem is full of useful and important projects that only exist because someone gave away their time and energy.

Not having real funding is holding back open source, because it makes it hard to get started (you have to have spare time and energy), and it makes it hard to stick with it. Two recent blog posts underscore this last point: David MacIver's Throwing in the Towel, and Ryan Bigg's Open source work.

We've gotten pretty far on this model. But we can do a lot better if we find ways to put real resources (money) into the system. Russell Keith-Magee said the whole thing better than I could in his PyCon AU 2015 talk, Money, Money, Money: Writing software, in a rich (wo)man's world:

If a company can find money for foosball tables and meditative ball-pits, they should be able to find the resources to help maintain the software on which they've been basing their success.

Russell has started a GitHub repo as a conversation about how we might be able to make changes. Each issue in Paying the Piper is an idea for funding open source, with discussion. Please go be part of it.

Personally, I think we should try asking companies to donate, and if we make it dead simple enough for them to do the right thing, they just might.

Questions for everyone: do you think you could get your employer to donate to an open-source funding non-profit? What are the hurdles? What could we do together to get them over those hurdles?

Django is at the forefront of this, having just funded a part-time fund-raising position: Introducing the DSF's Director of Advancement. It will be very interesting to see how that works out.

I have a double interest in this: first, the general interest in seeing the Python world grow and flourish. Second, my own work on coverage.py would be a little easier if there were some money flowing back. It wouldn't have to be much; I'm not thinking I could support myself with it, but some tangible return would make the time easier to justify.

I fixed five bugs in coverage.py 4.0.2 to produce coverage.py 4.0.3. XML reports now have correct <source> elements in complex cases, and a mysterious problem that seriously borked really unusual cases has been fixed.

I fixed three more bugs in coverage.py 4.0.1 to produce coverage.py 4.0.2. Non-ASCII characters in program source and filenames should work better, and if you are setting your own trace function, that works again... :)

My day job is working on Open edX. It's large, and our requirements files are getting unruly. In particular, our requirements file for installing our other GitHub repos has grown very long in the tooth.

First, we have a mix of -e installs and non-e installs. -e means: check out the git working tree, and install the code in editable mode, running it directly from the checkout. This makes it easy to use the code: you can skip the discipline of properly maintaining version numbers in setup.py. Just changing the SHA in the GitHub URL should bring in new code.

We also have inconsistent use of "#egg" syntax in the URLs, and we don't always include the version number, and when we do, we use one of three different syntaxes for it.

Worse, we'd developed a cargo-cult mentality about the mysteries of what pip might do. No one was confident about what behavior to expect from the different syntaxes. Sometimes updated code was being installed, and sometimes not.

I did an experiment where I made a simple package with just a version number in it (version_dummy), and I tried installing it in various ways. I found that I had to include a version number in the hash fragment at the end of the URL to get it to update properly. Then another engineer did a similar experiment and came to the opposite conclusion, that just changing the SHA would be enough.

As bad as cargo-culting is, this was even worse: two experiments designed to answer the same question, with different results! It was time to get serious.

An important property of science is reproducibility: another investigator should be able to run your experiment to see if they get the same results. On top of that, I knew I'd want to re-run my own experiment many times as I thought of new twists to try.

So I wrote a shell script that automated the installation and verification of versions. You can run it yourself: create a new virtualenv, then run the script.

I asked in the #pypa IRC channel for help with my mystery, and they had the clue I needed to get to the bottom of why we got two different answers. I had a git URL that looked like this:

git+https://github.com/nedbat/version_dummy@123abc456#egg=version_dummy

The other engineer had a URL like this:

git+https://github.com/otherguy/example@789xyz456#egg=example

These look similar enough that they should behave the same, right? The difference is that mine has an underscore in the name, and his does not. My suffix ('#egg=version_dummy') was being parsed inside pip as if the package name were "version" and the version were "dummy"! This meant that updating the SHA wouldn't install new code, because pip thought it knew what version it would get ("dummy"), and that's the version it already had, so why install it?

Writing my experiment.sh script gave me a good place to try out different scenarios of updating my version_dummy from version 1.0 to 2.0.
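
The dummy package itself is trivial; something along these lines is all it takes (this is a sketch of the shape, not the actual file from the experiment):

# setup.py for version_dummy -- a sketch of a package whose only job
# is to carry a version number.
from setuptools import setup

setup(
    name="version_dummy",
    version="1.0",          # Change to "2.0" to see whether pip notices.
    py_modules=["version_dummy"],
)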

Things I learned:

  • -e installs work even if you only change the SHA, although superstition persists around the office that this is not true. It may be just superstition, or there may be scenarios where it fails that I haven't tried yet.
  • If you use a non-e install, you have to supply an explicit version number on the URL, because punctuation in the package name can confuse pip.
  • If you install a package non-e, and then update it with a -e install, you will have both installed, and you'll need to uninstall it twice to really get rid of it.
  • There are probably more scenarios that I haven't tried yet that will come back to bite me later. :(

If anyone has more information, I'm really interested.

For Susan's birthday today, a cake about biking through fields of lavender:

Cake about biking through fields of lavender

This is a dream bike ride for Susan, which she has not yet made real. This cake will have to do for now. It's just a chocolate cake mix, baked in a rectangular pan and two mixing bowls. The bottoms of the mixing bowls became the hillocks. The road is Hershey bars, but she is biking off-road on the scree of crumbled cookies.

I fixed a few bugs in coverage.py 4.0 to produce coverage.py 4.0.1. Try it. :)
