
Coverage.py v3.5.3

Saturday 29 September 2012

The latest bug-fix release of coverage.py is now available: Coverage.py v3.5.3. Eight bugs were fixed, the most visible of which is that the line numbers in the HTML report now line up better with the source lines (again). Thanks to a simple CSS fix from Marius Gedminas, this perennial annoyance may finally be eradicated.

All the details are in the change history, as usual.

It's getting chilly out there, cover up...

Gems all the way down

Wednesday 19 September 2012

Today I had to update a function that computed a date range for charting. It accepted a parameter for how many days back the chart should reach, and I needed to adapt it to handle months too.

Days are easy because datetime.timedelta() accepts a days= argument, and then you can subtract the timedelta from your date. But timedelta does not accept a months= argument, because the date N months before the current date is not well-defined, and depends on exactly what you mean by "month." For example, what is the date one month before March 30th?
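A quick sketch of the difference, using only the standard library (the starting date is an arbitrary example):

```python
import datetime

start = datetime.date(2012, 9, 19)

# A fixed number of days is unambiguous, so timedelta handles it directly.
thirty_days_back = start - datetime.timedelta(days=30)
print(thirty_days_back)  # 2012-08-20

# timedelta has no months= argument; asking for one raises TypeError.
try:
    datetime.timedelta(months=1)
    months_accepted = True
except TypeError:
    months_accepted = False
print(months_accepted)  # False
```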

So I had to write my own code to subtract months from a date just as I wanted. Since I was charting months, I actually only needed the answer to be the first day of the month N months ago, which simplified the problem.

Rather than drop the logic right into the compute_chart_dates() function, I pulled it out into its own function. This made it easy to test the month subtraction logic directly. Thinking about testing often leads to better-structured code, because of the extra demands of having usable surface area for the tests to attach to.

Here's my code:

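import datetime

def subtract_months(when, months):
    """Return the first of the month a certain number of months ago.

    `when`: the datetime.date to count back from.

    `months`: the number of months to subtract.

    """
    # Start from the first of the current month.
    when = when.replace(day=1)
    # Whole years can be subtracted directly.
    years, months = divmod(months, 12)
    if years:
        when = when.replace(year=when.year-years)
    # Step back one month at a time: the day before the first is always in
    # the previous month, so snap that date back to its first day.
    for _ in range(months):
        when -= datetime.timedelta(days=1)
        when = when.replace(day=1)
    return when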

When I was done, I had a function that did just what I wanted. I also had a handful of tests of the tricky edge cases, to be sure I had gotten those right. Then I could very simply use that function to extend my chart date function.
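A minimal sketch of what such edge-case checks might look like. The cases are illustrative, not the actual tests, and to keep the block self-contained they run against a hypothetical arithmetic variant, first_of_month_before(), rather than the loop-based function above. One difference to be aware of: the variant always returns a datetime.date.

```python
import datetime

def first_of_month_before(when, months):
    """Hypothetical variant: first day of the month `months` months before `when`."""
    # Number the months consecutively so that crossing a year boundary
    # is ordinary integer arithmetic.
    index = when.year * 12 + (when.month - 1) - months
    year, month0 = divmod(index, 12)
    return datetime.date(year, month0 + 1, 1)

# The ambiguous case: one month before March 30th.
assert first_of_month_before(datetime.date(2012, 3, 30), 1) == datetime.date(2012, 2, 1)
# Crossing a year boundary.
assert first_of_month_before(datetime.date(2012, 1, 15), 1) == datetime.date(2011, 12, 1)
# Zero months still snaps to the first of the month.
assert first_of_month_before(datetime.date(2012, 9, 19), 0) == datetime.date(2012, 9, 1)
# More than a year back.
assert first_of_month_before(datetime.date(2012, 9, 19), 14) == datetime.date(2011, 7, 1)
```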

Afterward, I thought about how pleasing it was to write the subtract_months() function, and to consider all of its aspects, and to get it just right. It felt like a gem in my hand.

I felt a little bad, too, because it felt indulgent to focus so much effort on this one small function. Shouldn't I be concentrating more on the rest of the code?

But I realized, this code is the way it's supposed to be: tight, focused, solid. It's a pure function, which makes it easy to reason about, and easy to test. This code is the ideal to aim for. Rather than scolding myself for thinking about the gem, I should be trying to make all the code gems.

Of course, the little utilities like subtract_months are easier to make gem-like than the twisty traffic jams at the heart of any real piece of software. But just because it's hard doesn't mean you shouldn't try. The best code will be gems all the way down.

Mocking datetime.today

Thursday 13 September 2012

Mocking is a great way to isolate your code from distracting dependencies while testing. But, it can be an arcane art unto itself. Today I wrote a test for code that uses the current date. While testing, it can't use the actual current date, because the test will produce different results on different days.

A solution is to mock out the datetime.datetime.today() function to return a known fixed date. But a few factors complicate matters. First, datetime.datetime is written in C, so Mock can't replace attributes on the class, which means you can't simply mock out just the today() function.

Second, I want to mock just one function in the module, not the whole module. There are a few suggestions of how to do this out there: Michael Foord wrote about Partial mocking in the Mock docs, William John Bert showed another way, and of course, there is a Stack Overflow question about it.

None of these worked for me, perhaps because of subtle differences between my code under test and theirs. When mocking, it's critical to mock at the appropriate place. If your module has "import datetime", then you need to mock "mymodule.datetime" as a module. If instead you have "from datetime import datetime", then you need to mock "mymodule.datetime" as a class. Datetime's eponymous class structure only adds to the confusion.
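To make the two cases concrete, here is a rough sketch. It assumes Python 3's unittest.mock rather than the standalone mock package, and uses two hypothetical stand-in modules, uses_class and uses_module, built on the fly so the block runs by itself:

```python
import datetime
import types
from unittest import mock

FIXED = datetime.datetime(2012, 6, 16)

# Hypothetical stand-in for a module containing "from datetime import datetime":
# its "datetime" attribute is the class.
uses_class = types.ModuleType("uses_class")
uses_class.datetime = datetime.datetime

# Hypothetical stand-in for a module containing "import datetime":
# its "datetime" attribute is the module.
uses_module = types.ModuleType("uses_module")
uses_module.datetime = datetime

# Case 1: the module holds the class, so the replacement stands in for the class.
fake_class = mock.Mock(wraps=datetime.datetime)
fake_class.today.return_value = FIXED
with mock.patch.object(uses_class, "datetime", fake_class):
    class_style_today = uses_class.datetime.today()

# Case 2: the module holds the module, so the replacement stands in for the
# module, and the class is one attribute further down.
fake_module = mock.Mock(wraps=datetime)
fake_module.datetime.today.return_value = FIXED
with mock.patch.object(uses_module, "datetime", fake_module):
    module_style_today = uses_module.datetime.datetime.today()

print(class_style_today == module_style_today == FIXED)  # True
```

In both cases the name being patched is "datetime", but what has to go in its place differs, which is where the confusion comes from.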

I ended up with help from tos9 (Julian Berman) on the #python-testing IRC channel, and used this code in my test class:

# Assumes datetime, mock, and my_module (the code under test) are
# imported at the top of the test file.
def setUp(self):
    datetime_patcher = mock.patch.object(
        my_module.datetime, 'datetime',
        mock.Mock(wraps=datetime.datetime)
    )
    mocked_datetime = datetime_patcher.start()
    mocked_datetime.today.return_value = datetime.datetime(2012, 6, 16)
    self.addCleanup(datetime_patcher.stop)

Here, mock.patch.object patches the datetime attribute (the class) of the datetime module that my module imported. It replaces the class with a mock, one that wraps the real datetime class. "Wraps" means that anything not explicitly changed on the mock is proxied to the real datetime class, so most of our functionality stays in place. We change the return value of today() to be a specific date, accomplishing our goal.

If you haven't seen it before, addCleanup() is a new feature of unittest in Python 2.7. Instead of writing a tearDown method in which you clean up all the stuff you did in setUp, you can register callables with addCleanup, and they will be called to clean up at the end of each test. Because you can register as many as you like, it's easier to modularize your setup and teardown code.
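For reference, a self-contained sketch of the whole pattern. It assumes Python 3's unittest.mock; my_module is a hypothetical stand-in built on the fly, and is_weekend_today() is a made-up function under test:

```python
import datetime
import types
import unittest
from unittest import mock

# Hypothetical stand-in for the module under test, which did "import datetime".
my_module = types.ModuleType("my_module")
my_module.datetime = datetime

def is_weekend_today():
    """Hypothetical code under test: is today a Saturday or Sunday?"""
    return my_module.datetime.datetime.today().weekday() >= 5

class WeekendTest(unittest.TestCase):
    def setUp(self):
        # Build the fixed date before patching, while datetime.datetime
        # is still the real class.
        fixed = datetime.datetime(2012, 6, 16)  # a Saturday
        patcher = mock.patch.object(
            my_module.datetime, "datetime",
            mock.Mock(wraps=datetime.datetime),
        )
        mocked_datetime = patcher.start()
        mocked_datetime.today.return_value = fixed
        # Registered cleanups run after each test, even if the test fails.
        self.addCleanup(patcher.stop)

    def test_fixed_date_is_weekend(self):
        self.assertTrue(is_weekend_today())

def run_tests():
    """Run the test case and return the unittest result object."""
    suite = unittest.defaultTestLoader.loadTestsFromTestCase(WeekendTest)
    return unittest.TextTestRunner().run(suite)
```

One thing to keep in mind: my_module.datetime is the real datetime module, so while the patch is active, datetime.datetime is the mock for every module in the process, not just the one under test. The cleanup puts the real class back.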

BTW, Julian also has a clever decorator to auto-register the cleanup functions for patches, and has packaged it into a mixin: ivoire/tests/util.py. Check it out.

Removing overlapping regex matches

Saturday 8 September 2012

In a Stack Overflow question a few months ago, a petitioner wanted to remove all the matches of a number of regexes. The complication was that the regexes could overlap.

Simply using re.sub() on each pattern in turn wouldn't work, because the overlapping matches wouldn't be fully matched once the first patterns were removed from the string.
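A small made-up example shows the failure. The two patterns overlap on the letter "c":

```python
import re

text = "abcde"
patterns = ["abc", "cde"]  # these overlap on the "c"

# Naive approach: remove each pattern in turn from the running result.
result = text
for pat in patterns:
    result = re.sub(pat, "", result)

# Removing "abc" leaves "de", so "cde" never gets a chance to match.
print(repr(result))  # 'de'
```

The intended result is the empty string, since every character of the original text belongs to some match.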

The solution is to match the regexes, and note the locations of the matches, and then in a second pass, delete all those parts of the string. Here's an updated version of my answer:

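import re

def remove_regexes(text, patterns):
    """Remove all the matches of any pattern in patterns from text."""
    # Python 2: text is a byte string, so bytearray(text) copies it directly.
    marked = bytearray(text)
    for pat in patterns:
        # Always search the original text, so earlier removals can't hide a match.
        for m in re.finditer(pat, text):
            start, end = m.span()
            # Blank out the matched span with zero bytes.
            marked[start:end] = [0] * (end-start)
    # Keep only the bytes that were never zeroed.
    new_string = ''.join(chr(c) for c in marked if c)
    return new_string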

There are a few rarely-used features of Python at work here. First, I use a bytearray, which is kind of like a mutable string. Like a Python 2 string, it is a sequence of bytes. Unlike a string, it lets you change the bytes in place. This is handy for marking which portions of the string are being removed.

I initialize the bytearray to have the same contents as the text string, then for each pattern, I find all the matches for the pattern, and remove them from the bytearray by replacing the matched bytes with zero bytes.

The re.finditer function gives us an iterator over all the matches, and produces a match object for each one. Match objects are usually just tested and then examined for the matched string, but they have other methods on them too. Here I use m.span(), which returns a two-tuple containing the starting and ending indexes of the match, suitable for use as a slice. I unpack them into start and end, and then use those indexes to write zero bytes into my bytearray using slice assignment.
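Here are span() and slice assignment in isolation. This sketch assumes Python 3, where the data and the pattern must both be bytes; the sample data is made up:

```python
import re

data = b"hello world"
buf = bytearray(data)

m = re.search(rb"o w", data)
start, end = m.span()
print((start, end))  # (4, 7)

# Slice assignment overwrites the matched span with the same number of zeros,
# so every other byte keeps its original index.
buf[start:end] = [0] * (end - start)
print(bytes(buf))  # b'hell\x00\x00\x00orld'
```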

Because I match against the original unchanged string, the overlapping regexes are not a problem. When all of the patterns have been matched, what's left in my bytearray are zero bytes where the patterns matched, and real byte values where they didn't. A generator expression passed to ''.join() keeps all the good bytes and produces a string.
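The bytearray version relies on Python 2 strings being bytes, and it would also drop any genuine zero bytes in the input. As a sketch of the same two-pass idea for Python 3 text, here is a different technique: a parallel list of keep/discard flags instead of a bytearray. The name remove_regexes_mask is made up for this example.

```python
import re

def remove_regexes_mask(text, patterns):
    """Remove every match of any pattern from text, even overlapping ones."""
    # One flag per character: True means the character survives.
    keep = [True] * len(text)
    for pat in patterns:
        # Match against the original text, never the partially edited one.
        for m in re.finditer(pat, text):
            start, end = m.span()
            keep[start:end] = [False] * (end - start)
    # Second pass: rebuild the string from the characters still flagged.
    return "".join(ch for ch, flag in zip(text, keep) if flag)

print(repr(remove_regexes_mask("abcde fgh", ["abc", "cde"])))  # ' fgh'
```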

Nothing earth-shattering here, just a nice showcase of some little-used Python features.
