Blog | Ned Batchelder

Starting with pytest’s parametrize

Wednesday 13 August 2025

Pytest’s parametrize feature is powerful but it looks scary. I hope this step-by-step explanation helps people use it more.

Writing tests can be difficult and repetitive. Pytest has a feature called parametrize that can reduce duplication, but it can be hard to understand if you are new to the testing world. It’s not as complicated as it seems.

Let’s say you have a function called add_nums() that adds up a list of numbers, and you want to write tests for it. Your tests might look like this:

def test_123():
    assert add_nums([1, 2, 3]) == 6

def test_negatives():
    assert add_nums([1, 2, -3]) == 0

def test_empty():
    assert add_nums([]) == 0

This is great: you’ve tested some behaviors of your add_nums() function. But it’s getting tedious to write out more test cases. The names of the functions have to be different from each other, and they don’t mean anything, so it’s extra work for no benefit. The test functions all have the same structure, so you’re repeating uninteresting details. You want to add more cases but it feels like there’s friction that you want to avoid.

If we look at these functions, they are very similar. In any software, when we have functions that are similar in structure, but differ in some details, we can refactor them to be one function with parameters for the differences. We can do the same for our test functions.

Here the functions all have the same structure: call add_nums() and assert what the return value should be. The differences are the list we pass to add_nums() and the value we expect it to return. So we can turn those into two parameters in our refactored function:

def test_add_nums(nums, expected_total):
    assert add_nums(nums) == expected_total

Unfortunately, tests aren’t run like regular functions. We write the test functions, but we don’t call them ourselves. That’s the reason the names of the test functions don’t matter. The test runner (pytest) finds functions named test_* and calls them for us. When they have no parameters, pytest can call them directly. But now that our test function has two parameters, we have to give pytest instructions about how to call it.

To do that, we use the @pytest.mark.parametrize decorator. Using it looks like this:

import pytest

@pytest.mark.parametrize(
    "nums, expected_total",
    [
        ([1, 2, 3], 6),
        ([1, 2, -3], 0),
        ([], 0),
    ]
)
def test_add_nums(nums, expected_total):
    assert add_nums(nums) == expected_total

There’s a lot going on here, so let’s take it step by step.

If you haven’t seen a decorator before, it starts with @ and is like a prologue to a function definition. It can affect how the function is defined or provide information about the function.

The parametrize decorator is itself a function call that takes two arguments. The first is a string (“nums, expected_total”) that names the two arguments to the test function. Here the decorator is instructing pytest, “when you call test_add_nums, you will need to provide values for its nums and expected_total parameters.”

The second argument to parametrize is a list of the values to supply as the arguments. Each element of the list will become one call to our test function. In this example, the list has three tuples, so pytest will call our test function three times. Since we have two parameters to provide, each element of the list is a tuple of two values.

The first tuple is ([1, 2, 3], 6), so the first time pytest calls test_add_nums, it will call it as test_add_nums([1, 2, 3], 6). All together, pytest will call us three times, like this:

test_add_nums([1, 2, 3], 6)
test_add_nums([1, 2, -3], 0)
test_add_nums([], 0)

This will all happen automatically. With our original test functions, when we ran pytest, it showed the results as three passing tests because we had three separate test functions. Now even though we only have one function, it still shows as three passing tests! Each set of values is considered a separate test that can pass or fail independently. This is the main advantage of using parametrize instead of writing three separate assert lines in the body of a simple test function.

What have we gained?

We don’t have to write three separate functions with different names.
We don’t have to repeat the same details in each function (assert, add_nums(), ==).
The differences between the tests (the actual data) are written succinctly all in one place.
Adding another test case is as simple as adding another line of data to the decorator.

Coverage.py regex pragmas

Monday 28 July 2025

Coverage.py uses regexes to define pragma syntax. This is surprisingly powerful.

Coverage.py lets you indicate code to exclude from measurement by adding comments to your Python files. But coverage implements them differently than other similar tools. Rather than having fixed syntax for these comments, they are defined using regexes that you can change or add to. This has been surprisingly powerful.

The basic behavior: coverage finds lines in your source files that match the regexes. These lines are excluded from measurement, that is, it’s OK if they aren’t executed. If a matched line is part of a multi-line statement, the whole multi-line statement is excluded. If a matched line introduces a block of code, the entire block is excluded.

At first, these regexes were just to make it easier to implement the basic “here’s the comment you use” behavior for pragma comments. But it also enabled pragma-less exclusions. You could decide (for example) that you didn’t care to test any __repr__ methods. By adding def __repr__ as an exclusion regex, all of those methods were automatically excluded from coverage measurement without having to add a comment to each one. Very nice.

Not only did this let people add custom exclusions in their projects, but it enabled third-party plugins that could configure regexes in other interesting ways:

covdefaults adds a bunch of default exclusions, and also platform- and version-specific comment syntaxes.
coverage-conditional-plugin gives you a way to create comment syntaxes for entire files, for whether other packages are installed, and so on.

Then about a year ago, Daniel Diniz contributed a change that amped up the power: regexes could match multi-line patterns. This sounds like not that large a change, but it enabled much more powerful exclusions. As a sign, it made it possible to support four different feature requests.

To make it work, Daniel changed the matching code. Originally, it was a loop over the lines in the source file, checking each line for a match against the regexes. The new code uses the entire source file as the target string, and loops over the matches against that text. Each match is converted into a set of line numbers and added to the results.
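Here is a minimal sketch of that whole-file approach; it is not coverage.py’s actual code. It assumes the regex is applied with `re.MULTILINE` so `^` and `$` still anchor at line boundaries, and the name `lines_matching` and the sample source are illustrative. It only finds the matched lines and leaves out the statement and block expansion described above.

```python
import re


def lines_matching(regex, text):
    """Return the 1-based numbers of every line touched by a match of regex."""
    matched = set()
    for match in re.finditer(regex, text, flags=re.MULTILINE):
        start, end = match.span()
        first = text.count("\n", 0, start) + 1
        # A newline at the very end of the match shouldn't pull in the next line.
        last = text.count("\n", 0, max(start, end - 1)) + 1
        matched.update(range(first, last + 1))
    return matched


SOURCE = """\
class Thing:
    def __repr__(self):
        return "Thing()"

    def size(self):
        return 1
"""

# A one-line pattern, like the def __repr__ exclusion mentioned earlier.
print(lines_matching(r"def __repr__", SOURCE))              # {2}
# A pattern that spans lines yields several line numbers from one match.
print(lines_matching(r"def __repr__.*\n.*return", SOURCE))  # {2, 3}
```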

The power comes from being able to use one pattern to match many lines. For example, one of the four feature requests was how to exclude an entire file. With configurable multi-line regex patterns, you can do this yourself:

\A(?s:.*# pragma: exclude file.*)\Z

With this regex, if you put the comment “# pragma: exclude file” in your source file, the entire file will be excluded. The \A and \Z match the start and end of the target text, which, remember, is the entire file. The (?s:...) means the s/DOTALL flag is in effect, so . can match newlines. This pattern matches the entire source file if the desired pragma is somewhere in the file.
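A quick check of that pattern with Python’s `re` module, on two made-up source strings:

```python
import re

# The whole-file pragma pattern from the post.
EXCLUDE_FILE = r"\A(?s:.*# pragma: exclude file.*)\Z"

with_pragma = "import os\n# pragma: exclude file\nprint(os.getcwd())\n"
without_pragma = "import os\nprint(os.getcwd())\n"

match = re.search(EXCLUDE_FILE, with_pragma)
# The match runs from the first character to the last: the entire file.
print(match.span() == (0, len(with_pragma)))   # True
print(re.search(EXCLUDE_FILE, without_pragma))  # None

# Without the (?s:...) group, "." stops at newlines, so nothing matches.
print(re.search(r"\A.*# pragma: exclude file.*\Z", with_pragma))  # None
```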

Another requested feature was excluding code between two lines. We can use “# no cover: start” and “# no cover: stop” as delimiters with this regex:

# no cover: start(?s:.*?)# no cover: stop

Here (?s:.*?) means any number of any character at all, but as few as possible. A star in regexes means as many as possible, but star-question-mark means as few as possible. We need the minimal match so that we don’t match from the start of one pair of comments all the way through to the end of a different pair of comments.
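The difference between the lazy and greedy forms is easy to see on a made-up file with two start/stop pairs. The `GREEDY` variant is only there for comparison; it isn’t a pattern from the post.

```python
import re

SOURCE = """\
a = 1
# no cover: start
b = 2
# no cover: stop
c = 3
# no cover: start
d = 4
# no cover: stop
e = 5
"""

LAZY = r"# no cover: start(?s:.*?)# no cover: stop"
GREEDY = r"# no cover: start(?s:.*)# no cover: stop"

lazy_spans = [m.group() for m in re.finditer(LAZY, SOURCE)]
greedy_spans = [m.group() for m in re.finditer(GREEDY, SOURCE)]

print(len(lazy_spans))              # 2: each start/stop pair on its own
print(len(greedy_spans))            # 1: first start through the last stop
print("c = 3" in greedy_spans[0])   # True: code between the pairs is swallowed
```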

This regex approach is powerful, but is still fairly shallow. For example, either of these two examples would get the wrong lines if you had a string literal with the pragma text in it. There isn’t a regex that skips easily over string literals.
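For example, a made-up file that only mentions the pragma inside a string literal still matches the whole-file pattern:

```python
import re

EXCLUDE_FILE = r"\A(?s:.*# pragma: exclude file.*)\Z"

# This file only *mentions* the pragma inside a string; it doesn't use it.
SOURCE = '''\
HELP = "Add '# pragma: exclude file' to skip a file."

def real_code():
    return 42
'''

# The regex has no idea what a string literal is, so the whole file matches.
print(re.search(EXCLUDE_FILE, SOURCE) is not None)   # True
```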

This kind of difficulty hit home when I added a new default pattern to exclude empty placeholder methods like this:

def not_yet(self): ...

def also_not_this(self):
    ...

async def definitely_not_this(
    self,
    arg1,
):
    ...

We can’t just match three dots, because ellipses can be used in other places than empty function bodies. We need to be more delicate. I ended up with:

^\s*(((async )?def .*?)?\)(\s*->.*?)?:\s*)?\.\.\.\s*(#|$)

This craziness ensures the ellipsis is part of an (async) def, that the ellipsis appears first in the body (but no docstring allowed, doh!), allows for a comment on the line, and so on. And even with a pattern this complex, it would incorrectly match this contrived line:

def f(): print("(well): ... #2 false positive!")

So regexes aren’t perfect, but they’re a pretty good balance: flexible and powerful, and will work great on real code even if we can invent weird edge cases where they fail.
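Running the placeholder pattern over the examples above shows both the successes and the false positive. This assumes the pattern is applied to the whole file with `re.MULTILINE`, so `^` and `$` anchor at line boundaries; the sample file stitches together the snippets from the post plus one ordinary ellipsis that should not match.

```python
import re

# The placeholder-method pattern from the post.
PLACEHOLDER = r"^\s*(((async )?def .*?)?\)(\s*->.*?)?:\s*)?\.\.\.\s*(#|$)"

SOURCE = '''\
def not_yet(self): ...

def also_not_this(self):
    ...

async def definitely_not_this(
    self,
    arg1,
):
    ...

def real(self):
    return [1, ...]

def f(): print("(well): ... #2 false positive!")
'''

matches = [m.group() for m in re.finditer(PLACEHOLDER, SOURCE, flags=re.MULTILINE)]
for text in matches:
    print(repr(text.strip()))
```

On this input there are four matches. The multi-line `async def` is caught only at its `):` line, which is enough because a matched line inside a multi-line statement excludes the whole statement. The `[1, ...]` line is left alone, and the contrived `f()` line matches.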

What started as a simple implementation expediency has turned into a powerful configuration option that has done more than I would have thought.

Coverage 7.10.0: patch

Thursday 24 July 2025

Coverage 7.10 has some significant new features that have solved some long-standing problems.

Years ago I greeted a friend returning from vacation and asked how it had been. She answered, “It was good, I got a lot done!” I understand that feeling. I just had a long vacation myself, and used the time to clean up some old issues and add some new features in coverage.py v7.10.

The major new feature is a configuration option, [run] patch. With it, you specify named patches that coverage can use to monkey-patch some behavior that gets in the way of coverage measurement.

The first is subprocess. Coverage works great when you start your program with coverage measurement, but has long had the problem of how to also measure the coverage of sub-processes that your program created. The existing solution had been a complicated two-step process of creating obscure .pth files and setting environment variables. Whole projects appeared on PyPI to handle this for you.

Now, patch = subprocess will do this for you automatically, and clean itself up when the program ends. It handles sub-processes created by the subprocess module, the os.system() function, and any of the execv or spawnv families of functions.

This alone has spurred one user to exclaim,

The latest release of Coverage feels like a Christmas present! The native support for Python subprocesses is so good!

Another patch is _exit. This patches os._exit() so that coverage saves its data before exiting. The os._exit() function is an immediate and abrupt termination of the program, skipping all kinds of registered clean-up code. This patch makes it possible to collect coverage data from programs that end this way.
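This sketch doesn’t involve coverage at all. It uses an `atexit` handler as a stand-in for save-on-exit code, and runs a tiny child program both ways to show that `os._exit()` never gives that handler a chance to run.

```python
import subprocess
import sys

# A child program that registers clean-up code, then exits one of two ways.
CHILD = """
import atexit, os, sys
atexit.register(lambda: print("clean-up ran"))
if sys.argv[1] == "normal":
    sys.exit(0)
else:
    os._exit(0)
"""


def run_child(how):
    result = subprocess.run(
        [sys.executable, "-c", CHILD, how],
        capture_output=True,
        text=True,
    )
    return result.stdout.strip()


print(repr(run_child("normal")))  # 'clean-up ran'
print(repr(run_child("abrupt")))  # '': os._exit() skipped the atexit handler
```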

The third patch is execv. The execv functions end the current program and replace it with a new program in the same process. The execv patch arranges for coverage to save its data before the current program is ended.

Now that these patches are available, it seems silly that it’s taken so long. They (mostly) weren’t difficult. I guess it took looking at the old issues, realizing the friction they caused, and thinking up a new way to let users control the patching. Monkey-patching is a bit invasive, so I’ve never wanted to do it implicitly. The patch option gives the user an explicit way to request what they need without having to get into the dirty details themselves.

Another process-oriented feature was contributed by Arkady Gilinsky: with --save-signal=USR1 you can specify a user signal that coverage will attend to. When you send the signal to your running coverage process, it will save the collected data to disk. This gives a way to measure coverage in a long-running process without having to end the process.
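The general mechanism can be sketched with Python’s `signal` module. This is not coverage.py’s implementation: `save_data` and the `collected` dict are hypothetical stand-ins for a tool’s measurement data, and `signal.SIGUSR1` exists only on POSIX systems.

```python
import signal

collected = {"lines": 0}   # hypothetical stand-in for measurement data
saved_snapshots = []       # hypothetical stand-in for data written to disk


def save_data(signum, frame):
    # A real tool would write its data file here; this just records a copy.
    saved_snapshots.append(dict(collected))


signal.signal(signal.SIGUSR1, save_data)

collected["lines"] = 10
signal.raise_signal(signal.SIGUSR1)   # like `kill -USR1 <pid>` from a shell
collected["lines"] = 25
signal.raise_signal(signal.SIGUSR1)

print(saved_snapshots)   # [{'lines': 10}, {'lines': 25}]
```

The process keeps running after each signal; only the snapshot list grows.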

There were some other fixes and features along the way, like better HTML coloring of multi-line statements, and more default exclusions (if TYPE_CHECKING: and ...).

It feels good to finally address some of these pain points. I also closed some stale issues and pull requests. There is more to do, always more to do, but this feels like a real step forward. Give coverage 7.10.0 a try and let me know how it works for you.

2048: iterators and iterables

Tuesday 15 July 2025

Making a simple game, I waded into a classic iterator/iterable confusion.

I wrote a low-tech terminal-based version of the classic 2048 game and had some interesting difficulties with iterators along the way.

2048 has a 4×4 grid with sliding tiles. Because the tiles can slide left or right and up or down, sometimes we want to loop over the rows and columns from 0 to 3, and sometimes from 3 to 0. My first attempt looked like this:

N = 4
if sliding_right:
    cols = range(N-1, -1, -1)   # 3 2 1 0
else:
    cols = range(N)             # 0 1 2 3

if sliding_down:
    rows = range(N-1, -1, -1)   # 3 2 1 0
else:
    rows = range(N)             # 0 1 2 3

for row in rows:
    for col in cols:
        ...

This worked, but those counting-down ranges are ugly. Let’s make it nicer:

cols = range(N)                 # 0 1 2 3
if sliding_right:
    cols = reversed(cols)       # 3 2 1 0

rows = range(N)                 # 0 1 2 3
if sliding_down:
    rows = reversed(rows)       # 3 2 1 0

for row in rows:
    for col in cols:
        ...

Looks cleaner, but it doesn’t work! Can you see why? It took me a bit of debugging to see the light.

range() produces an iterable: something that can be iterated over. Similar but different, reversed() produces an iterator: something that is already iterating. Some iterables (like ranges) can be used more than once, creating a new iterator each time. But once an iterator like the one from reversed() has been consumed, it is done. Iterating it again will produce no values.
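Here’s the difference in a few lines:

```python
cols = range(4)
print(list(cols), list(cols))   # [0, 1, 2, 3] [0, 1, 2, 3]: a range can be reused

backwards = reversed(range(4))
first_pass = list(backwards)
second_pass = list(backwards)
print(first_pass)    # [3, 2, 1, 0]
print(second_pass)   # []: the iterator was used up by the first pass
```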

If “iterable” vs “iterator” is already confusing here’s a quick definition: an iterable is something that can be iterated, that can produce values in a particular order. An iterator tracks the state of an iteration in progress. An analogy: the pages of a book are iterable; a bookmark is an iterator. The English hints at it: an iter-able is able to be iterated at some point, an iterator is actively iterating.
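The book-and-bookmark analogy translates directly to `iter()` and `next()`:

```python
pages = ["p1", "p2", "p3"]          # the book: an iterable

bookmark = iter(pages)              # an iterator: tracks where we are
print(next(bookmark))               # p1
print(next(bookmark))               # p2

another_bookmark = iter(pages)      # each iter() on an iterable starts fresh
print(next(another_bookmark))       # p1

# An iterator is also iterable, but iter() hands back the same object,
# so "starting over" from it just continues where it left off.
print(iter(bookmark) is bookmark)   # True
rest = list(bookmark)
print(rest)                         # ['p3']
```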

The outer loop of my double loop was iterating only once over the rows, so the row iteration was fine whether it was going forward or backward. But the columns were being iterated again for each row. If the columns were going forward, they were a range, a reusable iterable, and everything worked fine.

But if the columns were meant to go backward, they were a one-use-only iterator made by reversed(). The first row would get all the columns, but the other rows would try to iterate using a fully consumed iterator and get nothing.
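Here is a reduced version of the buggy double loop that records which cells get visited. The values of `N`, `sliding_right`, and `sliding_down` are chosen to trigger the bug.

```python
N = 4
sliding_right = True
sliding_down = False

cols = range(N)
if sliding_right:
    cols = reversed(cols)       # a one-use iterator

rows = range(N)
if sliding_down:
    rows = reversed(rows)

visited = []
for row in rows:
    for col in cols:            # exhausted after the first row
        visited.append((row, col))

print(len(visited))   # 4, not 16: only row 0 saw any columns
print(visited)        # [(0, 3), (0, 2), (0, 1), (0, 0)]
```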

The simple fix was to use list() to turn my iterator into a reusable iterable:

cols = list(reversed(cols))

The code was slightly less nice, but it worked. An even better fix was to change my doubly nested loop into a single loop:

for row, col in itertools.product(rows, cols):
    ...

That also takes care of the original iterator/iterable problem, so I can get rid of that first fix:

import itertools

cols = range(N)
if sliding_right:
    cols = reversed(cols)

rows = range(N)
if sliding_down:
    rows = reversed(rows)

for row, col in itertools.product(rows, cols):
    ...
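A short check that `product()` copes with two one-use iterators:

```python
import itertools

rows = reversed(range(4))   # one-use iterator: 3 2 1 0
cols = reversed(range(4))   # one-use iterator: 3 2 1 0

cells = list(itertools.product(rows, cols))
print(len(cells))   # 16: every cell is visited
print(cells[:5])    # [(3, 3), (3, 2), (3, 1), (3, 0), (2, 3)]
```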

Once I had this working, I wondered why product() solved the iterator/iterable problem. The docs have a sample Python implementation that shows why: internally, product() is doing just what my list() call did: it makes an explicit iterable from each of the iterables it was passed, then picks values from them to make the pairs. This lets product() accept iterators (like my reversed range) rather than forcing the caller to always pass iterables.
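Here is a trimmed-down version of the roughly equivalent code in the itertools documentation, with the `repeat` argument left out and the function renamed `simple_product` so it doesn’t shadow the real one. The `tuple(pool)` copy is the step that makes one-use iterators safe.

```python
def simple_product(*iterables):
    """Cartesian product, adapted (without `repeat`) from the itertools docs' sketch."""
    # Copy every input into a tuple first, so one-use iterators can be
    # read many times below.
    pools = [tuple(pool) for pool in iterables]
    result = [[]]
    for pool in pools:
        result = [x + [y] for x in result for y in pool]
    for prod in result:
        yield tuple(prod)


cells = list(simple_product(reversed(range(2)), reversed(range(3))))
print(cells)  # [(1, 2), (1, 1), (1, 0), (0, 2), (0, 1), (0, 0)]
```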

If your head is spinning from all this iterable / iterator / iteration talk, I don’t blame you. Just now I said, “it makes an explicit iterable from each of the iterables it was passed.” How does that make sense? Well, an iterator is an iterable. So product() can take either a reusable iterable (like a range or a list) or it can take a use-once iterator (like a reversed range). Either way, it populates its own reusable iterables internally.

Python’s iteration features are powerful but sometimes require careful thinking to get right. Don’t overlook the tools in itertools, and mind your iterators and iterables!

• • •

Some more notes:

1: Another way to reverse a range: you can slice them!

>>> range(4)
range(0, 4)
>>> range(4)[::-1]
range(3, -1, -1)
>>> reversed(range(4))
<range_iterator object at 0x10307cba0>

It didn’t occur to me to reverse-slice the range, since reversed is right there, but the slice gives you a new reusable range object while reversing the range gives you a use-once iterator.

2: Why did product() explicitly store the values it would need but reversed() did not? Two reasons: first, reversed() depends on the __reversed__ dunder method, so it’s up to the original object to decide how to implement it. Ranges know how to produce their values in backward order, so they don’t need to store them all. Second, product() is going to need to use the values from each iterable many times and can’t depend on the iterables being reusable.
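A hypothetical `Pages` class shows the first reason: reversed() hands the job to the object’s `__reversed__` method, which can produce values backward on demand without storing them.

```python
class Pages:
    """A hypothetical class that controls its own reversal."""

    def __init__(self, count):
        self.count = count

    def __iter__(self):
        return iter(range(1, self.count + 1))

    def __reversed__(self):
        # Count down on demand instead of storing every value.
        n = self.count
        while n > 0:
            yield n
            n -= 1


book = Pages(3)
print(list(book))            # [1, 2, 3]
print(list(reversed(book)))  # [3, 2, 1]: reversed() called Pages.__reversed__
```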

Math factoid of the day: 63

Monday 16 June 2025

Two geometric facts about 63, but how to connect them?

63 is a centered octahedral number. That means if you build an approximation of an octahedron with cubes, one size of octahedron will have 63 cubes.

In the late 1700s René Just Haüy developed a theory about how crystals formed: successive layers of fundamental primitives in orderly arrangements. One of those arrangements was stacking cubes together to make an octahedron.

Start with one cube:

Add six more cubes around it, one on each face. Now we have seven:

Add another layer, adding a cube to touch each visible cube, making 25:

25 cubes arranged like an octahedron five cubes wide

One more layer and we have a total of 63:

63 cubes arranged like an octahedron seven cubes wide

The remaining numbers in the sequence less than 10,000 are 129, 231, 377, 575, 833, 1159, 1561, 2047, 2625, 3303, 4089, 4991, 6017, 7175, 8473, 9919.
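To check the counts, here is a small sketch that models the stack as cubes centered on integer points, with each layer adding every position one more taxicab step from the center. That model is an assumption about how the diagrams are built, but it reproduces the numbers above.

```python
def octahedron_cubes(layers):
    """Count unit cubes at integer (x, y, z) with |x| + |y| + |z| <= layers."""
    return sum(
        1
        for x in range(-layers, layers + 1)
        for y in range(-layers, layers + 1)
        for z in range(-layers, layers + 1)
        if abs(x) + abs(y) + abs(z) <= layers
    )


counts = [octahedron_cubes(n) for n in range(20)]
print(counts[:4])   # [1, 7, 25, 63]
```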

63 also shows up in the Delannoy numbers: the number of ways to traverse a grid from the lower left corner to upper right using only steps north, east, or northeast. Here are the 63 ways of moving on a 3×3 grid:

63 different ways to traverse a 3x3 grid

(Diagram from Wikipedia)

In fact, the number of cubes in a Haüy octahedron with N layers is the same as the number of Delannoy paths on a 3×N grid!
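The correspondence can at least be checked numerically. This sketch computes Delannoy numbers with the standard recurrence (a path’s last step arrives from the west, south, or southwest) and compares D(3, N) with the cube counts from the post. It checks the first twenty values; it doesn’t explain why they agree.

```python
from functools import lru_cache


@lru_cache(maxsize=None)
def delannoy(m, n):
    """Paths from (0, 0) to (m, n) using steps east, north, or northeast."""
    if m == 0 or n == 0:
        return 1
    return delannoy(m - 1, n) + delannoy(m, n - 1) + delannoy(m - 1, n - 1)


# The centered octahedral numbers given in the post (1, 7, 25, 63, ...).
HAUY = [1, 7, 25, 63, 129, 231, 377, 575, 833, 1159, 1561, 2047,
        2625, 3303, 4089, 4991, 6017, 7175, 8473, 9919]

print(delannoy(3, 3))                                      # 63
print([delannoy(3, n) for n in range(len(HAUY))] == HAUY)  # True
```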

Since the two ideas are both geometric and fairly simple, I would love to find a geometric explanation for the correspondence. The octahedron is three-dimensional, and the Delannoy grids have that tantalizing 3 in them. It seems like there should be a way to convert Haüy coordinates to Delannoy coordinates to show how they relate. But I haven’t found one...

• • •

Colophon: I made the octahedron diagrams by asking Claude to write a Python program to do it. It wasn’t a fast process because it took pushing and prodding to get the diagrams to come out the way I liked. But Claude was very competent, and I could think about the results rather than about projections or color spaces. I could dip into it for 10 minutes at a time over a number of days without having to somehow reconstruct a mental context.

This kind of casual hobby programming is perfect for AI assistance. I don’t need the code to be perfect or even good, I just want the diagrams to be nice. I don’t have the focus time to learn how to write the program, so I can leave it to an imperfect assistant.

Digital Equipment Corporation no more

Monday 9 June 2025

Tech giants come and go

Today is the 39-year anniversary of my first day working for Digital Equipment Corporation. It was my first real job in the tech world, two years out of college. I wrote about it 19 years ago, but it’s on my mind again.

More and more, I find that people have never heard of Digital (as we called it) or DEC (as they preferred we didn’t call it but everyone did). It’s something I’ve had to get used to. I try to relate a story from that time, and I find that even experienced engineers with deep knowledge of technologies don’t know of the company.

I mention this not in a crabby “kids these days” kind of way. It does surprise me, but I’m taking it as a learning opportunity. If there’s a lesson to learn, it is this:

This too shall pass.

I am now working for Netflix, and one of the great things about it is that everyone has heard of Netflix. I can mention my job to anyone and they are impressed in some way. Techies know it as one of the FAANG companies, and “civilians” know it for the entertainment it produces and delivers.

When I joined Digital in 1986, at least among tech people, it was similar. Everyone knew about Digital and what they had done: the creation of the minicomputer, the genesis of Unix and C, the ubiquitous VT100. Many foundations of the software world flowed directly and famously from Digital.

These days Digital isn’t quite yet a footnote to history, but it is more and more unknown even among the most tech-involved. And the tech world carries on!

My small team at Netflix has a number of young engineers, less than two years out of college, and even an intern still in college. I’m sure they felt incredibly excited to join a company as well-known and influential as Netflix. In 39 years when they tell a story from the early days of their career will they start with, “Have you heard of Netflix?” and have to adjust to the blank stares they get in return?

This too shall pass.
