Generating data shapes with Hypothesis

Sunday 21 December 2025

I used Hypothesis to generate random data structure schemas, and then generate random data using them. I learned a lot along the way.

In my last blog post (A testing conundrum), I described trying to test my Hasher class which hashes nested data. I couldn’t get Hypothesis to generate usable data for my test. I wanted to assert that two equal data items would hash equally, but Hypothesis was finding pairs like [0] and [False]. These are equal but hash differently because the hash takes the types into account.

In the blog post I said,

If I had a schema for the data I would be comparing, I could use it to steer Hypothesis to generate realistic data. But I don’t have that schema...

I don’t want a fixed schema for the data Hasher would accept, but I do want tests that compare data generated from the same schema: they shouldn’t compare a list of ints to a list of bools. Hypothesis is good at generating things randomly. Usually that means data, but we can also use it to generate schemas randomly!

Hypothesis basics

Before describing my solution, I’ll take a quick detour to describe how Hypothesis works.

Hypothesis calls its randomness machines “strategies”. Here is a strategy that will produce random integers between -99 and 1000:

import hypothesis.strategies as st
st.integers(min_value=-99, max_value=1000)

Strategies can be composed:

st.lists(st.integers(min_value=-99, max_value=1000), max_size=50)

This will produce lists of integers from -99 to 1000. The lists will have up to 50 elements.
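To see what a strategy produces, you can ask it for an example. This is handy for interactive exploration, though Hypothesis discourages calling .example() inside real tests:

print(st.lists(st.integers(min_value=-99, max_value=1000), max_size=50).example())
# might print something like: [203, -4, 997]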

Strategies are used in tests with the @given decorator, which takes a strategy and runs the test a number of times with different example data drawn from the strategy. In your test you check a desired property that holds true for any data the strategy can produce.

To demonstrate, here’s a test of sum() that checks that summing a list of numbers in two halves gives the same answer as summing the whole list:

from hypothesis import given, strategies as st

@given(st.lists(st.integers(min_value=-99, max_value=1000), max_size=50))
def test_sum(nums):
    # We don't have to test sum(), this is just an example!
    mid = len(nums) // 2
    assert sum(nums) == sum(nums[:mid]) + sum(nums[mid:])

By default, Hypothesis will run the test 100 times, each with a different randomly generated list of numbers.

Schema strategies

The solution to my data comparison problem is to have Hypothesis generate a random schema in the form of a strategy, then use that strategy to generate two examples. Doing this repeatedly gets us pairs of data with the same “shape”, which will work well for our tests.

This is kind of twisty, so let’s look at it in pieces. We start with a list of strategies that produce primitive values:

primitives = [
    st.none(),
    st.booleans(),
    st.integers(min_value=-1000, max_value=10_000_000),
    st.floats(min_value=-100, max_value=100),
    st.text(max_size=10),
    st.binary(max_size=10),
]

Then a list of strategies that produce hashable values, which are all the primitives, plus tuples of any of the primitives:

def tuples_of(elements):
    """Make a strategy for tuples of some other strategy."""
    return st.lists(elements, max_size=3).map(tuple)

# List of strategies that produce hashable data.
hashables = primitives + [tuples_of(s) for s in primitives]

We want to be able to make nested dictionaries with leaves of some other type. This function takes a leaf-making strategy and produces a strategy to make those dictionaries:

def nested_dicts_of(leaves):
    """Make a strategy for recursive dicts with leaves from another strategy."""
    return st.recursive(
        leaves,
        lambda children: st.dictionaries(st.text(max_size=10), children, max_size=3),
        max_leaves=10,
    )

Finally, here’s our strategy that makes schema strategies:

nested_data_schemas = st.recursive(
    st.sampled_from(primitives),
    lambda children: st.one_of(
        children.map(lambda s: st.lists(s, max_size=5)),
        children.map(tuples_of),
        st.sampled_from(hashables).map(lambda s: st.sets(s, max_size=10)),
        children.map(nested_dicts_of),
    ),
    max_leaves=3,
)

For debugging, it’s helpful to generate an example strategy from this strategy, and then an example from that, many times:

for _ in range(50):
    print(repr(nested_data_schemas.example().example()))

Hypothesis is good at making data we’d never think to try ourselves. Here is some of what it made:

[None, None, None, None, None]
{}
[{False}, {False, True}, {False, True}, {False, True}]
{(1.9, 80.64553337755876), (-41.30770818038395, 9.42967906108538, -58.835811641800085), (31.102786990742203,), (28.2724197133397, 6.103515625e-05, -84.35107066147154), (7.436329211943294e-263,), (-17.335739410320514, 1.5029061311609365e-292, -8.17077562035881), (-8.029363284353857e-169, 49.45840191722425, -15.301768150196054), (5.960464477539063e-08, 1.1518373121077722e-213), (), (-0.3262457914511714,)}
[b'+nY2~\xaf\x8d*\xbb\xbf', b'\xe4\xb5\xae\xa2\x1a', b'\xb6\xab\xafEi\xc3C\xab"\xe1', b'\xf0\x07\xdf\xf5\x99', b'2\x06\xd4\xee-\xca\xee\x9f\xe4W']
{'fV': [81.37177374286324, 3.082323424992609e-212, 3.089885728465406e-151, -9.51475773638932e-86, -17.061851038597922], 'J»\x0c\x86肭|\x88\x03\x8aU': [29.549966208819654]}
[{}, -68.48316192397687]
None
['\x85\U0004bf04°', 'pB\x07iQT', 'TRUE', '\x1a5ùZâ\U00048752\U0005fdf8ê', '\U000fe0b9m*¤\U000b9f1e']
(14.232866652585258, -31.193835515904652, 62.29850355163285)
{'': {'': None, '\U000be8de§\nÈ\U00093608u': None, 'Y\U000709e4¥ùU)GE\U000dddc5¬': None}}
[{(), (b'\xe7', b'')}, {(), (b'l\xc6\x80\xdf\x16\x91', b'', b'\x10,')}, {(b'\xbb\xfb\x1c\xf6\xcd\xff\x93\xe0\xec\xed',), (b'g',), (b'\x8e9I\xcdgs\xaf\xd1\xec\xf7', b'\x94\xe6#', b'?\xc9\xa0\x01~$k'), (b'r', b'\x8f\xba\xe6\xfe\x92n\xc7K\x98\xbb', b'\x92\xaa\xe8\xa6s'), (b'f\x98_\xb3\xd7', b'\xf4+\xf7\xbcU8RV', b'\xda\xb0'), (b'D',), (b'\xab\xe9\xf6\xe9', b'7Zr\xb7\x0bl\xb6\x92\xb8\xad', b'\x8f\xe4]\x8f'), (b'\xcf\xfb\xd4\xce\x12\xe2U\x94mt',), (b'\x9eV\x11', b'\xc5\x88\xde\x8d\xba?\xeb'), ()}, {(b'}', b'\xe9\xd6\x89\x8b')}, {(b'\xcb`', b'\xfd', b'w\x19@\xee'), ()}]
((), (), ())

Finally writing the test

Time to use all of this in a test:

@given(nested_data_schemas.flatmap(lambda s: st.tuples(s, s)))
def test_same_schema(data_pair):
    data1, data2 = data_pair
    h1, h2 = Hasher(), Hasher()
    h1.update(data1)
    h2.update(data2)
    if data1 == data2:
        assert h1.digest() == h2.digest()
    else:
        # Strictly speaking, unequal data could produce equal hashes,
        # but it's very unlikely, so test for it anyway.
        assert h1.digest() != h2.digest()

Here I use the .flatmap() method to draw an example from the nested_data_schemas strategy and call the provided lambda with the drawn example, which is itself a strategy. The lambda uses st.tuples to make tuples with two examples drawn from the strategy. So we get one data schema, and two examples from it as a tuple passed into the test as data_pair. The test then unpacks the data, hashes them, and makes the appropriate assertion.

This works great: the tests pass. To check that the test was working well, I made some breaking tweaks to the Hasher class. If Hypothesis is configured to generate enough examples, it finds data examples demonstrating the failures.

I’m pleased with the results. Hypothesis is something I’ve been wanting to use more, so I’m glad I took this chance to learn more about it and get it working for these tests. To be honest, this is way more than I needed to test my Hasher class. But once I got started, I wanted to get it right, and learning is always good.

I’m a bit concerned that the standard setting (100 examples) isn’t enough to find the planted bugs in Hasher. There are many parameters in my strategies that could be tweaked to keep Hypothesis from wandering too broadly, but I don’t know how to decide what to change.
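One knob is the number of examples. Hypothesis’s settings decorator can raise it for just this test; the value here is arbitrary, a sketch rather than a recommendation:

from hypothesis import given, settings

@settings(max_examples=2000)  # the default is 100
@given(nested_data_schemas.flatmap(lambda s: st.tuples(s, s)))
def test_same_schema(data_pair):
    ...  # same body as the test above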

Actually

The code in this post is different than the actual code I ended up with. Mostly this is because I was working on the code while I was writing this post, and discovered some problems that I wanted to fix. For example, the tuples_of function makes homogeneous tuples: varying lengths with elements all of the same type. This is not the usual use of tuples (see Lists vs. Tuples). Adapting for heterogeneous tuples added more complexity, which was interesting to learn, but I didn’t want to go back and add it here.
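For a rough idea of the heterogeneous version, here is one way it could look. This is only a sketch, not the code in strategies.py:

def heterogeneous_tuples_of(element_strategies):
    """Sketch: tuples whose elements can each come from a different strategy."""
    return st.lists(st.sampled_from(element_strategies), max_size=3).flatmap(
        lambda strats: st.tuples(*strats)
    )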

You can look at the final strategies.py to see that and other details, including type hints for everything, which was a journey of its own.

Postscript: AI assistance

I would not have been able to come up with all of this by myself. Hypothesis is very powerful, but requires a new way of thinking about things. It’s twisty to have functions returning strategies, and especially strategies producing strategies. The docs don’t have many examples, so it can be hard to get a foothold on the concepts.

Claude helped me by providing initial code, answering questions, debugging when things didn’t work out, and so on. If you are interested, this is one of the discussions I had with it.

A testing conundrum

Thursday 18 December 2025

A useful class that is hard to test thoroughly, and my failed attempt to use Hypothesis to do it.

Update: I found a solution which I describe in Generating data shapes with Hypothesis.

In coverage.py, I have a class for computing the fingerprint of a data structure. It’s used to avoid doing duplicate work when re-processing the same data won’t add to the outcome. It’s designed to work for nested data, and to canonicalize things like set ordering. The slightly simplified code looks like this:

import hashlib
from typing import Any

class Hasher:
    """Hashes Python data for fingerprinting."""

    def __init__(self) -> None:
        self.hash = hashlib.new("sha3_256")

    def update(self, v: Any) -> None:
        """Add `v` to the hash, recursively if needed."""
        self.hash.update(str(type(v)).encode("utf-8"))
        match v:
            case None:
                pass
            case str():
                self.hash.update(v.encode("utf-8"))
            case bytes():
                self.hash.update(v)
            case int() | float():
                self.hash.update(str(v).encode("utf-8"))
            case tuple() | list():
                for e in v:
                    self.update(e)
            case dict():
                for k, kv in sorted(v.items()):
                    self.update(k)
                    self.update(kv)
            case set():
                self.update(sorted(v))
            case _:
                raise ValueError(f"Can't hash {v = }")
        self.hash.update(b".")

    def digest(self) -> bytes:
        """Get the full binary digest of the hash."""
        return self.hash.digest()

To test this, I had some basic tests like:

def test_string_hashing():
    # Same strings hash the same.
    # Different strings hash differently.
    h1 = Hasher()
    h1.update("Hello, world!")
    h2 = Hasher()
    h2.update("Goodbye!")
    h3 = Hasher()
    h3.update("Hello, world!")
    assert h1.digest() != h2.digest()
    assert h1.digest() == h3.digest()

def test_dict_hashing():
    # The order of keys doesn't affect the hash.
    h1 = Hasher()
    h1.update({"a": 17, "b": 23})
    h2 = Hasher()
    h2.update({"b": 23, "a": 17})
    assert h1.digest() == h2.digest()

The last line in the update() method adds a dot to the running hash. That was to solve a problem covered by this test:

def test_dict_collision():
    # Nesting matters.
    h1 = Hasher()
    h1.update({"a": 17, "b": {"c": 1, "d": 2}})
    h2 = Hasher()
    h2.update({"a": 17, "b": {"c": 1}, "d": 2})
    assert h1.digest() != h2.digest()

The most recent change to Hasher was to add the set() clause. There (and in dict()), we are sorting the elements to canonicalize them. The idea is that equal values should hash equally and unequal values should not. Sets and dicts are equal regardless of their iteration order, so we sort them to get the same hash.

I added a test of the set behavior:

def test_set_hashing():
    h1 = Hasher()
    h1.update({(1, 2), (3, 4), (5, 6)})
    h2 = Hasher()
    h2.update({(5, 6), (1, 2), (3, 4)})
    assert h1.digest() == h2.digest()
    h3 = Hasher()
    h3.update({(1, 2)})
    assert h1.digest() != h3.digest()

But I wondered if there was a better way to test this class. My small one-off tests weren’t addressing the full range of possibilities. I could read the code and feel confident, but wouldn’t a more comprehensive test be better? This is a pure function: inputs map to outputs with no side-effects or other interactions. It should be very testable.

This seemed like a good candidate for property-based testing. The Hypothesis library would let me generate data, and I could check that the desired properties of the hash held true.

It took me a while to get the Hypothesis strategies wired up correctly. I ended up with this, but there might be a simpler way:

from hypothesis import strategies as st

scalar_types = [
    st.none(),
    st.booleans(),
    st.integers(),
    st.floats(allow_infinity=False, allow_nan=False),
    st.text(),
    st.binary(),
]

scalars = st.one_of(*scalar_types)

def tuples_of(strat):
    return st.lists(strat, max_size=3).map(tuple)

hashable_types = scalar_types + [tuples_of(s) for s in scalar_types]

# Homogeneous sets: all elements same type.
homogeneous_sets = (
    st.sampled_from(hashable_types)
    .flatmap(lambda s: st.sets(s, max_size=5))
)

# Full nested Python data.
python_data = st.recursive(
    scalars,
    lambda children: (
        st.lists(children, max_size=5)
        | tuples_of(children)
        | homogeneous_sets
        | st.dictionaries(st.text(), children, max_size=5)
    ),
    max_leaves=10,
)

This doesn’t make completely arbitrary nested Python data: sets are forced to have elements all of the same type or I wouldn’t be able to sort them. Dictionaries only have strings for keys. But this works to generate data similar to the real data we hash. I wrote this simple test:

from hypothesis import given

@given(python_data)
def test_one(data):
    # Hashing the same thing twice.
    h1 = Hasher()
    h1.update(data)
    h2 = Hasher()
    h2.update(data)
    assert h1.digest() == h2.digest()

This didn’t find any failures, but this is the easy test: hashing the same thing twice produces equal hashes. The trickier test is to get two different data structures, and check that their equality matches their hash equality:

@given(python_data, python_data)
def test_two(data1, data2):
    h1 = Hasher()
    h1.update(data1)
    h2 = Hasher()
    h2.update(data2)

    if data1 == data2:
        assert h1.digest() == h2.digest()
    else:
        assert h1.digest() != h2.digest()

This immediately found problems, but not in my code:

> assert h1.digest() == h2.digest()
E AssertionError: assert b'\x80\x15\xc9\x05...' == b'\x9ap\xebD...'
E
E   At index 0 diff: b'\x80' != b'\x9a'
E
E   Full diff:
E   - (b'\x9ap\xebD...)'
E   + (b'\x80\x15\xc9\x05...)'
E Falsifying example: test_two(
E     data1=(False, False, False),
E     data2=(False, False, 0),
E )

Hypothesis found that (False, False, False) is equal to (False, False, 0), but they hash differently. This is correct. The Hasher class takes the types of the values into account in the hash. False and 0 are equal, but they are different types, so they hash differently. The same problem shows up for 0 == 0.0 and 0.0 == -0.0. The theory of my test was incorrect: some values that are equal should hash differently.
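To see it outside of Hypothesis, the falsifying example can be written as a tiny standalone check (a sketch, assuming the Hasher class shown above):

h1, h2 = Hasher(), Hasher()
h1.update((False, False, False))
h2.update((False, False, 0))
assert (False, False, False) == (False, False, 0)  # equal as Python values...
assert h1.digest() != h2.digest()                  # ...but the hash includes the types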

In my real code, this isn’t an issue. I won’t ever be comparing values like this to each other. If I had a schema for the data I would be comparing, I could use it to steer Hypothesis to generate realistic data. But I don’t have that schema, and I’m not sure I want to maintain that schema. This Hasher is useful as it is, and I’ve been able to reuse it in new ways without having to update a schema.

I could write a smarter equality check for use in the tests, but that would roughly approximate the code in Hasher itself. Duplicating product code in the tests is a good way to write tests that pass but don’t tell you anything useful.

I could exclude bools and floats from the test data, but those are actual values I need to handle correctly.

Hypothesis was useful in that it didn’t find any failures other than the ones I described. I can’t leave those tests in the automated test suite because I don’t want to manually examine the failures, but at least this gave me more confidence that the code is good as it is now.

Testing is a challenge unto itself. This brought it home to me again. It’s not easy to know precisely what you want code to do, and it’s not easy to capture that intent in tests. For now, I’m leaving just the simple tests. If anyone has ideas about how to test Hasher more thoroughly, I’m all ears.

Autism Adulthood, 3rd edition

Tuesday 18 November 2025

My wife’s book is out today, you should buy it.

Today is the publication of the third edition of Autism Adulthood: Insights and Creative Strategies for a Fulfilling Life. It’s my wife Susan’s book collecting stories and experiences from people all along the autism spectrum, from the self-diagnosed to the profound.

The cover of Autism Adulthood: a person raising their arms in celebration, silhouetted against a dawning sky

The book includes dozens of interviews with autistic adults, their parents, caregivers, researchers, and professionals. Everyone’s experience of autism is different. Reading others’ stories and perspectives can give us a glimpse into other possibilities for ourselves and our loved ones.

If you have someone in your life on the spectrum, or are on it yourself, I guarantee you will find new ways to understand the breadth of what autism means and what it can be.

Susan has also written two other non-fiction autism books, including a memoir of our early days with our son Nat. Of course I highly recommend all of them.

Why your mock breaks later

Sunday 16 November 2025

An overly aggressive mock can work fine, but then break much later. Why?

In Why your mock doesn’t work I explained this rule of mocking:

Mock where the object is used, not where it’s defined.

That blog post explained why that rule was important: often a mock doesn’t work at all if you do it wrong. But in some cases, the mock will work even if you don’t follow this rule, and then it can break much later. Why?

Let’s say you have code like this:

# user.py

import json
from pathlib import Path

def get_user_settings():
    with open(Path("~/settings.json").expanduser()) as f:
        return json.load(f)

def add_two_settings():
    settings = get_user_settings()
    return settings["opt1"] + settings["opt2"]

You write a simple test:

def test_add_two_settings():
    # NOTE: need to create ~/settings.json for this to work:
    #   {"opt1": 10, "opt2": 7}
    assert add_two_settings() == 17

As the comment in the test points out, the test will only pass if you create the correct settings.json file in your home directory. This is bad: you don’t want to require finicky environments for your tests to pass.

The thing we want to avoid is opening a real file, so it’s a natural impulse to mock out open():

# test_user.py

from io import StringIO
from unittest.mock import patch

@patch("builtins.open")
def test_add_two_settings(mock_open):
    mock_open.return_value = StringIO('{"opt1": 10, "opt2": 7}')
    assert add_two_settings() == 17

Nice, the test works without needing to create a file in our home directory!

Much later...

One day your test suite fails with an error like:

...
  File ".../site-packages/coverage/python.py", line 55, in get_python_source
    source_bytes = read_python_source(try_filename)
  File ".../site-packages/coverage/python.py", line 39, in read_python_source
    return source.replace(b"\r\n", b"\n").replace(b"\r", b"\n")
           ~~~~~~~~~~~~~~^^^^^^^^^^^^^^^^
TypeError: replace() argument 1 must be str, not bytes

What happened!? Coverage.py code runs during your tests, invoked by the Python interpreter. The mock in the test changed the builtin open, so any use of it anywhere during the test is affected. In some cases, coverage.py needs to read your source code to record the execution properly. When that happens, coverage.py unknowingly uses the mocked open, and bad things happen.

When you use a mock, patch it where it’s used, not where it’s defined. In this case, the patch would be:

@patch("myproduct.user.open")
def test_add_two_settings(mock_open):
    ... etc ...

With a mock like this, the coverage.py code would be unaffected.

Keep in mind: it’s not just coverage.py that could trip over this mock. There could be other libraries used by your code, or you might use open yourself in another part of your product. Mocking the definition means anything using the object will be affected. Your intent is to only mock in one place, so target that place.

Postscript

I decided to add some code to coverage.py to defend against this kind of over-mocking. There is a lot of over-mocking out there, and this problem only shows up in coverage.py with Python 3.14. It’s not happening to many people yet, but it will happen more and more as people start testing with 3.14. I didn’t want to have to answer this question many times, and I didn’t want to force people to fix their mocks.

From a certain perspective, I shouldn’t have to do this. They are in the wrong, not me. But this will reduce the overall friction in the universe. And the fix was really simple:

open = open

This is a top-level statement in my module, so it runs when the module is imported, long before any tests are run. The assignment to open will create a global in my module, using the current value of open, the one found in the builtins. This saves the original open for my module to use later, isolated from any subsequent changes to builtins.
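To see why that helps, here’s a minimal sketch of the idea (a hypothetical module, not coverage.py’s actual code):

# mymodule.py

open = open   # looked up in builtins at import time, saved as a module global

def read_text(filename):
    # Uses the module-level name, so a later patch of builtins.open
    # doesn't change what this function calls.
    with open(filename) as f:
        return f.read()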

This is an ad-hoc fix: it only defends one builtin. Mocking other builtins could still break coverage.py. But open is a common one, and this will keep things working smoothly for those cases. And there’s precedent: I’ve already been using a more involved technique to defend against mocking of the os module for ten years.

Even better!

No blog post about mocking is complete without encouraging a number of other best practices, some of which could get you out of the mocking mess:

  • Use autospec=True to make your mocks strictly behave like the original object: see Why your mock still doesn’t work.
  • Make assertions about how your mock was called to be sure everything is connected up properly.
  • Use verified fakes instead of auto-generated mocks: Fast tests for slow services: why you should use verified fakes.
  • Separate your code so that computing functions like our add_two_settings don’t also do I/O. This makes the functions easier to test in the first place. Take a look at Functional Core, Imperative Shell.
  • Dependency injection lets you explicitly pass test-specific objects where they are needed instead of relying on implicit access to a mock. A rough sketch of these last two ideas follows the list.
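Here is that sketch: a hypothetical refactor of the example above, not the only way to do it.

# user.py (hypothetical refactor)

import json
from pathlib import Path

def add_two_settings(settings):
    # Pure computation: no I/O, so the test needs no mock at all.
    return settings["opt1"] + settings["opt2"]

def get_user_settings():
    with open(Path("~/settings.json").expanduser()) as f:
        return json.load(f)

# test_user.py

def test_add_two_settings():
    assert add_two_settings({"opt1": 10, "opt2": 7}) == 17

The imperative shell is then the one place that calls add_two_settings(get_user_settings()), and that thin layer can be left to an integration test.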

Three releases, one new organization

Sunday 9 November 2025

This weekend I made three releases of coverage.py. What happened?

It’s been a busy, bumpy week with coverage.py. Some things did not go smoothly, and I didn’t handle everything as well as I could have.

It started with trying to fix issue 2064 about conflicts between the “sysmon” measurement core and a concurrency setting.

To measure your code, coverage.py needs to know what code got executed. To know that, it collects execution events from the Python interpreter. CPython now has two mechanisms for this: trace functions and sys.monitoring. Coverage.py has two implementations of a trace function (in C and in Python), and an implementation of a sys.monitoring listener. These three components are the measurement cores, known as “ctrace”, “pytrace”, and “sysmon”.

The fastest is sysmon, but there are coverage.py features it doesn’t yet support. With Python 3.14, sysmon is the default core. Issue 2064 complained that when the defaulted core conflicted with an explicit concurrency choice, the conflict resulted in an error. I agreed with the issue: since the core was defaulted, it shouldn’t be an error; we should choose a different core instead.

But I figured if you explicitly asked for the sysmon core and also a conflicting setting, that should be an error because you’ve got two settings that can’t be used together.

Implementing all that got a little involved because of “metacov”: coverage.py coverage-measuring itself. The sys.monitoring facility in Python was added in 3.12, but wasn’t fully fleshed out enough to do branch coverage until 3.14. When we measure ourselves, we use branch coverage, so 3.12 and 3.13 needed some special handling to avoid causing the error that sysmon plus branch coverage would cause.

I got it all done, and released 7.11.1 on Friday.

Soon, issue 2077 arrived. Another fix in 7.11.1 involved some missing branches when using the sysmon core. That fix required parsing the source code during execution. But sometimes the “code” can’t be parsed: Jinja templates compile html files to Python and use the html file as the file name for the code. When coverage.py tries to parse the html file as Python, of course it fails. My fix didn’t account for this. I fixed that on Saturday and released 7.11.2.

In the meantime, issue 2076 and issue 2078 both pointed out that some settings combinations that used to produce warnings now produced errors. This is a breaking change, they said, and should not have been released as a patch version.

To be honest, my first reaction was that it wasn’t that big a deal: the settings were in conflict. Fix the settings and all will be well. It’s hard to remember all of the possibilities when making changes like this, it’s easy to make mistakes, and semantic versioning is bound to have judgement calls anyway. I had already spent a while getting 7.11.1 done, and .2 followed just a day later. I was annoyed and didn’t want to have to re-think everything.

But the more I thought about it, the more I decided they were right: it does break pipelines that used to work. And falling back to a different core is fine: the cores differ in speed and compatibility but (for the most part) produce the same results. Changing the requested core with a warning is a fine way to deal with the settings conflict without stopping test suites from running.

So I just released 7.11.3 to go back to the older behavior. Maybe I won’t have to do another release tomorrow!

While all this was going on, I also moved the code from my personal GitHub account to a new coveragepy GitHub organization!

Coverage.py is basically a one-man show. Maybe the GitHub organization will make others feel more comfortable chiming in, but I doubt it. I’d like to have more people to talk through changes with. Maybe I wouldn’t have had to make three releases in three days if someone else had been around as a sounding board.

I’m in the #coverage-py channel if you want to talk about any aspect of coverage.py, or I can be reached in lots of other ways. I’d love to talk to you.

Side project advice

Thursday 30 October 2025

A chat about side projects from a Boston Python project night: choose your paths and forgive yourself.

Last night was a Boston Python project night where I had a good conversation with a few people that was mostly guided by questions from a nice guy named Mark.

How to write nice code in research

Mark works in research and made the classic observation that research code is often messy, and asked about how to make it nicer.

I pointed out that for software engineers, the code is the product. For research, the results are the product, so there’s a reason the code can be and often is messier. It’s important to keep the goal in mind. I mentioned it might not be worth it to add type annotations, detailed docstrings, or whatever else would make the code “nice”.

But the more you can make “nice” a habit, the less work it will be to do it as a matter of course. Even in a result-driven research environment, you’ll be able to write code the way you want, or at least push back a little bit. Code usually lives longer than people expect, so the nicer you can make it, the better it will be.

Side projects

Side projects are a good opportunity to work differently. If work means messy code, your side project could be pristine. If work is very strict, your side project can be thrown together just for fun. You get to set the goals.

And different side projects can be different. I develop coverage.py very differently than fun math art projects. Coverage.py has an extensive test suite run on many versions of Python (including nightly builds of the tip of main). The math art projects usually have no tests at all.

Side projects are a great place to decide how you want to code and to practice that style. Later you can bring those skills and learnings back to a work environment.

Forgive yourself

Mark said one of his difficulties with side projects is perfectionism. He’ll come back to a project and find he wants to rewrite the whole thing.

My advice is: forgive yourself. It’s OK to rewrite the whole thing. It’s OK to not rewrite the whole thing. It’s OK to ignore it for months at a time. It’s OK to stop in the middle of a project and never come back to it. It’s OK to obsess about “irrelevant” details.

The great thing about a side project is that you are the only person who decides what and how it should be.

How to stay motivated

But how to stay motivated on side projects? For me, it’s very motivating that many people use and get value from coverage.py. It’s a service to the community that I find rewarding. Other side projects will have other motivations: a chance to learn new things, flex different muscles, stretch myself in new ways.

Find a reason that motivates you, and structure your side projects to lean into that reason. Don’t forget to forgive yourself if it doesn’t work out the way you planned or if you change your mind.

How to write something people will use

Sure, it’s great to have a project that many people use, but how do you find a project that will end up like that? The best way is to write something that you find useful. Then talk about it with people. You never know what will catch on.

I mentioned my cog project, which I first wrote in 2004 for one reason, but which is now being used by other people (including me) for different purposes. It took years to catch on.

Of course there’s no guarantee something like that will happen: it most likely won’t. But I don’t know of a better way to make something people will use than to start by making something that you will use.

Other topics

The discussion wasn’t as linear as this. We touched on other things along the way: unit tests vs system tests, obligations to support old versions of software, how to navigate huge code bases. There were probably other tangents that I’ve forgotten.

Project nights are almost never just about projects: they are about connecting with people in lots of different ways. This discussion felt like a good connection. I hope the ideas of choosing your own paths and forgiving yourself hit home.
