EdText

Monday 9 February 2026

edtext is a utility inspired by the ed editor for selecting and manipulating lines of text.

I have a new small project: edtext provides text selection and manipulation functions inspired by the classic ed text editor.

I’ve long used cog to build documentation and HTML presentations. Cog interpolates text from elsewhere, like source code or execution output. Often I don’t want the full source file or all of the lines of output. I want to be able to choose the lines, and sometimes I need to tweak the lines with a regex to get the results I want.

Long ago I wrote my own ad-hoc function to include a file and over the years it had grown “organically”, to use a positive word. It had become baroque and confusing. Worse, it still didn’t do all the things I needed.

The old function has 16 arguments (!), nine of which are for selecting the lines of text:

start=None,
end=None,
start_has=None,
end_has=None,
start_from=None,
end_at=None,
start_nth=1,
end_nth=1,
line_count=None,

Recently I started a new presentation, and when I couldn’t express what I needed with these nine arguments, I thought of a better way: the ed text editor has concise mechanisms for addressing lines of text. Ed addressing evolved into vim and sed, and probably other things too, so it might already be familiar to you.

I wrote edtext to replace my ad-hoc function that I was copying from project to project. Edtext lets me select subsets of lines using ed/sed/vim address ranges. Now if I have a source file like this with section-marking comments:

import pytest

# section1
def six_divided(x):
    return 6 / x

# Check the happy paths

@pytest.mark.parametrize(
    "x, expected",
    [ (4, 1.5), (3, 2.0), (2, 3.0), ]
)
def test_six_divided(x, expected):
    assert six_divided(x) == expected
# end

# section2
# etc....

then with an include_file helper that reads the file and gives me an EdText object, I can select just section1 with:

include_file("test_six_divided.py")["/# section1/+;/# end/-"]

EdText allows slicing with a string containing an ed address range. Ed addresses often (but not always) use regexes, and they have a similarly compact, powerful feel. “/# section1/” finds the next line containing that string, and the “+” suffix adds one, so our range starts with the line after the section1 comment. The semicolon means to look for the end line starting from the start line, then we find “# end”, and the “-” suffix means subtract one. So our range ends with the line before the “# end” comment, giving us:

def six_divided(x):
    return 6 / x

# Check the happy paths

@pytest.mark.parametrize(
    "x, expected",
    [ (4, 1.5), (3, 2.0), (2, 3.0), ]
)
def test_six_divided(x, expected):
    assert six_divided(x) == expected
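
The include_file helper itself can be just a few lines. Here’s a minimal sketch, assuming EdText can be constructed directly from a file’s text (the import path and constructor here are assumptions, not necessarily edtext’s exact API):

from edtext import EdText  # assumed import path

def include_file(filename):
    # Read the whole file and wrap it in an EdText object for slicing.
    with open(filename) as f:
        return EdText(f.read())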

Most of ed addressing is implemented, and there’s a sub() method to make regex replacements on selected lines. I can run pytest, put the output into an EdText object, then use:

pytest_edtext["1", "/collected/,$-"].sub("g/====", r"0.0\ds", "0.01s")

This slice uses two address ranges. The first selects just the first line, the pytest command itself. The second range gets the lines from “collected” to the second-to-last line. Slicing gives me a new EdText object, then I use .sub() to tweak the output: on any line containing “====”, change the total time to “0.01s” so that slight variations in the duration of the test run don’t cause needless changes in the output.

It was very satisfying to write edtext: it’s small in scope, but useful. It has a full test suite. It might even be done!

Testing: exceptions and caches

Sunday 25 January 2026

Nicer ways to test exceptions and to test cached function results.

Two testing-related things I found recently.

Unified exception testing

Kacper Borucki blogged about parameterizing exception testing, and linked to pytest docs and a StackOverflow answer with similar approaches.

The common way to test exceptions is to use pytest.raises as a context manager, and have separate tests for the cases that succeed and those that fail. Instead, this approach lets you unify them.
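
For contrast, the conventional separate-test style looks something like this minimal sketch:

import pytest

def test_division_works():
    assert (6 / 3) == 2

def test_division_by_zero():
    with pytest.raises(ZeroDivisionError):
        6 / 0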

I tweaked it to this, which I think reads nicely:

from contextlib import nullcontext as produces

import pytest
from pytest import raises

@pytest.mark.parametrize(
    "example_input, result",
    [
        (3, produces(2)),
        (2, produces(3)),
        (1, produces(6)),
        (0, raises(ZeroDivisionError)),
        ("Hello", raises(TypeError)),
    ],
)
def test_division(example_input, result):
    with result as e:
        assert (6 / example_input) == e

One parameterized test that covers both good and bad outcomes. Nice.

AntiLRU

The @functools.lru_cache decorator (and its convenience cousin @cache) is a good way to save the result of a function so that you don’t have to compute it repeatedly. But these decorators hide an implicit global in your program: the dictionary of cached results.

This can interfere with testing. Your tests should all be isolated from each other. You don’t want a side effect of one test to affect the outcome of another test. The hidden global dictionary will do just that. The first test calls the cached function, then the second test gets the cached value, not a newly computed one.

Ideally, lru_cache would only be used on pure functions: the result only depends on the arguments. If it’s only used for pure functions, then you don’t need to worry about interactions between tests because the answer will be the same for the second test anyway.

But lru_cache is used on functions that pull information from the environment, perhaps from a network API call. The tests might mock out the API to check the behavior under different API circumstances. Here’s where the interference is a real problem.
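
To make the interference concrete, here’s a minimal sketch; the function names and the fake API call are hypothetical:

import sys
from functools import lru_cache

def query_api(path):
    # Stand-in for a real network call.
    raise RuntimeError("no network in tests")

@lru_cache
def server_version():
    return query_api("/version")

def test_old_server(monkeypatch):
    monkeypatch.setattr(sys.modules[__name__], "query_api", lambda path: "1.0")
    assert server_version() == "1.0"   # the result is now cached

def test_new_server(monkeypatch):
    monkeypatch.setattr(sys.modules[__name__], "query_api", lambda path: "2.0")
    # Fails: server_version() returns the "1.0" cached by the first test.
    assert server_version() == "2.0"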

The lru_cache decorator makes a .cache_clear() method available on each decorated function. I had some code that explicitly called that method on the cached functions. But then I added a new cached function, forgot to update the conftest.py code that cleared the caches, and my tests were failing.
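
In conftest.py, that kind of manual clearing looks roughly like this sketch (the names are made up):

import pytest

from myapp.api import server_version  # hypothetical module with a cached function

@pytest.fixture(autouse=True)
def clear_caches():
    # Runs around every test. Easy to forget to extend
    # when a new cached function is added.
    yield
    server_version.cache_clear()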

A more convenient approach is provided by pytest-antilru: it’s a pytest plugin that monkeypatches @lru_cache to track all of the cached functions, and clears them all between tests. The caches are still in effect during each test, but can’t interfere between them.

It works great. I was able to get rid of all of the manually maintained cache clearing in my conftest.py.

No more .html

Friday 2 January 2026

My site used to have URLs ending with .html. Not anymore.

This morning I shared a link to this site, and the recipient said, “it looks like a file.” I thought they meant the page was all black and white with no color. No, they were talking about the URL, which ended with “.html”.

This site started almost 24 years ago as a static site: a pile of .html files created on my machine and uploaded to the server. The URLs naturally had .html extensions. It was common in web sites of the time.

Over the years, the technology has changed. In 2008, it was still a static site on the host, but produced with Django running locally. In 2021, it became a real Django site on the host.

Through all these changes, the URLs remained the same—they still had the old-fashioned .html extension. I was used to them, so it never struck me as odd. But when it was pointed out today, it suddenly seemed obviously out of date.

So now the site prefers URLs with no extension. The fashion in URLs changed quite some time ago: for 2026, I’m going to party like it’s 2006!

The old URLs still work, but get a permanent redirect to the modern style. If you notice anything amiss, please let me know, as always.

Generating data shapes with Hypothesis

Sunday 21 December 2025

I used Hypothesis to generate random data structure schemas, and then generate random data using them. I learned a lot along the way.

In my last blog post (A testing conundrum), I described trying to test my Hasher class which hashes nested data. I couldn’t get Hypothesis to generate usable data for my test. I wanted to assert that two equal data items would hash equally, but Hypothesis was finding pairs like [0] and [False]. These are equal but hash differently because the hash takes the types into account.

In the blog post I said,

If I had a schema for the data I would be comparing, I could use it to steer Hypothesis to generate realistic data. But I don’t have that schema...

I don’t want a fixed schema for the data Hasher would accept, but I do want the tests to compare data generated from the same schema, so they won’t compare a list of ints to a list of bools. Hypothesis is good at generating things randomly: usually that means generating data, but we can also use it to generate schemas!

Hypothesis basics

Before describing my solution, I’ll take a quick detour to describe how Hypothesis works.

Hypothesis calls its randomness machines “strategies”. Here is a strategy that will produce random integers between -99 and 1000:

import hypothesis.strategies as st
st.integers(min_value=-99, max_value=1000)

Strategies can be composed:

st.lists(st.integers(min_value=-99, max_value=1000), max_size=50)

This will produce lists of integers from -99 to 1000. The lists will have up to 50 elements.
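
If you want to see what a strategy makes, you can draw a sample by hand with .example(), which is meant for interactive exploration rather than for use in tests:

import hypothesis.strategies as st

st.lists(st.integers(min_value=-99, max_value=1000), max_size=50).example()
# for example: [507, -12, 0, 839]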

Strategies are used in tests with the @given decorator, which takes a strategy and runs the test a number of times with different example data drawn from the strategy. In your test you check a desired property that holds true for any data the strategy can produce.

To demonstrate, here’s a test of sum() that checks that summing a list of numbers in two halves gives the same answer as summing the whole list:

from hypothesis import given, strategies as st

@given(st.lists(st.integers(min_value=-99, max_value=1000), max_size=50))
def test_sum(nums):
    # We don't have to test sum(), this is just an example!
    mid = len(nums) // 2
    assert sum(nums) == sum(nums[:mid]) + sum(nums[mid:])

By default, Hypothesis will run the test 100 times, each with a different randomly generated list of numbers.

Schema strategies

The solution to my data comparison problem is to have Hypothesis generate a random schema in the form of a strategy, then use that strategy to generate two examples. Doing this repeatedly will get us pairs of data that have the same “shape” that will work well for our tests.

This is kind of twisty, so let’s look at it in pieces. We start with a list of strategies that produce primitive values:

primitives = [
    st.none(),
    st.booleans(),
    st.integers(min_value=-1000, max_value=10_000_000),
    st.floats(min_value=-100, max_value=100),
    st.text(max_size=10),
    st.binary(max_size=10),
]

Then a list of strategies that produce hashable values, which are all the primitives, plus tuples of any of the primitives:

def tuples_of(elements):
    """Make a strategy for tuples of some other strategy."""
    return st.lists(elements, max_size=3).map(tuple)

# List of strategies that produce hashable data.
hashables = primitives + [tuples_of(s) for s in primitives]

We want to be able to make nested dictionaries with leaves of some other type. This function takes a leaf-making strategy and produces a strategy to make those dictionaries:

def nested_dicts_of(leaves):
    """Make a strategy for recursive dicts with leaves from another strategy."""
    return st.recursive(
        leaves,
        lambda children: st.dictionaries(st.text(max_size=10), children, max_size=3),
        max_leaves=10,
    )

Finally, here’s our strategy that makes schema strategies:

nested_data_schemas = st.recursive(
    st.sampled_from(primitives),
    lambda children: st.one_of(
        children.map(lambda s: st.lists(s, max_size=5)),
        children.map(tuples_of),
        st.sampled_from(hashables).map(lambda s: st.sets(s, max_size=10)),
        children.map(nested_dicts_of),
    ),
    max_leaves=3,
)

For debugging, it’s helpful to generate an example strategy from this strategy, and then an example from that, many times:

for _ in range(50):
    print(repr(nested_data_schemas.example().example()))

Hypothesis is good at making data we’d never think to try ourselves. Here is some of what it made:

[None, None, None, None, None]
{}
[{False}, {False, True}, {False, True}, {False, True}]
{(1.9, 80.64553337755876), (-41.30770818038395, 9.42967906108538, -58.835811641800085), (31.102786990742203,), (28.2724197133397, 6.103515625e-05, -84.35107066147154), (7.436329211943294e-263,), (-17.335739410320514, 1.5029061311609365e-292, -8.17077562035881), (-8.029363284353857e-169, 49.45840191722425, -15.301768150196054), (5.960464477539063e-08, 1.1518373121077722e-213), (), (-0.3262457914511714,)}
[b'+nY2~\xaf\x8d*\xbb\xbf', b'\xe4\xb5\xae\xa2\x1a', b'\xb6\xab\xafEi\xc3C\xab"\xe1', b'\xf0\x07\xdf\xf5\x99', b'2\x06\xd4\xee-\xca\xee\x9f\xe4W']
{'fV': [81.37177374286324, 3.082323424992609e-212, 3.089885728465406e-151, -9.51475773638932e-86, -17.061851038597922], 'J»\x0c\x86肭|\x88\x03\x8aU': [29.549966208819654]}
[{}, -68.48316192397687]
None
['\x85\U0004bf04°', 'pB\x07iQT', 'TRUE', '\x1a5ùZâ\U00048752\U0005fdf8ê', '\U000fe0b9m*¤\U000b9f1e']
(14.232866652585258, -31.193835515904652, 62.29850355163285)
{'': {'': None, '\U000be8de§\nÈ\U00093608u': None, 'Y\U000709e4¥ùU)GE\U000dddc5¬': None}}
[{(), (b'\xe7', b'')}, {(), (b'l\xc6\x80\xdf\x16\x91', b'', b'\x10,')}, {(b'\xbb\xfb\x1c\xf6\xcd\xff\x93\xe0\xec\xed',), (b'g',), (b'\x8e9I\xcdgs\xaf\xd1\xec\xf7', b'\x94\xe6#', b'?\xc9\xa0\x01~$k'), (b'r', b'\x8f\xba\xe6\xfe\x92n\xc7K\x98\xbb', b'\x92\xaa\xe8\xa6s'), (b'f\x98_\xb3\xd7', b'\xf4+\xf7\xbcU8RV', b'\xda\xb0'), (b'D',), (b'\xab\xe9\xf6\xe9', b'7Zr\xb7\x0bl\xb6\x92\xb8\xad', b'\x8f\xe4]\x8f'), (b'\xcf\xfb\xd4\xce\x12\xe2U\x94mt',), (b'\x9eV\x11', b'\xc5\x88\xde\x8d\xba?\xeb'), ()}, {(b'}', b'\xe9\xd6\x89\x8b')}, {(b'\xcb`', b'\xfd', b'w\x19@\xee'), ()}]
((), (), ())

Finally writing the test

Time to use all of this in a test:

@given(nested_data_schemas.flatmap(lambda s: st.tuples(s, s)))
def test_same_schema(data_pair):
    data1, data2 = data_pair
    h1, h2 = Hasher(), Hasher()
    h1.update(data1)
    h2.update(data2)
    if data1 == data2:
        assert h1.digest() == h2.digest()
    else:
        # Strictly speaking, unequal data could produce equal hashes,
        # but that's vanishingly unlikely, so assert inequality anyway.
        assert h1.digest() != h2.digest()

Here I use the .flatmap() method to draw an example from the nested_data_schemas strategy and call the provided lambda with the drawn example, which is itself a strategy. The lambda uses st.tuples to make tuples with two examples drawn from the strategy. So we get one data schema, and two examples from it as a tuple passed into the test as data_pair. The test then unpacks the data, hashes them, and makes the appropriate assertion.
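
As with the schemas earlier, you can sanity-check the paired strategy by drawing examples by hand:

pairs = nested_data_schemas.flatmap(lambda s: st.tuples(s, s))
print(pairs.example())
# e.g. two lists of floats with the same element type:
# ([12.5, -3.0], [0.0078125, 99.2, -41.0])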

This works great: the tests pass. To check that the test was working well, I made some breaking tweaks to the Hasher class. If Hypothesis is configured to generate enough examples, it finds data examples demonstrating the failures.
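
Raising the example count is a one-line change with Hypothesis’s settings decorator:

from hypothesis import given, settings

@settings(max_examples=1000)
@given(nested_data_schemas.flatmap(lambda s: st.tuples(s, s)))
def test_same_schema(data_pair):
    ...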

I’m pleased with the results. Hypothesis is something I’ve been wanting to use more, so I’m glad I took this chance to learn more about it and get it working for these tests. To be honest, this is way more than I needed to test my Hasher class. But once I got started, I wanted to get it right, and learning is always good.

I’m a bit concerned that the standard setting (100 examples) isn’t enough to find the planted bugs in Hasher. There are many parameters in my strategies that could be tweaked to keep Hypothesis from wandering too broadly, but I don’t know how to decide what to change.

Actually

The code in this post is different than the actual code I ended up with. Mostly this is because I was working on the code while I was writing this post, and discovered some problems that I wanted to fix. For example, the tuples_of function makes homogeneous tuples: varying lengths with elements all of the same type. This is not the usual use of tuples (see Lists vs. Tuples). Adapting for heterogeneous tuples added more complexity, which was interesting to learn, but I didn’t want to go back and add it here.

You can look at the final strategies.py to see that and other details, including type hints for everything, which was a journey of its own.

Postscript: AI assistance

I would not have been able to come up with all of this by myself. Hypothesis is very powerful, but requires a new way of thinking about things. It’s twisty to have functions returning strategies, and especially strategies producing strategies. The docs don’t have many examples, so it can be hard to get a foothold on the concepts.

Claude helped me by providing initial code, answering questions, debugging when things didn’t work out, and so on. If you are interested, this is one of the discussions I had with it.

A testing conundrum

Thursday 18 December 2025

A useful class that is hard to test thoroughly, and my failed attempt to use Hypothesis to do it.

Update: I found a solution which I describe in Generating data shapes with Hypothesis.

In coverage.py, I have a class for computing the fingerprint of a data structure. It’s used to avoid doing duplicate work when re-processing the same data won’t add to the outcome. It’s designed to work for nested data, and to canonicalize things like set ordering. The slightly simplified code looks like this:

import hashlib
from typing import Any

class Hasher:
    """Hashes Python data for fingerprinting."""

    def __init__(self) -> None:
        self.hash = hashlib.new("sha3_256")

    def update(self, v: Any) -> None:
        """Add `v` to the hash, recursively if needed."""
        self.hash.update(str(type(v)).encode("utf-8"))
        match v:
            case None:
                pass
            case str():
                self.hash.update(v.encode("utf-8"))
            case bytes():
                self.hash.update(v)
            case int() | float():
                self.hash.update(str(v).encode("utf-8"))
            case tuple() | list():
                for e in v:
                    self.update(e)
            case dict():
                for k, kv in sorted(v.items()):
                    self.update(k)
                    self.update(kv)
            case set():
                self.update(sorted(v))
            case _:
                raise ValueError(f"Can't hash {v = }")
        self.hash.update(b".")

    def digest(self) -> bytes:
        """Get the full binary digest of the hash."""
        return self.hash.digest()

To test this, I had some basic tests like:

def test_string_hashing():
    # Same strings hash the same.
    # Different strings hash differently.
    h1 = Hasher()
    h1.update("Hello, world!")
    h2 = Hasher()
    h2.update("Goodbye!")
    h3 = Hasher()
    h3.update("Hello, world!")
    assert h1.digest() != h2.digest()
    assert h1.digest() == h3.digest()

def test_dict_hashing():
    # The order of keys doesn't affect the hash.
    h1 = Hasher()
    h1.update({"a": 17, "b": 23})
    h2 = Hasher()
    h2.update({"b": 23, "a": 17})
    assert h1.digest() == h2.digest()

The last line in the update() method adds a dot to the running hash. That was to solve a problem covered by this test:

def test_dict_collision():
    # Nesting matters.
    h1 = Hasher()
    h1.update({"a": 17, "b": {"c": 1, "d": 2}})
    h2 = Hasher()
    h2.update({"a": 17, "b": {"c": 1}, "d": 2})
    assert h1.digest() != h2.digest()

The most recent change to Hasher was to add the set() clause. There (and in dict()), we are sorting the elements to canonicalize them. The idea is that equal values should hash equally and unequal values should not. Sets and dicts are equal regardless of their iteration order, so we sort them to get the same hash.

I added a test of the set behavior:

def test_set_hashing():
    h1 = Hasher()
    h1.update({(1, 2), (3, 4), (5, 6)})
    h2 = Hasher()
    h2.update({(5, 6), (1, 2), (3, 4)})
    assert h1.digest() == h2.digest()
    h3 = Hasher()
    h3.update({(1, 2)})
    assert h1.digest() != h3.digest()

But I wondered if there was a better way to test this class. My small one-off tests weren’t addressing the full range of possibilities. I could read the code and feel confident, but wouldn’t a more comprehensive test be better? This is a pure function: inputs map to outputs with no side-effects or other interactions. It should be very testable.

This seemed like a good candidate for property-based testing. The Hypothesis library would let me generate data, and I could check that the desired properties of the hash held true.

It took me a while to get the Hypothesis strategies wired up correctly. I ended up with this, but there might be a simpler way:

from hypothesis import strategies as st

scalar_types = [
    st.none(),
    st.booleans(),
    st.integers(),
    st.floats(allow_infinity=False, allow_nan=False),
    st.text(),
    st.binary(),
]

scalars = st.one_of(*scalar_types)

def tuples_of(strat):
    return st.lists(strat, max_size=3).map(tuple)

hashable_types = scalar_types + [tuples_of(s) for s in scalar_types]

# Homogeneous sets: all elements same type.
homogeneous_sets = (
    st.sampled_from(hashable_types)
    .flatmap(lambda s: st.sets(s, max_size=5))
)

# Full nested Python data.
python_data = st.recursive(
    scalars,
    lambda children: (
        st.lists(children, max_size=5)
        | tuples_of(children)
        | homogeneous_sets
        | st.dictionaries(st.text(), children, max_size=5)
    ),
    max_leaves=10,
)

This doesn’t make completely arbitrary nested Python data: sets are forced to have elements all of the same type or I wouldn’t be able to sort them. Dictionaries only have strings for keys. But this works to generate data similar to the real data we hash. I wrote this simple test:

from hypothesis import given

@given(python_data)
def test_one(data):
    # Hashing the same thing twice.
    h1 = Hasher()
    h1.update(data)
    h2 = Hasher()
    h2.update(data)
    assert h1.digest() == h2.digest()

This didn’t find any failures, but this is the easy test: hashing the same thing twice produces equal hashes. The trickier test is to get two different data structures, and check that their equality matches their hash equality:

@given(python_data, python_data)
def test_two(data1, data2):
    h1 = Hasher()
    h1.update(data1)
    h2 = Hasher()
    h2.update(data2)

    if data1 == data2:
        assert h1.digest() == h2.digest()
    else:
        assert h1.digest() != h2.digest()

This immediately found problems, but not in my code:

> assert h1.digest() == h2.digest()
E AssertionError: assert b'\x80\x15\xc9\x05...' == b'\x9ap\xebD...'
E
E   At index 0 diff: b'\x80' != b'\x9a'
E
E   Full diff:
E   - (b'\x9ap\xebD...)'
E   + (b'\x80\x15\xc9\x05...)'
E Falsifying example: test_two(
E     data1=(False, False, False),
E     data2=(False, False, 0),
E )

Hypothesis found that (False, False, False) is equal to (False, False, 0), but they hash differently. This is correct. The Hasher class takes the types of the values into account in the hash. False and 0 are equal, but they are different types, so they hash differently. The same problem shows up for 0 == 0.0 and 0.0 == -0.0. The theory of my test was incorrect: some values that are equal should hash differently.
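
For the record, here are the equalities involved. Python considers each pair equal, but Hasher mixes str(type(v)) and str(v) into the hash, so the two sides of each pair hash differently:

assert False == 0
assert 0 == 0.0
assert 0.0 == -0.0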

In my real code, this isn’t an issue. I won’t ever be comparing values like this to each other. If I had a schema for the data I would be comparing, I could use it to steer Hypothesis to generate realistic data. But I don’t have that schema, and I’m not sure I want to maintain that schema. This Hasher is useful as it is, and I’ve been able to reuse it in new ways without having to update a schema.

I could write a smarter equality check for use in the tests, but that would roughly approximate the code in Hasher itself. Duplicating product code in the tests is a good way to write tests that pass but don’t tell you anything useful.

I could exclude bools and floats from the test data, but those are actual values I need to handle correctly.

Hypothesis was useful in that it didn’t find any failures other than the ones I described. I can’t leave those tests in the automated test suite because I don’t want to manually examine the failures, but at least this gave me more confidence that the code is good as it is now.

Testing is a challenge unto itself. This brought it home to me again. It’s not easy to know precisely what you want code to do, and it’s not easy to capture that intent in tests. For now, I’m leaving just the simple tests. If anyone has ideas about how to test Hasher more thoroughly, I’m all ears.

Autism Adulthood, 3rd edition

Tuesday 18 November 2025

My wife’s book is out today, you should buy it.

Today is the publication of the third edition of Autism Adulthood: Insights and Creative Strategies for a Fulfilling Life. It’s my wife Susan’s book collecting stories and experiences from people all along the autism spectrum, from the self-diagnosed to the profound.

The cover of Autism Adulthood: a person raising their arms in celebration, silhouetted against a dawning sky

The book includes dozens of interviews with autistic adults, their parents, caregivers, researchers, and professionals. Everyone’s experience of autism is different. Reading others’ stories and perspectives can give us a glimpse into other possibilities for ourselves and our loved ones.

If you have someone in your life on the spectrum, or are on it yourself, I guarantee you will find new ways to understand the breadth of what autism means and what it can be.

Susan has also written two other non-fiction autism books, including a memoir of our early days with our son Nat. Of course I highly recommend all of them.
