School is artificial

Saturday 25 April 2026

The real world is not like school.

One of the hard parts of moving from school to “the real world” is adjusting to all the ways that school is artificial, all the ways it differs from the real world.

I’ve been thinking about this because of questions I see young learners commonly asking. Too often the questions are meaningless in the real world, and even if you could get answers, the answers would be useless.

How long does it take to learn Python? In school, learning is divided into discrete labelled chunks. A class called “Beginning Python” might last four months. Everyone in the class will be taught the same things at the same pace. The objectives are laid out by the teacher, and at the end you will get a grade.

Outside of school, learning happens as needed, at your own pace, guided by your own goals. Only you will know if you have learned enough, deeply enough, for what you want to do.

An answer will be useless to you anyway: will you feel bad if it’s taking you longer than them? Maybe they started from a different point than you did. Maybe they are learning different material, or to a finer degree of detail. Comparison is the thief of joy: learn what you need, the way you need to.

What does “learn Python” even mean? There’s no end to what might be included in a broad term like “Python”: there’s the language itself, the standard library, and the enormous ecosystem of third-party packages. Add to that the culture and conventions, and maybe even the community. Nobody knows all of it. It doesn’t stay still: Python keeps changing, growing, and expanding. You have to decide for yourself what’s important for you to learn. The point isn’t to finish it. Classes in school can be finished; topics in the real world cannot.

School gives you neatly labelled units with clear-cut criteria at the end. The real world doesn’t work that way.

No, but how long did it take you? It doesn’t matter. Everyone’s situation is different. In school, your classmates are very similar to you: you’ve been taking roughly the same classes with the same material all your life. Outside of school, people differ much more. My pace, my learning style, and my needs are all different from yours. Comparing won’t help you learn.

How will I know when I am not a beginner? In school, classes have labels like beginner and advanced, or remedial and gifted. Outside of school, these labels are meaningless. Knowledge isn’t laid out conveniently in a straight line. For example: I’ve been using Python for 25 years, and know more about one particular dark corner of Python (sys.settrace) than almost anyone. At the same time, I know literally nothing about tkinter. Am I an expert or a beginner?

If someone could tell you whether you were a beginner or not, what would you do with the answer? In school, it tells you that you are ready to take the next course. But in the real world, no one needs the answer. What’s important is whether you understand the next concept, tool, or technique to make progress in whatever you’re building. Focus on your own goals and path, and keep moving forward. Labels are fake.

Why do I need to learn topic XYZ? School curricula don’t always match what you need to know. Your software engineering course may include theoretical math that computer scientists want to teach you, but that math may be very hard to use directly in the real world. Some of those concepts are good to know, some might be artifacts of mismatched goals.

Schools need to deliver their packaged pathways to many students. You only need to learn the things you need for your path. You won’t always know ahead of time what you’ll need. School can be a good way to learn things that many people like you learn. Once you are on your own, you get to (and have to) choose the topics yourself.

Is it still worthwhile to learn programming? Technology is moving very fast these days, especially because of the rise of AI in programming. Schools are much slower to adjust: a course designed now may not match what employers want to see in four years, and at today’s pace of change, it’s impossible to guess what they will want.

Learn how to learn, and stay flexible. Communication will always be key, so keep talking to people.

Choose a goal. Move toward it. Do what you need to do.

BTW, I see now that this post is very similar to a post from six years ago: How long did it take you to learn Python? I guess people are still asking, and I feel strongly about it!

Linklint

Sunday 12 April 2026

Linklint is a Sphinx extension to suppress excessive links in the Python documentation.

I wrote a Sphinx extension to eliminate excessive links: linklint. It started as a linter to check and modify .rst files, but it grew into a Sphinx extension that works without changing the source files.

It all started with a topic in the discussion forums: Should not underline links, which argued that the underlining was distracting from the text. Of course we did not remove underlines: they are important for accessibility and for seeing that there are links at all.

But I agreed that there were places in the docs that had too many links. In particular, there are two kinds of link that are excessive:

  • Links within a section to the same section. These arise naturally when describing a function (or class or module). Mentioning the function again in the description will link to the function. But we’re already reading about the function. The link is pointless and confusing.
  • A second (or third, etc) instance of the same link in a single paragraph. The first mention of a referent should be linked, but subsequent ones don’t need to be.

Linklint is a Sphinx extension that suppresses these two kinds of links during the build process. It examines the doctree (the abstract syntax tree of the documentation) and finds and modifies references matching our criteria for excessiveness. It’s running now in the CPython documentation, where it suppressed 3612 links. Nice.
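
The suppression decision itself is easy to model. Here’s a toy sketch of the two rules — not linklint’s actual code or data structures, just an illustration treating each reference as a (section, paragraph, target) triple in document order:

```python
def find_excessive(references):
    """Return the references to suppress, as a toy model of the two rules.

    Each reference is a (section, paragraph, target) triple in document
    order, where section is the id of the enclosing section and paragraph
    the id of the enclosing paragraph.  These names are illustrative, not
    linklint's real internals.
    """
    seen_in_paragraph = set()
    suppressed = []
    for section, paragraph, target in references:
        if target == section:
            # Rule 1: a link from a section to itself.
            suppressed.append((section, paragraph, target))
        elif (paragraph, target) in seen_in_paragraph:
            # Rule 2: a repeated link in the same paragraph.
            suppressed.append((section, paragraph, target))
        seen_in_paragraph.add((paragraph, target))
    return suppressed
```

The real extension does this by walking the resolved doctree, but the logic of “excessive” comes down to these two checks.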

I had another idea for a kind of link to suppress: “obvious” references. For example, I don’t think it’s useful to link every instance of “str” to the str() constructor. Is there anyone who needs that link because they don’t know what “str” means? And if they don’t know, is that the right place to take them?

There are three problems with that idea: first, not everyone agrees that “obvious” links should be suppressed at all. Second, even among those who do, people won’t agree on what is obvious. Sure, int and str. But what about list, dict, set? Third, there are some places where a link to str() needs to be kept, like “See str() for details.” Sphinx has a syntax for references to suppress the link, but there’s no syntax to force a link when linklint wants to suppress it.

So linklint doesn’t suppress obvious links. Maybe we can do it in the future once there’s been some more thought about it.

In the meantime, linklint is working to stop many excessive links. It was a small project that turned out much better than I expected when I started on it. A Sphinx extension is a really powerful way to adjust or enhance documentation without causing churn in the .rst source files. Sphinx itself can be complex and mysterious, but with a skilled code reading assistant, I was able to build this utility and improve the documentation.

Human.json

Sunday 22 March 2026

human.json is an interesting idea, but comes with the usual semantic web problems.

Human.json is a new idea for asserting that a site is authored by a person, and for vouching for other sites’ authorship. I’ve added one to this site.

It’s a fun idea, and I’ve joined in, but to be honest, I have some concerns. When I made my human.json file, I looked through my browser history, and saw a number of sites that were clearly personal sites that I liked. But if I list one, am I claiming to know that there is no AI content on that site? I can’t know that for sure.

I haven’t let this stop me from adding my own /human.json, and I’ll be interested to see what comes of it.

Human.json isn’t a new idea. There have been a number of attempts to add structured data to web pages:

  • <meta name="author"> tags are a simple way to claim authorship. I was surprised to see that this site didn’t have it, so I added it.
  • JSON-LD is a way to embed structured metadata into a page.
  • FOAF (Friend of a Friend) was an earlier attempt to model interpersonal relationships with structured data, as was XFN.
  • humans.txt is not structured; it’s just a .txt file. This makes it all the stranger: if it’s free-form text to be read by people, why not an HTML page?

These ideas are all appealing in their ways, but I don’t think the messy complicated real world will yield to our desire for structure and categories.

To get a sense of the current state, I wrote a simple web crawler to explore human.json, meta tags, and JSON-LD. Currently it finds 214 vouched sites and 60 people’s names across 40 human.json files. This is a very small number, but the proposal is only two weeks old.

Like any hand-edited structured data, human.json files invite mistakes: four of the human.json files I found had errors. But beyond simple editing mistakes, people use structured data incorrectly. As an example, Flickr embeds JSON-LD that claims there’s a person named “Flickr”, right next to where it says there’s a website named Flickr and an organization named Flickr.

Flickr’s goof about being a person isn’t such a big deal. But the goal of human.json is to indicate human authorship. If I were using AI to generate web content (ugh, “content”), I’d do whatever I could to mark it as human. The vouching is meant to build a web of trust, but it will be easy for the network to spring a leak and grant trust to sites that don’t deserve it.

Human.json wants to declare a binary categorization: your content is AI-generated or it isn’t. What about a site with 100% human-written text and also AI artwork? Should it be vouched for? The world doesn’t often provide us with tidy yes/no distinctions.

There is already discussion about how to address some of these issues, but I think at heart structured data like this is trying to sweep back the sea of complex human reality.

I don’t mean to be overly negative. I love these “small web” touches. I like anything that gets people talking to each other. When I found errors in human.json files, I sent emails to the authors, and got nice emails in return. Connection!

I like that human.json is simple; I don’t like that it is simplistic. But we can’t blame human.json for that; it’s a common pitfall in all attempts to organize the messy world.

I’ve added my file. We’ll see where it goes.

Pytest parameter functions

Friday 27 February 2026

Pytest’s parametrize can be made even more powerful with your own helper functions to build test cases.

Pytest’s parametrize is a great feature for writing tests without repeating yourself needlessly. (If you haven’t seen it before, read Starting with pytest’s parametrize first). When the data gets complex, it can help to use functions to build the data parameters.

I’ve been working on a project involving multi-line data, and the parameterized test data was getting awkward to create and maintain. I created helper functions to make it nicer. The actual project is a bit gnarly, so I’ll use a simpler example to demonstrate.

Here’s a function that takes a multi-line string and returns two numbers, the lengths of the shortest and longest non-blank lines:

def non_blanks(text: str) -> tuple[int, int]:
    """Stats of non-blank lines: shortest and longest lengths."""
    lengths = [len(ln) for ln in text.splitlines() if ln]
    return min(lengths), max(lengths)

We can test it with a simple parameterized test with two test cases:

import pytest
from non_blanks import non_blanks

@pytest.mark.parametrize(
    "text, short, long",
    [
        ("abcde\na\nabc\n", 1, 5),
        ("""\
A long line
The next line is blank:

Short.
Much much longer line, more than anyone thought.
""", 6, 48),
    ]
)
def test_non_blanks(text, short, long):
    assert non_blanks(text) == (short, long)

I really dislike how the multi-line string breaks the indentation flow, so I wrap strings like that in textwrap.dedent:

@pytest.mark.parametrize(
    "text, short, long",
    [
        ("abcde\na\nabc\n", 1, 5),
        (textwrap.dedent("""\
            A long line
            The next line is blank:

            Short.
            Much much longer line, more than anyone thought.
            """),
        6, 48),
    ]
)

(For brevity, this and following examples only show the parametrize decorator, the test function itself stays the same.)

This looks nicer, but I have to remember to use dedent, which adds a little bit of visual clutter. I also need to remember that first backslash so that the string won’t start with a newline.
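
The difference that backslash makes is easy to demonstrate. Without it, the string starts with a newline that dedent leaves in place:

```python
import textwrap

with_backslash = textwrap.dedent("""\
    first
    second
    """)
without_backslash = textwrap.dedent("""
    first
    second
    """)

# The backslash elides the newline right after the opening quotes.
assert with_backslash == "first\nsecond\n"
assert without_backslash == "\nfirst\nsecond\n"
```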

As the test data gets more elaborate, I might not want to have it all inline in the decorator. I’d like to have some of the large data in its own file:

@pytest.mark.parametrize(
    "text, short, long",
    [
        ("abcde\na\nabc\n", 1, 5),
        (textwrap.dedent("""\
            A long line
            The next line is blank:

            Short.
            Much much longer line, more than anyone thought.
            """),
        6, 48),
        (Path("gettysburg.txt").read_text(), 18, 80),
    ]
)

Now things are getting complicated. Here’s where a function can help us. Each test case needs a string and two numbers. The string is sometimes provided explicitly, sometimes read from a file.

We can use a function to create the correct data for each case from its most convenient form. We’ll take a string and use it as either a file name or literal data. We’ll deal with the initial newline, and dedent the multi-line strings:

def nb_case(text, short, long):
    """Create data for test_non_blanks."""
    if "\n" in text:
        # Multi-line string: it's actual data.
        if text[0] == "\n":     # Remove a first newline
            text = text[1:]
        text = textwrap.dedent(text)
    else:
        # One-line string: it's a file name.
        text = Path(text).read_text()
    return (text, short, long)

Now the test data is more direct:

@pytest.mark.parametrize(
    "text, short, long",
    [
        nb_case("abcde\na\nabc\n", 1, 5),
        nb_case("""
            A long line
            The next line is blank:

            Short.
            Much much longer line, more than anyone thought.
            """,
            6, 48),
        nb_case("gettysburg.txt", 18, 80),
    ]
)

One nice thing about parameterized tests is that pytest creates a distinct ID for each one. This helps with reporting failures and with selecting tests to run. But the ID is made from the test data. Here, our last test case has an ID using the entire Gettysburg Address, over 1500 characters. It was very short for a speech, but it’s very long for an ID!

This is what the pytest output looks like with our current IDs:

test_non_blank.py::test_non_blanks[abcde\na\nabc\n-1-5] PASSED
test_non_blank.py::test_non_blanks[A long line\nThe next line is blank:\n\nShort.\nMuch much longer line, more than anyone thought.\n-6-48] PASSED
test_non_blank.py::test_non_blanks[Four score and seven years ago our fathers brought forth on this continent, a\nnew nation, conceived in Liberty, and dedicated to the proposition that all men\nare created equal.\n\nNow we are engaged in a great civil war, testing whether that nation, or any\nnation so conceived and so dedicated, can long endure. We are met on a great\nbattle-field of that war. We have come to dedicate a portion of that field, as a\nfinal resting place for those who here gave their lives that that nation might\nlive. It is altogether fitting and proper that we should do this.\n\nBut, in a larger sense, we can not dedicate \u2013 we can not consecrate we can not\nhallow \u2013 this ground. The brave men, living and dead, who struggled here, have\nconsecrated it far above our poor power to add or detract. The world will little\nnote, nor long remember what we say here, but it can never forget what they did\nhere. It is for us the living, rather, to be dedicated here to the unfinished\nwork which they who fought here have thus far so nobly advanced. It is rather\nfor us to be here dedicated to the great task remaining before us that from\nthese honored dead we take increased devotion to that cause for which they gave\nthe last full measure of devotion \u2013 that we here highly resolve that these dead\nshall not have died in vain that this nation, under God, shall have a new birth\nof freedom \u2013 and that government of the people, by the people, for the people,\nshall not perish from the earth.\n-18-80] PASSED

Even that first, shortest test has an awkward, hard-to-use ID.

For more control over the test data, instead of creating tuples to use as test cases, you can use pytest.param to create the internal parameters object that pytest needs. Each of these can have an explicit ID assigned. Pytest will still assign an ID if you don’t provide one.

Here’s an updated nb_case() function using pytest.param:

def nb_case(text, short, long, id=None):
    if "\n" in text:
        # Multi-line string: it's actual data.
        if text[0] == "\n":     # Remove a first newline
            text = text[1:]
        text = textwrap.dedent(text)
    else:
        # One-line string: it's a file name.
        id = id or text
        text = Path(text).read_text()
    return pytest.param(text, short, long, id=id)

Now we can provide IDs for test cases. The ones reading from a file will use the file name as the ID:

@pytest.mark.parametrize(
    "text, short, long",
    [
        nb_case("abcde\na\nabc\n", 1, 5, id="little"),
        nb_case("""
            A long line
            The next line is blank:

            Short.
            Much much longer line, more than anyone thought.
            """,
            6, 48, id="four"),
        nb_case("gettysburg.txt", 18, 80),
    ]
)

Now our tests have useful IDs:

test_non_blank.py::test_non_blanks[little] PASSED
test_non_blank.py::test_non_blanks[four] PASSED
test_non_blank.py::test_non_blanks[gettysburg.txt] PASSED

The exact details of my nb_case() function aren’t important here. Your tests will need different helpers, and you might make different decisions about what to do for these tests. But a function like this lets you write your complex test cases in the way you like best, making your tests as concise, expressive, and readable as you want.

EdText

Monday 9 February 2026

edtext is a utility inspired by the ed editor for selecting and manipulating lines of text.

I have a new small project: edtext provides text selection and manipulation functions inspired by the classic ed text editor.

I’ve long used cog to build documentation and HTML presentations. Cog interpolates text from elsewhere, like source code or execution output. Often I don’t want the full source file or all of the lines of output. I want to be able to choose the lines, and sometimes I need to tweak the lines with a regex to get the results I want.

Long ago I wrote my own ad-hoc function to include a file and over the years it had grown “organically”, to use a positive word. It had become baroque and confusing. Worse, it still didn’t do all the things I needed.

The old function had 16 arguments (!), nine of which were for selecting the lines of text:

start=None,
end=None,
start_has=None,
end_has=None,
start_from=None,
end_at=None,
start_nth=1,
end_nth=1,
line_count=None,

Recently I started a new presentation, and when I couldn’t express what I needed with these nine arguments, I thought of a better way: the ed text editor has concise mechanisms for addressing lines of text. Ed addressing evolved into vim and sed, and probably other things too, so it might already be familiar to you.

I wrote edtext to replace my ad-hoc function that I was copying from project to project. Edtext lets me select subsets of lines using ed/sed/vim address ranges. Now if I have a source file like this with section-marking comments:

import pytest

# section1
def six_divided(x):
    return 6 / x

# Check the happy paths

@pytest.mark.parametrize(
    "x, expected",
    [ (4, 1.5), (3, 2.0), (2, 3.0), ]
)
def test_six_divided(x, expected):
    assert six_divided(x) == expected
# end

# section2
# etc....

then with an include_file helper that reads the file and gives me an EdText object, I can select just section1 with:

include_file("test_six_divided.py")["/# section1/+;/# end/-"]

EdText allows slicing with a string containing an ed address range. Ed addresses often (but not always) use regexes, and they have a similar powerful, compact feeling. “/# section1/” finds the next line containing that string, and the “+” suffix adds one, so our range starts with the line after the section1 comment. The semicolon means to look for the end line starting from the start line, then we find “# end”, and the “-” suffix means subtract one. So our range ends with the line before the “# end” comment, giving us:

def six_divided(x):
    return 6 / x

# Check the happy paths

@pytest.mark.parametrize(
    "x, expected",
    [ (4, 1.5), (3, 2.0), (2, 3.0), ]
)
def test_six_divided(x, expected):
    assert six_divided(x) == expected
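
To make the addressing concrete, here’s a minimal sketch of resolving a /pat/+;/pat/- range against a list of lines. This is just an illustration of the idea, not edtext’s actual implementation:

```python
import re

def resolve_range(lines, start_pat, end_pat):
    """Resolve a range like /start_pat/+;/end_pat/- (a toy model).

    Find the first line matching start_pat and step one past it (+),
    then search onward from there (;) for end_pat and stop one line
    before it (-).
    """
    start = next(
        i for i, line in enumerate(lines) if re.search(start_pat, line)
    ) + 1
    end = next(
        i for i in range(start, len(lines)) if re.search(end_pat, lines[i])
    ) - 1
    return lines[start:end + 1]
```

Applied to the lines of the file above, resolve_range(lines, r"# section1", r"# end") would return just the lines between the two comments.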

Most of ed addressing is implemented, and there’s a sub() method to make regex replacements on selected lines. I can run pytest, put the output into an EdText object, then use:

pytest_edtext["1", "/collected/,$-"].sub("g/====", r"0.0\ds", "0.01s")

This slice uses two address ranges. The first selects just the first line, the pytest command itself. The second range gets the lines from “collected” to the second-to-last line. Slicing gives me a new EdText object, then I use .sub() to tweak the output: on any line containing “====”, change the total time to “0.01s” so that slight variations in the duration of the test run don’t cause needless changes in the output.
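
The line-conditional substitution can be modeled in a few lines. A toy version (not edtext’s real sub() signature):

```python
import re

def sub_on_matching_lines(lines, line_pat, pattern, replacement):
    """Apply a regex replacement, but only on lines matching line_pat.

    A toy model of the g/====/ idea, not edtext's actual API.
    """
    return [
        re.sub(pattern, replacement, line) if re.search(line_pat, line) else line
        for line in lines
    ]
```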

It was very satisfying to write edtext: it’s small in scope, but useful. It has a full test suite. It might even be done!

Testing: exceptions and caches

Sunday 25 January 2026

Nicer ways to test exceptions and to test cached function results.

Two testing-related things I found recently.

Unified exception testing

Kacper Borucki blogged about parameterizing exception testing, and linked to pytest docs and a StackOverflow answer with similar approaches.

The common way to test exceptions is to use pytest.raises as a context manager, and have separate tests for the cases that succeed and those that fail. Instead, this approach lets you unify them.

I tweaked it to this, which I think reads nicely:

from contextlib import nullcontext as produces

import pytest
from pytest import raises

@pytest.mark.parametrize(
    "example_input, result",
    [
        (3, produces(2)),
        (2, produces(3)),
        (1, produces(6)),
        (0, raises(ZeroDivisionError)),
        ("Hello", raises(TypeError)),
    ],
)
def test_division(example_input, result):
    with result as e:
        assert (6 / example_input) == e

One parameterized test that covers both good and bad outcomes. Nice.

AntiLRU

The @functools.lru_cache decorator (and its convenience cousin @cache) is a good way to save the result of a function so that you don’t have to compute it repeatedly. But it hides an implicit global in your program: the dictionary of cached results.

This can interfere with testing. Your tests should all be isolated from each other. You don’t want a side effect of one test to affect the outcome of another test. The hidden global dictionary will do just that. The first test calls the cached function, then the second test gets the cached value, not a newly computed one.

Ideally, lru_cache would only be used on pure functions: the result only depends on the arguments. If it’s only used for pure functions, then you don’t need to worry about interactions between tests because the answer will be the same for the second test anyway.

But lru_cache is used on functions that pull information from the environment, perhaps from a network API call. The tests might mock out the API to check the behavior under different API circumstances. Here’s where the interference is a real problem.

The lru_cache decorator makes a .cache_clear() method available on each decorated function. I had some code that explicitly called that method on the cached functions. But then I added a new cached function, forgot to update the conftest.py code that cleared the caches, and my tests were failing.
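
Here’s a small demonstration of the hidden cache, and of clearing it by hand with cache_clear(), the method functools provides on decorated functions:

```python
import functools

calls = 0

@functools.lru_cache(maxsize=None)
def fetch_config(name):
    """Pretend to hit a network API; really just count the calls."""
    global calls
    calls += 1
    return f"value-for-{name}"

# The second call returns the cached result without running the body.
fetch_config("timeout")
fetch_config("timeout")
assert calls == 1

# Clearing the cache (as test-isolation code would) forces recomputation.
fetch_config.cache_clear()
fetch_config("timeout")
assert calls == 2
```

A test suite would put that cache_clear() call in a fixture or conftest.py hook, which is exactly the code that’s easy to forget to update.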

A more convenient approach is provided by pytest-antilru: it’s a pytest plugin that monkeypatches @lru_cache to track all of the cached functions, and clears them all between tests. The caches are still in effect during each test, but can’t interfere between them.

It works great. I was able to get rid of all of the manually maintained cache clearing in my conftest.py.
