Human.json is a new
idea for asserting that a site is authored by a person, and for vouching for
other sites’ authorship. I’ve added one to this site.
It’s a fun idea, and I’ve joined in, but to be honest, I have some concerns.
When I made my human.json file, I looked through my browser history, and saw a
number of sites that were clearly personal sites that I liked. But if I list
one, am I claiming to know that there is no AI content on that site? I can’t
know that for sure.
I haven’t let this stop me from adding my own
/human.json, and I’ll be interested to see what
comes of it.
Human.json isn’t a new idea. There have been a number of attempts to add
structured data to web pages:
<meta name="author"> tags are a simple
way to claim authorship. I was surprised to see that this site didn’t have it,
so I added it.- JSON-LD is a way to embed structured
metadata into a page.
- FOAF (Friend of a Friend)
was an earlier attempt to model interpersonal relationships with structured
data, as was XFN.
- humans.txt is not structured, it’s a
.txt file. This makes it all the stranger: if it’s free-form text to be read by
people, why not an HTML page?
These ideas are all appealing in their ways, but I don’t think the messy
complicated real world will yield to our desire for structure and
categories.
To get a sense of the current state, I wrote a simple web
crawler to explore human.json, meta tags, and JSON-LD. Currently it finds
214 vouched sites and 60 people’s names across 40 human.json files. This is a
very small number, but the proposal is only two weeks old.
Like any hand-edited files, human.json files are strict: four of the
human.json files I found had errors. But beyond simple editing mistakes, people
use structured data incorrectly. As an example, Flickr
embeds JSON-LD that claims there’s a person named “Flickr”, right next to where
it says there’s a website named Flickr and an organization named Flickr.
Flickr’s goof about being a person isn’t such a big deal. But the goal of
human.json is to indicate human authorship. If I were using AI to generate web
content (ugh, “content”), I’d do whatever I could to mark it as human. The
vouching is meant to build a web of trust, but it will be easy for the network
to spring a leak and grant trust to sites that don’t deserve it.
Human.json wants to declare a binary categorization: your content is
AI-generated or it isn’t. What about a site with 100% human-written text and
also AI artwork? Should it be vouched for? The world doesn’t often provide us
with tidy yes/no distinctions.
There is already
discussion about
how to address some of these issues, but I think at heart structured data like
this is trying to sweep back the sea of complex human reality.
I don’t mean to be overly negative. I love these “small web” touches. I like
anything that gets people talking to each other. When I found errors in
human.json files, I sent emails to the authors, and got nice emails in return.
Connection!
I like that human.json is simple; I don’t like that it is simplistic. But we
can’t blame human.json for that, it’s a common pitfall in
all attempts to organize the messy world.
I’ve added my file. We’ll see where it goes.
Pytest’s parametrize is a great feature for writing tests without repeating
yourself needlessly. (If you haven’t seen it before, read
Starting with pytest’s parametrize first).
When the data gets complex, it can help to use functions to build the
data parameters.
I’ve been working on a project
involving multi-line data, and the parameterized test data was getting
awkward to create and maintain. I created helper functions to make it nicer.
The actual project is a bit gnarly, so I’ll use a simpler example to
demonstrate.
Here’s a function that takes a multi-line string and returns two numbers,
the lengths of the shortest and longest non-blank lines:
def non_blanks(text: str) -> tuple[int, int]:
"""Stats of non-blank lines: shortest and longest lengths."""
lengths = [len(ln) for ln in text.splitlines() if ln]
return min(lengths), max(lengths)
We can test it with a simple parameterized test with two test cases:
import pytest
from non_blanks import non_blanks
@pytest.mark.parametrize(
"text, short, long",
[
("abcde\na\nabc\n", 1, 5),
("""\
A long line
The next line is blank:
Short.
Much much longer line, more than anyone thought.
""", 6, 48),
]
)
def test_non_blanks(text, short, long):
assert non_blanks(text) == (short, long)
I really dislike how the multi-line string breaks the indentation flow, so I
wrap strings like that in textwrap.dedent:
@pytest.mark.parametrize(
"text, short, long",
[
("abcde\na\nabc\n", 1, 5),
(textwrap.dedent("""\
A long line
The next line is blank:
Short.
Much much longer line, more than anyone thought.
"""),
6, 48),
]
)
(For brevity, this and following examples only show the parametrize
decorator, the test function itself stays the same.)
This looks nicer, but I have to remember to use dedent, which adds a little
bit of visual clutter. I also need to remember that first backslash so that the
string won’t start with a newline.
As the test data gets more elaborate, I might not want to have it all inline
in the decorator. I’d like to have some of the large data in its own file:
@pytest.mark.parametrize(
"text, short, long",
[
("abcde\na\nabc\n", 1, 5),
(textwrap.dedent("""\
A long line
The next line is blank:
Short.
Much much longer line, more than anyone thought.
"""),
6, 48),
(Path("gettysburg.txt").read_text(), 18, 80),
]
)
Now things are getting complicated. Here’s where a function can help us.
Each test case needs a string and three numbers. The string is sometimes
provided explicitly, sometimes read from a file.
We can use a function to create the correct data for each case from its
most convenient form. We’ll take a string and use it as either a file name or
literal data. We’ll deal with the initial newline, and dedent the multi-line
strings:
def nb_case(text, short, long):
"""Create data for test_non_blanks."""
if "\n" in text:
# Multi-line string: it's actual data.
if text[0] == "\n": # Remove a first newline
text = text[1:]
text = textwrap.dedent(text)
else:
# One-line string: it's a file name.
text = Path(text).read_text()
return (text, short, long)
Now the test data is more direct:
@pytest.mark.parametrize(
"text, short, long",
[
nb_case("abcde\na\nabc\n", 1, 5),
nb_case("""
A long line
The next line is blank:
Short.
Much much longer line, more than anyone thought.
""",
6, 48),
nb_case("gettysburg.txt", 18, 80),
]
)
One nice thing about parameterized tests is that pytest creates a distinct ID
for each one. The helps with reporting failures and with selecting tests to run.
But the ID is made from the test data. Here, our last test case has an ID using
the entire Gettysburg Address, over 1500 characters. It was
very short for a speech, but it’s very long for an ID!
This is what the pytest output looks like with our current IDs:
test_non_blank.py::test_non_blanks[abcde\na\nabc\n-1-5] PASSED
test_non_blank.py::test_non_blanks[A long line\nThe next line is blank:\n\nShort.\nMuch much longer line, more than anyone thought.\n-6-48] PASSED
test_non_blank.py::test_non_blanks[Four score and seven years ago our fathers brought forth on this continent, a\nnew nation, conceived in Liberty, and dedicated to the proposition that all men\nare created equal.\n\nNow we are engaged in a great civil war, testing whether that nation, or any\nnation so conceived and so dedicated, can long endure. We are met on a great\nbattle-field of that war. We have come to dedicate a portion of that field, as a\nfinal resting place for those who here gave their lives that that nation might\nlive. It is altogether fitting and proper that we should do this.\n\nBut, in a larger sense, we can not dedicate \u2013 we can not consecrate we can not\nhallow \u2013 this ground. The brave men, living and dead, who struggled here, have\nconsecrated it far above our poor power to add or detract. The world will little\nnote, nor long remember what we say here, but it can never forget what they did\nhere. It is for us the living, rather, to be dedicated here to the unfinished\nwork which they who fought here have thus far so nobly advanced. It is rather\nfor us to be here dedicated to the great task remaining before us that from\nthese honored dead we take increased devotion to that cause for which they gave\nthe last full measure of devotion \u2013 that we here highly resolve that these dead\nshall not have died in vain that this nation, under God, shall have a new birth\nof freedom \u2013 and that government of the people, by the people, for the people,\nshall not perish from the earth.\n-18-80] PASSED
Even that first shortest test has an awkward and hard to use test name.
For more control over the test data, instead of creating tuples to use as
test cases, you can use pytest.param to create the
internal parameters object that pytest needs. Each of these can have an explicit
ID assigned. Pytest will still assign an ID if you don’t provide one.
Here’s an updated nb_case() function using pytest.param:
def nb_case(text, short, long, id=None):
if "\n" in text:
# Multi-line string: it's actual data.
if text[0] == "\n": # Remove a first newline
text = text[1:]
text = textwrap.dedent(text)
else:
# One-line string: it's a file name.
id = id or text
text = Path(text).read_text()
return pytest.param(text, short, long, id=id)
Now we can provide IDs for test cases. The ones reading from a file will use
the file name as the ID:
@pytest.mark.parametrize(
"text, short, long",
[
nb_case("abcde\na\nabc\n", 1, 5, id="little"),
nb_case("""
A long line
The next line is blank:
Short.
Much much longer line, more than anyone thought.
""",
6, 48, id="four"),
nb_case("gettysburg.txt", 18, 80),
]
)
Now our tests have useful IDs:
test_non_blank.py::test_non_blanks[little] PASSED
test_non_blank.py::test_non_blanks[four] PASSED
test_non_blank.py::test_non_blanks[gettysburg.txt] PASSED
The exact details of my case() function aren’t important here. Your
tests will need different helpers, and you might make different decisions about
what to do for these tests. But a function like this lets you write your complex
test cases in the way you like best to make your tests as concise, expressive
and readable as you want.
I have a new small project: edtext provides text
selection and manipulation functions inspired by the classic ed
text editor.
I’ve long used cog to build documentation and HTML
presentations. Cog interpolates text from elsewhere, like source code or
execution output. Often I don’t want the full source file or all of the lines of
output. I want to be able to choose the lines, and sometimes I need to tweak the
lines with a regex to get the results I want.
Long ago I wrote my own ad-hoc function to include a
file and over the years it had grown “organically”, to use a positive word. It
had become baroque and confusing. Worse, it still didn’t do all the things I
needed.
The old function has 16 arguments (!), nine of which are for selecting the
lines of text:
start=None,
end=None,
start_has=None,
end_has=None,
start_from=None,
end_at=None,
start_nth=1,
end_nth=1,
line_count=None,
Recently I started a new presentation, and when I couldn’t express what I
needed with these nine arguments, I thought of a better way: the
ed text editor has concise mechanisms for addressing lines
of text. Ed addressing evolved into vim and sed, and probably other things too,
so it might already be familiar to you.
I wrote edtext to replace my ad-hoc function that I
was copying from project to project. Edtext lets me select subsets of lines
using ed/sed/vim address ranges. Now if I have a source file like this with
section-marking comments:
import pytest
# section1
def six_divided(x):
return 6 / x
# Check the happy paths
@pytest.mark.parametrize(
"x, expected",
[ (4, 1.5), (3, 2.0), (2, 3.0), ]
)
def test_six_divided(x, expected):
assert six_divided(x) == expected
# end
# section2
# etc....
then with an include_file helper that reads the file and gives me an
EdText object, I can select just section1 with:
include_file("test_six_divided.py")["/# section1/+;/# end/-"]
EdText allows slicing with a string containing an ed address range. Ed
addresses often (but don’t always) use regexes, and they have a similar powerful
compact feeling. “/# section1/” finds the next line containing that string, and
the “+” suffix adds one, so our range starts with the line after the section1
comment. The semicolon means to look for the end line starting from the start
line, then we find “# end”, and the “-” suffix means subtract one. So our range
ends with the line before the “# end” comment, giving us:
def six_divided(x):
return 6 / x
# Check the happy paths
@pytest.mark.parametrize(
"x, expected",
[ (4, 1.5), (3, 2.0), (2, 3.0), ]
)
def test_six_divided(x, expected):
assert six_divided(x) == expected
Most of ed addressing is implemented, and there’s a sub() method to
make regex replacements on selected lines. I can run pytest, put the output into
an EdText object, then use:
pytest_edtext["1", "/collected/,$-"].sub("g/====", r"0.0\ds", "0.01s")
This slice uses two address ranges. The first selects just the first line,
the pytest command itself. The second range gets the lines from “collected” to
the second-to-last line. Slicing gives me a new EdText object, then I use
.sub() to tweak the output: on any line containing “====”, change the
total time to “0.01s” so that slight variations in the duration of the test run
doesn’t cause needless changes in the output.
It was very satisfying to write edtext: it’s small in
scope, but useful. It has a full test suite. It might even be done!
Two testing-related things I found recently.
Unified exception testing
Kacper Borucki blogged about parameterizing exception
testing, and linked to pytest docs and a
StackOverflow answer with similar approaches.
The common way to test exceptions is to use
pytest.raises as a context manager, and have
separate tests for the cases that succeed and those that fail. Instead, this
approach lets you unify them.
I tweaked it to this, which I think reads nicely:
from contextlib import nullcontext as produces
import pytest
from pytest import raises
@pytest.mark.parametrize(
"example_input, result",
[
(3, produces(2)),
(2, produces(3)),
(1, produces(6)),
(0, raises(ZeroDivisionError)),
("Hello", raises(TypeError)),
],
)
def test_division(example_input, result):
with result as e:
assert (6 / example_input) == e
One parameterized test that covers both good and bad outcomes. Nice.
AntiLRU
The @functools.lru_cache decorator (and its
convenience cousin @cache) are good ways to save the result of a function
so that you don’t have to compute it repeatedly. But, they hide an implicit
global in your program: the dictionary of cached results.
This can interfere with testing. Your tests should all be isolated from each
other. You don’t want a side effect of one test to affect the outcome of another
test. The hidden global dictionary will do just that. The first test calls the
cached function, then the second test gets the cached value, not a newly
computed one.
Ideally, lru_cache would only be used on pure functions: the result only depends on the arguments.
If it’s only used for pure functions, then you don’t need to worry about interactions between tests
because the answer will be the same for the second test anyway.
But lru_cache is used on functions that pull information from the
environment, perhaps from a network API call. The tests might mock out the API
to check the behavior under different API circumstances. Here’s where the
interference is a real problem.
The lru_cache decorator makes a .clear_cache method available on each
decorated function. I had some code that explicitly called that method on the
cached functions. But then I added a new cached function, forgot to update the
conftest.py code that cleared the caches, and my tests were failing.
A more convenient approach is provided by
pytest-antilru: it’s a pytest plugin that monkeypatches
@lru_cache to track all of the cached functions, and clears them all
between tests. The caches are still in effect during each test, but can’t
interfere between them.
It works great. I was able to get rid of all of the manually maintained cache
clearing in my conftest.py.
This morning I shared a link to this site, and the recipient said, “it looks
like a file.” I thought they meant the page was all black and white with no
color. No, they were talking about the URL, which ended with “.html”.
This site started almost 24 years ago as
a static site: a pile of .html files created on my machine and uploaded to the
server. The URLs naturally had .html extensions. It was common in web sites of
the time.
Over the years, the technology has changed. In 2008, it was still a static
site on the host, but produced with
Django running locally. In 2021, it became a
real Django site on the host.
Through all these changes, the URLs remained the same—they still had
the old-fashioned .html extension. I was used to them, so it never struck me as
odd. But when it was pointed out today, it suddenly seemed obviously out of
date.
So now the site prefers URLs with no extension. The fashion in URLs changed
quite some time ago: for 2026, I’m going to party like it’s 2006!
The old URLs still work, but get a permanent redirect to the modern style.
If you notice anything amiss,
please let me know, as
always.
In my last blog post (A testing conundrum), I described
trying to test my Hasher class which hashes nested data. I couldn’t get
Hypothesis to generate usable data for my test. I wanted to assert that two
equal data items would hash equally, but Hypothesis was finding pairs like
[0] and [False]. These are equal but hash differently because the
hash takes the types into account.
In the blog post I said,
If I had a schema for the data I would be comparing, I could use it to
steer Hypothesis to generate realistic data. But I don’t have that
schema...
I don’t want a fixed schema for the data Hasher would accept, but tests to
compare data generated from the same schema. It shouldn’t compare a list of
ints to a list of bools. Hypothesis is good at generating things randomly.
Usually it generates data randomly, but we can also use it to generate schemas
randomly!
Hypothesis basics
Before describing my solution, I’ll take a quick detour to describe how
Hypothesis works.
Hypothesis calls their randomness machines “strategies”. Here is a strategy
that will produce random integers between -99 and 1000:
import hypothesis.strategies as st
st.integers(min_value=-99, max_value=1000)
Strategies can be composed:
st.lists(st.integers(min_value=-99, max_value=1000), max_size=50)
This will produce lists of integers from -99 to 1000. The lists will have up
to 50 elements.
Strategies are used in tests with the @given decorator, which takes a
strategy and runs the test a number of times with different example data drawn
from the strategy. In your test you check a desired property that holds true
for any data the strategy can produce.
To demonstrate, here’s a test of sum() that checks that summing a list of
numbers in two halves gives the same answer as summing the whole list:
from hypothesis import given, strategies as st
@given(st.lists(st.integers(min_value=-99, max_value=1000), max_size=50))
def test_sum(nums):
# We don't have to test sum(), this is just an example!
mid = len(nums) // 2
assert sum(nums) == sum(nums[:mid]) + sum(nums[mid:])
By default, Hypothesis will run the test 100 times, each with a different
randomly generated list of numbers.
Schema strategies
The solution to my data comparison problem is to have Hypothesis generate a
random schema in the form of a strategy, then use that strategy to generate two
examples. Doing this repeatedly will get us pairs of data that have the same
“shape” that will work well for our tests.
This is kind of twisty, so let’s look at it in pieces. We start with a list
of strategies that produce primitive values:
primitives = [
st.none(),
st.booleans(),
st.integers(min_value=-1000, max_value=10_000_000),
st.floats(min_value=-100, max_value=100),
st.text(max_size=10),
st.binary(max_size=10),
]
Then a list of strategies that produce hashable values, which are all the
primitives, plus tuples of any of the primitives:
def tuples_of(elements):
"""Make a strategy for tuples of some other strategy."""
return st.lists(elements, max_size=3).map(tuple)
# List of strategies that produce hashable data.
hashables = primitives + [tuples_of(s) for s in primitives]
We want to be able to make nested dictionaries with leaves of some other
type. This function takes a leaf-making strategy and produces a strategy to
make those dictionaries:
def nested_dicts_of(leaves):
"""Make a strategy for recursive dicts with leaves from another strategy."""
return st.recursive(
leaves,
lambda children: st.dictionaries(st.text(max_size=10), children, max_size=3),
max_leaves=10,
)
Finally, here’s our strategy that makes schema strategies:
nested_data_schemas = st.recursive(
st.sampled_from(primitives),
lambda children: st.one_of(
children.map(lambda s: st.lists(s, max_size=5)),
children.map(tuples_of),
st.sampled_from(hashables).map(lambda s: st.sets(s, max_size=10)),
children.map(nested_dicts_of),
),
max_leaves=3,
)
For debugging, it’s helpful to generate an example strategy from this
strategy, and then an example from that, many times:
for _ in range(50):
print(repr(nested_data_schemas.example().example()))
Hypothesis is good at making data we’d never think to try ourselves. Here is
some of what it made:
[None, None, None, None, None]
{}
[{False}, {False, True}, {False, True}, {False, True}]
{(1.9, 80.64553337755876), (-41.30770818038395, 9.42967906108538, -58.835811641800085), (31.102786990742203,), (28.2724197133397, 6.103515625e-05, -84.35107066147154), (7.436329211943294e-263,), (-17.335739410320514, 1.5029061311609365e-292, -8.17077562035881), (-8.029363284353857e-169, 49.45840191722425, -15.301768150196054), (5.960464477539063e-08, 1.1518373121077722e-213), (), (-0.3262457914511714,)}
[b'+nY2~\xaf\x8d*\xbb\xbf', b'\xe4\xb5\xae\xa2\x1a', b'\xb6\xab\xafEi\xc3C\xab"\xe1', b'\xf0\x07\xdf\xf5\x99', b'2\x06\xd4\xee-\xca\xee\x9f\xe4W']
{'fV': [81.37177374286324, 3.082323424992609e-212, 3.089885728465406e-151, -9.51475773638932e-86, -17.061851038597922], 'J»\x0c\x86肭|\x88\x03\x8aU': [29.549966208819654]}
[{}, -68.48316192397687]
None
['\x85\U0004bf04°', 'pB\x07iQT', 'TRUE', '\x1a5ùZâ\U00048752+¹\U0005fdf8ê', '\U000fe0b9m*¤\U000b9f1e']
(14.232866652585258, -31.193835515904652, 62.29850355163285)
{'': {'': None, 'Ã\U000be8de§\nÈ\U00093608u': None, 'Y\U000709e4¥ùU)GE\U000dddc5¬': None}}
[{(), (b'\xe7', b'')}, {(), (b'l\xc6\x80\xdf\x16\x91', b'', b'\x10,')}, {(b'\xbb\xfb\x1c\xf6\xcd\xff\x93\xe0\xec\xed',), (b'g',), (b'\x8e9I\xcdgs\xaf\xd1\xec\xf7', b'\x94\xe6#', b'?\xc9\xa0\x01~$k'), (b'r', b'\x8f\xba\xe6\xfe\x92n\xc7K\x98\xbb', b'\x92\xaa\xe8\xa6s'), (b'f\x98_\xb3\xd7', b'\xf4+\xf7\xbcU8RV', b'\xda\xb0'), (b'D',), (b'\xab\xe9\xf6\xe9', b'7Zr\xb7\x0bl\xb6\x92\xb8\xad', b'\x8f\xe4]\x8f'), (b'\xcf\xfb\xd4\xce\x12\xe2U\x94mt',), (b'\x9eV\x11', b'\xc5\x88\xde\x8d\xba?\xeb'), ()}, {(b'}', b'\xe9\xd6\x89\x8b')}, {(b'\xcb`', b'\xfd', b'w\x19@\xee'), ()}]
((), (), ())
Finally writing the test
Time to use all of this in a test:
@given(nested_data_schemas.flatmap(lambda s: st.tuples(s, s)))
def test_same_schema(data_pair):
data1, data2 = data_pair
h1, h2 = Hasher(), Hasher()
h1.update(data1)
h2.update(data2)
if data1 == data2:
assert h1.digest() == h2.digest()
else:
# Strictly speaking, unequal data could produce equal hashes,
# but it's very unlikely, so test for it anyway.
assert h1.digest() != h2.digest()
Here I use the .flatmap() method to draw an example from the
nested_data_schemas strategy and call the provided lambda with the drawn
example, which is itself a strategy. The lambda uses st.tuples to make
tuples with two examples drawn from the strategy. So we get one data schema, and
two examples from it as a tuple passed into the test as data_pair. The
test then unpacks the data, hashes them, and makes the appropriate
assertion.
This works great: the tests pass. To check that the test was working well, I
made some breaking tweaks to the Hasher class. If Hypothesis is configured to
generate enough examples, it finds data examples demonstrating the failures.
I’m pleased with the results. Hypothesis is something I’ve been wanting to
use more, so I’m glad I took this chance to learn more about it and get it
working for these tests. To be honest, this is way more than I needed to test
my Hasher class. But once I got started, I wanted to get it right, and learning
is always good.
I’m a bit concerned that the standard setting (100 examples) isn’t enough to
find the planted bugs in Hasher. There are many parameters in my strategies that
could be tweaked to keep Hypothesis from wandering too broadly, but I don’t know
how to decide what to change.
Actually
The code in this post is different than the actual code I ended up with.
Mostly this is because I was working on the code while I was writing this post,
and discovered some problems that I wanted to fix. For example, the
tuples_of function makes homogeneous tuples: varying lengths with
elements all of the same type. This is not the usual use of tuples (see
Lists vs. Tuples). Adapting for heterogeneous tuples added
more complexity, which was interesting to learn, but I didn’t want to go back
and add it here.
You can look at the final strategies.py to see
that and other details, including type hints for everything, which was a journey
of its own.
Postscript: AI assistance
I would not have been able to come up with all of this by myself. Hypothesis
is very powerful, but requires a new way of thinking about things. It’s twisty
to have functions returning strategies, and especially strategies producing
strategies. The docs don’t have many examples, so it can be hard to get a
foothold on the concepts.
Claude helped me by providing initial code, answering questions, debugging
when things didn’t work out, and so on. If you are interested,
this is
one of the discussions I had with it.
Older: