Hobby Hilbert Simplex

Friday 26 September 2025

An exploration and explanation of how to generate interesting swoopy art

I saw a generative art piece I liked and wanted to learn how it was made. Starting with the artist’s Kotlin code, I dug into three new algorithms, hacked together some Python code, experimented with alternatives, and learned a lot. Now I can explain it to you.

It all started with this post by aBe on Mastodon:

I love how these lines separate and reunite. And the fact that I can express this idea in 3 or 4 lines of code.

For me they’re lives represented by closed paths that end where they started, spending part of the journey together, separating while we go in different directions and maybe reconnecting again in the future.

#CreativeCoding #algorithmicart #proceduralArt #OPENRNDR #Kotlin

80 wobbly black hobby curves with low opacity. In some places the curves travel together, but sometimes they split in 2 or 3 groups and later reunite. Due to the low opacity, depending on how many curves overlap the result is brighter or darker.

The drawing is made by choosing 10 random points, drawing a curve through those points, then slightly scooching the points and drawing another curve. There are 40 curves, each slightly different than the last. Occasionally the next curve makes a jump, which is why they separate and reunite.

Eventually I made something similar:

Along the way I had to learn about three techniques I got from the Kotlin code: Hobby curves, Hilbert sorting, and simplex noise.

Each of these algorithms tries to do something “natural” automatically, so that we can generate art that looks nice without any manual steps.

Hobby curves

To draw swoopy curves through our random points, we use an algorithm developed by John Hobby as part of Donald Knuth’s Metafont type design system. Jake Low has a great interactive page for playing with Hobby curves; you should try it.

Here are three examples of Hobby curves through ten random points:

The curves are nice, but kind of a scribble, because we’re joining points together in the order we generated them (shown by the green lines). If you asked a person to connect random points, they wouldn’t jump back and forth across the canvas like this. They would find a nearby point to use next, producing a more natural tour of the set.

We’re generating everything automatically, so we can’t manually intervene to choose a natural order for the points. Instead we use Hilbert sorting.

Hilbert sorting

The Hilbert space-filling fractal visits every square in a 2D grid. Hilbert sorting overlays a Hilbert fractal on the canvas and sorts the points by when the fractal visits each point’s grid square. This gives a tour of the points that corresponds more closely to what people expect: points that are close together in space are likely (but not guaranteed) to be close in the ordering.
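
To make the idea concrete, here is a minimal sketch of Hilbert sorting. My finished code uses the hilbertcurve package; this standalone version (the order and size parameters are my own choices, not from the original Kotlin) computes the index with the standard bit-twiddling recipe:

import random

def hilbert_index(order, x, y):
    # Distance along a Hilbert curve of the given order at which the grid
    # square (x, y) is visited.  The grid has 2**order squares per side.
    n = 2 ** order
    d = 0
    s = n // 2
    while s > 0:
        rx = 1 if x & s else 0
        ry = 1 if y & s else 0
        d += s * s * ((3 * rx) ^ ry)
        # Rotate/reflect so the remaining bits are read in the sub-square's frame.
        if ry == 0:
            if rx == 1:
                x = n - 1 - x
                y = n - 1 - y
            x, y = y, x
        s //= 2
    return d

def hilbert_sort(points, order=5, size=1.0):
    # Sort (x, y) points by when the Hilbert curve visits their grid square.
    cells = 2 ** order
    def key(pt):
        gx = min(int(pt[0] / size * cells), cells - 1)
        gy = min(int(pt[1] / size * cells), cells - 1)
        return hilbert_index(order, gx, gy)
    return sorted(points, key=key)

points = [(random.random(), random.random()) for _ in range(10)]
ordered = hilbert_sort(points)

Sorting by that key is all “Hilbert sorting” means: points in the same or nearby grid squares tend to end up next to each other in the list.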

If we sort the points using Hilbert sorting, we get much nicer curves. Here are the same points as last time:

Here are pairs of the same points, unsorted and sorted side-by-side:

If you compare closely, the points in each pair are the same, but the sorted points are connected in a better order, producing nicer curves.

Simplex noise

Choosing random points would be easy to do with a random number generator, but we want the points to move in interesting graceful ways. To do that, we use simplex noise. This is a 2D function (let’s call the inputs u and v) that produces a value from -1 to 1. The important thing is the function is continuous: if you sample it at two (u,v) coordinates that are close together, the results will be close together. But it’s also random: the continuous curves you get are wavy in unpredictable ways. Think of the simplex noise function as a smooth hilly landscape.

To get an (x,y) point for our drawing, we choose a (u,v) coordinate to produce an x value and a completely different (u,v) coordinate for the y. To get the next (x,y) point, we keep the u values the same and change the v values by just a tiny bit. That makes the (x,y) points move smoothly but interestingly.
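
Here is a rough sketch of that scheme. The real code uses the super-simplex package; the fake_noise2 stand-in below is just a smooth, bounded function so the example runs on its own, and the parameter names and values are mine, not the artist’s:

import math
import random

def fake_noise2(u, v):
    # Stand-in for a real 2D simplex-noise call: smooth and bounded in -1..1,
    # but much more regular than real noise.
    return math.sin(1.7 * u + 3.1 * math.cos(0.9 * v)) * math.cos(2.3 * v + 0.5 * u)

def wandering_points(num_points=7, num_steps=50, v_step=0.01, scale=400, seed=1):
    # Yield one list of (x, y) points per step; each point drifts smoothly
    # because only v changes, and only by a tiny amount per step.
    rng = random.Random(seed)
    # Each point gets fixed u values: one for its x, a different one for its y.
    us = [(rng.uniform(0, 100), rng.uniform(100, 200)) for _ in range(num_points)]
    for step in range(num_steps):
        v = step * v_step
        yield [(scale * fake_noise2(ux, v), scale * fake_noise2(uy, v))
               for ux, uy in us]

Each yielded list of points can then be Hilbert-sorted and run through the Hobby algorithm to draw one curve.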

Here are the trails of four points taking 50 steps using this scheme:

If we use seven points taking five steps, and draw curves through the seven points at each step, we get examples like this:

I’ve left the points visible, and given them large steps so the lines are very widely spaced to show the motion. Taking out the points and drawing more lines with smaller steps gives us this:

With 40 lines drawn wider with some transparency, we start to see the smoky fluidity:

Jumps

In his Mastodon post, aBe pointed to the way the lines separate and reunite as one of the things he liked. But why do they do that? If we are moving the points in small increments, why do the curves sometimes make large jumps?

The first reason is the Hobby curves themselves. They do a great job drawing a curve through a set of points as a person might. But a downside of the algorithm is that sometimes changing a point by a small amount makes the entire curve take a different route. If you play around with the interactive examples on Jake Low’s page you will see the curve can unexpectedly take a different shape.

As we inch our points along, sometimes the Hobby curve jumps.

The second reason is due to Hilbert sorting. Each of our lines is sorted independently of how the previous line was sorted. If a point’s small motion moves it into a different grid square, it can change the sorting order, which changes the Hobby curve even more.

If we sort the first line, and then keep that order of points for all the lines, the result has fewer jumps, but the Hobby curves still act unpredictably:

Colophon

This was all done with Python, using other people’s implementations of the hard parts: hobby.py, hilbertcurve, and super-simplex. My code is on GitHub (nedbat/fluidity), but it’s a mess. Think of it as a woodworking studio with half-finished pieces and wood chips strewn everywhere.

A lot of the learning and experimentation was in my Jupyter notebook. Part of the process for work like this is playing around with different values of tweakable parameters and seeds for the random numbers to get the effect you want, either artistic or pedagogical. The notebook shows some of the thumbnail galleries I used to pick the examples to show.

I went on to play with animations, which led to other learnings, but those will have to wait for another blog post.

Testing is better than DSA

Monday 22 September 2025

People should spend less time learning DSA, more time learning testing.

I see new learners asking about “DSA” a lot. Data Structures and Algorithms are of course important: considered broadly, they are the two ingredients that make up all programs. But in my opinion, “DSA” as an abstract field of study is over-emphasized.

I understand why people focus on DSA: it’s a concrete thing to learn about, there are web sites devoted to testing you on it, and most importantly, because job interviews often involve DSA coding questions.

Before I get to other opinions, let me make clear that anything you can do to help you get a job is a good thing to do. If grinding leetcode will land you a position, then do it.

But I hope companies hiring entry-level engineers aren’t asking them to reverse linked lists or balance trees. Asking about techniques that can be memorized ahead of time won’t tell them anything about how well you can work. The stated purpose of those interviews is to see how well you can figure out solutions, so memorization defeats that purpose.

The thing new learners don’t understand about DSA is that actual software engineering almost never involves implementing the kinds of algorithms that “DSA” teaches you. Sure, it can be helpful to work through some of these puzzles and see how they are solved, but writing real code just doesn’t involve writing that kind of code.

Here is what I think in-the-trenches software engineers should know about data structures and algorithms:

  • Data structures are ways to organize data. Learn some of the basics: linked list, array, hash table, tree. By “learn” I mean understand what it does and why you might want to use one.
  • Different data structures can be used to organize the same data in different ways. Learn some of the trade-offs between structures that are similar.
  • Algorithms are ways of manipulating data. I don’t mean named algorithms like Quicksort, but algorithms as any chunk of code that works on data and does something with it.
  • How you organize data affects what algorithms you can use to work with the data. Some data structures will be slow for some operations where another structure will be fast.
  • Algorithms have a “time complexity” (Big O): how the code slows as the data grows. Get a sense of what this means.
  • Python has a number of built-in data structures. Learn how they work, and the time complexity of their operations (there’s a small example after this list).
  • Learn how to think about your code to understand its time complexity.
  • Read a little about more esoteric things like Bloom filters, so you can find them later in the unlikely case you need them.
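
To make the built-in data structure point concrete, here’s a tiny, self-contained experiment: membership tests are O(n) on a list but roughly O(1) on a set, and the difference is easy to see.

import timeit

haystack_list = list(range(100_000))
haystack_set = set(haystack_list)

# Searching for the worst-case element: the list scans everything,
# the set does a single hash lookup.
print(timeit.timeit("99_999 in haystack_list", globals=globals(), number=1_000))
print(timeit.timeit("99_999 in haystack_set", globals=globals(), number=1_000))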

Here are some things you don’t need to learn:

  • The details of a dozen different sorting algorithms. Look at two to see different ways of approaching the same problem, then move on.
  • The names of “important” algorithms. Those have all been implemented for you.
  • The answers to all N problems on some quiz web site. You won’t be asked these exact questions, and they won’t come up in your real work. Again: try a few to get a feel for how some algorithms work. The exact answers are not what you need.

Of course some engineers need to implement hash tables, or sorting algorithms or whatever. We love those engineers: they write libraries we can use off the shelf so we don’t have to implement them ourselves.

There have been times when I implemented something that felt like An Algorithm (for example, Finding fuzzy floats), but it was more about considering another perspective on my data, looking at the time complexity, and moving operations around to avoid quadratic behavior. It wasn’t opening a textbook to find the famous algorithm that would solve my problem.

Again: if it will help you get a job, deep-study DSA. But don’t be disappointed when you don’t use it on the job.

If you want to prepare yourself for a career, and also stand out in job interviews, learn how to write tests:

  • This will be a skill you use constantly. Real-world software means writing tests much more than school teaches you to.
  • In a job search, testing experience will stand out more than DSA depth. It shows you’ve thought about what it takes to write high-quality software instead of just academic exercises.
  • It’s not obvious how to test code well. It’s a puzzle and a problem to solve. If you like figuring out solutions to tricky questions, focus on how to write code so that it can be tested, and how to test it.
  • Testing not only gives you more confidence in your code, it helps you write better code in the first place.
  • Testing applies everywhere, from tiny bits of code to entire architectures, assisting you in design and implementation at all scales.
  • If pursued diligently, testing is an engineering discipline in its own right, with a fascinating array of tools and techniques.

Less DSA, more testing.

Finding unneeded pragmas

Sunday 24 August 2025

A proof-of-concept tool for finding unneeded coverage.py exclusion pragmas

To answer a long-standing coverage.py feature request, I threw together an experiment: a tool to identify lines that have been excluded from coverage, but which were actually executed.

The program is a standalone file in the coverage.py repo. It is unsupported. I’d like people to try it to see what they think of the idea. Later we can decide what to do with it.

To try it: copy warn_executed.py from GitHub. Create a .toml file that looks something like this:

# Regexes that identify excluded lines:
warn-executed = [
    "pragma: no cover",
    "raise AssertionError",
    "pragma: cant happen",
    "pragma: never called",
    ]

# Regexes that identify partial branch lines:
warn-not-partial = [
    "pragma: no branch",
    ]

These are exclusion regexes that you’ve used in your coverage runs. The program will print out any line that matches one of these patterns but actually ran during your tests. It might be that you don’t need to exclude the line, because it ran.
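
For example, suppose a hypothetical module contains an exclusion like this:

def load_config(path):
    try:
        with open(path) as f:
            return f.read()
    except FileNotFoundError:  # pragma: no cover
        # We assumed no test ever exercises the missing-file case.
        return ""

If some test actually does trigger the FileNotFoundError, the excluded line ran, and warn_executed.py will flag it: the pragma may not be needed anymore.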

In this file, none of your coverage settings or the default regexes are assumed: you need to explicitly specify all the patterns you want flagged.

Run the program with Python 3.11 or higher, giving the name of the coverage data file and the name of your new TOML configuration file. It will print the lines that might not need excluding:

$ python3.12 warn_executed.py .coverage warn.toml

The reason for a new list of patterns instead of just reading the existing coverage settings is that some exclusions are “don’t care” rather than “this will never happen.” For example, I exclude “def __repr__” because some __repr__’s are just to make my debugging easier. I don’t care if the test suite runs them or not. It might run them, so I don’t want it to be a warning that they actually ran.

This tool is not perfect. For example, I exclude “if TYPE_CHECKING:” because I want that entire clause excluded. But the if-line itself is actually run. If I include that pattern in the warn-executed list, it will flag all of those lines. Maybe I’m forgetting a way to do this: it would be good to have a way to exclude the body of the if clause while understanding that the if-line itself is executed.

Give warn_executed.py a try and comment on the issue about what you think of it.

Starting with pytest’s parametrize

Wednesday 13 August 2025

Pytest’s parametrize feature is powerful but it looks scary. I hope this step-by-step explanation helps people use it more.

Writing tests can be difficult and repetitive. Pytest has a feature called parametrize that can reduce the duplication, but it can be hard to understand if you are new to the testing world. It’s not as complicated as it seems.

Let’s say you have a function called add_nums() that adds up a list of numbers, and you want to write tests for it. Your tests might look like this:

def test_123():
    assert add_nums([1, 2, 3]) == 6

def test_negatives():
    assert add_nums([1, 2, -3]) == 0

def test_empty():
    assert add_nums([]) == 0

This is great: you’ve tested some behaviors of your add_nums() function. But it’s getting tedious to write out more test cases. The names of the functions have to be different from each other, and they don’t mean anything, so it’s extra work for no benefit. The test functions all have the same structure, so you’re repeating uninteresting details. You want to add more cases, but it feels like there’s friction you’d rather avoid.

If we look at these functions, they are very similar. In any software, when we have functions that are similar in structure, but differ in some details, we can refactor them to be one function with parameters for the differences. We can do the same for our test functions.

Here the functions all have the same structure: call add_nums() and assert what the return value should be. The differences are the list we pass to add_nums() and the value we expect it to return. So we can turn those into two parameters in our refactored function:

def test_add_nums(nums, expected_total):
    assert add_nums(nums) == expected_total

Unfortunately, tests aren’t run like regular functions. We write the test functions, but we don’t call them ourselves. That’s the reason the names of the test functions don’t matter. The test runner (pytest) finds functions named test_* and calls them for us. When they have no parameters, pytest can call them directly. But now that our test function has two parameters, we have to give pytest instructions about how to call it.

To do that, we use the @pytest.mark.parametrize decorator. Using it looks like this:

import pytest

@pytest.mark.parametrize(
    "nums, expected_total",
    [
        ([1, 2, 3], 6),
        ([1, 2, -3], 0),
        ([], 0),
    ]
)
def test_add_nums(nums, expected_total):
    assert add_nums(nums) == expected_total

There’s a lot going on here, so let’s take it step by step.

If you haven’t seen a decorator before, it starts with @ and is like a prologue to a function definition. It can affect how the function is defined or provide information about the function.

The parametrize decorator is itself a function call that takes two arguments. The first is a string (“nums, expected_total”) that names the two parameters of the test function. Here the decorator is instructing pytest, “when you call test_add_nums, you will need to provide values for its nums and expected_total parameters.”

The second argument to parametrize is a list of the values to supply as the arguments. Each element of the list will become one call to our test function. In this example, the list has three tuples, so pytest will call our test function three times. Since we have two parameters to provide, each element of the list is a tuple of two values.

The first tuple is ([1, 2, 3], 6), so the first time pytest calls test_add_nums, it will call it as test_add_nums([1, 2, 3], 6). Altogether, pytest will call our test function three times, like this:

test_add_nums([1, 2, 3], 6)
test_add_nums([1, 2, -3], 0)
test_add_nums([], 0)

This will all happen automatically. With our original test functions, when we ran pytest, it showed the results as three passing tests because we had three separate test functions. Now even though we only have one function, it still shows as three passing tests! Each set of values is considered a separate test that can pass or fail independently. This is the main advantage of using parametrize instead of writing three separate assert lines in the body of a simple test function.

What have we gained?

  • We don’t have to write three separate functions with different names.
  • We don’t have to repeat the same details in each function (assert, add_nums(), ==).
  • The differences between the tests (the actual data) are written succinctly all in one place.
  • Adding another test case is as simple as adding another line of data to the decorator.
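
For example, checking a hypothetical single-element list is just one more tuple in the decorator:

@pytest.mark.parametrize(
    "nums, expected_total",
    [
        ([1, 2, 3], 6),
        ([1, 2, -3], 0),
        ([], 0),
        ([10], 10),     # new case: one line of data, no new function
    ]
)
def test_add_nums(nums, expected_total):
    assert add_nums(nums) == expected_total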

Coverage.py regex pragmas

Monday 28 July 2025

Coverage.py uses regexes to define pragma syntax. This is surprisingly powerful.

Coverage.py lets you indicate code to exclude from measurement by adding comments to your Python files. But coverage implements them differently than other similar tools. Rather than having fixed syntax for these comments, they are defined using regexes that you can change or add to. This has been surprisingly powerful.

The basic behavior: coverage finds lines in your source files that match the regexes. These lines are excluded from measurement, that is, it’s OK if they aren’t executed. If a matched line is part of a multi-line statement the whole multi-line statement is excluded. If a matched line introduces a block of code the entire block is excluded.

At first, these regexes were just to make it easier to implement the basic “here’s the comment you use” behavior for pragma comments. But it also enabled pragma-less exclusions. You could decide (for example) that you didn’t care to test any __repr__ methods. By adding def __repr__ as an exclusion regex, all of those methods were automatically excluded from coverage measurement without having to add a comment to each one. Very nice.
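
In today’s coverage.py, that kind of project-wide exclusion goes in your configuration; in pyproject.toml it might look something like this (exclude_also adds to the default patterns instead of replacing them):

[tool.coverage.report]
exclude_also = [
    "def __repr__",
    ]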

Not only did this let people add custom exclusions in their projects, but it enabled third-party plugins that could configure regexes in other interesting ways:

  • covdefaults adds a bunch of default exclusions, and also platform- and version-specific comment syntaxes.
  • coverage-conditional-plugin gives you a way to create comment syntaxes for entire files, for whether other packages are installed, and so on.

Then about a year ago, Daniel Diniz contributed a change that amped up the power: regexes could match multi-line patterns. That doesn’t sound like a big change, but it enabled much more powerful exclusions. As a sign of that power, it made it possible to support four different feature requests.

To make it work, Daniel changed the matching code. Originally, it was a loop over the lines in the source file, checking each line for a match against the regexes. The new code uses the entire source file as the target string, and loops over the matches against that text. Each match is converted into a set of line numbers and added to the results.

The power comes from being able to use one pattern to match many lines. For example, one of the four feature requests was how to exclude an entire file. With configurable multi-line regex patterns, you can do this yourself:

\A(?s:.*# pragma: exclude file.*)\Z

With this regex, if you put the comment “# pragma: exclude file” in your source file, the entire file will be excluded. The \A and \Z match the start and end of the target text, which remember is the entire file. The (?s:...) means the s/DOTALL flag is in effect, so . can match newlines. This pattern matches the entire source file if the desired pragma is somewhere in the file.

Another requested feature was excluding code between two lines. We can use “# no cover: start” and “# no cover: stop” as delimiters with this regex:

# no cover: start(?s:.*?)# no cover: stop

Here (?s:.*?) means any number of any character at all, but as few as possible. A star in regexes means as many as possible, but star-question-mark means as few as possible. We need the minimal match so that we don’t match from the start of one pair of comments all the way through to the end of a different pair of comments.
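
If you wanted these two patterns in your own project, they’d go in the same exclude_also setting. One detail worth noting: in TOML, single-quoted literal strings avoid having to escape the regex backslashes:

[tool.coverage.report]
exclude_also = [
    '\A(?s:.*# pragma: exclude file.*)\Z',
    '# no cover: start(?s:.*?)# no cover: stop',
    ]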

This regex approach is powerful, but it’s still fairly shallow. For example, either of the two patterns above would match the wrong lines if a string literal contained the pragma text. There’s no easy way for a regex to skip over string literals.

This kind of difficulty hit home when I added a new default pattern to exclude empty placeholder methods like this:

def not_yet(self): ...

def also_not_this(self):
    ...

async def definitely_not_this(
    self,
    arg1,
):
    ...

We can’t just match three dots, because ellipses can be used in other places than empty function bodies. We need to be more delicate. I ended up with:

^\s*(((async )?def .*?)?\)(\s*->.*?)?:\s*)?\.\.\.\s*(#|$)

This craziness ensures the ellipsis is part of an (async) def, that the ellipsis appears first in the body (but no docstring allowed, doh!), allows for a comment on the line, and so on. And even with a pattern this complex, it would incorrectly match this contrived line:

def f(): print("(well): ... #2 false positive!")

So regexes aren’t perfect, but they’re a pretty good balance: flexible and powerful, and will work great on real code even if we can invent weird edge cases where they fail.

What started as a simple implementation expediency has turned into a powerful configuration option that has done more than I would have thought.

Coverage 7.10.0: patch

Thursday 24 July 2025

Coverage 7.10 has some significant new features that have solved some long-standing problems.

Years ago I greeted a friend returning from vacation and asked how it had been. She answered, “It was good, I got a lot done!” I understand that feeling. I just had a long vacation myself, and used the time to clean up some old issues and add some new features in coverage.py v7.10.

The major new feature is a configuration option, [run] patch. With it, you specify named patches that coverage can use to monkey-patch some behavior that gets in the way of coverage measurement.

The first is subprocess. Coverage works great when you start your program with coverage measurement, but has long had the problem of how to also measure the coverage of sub-processes that your program created. The existing solution had been a complicated two-step process of creating obscure .pth files and setting environment variables. Whole projects appeared on PyPI to handle this for you.

Now, patch = subprocess will do this for you automatically, and clean itself up when the program ends. It handles sub-processes created by the subprocess module, the os.system() function, and any of the execv or spawnv families of functions.

This alone has spurred one user to exclaim,

The latest release of Coverage feels like a Christmas present! The native support for Python subprocesses is so good!

Another patch is _exit. This patches os._exit() so that coverage saves its data before exiting. The os._exit() function is an immediate and abrupt termination of the program, skipping all kinds of registered clean up code. This patch makes it possible to collect coverage data from programs that end this way.

The third patch is execv. The execv functions end the current program and replace it with a new program in the same process. The execv patch arranges for coverage to save its data before the current program is ended.
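
In pyproject.toml, asking for the patches looks something like this (pick the subset you actually need):

[tool.coverage.run]
patch = [
    "subprocess",
    "_exit",
    "execv",
    ]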

Now that these patches are available, it seems silly that it’s taken so long. They (mostly) weren’t difficult. I guess it took looking at the old issues, realizing the friction they caused, and thinking up a new way to let users control the patching. Monkey-patching is a bit invasive, so I’ve never wanted to do it implicitly. The patch option gives the user an explicit way to request what they need without having to get into the dirty details themselves.

Another process-oriented feature was contributed by Arkady Gilinsky: with --save-signal=USR1 you can specify a user signal that coverage will attend to. When you send the signal to your running coverage process, it will save the collected data to disk. This gives a way to measure coverage in a long-running process without having to end the process.
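
Usage is roughly like this, with a made-up script name and process id:

$ coverage run --save-signal=USR1 long_running_server.py
$ kill -USR1 12345        # coverage writes its data without stopping the process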

There were some other fixes and features along the way, like better HTML coloring of multi-line statements, and more default exclusions (if TYPE_CHECKING: and ...).

It feels good to finally address some of these pain points. I also closed some stale issues and pull requests. There is more to do, always more to do, but this feels like a real step forward. Give coverage 7.10.0 a try and let me know how it works for you.
