Perspective juggling

Saturday 23 January 2021

Taylor Glenn is a great juggler. I don’t mean that she has set world records, or does more-difficult tricks than anyone else. She is certainly very accomplished, but what I love about her is that she brings an easy grace to her juggling, and a friendly, encouraging air. I have used some of her instructional videos to improve my technique.

Her latest video, Perspective Juggling, is eye-opening. It’s filmed from above, presenting a fresh view of the juggling. Some tricks become easier to understand: you can see the hands moving more while the balls seem to move less. Other tricks somehow seem more mysterious. All of them are fluid and mesmerizing.

Take a look:

Watch the video on YouTube

As a juggler, the only improvement I would make is to flip the video vertically so that her hands reach toward the top of the frame, more like a first-person view.

Flourish

Sunday 17 January 2021

Flourish is a visual toy app that draws harmonographs, sinuous curves simulating a multi-pendulum trace:

Front page of Flourish, showing thumbnails of harmonographs

Each harmonograph is determined by a few dozen parameter values, which are chosen randomly. The number of parameters depends on the number of pendulums, which defaults to 3.
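
For the curious, a harmonograph is typically modeled as a sum of damped sinusoids, one per pendulum, on each axis. Here is a minimal sketch of the idea (the parameter names and values are illustrative, not Flourish’s actual ones):

import numpy as np

def harmonograph(t, xparams, yparams):
    # Each pendulum contributes a damped sinusoid:
    #   amplitude * sin(frequency * t + phase) * exp(-decay * t)
    x = sum(a * np.sin(f * t + p) * np.exp(-d * t) for a, f, p, d in xparams)
    y = sum(a * np.sin(f * t + p) * np.exp(-d * t) for a, f, p, d in yparams)
    return x, y

# Three pendulums: an (amplitude, frequency, phase, decay) tuple per axis.
t = np.linspace(0, 200, 20000)
x, y = harmonograph(
    t,
    xparams=[(1, 2.01, 0.3, 0.002), (0.6, 3.0, 1.2, 0.001), (0.4, 5.0, 2.0, 0.003)],
    yparams=[(1, 3.02, 1.7, 0.002), (0.6, 2.0, 0.4, 0.001), (0.4, 4.99, 0.8, 0.003)],
)
# Plotting x against y traces the curve.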

Click a thumbnail to see a larger version. The large-image pages also show thumbnails of “adjacent” images: for each parameter, four nearby values are substituted, giving four thumbnails per parameter. Clicking an adjacent thumbnail continues your exploration of the parameter space:

A large harmonograph, with adjacent thumbnails
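
Generating the adjacent images is conceptually simple: hold every parameter fixed except one, and nudge that one a few different amounts. A sketch of the idea (illustrative only, not Flourish’s actual neighborhood logic):

import random

def neighbors(params, spread=0.1, per_param=4):
    # For each parameter, yield copies of the parameter list with just
    # that one value nudged, leaving all the others untouched.
    for i, value in enumerate(params):
        for _ in range(per_param):
            tweaked = list(params)
            tweaked[i] = value + random.uniform(-spread, spread)
            yield tweaked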

The settings dialog lets you adjust the number of pendulums (which determines the number of parameters) and the kinds of symmetry you are interested in.

I started this because I wanted to understand how the parameters affect the outcome, but I was also interested in giving it a purely visual design. As an engineer, it was tempting to present the parameter values quantitatively, but I like the simplicity of just clicking curves you like.

I repeated a trick I’ve used in other mathy visual toys: when you download a PNG file of an image, the parameter values are stored in a data island in the file. You can re-upload the image, and Flourish will extract the parameters and put you back into the parameter-space exploration at that point.
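
One way to implement that kind of data island is with a PNG text chunk. A sketch using Pillow (the chunk keyword and parameter format here are hypothetical, not necessarily what Flourish uses):

import json
from PIL import Image
from PIL.PngImagePlugin import PngInfo

params = {"pendulums": 3, "seed": 12345}   # hypothetical parameters

# Write: attach the parameters as a text chunk when saving the image.
img = Image.new("RGB", (400, 400), "white")
info = PngInfo()
info.add_text("flourish-params", json.dumps(params))
img.save("curve.png", pnginfo=info)

# Read: recover the parameters from an uploaded file.
recovered = json.loads(Image.open("curve.png").text["flourish-params"])
assert recovered == params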

This is one of those side projects that let me use different tools than I usually do: numpy, SVG, Sass, Docker, and so on. I had more ideas for things to add (there is color support in the code, but not in the UI). Maybe someday I will build them.

BTW, I am happy that my first post of 2021 is called “Flourish.” I hope it is a harbinger of things to come.

Experimenting with git storage

Saturday 19 December 2020

The recent blog post Commits are snapshots, not diffs did a good job dispelling a common git misconception, and helped me finally understand it. To really wrap my head around it, I verified it empirically.

The misconception starts because git presents commits as diffs, and lets you manipulate them (rebase, cherry-pick, and so on) as if they were diffs. But internally, git commits are not diffs: git stores the complete contents of each file at every revision that changes it.

At first glance, this seems dumb: why store the whole file over again just because one line changed? The reason is speed and immutability. If git stored each commit as a diff against the previous version (as RCS did), then getting the latest version of a file would require replaying all the diffs from the very first version, so the most common operation, checking out the latest code, would get slower and slower as the repo accumulated commits.

If git stored the latest version of a file, and diffs going backward in time (as Subversion does), then getting older versions would get slower and slower, which isn’t so bad. But it would require re-writing the previous commit each time a new commit was made. This would ruin git’s hash-based immutability.
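
That immutability comes from how objects are named: an object’s ID is a hash of its contents, so changing stored bytes would change the object’s identity and everything that references it. You can check this from Python (assuming the default SHA-1 object format):

import hashlib

# A blob’s ID is the SHA-1 of a small header plus the file’s bytes.
data = b"hello\n"
obj = b"blob %d\x00" % len(data) + data
print(hashlib.sha1(obj).hexdigest())
# Prints ce013625030ba8dba906f756967f9e9ca394464a, the same ID that
# `echo hello | git hash-object --stdin` reports.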

So, surprisingly, git stores the full contents of the file each time the file changes. I wanted to see this for myself, so I did an experiment.

First, make a new git repo:

$ mkdir gitsize
$ cd gitsize
$ git init
Initialized empty Git repository in /tmp/gitsize/.git/

I used a Python program (makebig.py) to create large files with repeatably random contents and one changeable word in the middle:

# Make a 1Mb randomish file, repeatably.

import random, sys

# Seeding from the first argument makes the "random" contents repeatable.
random.seed(int(sys.argv[1]))

for lineno in range(2048):
    if lineno == 1000:
        # Plant the second argument as one changeable line mid-file.
        print(sys.argv[2])
    # A line of 512 random printable ASCII characters.
    print("".join(chr(random.randint(32, 126)) for _ in range(512)))

Let’s make a big file with “one” in the middle, and commit it:

$ python /tmp/makebig.py 1 one > one.txt
$ ls -lh
total 2136
-rw-r--r--  1 ned  wheel   1.0M Dec 19 11:13 one.txt
$ git add one.txt
$ git commit -m "One"
[master (root-commit) 8fceff3] One
 1 file changed, 2049 insertions(+)
 create mode 100644 one.txt

Git stores everything in the .git directory, with the file contents in the .git/objects directory:

$ ls -Rlh .git/objects/*
.git/objects/13:
total 1720
-r--r--r--  1 ned  wheel   859K Dec 19 11:14 b581d8695866f880eac2fef47c2594bbebbb3b

.git/objects/7d:
total 8
-r--r--r--  1 ned  wheel    52B Dec 19 11:14 32a67a911e8a04ad1703712481afe93b00c7af

.git/objects/8f:
total 8
-r--r--r--  1 ned  wheel   127B Dec 19 11:14 ceff3e3926764197742b01639a42765e34cd72

.git/objects/info:

.git/objects/pack:

Git stores three kinds of things: blobs, trees, and commits. We now have one of each: 8fceff3 is the commit (the same hash git printed when we committed), 7d32a6 is the tree, and b581d8 is the blob holding the file contents. You can see the blob is 859Kb, which is our 1Mb file with a little compression applied.
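
You can verify that the blob really does hold the full file contents: a loose object is just zlib-compressed bytes with a short header. A quick check in Python (adjust the object path to match your own repo):

import zlib

path = ".git/objects/13/b581d8695866f880eac2fef47c2594bbebbb3b"
raw = zlib.decompress(open(path, "rb").read())
header, _, body = raw.partition(b"\x00")
print(header)       # the object type and size, e.g. b'blob 1050628'
print(body[:40])    # the first bytes of one.txt, fully intact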

Now we change the file just a little bit by writing it over again with a different word in the middle:

$ python /tmp/makebig.py 1 one-changed > one.txt
$ git diff
diff --git a/one.txt b/one.txt
index 13b581d..b13026a 100644
--- a/one.txt
+++ b/one.txt
@@ -998,7 +998,7 @@ wLh&#DvF%em\Bb}^Y<gk?5vR8npq{ ~".][T|@.At@~fGYf<0/=cth`e}/}='qBFb&FP?+ENmAA:_g+0
 u$d|\v=y$oi@\, (o`=a49|!r\LL^B:y.f)*@5^bR\,Ck=i (.. snipped)
 lbY#m++>32X>^gh\/q34})uxZ"e/p;Ybb9\k,UTLPb*?3l7 (.. snipped)
 B11\\!x]jM9m't"KD%|,&r(lfh%vfT}~{jOQYBb?|TZ(<<R (.. snipped)
-one
+one-changed
 >Mu2P-/=8Z+A&"#@'"8*~fb]kkn;>}Ie.)wGjjHsbO5Nw]" (.. snipped)
 Vl {Q)k|{E!vF*@S')U5bK3u1fInN*ZrIe{-qXW}Fr`6*#N (.. snipped)
 3lF#jR!"JxXjAvih 4I6E\W:Y.*}b@eZ8xl-"*c/!pe"$Mx (.. snipped)

Commit the change, and we can look again at the .git storage:

$ git commit -am "One, changed"
[master a2410c8] One, changed
 1 file changed, 1 insertion(+), 1 deletion(-)
$ ls -Rlh .git/objects/*
.git/objects/0e:
total 8
-r--r--r--  1 ned  wheel    52B Dec 19 11:22 2de9f34b9140c3e99c5d5106a1078d22aa9063

.git/objects/13:
total 1720
-r--r--r--  1 ned  wheel   859K Dec 19 11:14 b581d8695866f880eac2fef47c2594bbebbb3b

.git/objects/7d:
total 8
-r--r--r--  1 ned  wheel    52B Dec 19 11:14 32a67a911e8a04ad1703712481afe93b00c7af

.git/objects/8f:
total 8
-r--r--r--  1 ned  wheel   127B Dec 19 11:14 ceff3e3926764197742b01639a42765e34cd72

.git/objects/a2:
total 8
-r--r--r--  1 ned  wheel   163B Dec 19 11:22 410c8b799b7829e1360649011754739e0a5c50

.git/objects/b1:
total 1720
-r--r--r--  1 ned  wheel   859K Dec 19 11:22 3026a4c10928821aa2b89b3e67d766dfbd533a

.git/objects/info:

.git/objects/pack:

Now, as promised, there are two blobs, each 859Kb. The original file contents are still in blob b581d8, and there’s a new blob (3026a4) to hold the updated contents.

Even though we changed just one line in a 2000-line file, git stores the full contents of both revisions of the file.

Isn’t this terrible!? Won’t my repos balloon to unmanageable sizes? Nope, because git has another trick up its sleeve. It can store those blobs in “pack files”, which store repeated sequences of bytes once.

Git will automatically pack blobs when it makes sense to, but we can ask for it explicitly to see the packing in action:

$ git gc --aggressive
Enumerating objects: 6, done.
Counting objects: 100% (6/6), done.
Delta compression using up to 8 threads
Compressing objects: 100% (4/4), done.
Writing objects: 100% (6/6), done.
Total 6 (delta 1), reused 0 (delta 0), pack-reused 0
$ ls -Rlh .git/objects/*
.git/objects/info:
total 16
-rw-r--r--  1 ned  wheel   1.2K Dec 19 11:41 commit-graph
-rw-r--r--  1 ned  wheel    54B Dec 19 11:41 packs

.git/objects/pack:
total 1720
-r--r--r--  1 ned  wheel   1.2K Dec 19 11:41 pack-a0d87c64abc0f03070fd14449891fe20ca98926b.idx
-r--r--r--  1 ned  wheel   855K Dec 19 11:41 pack-a0d87c64abc0f03070fd14449891fe20ca98926b.pack

Now instead of individual blob files, we have one pack file. And it’s a little smaller than either of the blobs!

This may seem like a semantic game: doesn’t this show that commits are deltas? It’s not the same, for a few reasons:

  • Reconstructing a file doesn’t require revisiting its history. Every revision is available with the same amount of effort.
  • The sharing between blobs happens at a conceptually lower layer than the object model. Git stores a commit as a full snapshot of all of the files’ contents; those contents might then be stored in a shared-bytes way within the pack files.
  • The object model is full-file contents in blobs, and commits referencing those blobs. If you removed pack files from the implementation, the conceptual model and all operations would work the same, just take more disk space.
  • The storage savings in a pack file are not limited to a single file. If two files (or two revisions of two different files) are very similar, their bytes will be shared.

To demonstrate this last point, we’ll make another file with almost the same contents as one.txt:

$ python /tmp/makebig.py 1 two > two.txt
$ ls -lh
total 4280
-rw-r--r--  1 ned  wheel   1.0M Dec 19 11:18 one.txt
-rw-r--r--  1 ned  wheel   1.0M Dec 19 11:49 two.txt
$ git add two.txt
$ git commit -m "Two"
[master 079baa5] Two
 1 file changed, 2049 insertions(+)
 create mode 100644 two.txt
$ git gc --aggressive
Enumerating objects: 9, done.
Counting objects: 100% (9/9), done.
Delta compression using up to 8 threads
Compressing objects: 100% (7/7), done.
Writing objects: 100% (9/9), done.
Total 9 (delta 2), reused 4 (delta 0), pack-reused 0
$ ls -Rlh .git/objects/*
.git/objects/info:
total 16
-rw-r--r--  1 ned  wheel   1.2K Dec 19 11:50 commit-graph
-rw-r--r--  1 ned  wheel    54B Dec 19 11:50 packs

.git/objects/pack:
total 1720
-r--r--r--  1 ned  wheel   1.3K Dec 19 11:50 pack-36b681bfc8ebef963bb8a7dcfe65addab822f5d4.idx
-r--r--r--  1 ned  wheel   855K Dec 19 11:50 pack-36b681bfc8ebef963bb8a7dcfe65addab822f5d4.pack

Now we have two separate source files in our working tree, each 1Mb. But in the .git storage there is still just one 855Kb pack file. The parts of one.txt and two.txt that are the same are only stored once.

As another example, let’s change two.txt completely by using a different random seed, commit it, then change it back again:

$ python /tmp/makebig.py 2 two > two.txt
$ git commit -am "Two, completely changed"
[master 6dac887] Two, completely changed
 1 file changed, 2049 insertions(+), 2049 deletions(-)
 rewrite two.txt (86%)
$ python /tmp/makebig.py 1 two > two.txt
$ git commit -am "Never mind, I liked it the old way"
[master c06ad2f] Never mind, I liked it the old way
 1 file changed, 2049 insertions(+), 2049 deletions(-)
 rewrite two.txt (86%)

Looking at the storage, our pack file is twice the size, because we’ve stored two completely different 1Mb chunks of data. But the first and third revisions of two.txt were nearly identical, so they can share bytes in the pack file:

$ git gc --aggressive
Enumerating objects: 13, done.
Counting objects: 100% (13/13), done.
Delta compression using up to 8 threads
Compressing objects: 100% (11/11), done.
Writing objects: 100% (13/13), done.
Total 13 (delta 3), reused 6 (delta 0), pack-reused 0
$ ls -Rlh .git/objects/*
.git/objects/info:
total 16
-rw-r--r--  1 ned  wheel   1.3K Dec 19 11:58 commit-graph
-rw-r--r--  1 ned  wheel    54B Dec 19 11:58 packs

.git/objects/pack:
total 3432
-r--r--r--  1 ned  wheel   1.4K Dec 19 11:58 pack-49f264f911dc97e529dc56a4f6ad450f8013f720.idx
-r--r--r--  1 ned  wheel   1.7M Dec 19 11:58 pack-49f264f911dc97e529dc56a4f6ad450f8013f720.pack

If git stored diffs, we’d need two different megabyte-sized diffs for the two complete changes we’ve made to two.txt.

Note that in this experiment I have used “git gc” to force the storage into its most compact form. You typically wouldn’t do this. Git will automatically repack files when it makes sense to.

Git doesn’t store diffs; it stores the complete contents of the file at each revision. But beneath those full-file snapshots is clever redundancy-removing byte storage that makes the total size even smaller than a diff-based system could achieve.

If you want to know more, How Git Works is a good overview, and Git Internals - Git Objects is the authoritative reference.

Favicons with ImageMagick

Saturday 5 December 2020

Today I needed to revisit how favicons work. First I wanted to run an empirical experiment to see what size and format browsers would actually use. This has always been a confusing landscape: some pages offer dozens of different files to use as the icon. I wasn’t going to go crazy with all of that; I just wanted to see what would do a simple job.

To run my experiment, I used ImageMagick to create a test favicon.ico, and also some different-sized PNG files. So that I would know what I was looking at, each size is a different visual image: the 32-pixel icon shows “32”, and so on.

This is how I made them:

for size in 16 32 48 ; do
    magick convert \
        -background lightgray \
        -fill black \
        -size ${size}x${size} \
        -gravity center \
        -bordercolor black \
        -border 1 \
        label:${size} \
        icon_${size}.bmp
done
for size in 16 32 48 64 96 128 256; do
    magick convert \
        -background lime \
        -fill black \
        -size ${size}x${size} \
        -gravity center \
        -bordercolor black \
        -border 1 \
        label:${size} \
        icon_${size}.png
done
magick convert *.bmp favicon.ico

Playing with these a bit showed me that favicon.ico is not that reliable, and the simplest thing that works well is just a 32-pixel PNG file referenced with a <link rel="icon"> tag.

I wanted to make an icon of a circled Sleepy Snake image. I started with GIMP, but got lost in selections, paths, and alpha channels, so I went back to ImageMagick:

magick convert SleePYsnake.png \
    -background white -alpha remove -alpha off \
    SleePYwhite.png
magick convert \
    -size 3600x3600 xc:Purple -fill LightBlue \
    -stroke black -strokewidth 30 \
    -draw "circle 1100,1000 1100,1700" -transparent LightBlue \
    mask.png
magick convert SleePYwhite.png mask.png -composite temp.png
magick convert temp.png -transparent Purple temp2.png
magick convert temp2.png -crop 1430x1430+385+285 +repage round.png
magick convert round.png -resize 32x32 round_32.png

Probably some of these steps could be combined; the ImageMagick execution model is still a bit baffling to me. The process produced these intermediate images:

The six images made by the pipeline above.

I made that montage with:

magick montage \
    SleePYsnake.png SleePYwhite.png mask.png temp.png temp2.png round.png \
    -geometry 300x300 -background '#ccc' -mode concatenate -tile 2x \
    favicon_stages.png

In the end, I got the result I wanted:

32-pixel rendering of Sleepy Snake

Mad Libs

Friday 27 November 2020

When people ask what they should implement to practice programming, I often say: Mad Libs. It’s a game, so it might appeal to youthful minds, but it’s purely text-based, so it won’t be overwhelming to implement. It can start simple, for beginners, and grow more complicated if you are more advanced.

Mad Libs is a language game. One person, the reader, has a story with blanks in it. The other player(s) provide words to go in the blanks, but they don’t know the story. The result is usually funny.

For example, the story might be:

There was a tiny _____(adjective) _____(noun) who was feeling very _____(adjective). After _____(verb)’ing, she felt _____(adjective).

(An actual story would be longer, usually a full paragraph.) The reader will ask for the words, either in order, or randomized:

Give me an adjective.
“Purple”.
Give me a noun.
“Barn”.
A verb.
“Jump”.
Another adjective.
“Lumpy”.
Another adjective.
“Perturbed”.

Then the reader presents the finished story:

There was a tiny perturbed barn who was feeling very purple. After jumping, she felt lumpy.

To do this in software, the program will be the reader, prompting the user for words and then outputting the finished story. There are a few ways you could structure it, at different levels of complexity:

  • Hard-code the story in the code.
  • Represent the story in a data structure (maybe a list?) in the code.
  • Read the story from a data file, to make it easier to use different stories.
  • Provide a way to edit stories.
  • Make a Mad Libs web application.

Each of these has design choices to be made. How will you separate the text from the blanks? How will you indicate what kind of word goes in each blank? How complex a story structure will you allow?
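
To make that concrete, here is a minimal sketch of the second approach, with the story as a list in which blanks are marked by the kind of word they need:

# The story alternates literal text with (kind-of-word) blanks.
story = [
    "There was a tiny ", ("adjective",), " ", ("noun",),
    " who was feeling very ", ("adjective",), ". After ",
    ("verb",), "'ing, she felt ", ("adjective",), ".",
]

parts = []
for piece in story:
    if isinstance(piece, tuple):
        # A blank: prompt the player for the kind of word it needs.
        parts.append(input(f"Give me a {piece[0]}: "))
    else:
        parts.append(piece)

print("".join(parts))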

There are other bells and whistles you can add along the way, for any of the stages of complexity:

  • Randomize the order the words are requested.
  • Have a way for a provided word to be re-used in the story for some coherence.
  • Stories that have random paths through them, so that it is not always the same story, giving more variety.
  • Smarter text manipulation, so that “a” or “an” is used appropriately with a provided word (a tiny sketch of this follows the list).
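
For that last one, a naive version of the “a”/“an” choice is a one-line rule, though English has exceptions (“an hour”, “a unicorn”) that would need more care:

def article(word):
    # Choose the article from the first letter only; good enough
    # for a first pass, wrong for some words.
    return "an" if word[:1].lower() in ("a", "e", "i", "o", "u") else "a"

print(article("apple"), "apple")   # an apple
print(article("barn"), "barn")     # a barn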

If you are interested, you can read the details of how I approached it years ago with my son: Programming madlibs.

30% of people can juggle

Tuesday 3 November 2020

I’ve long wondered what portion of the general public can juggle. I couldn’t find an answer searching the web, so I used the best polling method I have: Twitter.

I realize that my Twitter followers skew toward people like me, so I ran a second poll to try to get data outside of my bubble.

These polls are by no means scientific, and are still very skewed toward the savvy and educated. If you ask a tech-bubble person to ask a friend, the friend is still from a small slice of the population as a whole.

But this is the best data we’ve got. I’ll say that in general, 20–30% of people can juggle.

Since I was making polls, and since 30% was higher than I would have guessed, I made a third poll to see what other people would guess.

There’s a nice symmetry to the idea that about 70% of people are surprised that about 30% of people can juggle!

If you have a better source of data about the general public, let me know.
