
Mushroom cake

Tuesday 24 March 2015

My youngest son Ben turned 17 today. He is fascinated with mushrooms, so we made him a mushroom cake. Actually a trio of cakes:

[Photo: mushroom cake]

It looks a bit like cupcakes, but no cupcakes were harmed in the making of this cake.

The main mushroom has a stem ("It's called a stipe, Dad") made of two 4.5-inch cake rounds. The cap ("pileus, Dad") was baked in the bottom of a stainless steel mixing bowl. The two stem pieces bulged more than we expected, so we sliced off the domed tops and used them as caps for the medium mushrooms, which stand on stems of stacked Ring-Dings.

The dots are mega M&M's. The tiny mushrooms are mini-marshmallows supporting white chocolate Reese's peanut butter cups. Gummi worms add character.

A cut-away view of the medium mushroom:

[Photo: mushroom cake, cut-away view]

Delicious.

Finding temp file creators

Saturday 14 March 2015

One very useful thing about Python is its extreme introspectability and malleability. Taken too far, these can make your code an unmaintainable mess, but they can be very handy when trying to debug large and complex projects.

Open edX is one such project. Its main repository has about 200,000 lines of Python spread across 1500 files. The test suite has 8000 tests.

I noticed that running the test suite left a number of temporary directories behind in /tmp. They all had names like tmp_dwqP1Y, made by the tempfile module in the standard library. Our tests have many calls to mkdtemp, which requires the caller to delete the directory when done. Clearly, some of these cleanups were not happening.
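For reference, here is a minimal sketch of what a well-behaved mkdtemp caller looks like. The function name use_scratch_dir and the file data.txt are made up for illustration; they are not from the Open edX code.

```python
import os
import shutil
import tempfile

def use_scratch_dir():
    """Create a temp directory, use it, and always remove it afterwards."""
    scratch = tempfile.mkdtemp()
    try:
        # Do some work inside the directory.
        with open(os.path.join(scratch, "data.txt"), "w") as f:
            f.write("hello")
    finally:
        # mkdtemp never cleans up on its own: deleting is the caller's job.
        shutil.rmtree(scratch)
    return scratch

leftover = use_scratch_dir()
still_there = os.path.exists(leftover)
```

Any caller that skips the rmtree step, or that raises before reaching it without a try/finally, leaves a directory behind.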

To find the misbehaving code, I could grep through the code for calls to mkdtemp, and then reason through which of those calls eventually deleted the directory, and which did not. That sounded tedious, so instead I took the fun route: an aggressive monkeypatch to find the litterbugs for me.

My first thought was to monkeypatch mkdtemp itself. But most uses of the function in our code look like this:

from tempfile import mkdtemp
...
d = mkdtemp()

Because the function was imported directly, if my monkeypatching code ran after this import, the call wouldn't be patched. (BTW, this is one more small reason to prefer importing modules, and using module.function in the code.)
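A small sketch of why the direct import defeats the patch. It uses a throwaway module object named fake_tempfile (a made-up stand-in, so nothing here touches the real tempfile module):

```python
import types

# Hypothetical stand-in module, built in memory for the demo.
fake_tempfile = types.ModuleType("fake_tempfile")

def _original():
    return "original"

fake_tempfile.make_name = _original

# Equivalent of "from fake_tempfile import make_name":
# the name is bound to the function object that exists right now.
make_name = fake_tempfile.make_name

# Monkeypatch the module attribute after the import has happened.
fake_tempfile.make_name = lambda: "patched"

direct_result = make_name()                # still the old function
module_result = fake_tempfile.make_name()  # looked up at call time
```

The directly imported name keeps pointing at the old function, while the module.function spelling sees the replacement.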

The implementation of mkdtemp uses a helper function in the tempfile module, _get_candidate_names. This helper is a generator that produces those typical random tempfile names. If I monkeypatched that internal function, then all callers would use my code regardless of how they had imported the public function, because the public function looks the helper up in the tempfile module each time it runs. Monkeypatching the internal helper had the extra advantage that using any of the public functions in tempfile would call that helper, and get my changes.
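To see why patching the helper reaches even directly imported callers, here is a sketch with a made-up in-memory module. The names fake_tempfile, _candidate_names, and make_temp_name are invented for the demo and only mimic the shape of the real module:

```python
import types

# Source for a hypothetical module: a public function that relies on
# a private helper, the same shape as mkdtemp and its name generator.
source = '''
def _candidate_names():
    yield "abc123"

def make_temp_name():
    return "tmp" + next(_candidate_names())
'''

fake_tempfile = types.ModuleType("fake_tempfile")
exec(source, fake_tempfile.__dict__)

# Equivalent of "from fake_tempfile import make_temp_name".
make_temp_name = fake_tempfile.make_temp_name

def patched_names():
    yield "_patched"

# Patch the private helper AFTER the public function was imported.
fake_tempfile._candidate_names = patched_names

# The public function finds the helper in its module's globals when it
# runs, so it picks up the replacement.
result = make_temp_name()
```

The public function object never changed, so it does not matter who holds a reference to it or when they imported it.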

To find the problem code, I would put information about the caller into the name of the temporary file. Then each temp file left behind would be a pointer of sorts to the code that created it. So I wrote my own _get_candidate_names like this:

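import inspect
import os.path
import tempfile

# Keep a reference to the original helper so we can still get random names.
real_get_candidate_names = tempfile._get_candidate_names

def get_candidate_names_hacked():
    # Describe the callers as file-name-plus-line-number pairs joined by dashes.
    stack = "-".join(
        "{}{}".format(
            os.path.basename(t[1]).replace(".py", ""),  # t[1] is the file name
            t[2],                                       # t[2] is the line number
        )
        for t in inspect.stack()[4:1:-1]
    )
    for name in real_get_candidate_names():
        yield "_" + stack + "_" + name

# Swap our version in for the private helper.
tempfile._get_candidate_names = get_candidate_names_hacked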

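This code uses inspect.stack to get the call stack. We slice it oddly: [4:1:-1] picks frames 4, 3, and 2, in that order, which gives the three closest interesting callers with the outermost first. Frames 0 and 1 are skipped because they are our own generator and the tempfile function that asked it for a name. For each frame we take the filename, strip off the ".py", and append the line number, then join the three pieces with dashes. This gives us a string that indicates the caller.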
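The slice is easier to see on a plain list. The labels below are hypothetical stand-ins for the entries inspect.stack() returns, innermost call first:

```python
# Made-up labels in the same order inspect.stack() uses: index 0 is the
# currently running function, higher indexes are further out.
stack = [
    "hacked_generator",    # 0: our own function
    "tempfile_function",   # 1: the code that asked for a name
    "caller",              # 2
    "callers_caller",      # 3
    "third_caller",        # 4
    "outer_a",             # 5
    "outer_b",             # 6
]

# Start at index 4 and step backwards, stopping before index 1.
picked = stack[4:1:-1]

# Slices never raise on short lists; we just get fewer entries.
short_picked = stack[:3][4:1:-1]
```

Here picked holds the entries at indexes 4, 3, and 2, so the name reads from the outer caller inward, and a shallow stack simply produces a shorter name.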

We call the real _get_candidate_names function to get a generator of good random names, prepend our stack description to each name, and yield it.

Then we can monkeypatch our function into tempfile. Now as long as this module gets imported before any temporary files are created, the files will have names like this:

tmp_case53-case78-test_import_export289_DVPmzy/
tmp_test_video36-test_video143-tempfile455_2upTdS.srt

The first shows that the directory was created in test_import_export.py at line 289, called from case.py line 78, which was called from case.py line 53. The second shows that test_video.py has a few functions that eventually call into tempfile.py.
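Reading these names by eye is easy enough, but with many leftovers a small decoder can help. This is only a sketch: decode_temp_name is a made-up helper, and it assumes the default "tmp" prefix, file names without dashes, and file names that do not end in a digit.

```python
import re

# One stack entry: a file name (without ".py") followed by a line number,
# then either the end of the piece or the "_" before the random part.
ENTRY = re.compile(r"^(.+?)(\d+)(?:_|$)")

def decode_temp_name(name, prefix="tmp_"):
    """Turn a patched temp name back into (filename, line) pairs."""
    if name.startswith(prefix):
        name = name[len(prefix):]
    frames = []
    for part in name.split("-"):
        match = ENTRY.match(part)
        if match:
            frames.append((match.group(1) + ".py", int(match.group(2))))
    return frames

decoded_dir = decode_temp_name("tmp_case53-case78-test_import_export289_DVPmzy/")
decoded_file = decode_temp_name("tmp_test_video36-test_video143-tempfile455_2upTdS.srt")
```

The last pair in each result is the line that asked tempfile for the name; the earlier pairs are its callers.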

I would be very reluctant to monkeypatch private functions inside other modules for production code. But as a quick debugging trick, it works great.
