Drawing Cairo SVG in a Jupyter notebook

Sunday 27 January 2019

Quick tip: if you want to draw figures using Cairo in a Jupyter notebook, here’s how I did it:

from io import BytesIO

import cairo
import IPython.display

svgio = BytesIO()
with cairo.SVGSurface(svgio, 200, 200) as surface:
    # These lines are copied verbatim from the
    # pycairo page: https://pycairo.readthedocs.io
    context = cairo.Context(surface)
    x, y, x1, y1 = 0.1, 0.5, 0.4, 0.9
    x2, y2, x3, y3 = 0.6, 0.1, 0.9, 0.5
    context.scale(200, 200)
    context.set_line_width(0.04)
    context.move_to(x, y)
    context.curve_to(x1, y1, x2, y2, x3, y3)
    context.stroke()
    context.set_source_rgba(1, 0.2, 0.2, 0.6)
    context.set_line_width(0.02)
    context.move_to(x, y)
    context.line_to(x1, y1)
    context.move_to(x2, y2)
    context.line_to(x3, y3)
    context.stroke()
    # end of pycairo copy
IPython.display.SVG(data=svgio.getvalue())

Counting lines of code

Saturday 19 January 2019

I wrote an Open edX blog post about the need to move from Python 2 to Python 3. For emphasis, I wanted to say how much code there was. Open edX is a large project spread across a number of repos. Why spend 30 minutes writing a blog post when you can first spend two hours fiddling around with line-counting tools to get a vague factoid for the blog post?

The old standard tool for line-counting is cloc. It has way too many options, many of which don’t work quite the way I would have expected, but it gets the job done, with some bash support. My resulting monster is below.

It over-counts JavaScript code because a lot of the JavaScript checked into git isn’t code we wrote. I don’t know what to do about that. Oh well.

BTW, on the subject of line counting: once, helping someone with a program, I saw they were using semicolons to end their Python statements. I said they didn’t need them, and they replied, “Yes I do, because my manager’s line-counting software requires them.” !!!

Be careful out there...

#!/bin/bash

# Count lines of code in a tree of git repos.
# Needs cloc (https://github.com/AlDanial/cloc)

REPORTDIR=/tmp/cloc-reports
mkdir -p $REPORTDIR
rm -rf $REPORTDIR/*

cat <<EOF > $REPORTDIR/exclude-files.txt
package-lock.json
EOF

cat <<EOF > $REPORTDIR/more-langs.txt
reStructured Text
    filter remove_matches xyzzy
    extension rst
    3rd_gen_scale 1.0
SVG Graphics
    filter remove_html_comments
    extension svg
    3rd_gen_scale 1.0
EOF

find . -name .git -type d -prune | while read -r d; do
    dd=$(dirname "$d")
    if [[ $dd == ./src/third-party/* ]]; then
        # Ignore repos in the "third-party" tree.
        continue
    fi
    echo "==== $dd =============================================="
    # Quote the path so repo directories with spaces still work.
    cd "$dd"
    git remote -v

    REPORTHEAD=$REPORTDIR/${dd##*/}
    cloc \
        --report-file=$REPORTHEAD.txt \
        --read-lang-def=$REPORTDIR/more-langs.txt \
        --ignored=$REPORTHEAD.ignored \
        --vcs=git \
        --not-match-d='.*\.egg-info' \
        --exclude-dir=node_modules,vendor,locale \
        --exclude-ext=png,jpg,gif,ttf,eot,woff,mo,xcf \
        --exclude-list-file=$REPORTDIR/exclude-files.txt \
        .
    cd -
done

cloc \
    --sum-reports \
    --read-lang-def=$REPORTDIR/more-langs.txt \
    $REPORTDIR/*.txt

Advice

Tuesday 1 January 2019

Toward the end of last year, a co-worker on her birthday was asking people for life advice. She caught me off-guard, and all I could think of at the moment was “put money into your 401(k).”

But the question stuck with me, and I eventually gave her these bits:

Don’t compare your insides to other people’s outsides

We’re constantly being exposed to other people’s public personas. They share their victories on Facebook. They do talks at conferences explaining their successes. They are loud and visible when they are feeling good.

But we know all about our own internal feelings, all the time, the good and the bad. It’s really easy to think that our world is full of bad feelings, and other people’s worlds are not. But that’s because we compare the full spectrum of our experience to the carefully curated presentation of others.

Don’t do that. Everyone has feelings of insecurity, and failures, and bad feelings. They just don’t show them to you. Don’t compare your insides to their outsides.

Know yourself

...and keep that separate from other people’s ideas that they will try to project onto you. Society has ideas about what people should do, or how they should behave. Your career, or your family, or your personal life: people will try to fit all of these things into some comfortable preconception. Don’t let them. You decide.

Be aware of your brand

This sounds kind of business-y, but it just means: tend your reputation well. Your actions have an effect on people. People will get to know you, and form opinions about you. It would be nice if they didn’t, but they do, so be aware of that process, and be mindful in your interactions. You will leave opinion-footprints everywhere you go. You want them to be good opinions. Decide what you want people to think of when they think of you, and aim for that.

Put money into your 401(k)

OK, this was my original glib answer, but it’s true: if you have a way to save for retirement, especially if someone will match your money, do it. You may feel now like you can’t afford it, but that feeling won’t go away in the future. Start now.

Read the whole recipe first, and check you have all the ingredients

This wasn’t my advice, but I liked it, on both a literal and metaphorical level.

Have a happy and mindful 2019!

A thing I learned about Python recursion

Thursday 20 December 2018

Working on a programming challenge, I was surprised by something. I built a tree structure with a recursive function. Then I tried to use a recursive function to sum up some values across the tree, and was met with a RecursionError. How could I have enough stack depth to build the tree, but not enough to then sum up its values?

Python has a limit on how large its stack can grow, 1000 frames by default. If you recur more than that, a RecursionError will be raised. My recursive summing function seemed simple enough. Here are the relevant methods:

class Leaf:
    def __init__(self):
        self.val = 0        # will have a value.

    def value(self):
        return self.val

class Node:
    def __init__(self):
        self.children = []  # will have nodes added to it.

    def value(self):
        return sum(c.value() for c in self.children)

My code made a tree about 600 levels deep, meaning the recursive builder function had used 600 stack frames, and Python had no problem with that. Why would value() then overflow the stack?

The answer is that each call to value() uses two stack frames. The line that calls sum() is using a generator expression to iterate over the children. Up through Python 3.11, all comprehensions (and in Python 2 all except list comprehensions) are actually compiled as nested functions; since Python 3.12, list, dict, and set comprehensions are inlined, but generator expressions still get their own frame. Executing the generator expression calls that hidden nested function, using up an extra stack frame.

It’s roughly as if the code was like this:

def value(self):
    def _comprehension():
        for c in self.children:
            yield c.value()
    return sum(_comprehension())

Here we can see the two function calls that use the two frames: _comprehension() and then value().

Comprehensions do this so that the variables set in the comprehension don’t leak out into the surrounding code. It works great, but it costs us a stack frame per invocation.
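One way to see the extra frame directly is to count frames from inside a generator expression and from inside a plain loop. This is only a sketch: the helper names (stack_depth, via_loop, via_genexp) are made up for illustration, and sys._getframe() is a CPython implementation detail, so treat it as a debugging trick rather than something to depend on.

```python
import sys

def stack_depth():
    """Count the frames currently on the call stack (CPython-specific)."""
    depth = 0
    frame = sys._getframe()
    while frame is not None:
        depth += 1
        frame = frame.f_back
    return depth

def via_loop():
    # The loop body runs in via_loop's own frame.
    for _ in [None]:
        return stack_depth()

def via_genexp():
    # The generator expression runs in a hidden frame of its own.
    return next(stack_depth() for _ in [None])

# How many more frames the generator expression needed than the loop.
extra_frames = via_genexp() - via_loop()
print(extra_frames)
```

On CPython this should report one extra frame for the generator expression, which is the per-level cost described above.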

That explains the difference between the builder and the summer: the summer is using two stack frames for each level of the tree. I’m glad I could fix this, but sad that the code is not as nice as using a comprehension:

class Node:
    ...
    def value(self):
        total = 0
        for c in self.children:
            total += c.value()
        return total

Oh well.
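To make the difference concrete, here is a small reproduction, assuming a chain-shaped tree 600 nodes deep and the usual recursion limit of 1000. The method names value_genexp and value_loop and the build_chain helper are invented for this sketch, and the tree is built iteratively just to keep the demo short; it is not the original challenge code.

```python
import sys

class Leaf:
    def __init__(self, val):
        self.val = val

    def value_genexp(self):
        return self.val

    def value_loop(self):
        return self.val

class Node:
    def __init__(self):
        self.children = []

    def value_genexp(self):
        # Two frames per level: this method plus the hidden generator frame.
        return sum(c.value_genexp() for c in self.children)

    def value_loop(self):
        # One frame per level.
        total = 0
        for c in self.children:
            total += c.value_loop()
        return total

def build_chain(depth):
    """Build a tree `depth` nodes deep; every node holds one Leaf(1)."""
    root = node = Node()
    for _ in range(depth - 1):
        node.children.append(Leaf(1))
        child = Node()
        node.children.append(child)
        node = child
    node.children.append(Leaf(1))
    return root

old_limit = sys.getrecursionlimit()
sys.setrecursionlimit(1000)  # the usual default, set explicitly for the demo
try:
    tree = build_chain(600)
    loop_total = tree.value_loop()
    try:
        tree.value_genexp()
        genexp_overflowed = False
    except RecursionError:
        genexp_overflowed = True
finally:
    sys.setrecursionlimit(old_limit)

print(loop_total, genexp_overflowed)
```

The loop version needs roughly 600 frames and fits under the limit, while the generator-expression version needs roughly 1200 and does not.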

Update: Jonathan Slenders suggested using a recursive generator to flatten the sequence of nodes, then summing the flat sequence:

class Leaf:
    ...
    def values(self):
        yield self.val

class Node:
    ...
    def values(self):
        for c in self.children:
            yield from c.values()

    def value(self):
        return sum(self.values())

This is clever, and solves the problem. My real code had a mixture of two different nodes, one using sum() and the other using max(), so it wouldn’t have worked for me. But it’s nice for when it does.

Advent of code presentation

Wednesday 19 December 2018

At Boston Python last night, I did a presentation about solutions to a particular Advent of Code puzzle.

If you haven’t seen Advent of Code, give it a look. A new puzzle each day in December until Christmas. This is the fourth year running, and you can go back and look at the past years (and days).

My presentation landing page has links to the slides and the code.

The presentation took a particular Advent of Code puzzle (December 14, 2016) and walked through a few different solutions, with a small detour into unit testing.

The code shows a few different ways to deal with the problem:

During the talk, an audience member suggested that itertools.tee could be useful, which I hadn’t considered. So I tried that out also, though it wasn’t as nice as I had hoped, and maybe is holding on to too much state.
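For readers who haven’t used itertools.tee, here is a rough sketch of the general idea, assuming the appeal is peeking at upcoming items without consuming them. This is not the code from the talk: with_lookahead is a made-up name, and the example input is arbitrary.

```python
import itertools

def with_lookahead(iterable, n):
    """Yield (item, upcoming) pairs, where upcoming lists the next n items."""
    it = iter(iterable)
    while True:
        try:
            item = next(it)
        except StopIteration:
            return
        # tee gives two independent iterators over the remaining items:
        # keep one as the main stream, and peek ahead with the other.
        it, ahead = itertools.tee(it)
        yield item, list(itertools.islice(ahead, n))

pairs = list(with_lookahead("abcd", 2))
print(pairs)
```

The catch is that tee has to buffer every item the peeking iterator has read but the main one hasn’t, so a large lookahead means a large buffer. That may be the kind of held state the paragraph above is wary of.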

Sorry I didn’t write out the text of the talk itself...

Quick hack CSV review tool

Tuesday 4 December 2018

Let’s say you are running a conference, and let’s say your Call for Proposals is open, and is saving people’s talk ideas into a spreadsheet.

I am in this situation. Reviewing those proposals is a pain, because there are large paragraphs of text, and spreadsheets are a terrible way to read them. I did the typical software engineer thing: I spent an hour writing a tool to make reading them easier.

The result is csv_review.py. It’s a terminal program that reads a CSV file (the exported proposals). It displays a row at a time on the screen, wrapping text as needed. It has commands for moving around the rows. It collects comments into a second CSV file. That’s it.
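The display step is the heart of a tool like this. Below is a minimal sketch of that one piece using only the standard library. It is not the actual csv_review.py: the format_row function, the column names, and the sample data are all invented for illustration.

```python
import csv
import io
import textwrap

def format_row(row, width=72):
    """Render one CSV row (a dict) as labeled, wrapped text."""
    lines = []
    for field, value in row.items():
        lines.append(f"{field}:")
        # Wrap long cells, indenting them under the field name.
        wrapped = textwrap.wrap(value or "", width=width - 4) or [""]
        lines.extend("    " + line for line in wrapped)
    return "\n".join(lines)

# Stand-in for the exported proposals file; the columns are hypothetical.
sample = io.StringIO(
    "title,abstract\n"
    'A talk,"A fairly long paragraph of text that a spreadsheet '
    'cell would make hard to read."\n'
)
rows = list(csv.DictReader(sample))
formatted = format_row(rows[0], width=40)
print(formatted)
```

A real tool would add the navigation commands and write comments to a second file with csv.writer, but the reading and wrapping above is most of what makes long proposals readable.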

There are probably already better ways to do this. Everyone knows that to get answers from the internet, you don’t ask questions; instead, you present wrong answers. More people will correct you than will help you. So this tool is my wrong answer to how to review CFP proposals. Correct me!
