Tuesday 1 January 2019

Toward the end of last year, a co-worker on her birthday was asking people for life advice. She caught me off-guard, and all I could think of at the moment was “put money into your 401(k).”

But the question stuck with me, and I eventually gave her these bits:

Don’t compare your insides to other people’s outsides

We’re constantly being exposed to other people’s public personas. They share their victories on Facebook. They do talks at conferences explaining their successes. They are loud and visible when they are feeling good.

But we know all about our own internal feelings, all the time, the good and the bad. It’s really easy to think that our world is full of bad feelings, and other people’s worlds are not. But that’s because we compare the full spectrum of our experience to the carefully curated presentation of others.

Don’t do that. Everyone has feelings of insecurity, and failures, and bad feelings. They just don’t show them to you. Don’t compare your insides to their outsides.

Know yourself

...and keep that separate from other people’s ideas that they will try to project onto you. Society has ideas about what people should do, or how they should behave. Your career, or your family, or your personal life: people will try to fit all of these things into some comfortable preconception. Don’t let them. You decide.

Be aware of your brand

This sounds kind of business-y, but it just means: tend your reputation well. Your actions have an effect on people. People will get to know you, and form opinions about you. It would be nice if they didn’t, but they do, so be aware of that process, and be mindful in your interactions. You will leave opinion-footprints everywhere you go. You want them to be good opinions. Decide what you want people to think of when they think of you, and aim for that.

Put money into your 401(k)

OK, this was my original glib answer, but it’s true: if you have a way to save for retirement, especially if someone will match your money, do it. You may feel now like you can’t afford it, but that feeling won’t go away in the future. Start now.

Read the whole recipe first, and check you have all the ingredients

This wasn’t my advice, but I liked it, on both a literal and metaphorical level.

Have a happy and mindful 2019!

A thing I learned about Python recursion

Thursday 20 December 2018

Working on a programming challenge, I was surprised by something. I built a tree structure with a recursive function. Then I tried to use a recursive function to sum up some values across the tree, and was met with a RecursionError. How could I have enough stack depth to build the tree, but not enough to then sum up its values?

Python has a limit on how large its stack can grow, 1000 frames by default. If you recur more than that, a RecursionError will be raised. My recursive summing function seemed simple enough. Here are the relevant methods:

class Leaf:
    def __init__(self):
        self.val = 0        # will have a value.

    def value(self):
        return self.val

class Node:
    def __init__(self):
        self.children = []  # will have nodes added to it.

    def value(self):
        return sum(c.value() for c in self.children)

My code made a tree about 600 levels deep, meaning the recursive builder function had used 600 stack frames, and Python had no problem with that. Why would value() then overflow the stack?

The answer is that each call to value() uses two stack frames. The line that calls sum() is using a generator comprehension to iterate over the children. In Python 3, all comprehensions (and in Python 2 all except list comprehensions) are actually compiled as nested functions. Executing the generator comprehension calls that hidden nested function, using up an extra stack frame.

It’s roughly as if the code were written like this:

def value(self):
    def _comprehension():
        for c in self.children:
            yield c.value()
    return sum(_comprehension())

Here we can see the two function calls that use the two frames for each level of the tree: value() itself, and the hidden _comprehension() it calls, which in turn calls c.value() on each child.
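To see the extra frame directly, here is a small sketch. It is CPython-specific because it relies on sys._getframe, and the function names are made up for illustration. It counts the frames on the stack from inside a plain loop and from inside a generator expression, and lists the hidden code object the compiler creates for the generator expression:

```python
import sys
import types


def stack_depth():
    """Count the Python frames on the stack, including this one."""
    depth = 0
    frame = sys._getframe()
    while frame is not None:
        depth += 1
        frame = frame.f_back
    return depth


def depth_from_loop():
    # stack_depth() is called directly from this function's frame.
    depths = []
    for _ in range(1):
        depths.append(stack_depth())
    return depths[0]


def depth_from_genexpr():
    # stack_depth() is called from inside the hidden <genexpr> frame.
    return sum(stack_depth() for _ in range(1))


print(depth_from_genexpr() - depth_from_loop())  # 1: the extra <genexpr> frame

# The generator expression is compiled into its own nested code object.
hidden = [c.co_name for c in depth_from_genexpr.__code__.co_consts
          if isinstance(c, types.CodeType)]
print(hidden)  # ['<genexpr>']
```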

Comprehensions do this so that the variables set in the comprehension don’t leak out into the surrounding code. It works great, but it costs us a stack frame per invocation.
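A quick way to see the isolation this buys: in Python 3, a comprehension’s loop variable stays inside it, while a plain for loop’s variable lands in the surrounding scope. A minimal sketch:

```python
x = "outer"
squares = [x * x for x in range(4)]
print(squares)  # [0, 1, 4, 9]
print(x)        # outer: the comprehension's x never escaped

for y in range(4):
    pass
print(y)        # 3: a plain for loop's variable does leak
```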

That explains the difference between the builder and the summer: the summer is using two stack frames for each level of the tree. I’m glad I could fix this, but sad that the code is not as nice as using a comprehension:

class Node:
    def value(self):
        total = 0
        for c in self.children:
            total += c.value()
        return total

Oh well.
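Here is a self-contained sketch that reproduces the difference, assuming CPython’s default recursion limit of 1000. It builds a 600-level tree with a loop (so building costs no stack), then sums it both ways. The names GenexprNode, LoopNode, and build_chain are made up for this illustration:

```python
class Leaf:
    def __init__(self, val=0):
        self.val = val

    def value(self):
        return self.val


class GenexprNode:
    """The original version: sum() over a generator expression."""

    def __init__(self, children=()):
        self.children = list(children)

    def value(self):
        # Two frames per level: value() plus the hidden <genexpr>.
        return sum(c.value() for c in self.children)


class LoopNode(GenexprNode):
    """The rewritten version (reuses __init__): one frame per level."""

    def value(self):
        total = 0
        for c in self.children:
            total += c.value()
        return total


def build_chain(node_class, depth):
    """Build a tree `depth` levels deep without recursion.

    Each node has two children: the previous subtree and a Leaf(1).
    """
    tree = Leaf(1)
    for _ in range(depth):
        tree = node_class([tree, Leaf(1)])
    return tree


DEPTH = 600
print(build_chain(LoopNode, DEPTH).value())  # 601

try:
    build_chain(GenexprNode, DEPTH).value()
except RecursionError:
    print("RecursionError")  # about 1200 frames needed, limit is 1000
```

Exact frame accounting can vary a little between Python versions, but the two-frames-per-level cost of the generator expression is what pushes the second call over the limit.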

Update: Jonathan Slenders suggested using a recursive generator to flatten the sequence of nodes, then summing the flat sequence:

class Leaf:
    def values(self):
        yield self.val

class Node:
    def values(self):
        for c in self.children:
            yield from c.values()

    def value(self):
        return sum(self.values())

This is clever, and solves the problem. My real code had a mixture of two different nodes, one using sum() and the other using max(), so it wouldn’t have worked for me. But it’s nice for when it does.
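Jonathan’s approach, made self-contained so it can run on a 600-level tree like the one above (the constructors and the tree-building loop are additions for illustration). Each level costs one generator frame rather than two, so the recursion stays under the default limit:

```python
class Leaf:
    def __init__(self, val=0):
        self.val = val

    def values(self):
        yield self.val


class Node:
    def __init__(self, children=()):
        self.children = list(children)

    def values(self):
        # Flatten the subtree into a stream of leaf values.
        for c in self.children:
            yield from c.values()

    def value(self):
        # sum() runs once, at the top; the recursion is all in values().
        return sum(self.values())


tree = Leaf(1)
for _ in range(600):
    tree = Node([tree, Leaf(1)])
print(tree.value())  # 601
```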

Advent of code presentation

Wednesday 19 December 2018

At Boston Python last night, I did a presentation about solutions to a particular Advent of Code puzzle.

If you haven’t seen Advent of Code, give it a look. A new puzzle each day in December until Christmas. This is the fourth year running, and you can go back and look at the past years (and days).

My presentation landing page has links to the slides and the code.

The presentation took a particular Advent of Code puzzle (December 14, 2016) and walked through a few different solutions, with a small detour into unit testing.

The code shows a few different ways to deal with the problem:

During the talk, an audience member suggested that itertools.tee could be useful, which I hadn’t considered. So I tried that out also, though it wasn’t as nice as I had hoped, and it may be holding on to too much state.

Sorry I didn’t write out the text of the talk itself...

Quick hack CSV review tool

Tuesday 4 December 2018

Let’s say you are running a conference, and let’s say your Call for Proposals is open, and is saving people’s talk ideas into a spreadsheet.

I am in this situation. Reviewing those proposals is a pain, because there are large paragraphs of text, and spreadsheets are a terrible way to read them. I did the typical software engineer thing: I spent an hour writing a tool to make reading them easier.

The result is a terminal program that reads a CSV file (the exported proposals). It displays a row at a time on the screen, wrapping text as needed. It has commands for moving around the rows. It collects comments into a second CSV file. That’s it.
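The core pieces of a tool like this could look something like the following sketch, using only the standard library’s csv and textwrap modules. The function names are hypothetical, and the interactive command loop for moving between rows is left out:

```python
import csv
import io
import textwrap


def read_proposals(f):
    """Return (header, rows) from an exported CSV file object."""
    reader = csv.reader(f)
    header = next(reader)
    return header, list(reader)


def format_row(header, row, width=78):
    """Render one proposal as labeled, wrapped paragraphs for the terminal."""
    lines = []
    for name, value in zip(header, row):
        lines.append(f"{name}:")
        lines.append(textwrap.fill(value, width=width,
                                   initial_indent="    ",
                                   subsequent_indent="    "))
    return "\n".join(lines)


def save_comment(out_f, row_number, comment):
    """Append one reviewer comment as a row in a second CSV file."""
    csv.writer(out_f).writerow([row_number, comment])


# In-memory files stand in for the real exported and comment files.
exported = io.StringIO('Title,Abstract\nMy talk,"' + "words " * 40 + '"\n')
header, rows = read_proposals(exported)
print(format_row(header, rows[0], width=40))

comments = io.StringIO()
save_comment(comments, 1, "Looks good")
```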

There are probably already better ways to do this. Everyone knows that to get answers from the internet, you don’t ask questions; instead, you present wrong answers. More people will correct you than will help you. So this tool is my wrong answer to how to review CFP proposals. Correct me!

Coverage.py 5.0a4: the sys.path to hell

Sunday 25 November 2018

Another alpha of 5.0 is available: 5.0a4. This fixes a few problems with the new SQLite-based storage. Please give it a try, especially to experiment with dynamic contexts.

The challenge with this release was something that started as a seemingly simple fix. Coverage.py tries to emulate how Python runs programs, including how the first element of sys.path is set. A few people run coverage with sys.path fully configured, and coverage’s setting of sys.path[0] was breaking their stuff.
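For reference, this is roughly what CPython itself puts in sys.path[0], following the sys.path documentation. The function below is a hypothetical sketch for illustration, not coverage.py’s actual code, and it ignores special cases such as -P, PYTHONSAFEPATH, and isolated mode (-I):

```python
import os


def sys_path0(how, target=None, cwd=None):
    """Sketch of CPython's sys.path[0] for different ways of running code.

    how: "script"  for `python path/to/script.py` (target is the script path)
         "module"  for `python -m package.module`
         "command" for `python -c "..."`
    """
    if how == "script":
        # The script's directory, with symbolic links resolved.
        return os.path.dirname(os.path.realpath(target))
    if how == "module":
        # The current working directory.
        return cwd if cwd is not None else os.getcwd()
    if how == "command":
        # An empty string, which means "the current directory" at import time.
        return ""
    raise ValueError(f"unknown way of running: {how!r}")


print(repr(sys_path0("command")))  # ''
```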

The proposal was simple: delete the one line of code that set sys.path[0]. I tried that, and it seemed to work. Fixed!

Not so fast: the Windows builds failed. This started a multi-week adventure of debugging and refactoring. The Windows builds were failing not because of Windows itself, but because on Windows, I don’t use pytest-xdist, which parallelizes tests into worker processes. With xdist, the tests were all passing. Without xdist, a few sys.path-related tests were failing.

It turns out that xdist manipulates sys.path itself, which was masking the fact that I had removed an important step from coverage.py. The first thing to do was to adjust my test code so that even with xdist, my tests didn’t get xdist’s path changes.

Then I had to re-think how to adjust sys.path. That required refactoring how I ran the user’s Python code, so that I could apply the path changes a little earlier than I used to. That made me look at how I was testing that layer of code with mocks, and I changed it from explicit dependency injection to implicit mock patching.
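The two testing styles look roughly like this. It’s a minimal sketch with made-up names (read_source, run_file_di, FileRunner), not coverage.py’s code: with dependency injection the collaborator is an explicit parameter, while with mock patching the code under test has no extra parameter and the test swaps the collaborator out using unittest.mock.patch.object:

```python
from unittest import mock


def read_source(path):
    """Read a Python file's text (the real collaborator)."""
    with open(path) as f:
        return f.read()


def run_file_di(path, reader=read_source):
    """Dependency injection: the reader is an explicit parameter."""
    namespace = {"__name__": "__main__"}
    exec(compile(reader(path), path, "exec"), namespace)
    return namespace


class FileRunner:
    """Implicit style: no extra parameter; tests patch read_source instead."""

    def read_source(self, path):
        with open(path) as f:
            return f.read()

    def run(self, path):
        namespace = {"__name__": "__main__"}
        exec(compile(self.read_source(path), path, "exec"), namespace)
        return namespace


# Testing the injected version: pass a fake reader in.
ns = run_file_di("fake.py", reader=lambda path: "answer = 6 * 7")
print(ns["answer"])  # 42

# Testing the implicit version: patch the method for the duration of the test.
with mock.patch.object(FileRunner, "read_source", return_value="answer = 6 * 7"):
    ns = FileRunner().run("fake.py")
print(ns["answer"])  # 42
```

The injected version keeps the seam visible in the signature; the patched version keeps production signatures clean at the cost of the test knowing the internal method name.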

A few more little fixes were needed here and there along the way. All told, the “one line” fix ended up being 14 files changed, 587 insertions, 427 deletions.

Careful with negative assertions

Sunday 4 November 2018

A cautionary tale about testing that things are unequal...

We had a test that was kind of like this:

def test_thing(self):
    data = "I am the data"
    self.assertNotEqual(
        modify_another_way(change_the_thing(data)),
        data
    )

But someone refactored the test oh-so-slightly, like this:

def test_thing(self):
    data = "I am the data"
    modified = modify_another_way(change_the_thing(data)),
    self.assertNotEqual(
        modified,
        data
    )

Now the test isn’t testing what it should be testing, and will pass even if change_the_thing and modify_another_way both return their argument unchanged. (I’ll explain why below.)

Negative tests (asserting that two things are unequal) are really tricky, because there are infinitely many values unequal to yours. Your assertion could pass because you accidentally have a different one of those unequal values than you thought.

Better would be to know what unequal value you are producing, and test that you have produced that value, with an equality assertion. Then if something unexpectedly shifts out from under you, you will find out.

Why the test was broken: the refactorer left the trailing comma on the “modified =” line, so “modified” is a 1-element tuple. The comparison is now between a tuple and a string, which are always unequal, even if the first element of the tuple is the same as the string.
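Here is the trap in runnable form. The bodies of change_the_thing and modify_another_way are hypothetical stand-ins (upper-casing and reversing), and do_nothing plays the part of a broken implementation:

```python
def change_the_thing(data):
    """Hypothetical stand-in: upper-cases the text."""
    return data.upper()


def modify_another_way(data):
    """Hypothetical stand-in: reverses the text."""
    return data[::-1]


def do_nothing(data):
    """A broken implementation that returns its argument unchanged."""
    return data


data = "I am the data"

# The refactored negative assertion: the trailing comma makes a 1-tuple,
# so it passes even when both steps do nothing.
modified = do_nothing(do_nothing(data)),
assert isinstance(modified, tuple)
assert modified != data  # passes, but proves nothing

# An equality assertion pins down the exact expected value...
assert modify_another_way(change_the_thing(data)) == "ATAD EHT MA I"
# ...and the broken version would fail it.
assert do_nothing(do_nothing(data)) != "ATAD EHT MA I"
```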