This week at work we ran the first Open edX Conference, bringing together people using Open edX, the open source platform that powers edX. It was an exciting, exhilarating, exhausting time.

It was our first time organizing a conference, and we did it on short notice, about four months. Where we didn't know what to do, we mostly made it be like PyCon: 30-minute talks, 10 minutes between talks, a few opportunities for lightning talks, etc.

Judging from the #openedxcon tweet stream, and from talking to people afterward, people seemed to really like it.

I gave part of the edX keynote, and as usually happens when I give a talk, there are things I know I am going to say, and things that seem to just pop out of my mouth once I get going. I was showing two examples of long-tail Open edX sites, and making the point that edX would never have put these particular courses online itself. I said,

It's not open source until stuff starts happening beyond our reach.

But it got tweeted as:

"It isn't open source until stuff starts happening that is beyond our control." - Ned Batchelder @OpenEdX #openedxcon

How meta: I say something, then the community turns it into something else, beyond my control! This was widely re-quoted, and was repeated by our CEO at another edX event later that week.

There's a difference between "beyond our reach" and "beyond our control." Not a huge difference, but I was talking more about reach at the time. But maybe that's a sign that things really are working, when it is beyond your control, and it's still good. Just like I "said."

And Open edX is going well: there are about 60 sites running 400 courses, all over the world. EdX has an outsized goal of educating a billion students by 2020, and Open edX will be a significant part of that. The 160 people at the conference were excited to help by running their own sites and courses. The conference was a success, even the parts beyond our control...

My son Max is a senior at NYU, studying film. He has to finish his senior project. It costs real money to make real films. Give him some money to help!

He's made a really compelling pitch video, so even if you don't think you're going to give money, at least go and watch it to see how cool my son is... :)

Max pitching his fundraiser

In 1964, Richard Feynman gave a series of seven lectures at Cornell called The Character of Physical Law. They were recorded by the BBC, and are now on YouTube. These are great.

These are not advanced lectures; they were intended for a general audience, and Feynman does a great job inhabiting the world of fundamental physics. He's clearly one of the top experts, but he explains in such a personal, approachable style that you are right alongside him as he explores this world looking for answers, following in the footsteps of Newton and Einstein.

If you've never heard Feynman, at least dip into the first one, if only to hear his deep, thick New York accent. He's also witty: he places the French Revolution in 1783, and says, "Accurate to three decimal places, not bad for a physicist!" It's disarmingly out of character for an intellectual, but Feynman is the real thing, discussing not just the basics of forces and particles, but the philosophical implications for scientists and all thinkers.

I converted the videos to pure audio and listened to them in my car, which meant I couldn't see what he was drawing on the blackboard, but it was enlightening nonetheless. Highly recommended: The Character of Physical Law.


I do a lot of side projects in the Python world. I write coverage.py. I give talks at PyCon, like the one about iterators, or the one with the Unicode sandwich. I hang out in the #python IRC channel and help people with their programs. I organize Boston Python.

I enjoy these things, and I don't get paid for them. But if you want to help me out, here's how you can: my son Max is in his last semester at NYU film school, which means he and his friends are making significant short films. These films need funding. If you've liked something I've done for you in the Python world, how about tossing some money over to a film?

Max will be doing a film of his own this semester, but his Kickstarter isn't live yet. In the meantime, he's the cinematographer on his friend Jacob's film Go To Hell. So give a little money to Jacob, and in a month or so I'll hit you up again to give a lot of money to Max. :)

The first alpha of the next major version of coverage.py is available: coverage.py v4.0a1.

The big new feature is support for the gevent, greenlet, and eventlet concurrency libraries. Previously, these libraries' behind-the-scenes stack swapping would confuse coverage.py. Now coverage adapts to give accurate coverage measurement. To enable it, use the "concurrency" setting to specify which library you are using.
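
For example, if you're using gevent, something like this in your .coveragerc does it:

[run]
concurrency = gevent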

Huge thanks to Peter Portante for getting the concurrency support started, and Joe Jevnik for the final push.

Also new is that coverage.py will read its configuration from setup.cfg if there is no .coveragerc file. This lets you keep more of your project configuration in one place.

Lastly, the textual summary report now shows missing branches if you are using branch coverage.

One warning: I'm moving around lots of internals. People have a tendency to use whatever they need to get their plugin or tool to work, so some of those third-party packages may now be broken. Let me know what you find.

Full details of other changes are in the CHANGES.txt file.


I thought today was going to be a good day. I was going to release the first alpha version of coverage.py 4.0. I finally finished the support for gevent and other concurrency libraries like it, and I wanted to get the code out for people to try it.

So I made the kits and pushed them to PyPI. I used to not do that, because people would get the betas by accident. But pip now understands about pre-releases and real releases, and won't install an alpha version by default. Only if you explicitly use --pre will you get an alpha.
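
That is, roughly:

$ pip install coverage          # installs the latest real release
$ pip install --pre coverage    # allows pre-releases, so you get 4.0a1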

About 10 minutes after I pushed the kits, someone I was chatting with on IRC said, "Did you just release a new version of coverage?" Turns out his Travis build was failing.

He was using coveralls to report his coverage statistics, and it was failing. Turns out coveralls uses internals from coverage.py to do its work, and I've made big refactorings to the internals, so their code was broken. But how did the alpha get installed in the first place?

He was using tox, and it turns out that when tox installs dependencies, it defaults to using the --pre switch! Why? I don't know.

OK, I figured I would just hide the new version on PyPI. That way, if people wanted to try it, they could use "pip install coverage==4.0a1", and no one else would be bothered with it. Nope: pip will find the newer version even if it is hidden on PyPI. Why? I don't know.

In my opinion:

  • Coveralls shouldn't have used coverage.py internals.
  • Tox shouldn't use the --pre switch by default.
  • Pip shouldn't install hidden versions when there is no version information specified.

So now the kit is removed entirely from PyPI while I figure out a new approach. Some possibilities, none of them great:

  1. Distribute the kit the way I used to, with a download on my site. This sucks because I don't know if there's a way to do this so that pip will find it, and I don't know if it can handle pre-built binary kits like that.
  2. Do whatever I need to do to coverage.py so that coveralls will continue to work. This sucks because I don't know how much I will have to add back, and I don't want to establish a precedent, and it doesn't solve the problem that people really don't expect to be using alphas of their testing tools on Travis.
  3. Make a new package on PyPI: coverage-prerelease, and instruct people to install from there. This sucks because tools like coveralls won't refer to it, so either you can't ever use it with coveralls, or if you install it alongside, then you have two versions of coverage fighting with each other? I think?
  4. Make a pull request against coveralls to fix their use of the now-missing coverage.py internals. This sucks (but not much) because I don't want to have to understand their code, and I don't have a simple way to run it, and I wish they had tried to stick to supported methods in the first place.
  5. Leave it broken, and let people fix it by overriding their tox.ini settings to not use --pre, or wait until people complain to coveralls and they fix their code. This sucks because there will be lots of people with broken builds.

Software is hard, yo.


A friend recommended a technical talk today: How to Design a Good API and Why it Matters by Joshua Bloch. Looks good! It's also an hour long...

For a variety of reasons, it's hard to watch an hour-long video. I'd prefer to read the same content. But it isn't available textually. For my own talks, I produce full text as part of the preparation (for example, the Unicode sandwich talk).

I've even transcribed other people's PyCon talks: Stop Mocking, Start Testing, and Speedily Practical Large-Scale Tests. It was a good way to ensure I actually watched them!

People put slide decks up on SlideShare, but decks vary wildly in how well they contain the content. Some simply provide a backdrop, which is entertaining during a talk, but useless afterward.

Is there some way we can pool efforts to get more talks transcribed or summarized? Surely others would like to see it done? And there must be people eager to contribute in some way who could spend the time? Does something like this already exist?

I know the full talk, with the real speaker really speaking to me, is the best way to get their message. For example, Richard Feynman's series The Character of Physical Law just wouldn't be the same without his accent and delivery. But if the choice is reading a lengthy summary or not getting the message at all, I'll definitely take the summary.

Or maybe I'm an old codger stuck in text-world while all the younguns just want video?

Ben spent the summer at a RISD program for high-schoolers (obligatory celebration cake was here). He majored in comics, and this is his final project. It's five pages long; click to see the entire comic as one long image, then possibly click again to enlarge it so you can read it:

Ben's Avis comic

I can't tell you how proud this comic makes me. Ben has always been a naturally talented artist, but this is a quantum leap up in technique and execution for him. Also, it's a really sweet story.

I've been writing about Ben's progress as an artist here for a long time:

Now at 16, Ben continues to amaze me with what he can do. In the robot movie post, I said, "I've always tried to encourage their creative sides, and they haven't let me down." Still true.

When reviewing GitHub pull requests, I sometimes want to get the proposed code onto my own machine, for running it, or just reading it in my own editor. Here are a few ways to do it, with a digression into git/alias/shell weirdness tacked on the end.

If the pull request is from a branch in the same repo, you can just check out the branch by name:

$ git checkout joe/proposed-feature

But you might not remember the name of the branch, or it might be in a different fork. Better is to be able to request the code by the pull request number.

The first technique I found was to modify the repo's .git/config file so that when you fetch code from the remote, it automatically pulls the pull request branches also. On GitHub, pull requests are at refspecs like "refs/pull/1234" (no, I don't really know what refspecs are, but I look forward to the day when I do...). Bert Belder wrote up a description of how to tweak your repo to automatically pull down all the pull request branches. You add this line to the [remote "origin"] section of your .git/config:

fetch = +refs/pull/*/head:refs/remotes/origin/pr/*

Now when you "git fetch origin", you'll get all the pull request branches, and you can simply check out the one you want with "git checkout pr/1234".

But this means having to edit your repo's .git/config file before you can get the pull request code. If you have many repos, you're always going to be finding the ones that haven't been tweaked yet.

A technique I liked better is on Corey Frang's gist, provided by Rico Sta. Cruz: Global .gitconfig aliases for pull request management. Here, you update your single ~/.gitconfig file to define a new command that will pull down a pull request branch when you need it:

[alias]
copr = "!f() { git fetch -fu ${2:-origin} refs/pull/$1/head:pr/$1 &&
                    git checkout pr/$1; }; f"

(That should all be on one line, but I wanted it to be readable here.) This gives us a new command, "git copr" (for CheckOut Pull Request) that gets branches from pull requests:

$ git copr 1234            # gets and switches to pr/1234 from origin
$ git copr 789 upstream    # gets and switches to pr/789 from upstream

This technique has the advantage that once you define the alias, it's available in any repo, and also, it both fetches the branch and switches you to it.

BTW: finding and collecting these kinds of shortcuts can be daunting, because if you don't understand every bit of them, then you're in cargo-cult territory. "This thing worked for someone else, and if I copy it here, then it will work for me!"

In a few of the aliases on these pages, I see that the commands end with "&& :". I asked about this in the #git IRC channel, and was told that it was pointless: "&&" joins two shell commands, and runs the second one if the first one succeeded, and ":" is a shell built-in that simply succeeds (it's the same as "true"). So what does "&& :" add to the command? Seemed like it was pointless; we were stumped.

Then I also asked why other aliases took the form they did. Our copr alias has this form:

"!f() { command1; command2; }; f"

The bang character escapes from git syntax to the shell. Then we define a shell function called f with two commands in it, then we call the function. Why define the function? Why not just define the alias to be the two commands?

More discussion and experimentation turned up the answer. The way git invokes the shell, the arguments to the alias are available as $1, $2, etc, but they are also appended to the command line. As an example, let's define three different git aliases, each of which uses two arguments:

[alias]
    ee1 = "!echo 1 is $1 stop; echo 2 is $2 stop"
    ee2 = "!echo 1 is $1 stop; echo 2 is $2 stop && :"
    ee3 = "!f() { echo 1 is $1 stop; echo 2 is $2 stop; }; f"

When we try these, the first does a bad thing, but the second and third are good:

$ git ee1 one two
1 is one stop
2 is two stop one two
$ git ee2 one two
1 is one stop
2 is two stop
$ git ee3 one two
1 is one stop
2 is two stop

The second one works because the ":" command eats up the extra arguments appended at the end. The third one works because the eventual command run is "f one two", so the values are passed to the function as its arguments. So the "&& :" wasn't pointless after all; it was needed to make the arguments work properly.

From previous cargo-cult expeditions, my ~/.gitconfig has other aliases using a different form:

[alias]
    ee4 = !sh -c 'echo 1 is $1 stop && echo 2 is $2 stop'
    ee5 = !sh -c 'echo 1 is $1 stop && echo 2 is $2 stop' -

These do this:

$ git ee4 one two
1 is two stop
2 is stop
$ git ee5 one two
1 is one stop
2 is two stop

(No, I have no idea why ee4 does what it does.) So we have three odd forms, all designed to let you access arguments positionally without being confused by them:

[alias]
    cmd1 = "!command1 && command2 && :"
    cmd2 = "!f() { command1; command2; }; f"
    cmd3 = !sh -c 'command1 && command2' -

All of them work. I like the function-defining one best: it seems the most programmery, and the least shell-tricky. I'm sure there's something here I'm misunderstanding, or a subtlety I'm overlooking, but I've learned stuff today.


One of the interesting things about helping beginning programmers is to see the way they think. After programming for so long, and using Python for so long, it's hard to remember how confusing it can all be. Beginners can reacquaint us with the difficulties.

Python has a handy way to iterate over all the elements of a sequence, such as a list:

for x in seq:
    doit(x)

But if you've only learned a few basic things, or are coming from a language like C or JavaScript, you might do it like this:

i = 0
while i < len(seq):
    x = seq[i]
    doit(x)
    i += 1

(BTW, I did a talk at the PyCon before last all about iteration in Python, including these sorts of comparisons of techniques: Loop Like a Native.)

Once you learn about the range() builtin function, you know you can loop over the indexes of the sequence like this:

for i in range(len(seq)):
    x = seq[i]
    doit(x)

These two styles of loop are commonly seen. But when I saw this on Stack Overflow, I did a double-take:

i = 0
while i in range(len(seq)):
    x = seq[i]
    doit(x)
    i += 1

This is truly creative! It's an amalgam of the two beginner loops we've already seen, and at first glance, looks like a syntax error.

In fact, this works in both Python 2 and Python 3. In Python 2, range() produces a list, and lists support the "in" operator for checking element membership. In Python 3, range() produces a range object which also supports "in".
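
A quick check shows the membership test behaving the same way in both versions:

print(3 in range(5))    # True: 3 is one of 0, 1, 2, 3, 4
print(7 in range(5))    # False: 7 is out of range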

So each time around the loop, a new range is constructed, and it's examined for the value of i. It works, although it's baroque, and in Python 2 it performs poorly: each pass builds a fresh list of n elements just to check one membership, making the loop O(n²) instead of O(n).

People are creative! Just when I thought there's no other ways to loop over a list, a new technique arrives!


At edX, I help with the Open edX community, which includes being a traffic cop with the flow of pull requests. We have 15 or so different repos that make up the entire platform, so it's tricky to get a picture of what's happening where.

So I made a chart:

Pull requests, charted by age.

The various teams internal to edX are responsible for reviewing pull requests in their areas of expertise, so this chart is organized by teams, with most-loaded at the top. The colors indicate the time since the pull request was opened. The bars are clickable, showing details of the pull requests in each bunch.

This was a fun project because of the new stuff I got to play with along the way. The pull request data is gathered by a Python program running on Heroku, using the GitHub API of course. The summary of the appropriate pull requests is stored in a JSON file. A GitHub webhook pings Heroku when a pull request changes, and the Python updates the JSON.
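
Stripped way down, the idea looks something like this (just a sketch using Flask and requests, not the real repo-tools code):

import json

import requests
from flask import Flask, request

app = Flask(__name__)

def fetch_open_pulls(owner, repo):
    # Summarize the open pull requests for one repo via the GitHub API.
    url = "https://api.github.com/repos/{}/{}/pulls".format(owner, repo)
    pulls = requests.get(url, params={"state": "open"}).json()
    return [
        {"number": pr["number"], "title": pr["title"], "created_at": pr["created_at"]}
        for pr in pulls
    ]

@app.route("/webhook", methods=["POST"])
def webhook():
    # GitHub pings this when a pull request changes; refresh the JSON
    # file the chart reads.
    payload = request.get_json()
    repo = payload["repository"]
    summary = fetch_open_pulls(repo["owner"]["login"], repo["name"])
    with open("pulls.json", "w") as f:
        json.dump(summary, f)
    return "ok"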

Then I used d3.js in the HTML page to retrieve the JSON, slice and dice it, and build an SVG chart. The clickable bars open to show HTML tables embedded with a foreignObject. This was complicated to get right, but drawing the tables with SVG would be painful, and drawing the bars with HTML would be painful. This let me use the best tool for each job.

D3.js is an impressive piece of work, but took some getting used to. Mike Bostock's writings helped explain what was going on. The key insight: d3 is not a charting library. It's a way to use data to create pages, turning data into DOM nodes.

So far, the chart seems to have helped edX stay aware of how pull requests are moving. It hasn't made everything speedy, but at least we know where things are stalled, and it has encouraged teams to try to avoid being at the top. I'd like to add more to it, for example, other ways of sorting and grouping, and more information about the pull requests themselves.

The code is part of our repo-tools if you are interested.


That is all.

