« | » Main « | »

Fetching GitHub pull requests

Thursday 31 July 2014

When reviewing GitHub pull requests, I sometimes want to get the proposed code onto my own machine, for running it, or just reading it in my own editor. Here are a few ways to do it, with a digression into git/alias/shell weirdness tacked on the end.

If the pull request is from a branch in the same repo, you can just check out the branch by name:

$ git checkout joe/proposed-feature

But you might not remember the name of the branch, or it might be in a different fork. Better is to be able to request the code by the pull request number.

The first technique I found was to modify the repo's .git/config file so that when you fetch code from the remote, it automatically pulls the pull request branches also. On GitHub, pull requests are at refspecs like "refs/pull/1234" (no, I don't really know what refspecs are, but I look forward to the day when I do...) Bert Belder wrote up a description of how to tweak your repo to automatically pull down all the pull request branches. You add this line to the [remote "origin"] section of your .git/config:

fetch = +refs/pull/*/head:refs/remotes/origin/pr/*

Now when you "git fetch origin", you'll get all the pull request branches, and you can simply check out the one you want with "git checkout pr/1234".

But this means having to edit your repo's .git/config file before you can get the pull request code. If you have many repos, you're always going to be finding the ones that haven't been tweaked yet.

A technique I liked better is on Corey Frang's gist, provided by Rico Sta. Cruz: Global .gitconfig aliases for pull request management. Here, you update your single ~/.gitconfig file to define a new command that will pull down a pull request branch when you need it:

copr = "!f() { git fetch -fu ${2:-origin} refs/pull/$1/head:pr/$1 &&
                    git checkout pr/$1; }; f"

(That should all be on one line, but I wanted it to be readable here.) This gives us a new command, "git copr" (for CheckOut Pull Request) that gets branches from pull requests:

$ git copr 1234            # gets and switches to pr/1234 from origin
$ git copr 789 upstream    # gets and switches to pr/789 from upstream

This technique has the advantage that once you define the alias, it's available in any repo, and also, it both fetches the branch and switches you to it.

BTW: finding and collecting these kinds of shortcuts can be daunting, because if you don't understand every bit of them, then you're in cargo-cult territory. "This thing worked for someone else, and if I copy it here, then it will work for me!"

In a few of the aliases on these pages, I see that the commands end with "&& :". I asked about this in the #git IRC channel, and was told that it was pointless: "&&" joins two shell commands, and runs the second one if the first one succeeded, and ":" is a shell built-in that simply succeeds (it's the same as "true"). So what does "&& :" add to the command? Seemed like it was pointless; we were stumped.

Then I also asked why other aliases took the form they did. Our copr alias has this form:

"!f() { command1; command2; }; f"

The bang character escapes from git syntax to the shell. Then we define a shell function called f with two commands in it, then we call the function. Why define the function? Why not just define the alias to be the two commands?

More discussion and experimentation turned up the answer. The way git invokes the shell, the arguments to the alias are available as $1, $2, etc, but they are also appended to the command line. As an example, let's define three different git aliases, each of which uses two arguments:

    ee1 = "!echo 2 is $2 stop; echo 1 is $1 stop"
    ee2 = "!echo 2 is $2 stop; echo 1 is $1 stop && :"
    ee3 = "!f() { echo 2 is $2 stop; echo 1 is $1 stop; }; f"

When we try these, the first does a bad thing, but the second and third are good:

$ git ee1 one two
1 is one stop
2 is two stop one two
$ git ee2 one two
1 is one stop
2 is two stop
$ git ee3 one two
1 is one stop
2 is two stop

The second one works because the ":" command eats up the extra arguments. The third one works because the eventual command run is "f one two", so the values are passed to the function. So the "&& :" wasn't pointless afterall, it was needed to make the arguments work properly.

From previous cargo-cult expeditions, my ~/.gitconfig has other aliases using a different form:

    ee4 = !sh -c 'echo 1 is $1 stop && echo 2 is $2 stop'
    ee5 = !sh -c 'echo 1 is $1 stop && echo 2 is $2 stop' -

These do this:

$ git ee4 one two
1 is two stop
2 is stop
$ git ee5 one two
1 is one stop
2 is two stop

(No, I have no idea why ee4 does what it does.) So we have three odd forms that all are designed to let you access arguments positionally, but not get confused by them:

    cmd1 = "!command1 && command2 && :"
    cmd2 = "!f() { command1; command2; }; f"
    cmd3 = !sh -c 'command1 && command2' -

All of them work, I like the function-defining one best, it seems most programmery, and least shell-tricky. I'm sure there's something here I'm misunderstanding, or a subtlety I'm overlooking, but I've learned stuff today.

Creative looping

Wednesday 30 July 2014

One of the interesting things about helping beginning programmers is to see the way they think. After programming for so long, and using Python for so long, it's hard to remember how confusing it can all be. Beginners can reacquaint us with the difficulties.

Python has a handy way to iterate over all the elements of a sequence, such as a list:

for x in seq:

But if you've only learned a few basic things, or are coming from a language like C or Javascript, you might do it like this:

i = 0
while i < len(seq):
    x = seq[i]
    i += 1

(BTW, I did a talk at the PyCon before last all about iteration in Python, including these sorts of comparisons of techniques: Loop Like a Native.)

Once you learn about the range() builtin function, you know you can loop over the indexes of the sequence like this:

for i in range(len(seq)):
    x = seq[i]

These two styles of loop are commonly seen. But when I saw this on Stackoverflow, I did a double-take:

i = 0
while i in range(len(seq)):
    x = seq[i]
    i += 1

This is truly creative! It's an amalgam of the two beginner loops we've already seen, and at first glance, looks like a syntax error.

In fact, this works in both Python 2 and Python 3. In Python 2, range() produces a list, and lists support the "in" operator for checking element membership. In Python 3, range() produces a range object which also supports "in".

So each time around the loop, a new range is constructed, and it's examined for the value of i. It works, although it's baroque and performs poorly in Python 2, being O(n2) instead of O(n).

People are creative! Just when I thought there's no other ways to loop over a list, a new technique arrives!

Tracking pull request ages with d3

Sunday 20 July 2014

At edX, I help with the Open edX community, which includes being a traffic cop with the flow of pull requests. We have 15 or so different repos that make up the entire platform, so it's tricky to get a picture of what's happening where.

So I made a chart:

Pull requests, charted by age.

The various teams internal to edX are responsible for reviewing pull requests in their areas of expertise, so this chart is organized by teams, with most-loaded at the top. The colors indicate the time since the pull request was opened. The bars are clickable, showing details of the pull requests in each bunch.

This was a fun project because of the new stuff I got to play with along the way. The pull request data is gathered by a Python program running on Heroku, using the GitHub API of course. The summary of the appropriate pull requests are stored in a JSON file. A GitHub webhook pings Heroku when a pull request changes, and the Python updates the JSON.

Then I used d3.js in the HTML page to retrieve the JSON, slice and dice it, and build an SVG chart. The clickable bars open to show HTML tables embedded with a foreignObject. This was complicated to get right, but drawing the tables with SVG would be painful, and drawing the bars with HTML would be painful. This let me use the best tool for each job.

D3.js is an impressive piece of work, but took some getting used to. Mike Bostock's writings helped explain what was going on. The key insight: d3 is not a charting library. It's a way to use data to create pages, of turning data into DOM nodes.

So far, the chart seems to have helped edX stay aware of how pull requests are moving. It hasn't made everything speedy, but at least we know where things are stalled, and it has encouraged teams to try to avoid being at the top. I'd like to add more to it, for example, other ways of sorting and grouping, and more information about the pull requests themselves.

The code is part of our repo-tools if you are interested.

« | » Main « | »