Ned Batchelder's blog
A friend recommended a technical talk today: How to Design a Good API and Why it Matters, by Joshua Bloch. Looks good! It's also an hour long...
For a variety of reasons, it's hard to watch an hour-long video. I'd prefer to read the same content. But it isn't available textually. For my own talks, I produce full text as part of the preparation (for example, the Unicode sandwich talk).
People put slide decks up on SlideShare, but decks vary wildly in how well they contain the content. Some simply provide a backdrop, which is entertaining during a talk, but useless afterward.
Is there some way we can pool efforts to get more talks transcribed or summarized? Surely others would like to see it done? And there must be people eager to contribute in some way who could spend the time? Does something like this already exist?
I know the full talk, with the real speaker really speaking to me, is the best way to get their message. For example, Richard Feynman's series The Character of Physical Law just wouldn't be the same without his accent and delivery. But if the choice is reading a lengthy summary or not getting the message at all, I'll definitely take the summary.
Or maybe I'm an old codger stuck in text-world while all the younguns just want video?
Ben spent the summer at a RISD program for high-schoolers (obligatory celebration cake was here). He majored in comics, and this is his final project. It's five pages long; click to see the entire comic as one long image, then possibly click again to enlarge it so you can read it:
I can't tell you how proud this comic makes me. Ben has always been a naturally talented artist, but this is a quantum leap up in technique and execution for him. Also, it's a really sweet story.
I've been writing about Ben's progress as an artist here for a long time:
Now at 16, Ben continues to amaze me with what he can do. In the robot movie post, I said, "I've always tried to encourage their creative sides, and they haven't let me down." Still true.
When reviewing GitHub pull requests, I sometimes want to get the proposed code onto my own machine, for running it, or just reading it in my own editor. Here are a few ways to do it, with a digression into git/alias/shell weirdness tacked on the end.
If the pull request is from a branch in the same repo, you can just check out the branch by name:
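For example (the branch name here is made up):

```
$ git checkout their-feature-branch
```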
But you might not remember the name of the branch, or it might be in a different fork. Better is to be able to request the code by the pull request number.
The first technique I found was to modify the repo's .git/config file so that when you fetch code from the remote, it automatically pulls the pull request branches also. On GitHub, pull requests are at refspecs like "refs/pull/1234" (no, I don't really know what refspecs are, but I look forward to the day when I do...) Bert Belder wrote up a description of how to tweak your repo to automatically pull down all the pull request branches. You add this line to the [remote "origin"] section of your .git/config:
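The added line is the second fetch entry below (the url is just a placeholder); it maps GitHub's pull request heads onto remote-tracking branches named pr/*:

```
[remote "origin"]
    url = git@github.com:someuser/somerepo.git
    fetch = +refs/heads/*:refs/remotes/origin/*
    fetch = +refs/pull/*/head:refs/remotes/origin/pr/*
```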
Now when you "git fetch origin", you'll get all the pull request branches, and you can simply check out the one you want with "git checkout pr/1234".
But this means having to edit your repo's .git/config file before you can get the pull request code. If you have many repos, you're always going to be finding the ones that haven't been tweaked yet.
A technique I liked better is on Corey Frang's gist, provided by Rico Sta. Cruz: Global .gitconfig aliases for pull request management. Here, you update your single ~/.gitconfig file to define a new command that will pull down a pull request branch when you need it:
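The alias looks something like this (spread over several lines here for readability; minor details may differ from the gist):

```
[alias]
    copr = "!f() {
        git fetch origin pull/$1/head:pr/$1 &&
        git checkout pr/$1;
    }; f"
```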
(That should all be on one line, but I wanted it to be readable here.) This gives us a new command, "git copr" (for CheckOut Pull Request) that gets branches from pull requests:
This technique has the advantage that once you define the alias, it's available in any repo, and also, it both fetches the branch and switches you to it.
BTW: finding and collecting these kinds of shortcuts can be daunting, because if you don't understand every bit of them, then you're in cargo-cult territory. "This thing worked for someone else, and if I copy it here, then it will work for me!"
In a few of the aliases on these pages, I see that the commands end with "&& :". I asked about this in the #git IRC channel, and was told that it was pointless: "&&" joins two shell commands, and runs the second one if the first one succeeded, and ":" is a shell built-in that simply succeeds (it's the same as "true"). So what does "&& :" add to the command? Seemed like it was pointless; we were stumped.
Then I also asked why other aliases took the form they did. Our copr alias has this form:
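Schematically, with the two git commands elided:

```
copr = "!f() { <fetch the branch> && <check it out>; }; f"
```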
The bang character escapes from git syntax to the shell. Then we define a shell function called f with two commands in it, then we call the function. Why define the function? Why not just define the alias to be the two commands?
More discussion and experimentation turned up the answer. The way git invokes the shell, the arguments to the alias are available as $1, $2, etc, but they are also appended to the command line. As an example, let's define three different git aliases, each of which uses two arguments:
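One plausible version of the experiment (the names and echo commands are just for illustration):

```
[alias]
    ee1 = "!echo $1 $2"
    ee2 = "!echo $1 $2 && :"
    ee3 = "!f() { echo $1 $2; }; f"
```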
When we try these, the first does a bad thing, but the second and third are good:
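Git runs a "!" alias roughly as sh -c '&lt;cmd&gt; "$@"', with the alias arguments both assigned to $1, $2, ... and appended to the line. So we can simulate the three aliases (hypothetically named ee1, ee2, ee3, echoing their two arguments) with plain sh; the underscore stands in for $0:

```shell
# Simulate "git ee1 one two": the echoed $1 $2 plus the appended args.
sh -c 'echo $1 $2 "$@"' _ one two
# prints: one two one two

# Simulate "git ee2 one two": the ":" swallows the appended args.
sh -c 'echo $1 $2 && : "$@"' _ one two
# prints: one two

# Simulate "git ee3 one two": the appended args become the function's args.
sh -c 'f() { echo $1 $2; }; f "$@"' _ one two
# prints: one two
```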
The second one works because the ":" command eats up the extra arguments. The third one works because the eventual command run is "f one two", so the values are passed to the function. So the "&& :" wasn't pointless after all: it was needed to make the arguments work properly.
From previous cargo-cult expeditions, my ~/.gitconfig has other aliases using a different form:
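Plausibly they looked like this (a reconstruction; ee5 is a hypothetical companion to ee4, showing the form with and without the trailing "-"):

```
[alias]
    ee4 = "!sh -c 'echo $1 $2'"
    ee5 = "!sh -c 'echo $1 $2' -"
```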
These do this:
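The sh -c forms can be simulated directly (the ee4/ee5 names are hypothetical). The trailing "-" is a placeholder for $0, which is why it matters:

```shell
# Without a placeholder for $0, the first argument is consumed as $0,
# so $1 is "two" and $2 is empty -- the ee4 surprise.
sh -c 'echo $1 $2' one two
# prints: two

# With "-" holding the $0 slot, the arguments land in $1 and $2.
sh -c 'echo $1 $2' - one two
# prints: one two
```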
(No, I have no idea why ee4 does what it does.) So we have three odd forms that all are designed to let you access arguments positionally, but not get confused by them:
All of them work. I like the function-defining one best: it seems the most programmery, and the least shell-tricky. I'm sure there's something here I'm misunderstanding, or a subtlety I'm overlooking, but I've learned stuff today.
One of the interesting things about helping beginning programmers is to see the way they think. After programming for so long, and using Python for so long, it's hard to remember how confusing it can all be. Beginners can reacquaint us with the difficulties.
Python has a handy way to iterate over all the elements of a sequence, such as a list:
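For example, with a hypothetical list:

```python
fruits = ["apple", "banana", "cherry"]

# The for statement gives us each element in turn: no indexes needed.
for fruit in fruits:
    print(fruit)
```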
(BTW, I did a talk at the PyCon before last all about iteration in Python, including these sorts of comparisons of techniques: Loop Like a Native.)
Once you learn about the range() builtin function, you know you can loop over the indexes of the sequence like this:
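The same hypothetical list, looped over by index:

```python
fruits = ["apple", "banana", "cherry"]

# range(len(fruits)) produces the indexes 0, 1, 2;
# each index is then used to fetch the element.
for i in range(len(fruits)):
    print(i, fruits[i])
```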
These two styles of loop are commonly seen. But when I saw this on Stack Overflow, I did a double-take:
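Reconstructed to match the description that follows, the snippet was a while loop testing membership in a range:

```python
fruits = ["apple", "banana", "cherry"]

# At first glance this looks like a syntax error, but it runs:
# each iteration builds a new range and checks whether i is in it.
i = 0
while i in range(len(fruits)):
    print(fruits[i])
    i += 1
```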
This is truly creative! It's an amalgam of the two beginner loops we've already seen, and at first glance, looks like a syntax error.
In fact, this works in both Python 2 and Python 3. In Python 2, range() produces a list, and lists support the "in" operator for checking element membership. In Python 3, range() produces a range object which also supports "in".
So each time around the loop, a new range is constructed, and it's examined for the value of i. It works, although it's baroque and performs poorly in Python 2, being O(n²) instead of O(n).
People are creative! Just when I thought there were no other ways to loop over a list, a new technique arrives!
At edX, I help with the Open edX community, which includes being a traffic cop with the flow of pull requests. We have 15 or so different repos that make up the entire platform, so it's tricky to get a picture of what's happening where.
So I made a chart:
The various teams internal to edX are responsible for reviewing pull requests in their areas of expertise, so this chart is organized by teams, with most-loaded at the top. The colors indicate the time since the pull request was opened. The bars are clickable, showing details of the pull requests in each bunch.
This was a fun project because of the new stuff I got to play with along the way. The pull request data is gathered by a Python program running on Heroku, using the GitHub API of course. The summaries of the appropriate pull requests are stored in a JSON file. A GitHub webhook pings Heroku when a pull request changes, and the Python program updates the JSON.
Then I used d3.js in the HTML page to retrieve the JSON, slice and dice it, and build an SVG chart. The clickable bars open to show HTML tables embedded with a foreignObject. This was complicated to get right, but drawing the tables with SVG would be painful, and drawing the bars with HTML would be painful. This let me use the best tool for each job.
D3.js is an impressive piece of work, but it took some getting used to. Mike Bostock's writings helped explain what was going on. The key insight: d3 is not a charting library. It's a way to use data to create pages, turning data into DOM nodes.
So far, the chart seems to have helped edX stay aware of how pull requests are moving. It hasn't made everything speedy, but at least we know where things are stalled, and it has encouraged teams to try to avoid being at the top. I'd like to add more to it, for example, other ways of sorting and grouping, and more information about the pull requests themselves.
The code is part of our repo-tools if you are interested.
As the maintainer of coverage.py, I've always found it intriguing that web applications have so much code in template files. Coverage.py measures Python execution, so the logic in the template files goes unmeasured.
Recently I started experimenting with measuring templates as well as pure Python code. Mako templates compile to Python files, which are then executed. Coverage.py can see the execution in the compiled Python files, so once we have a way to back-map the lines from the Mako output back to the Mako template, we have the start of a usable Mako coverage measurement.
This Mako experiment is on the tip of the coverage.py repo, and requires some code on the tip of Mako. The code isn't right yet, but it shows the idea. Eventually, this should be a plugin to coverage.py provided by Mako, but for now, we're just trying to prove out the concept.
If you want to try the Mako coverage (please do!), configure Mako to put your compiled .py files someplace convenient (like a mako/ directory in your project), then set this environment variable:
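If I remember the experimental code right, the variable names the path fragment of your compiled template files; treat the exact name and value here as an assumption and check the tip of the repo:

```
COVERAGE_MAKO_PATH=/mako/
```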
Jinja also compiles templates to Python files, but Django does not. Django is very popular, so I would like to support those templates also. Dmitry Trofimov wrote dtcov to measure Django template coverage. He does a tricky thing: in the trace function, determine if you are inside the Django template engine, and if so, walk the stack and look at the locals to grab line numbers.
As written dtcov looks too compute-intensive to run on full-sized projects, but I think the idea could work. I'm planning to experiment with it this week.
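The core trick can be sketched with a trace function. This is my own minimal sketch, not dtcov's code: a tracer that records line numbers only for frames whose filename matches a predicate (dtcov's real predicate checks for the Django template engine, and then walks the stack inspecting locals to recover template line numbers):

```python
import sys

def trace_lines(func, predicate):
    """Run func, recording (filename, lineno) for frames matching predicate."""
    hits = []

    def tracer(frame, event, arg):
        filename = frame.f_code.co_filename
        if event == "line" and predicate(filename):
            # dtcov would instead walk frame.f_back and inspect f_locals
            # to dig out the template line number.
            hits.append((filename, frame.f_lineno))
        return tracer

    sys.settrace(tracer)
    try:
        func()
    finally:
        sys.settrace(None)
    return hits
```

Something like trace_lines(render, lambda f: "django/template" in f) would then keep only the lines executed inside the template engine.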
I had coffee the other day with Nathan Kohn. He goes by the nickname en_zyme, and it's easy to see why. He relishes the role of bringing pairs of people together to see what kind of new reaction can result.
This time, it was to meet Jonathan Henner, a doctoral student of his at Boston University. The topic was how to include deaf people in the Python community.
The discussion was wide-ranging, and I'm sure I've forgotten interesting tangents, but I got this jumble of notes:
Accommodating the deaf at Python community gatherings is a challenge because it means getting either an ASL interpreter, or a CART provider to close-caption presentations live. This presents a few hurdles:
Programming is a good career for the deaf, since it is heavily textual, but they may have a hard time accessing the curriculum for it. Jonathan is exploring the possibility of creating classes in ASL, since that is many deaf people's first language. A common misconception is that ASL is simply English spoken with the hands, but it is not.
We talked a bit about the overlap between the deaf and autistic worlds. The Walden school near Boston specializes in deaf students with other mental or emotional impairments, including autism. Jonathan made a claim that made me think: that deafness and autism are the two disabilities that have their own sub-culture. I don't know if that is true, I'm sure people with other disabilities will disagree, but it's interesting to discuss.
There were a lot of avenues to explore, I'm not sure what will come of it all. It would be great to broaden Python's reach into another community of people who haven't had full access to tech.
Has anyone had any experience doing this? Thoughts?
I continue to notice an unsettling trend: the rise of the GitHub monoculture. More and more, people seem to believe that GitHub is the center of the programming universe.
Don't get me wrong, I love GitHub. It succeeded at capturing and promoting the social aspect of development better than any other site. And git, despite its flaws, is a great version control system.
And just to be clear, I am not talking about the recent turmoil about GitHub's internal culture. That's a problem, but not the one I'm talking about.
Someone said to me, "I couldn't find coverage.py on GitHub." Right, because it's hosted on Bitbucket. When a developer thinks, "I want to find the source for package XYZ," why do they go to the GitHub search bar instead of Google? Do people believe so strongly that GitHub is the only place for code that it has supplanted Google as the way to find things?
(Yes, Google has a monopoly on search. But searching with Google today does not lock me in to continuing to search with Google tomorrow. When a new search engine appears, I can switch with no downside whatsoever.)
Another example: I'm contributing a chapter to the 500 lines book (irony: the link is to GitHub). Here in the README, to summarize authors, we are asked to provide a GitHub username and a Twitter handle. I suggested that a homepage URL is a more powerful and flexible way for authors to invite the curious to learn more about them. This suggestion was readily adopted (in a pending pull request), but the fact that the first thing to mind was GitHub+Twitter is another sign of people's mindset that these sites are the only places, not just some places.
Don't get me started on the irony of shops whose workflow is interrupted when GitHub is down. Git is a distributed version control system, right?
Some people go so far as to say, as Brandon Weiss has, GitHub is your resume. I would hope they do not mean it literally, but instead as a shorthand for, "your public code will be more useful to potential employers than your list of previous jobs." But reading Brandon's post, he means it literally, going so far as to recommend that you carefully garden your public repos to be sure that only the best work is visible. So much for collaboration.
There is power in everyone using the same tools. GitHub succeeds because it makes it simple for code to flow from developer to developer, and for people to find each other and work together. Still, other tools do some things better. Gerrit is a better code review workflow. Mercurial is easier for people to get started with.
GitHub has done a good job providing an API that makes it possible for other tools to integrate with them. But if Travis only works with GitHub, that just reinforces the monoculture. Eventually someone will have a better idea than GitHub, or even git. But the more everyone believes that GitHub is the only game in town, the higher the barrier will be to adopting the next great idea.
I love git and GitHub, but they should be a choice, not the only choice.
PyCon 2014 is over, and as usual, I loved every minute. There are a huge number of people that I know there, and about 5 different sub-communities that I feel an irrationally strong attachment to.
My head is still spinning from the high-energy four days I've had; I'm sure I'm leaving out an important high point. I just love every minute!
On the downside, I did not see as much of Montreal as I would have liked, but we'll be back for PyCon 2015, so I have a second chance!
My youngest son Ben turns 16 in a few days, and a few days ago was accepted into the RISD summer program for high-schoolers! So today, his cake:
He's really excited about RISD. It will be a big transition for him, six weeks away from home, doing serious art instruction. I'm really proud of him, and eager to see what changes it will bring. I'm also nervous about that...
The cake was fun, it's not often you get to try your hand at a calligraphic challenge in frosting!
Happy Pi day! Celebrate with delicious circular pastries!
In an ongoing (and wildly premature!) thread on the Python-Dev mailing list, people are debating the possible shapes of Python 4, and Barry pointed out that Guido doesn't like two-digit minor versions.
I can understand that, version numbers look like floats, and 3.1 is the same as 3.10. But in this case I think we should forge on to 3.10, 3.11, etc. Partly to avoid the inevitable panic that a switch to 4.x will entail, no matter what the actual semantics. But mostly so that we can get to version 3.14, which will of course be known as PiPy, joining in merry celebration with friends PyPI and PyPy!