Ned Batchelder : Blog
One of the challenging things about programming is being able to really see code the way the computer is going to see it. Sometimes the human-only signals are so strong, we can't ignore them. This is one of the reasons I like indentation-significant languages like Python: people attend to the indentation whether the computer does or not, so you might as well have the people and computers looking at the same thing.
I was reminded of this problem yesterday while trying to debug a sample application I was toying with. It has a config file with some strings and dicts in it. It reads in part like this:
When I saw this file, I thought, "That's a weird way to comment things," but didn't worry more about it. Then later when the response was failing, I debugged into it, and realized what was wrong with this file. Before reading on, do you see what it is?
• • •
• • •
• • •
Python concatenates adjacent string literals. This is handy for making long strings without having to worry about backslashes. In real code, this feature is little-used, and it happens in a surprising place here. The "docstring" for the dictionary is implicitly concatenated to the first key. PYLTI_URL_FIX has a key that's 163 characters long: " Remap URL to ... URL.\nhttps://localhost:8000/", including three newlines.
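Here is a minimal sketch of the effect outside the original app. It uses a made-up config dict shaped like the one described; the name `URL_FIX`, the comment text, and the URLs are hypothetical stand-ins, not the sample app's real values. The triple-quoted "comment" and the first key are adjacent string literals inside the braces, so Python joins them into one key:

```python
# Hypothetical config shaped like the problem file: a triple-quoted
# "comment" sits directly in front of the first key, inside the braces.
URL_FIX = {
"""
Remap one URL to another.
"""
    "https://localhost:8000/": "https://localhost/",
}

# The two adjacent literals were concatenated into a single key.
key = next(iter(URL_FIX))
print(repr(key))                            # '\nRemap one URL to another.\nhttps://localhost:8000/'
print("https://localhost:8000/" in URL_FIX)  # False: a lookup by the URL misses
```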
But SECRET_KEY isn't affected. Why? Because the SECRET_KEY assignment line is a complete statement all by itself, so it doesn't continue onto the next line. Its "docstring" is a statement all by itself. The PYLTI_URL_FIX docstring is inside the braces of the dictionary, so it's all part of one 13-line statement. All the tokens are considered together, and the adjacent strings are concatenated.
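One way to check how Python groups these tokens is to parse both shapes with the standard `ast` module. In this sketch the two source strings are simplified, hypothetical stand-ins for the config file: the assignment followed by a string on its own line parses as two statements, while the string inside the braces ends up inside the dict's key:

```python
import ast

# A complete assignment statement, then a "docstring" on its own line.
standalone = 'SECRET_KEY = "abc"\n"""Explain the key."""\n'

# The same kind of string, but inside the braces of a dict literal.
inside_braces = '''D = {
"""Explain the dict."""
    "key": 1,
}
'''

# Prints ['Assign', 'Expr'] and then ['Assign']: two statements vs. one.
for src in (standalone, inside_braces):
    print([type(stmt).__name__ for stmt in ast.parse(src).body])

# The dict's only key is the two strings joined together.
dict_node = ast.parse(inside_braces).body[0].value
print(ast.literal_eval(dict_node))  # {'Explain the dict.key': 1}
```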
As odd as this code was, it was hard to see what was going to happen, because the first string was clearly meant as a comment, both in its token form (a multiline string, starting in the first column) and in its content (English text explaining the dictionary). The second string is clearly intended as a key in the dict (short, containing data, indented). But all of those signals are human signals, not computer signals. So I as a human attended to them and misunderstood what would happen when the computer saw the same text and ignored those signals.
The fix of course is to use conventional comments. Programming is hard, yo. Stick to the conventions.
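A minimal sketch of that fix, using a made-up `URL_FIX` dict in the shape described above (the name and URLs are placeholders, not the sample app's actual file). Ordinary `#` comments never reach the parser as tokens, so there is nothing to merge with the key:

```python
# Remap one URL to another.
# These are conventional comments: the tokenizer discards them,
# so they can't be concatenated onto the key below.
URL_FIX = {
    "https://localhost:8000/": "https://localhost/",
}

print(list(URL_FIX))  # ['https://localhost:8000/']
```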
I have a document challenge. It's a perfect job for Lotus Notes. What do I use in its place today?
I want to keep track of a bunch of web sites, say 100-200 of them. For each, I want a free-form document that lets me keep notes about them. But I also have structured information I want to track for each, like an email contact, a GitHub repo, some statistics, and so on. I want to be able to display these documents in summarized lists, so that some of the structured information is displayed in a table, and I can sort and filter the documents based on that information.
This is exactly what Lotus Notes did well. Is there something that can do it now? Ideally, it would be part of a Confluence wiki, but other options would be good too. (Please don't say SharePoint...)
CouchDB is the perfect backend for a system like this (no wonder, it was written by Damien Katz, and inspired by his time at Lotus), but is there a GUI client that makes it a complete application?
Say what you will about Lotus Notes, it was really good at this kind of job.
It was our first time organizing a conference, and we did it on short notice, about four months. Where we didn't know what to do, we mostly modeled it on PyCon: 30-minute talks, 10 minutes between talks, a few opportunities for lightning talks, etc.
Judging from the #openedxcon tweet stream, and from talking to people afterward, people seemed to really like it.
I gave part of the edX keynote, and as usually happens when I give a talk, there are things I know I am going to say, and things that seem to just pop out of my mouth once I get going. I was showing two examples of long-tail Open edX sites, and making the point that edX would never have put these particular courses online itself. I said,
But it got tweeted as:
How meta: I say something, then the community turns it into something else, beyond my control! This was widely re-quoted, and was repeated by our CEO at another edX event later that week.
There's a difference between "beyond our reach" and "beyond our control." Not a huge difference, but I was talking more about reach at the time. But maybe it's a sign that things really are working when something is beyond your control and still good. Just like I "said."
And Open edX is going well: there are about 60 sites running 400 courses, all over the world. EdX has an outsized goal of educating a billion students by 2020, and Open edX will be a significant part of that. The 160 people at the conference were excited to help by running their own sites and courses. The conference was a success, even the parts beyond our control...
My son Max is a senior at NYU, studying film. He has to finish his senior project. It costs real money to make real films. Give him some money to help!
In 1964, Richard Feynman gave a series of seven lectures at Cornell called The Character of Physical Law. They were recorded by the BBC, and are now on YouTube. These are great.
These are not advanced lectures; they were intended for a general audience, and Feynman does a great job inhabiting the world of fundamental physics. He's clearly one of the top experts, but explains in such a personal, approachable style that you are right alongside him as he explores this world looking for answers, following in the footsteps of Newton and Einstein.
If you've never heard Feynman, at least dip into the first one, if only to hear his deep, thick New York accent. He is also witty: he places the French Revolution in 1783, and says, "Accurate to three decimal places, not bad for a physicist!" It's disarmingly out of character for an intellectual, but Feynman is the real thing, discussing not just the basics of forces and particles, but the philosophical implications for scientists and all thinkers:
I converted the videos to pure audio and listened to them in my car, which meant I couldn't see what he was drawing on the blackboard, but it was enlightening nonetheless. Highly recommended: The Character of Physical Law.
I do a lot of side projects in the Python world. I write coverage.py. I give talks at PyCon, like the one about iterators, or the one with the Unicode sandwich. I hang out in the #python IRC channel and help people with their programs. I organize Boston Python.
I enjoy these things, and I don't get paid for them. But if you want to help me out, here's how you can: my son Max is in his last semester at NYU film school, which means he and his friends are making significant short films. These films need funding. If you've liked something I've done for you in the Python world, how about tossing some money over to a film?
Max will be doing a film of his own this semester, but his Kickstarter isn't live yet. In the meantime, he's the cinematographer on his friend Jacob's film Go To Hell. So give a little money to Jacob, and in a month or so I'll hit you up again to give a lot of money to Max. :)
The first alpha of the next major version of coverage.py is available: coverage.py v4.0a1.
The big new feature is support for the gevent, greenlet, and eventlet concurrency libraries. Previously, these libraries' behind-the-scenes stack swapping would confuse coverage.py. Now coverage adapts to give accurate coverage measurement. To enable it, use the "concurrency" setting to specify which library you are using.
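For example, a .coveragerc along these lines should turn the new support on for gevent. This is a sketch; check the coverage.py configuration docs for the full list of accepted values. The snippet only writes the settings as text and reads them back with the standard `configparser` module to show the file's shape:

```python
import configparser

# A minimal .coveragerc: "concurrency" goes in the [run] section.
# "branch = True" turns on branch coverage as well.
COVERAGERC = """\
[run]
branch = True
concurrency = gevent
"""

config = configparser.ConfigParser()
config.read_string(COVERAGERC)
print(config.get("run", "concurrency"))  # gevent
```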
Huge thanks to Peter Portante for getting the concurrency support started, and Joe Jevnik for the final push.
Also new is that coverage.py will read its configuration from setup.cfg if there is no .coveragerc file. This lets you keep more of your project configuration in one place.
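In setup.cfg, coverage.py's sections carry a "coverage:" prefix, so [run] becomes [coverage:run]. The lookup order can be sketched as below. The helper `find_coverage_settings` is hypothetical and far simpler than coverage.py's real configuration handling; it only illustrates "use .coveragerc if present, otherwise fall back to setup.cfg":

```python
import configparser
import os
import tempfile

def find_coverage_settings(project_dir):
    """Hypothetical helper: return (file used, [run] settings as a dict).

    Prefers .coveragerc; falls back to setup.cfg, where coverage.py's
    sections are prefixed with "coverage:".
    """
    rc_path = os.path.join(project_dir, ".coveragerc")
    cfg_path = os.path.join(project_dir, "setup.cfg")
    parser = configparser.ConfigParser()
    if os.path.exists(rc_path):
        parser.read(rc_path)
        used, section = ".coveragerc", "run"
    elif os.path.exists(cfg_path):
        parser.read(cfg_path)
        used, section = "setup.cfg", "coverage:run"
    else:
        return None, {}
    if not parser.has_section(section):
        return used, {}
    return used, dict(parser.items(section))

# Usage: a project that keeps its settings only in setup.cfg.
with tempfile.TemporaryDirectory() as tmp:
    with open(os.path.join(tmp, "setup.cfg"), "w") as f:
        f.write("[metadata]\nname = demo\n\n[coverage:run]\nconcurrency = gevent\n")
    print(find_coverage_settings(tmp))  # ('setup.cfg', {'concurrency': 'gevent'})
```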
Lastly, the textual summary report now shows missing branches if you are using branch coverage.
One warning: I'm moving around lots of internals. People tend to use whatever they need to get their plugin or tool working, so some of those third-party packages may now be broken. Let me know what you find.
Full details of other changes are in the CHANGES.txt file.
I thought today was going to be a good day. I was going to release the first alpha version of coverage.py 4.0. I finally finished the support for gevent and other concurrency libraries like it, and I wanted to get the code out for people to try it.
So I made the kits and pushed them to PyPI. I used to not do that, because people would get the betas by accident. But pip now understands about pre-releases and real releases, and won't install an alpha version by default. Only if you explicitly use --pre will you get an alpha.
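The default behavior can be sketched like this. It is a simplified, hypothetical model, not pip's code: real pip follows the PEP 440 version rules, which the regex here only loosely imitates (plain release numbers plus a/b/rc suffixes), and it ignores details such as what happens when only pre-releases exist:

```python
import re

# Simplified version pattern: "4.0", "3.7.1", "4.0a1", "4.0b2", "4.0rc1".
VERSION_RE = re.compile(r"^(\d+(?:\.\d+)*)(?:(a|b|rc)(\d+))?$")
PRE_ORDER = {"a": 0, "b": 1, "rc": 2}

def sort_key(version):
    release, pre, pre_num = VERSION_RE.match(version).groups()
    nums = tuple(int(part) for part in release.split("."))
    # A final release sorts after all of its own pre-releases.
    pre_key = (PRE_ORDER[pre], int(pre_num)) if pre else (3, 0)
    return nums, pre_key

def is_prerelease(version):
    return VERSION_RE.match(version).group(2) is not None

def pick_version(available, allow_pre=False):
    """Pick the newest version, skipping pre-releases unless allow_pre (like --pre)."""
    candidates = [v for v in available if allow_pre or not is_prerelease(v)]
    return max(candidates, key=sort_key)

versions = ["3.7", "3.7.1", "4.0a1"]
print(pick_version(versions))                  # 3.7.1
print(pick_version(versions, allow_pre=True))  # 4.0a1
```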
About 10 minutes after I pushed the kits, someone I was chatting with on IRC said, "Did you just release a new version of coverage?" Turns out his Travis build was failing.
He was using coveralls to report his coverage statistics, and it was failing. Turns out coveralls uses internals from coverage.py to do its work, and I've made big refactorings to the internals, so their code was broken. But how did the alpha get installed in the first place?
He was using tox, and it turns out that when tox installs dependencies, it defaults to using the --pre switch! Why? I don't know.
OK, I figured I would just hide the new version on PyPI. That way, if people wanted to try it, they could use "pip install coverage==4.0a1", and no one else would be bothered with it. Nope: pip will find the newer version even if it is hidden on PyPI. Why? I don't know.
In my opinion:
So now the kit is removed entirely from PyPI while I figure out a new approach. Some possibilities, none of them great:
Software is hard, yo.
A friend recommended a technical talk today: How to Design a Good API and Why it Matters by Joshua Bloch. Looks good! It's also an hour long...
For a variety of reasons, it's hard to watch an hour-long video. I'd prefer to read the same content. But it isn't available textually. For my own talks, I produce full text as part of the preparation (for example, the Unicode sandwich talk).
People put slide decks up on SlideShare, but decks vary wildly in how well they contain the content. Some simply provide a backdrop, which is entertaining during a talk, but useless afterward.
Is there some way we can pool efforts to get more talks transcribed or summarized? Surely others would like to see it done? And there must be people eager to contribute in some way who could spend the time? Does something like this already exist?
I know the full talk, with the real speaker really speaking to me, is the best way to get their message. For example, Richard Feynman's series The Character of Physical Law just wouldn't be the same without his accent and delivery. But if the choice is reading a lengthy summary or not getting the message at all, I'll definitely take the summary.
Or maybe I'm an old codger stuck in text-world while all the younguns just want video?
Ben spent the summer at a RISD program for high-schoolers (obligatory celebration cake was here). He majored in comics, and this is his final project. It's five pages long:
I can't tell you how proud this comic makes me. Ben has always been a naturally talented artist, but this is a quantum leap up in technique and execution for him. Also, it's a really sweet story.
I've been writing about Ben's progress as an artist here for a long time:
Now at 16, Ben continues to amaze me with what he can do. In the robot movie post, I said, "I've always tried to encourage their creative sides, and they haven't let me down." Still true.
When reviewing GitHub pull requests, I sometimes want to get the proposed code onto my own machine, for running it, or just reading it in my own editor. Here are a few ways to do it, with a digression into git/alias/shell weirdness tacked on the end.
If the pull request is from a branch in the same repo, you can just check out the branch by name:
But you might not remember the name of the branch, or it might be in a different fork. Better is to be able to request the code by the pull request number.
The first technique I found was to modify the repo's .git/config file so that when you fetch code from the remote, it automatically pulls the pull request branches also. On GitHub, pull requests are at refspecs like "refs/pull/1234" (no, I don't really know what refspecs are, but I look forward to the day when I do...) Bert Belder wrote up a description of how to tweak your repo to automatically pull down all the pull request branches. You add this line to the [remote "origin"] section of your .git/config:
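The line commonly used for this is `fetch = +refs/pull/*/head:refs/remotes/origin/pr/*`. That is the usual form; treat it as an assumption rather than a quote from Belder's write-up. A refspec is a source:destination pair of ref patterns. The optional leading "+" tells git to update the destination even when that isn't a fast-forward, and "*" stands for the same text on both sides. This tiny model of the name mapping (the function `map_ref` is hypothetical, not part of git) shows where a pull request's head ends up:

```python
def map_ref(refspec, remote_ref):
    """Apply a fetch refspec containing one '*' to a single remote ref.

    Returns the local ref name, or None if the ref doesn't match.
    A leading '+' (force update) doesn't affect the name mapping,
    so it is simply stripped here.
    """
    src, dst = refspec.lstrip("+").split(":")
    prefix, suffix = src.split("*")
    if (len(remote_ref) < len(prefix) + len(suffix)
            or not remote_ref.startswith(prefix)
            or not remote_ref.endswith(suffix)):
        return None
    # Whatever '*' matched on the source side...
    matched = remote_ref[len(prefix):len(remote_ref) - len(suffix)]
    # ...is substituted for '*' on the destination side.
    return dst.replace("*", matched)

SPEC = "+refs/pull/*/head:refs/remotes/origin/pr/*"
print(map_ref(SPEC, "refs/pull/1234/head"))  # refs/remotes/origin/pr/1234
print(map_ref(SPEC, "refs/heads/master"))    # None
```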
Now when you "git fetch origin", you'll get all the pull request branches, and you can simply check out the one you want with "git checkout pr/1234".
But this means having to edit your repo's .git/config file before you can get the pull request code. If you have many repos, you're always going to be finding the ones that haven't been tweaked yet.
A technique I liked better is on Corey Frang's gist, provided by Rico Sta. Cruz: Global .gitconfig aliases for pull request management. Here, you update your single ~/.gitconfig file to define a new command that will pull down a pull request branch when you need it:
(That should all be on one line, but I wanted it to be readable here.) This gives us a new command, "git copr" (for CheckOut Pull Request) that gets branches from pull requests:
This technique has the advantage that once you define the alias, it's available in any repo, and also, it both fetches the branch and switches you to it.
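Here is a rough Python model of what such an alias does. It builds the two git commands as argument lists rather than running them. The function name `copr_commands`, the default remote "origin", and the local branch name pr/N are assumptions for illustration; the alias in the gist may differ in its details:

```python
def copr_commands(pr_number, remote="origin"):
    """Hypothetical: the git commands that fetch and check out a pull request.

    GitHub exposes each pull request's head at refs/pull/<N>/head.
    """
    branch = f"pr/{pr_number}"
    return [
        # Fetch the PR head straight into a local branch named pr/<N>.
        ["git", "fetch", remote, f"refs/pull/{pr_number}/head:{branch}"],
        # Then switch to that branch.
        ["git", "checkout", branch],
    ]

for cmd in copr_commands(1234):
    print(" ".join(cmd))
# git fetch origin refs/pull/1234/head:pr/1234
# git checkout pr/1234
```

Inside a repo, each list could be passed to `subprocess.run(cmd, check=True)` to actually perform the steps.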
BTW: finding and collecting these kinds of shortcuts can be daunting, because if you don't understand every bit of them, then you're in cargo-cult territory. "This thing worked for someone else, and if I copy it here, then it will work for me!"
In a few of the aliases on these pages, I see that the commands end with "&& :". I asked about this in the #git IRC channel, and was told that it was pointless: "&&" joins two shell commands, and runs the second one if the first one succeeded, and ":" is a shell built-in that simply succeeds (it's the same as "true"). So what does "&& :" add to the command? Seemed like it was pointless; we were stumped.
Then I also asked why other aliases took the form they did. Our copr alias has this form:
The bang character escapes from git syntax to the shell. Then we define a shell function called f with two commands in it, then we call the function. Why define the function? Why not just define the alias to be the two commands?
More discussion and experimentation turned up the answer. The way git invokes the shell, the arguments to the alias are available as $1, $2, etc., but they are also appended to the command line. As an example, let's define three different git aliases, each of which uses two arguments:
When we try these, the first does a bad thing, but the second and third are good:
The second one works because the ":" command eats up the extra arguments. The third one works because the eventual command run is "f one two", so the values are passed to the function. So the "&& :" wasn't pointless after all; it was needed to make the arguments work properly.
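You can reproduce this outside git. As an assumption about git's internals: when a "!" alias gets arguments, git runs roughly `sh -c '<alias> "$@"' '<alias>' arg1 arg2`, so the arguments are both positional parameters and extra words at the end of the command. This sketch mimics that with Python's `subprocess` (it needs a POSIX `sh` on the PATH). The three alias bodies are illustrative, each trying to print its two arguments in reverse order:

```python
import subprocess

def run_like_git_alias(body, *args):
    """Approximate how git runs a '!' alias that was given arguments (an assumption)."""
    script = body + ' "$@"'
    result = subprocess.run(
        ["sh", "-c", script, body, *args],
        capture_output=True, text=True, check=True,
    )
    return result.stdout.strip()

aliases = {
    "plain": "echo $2 $1",                 # the appended args get echoed too
    "colon": "echo $2 $1 && :",            # ':' swallows the appended args
    "function": "f() { echo $2 $1; }; f",  # appended args become f's parameters
}
for name, body in aliases.items():
    print(name, "->", run_like_git_alias(body, "one", "two"))
# plain -> two one one two
# colon -> two one
# function -> two one
```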
From previous cargo-cult expeditions, my ~/.gitconfig has other aliases using a different form:
These do this:
(No, I have no idea why ee4 does what it does.) So we have three odd forms that all are designed to let you access arguments positionally, but not get confused by them:
All of them work. I like the function-defining one best: it seems the most programmery and the least shell-tricky. I'm sure there's something here I'm misunderstanding, or a subtlety I'm overlooking, but I've learned stuff today.
One of the interesting things about helping beginning programmers is to see the way they think. After programming for so long, and using Python for so long, it's hard to remember how confusing it can all be. Beginners can reacquaint us with the difficulties.
Python has a handy way to iterate over all the elements of a sequence, such as a list:
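For example (the list and its contents are placeholders), the loop variable takes on each element in turn, with no indexes involved:

```python
my_list = ["apple", "banana", "cherry"]

# Iterate directly over the elements.
lengths = []
for elt in my_list:
    lengths.append(len(elt))
print(lengths)  # [5, 6, 6]
```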
(BTW, I did a talk at the PyCon before last all about iteration in Python, including these sorts of comparisons of techniques: Loop Like a Native.)
Once you learn about the range() builtin function, you know you can loop over the indexes of the sequence like this:
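A sketch of that index-based style, again with a placeholder list: range(len(my_list)) produces the valid indexes, and each element is looked up by index:

```python
my_list = ["apple", "banana", "cherry"]

# Loop over the indexes, then index into the list.
pairs = []
for i in range(len(my_list)):
    pairs.append((i, my_list[i]))
print(pairs)  # [(0, 'apple'), (1, 'banana'), (2, 'cherry')]
```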
These two styles of loop are commonly seen. But when I saw this on Stackoverflow, I did a double-take:
This is truly creative! It's an amalgam of the two beginner loops we've already seen, and at first glance, looks like a syntax error.
In fact, this works in both Python 2 and Python 3. In Python 2, range() produces a list, and lists support the "in" operator for checking element membership. In Python 3, range() produces a range object which also supports "in".
So each time around the loop, a new range is constructed and checked for the value of i. It works, although it's baroque, and in Python 2 it performs poorly: every check builds and scans a fresh list, making the loop O(n²) instead of O(n).
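The Stack Overflow code isn't reproduced here, but from the description it was presumably something along these lines. This is a reconstruction with placeholder names, not the original post's code: a while loop whose condition is a membership test against a freshly built range:

```python
my_list = ["apple", "banana", "cherry"]

pairs = []
i = 0
# Each test of the condition builds a new range and asks whether i is in it.
# In Python 3, a range answers "in" for an int in constant time; in Python 2,
# range() built a full list that "in" had to scan, so the loop was O(n**2).
while i in range(len(my_list)):
    pairs.append((i, my_list[i]))
    i += 1
print(pairs)  # [(0, 'apple'), (1, 'banana'), (2, 'cherry')]
```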
People are creative! Just when I thought there were no other ways to loop over a list, a new technique arrives!