My day job is working on Open edX. It’s large, and our requirements files are getting unruly. In particular, our requirements file for installing our other GitHub repos has grown very long in the tooth.
First, we have a mix of -e installs and non-e installs. -e (editable) means: check out the git working tree, and then use that checkout as the installed code. This makes it easy to use the code: you can skip the discipline of writing and properly maintaining a setup.py. Just changing the SHA in the GitHub URL should bring in new code.
We also have inconsistent use of “#egg” syntax in the URLs, and we don’t always include the version number, and when we do, we use one of three different syntaxes for it.
Worse, we’d developed a cargo-cult mentality about the mysteries of what pip might do. No one had confidence about the different behavior to expect from the different syntaxes. Sometimes updated code was being installed, and sometimes not.
I did an experiment where I made a simple package with just a version number in it (version_dummy), and I tried installing it in various ways. I found that I had to include a version number in the hash fragment at the end of the URL to get it to update properly. Then another engineer did a similar experiment and came to the opposite conclusion, that just changing the SHA would be enough.
As bad as cargo-culting is, this was even worse: two experiments designed to answer the same question, with different results! It was time to get serious.
An important property of science is reproducibility: another investigator should be able to run your experiment to see if they get the same results. On top of that, I knew I’d want to re-run my own experiment many times as I thought of new twists to try.
So I wrote a shell script that automated the installation and verification of versions. You can run it yourself: create a new virtualenv, then run the script.
I asked in the #pypa IRC channel for help with my mystery, and they had the clue I needed to get to the bottom of why we got two different answers. I had a git URL that looked like this:
He had a URL like this:
These look similar enough that they should behave the same, right? The difference is that mine has an underscore in the name, and his does not. My suffix (‘#egg=version_dummy’) was being parsed inside pip as if the package name were “version” and the version were “dummy”! This meant that updating the SHA wouldn’t install new code, because pip thought it knew what version it would get (“dummy”), and that’s the version it already had, so why install it?
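To make the misreading concrete, here is a minimal sketch of a parser that splits an egg fragment at the first hyphen or underscore. It only illustrates the ambiguity: naive_split_egg is a hypothetical helper, not pip's actual parsing code, and "versiondummy" is a made-up example of a name with no punctuation in it.

```python
import re


def naive_split_egg(fragment):
    """Split an egg fragment into (name, version) at the first '-' or '_'.

    Hypothetical illustration of the ambiguity; this is NOT pip's real parser.
    """
    match = re.match(r"^([A-Za-z0-9.]+)[-_](.+)$", fragment)
    if match:
        return match.group(1), match.group(2)
    # No separator found: the whole fragment is the name, version unknown.
    return fragment, None


print(naive_split_egg("version_dummy"))  # ('version', 'dummy')
print(naive_split_egg("versiondummy"))   # ('versiondummy', None)
```

A parser of this shape has no way to tell a name that contains an underscore from a name followed by a version, which is the trap described above.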
Writing my experiment.sh script gave me a good place to try out different scenarios of updating my version_dummy from version 1.0 to 2.0.
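The shape of one such scenario can be sketched in Python rather than shell. This is not the experiment.sh script itself: the helper names are hypothetical, and the sketch assumes the test package exposes its version as `__version__`, which is an assumption about a package described only as having "just a version number in it".

```python
import subprocess
import sys


def pip_install_command(requirement, editable=False):
    """Build the pip command line for one requirement (hypothetical helper)."""
    cmd = [sys.executable, "-m", "pip", "install"]
    if editable:
        cmd.append("-e")
    cmd.append(requirement)
    return cmd


def installed_version(module_name):
    """Ask a fresh interpreter for the module's version, or None on failure.

    Assumes the module defines __version__ (an assumption, see above).
    """
    code = "import {0}; print({0}.__version__)".format(module_name)
    result = subprocess.run(
        [sys.executable, "-c", code], capture_output=True, text=True
    )
    return result.stdout.strip() if result.returncode == 0 else None


def run_scenario(first, second, module_name, expected, editable=False):
    """Install `first`, then `second`, and check which version ended up installed."""
    for requirement in (first, second):
        subprocess.check_call(pip_install_command(requirement, editable))
    return installed_version(module_name) == expected
```

In use, `first` and `second` would be the same git requirement with two different SHAs, run inside a fresh virtualenv so that earlier installs cannot leak into the result.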
Things I learned:
- -e installs work even if you only change the SHA, although there remains superstition around the office that this is not true. That might just be superstition, or there might be scenarios where it fails that I haven’t tried yet.
- If you use a non-e install, you have to supply an explicit version number in the URL, because otherwise pip can misread punctuation in the package name as the start of a version.
- If you install a package with a non-e install, and then update it with a -e install, you will have both installed, and you’ll need to uninstall twice to really get rid of it.
- There are probably more scenarios that I haven’t tried yet that will come back to bite me later. :(
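The rule about explicit version numbers can be enforced mechanically. The sketch below builds a non-e requirement line and refuses to produce one without a version. It assumes the `#egg=name==version` spelling, which is only one possible form and may be handled differently by other pip versions; the function name and the example.com URL are hypothetical.

```python
def vcs_requirement(repo_url, sha, name, version):
    """Build a non-e git requirement pinned to a SHA, with an explicit version.

    Assumes the '#egg=name==version' form (an assumption; check your pip).
    """
    if not version:
        raise ValueError("an explicit version is required for non-e installs")
    return "git+{0}@{1}#egg={2}=={3}".format(repo_url, sha, name, version)


# Hypothetical repository URL and SHA, for illustration only.
print(vcs_requirement(
    "https://example.com/org/version_dummy.git", "abc123", "version_dummy", "2.0"
))
# git+https://example.com/org/version_dummy.git@abc123#egg=version_dummy==2.0
```

Generating every line through one function would also leave a requirements file with a single consistent syntax instead of three.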
If anyone has more information, I’m really interested.