How to distribute Python modules?

Wednesday 28 January 2004

I’ve been writing a lot of Python modules lately, and intend to put them up in my code section as soon as I can. But what starts as a self-contained module inevitably splits into multiple smaller modules for modularity, and I don’t know how best to distribute them.

For example, I wrote a code generation tool (module cogapp). It had an XML wrapper to make using XML data files easier. That code was split out and posted as handyxml. To test that code, I wrote a small module (called makefiles) to generate trees of files from a dictionary description. The makefiles module used the standard library textwrap.dedent function to make specifying the file contents easier, until I realized that dedent also expands tabs (why?), which I didn’t want. My cogapp module has the dedenter I want (because I didn’t know about dedent at the time). So I’m going to split out the dedenter as its own module (I’ll call it redenter).
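To make the tab issue concrete, here is a sketch of what such a dedenter might look like (the name redent and the implementation are mine, not Ned's actual code): it strips the longest common leading-whitespace prefix from the non-blank lines without ever expanding tabs, which is the behavior textwrap.dedent lacked at the time.

```python
def redent(text):
    """Remove the longest common leading-whitespace prefix from all
    non-blank lines, preserving tabs (unlike the era's textwrap.dedent,
    which called expandtabs() first)."""
    lines = text.split("\n")
    prefix = None
    for line in lines:
        if not line.strip():
            continue  # blank lines don't constrain the prefix
        stripped = line.lstrip()
        indent = line[:len(line) - len(stripped)]
        if prefix is None:
            prefix = indent
        else:
            # Shrink the prefix until it is shared with this line too.
            while not indent.startswith(prefix):
                prefix = prefix[:-1]
    if prefix:
        lines = [line[len(prefix):] if line.strip() else line
                 for line in lines]
    return "\n".join(lines)
```

For example, `redent("\tabc\n\t\tdef\n")` removes the one shared tab and keeps the second line's remaining tab intact.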

But now I’ve got these modules:

  • redenter
  • makefiles, which requires redenter
  • handyxml, whose unit test requires makefiles
  • cogapp, which requires redenter and optionally uses handyxml, and whose unit test requires makefiles

So here are the questions:

  1. How should cogapp (for example) be posted? In a tar file with all of the required modules? By itself with a pointer to the other pages?
  2. Should these modules all be in a package? If this were Java, they’d all be in a package called com.nedbatchelder.
  3. Should unit tests be included?
  4. How big does something need to be before I should really use distutils for it?
  5. Should these things go on PyPI, or sourceforge?


1. I always distribute truly independent things independently. If you ship the whole lot as one package, there is nothing stopping other people from cherry-picking the parts they want. On the other hand, it looks like it could be a fair bit of work to split that lot into four truly independent packages.

2. However you like. I tend to follow the Java idiom of lots of nested packages (halfcooked.utilities.schema.database, anyone?), but the Zen of Python says "Flat is better than nested", so it's probably more 'correct' to put them all in one package.

3. Oh yes. Preferably invoked from the if __name__ == "__main__": section of each module.
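The idiom being described looks like this in a self-contained sketch (the function under test, shout, is a made-up stand-in for the module's real code):

```python
import unittest

def shout(text):
    """Stand-in for the module's real code."""
    return text.upper() + "!"

class ShoutTest(unittest.TestCase):
    def test_basic(self):
        self.assertEqual(shout("hi"), "HI!")

if __name__ == "__main__":
    # Running the module directly runs its own tests.
    unittest.main()
```

Running `python` then executes the module's tests directly, with no separate test harness needed.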

4. I use distutils for single module (file) packages. It just works.
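For a single-module package, the whole can be a few lines; this is a hypothetical sketch (the name and metadata are invented, and note that distutils was later removed from the standard library in Python 3.12, where setuptools provides the same setup() interface):

```python
# -- sketch for a hypothetical single-module distribution
from distutils.core import setup  # on Python 3.12+: from setuptools import setup

setup(
    name="redenter",
    version="0.1",
    description="Dedent text without expanding tabs",
    py_modules=["redenter"],  # a single .py file, not a package directory
)
```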

5. Put them on PyPI and/or sourceforge. It's up to you.

1. Hmm. I have the same problem with my own projects. For example, Firedrop requires Wax, but doesn't "ship" with it. Maybe it should. On the other hand, it also requires Sextile, which I just drop into the distro because it's only one module. (This approach has an additional problem: whenever I update Sextile, I need to copy it into the Firedrop directory so Firedrop has the latest version.)

2. They can be, but they're really 4 different things... myself, I would probably not group them together.

3. Those are always useful, especially if you want (to keep the option open for) others to contribute to your programs.

4. No clue, I don't use distutils (yet?). I've seen people use it for single modules.

5. PyPI, yes. (I don't practice what I preach... I should add a bunch of my own projects one of these days.) Sourceforge is more useful if you expect others to work on these projects as well, otherwise it's a bit overkill.

My $0.02,
Darn, I was hoping for answers to all these questions, since I wonder about them too. On the sourceforge thing, I'd suggest PyPI -- that's more aimed, in my mind, towards letting people know that your code exists, and SF is (or at least should be) oriented around allowing a larger community to hack on the code together. One-man projects on SF always seem a bit silly to me, especially if they're not undergoing heavy development.
PyPI: absolutely. Why ever not? That's exactly what it's there for. Freshmeat: maybe; I'd probably only bother if it were a large, significant library, or a free-standing program. SourceForge: if you expect many contributions, discussion, etc. In your case, probably not; it's easier to manage by hand and in personal email. If you have a bunch of projects, you could put them all together in a single SourceForge project ("nedutils"?).

distutils should be used for any library. Stand-alone programs can use it too, though it feels less useful there; if your program also has portions that can be used as a library, you should definitely use it. Setting up distutils isn't hard once you've done it the first time, and it makes packaging easy and uniform (with automatic PyPI registration to boot).

Make a package out of every library you distribute. If you still only have one file, put everything in Then you have room to grow, and users need be none the wiser. Keep the package and library names lowercase, and simple but distinct. If a name is too long, someone can always do "import blahblahblah as blah", but if it conflicts with another module it's a pain in the ass to resolve.

Distribute the tests. What's a few extra bytes between friends? A distutilified package will usually have the source in a subdirectory of the main package, and the tests in a separate subdirectory. If it's all together (or in a subpackage), you can tell distutils not to install the tests. This is easier if they are in a subpackage.
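Concretely, keeping distutils from installing the tests can be as simple as leaving the tests subpackage out of the packages list; this is a sketch with hypothetical names, not a prescribed layout:

```python
# -- sketch; "mypkg" and "mypkg.tests" are hypothetical
from distutils.core import setup  # on Python 3.12+: from setuptools import setup

setup(
    name="mypkg",
    version="0.1",
    packages=["mypkg"],  # deliberately not ["mypkg", "mypkg.tests"],
                         # so the tests ship in the source tarball
                         # but are not installed
)
```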

I haven't myself really decided if unit tests should be done with "python install ; python tests/" (i.e., reinstall before every test run), or if the tests should be runnable out of the uninstalled source package. Installation is fast, though (distutils checks dates before copying), so it's not that big a deal to reinstall frequently. Or you could use "python install --home=~" (I think that's the command), and make sure ~/lib/python is at the front of the search path when you run your tests; then you'll keep your development code away from your global Python environment. I suppose all this should really be encapsulated into a "python test"... there are a bunch of distutils patterns out there that aren't documented, but that people are using in their own projects.

I would generally err on the side of including files. If they are files that are also part of an external library that you are distributing separately, I might put them in an awkwardly-named subpackage (e.g., "included_fooutil"), and then do a try:import fooutil; except ImportError: import included_fooutil as fooutil. Again, what's a few bytes between friends? But it is a little lame and redundant. If you don't do that, I would still protect the import with a try:except:, and print out specific information about where to get the package if it's missing (before re-raising the error). The warnings module is probably the right way to print the error message.
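The fallback pattern described above might be sketched as a small helper; fooutil and included_fooutil are the hypothetical names from the comment, and the helper itself is my own framing of the idea:

```python
import importlib
import warnings

def import_with_fallback(name, bundled_name, url):
    """Try the externally installed package first, then a bundled copy;
    if both are missing, warn with a pointer to the download page and
    re-raise the ImportError."""
    try:
        return importlib.import_module(name)
    except ImportError:
        try:
            return importlib.import_module(bundled_name)
        except ImportError:
            warnings.warn("%s is not installed; get it from %s" % (name, url))
            raise

# Usage (hypothetical names):
#   fooutil = import_with_fallback("fooutil", "included_fooutil",
#                                  "http://example.com/fooutil")
```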

Okay, that's all my thoughts...

I haven't myself really decided if unit tests should be done with "python install ; python tests/" (i.e., reinstall before every test), or if the tests should be runnable out of the uninstalled source package.

At the moment I run unit tests out of build/lib.$ARCH, which works fine except for one minor nit: the .pyc files get compiled with the build/lib.$ARCH path and are then installed, which results in tracebacks showing .../build/lib.$ARCH after installation :-(.

Ah, well. Maybe I should try the test-after-install routine.

I don't understand the issue about "install before test". If the tests are installed with the code, then why force an install before every test?

I think Ian meant that when your tests run against the installed code, you need to "install before test" (i.e. before each test-run, not individual test-cases). As opposed to the situation where your tests run against the code in the source tree, where you do not need to install before each test-run.
