Evil ninja module initialization

Tuesday 10 January 2017

A question about import styles on the Python-Dev mailing list asked about imports like this:

import os as _os

Understanding why people do this is an interesting lesson in how modules work. A module is nothing more than a collection of names. When you define a name in a .py file, it becomes an attribute of the module, and is then importable from the module.

An underlying simplicity in Python is that many statements are really just assignment statements in disguise. All of these define the name X:

X = 17
def X(): print("look!")
import X

When you create a module, you can make the name “X” importable from that module by assigning to it, or defining it as a function. You can also make it importable by importing it yourself.
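To see this concretely, here is a minimal sketch that builds a throwaway module from source text and lists the names it ends up with. The module name "demo" and the names A, B, and C are made up for illustration; three different names are used so the statements don't overwrite each other.

```python
import os
import types

# Source for a hypothetical module: an assignment, a def, and an import.
source = '''
A = 17
def B(): return "look!"
import os as C
'''

# Create an empty module object and run the source inside its namespace.
demo = types.ModuleType("demo")
exec(source, demo.__dict__)

# Each statement bound a name, so each name is now a module attribute.
public_names = sorted(name for name in vars(demo) if not name.startswith("__"))
print(public_names)   # ['A', 'B', 'C']
print(demo.C is os)   # True: the import was just another assignment
```

All three statements leave behind the same kind of thing: a name in the module's namespace.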

Suppose your module looks like this:

# yourmodule.py
import os

def doit():
    os.something_or_other()

This module has two names defined in it: “doit”, and “os”. Someone else can now do this:

# someone.py
from yourmodule import os

# or worse, this imports os and doit:
from yourmodule import *

This bothers some people. “os” is not part of the actual interface of yourmodule. The “import os as _os” form shown at the top keeps your imports from leaking into your interface this way: the leading underscore marks the name as private, and import-star doesn’t pull in names starting with underscores. (Another solution is to define __all__ in your module, which lists exactly the names that import-star should provide.)
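The difference between the three approaches can be checked with a small experiment. This is only a sketch: the module names "leaky", "underscored", and "with_all" and the helpers make_module and star_import are hypothetical, and os.getcwd() stands in for whatever the real function would call.

```python
import sys
import types

def make_module(name, source):
    """Create a module from source text and register it so it can be imported."""
    module = types.ModuleType(name)
    exec(source, module.__dict__)
    sys.modules[name] = module
    return module

def star_import(name):
    """Return the sorted names that `from <name> import *` would bind."""
    namespace = {}
    exec(f"from {name} import *", namespace)
    namespace.pop("__builtins__", None)  # added by exec, not by the import
    return sorted(namespace)

make_module("leaky", "import os\ndef doit():\n    return os.getcwd()\n")
make_module("underscored", "import os as _os\ndef doit():\n    return _os.getcwd()\n")
make_module("with_all", "import os\n__all__ = ['doit']\ndef doit():\n    return os.getcwd()\n")

print(star_import("leaky"))        # ['doit', 'os']
print(star_import("underscored"))  # ['doit']
print(star_import("with_all"))     # ['doit']
```

Note that neither fix hides the module completely: an explicit "from underscored import _os" or "from with_all import os" still works. They only change what import-star provides.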

Most people, though, don’t worry about this kind of name leaking. Import-star is discouraged anyway, and people know not to import os from other modules. The solution of renaming os to _os just makes your code ugly for little benefit.

The part of the discussion thread that really caught my eye was Daniel Holth’s winking suggestion of the “evil ninja mode pattern” of module initialization:

def ninja():
    global exported
    import os
    def exported():
        os.do_something()

ninja()
del ninja

What’s going on here!? Remember that def is an assignment statement like any other. When used inside a function, it defines a local name, as assignment always does. But an assignment in a function can define a global name if the name is declared as global. It’s a little unusual to see a global statement without an explicit assignment at the top-level, but it works just fine. The def statement defines a global “exported” function, because the global statement told it to. “os” is now a local in our function, because again, the import statement is just another form of assignment.

So we define ninja(), and then execute it immediately. This defines the global “exported”, and doesn’t define a global “os”. The only problem is the name “ninja” has been defined, which we can clean up with a del statement.
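The claim is easy to verify by running the pattern inside a scratch module and inspecting what is left behind. This is a sketch under a couple of assumptions: the module name "ninja_demo" is made up, and os.getcwd() replaces the placeholder call so that the function actually runs.

```python
import types

source = '''
def ninja():
    global exported
    import os                # a local of ninja(), never a module global
    def exported():
        return os.getcwd()   # reaches os through the closure
ninja()
del ninja
'''

# Run the pattern in a fresh module namespace.
mod = types.ModuleType("ninja_demo")
exec(source, mod.__dict__)

# Only "exported" survives; neither "os" nor "ninja" is a module attribute.
visible = sorted(name for name in vars(mod) if not name.startswith("__"))
print(visible)                             # ['exported']
print(mod.exported.__code__.co_freevars)   # ('os',)
print(isinstance(mod.exported(), str))     # True: the function still works
```

The co_freevars tuple shows where os went: it lives on as a closure variable of exported, which is why the function keeps working after ninja has been deleted.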

Please don’t ever write code this way. It’s a kind of over-defensiveness that isn’t needed in typical Python code. But understanding what it does, and why it does it, is a good way to flex your understanding of Python’s workings.

For more about how names (and values) work in Python, people seem to like my PyCon talk, Python Names and Values.

Comments

> Note that we could also solve our original "os" concern with a del statement

I don't think that's true: wouldn't the functions needing "os" now fail at runtime?

It's called Ninja mode because it defines the module, and then leaves no trace. Like a Ninja.

By the way, I thought of this after programming in JavaScript for a while, which uses closures much more heavily than Python (at least when used in the Douglas Crockford "JavaScript: The Good Parts" style). In the example, the os module is part of a closure accessible only to the exported function.

@Chris: oops, right! I've removed the erroneous sentence.

Here are some other ways you can define the name X:
for X in [1]:
    pass

with ctx as X:
    pass

# X only stays defined afterwards in Python 2; Python 3 unbinds it when the except clause ends
try:
    stuff
except Exception as X:
    pass

I commented a bit on Twitter, but I'll expand here. I disagree that you shouldn't be defensive about this. It is true that import * is bad, but it is part of the language and people use it. More to the point, any name that you export will become part of your API. People will come to rely on it (import * or not). If you later remove it, the API will break. Here's an example of a case where someone added a variable named "version" in __init__.py and it was mistaken for an API (which was confusing, because it didn't exist in previous versions). The variable was almost certainly not intended to be the API for the version (__version__ already existed), but since it was there, people thought it should be used.

I highly recommend using __all__ in all __init__.py files. Using it in individual .py files is nice too, but not as necessary, since people import from top-level modules more often than from submodules. A good way to do it is to use the double bookkeeping method. For instance:
from .submodule import name1, name2

__all__ = ['name1', 'name2']
This seems terrible and redundant, but you should think of it as double bookkeeping, not redundancy.

Additionally, run the tool pyflakes on your __init__.py file. It will detect names that are imported but not listed in __all__ (which should be del-ed) and names that are in __all__ but not defined (usually spelling errors). It's also smart enough to recognize the "__all__ +=" pattern. This is where the "double bookkeeping" comes in. If you forget one or the other, it will report an error, and it forces you to be explicit and intentional about which names you export.

Some people try to be smart here and use some code to "remove the redundancy" and do the bookkeeping automatically. But that defeats the purpose! Instead, be explicit and use pyflakes to spot mistakes.
