« | » Main « | »

Writing pylint plugins

Saturday 30 May 2015

Of the popular Python static checkers, pylint seems to be the most forceful: it raises alarms more aggressively than the others. This can be annoying, but thankfully it also has detailed controls over what it complains about.

It is also extensible: you can write plugins that add checkers for your code. At edX, we've started doing this for problems we see that pylint doesn't already check for.

edx-lint is our repo of pylint extras, including plugins and a simple tool for keeping a central pylintrc file and using it in a number of repos.

The documentation for pylint internals is not great. It exists, but too quickly recommends reading the source to understand what's going on. The good news is that all of the built-in pylint checkers use the same mechanisms you will, so there are plenty of examples to follow.

A pylint checker is basically an abstract syntax tree (AST) walker, but over a richer AST than Python provides natively. Writing a checker involves some boilerplate that I don't entirely understand, but the meat of it is a simple function that examines the AST.

One problem we've had in our code is getting engineers to understand the idiosyncratic way that translation functions are used. When you use the gettext functions in your code, you have to use a literal string as the first argument. This is because the function will not only be called at runtime, but is also analyzed statically by the string extraction tools.

So this is good:

welcome = gettext("Welcome, {}!").format(user_name)

but this won't work properly:

welcome = gettext("Welcome, {}!".format(user_name))

The difference is subtle, but crucial. And both will work with the English string, so the bug can be hard to catch. So we wrote a pylint checker to flag the bad case.

The checker is i18n_check.py, and here is the important part:

TRANSLATION_FUNCTIONS = set([
    '_',
    'gettext',
    'ngettext', 'ngettext_lazy',
    'npgettext', 'npgettext_lazy',
    'pgettext', 'pgettext_lazy',
    'ugettext', 'ugettext_lazy', 'ugettext_noop',
    'ungettext', 'ungettext_lazy',
])

def visit_callfunc(self, node):
    if not isinstance(node.func, astroid.Name):
        # It isn't a simple name, can't deduce what function it is.
        return

    if node.func.name not in self.TRANSLATION_FUNCTIONS:
        # Not a function we care about.
        return

    if not self.linter.is_message_enabled(self.MESSAGE_ID):
        return

    first = node.args[0]
    if isinstance(first, astroid.Const):
        if isinstance(first.value, basestring):
            # The first argument is a constant string! All is well!
            return

    # Bad!
    self.add_message(self.MESSAGE_ID, args=node.func.name, node=node)

Because the method is named "visit_callfunc", it will be invoked for every function call found in the code. The "node" variable is the AST node for the function call. In the first line, we look at the expression for the function being called. It could be a name, or it could be some other expression. Most function calls will be a simple name, but if it isn't a name, then we don't know enough to tell if this is one of the translation functions, so we return without flagging a problem.

Next we look at the name of the function. If it isn't one of the dozen or so functions that will translate the string, then we aren't interested in this function call, so again, return without taking any action.

The next check is to see if this checker is even enabled. I think there's a better way to do this, but I'm not sure.

Finally we can do the interesting check: we look at the first argument to the function, which remember, is not a calculated value, but a node in the abstract syntax tree representing the code that will calculate the value.

The only acceptable value is a string constant. So we can check if the first argument is a Const node. Then we can examine the actual literal value, to see that it's a string. If it is, then everything is good, and we can return without an alarm.

But if the first argument is not a string constant, then we can use self.add_message to add a warning message to the pylint output. Elsewhere in the file, we defined MESSAGE_ID to refer to the message:

"i18n function %s() must be called with a literal string"

Our add_message call uses that string, providing an argument for the string formatter, so the message will have the actual function name in it, and also provides the AST node, so that the message can indicate the file and line where the problem happened.

That's the whole checker. If you're interested, the edx-lint repo also shows how to test checkers, which is done with sample .py files, and .txt files with the pylint messages they should generate.

We have a few other checkers also: checks that setUp and tearDown call their super() counterparts properly, and a check that range isn't called with a needless first argument.

The checker I'd like to write is one that can tell you that this:

self.assertTrue(len(x) == 2)

should be re-written as:

self.assertEqual(len(x), 2)

and other similar improvements to test assertions.

Once you write a pylint checker, you start to get ideas for others that might work well. I can see it becoming a kind of mania...

Be careful deleting files around git

Saturday 2 May 2015

Working in a Python project, it's common to have a clean-up step that deletes all the .pyc files, like this:

find . -name '*.pyc' -delete

This works great, but there's a slight chance of a problem: Git records information about branches in files within the .git directory. These files have the same name as the branch.

Try this:

git checkout -b cleanup-all-.pyc

This makes a branch called "cleanup-all-.pyc". After making a commit, I will have files named .git/refs/heads/cleanup-all-.pyc and logs/refs/heads/cleanup-all-.pyc. Now if I run my find command, it will delete those files inside the .git directory, and my branch will be lost.

One way to fix it is to tell find not to delete the file if it's found in the .git directory:

find . -name '*.pyc' -not -path './.git/*' -delete

A better way is:

find . -name '.git' -prune -o -name '*.pyc' -exec rm {} \;

The first command examines every file in .git, but won't delete the .pyc it finds there. The second command will skip the entire .git directory, and not waste time examining it.

UPDATE: I originally had -delete in that latter command, but find doesn't like -prune and -delete together. It seems simplistic and unfortunate, but there it is.

« | » Main « | »