Writing pylint plugins

Saturday 30 May 2015

Of the popular Python static checkers, pylint seems to be the most forceful: it raises alarms more aggressively than the others. This can be annoying, but thankfully it also has detailed controls over what it complains about.

It is also extensible: you can write plugins that add checkers for your code. At edX, we've started doing this for problems we see that pylint doesn't already check for.

edx-lint is our repo of pylint extras, including plugins and a simple tool for keeping a central pylintrc file and using it in a number of repos.

The documentation for pylint internals is not great. It exists, but too quickly recommends reading the source to understand what's going on. The good news is that all of the built-in pylint checkers use the same mechanisms you will, so there are plenty of examples to follow.

A pylint checker is basically an abstract syntax tree (AST) walker, but over a richer AST than Python provides natively. Writing a checker involves some boilerplate that I don't entirely understand, but the meat of it is a simple function that examines the AST.

One problem we've had in our code is getting engineers to understand the idiosyncratic way that translation functions are used. When you use the gettext functions in your code, you have to use a literal string as the first argument. This is because the function will not only be called at runtime, but is also analyzed statically by the string extraction tools.

So this is good:

welcome = gettext("Welcome, {}!").format(user_name)

but this won't work properly:

welcome = gettext("Welcome, {}!".format(user_name))

The difference is subtle, but crucial. And both will work with the English string, so the bug can be hard to catch. So we wrote a pylint checker to flag the bad case.

The checker is i18n_check.py, and here is the important part:

TRANSLATION_FUNCTIONS = set([
    '_',
    'gettext',
    'ngettext', 'ngettext_lazy',
    'npgettext', 'npgettext_lazy',
    'pgettext', 'pgettext_lazy',
    'ugettext', 'ugettext_lazy', 'ugettext_noop',
    'ungettext', 'ungettext_lazy',
])

def visit_callfunc(self, node):
    if not isinstance(node.func, astroid.Name):
        # It isn't a simple name, can't deduce what function it is.
        return

    if node.func.name not in self.TRANSLATION_FUNCTIONS:
        # Not a function we care about.
        return

    if not self.linter.is_message_enabled(self.MESSAGE_ID):
        return

    first = node.args[0]
    if isinstance(first, astroid.Const):
        if isinstance(first.value, basestring):
            # The first argument is a constant string! All is well!
            return

    # Bad!
    self.add_message(self.MESSAGE_ID, args=node.func.name, node=node)

Because the method is named "visit_callfunc", it will be invoked for every function call found in the code. The "node" variable is the AST node for the function call. In the first line, we look at the expression for the function being called. It could be a name, or it could be some other expression. Most function calls will be a simple name, but if it isn't a name, then we don't know enough to tell if this is one of the translation functions, so we return without flagging a problem.

Next we look at the name of the function. If it isn't one of the dozen or so functions that will translate the string, then we aren't interested in this function call, so again, return without taking any action.

The next check is to see if this checker is even enabled. I think there's a better way to do this, but I'm not sure.

Finally we can do the interesting check: we look at the first argument to the function, which remember, is not a calculated value, but a node in the abstract syntax tree representing the code that will calculate the value.

The only acceptable value is a string constant. So we can check if the first argument is a Const node. Then we can examine the actual literal value, to see that it's a string. If it is, then everything is good, and we can return without an alarm.

But if the first argument is not a string constant, then we can use self.add_message to add a warning message to the pylint output. Elsewhere in the file, we defined MESSAGE_ID to refer to the message:

"i18n function %s() must be called with a literal string"

Our add_message call uses that string, providing an argument for the string formatter, so the message will have the actual function name in it, and also provides the AST node, so that the message can indicate the file and line where the problem happened.

That's the whole checker. If you're interested, the edx-lint repo also shows how to test checkers, which is done with sample .py files, and .txt files with the pylint messages they should generate.

We have a few other checkers also: checks that setUp and tearDown call their super() counterparts properly, and a check that range isn't called with a needless first argument.

The checker I'd like to write is one that can tell you that this:

self.assertTrue(len(x) == 2)

should be re-written as:

self.assertEqual(len(x), 2)

and other similar improvements to test assertions.

Once you write a pylint checker, you start to get ideas for others that might work well. I can see it becoming a kind of mania...

» 4 reactions

Comments

[gravatar]
Claudiu Popa 9:23 AM on 31 May 2015

Yeah, pylint's documentation is not that good, but I hope to change that someday.

Some comments:

> The next check is to see if this checker is even enabled. I think there's a better way to do this, but I'm not sure.

You can use @utils.check_messages(message_ids) decorator over visit_callfunc instead, should do the trick nicely and it's obvious what messages the visit function can actually emit.

> Most function calls will be a simple name, but if it isn't a name, then we don't know enough to tell if this is one of the translation functions, so we return without flagging a problem.

I guess you can infer the name, using .infer() and check if the result is one of the translation functions, but I doubt that there's code out there where the translation function is something else though.

Thanks for the article.

[gravatar]
The Compiler 11:28 AM on 2 Jun 2015

Thanks for the article!

Have you considered submitting those fixers upstream? range_check and super_check sound useful for my own code, and I'm sure i18n_check is useful for others too! Maybe Claudiu can say more, but I'm sure they'd be appreciated :)

I also have some custom plugins here, though they're mainly quick hacks without tests:

https://github.com/The-Compiler/qutebrowser/tree/master/scripts/pylint_checkers

Since testing those seems easier than I thought I might add some tests, and then maybe upstream those which make sense (settrace, a better version of crlf, and maybe open_encoding) too.

[gravatar]
Sveder 8:10 PM on 20 Sep 2015

Awesome article. I've been wanting to automate some of the most common code review issues that come up in my team, and Pylinter seems like a great tool. To play with it I implemented the assert checks you mentioned, as they are so easily missed and annoying to mass change (in a project with ~700 unitests).

https://github.com/Sveder/custom-lints

[gravatar]
Pavel Karateev 1:50 PM on 9 Dec 2015

Thanks, great article and blog = )

I had no clue edX has such an excellent syntax validation! Just wow!

Btw lack of documentation is a real problem of pylint (and not only pylint). I struggled a lot with customisation back then.

Add a comment:

name
email
Ignore this:
not displayed and no spam.
Leave this empty:
www
not searched.
 
Name and either email or www are required.
Don't put anything here:
Leave this empty:
URLs auto-link and some tags are allowed: <a><b><i><p><br><pre>.