Makefile help target

Wednesday 4 April 2018

In a pull request today, I was struck again by the difficulty of providing a “help” target for Makefiles. The make command doesn’t natively have a way to see what targets are available, because the set is dynamic and large, so we are left to cobble things together ourselves.

We’d been cargo-culting this target across Makefiles for a while:

help: ## display this help message
        @echo "Please use \`make <target>' where <target> is one of"
        @perl -nle'print $& if m{^[a-zA-Z_-]+:.*?## .*$$}' $(MAKEFILE_LIST) | sort | awk 'BEGIN {FS = ":.*?## "}; {printf "\033[36m  %-25s\033[0m %s\n", $$1, $$2}'

Here’s the meaty line, split across lines for readability, as all the rest of the code in this post will be:

perl -nle'print $& if m{^[a-zA-Z_-]+:.*?## .*$$}' $(MAKEFILE_LIST) |\
    sort |\
    awk 'BEGIN {FS = ":.*?## "}; {printf "\033[36m  %-25s\033[0m %s\n", $$1, $$2}'

It finds labelled lines with double-hash comments, and prints them, sorted, in a nice two-column layout, with the target names in cyan.

We’re a Python shop, so that Perl command really seemed out of place. What would it look like in Python? Longer, that’s what:

python -c 'import fileinput,re; \
    ms=filter(None, (re.search("([a-zA-Z_-]+):.*?## (.*)$$",l) for l in fileinput.input())); \
    print("\n".join(sorted("\033[36m  {:25}\033[0m {}".format(*m.groups()) for m in ms)))' $(MAKEFILE_LIST)
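For comparison, here is a rough sketch of the same select-and-format logic as an ordinary Python function, outside of a Makefile. The names `HELP_LINE`, `format_help`, and `sample` are made up for illustration, and the regex uses a single `$` because there is no make escaping to worry about here.

```python
import re

# A target name, a colon, anything, then "## " and the help text.
HELP_LINE = re.compile(r"^([a-zA-Z_-]+):.*?## (.*)$")

def format_help(makefile_text):
    """Return sorted, two-column help text for the commented targets."""
    rows = []
    for line in makefile_text.splitlines():
        match = HELP_LINE.match(line)
        if match:
            rows.append("  {:25} {}".format(*match.groups()))
    return "\n".join(sorted(rows))

# A tiny stand-in for the contents of a Makefile.
sample = (
    "help: ## display this help message\n"
    "\t@echo hi\n"
    "clean: ## remove build artifacts\n"
    "build: deps\n"
)

help_text = format_help(sample)
print(help_text)
```

Only the two targets with double-hash comments are listed; `build` has no comment and is skipped.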

But looking at that original line more, what is the Perl even doing? It’s just selecting lines. That’s what grep is for:

grep -E '^[a-zA-Z_-]+:.*?##' $(MAKEFILE_LIST) | \
    sort | \
    awk 'BEGIN {FS = ":.*?## "}; {printf "\033[36m  %-25s\033[0m %s\n", $$1, $$2}'

That’s shorter than the original, but we can do even better by using awk more effectively:

grep '^[a-zA-Z]' $(MAKEFILE_LIST) | \
    sort | \
    awk -F ':.*?## ' 'NF==2 {printf "\033[36m  %-25s\033[0m %s\n", $$1, $$2}'

The terminal coloring is cute, but unnecessary and can actually be counterproductive depending on your terminal’s natural colors, so:

grep '^[a-zA-Z]' $(MAKEFILE_LIST) | \
    sort | \
    awk -F ':.*?## ' 'NF==2 {printf "  %-26s%s\n", $$1, $$2}'

Looking around for other people’s techniques, marmelab had a very similar line, while Rodrigo Machado and O. Libre went down the all-awk path with fancier behavior.

In many ways, this doesn’t matter at all. But it’s a fun rabbit-hole...

Is Python interpreted or compiled? Yes.

Thursday 29 March 2018

A common question: “Is Python interpreted or compiled?” Usually, the asker has a simple model of the world in mind, and as is typical, the world is more complicated.

In the simple model of the world, “compile” means to convert a program in a high-level language into a binary executable full of machine code (CPU instructions). When you compile a C program, this is what happens. The result is a file that your operating system can run for you.

In the simple definition of “interpreted”, executing a program means reading the source file a line at a time, and doing what it says. This is the way some shells operate.

But the real world is not so limited. Making real programming languages useful and powerful involves a wider range of possibilities about how they work. Compiling is a more general idea: take a program in one language (or form), and convert it into another language or form. Usually the source form is a higher-level language than the destination form, such as when converting from C to machine code. But converting from JavaScript 8 to JavaScript 5 is also a kind of compiling.

In Python, the source is compiled into a much simpler form called bytecode. These are instructions similar in spirit to CPU instructions, but instead of being executed by the CPU, they are executed by software called a virtual machine. (These are not VMs that emulate entire operating systems, just a simplified CPU execution environment.)

Here’s an example of a short Python function, and its bytecode:

>>> import dis
>>> def example(x):
...     for i in range(x):
...         print(2 * i)
>>> dis.dis(example)
  2           0 SETUP_LOOP              28 (to 30)
              2 LOAD_GLOBAL              0 (range)
              4 LOAD_FAST                0 (x)
              6 CALL_FUNCTION            1
              8 GET_ITER
        >>   10 FOR_ITER                16 (to 28)
             12 STORE_FAST               1 (i)

  3          14 LOAD_GLOBAL              1 (print)
             16 LOAD_CONST               1 (2)
             18 LOAD_FAST                1 (i)
             20 BINARY_MULTIPLY
             22 CALL_FUNCTION            1
             24 POP_TOP
             26 JUMP_ABSOLUTE           10
        >>   28 POP_BLOCK
        >>   30 LOAD_CONST               0 (None)
             32 RETURN_VALUE

The dis module in the Python standard library is the disassembler that can show you Python bytecode. It’s also the best (but not great) documentation for the bytecode itself. If you want to know more about how Python’s bytecode works, there are lots of conference talks about bytecode. The software that executes bytecode can be written in any language: byterun is an implementation in Python (!), which is useful only as an educational exercise.
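If you would rather inspect bytecode from code than read a printed listing, `dis.get_instructions` yields one object per instruction. This is a small sketch; the exact opcode names differ between CPython versions, so treat whatever it prints as specific to the interpreter that ran it.

```python
import dis

def example(x):
    return x * 2

# Each instruction object carries its opcode name, argument, and offset.
opnames = [ins.opname for ins in dis.get_instructions(example)]
print(opnames)
```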

An important aspect of Python’s compilation to bytecode is that it’s entirely implicit. You never invoke a compiler, you simply run a .py file. The Python implementation compiles the files as needed. This is different than Java, for example, where you have to run the Java compiler to turn Java source into compiled class files. For this reason, Java is often called a compiled language, while Python is called an interpreted language. But both compile to bytecode, and then both execute the bytecode with a software implementation of a virtual machine.

Another important Python feature is its interactive prompt. You can type Python statements and have them immediately executed. This interactivity is usually missing in “compiled” languages, but even at the Python interactive prompt, your Python is compiled to bytecode, and then the bytecode is executed. This immediate execution, and Python’s lack of an explicit compile step, are why people call the Python executable “the Python interpreter.”
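The compile-then-execute sequence can be made visible with the built-in `compile()` and `exec()` functions. This is only a sketch of the idea, not a description of how the prompt itself is implemented; the `"<example>"` filename is a placeholder.

```python
# A statement, as it might be typed at the prompt.
source = "total = sum(range(5))"

# compile() turns the source text into a code object holding bytecode.
code = compile(source, "<example>", "exec")

# The raw bytecode is just a bytes object.
bytecode = code.co_code

# exec() runs the code object, storing names in the given namespace.
namespace = {}
exec(code, namespace)
total = namespace["total"]

print(type(code).__name__, len(bytecode), total)
```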

By the way, even this is a simplified description of how these languages can work. “Compiled” languages like Java and C can have interactive prompts, but they are not at the center of those worlds in the same way that Python’s is. Java originally always compiled to bytecode, but then it pioneered just-in-time (JIT) techniques for compiling to machine code at runtime, and now Java is sometimes compiled entirely to machine code, in the C style.

This shows just how flimsy the words “interpreted” and “compiled” can be. Like most adjectives applied to programming languages, they are thrown around as if they were black-and-white distinctions, but the reality is much subtler and more complex.

Finally, how your program gets executed isn’t a characteristic of the language at all: it’s about the language implementation. I’ve been talking here about Python, but this has really been a description of CPython, the usual implementation of Python, so-named because it is written in C. PyPy is another implementation, using a JIT compiler to run code much faster than CPython can.

So: is Python compiled? Yes. Is Python interpreted? Yes. Sorry, the world is complicated...

What’s in which Python 3.4–3.6?

Thursday 22 March 2018

This is the third in a series of summarizations of what’s in each release of Python. The first two were What’s in which Python 2.x? and What’s in which Python 3.0–3.3?.

3.4: March 16, 2014

  • pip is always available, via ensurepip
  • asyncio (provisional API)
  • enum
  • Other stdlib modules: statistics, pathlib, and tracemalloc

Full list of 3.4 changes.
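A quick taste of three of the 3.4 standard library additions, as a minimal sketch. The `Color` class and the sample path are invented for the example.

```python
import enum
import pathlib
import statistics

class Color(enum.Enum):
    RED = 1
    GREEN = 2

# Enum members can be looked up by value.
looked_up = Color(2)

# statistics provides basic numeric summaries.
mean = statistics.mean([1, 2, 3, 4])

# pathlib treats paths as objects; PurePosixPath never touches the disk.
suffix = pathlib.PurePosixPath("docs/readme.txt").suffix

print(looked_up, mean, suffix)
```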

3.5: September 13, 2015

  • async and await syntax
  • matrix multiplication operator @
  • more unpacking generalizations
  • The typing module for type hints
  • os.scandir()

Full list of 3.5 changes.
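Here is a small sketch of some of the 3.5 features; async and await are left out to keep it short. The `Vec` class is hypothetical: none of the built-in types implement `@`, so a class has to define `__matmul__` itself.

```python
from typing import List

def total(values: List[int]) -> int:
    """Type hints document intent; they are not enforced at runtime."""
    return sum(values)

# Unpacking generalizations: several * or ** in one display.
head = [1, 2]
tail = [3, 4]
merged = [*head, *tail, 5]
options = {**{"debug": False}, **{"debug": True, "level": 2}}

class Vec:
    def __init__(self, *components):
        self.components = components

    def __matmul__(self, other):
        # Use @ for a dot product in this toy class.
        return sum(a * b for a, b in zip(self.components, other.components))

dot = Vec(1, 2, 3) @ Vec(4, 5, 6)
print(merged, options, dot, total(merged))
```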

3.6: December 23, 2016

  • f-strings
  • kwargs and class attributes order is preserved
  • dicts happen to be (but are not guaranteed to be) ordered
  • underscores in numeric literals
  • variable annotations
  • secrets module in stdlib

Full list of 3.6 changes.
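And a similar sketch for 3.6. All of the names and values are invented for the example.

```python
import secrets

name = "world"
greeting = f"Hello, {name}!"      # f-string

population = 7_500_000            # underscores in numeric literals

retries: int = 3                  # variable annotation

def keyword_order(**kwargs):
    # kwargs keeps the order in which the arguments were passed.
    return list(kwargs)

order = keyword_order(zebra=1, apple=2)

# secrets is meant for security-sensitive random values.
token = secrets.token_hex(8)      # 8 random bytes as 16 hex digits

print(greeting, population, retries, order, len(token))
```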

A Python gargoyle

Monday 26 February 2018

In the #python IRC channel today, someone asked:

Does anyone know of any libraries that could convert ‘1-5,7,9,10-13’ to [1,2,3,4,5,7,9,10,11,12,13] ?

This seemed like an interesting challenge, and people started offering code. This was mine. It’s ungainly and surprising, and I wouldn’t want to keep it, so I call it a gargoyle:

[
    i for p in s.split(',')
    for a, _, b in [p.partition('-')]
    for i in range(int(a), int(b or a)+1)
]

There are a few things going on here. First, this is a list comprehension, but with three nested loops. A simple list comprehension has this form:

result = [ EXPR for NAMES in ITERABLE ]

which is the same as this code:

result = []
for NAMES in ITERABLE:
    result.append(EXPR)

For many, the list comprehension seems kind of backwards, where the expression comes first, before the loop produces it. Then the multi-loop form can seem like another surprise:

result = [ EXPR for NAMES1 in ITERABLE1 for NAMES2 in ITERABLE2 ]

which is equivalent to:

result = []
for NAMES1 in ITERABLE1:
    for NAMES2 in ITERABLE2:
        result.append(EXPR)

The first time I ever tried to write a double list comprehension, I thought the loops should go in the other order, in keeping with the Yoda-style of EXPR coming first. They don’t.
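A concrete pair of loops makes the ordering easy to check. In this sketch the values are arbitrary; the point is that the leftmost `for` in the comprehension is the outermost loop.

```python
# Comprehension form: loops read left to right, outermost first.
pairs = [(letter, n) for letter in "ab" for n in range(2)]

# Explicit form, with the loops nested in the same order.
pairs_loop = []
for letter in "ab":
    for n in range(2):
        pairs_loop.append((letter, n))

print(pairs)
```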

Back to the gargoyle. It’s a triply-nested loop (this time formatted a little differently):

[
    i
    for p in s.split(',')
        for a, _, b in [p.partition('-')]
            for i in range(int(a), int(b or a)+1)
]

The first loop splits the number ranges on comma, to produce the individual chunks: ‘1-5’, ‘7’, ‘9’, ‘10-13’. This seems like an obvious first step.

The next loop is the most surprising. Strings in Python have a .partition method, which is super-handy and under-used. It takes a separator, and produces three values: the part of the string before the separator, the separator itself, and the part of the string after the separator. The best thing about .partition is that it always produces three values, even if the separator doesn’t appear in the string. In that case, the first value is the whole string, and the other two values are empty strings:

>>> '1-5'.partition('-')
('1', '-', '5')
>>> '12'.partition('-')
('12', '', '')

This means we can always assign the result to three names. Super-handy.

But list comprehensions can’t have assignments in them, so what to do? Recently, the Python-Ideas mailing list had a thread about adding assignments to list comprehensions, which can simplify complicated comprehensions. In that thread, Stephan Houben pointed out that you can already get the same effect with a cute trick:

for x in [17]:

# has the same effect as:

x = 17

We can explicitly make a one-element list, and “iterate” over it to assign its value to a name. In my gargoyle, we get a and b as the two numbers in the chunk.
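Here is the one-element-list trick on its own, in a smaller comprehension than the gargoyle. The `words` data is made up; `upper` is bound once per word, just as an assignment would do.

```python
words = ["spam", "eggs"]

# "for upper in [w.upper()]" loops exactly once, binding upper.
shouted = [upper + "!" for w in words for upper in [w.upper()]]

print(shouted)
```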

The third loop is where we actually generate numbers. We’ll use range(), and we always want to start from the first number in the chunk. If there was a second number in the chunk (b), then we want to iterate up to and including it, so we need range(a, b+1). If there was no second number, then we act just as if there were a second number, the same as the first number. That is, “7” should behave just like “7-7”. If there was no second number, then .partition will have set b to be the empty string. So “b or a” will give us the second number we need.

To help a little more, here is the gargoyle:

s = '1-5,7,9,10-13'
result = [
    i
    for p in s.split(',')
        for a, _, b in [p.partition('-')]
            for i in range(int(a), int(b or a)+1)
]

and here is the same idea written out as explicit loops:

s = '1-5,7,9,10-13'

result = []
for p in s.split(','):
    a, _, b = p.partition('-')
    for i in range(int(a), int(b or a)+1):
        result.append(i)

I wouldn’t recommend the list comprehension approach in real code, but it’s fun to make gargoyles sometimes.
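For real code, something like the following function is probably closer to what you would want to keep. It is a sketch with a made-up name, `expand_ranges`, and it assumes well-formed input of non-negative integers: there is no error handling.

```python
def expand_ranges(spec):
    """Expand a string like '1-5,7' into a list of ints."""
    numbers = []
    for chunk in spec.split(","):
        first, _, last = chunk.partition("-")
        start = int(first)
        # A bare number such as "7" behaves like "7-7".
        end = int(last) if last else start
        numbers.extend(range(start, end + 1))
    return numbers

expanded = expand_ranges("1-5,7,9,10-13")
print(expanded)
```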

BTW, one of the reasons the original question caught my eye is because has a function that does the opposite.

Coverage 4.5

Sunday 4 February 2018

Just out: v4.5.

There’s one new feature: configurator plug-ins, that let you run Python code at startup to set the configuration for coverage. This side-steps a requested feature to have different exclusion pragmas for different versions of Python.

People wanted to be able to say, this line of code is excluded from coverage when run under Python 3, but not under Python 2. That sounds simple enough, but then some wanted to be able to exclude for Python 3.5, but not 3.6. Or, excluded when running under PyPy, but not under CPython.

I could see this turning into a never-ending road of finer and finer differentiation. Next would be operating systems, or versions of Django, or, etc, etc, etc.

Rather than me building all that into coverage itself, now you can write your own plug-in that makes all those determinations, and sets the exclude pragmas as you like.
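As a rough sketch, a configurer that adds an extra exclusion pragma only under Python 3 might look like this. The class name and the pragma text are invented. The `coverage_init`, `add_configurer`, `configure`, `get_option` and `set_option` names follow the plug-in API as I understand it, so check the coverage documentation before relying on them. A real plug-in would normally subclass `coverage.CoveragePlugin`; that is left out here so the sketch runs without coverage installed, and `StubConfig` and `StubRegistry` exist only to exercise it.

```python
import sys

class VersionExclusions:
    """Hypothetical configurer: adds an exclusion pragma only on Python 3."""

    def configure(self, config):
        # "report:exclude_lines" holds the regexes that mark excluded lines.
        exclude_lines = list(config.get_option("report:exclude_lines"))
        if sys.version_info[0] == 3:
            # Made-up pragma, for illustration only.
            exclude_lines.append(r"#\s*pragma: no cover py3")
        config.set_option("report:exclude_lines", exclude_lines)

def coverage_init(reg, options):
    """Entry point called for modules listed as plug-ins."""
    reg.add_configurer(VersionExclusions())

class StubConfig:
    """Stand-in for the config object, for demonstration only."""

    def __init__(self):
        self.options = {"report:exclude_lines": [r"#\s*pragma: no cover"]}

    def get_option(self, name):
        return self.options[name]

    def set_option(self, name, value):
        self.options[name] = value

class StubRegistry:
    """Stand-in for the plug-in registry, for demonstration only."""

    def __init__(self):
        self.configurers = []

    def add_configurer(self, plugin):
        self.configurers.append(plugin)

registry = StubRegistry()
coverage_init(registry, {})
stub = StubConfig()
for configurer in registry.configurers:
    configurer.configure(stub)
print(stub.options["report:exclude_lines"])
```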

Python’s misleading readability

Tuesday 23 January 2018

One of the things that has made Python successful is its readability. Code is clear and easy to understand. One of the reasons is that Python uses words for a few things that other languages use symbols for. But sometimes the readability is misleading. Beginners construct valid Python expressions that don’t do what they seem like they should do.

Let’s say you want to know if your variable x is equal to 17. You could do:

if x is 17:

This might work. But then if you try:

if name is "Ned":

it doesn’t work? What!? Why not? It’s so clear.

The problem is that “is” doesn’t check two values for equality, it checks if the left and right side are precisely the same object. But you can have two different string objects, each of which has the value “Ned”. You don’t use “is” to check equality, you use “==”:

if name == "Ned":

It’s not just strings: numbers can also do surprising things:

>>> 1000 + 1 is 1001

“x is 17” is more English-like than “x == 17”, but it isn’t right. This is one time that Python’s famed readability leads you to the wrong construct.
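The difference between identity and equality is easiest to see with lists. Whether two equal strings or numbers turn out to be the same object depends on implementation details such as caching, but two separately built lists are always distinct objects. A minimal sketch:

```python
first = [1, 2, 3]
second = [1, 2, 3]   # a separate list with equal contents
alias = first        # another name for the same object

same_value = first == second      # equality compares contents
same_object = first is second     # identity compares the objects themselves
alias_is_first = alias is first

print(same_value, same_object, alias_is_first)
```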

Another example: you need to know if the answer was either “y” or “yes”, so you try this:

if answer == "y" or "yes":
    print("Thanks.")

and now your program doesn’t work. No matter what answer is, it prints “Thanks.” Why?

The “or” operator is for combining boolean (true/false) expressions. The result is true if either of its operands is true. So your code is equivalent to:

if (answer == "y") or ("yes"):

If answer is “y”, then the if will be true. If answer isn’t “y”, then the or will consider the right-hand side, “yes”. Strings are true if they are not empty, so “yes” is always true. So the if condition will always be true, no matter what value answer has.

The right ways to do this are:

if answer == "y" or answer == "yes":

or if you want to be fancier,

if answer in {"y", "yes"}:

(a list or a tuple would also work here instead of a set, though then you get into philosophical debates about how many data structures fit on the head of a pin.)
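To see the difference side by side, here is a sketch with both versions wrapped in functions. The function names are made up.

```python
def is_yes_buggy(answer):
    # Parsed as (answer == "y") or ("yes"); "yes" is non-empty, so truthy.
    return bool(answer == "y" or "yes")

def is_yes(answer):
    return answer in {"y", "yes"}

# The "or" expression evaluates to the string itself, not to False.
raw = ("no" == "y" or "yes")

results = [(a, is_yes_buggy(a), is_yes(a)) for a in ["y", "yes", "no"]]
print(repr(raw), results)
```

The buggy version says yes to everything, including “no”.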

Don’t get me wrong, I agree that Python is very readable. And every language has constructs that seem like they should work, but don’t. You have to study well, and be careful to use your chosen language properly.

