Wicked hack: Python bytecode tracing

Friday 11 April 2008

Something I’ve been noodling on since PyCon is how to improve code coverage testing in Python, in particular, finding a way to measure bytecode execution instead of just line execution. After a fruitful investigation, I know a lot more about how CPython executes code, and I’ve found a way to trace each bytecode.

As I mentioned in the Hidden Conditionals section of Flaws in coverage measurement, measuring line execution misses details within the lines. My example was:

def implied_conditional(a):
    if (a % 2 == 0) or (a % 0 == 0):
        print "Special case"
    return a+2

# 100% coverage:
implied_conditional(0) == 2
implied_conditional(2) == 4

A line coverage tool says that line 2 was executed, but never points out that a divide by zero error is lurking there.
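To make the lurking error concrete, here's a quick sketch (Python 3 syntax, with the print replaced by a `pass` so it runs standalone) showing that the two "covering" calls short-circuit before the second operand ever runs, while an odd argument reaches it and raises ZeroDivisionError:

```python
def implied_conditional(a):
    # Same shape as the example above; the print is elided.
    if (a % 2 == 0) or (a % 0 == 0):
        pass
    return a + 2

# The two calls that give 100% line coverage: `a % 2 == 0` is true,
# so `or` short-circuits and `a % 0` is never evaluated.
assert implied_conditional(0) == 2
assert implied_conditional(2) == 4

# An odd argument falls through to the second operand and divides by zero.
raised = False
try:
    implied_conditional(1)
except ZeroDivisionError:
    raised = True
assert raised
```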

Line tracing and beyond

The problem is that Python’s tracing facility (sys.settrace) is based on source lines. Your callback function is invoked for each line executed. At PyCon, Matt Harrison floated the possibility of a source transformation tool which would take your Python code and rewrite it so that the operations were spread out over more lines so that the trace function would be invoked more often. This would allow for tracing of the operations within lines.

It’s an intriguing idea, but seems difficult and risky: the rewriting process could introduce errors, and there could be constructs which can’t be pulled apart successfully.

I thought a better approach would be to modify the Python interpreter itself. If we could have the interpreter call a tracing function for each bytecode, we’d have an authoritative trace without intricate code munging. This approach has a few problems of its own:

  • Hacking the C code of the Python interpreter is not a simple task.
  • The change might not even be technically feasible.
  • Even if we did write a patch, getting those changes accepted by the core team is not easy if their perception of the costs and benefits is different than yours.
  • Even if the changes were accepted, it would be some future version of the Python interpreter which supported the cool new feature.

But I was interested enough to explore the possibility, so I went digging into the Python interpreter sources to see how sys.settrace did its work. I found the answer to how it works, and also a cool trick to accomplish bytecode tracing without modifying the interpreter.

Python line numbers

The bytecode interpreter invokes the trace function every time an opcode to be executed is the first opcode on a new source line. But how does it know which opcodes those are? The key is the co_lnotab member in the code object. This is a string, interpreted as pairs of one-byte unsigned integers. To re-use the example from The Structure of .pyc Files, here’s some bytecode:

  1           0 LOAD_CONST               4 ((1, 0))
              3 UNPACK_SEQUENCE          2
              6 STORE_NAME               0 (a)
              9 STORE_NAME               1 (b)

  2          12 LOAD_NAME                0 (a)
             15 JUMP_IF_TRUE             7 (to 25)
             18 POP_TOP
             19 LOAD_NAME                1 (b)
             22 JUMP_IF_FALSE           13 (to 38)
        >>   25 POP_TOP

  3          26 LOAD_CONST               2 ('Hello')
             29 PRINT_ITEM
             30 LOAD_NAME                0 (a)
             33 PRINT_ITEM
             34 PRINT_NEWLINE
             35 JUMP_FORWARD             1 (to 39)
        >>   38 POP_TOP
        >>   39 LOAD_CONST               3 (None)
             42 RETURN_VALUE

and here’s its line number information:

   firstlineno 1
   lnotab 0c010e01

The lnotab bytes are pairs of small integers, so this entry represents:

0c 01: (12, 1)
0e 01: (14, 1)

The two numbers in each pair are a bytecode offset delta and a source line number delta. The firstlineno value of 1 means that the bytecode at offset zero is line number 1. Each entry in the lnotab is then a delta to the bytecode offset and a delta to the line number to get to the next line. So bytecode offset 12 is line number 2, and bytecode offset 26 (12+14) is line number 3. The line numbers at the left of the disassembled bytecode are computed this way from firstlineno and lnotab.

(There are more details to deal with deltas larger than 255. Complete info is in the CPython source: compile.c, search for “All about a_lnotab”.)
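Decoding a lnotab this way takes only a few lines. Here's a sketch (Python 3, where indexing a bytes object yields ints; it ignores the escape entries for deltas larger than 255):

```python
def decode_lnotab(firstlineno, lnotab):
    """Expand a co_lnotab byte string into (bytecode offset, line number) pairs."""
    pairs = [(0, firstlineno)]
    offset, lineno = 0, firstlineno
    for i in range(0, len(lnotab), 2):
        offset += lnotab[i]      # bytecode offset delta
        lineno += lnotab[i + 1]  # source line number delta
        pairs.append((offset, lineno))
    return pairs

# The example's lnotab, 0c010e01, with firstlineno 1:
print(decode_lnotab(1, b"\x0c\x01\x0e\x01"))
# [(0, 1), (12, 2), (26, 3)]
```

The result matches the left-hand line numbers in the disassembly above: offset 0 is line 1, offset 12 is line 2, offset 26 is line 3.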

As the Python interpreter executes bytecodes, it examines the offsets against this map, and when the source line number that results is different than for the previous bytecode, it calls the trace function.

Here’s where the hack comes in: what if we lie about line numbers? What would happen if we change the .pyc file to have a different mapping of bytecode offsets to line numbers?

A simple program to trace

To set up the test, here’s a sample.py:


a, b = 1, 0
if a or b:
    print "Hello", a

and here’s tracesample.py:


import sys

def trace(frame, event, arg):
    if event == 'line':
        print frame.f_code.co_filename, frame.f_lineno
    return trace


sys.settrace(trace)

import sample   # import sample to produce a sample.pyc file...

Running tracesample.py gives this output:

C:\ned\sample.py 1
C:\ned\sample.py 2
C:\ned\sample.py 3
Hello 1

As each line is executed, my trace function is invoked, and it digs into the frame object to get the filename and line number. From the output, we can see that we executed lines 1, 2, and 3 in turn.
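The same mechanism, written for Python 3 and collecting the events in a list instead of printing them (the function and variable names here are just for illustration), looks like this:

```python
import sys

line_events = []

def trace(frame, event, arg):
    # The 'line' event fires at the first opcode of each new source line.
    if event == 'line':
        line_events.append(frame.f_lineno)
    return trace

def sample():
    a, b = 1, 0
    if a or b:
        greeting = "Hello %d" % a
    return greeting

sys.settrace(trace)
result = sample()
sys.settrace(None)

assert result == "Hello 1"
# One event per executed source line of sample()'s body.
assert len(line_events) == 4
```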

Lying about line numbers

To lie about line numbers, I wrote a small tool to rewrite .pyc files. It copies everything verbatim, except it changes the firstlineno and lnotab entries. As a simple proof of concept, we’ll make the lnotab map claim that every byte offset is a new line number: it will consist entirely of (1,1) entries. And because byte offsets start at zero, I’ll change the firstlineno entry to zero. Here’s hackpyc.py:


""" Wicked hack to get .pyc files to do bytecode tracing
    instead of line tracing.

import dis, marshal, new, sys, types

class PycFile:
    def read(self, f):
        if isinstance(f, basestring):
            f = open(f, "rb")
        self.magic = f.read(4)
        self.modtime = f.read(4)
        self.code = marshal.load(f)
    def write(self, f):
        if isinstance(f, basestring):
            f = open(f, "wb")
        marshal.dump(self.code, f)

    def hack_line_numbers(self):
        self.code = hack_line_numbers(self.code)
def hack_line_numbers(code):
    """ Replace a code object's line number information to claim that every
        byte of the bytecode is a new line.  Returns a new code object.
        Also recurses to hack the line numbers in nested code objects.
    n_bytes = len(code.co_code)
    new_lnotab = "\x01\x01" * (n_bytes-1)
    new_consts = []
    for const in code.co_consts:
        if type(const) == types.CodeType:
    new_code = new.code(
        code.co_argcount, code.co_nlocals, code.co_stacksize, code.co_flags,
        code.co_code, tuple(new_consts), code.co_names, code.co_varnames,
        code.co_filename, code.co_name, 0, new_lnotab
    return new_code

def hack_file(f):
    pyc = PycFile()


This is fairly straightforward; the only hiccup was that code objects' members are read-only, so I couldn't just update the parts I wanted: I had to create a new code object with new.code.
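The read-only restriction is easy to demonstrate, and it still holds in today's Python 3, where types.CodeType.replace() (added in 3.8) plays the role that new.code does above. A small illustration:

```python
def f():
    return 42

# Code object members are read-only: direct assignment fails.
try:
    f.__code__.co_filename = "other.py"
    readonly = False
except AttributeError:
    readonly = True
assert readonly

# Since Python 3.8, CodeType.replace() is the supported way to build
# a modified copy; the original code object is left untouched.
new_code = f.__code__.replace(co_filename="other.py")
assert new_code.co_filename == "other.py"
assert f.__code__.co_filename != "other.py"
```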

Running “hackpyc.py sample.pyc” rewrites sample.pyc to lie about its line numbers. Now running tracesample.py produces:

C:\ned\sample.py 0
C:\ned\sample.py 3
C:\ned\sample.py 6
C:\ned\sample.py 9
C:\ned\sample.py 12
C:\ned\sample.py 15
C:\ned\sample.py 25
C:\ned\sample.py 26
C:\ned\sample.py 29
Hello C:\ned\sample.py 30
C:\ned\sample.py 33
1 C:\ned\sample.py 34

C:\ned\sample.py 35
C:\ned\sample.py 39
C:\ned\sample.py 42

Here the “line number” in the trace function is really a bytecode offset, and the interpreter invokes the trace function for every bytecode executed!

We can see that execution jumped from 15 to 25, skipping the bytecodes that examine the b variable. This is just the sort of detail about execution that line-oriented coverage measurement could never tell us.

Where does this leave us?

As I see it, these are the good things about this technique:

  • It works.
  • It doesn’t require complex manipulation of source code.
  • It doesn’t require changes to the Python interpreter, with all of the difficulties that could bring.

Problems with this so far:

  • I’m sure some people will find it distasteful.
  • It needs to compute more useful fake line numbers. These bytecode offsets are measured from the start of each code object, and a real .pyc file contains many distinct code objects, so every one of them starts again at zero; we need some way of disambiguating the offsets. Also, the fake line number is the byte offset of the bytecode, not its ordinal position, so there are gaps in the numbers that can only be understood by examining the bytecode.
  • If the Python interpreter changes its line-numbering mechanism, this technique could break completely.
  • There’s no way to correlate the bytecodes back to source constructs other than the original source lines they came from. In our small example, we only know it was b that was never examined by looking at the disassembled bytecodes.
  • The original line number information is lost. It would have to be stored off to the side to make this useful.

This is only a quick demonstration of a technique; it isn't useful yet. I think it could be made useful, though.

PS: As a result of this investigation, I also think it would be simple to patch the interpreter to call a trace function on every bytecode.


If you want to try "complex manipulation of source code", take a look at my python4ply. I developed it in part to experiment with that possibility. The tutorial shows one way to annotate line numbers. My need for branch coverage wasn't clear enough to figure out what I wanted from it. For example, should "2/0+3" show that the "+3" is never executed?
Maybe you could work with the authors of decompyle to be able to relate bytecode coverage back to Python sub-statement coverage.
It would be great to spit out a colourised version of one's script which highlighted in red the parts of the script that were not covered.

- Paddy.
Wow. That's some crazy stuff, Ned. Very cool.
I considered an advanced source transformation for the coverage langlet originally (transforming boolean and/or operations into nested if-statements) but realized it was cheap to just wrap expressions within boolean operations together with a side-effect which gets monitored externally.

Maybe something like that can be applied to coverage.py as well, without rewriting it entirely or heavy lifting?

I'm scared about lnotab manipulations. It's easy to violate the monotonicity conditions that the line number reconstruction algorithm, which is very compact and optimized, relies on. Having correct line number information is extremely important when running tests.

There is a really old tool created by Brian Marick (it's one of the original path/stmt testing tools) for gcc that did path/stmt coverage called gct. You can take a look at http://www.testing.com/tools.html for some of the reporting/communicating strategies used to tell developers what was & wasn't covered. Getting users to understand what the coverage means is always a tricky part.

Gct had support for weak mutation testing as well. My old advisor at Purdue, DeMillo, did a lot of the pioneering work on the Competent Programmer Hypothesis and subsequently on how simple mutation testing is an effective strategy for ensuring that a test suite is solid.

It could take you in a completely different direction from just observing the coverage, but since you are already manipulating the bytecode you might want to look into mutation testing.

The short story on mutation testing is along the lines of testing a conditional like (X < 9) is actually a tristate. One test coverage state where X is less than 9, one where X is greater than 9, and another where X is equal to 9. The weak mutation testing piece is to validate that your test suite can detect a coding error of < instead of <= with the theory being things like off by one errors are some of the most common yet most insidious to find.
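The off-by-one mutation described here can be sketched like so (hypothetical predicate names, Python 3):

```python
def original(x):
    return x < 9

def mutant(x):
    return x <= 9   # the off-by-one coding error

# A suite that never probes the boundary cannot tell the two apart,
# so it fails to "kill" the mutant:
weak_suite = [0, 20]
assert all(original(x) == mutant(x) for x in weak_suite)

# Adding the boundary value x == 9 distinguishes them:
assert original(9) != mutant(9)
```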

John Cavanaugh
Could you use the line number as a fixed point float instead of an integer and the post-decimal bit being the bytecode-on-that-line ?
I'm poking at an implementation, will post here with a diff if I come up with anything interesting. For code coverage another interesting bit would be a list of how many bytes make up each line of code; then you could compare that to your trace to figure out coverage.
Here, sub this in for your ' new_lnotab = "\x01\x01" * (n_bytes-1)' :

bytes = 0
new_lnotab = ''
for b in range(len(code.co_lnotab)/2):
    old_lnotab_b = ord(code.co_lnotab[b*2])
    new_lnotab += "\x01\xFF\x00\xFF\x00\xFF\x00\xEC"
    new_lnotab += "\x01\x01" * (old_lnotab_b)
    bytes += old_lnotab_b

new_lnotab += "\x01\x01" * (n_bytes-bytes)
Ignore the previous, it's got bugs. This one is much better (everything after the 'for' is indented, despite appearances):

lb_ranges = [ ord(code.co_lnotab[b*2]) for b in range(len(code.co_lnotab)/2) ]
lb_ranges += [ n_bytes - sum(lb_ranges) ]
prev_lb = 0
new_lnotab = ''
for lb in lb_ranges:
new_lnotab += "\x00\xFF\x00\xFF\x00\xFF\x00"
new_lnotab += chr(0xEB - prev_lb)
new_lnotab += "\x01\x01" * lb
prev_lb = lb
Tons of academic work has been done (a tiny bit by me) on transformation and tracing in Java byte-codes. There are also several libraries available for Java byte-code instrumentation. I'm sure a lot of this work would be applicable to Python byte-code. My advisor's group has several well-developed Java tools at:


Sofya, in particular, is a very well developed Java byte-code instrumenter.

I can say from experience that building these tools is tricky and a lot of work. But, in general, it's lots easier than transforming source code. These kinds of tools are incredibly useful, and it looks like you've made a great start. Good luck.
PJ: Thanks for the code frag. For those who haven't tried it: it assigns line numbers so that original source lines are 1000, 2000, 3000, with increments for each byte within the source line.
I've extended your idea for a bytecode tracer and used it inside Pythoscope. It rewrites modules and functions on the fly and allows tracing of all calls from Python into C. If you're interested in details, please read:


Anyway, cheers and thanks for this blog post! Without it I would be stuck on this problem for a long time.
