Byterun, and making cells

Thursday 24 January 2013

I started a new side project this month, called Byterun. It’s a pure-Python implementation of Python bytecode execution. That is, it runs Python bytecode.

I did it because has some bugs in its branch coverage. It analyzes your program for potential branches by reading the bytecode rather than the source, and I think some of the bugs are due to subtle misunderstandings on my part about how Python bytecode works.

So I figured if I could have a working implementation of bytecode in Python, it would be a good way to be sure I understood how they work. I found a ten-year-old implementation called by Paul Swartz, and started refurbishing it, fixing bugs, adding missing opcodes, and bringing it up to 2.7 and 3.3.

It isn’t finished yet, and supporting Python 2 and Python 3 might be a bit difficult, but already it has helped me understand a few things better, such as how generators are suspended and resumed: when the generator function yields, instead of discarding the stack frame as a normal function’s return does, the frame object is held by the generator object, so the next time a value is needed, the frame can be resumed just where it was.

Closure cells make more sense now too, and I know how to create them by hand. I was creating Python function objects with types.FunctionType. But one of its arguments is “closure”, which must be a tuple of cell objects. The types module has ways of making functions, generators, classes, and so on, but has no way to make a cell.

Turns out you can do it by creating a closure and grabbing its cells:

def make_cell(value):
    # Construct an actual cell object by creating a closure right here,
    # and grabbing the cell object out of the function we create.

    return (lambda x: lambda: x)(value).func_closure[0]

A fine bit of Python haiku there...


Malcolm Tredinnick 5:22 PM on 24 Jan 2013
Enjoy yourself with this. As you've probably discovered in the past, it can become amazingly addictive to play with Python at the bytecode level. Seems to happen to me every three or four years and I always come away a little bit smarter.
def make_cell(value):
    return (lambda: value).__closure__[0]
'value' is a cellvar in make_cell, such that bytecode to load it in make_cell would use LOAD_DEREF instead of LOAD_FAST. Thus, as far as I can see, there's no need for a nested lambda.

In Cpython you can also use ctypes to call PyCell_New:
import ctypes
make_cell = ctypes.pythonapi.PyCell_New
make_cell.restype = ctypes.py_object
make_cell.argtypes = [ctypes.py_object]
*blink* That's a really cool trick for creating a cell object in a portable way. I had never thought of creating a throwaway function and then ensuring the cell object survived. Very nice!
@anon: you are right. The double-lambda started life in a larger function, and when I moved it into its own function, I overlooked that I could remove one level of lambdafication.

@Nick (et al): the nut of the idea came from Alex Gaynor (of course...)
I'm glad someone takes bytecode seriously.
So much more can be elucidated when you drop from symbolic to bytecode level...
p.s. pypy evaluator is perhaps too complex for the task?

Add a comment:

Ignore this:
Leave this empty:
Name is required. Either email or web are required. Email won't be displayed and I won't spam you. Your web site won't be indexed by search engines.
Don't put anything here:
Leave this empty:
Comment text is Markdown.