How many instructions in a print statement?

Tuesday 16 July 2013

Hanging out in IRC channels, you see a lot of the same discussions pop up over and over again. One involves people who want to be “close to the metal.” Either they want “no dependencies” (other than Python itself, which is a large dependency), or they feel like they need to know how things “really work” so they want to use sockets instead of Flask, or something.

Today that topic came up again, and the low-level proponent said it was important to know what’s happening in the CPU when you do “print x”. My feeling is, modern CPUs are hella-complicated beasts, I have no idea how they work, and it hasn’t hindered me.

He thought you should at least have a rough idea of the instruction count for something like that. I asked him to tell me what he thought it was. He guessed 500 instructions for “print x” if x was an integer. I guessed that a) he was off by a factor of at least 10, and b) that we were both making incredibly wild guesses.

Conceptually, printing an integer isn’t much work, but keep in mind that print has to find sys.stdout, and manipulate reference counts, and convert the int to a string, and deal with output buffering, etc, not to mention the general mechanisms of Python bytecode interpretation, memory management, and so on.
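To make that list of steps a little more concrete, here is a minimal sketch of the visible parts of printing an integer, written out by hand. It is only a rough Python-level approximation, not what CPython actually does in C, and it uses Python 3 syntax rather than the Python 2 print statement from this post. The function name `manual_print` is made up for illustration.

```python
import io
import sys


def manual_print(value, stream=None):
    """Approximate, step by step, what printing one value involves."""
    # Step 1: find the output stream (looked up at call time, like print does).
    if stream is None:
        stream = sys.stdout
    # Step 2: convert the integer to its string form.
    text = str(value)
    # Step 3: hand the text and a newline to the buffered stream object.
    stream.write(text)
    stream.write("\n")


# Usage: capture the output in memory instead of writing to the terminal.
buffer = io.StringIO()
manual_print(1, stream=buffer)
captured = buffer.getvalue()
```

Each of those steps hides more work (attribute lookups, reference counting, memory allocation for the new string), which is where the instructions pile up.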

OK, so we had our two guesses, how to actually measure? Linux has “perf stat” which can measure all sorts of performance statistics, including number of instructions executed.

I wrote a simple Python program:

import sys
x = 1
for i in range(int(sys.argv[1])):
    print x

Running this, I can change the number of print statements from the command line, and see how many instructions result by running it under perf stat:

ned@ubuntu:~$ perf stat python 10

 Performance counter stats for 'python 10':

         11.913667 task-clock                #    0.883 CPUs utilized          
                21 context-switches          #    0.002 M/sec                  
                 0 cpu-migrations            #    0.000 K/sec                  
             1,221 page-faults               #    0.102 M/sec                  
        33,379,047 cycles                    #    2.802 GHz                    
        19,506,536 stalled-cycles-frontend   #   58.44% frontend cycles idle   
   <not supported> stalled-cycles-backend  
        28,821,962 instructions              #    0.86  insns per cycle        
                                             #    0.68  stalled cycles per insn
         6,345,082 branches                  #  532.588 M/sec                  
           292,467 branch-misses             #    4.61% of all branches        

       0.013497566 seconds time elapsed

So, 28 million instructions for that program. Running it again, I saw that the total instruction count fluctuates quite a bit. So I ran it 10 times to get an average: 28,696,694 instructions for 10 print statements.
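The repeated runs could be automated along these lines. This is a sketch, not what I actually ran: it assumes perf's CSV output mode (`-x,`) writes lines to stderr with the counter value in the first field and the event name in the third, and the helper names and the script name `prints.py` are hypothetical.

```python
import subprocess


def parse_instruction_count(perf_csv_output):
    """Return the instruction count from `perf stat -x,` output, or None."""
    for line in perf_csv_output.splitlines():
        fields = line.split(",")
        # Assumed CSV layout: value, unit, event name, ...
        if len(fields) >= 3 and fields[2].startswith("instructions"):
            try:
                return int(fields[0])
            except ValueError:
                # e.g. "<not supported>" when the counter is unavailable
                return None
    return None


def average_instructions(command, runs=10):
    """Run `command` under perf stat `runs` times and average the counts."""
    counts = []
    for _ in range(runs):
        result = subprocess.run(
            ["perf", "stat", "-x,", "-e", "instructions"] + list(command),
            stdout=subprocess.DEVNULL,  # discard the program's own output
            stderr=subprocess.PIPE,     # perf stat reports on stderr
            text=True,
        )
        count = parse_instruction_count(result.stderr)
        if count is not None:
            counts.append(count)
    return sum(counts) / len(counts) if counts else None
```

Something like `average_instructions(["python", "prints.py", "10"])` would then give the average for 10 print statements. perf stat also has a `-r` (`--repeat`) option that repeats the command and reports the mean itself, which may be simpler if it is available on your system.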

Then I ran it 10 times with 11 print statements, for an average of 28,705,257, or a difference of 8,563 instructions for the one extra print statement.

Then I ran it 10 times with 30 print statements, averaged, subtracted, and divided by 20, which should give me another per-print statement instruction count. This time it came out to 10,518 instructions per additional print statement.
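The arithmetic is just a difference of averages divided by the difference in print counts. A small sketch, using the averages for 10 and 11 print statements quoted above (the function name is made up for illustration):

```python
def instructions_per_print(count_a, prints_a, count_b, prints_b):
    """Estimate the per-print cost from two averaged instruction counts."""
    return (count_b - count_a) / (prints_b - prints_a)


avg_10_prints = 28696694  # average over 10 runs with 10 print statements
avg_11_prints = 28705257  # average over 10 runs with 11 print statements

per_print = instructions_per_print(avg_10_prints, 10, avg_11_prints, 11)
print(per_print)  # 8563.0
```

Subtracting two runs cancels out the fixed cost of starting the interpreter, which is the bulk of the 28 million instructions. The 30-print measurement uses the same formula with a divisor of 20.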

What did we learn?

  • Linux has some cool tools.
  • Measuring instruction counts is an inexact science.
  • There’s a lot more going on in a print statement than some people think.
  • Printing an integer in Python takes roughly 10,000 instructions.

Finally, does this matter? I claim that if you want to think about numbers of instructions, then Python (or any other language of its kind) is not for you. Sure, it’s useful to understand the big picture of what goes into Python execution, but tomorrow when I go to work, how does this help me? It’s important to know things like the performance characteristics of data structures, and have an idea of the forces at work on your system.

But number of instructions? Meh.


- Linux has some cool tools.

These are not Linux specific: hwpmc works on other OSes too.
I'm astounded that there are people who think about such things. The whole point of Python is to NOT worry about such stuff. If it's time-critical, profile (don't guess; you'll guess wrong) and then deal with the results. Same as any other environment.

Knowing in general how things work IS a good idea (because when you don't, you tend to do pathological things), but trying to know what Python does to print an integer? Not useful beyond knowing that I/O isn't cheap.

Yes but this does not negate the fact that Linux has cool tools

@Michael Kohne

It astonishes you that people writing programs are curious how things work under the hood? That astonishes me :). I don't think this curiosity misses the point of working with Python. Not all knowledge is immediately useful on its own, but taken as a whole this knowledge adds to the ability to make informed decisions.
It's not weird that programmers are interested in how something works; it is weird for programmers to feel that it's going to make a blind bit of difference to their usage of it.

Knowing how many instructions a print in Python takes is interesting. Finding it out is kinda interesting too. Trying to use that knowledge in your day-to-day programming is pointless, and it's likely to cause you to draw bad conclusions.
It could be cool to compare with a "hardcoded print", because your example uses a for loop and so deals with more stuff than just the print (list, jump, iterator).

Anyway the question itself is really interesting.
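One way to see the extra loop machinery this comment refers to is to compare the bytecode of a looped print with a "hardcoded" (unrolled) one. The following is only a sketch under assumptions: it uses Python 3 and the standard `dis` module, so the opcodes differ from the Python 2 interpreter measured in the post, and the names `LOOPED`, `UNROLLED`, and `opnames` are made up for illustration.

```python
import dis

# The same three prints, once in a loop and once written out by hand.
LOOPED = "for i in range(3):\n    print(x)\n"
UNROLLED = "print(x)\nprint(x)\nprint(x)\n"


def opnames(source):
    """Compile the source (without running it) and list its opcode names."""
    code = compile(source, "<example>", "exec")
    return [instr.opname for instr in dis.get_instructions(code)]


looped_ops = opnames(LOOPED)
unrolled_ops = opnames(UNROLLED)

# Opcodes that only appear because of the loop (iterator setup, jumps, ...).
loop_only = sorted(set(looped_ops) - set(unrolled_ops))
print(loop_only)
```

The exact opcode names vary between Python versions, but the loop-only set includes the iteration opcode `FOR_ITER`. Note that subtracting two measurements with different loop counts already charges this per-iteration overhead to each "print", so the per-print figures include it.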
Michael Droettboom 4:36 PM on 17 Jul 2013
Interesting post, and I basically agree with the conclusion.

However, out of pure intellectual curiosity, I couldn't resist comparing to the following more-or-less-equivalent program in C:

#include <stdio.h>

#define NUM_LOOPS 30

int main(int argc, char **argv)
{
    int i;
    for (i = 0; i < NUM_LOOPS; ++i) {
        printf("%d\n", i);
    }
    return 0;
}

Using the same methodology of running 10 times, once with 10 loops and then with 30 loops, I get 11,502 instructions per print statement. More or less a tie with your Python result. This suggests to me that the number of instructions is dominated by the system I/O (which should fundamentally be the same in both cases).

EDIT: Note however, when trying to reproduce your original Python example on my machine, I got 55,000 instructions per Python print, so there's obviously some environmental/platform difference such that my numbers shouldn't be directly compared to yours.
With all due respect, running a performance benchmark tens of times isn't going to give you reasonable numbers. You need to run it for, say, a million iterations, or long enough that it lasts a few seconds, and probably run the benchmark, say, three times. I didn't do the latter, but my benchmarks show that 'print x' (where x is 1) takes about 13,000 instructions using Python 2.7, almost 20,000 on Python 3.2, and about 11,000 using C (gcc -O2). Nevertheless, I agree with the conclusion that knowing these numbers isn't going to affect day-to-day coding.
I'd be interested in knowing more about CPU cache hit rate versus various Python programming constructs. Is it possible to increase the effectiveness of a CPU cache in a dynamic language like Python?
I enjoyed this post even though the original question is as silly as asking: if I am traveling at 60 MPH in my car, how many times do the pistons have to move to cover 1 mile? And modern fuel-injected multi-cylinder engines are much less complicated than your run-of-the-mill Intel CPU!
Kind of disappointing that you didn't get into a debugger and look under the hood to see what happens. From what I recall during my assembly language classes, setting up a print statement to stdout was only a few instructions. I would be very interested to know what exactly is happening in python that it requires 4 orders of magnitude more instructions to do the same thing.
@Kevin: the goal here was not to figure out exactly what is going on. I was discussing the idea of having a general sense of the magnitude of the number of operations, I made the point that I had no idea, and when the other guy made a guess, I wanted to see how close each of us was.

Also, I already mentioned what I thought Python was doing under the covers: "print has to find sys.stdout, and manipulate reference counts, and convert the int to a string, and deal with output buffering, etc, not to mention the general mechanisms of Python bytecode interpretation, memory management, and so on."

Python does much more to accomplish the same thing than assembly code does. The tradeoff is that it can do much of the work for you. For most code, it's a good tradeoff.
@Kevin, you're measuring only the instructions used in your program. You almost surely made a system call to do the actual output, and the system might have executed thousands of instructions before it returned to the next instruction in your program.
