
How many instructions in a print statement?

Tuesday 16 July 2013

When you hang out in IRC channels, you see the same discussions pop up over and over again. One involves people who want to be "close to the metal." Either they want "no dependencies" (other than Python itself, which is a large dependency), or they feel they need to know how things "really work," so they want to use sockets instead of Flask, or something.

Today that topic came up again, and the low-level proponent said it was important to know what's happening in the CPU when you do "print x". My feeling is that modern CPUs are hella-complicated beasts; I have no idea how they work, and it hasn't hindered me.

He thought you should at least have a rough idea of the instruction count for something like that, so I asked him what he thought it was. He guessed 500 instructions for "print x" if x was an integer. I guessed a) that he was off by a factor of at least 10, and b) that we were both making incredibly wild guesses.

Conceptually, printing an integer isn't much work, but keep in mind that print has to find sys.stdout, manipulate reference counts, convert the int to a string, and deal with output buffering, not to mention the general machinery of Python bytecode interpretation, memory management, and so on.
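To make a couple of those steps concrete, here is a small sketch. It uses Python 3, where print is a function rather than the Python 2 statement measured in this post, so the bytecode differs. It lists the opcodes CPython compiles `print(x)` into, and shows that print finds sys.stdout at call time and converts the int to text. The helper names `bytecode_ops` and `print_to_string` are made up for illustration.

```python
import dis
import io
import sys


def bytecode_ops(source):
    """Return the names of the opcodes CPython compiles `source` into."""
    code = compile(source, "<demo>", "exec")
    return [instr.opname for instr in dis.get_instructions(code)]


def print_to_string(value):
    """Call print(value) with sys.stdout temporarily replaced by a buffer.

    print() looks up sys.stdout each time it runs, so swapping the attribute
    redirects the output; the int is turned into text with str() on the way.
    """
    buffer = io.StringIO()
    saved = sys.stdout
    sys.stdout = buffer
    try:
        print(value)
    finally:
        sys.stdout = saved
    return buffer.getvalue()


print(bytecode_ops("print(x)"))    # exact opcode names vary by CPython version
print(repr(print_to_string(1)))    # '1\n'
```

The opcode list is short, but each opcode is carried out by the interpreter's C code, so a handful of opcodes can still add up to many machine instructions.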

OK, so we had our two guesses. How to actually measure? Linux has "perf stat", which can measure all sorts of performance statistics, including the number of instructions executed.

I wrote a simple Python 2 program, foo.py:

import sys
x = 1
# The number of print statements to run comes from the command line.
for i in range(int(sys.argv[1])):
    print x

Because the number of print statements comes from the command line, I can vary it and see how many instructions result by running the program under perf stat:

ned@ubuntu:~$ perf stat python foo.py 10
1
1
1
1
1
1
1
1
1
1

 Performance counter stats for 'python foo.py 10':

         11.913667 task-clock                #    0.883 CPUs utilized          
                21 context-switches          #    0.002 M/sec                  
                 0 cpu-migrations            #    0.000 K/sec                  
             1,221 page-faults               #    0.102 M/sec                  
        33,379,047 cycles                    #    2.802 GHz                    
        19,506,536 stalled-cycles-frontend   #   58.44% frontend cycles idle   
   <not supported> stalled-cycles-backend  
        28,821,962 instructions              #    0.86  insns per cycle        
                                             #    0.68  stalled cycles per insn
         6,345,082 branches                  #  532.588 M/sec                  
           292,467 branch-misses             #    4.61% of all branches        

       0.013497566 seconds time elapsed

So, 28 million instructions for that program. Running it again, I saw that the total instruction count fluctuates quite a bit. So I ran it 10 times to get an average: 28,696,694 instructions for 10 print statements.
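Rather than running perf by hand ten times, the runs can be scripted. This is a hypothetical sketch that assumes perf's CSV mode (`-x,`) reports the counter value in the first field and the event name in the third; that layout has varied between perf versions, so check your own output first. The defaults `python` and `foo.py` stand in for whatever interpreter and script you use (this program needs a Python 2 interpreter because of the print statement), and the function names are made up.

```python
import subprocess
from statistics import mean


def parse_instructions(perf_stderr):
    """Return the instruction count from `perf stat -x,` output, or None.

    Assumes the CSV layout puts the counter value in the first field and
    the event name (e.g. "instructions" or "instructions:u") in the third.
    """
    for line in perf_stderr.splitlines():
        fields = line.split(",")
        if len(fields) >= 3 and "instructions" in fields[2]:
            value = fields[0].strip()
            if value.isdigit():   # skips "<not counted>" / "<not supported>"
                return int(value)
    return None


def average_instructions(n_prints, runs=10, python="python", script="foo.py"):
    """Run `python script n_prints` under perf `runs` times; average the counts."""
    counts = []
    for _ in range(runs):
        # perf writes its statistics to stderr; the program's own output
        # goes to stdout and is simply discarded here.
        result = subprocess.run(
            ["perf", "stat", "-x,", "-e", "instructions",
             python, script, str(n_prints)],
            capture_output=True, text=True, check=True,
        )
        count = parse_instructions(result.stderr)
        if count is None:
            raise RuntimeError("perf did not report an instruction count")
        counts.append(count)
    return mean(counts)
```

Calling `average_instructions(11) - average_instructions(10)` would then estimate the cost of one extra print. perf stat also has its own `-r` (`--repeat`) option, which runs the command several times and reports the mean.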

Then I ran it 10 times with 11 print statements, for an average of 28,705,257, or a difference of 8,563 instructions for the one extra print statement.

Then I ran it 10 times with 30 print statements, averaged, subtracted the 10-print average, and divided by the 20 extra prints, which should give another estimate of the per-print instruction count. This time it came out to 10,518 instructions per additional print statement.
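The arithmetic behind both estimates is a difference of averages divided by the difference in print counts: everything that doesn't depend on the number of prints (interpreter startup, imports, shutdown) cancels out in the subtraction. A minimal sketch, with a made-up helper name:

```python
def per_print_cost(base_avg, base_prints, more_avg, more_prints):
    """Estimate instructions per print statement from two averaged runs.

    Fixed costs that don't depend on the number of prints cancel out
    in the subtraction, leaving only the cost of the extra prints.
    """
    return (more_avg - base_avg) / (more_prints - base_prints)


# Averages from the 10- and 11-print runs above.
print(per_print_cost(28_696_694, 10, 28_705_257, 11))   # 8563.0
```

For the 30-print runs, `more_prints` is 30, so the difference is divided by 20. Spreading the difference over more extra prints should make run-to-run noise a smaller share of the estimate.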

What did we learn?

  • Linux has some cool tools.
  • Measuring instruction counts is an inexact science.
  • There's a lot more going on in a print statement than some people think.
  • Printing an integer in Python takes roughly 10,000 instructions.

Finally, does this matter? I claim that if you want to think about numbers of instructions, then Python (or any other language of its kind) is not for you. Sure, it's useful to understand the big picture of what goes into Python execution, but tomorrow when I go to work, how does this help me? It's important to know things like the performance characteristics of data structures, and have an idea of the forces at work on your system.

But number of instructions? Meh.

Facts and myths about Python names and values

Sunday 7 July 2013

I've written a page about how Python names and values work: Facts and myths about Python names and values.

Other people have written about this, but maybe my small effort will help.

As always, my problem with explaining stuff like this is knowing where to start, and how to build, and what to leave out. In other words, trimming and linearizing the concept graph.

As a result, finishing a page like this just creates more drafts in my mind. Who knows where it will end??

One of the challenges with this piece was the diagrams. I finally got a tool chain working well enough with Cog and Graphviz.
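The post doesn't show the tool chain itself. As a rough, hypothetical illustration of the Graphviz half, here is a function that turns a dict of names into DOT text, drawing one box per distinct value so that two names bound to the same object point at the same box. Cog runs Python embedded in a source file and inserts its output, so it could run something like this, but that wiring and the name `names_to_dot` are assumptions, not how the real page is built.

```python
def names_to_dot(namespace):
    """Build Graphviz DOT text showing which names refer to which values.

    Names bound to the same object (same id()) share a single value node.
    """
    lines = ["digraph names {", "    rankdir=LR;"]
    value_nodes = {}   # id(value) -> DOT node id
    for name, value in namespace.items():
        node = value_nodes.get(id(value))
        if node is None:
            # '#' can't appear in a Python name, so these ids can't collide.
            node = f'"value#{len(value_nodes)}"'
            value_nodes[id(value)] = node
            label = repr(value).replace("\\", "\\\\").replace('"', '\\"')
            lines.append(f'    {node} [label="{label}", shape=box];')
        lines.append(f'    "{name}" -> {node};')
    lines.append("}")
    return "\n".join(lines)


nums = [1, 2, 3]
other = nums   # assignment binds a second name; it doesn't copy the list
print(names_to_dot({"nums": nums, "other": other}))
```

Piping the output through `dot -Tpng` would draw both names pointing at a single list box.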

Hashtags for commands

Wednesday 3 July 2013

I was working on creating a command today to run Pylint on only the files I had changed:

pylint $(git diff --name-only $(git merge-base HEAD origin/master))

This works pretty well, but I'm not sure it's ready yet. I want to come back to it, and fiddle with it some. Eventually it might go into a bash alias, but for now, I just want to get it back from history.

Rather than remember some detail from the command, I had an idea to make the command findable again. I could give it a hashtag!

pylint $(git diff --name-only $(git merge-base HEAD origin/master)) #lintdiff

The hash is conveniently a comment character for bash, so I can use any hashtag I like, and later come back to it with ^Rlintdiff.
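^R does the searching interactively, but the same idea works on the saved history file. A minimal sketch, assuming history lives in `~/.bash_history` (the default HISTFILE); note that bash normally writes that file only when the shell exits, so commands from the current session may not be there yet. `find_tagged` and `load_history` are hypothetical helpers.

```python
import os


def find_tagged(history_lines, tag):
    """Return the most recent history line containing `#tag`, or None.

    In an interactive bash (with the default interactive_comments option),
    a # that begins a word starts a comment, so the tag is stored in history
    but ignored when the command runs. A # inside a word, as in a#b, is not.
    """
    marker = "#" + tag
    for line in reversed(history_lines):   # newest entries are at the end
        if marker in line:
            return line
    return None


def load_history(path="~/.bash_history"):
    """Read a bash history file into a list of lines."""
    with open(os.path.expanduser(path), encoding="utf-8", errors="replace") as f:
        return f.read().splitlines()


# Usage: print(find_tagged(load_history(), "lintdiff"))
```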

I'm not sure this is really that useful, since I can make a bash alias right off the bat, but this will let me tweak the command until I like it.
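One possible tweak: the one-liner hands pylint every changed path, including files deleted on the branch and files that aren't Python at all, and `git diff --name-only` prints paths relative to the repository root, so it only works from the top of the repo. This hypothetical Python sketch (the names `git_lines`, `lintable`, and `lintdiff` are made up) filters the list down to existing `.py` files and resolves paths against the root first.

```python
import os
import subprocess


def git_lines(*args):
    """Run a git command and return its non-empty output lines."""
    result = subprocess.run(
        ["git", *args], capture_output=True, text=True, check=True
    )
    return [line for line in result.stdout.splitlines() if line]


def lintable(paths, exists=os.path.exists):
    """Keep only paths that are Python files and still exist on disk."""
    return [p for p in paths if p.endswith(".py") and exists(p)]


def lintdiff(base="origin/master"):
    """Run pylint on .py files changed since HEAD diverged from `base`."""
    root = git_lines("rev-parse", "--show-toplevel")[0]
    merge_base = git_lines("merge-base", "HEAD", base)[0]
    changed = git_lines("diff", "--name-only", merge_base)
    # git prints paths relative to the repo root, so anchor them there.
    paths = lintable([os.path.join(root, p) for p in changed])
    if paths:
        subprocess.run(["pylint", *paths])
    return paths
```

Like the shell version, this compares the working tree against the merge base, so uncommitted edits are included.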
