Created 10 February 2004, last updated 11 January 2015
An older version of this document is also available in Russian.
Cog is a file generation tool. It lets you use pieces of Python code as generators in your source files to generate whatever text you need.
Cog transforms files in a very simple way: it finds chunks of Python code embedded in them, executes the Python code, and inserts its output back into the original file. The file can contain whatever text you like around the Python code. It will usually be source code.
For example, if you run this file through cog:
it will come out like this:
Lines with triple square brackets are marker lines. The lines between [[[cog and ]]] are the generator Python code. The lines between ]]] and [[[end]]] are the output from the generator.
When cog runs, it discards the last generated Python output, executes the generator Python code, and writes its generated output into the file. All text lines outside of the special markers are passed through unchanged.
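As a rough illustration of that cycle, here is a minimal sketch in plain Python. It is not cog's implementation: it assumes the default marker text, captures print() output as a stand-in for cog's own output functions, and the name regenerate is made up for this example.

```python
import io
import textwrap
from contextlib import redirect_stdout


def regenerate(text):
    """Re-run each generator in text and replace its previous output."""
    result = []
    lines = iter(text.splitlines())
    for line in lines:
        result.append(line)  # text outside the markers passes through
        if "[[[cog" not in line:
            continue
        code = []
        for line in lines:  # the generator code stays in the file
            result.append(line)
            if "]]]" in line:
                break
            code.append(line)
        end_line = None
        for line in lines:  # the previous output is discarded
            if "[[[end]]]" in line:
                end_line = line
                break
        if end_line is None:
            raise ValueError("missing [[[end]]] marker")
        captured = io.StringIO()
        # Assumption: print() stands in for cog's output functions.
        with redirect_stdout(captured):
            exec(textwrap.dedent("\n".join(code)), {})
        result.extend(captured.getvalue().splitlines())
        result.append(end_line)
    return "\n".join(result) + "\n"


sample = """\
// header
/*[[[cog
for name in ['a', 'b']:
    print('void %s();' % name)
]]]*/
void stale();
//[[[end]]]
// footer
"""

result = regenerate(sample)
print(result)
```

Because the generator code is kept and only the output area is rewritten, running regenerate on its own result gives the same text again.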
The cog marker lines can contain any text in addition to the triple square bracket tokens. This makes it possible to hide the generator Python code from the source file. In the sample above, the entire chunk of Python code is a C++ comment, so the Python code can be left in place while the file is treated as C++ code.
Cog is designed to be easy to run. It writes its results back into the original file while retaining the code it executed. This means cog can be run any number of times on the same file. Rather than keeping a generator file and a separate output file, cog is typically run with one file serving as both generator and output.
Because the marker lines accommodate any language syntax, the markers can hide the cog Python code from the source file. This means cog files can be checked into source control without worrying about keeping the source files separate from the output files, without modifying build procedures, and so on.
I experimented with using a templating engine for generating code, and found myself constantly struggling with white space in the generated output and mentally translating the Python code I could imagine into its templating equivalent. The advantage of a templating system (that most of the code can be entered literally) was lost as the code generation tasks became more complex and the generation process needed more logic.
Cog lets you use the full power of Python for text generation, without a templating system dumbing down your tools for you.
Cog requires Python 2.6, 2.7, 3.3, 3.4, 3.5, 3.6, or Jython 2.5.
Cog is installed with a standard Python distutils script:
You should now have cog.py in your Python scripts directory.
Cog is distributed under the MIT license. Use it to spread goodness through the world.
Source files to be run through cog are mostly just plain text that will be passed through untouched. The Python code in your source file is standard Python code. Any way you want to use Python to generate text to go into your file is fine. Each chunk of Python code (between the [[[cog and ]]] lines) is called a generator and is executed in sequence.
The output area for each generator (between the ]]] and [[[end]]] lines) is deleted, and the output of running the Python code is inserted in its place. To accommodate all source file types, the format of the marker lines is irrelevant. If the line contains the special character sequence, the whole line is taken as a marker. Any of these lines mark the beginning of executable Python code:
Cog can also be used in languages without multi-line comments. If the marker lines all have the same text before the triple brackets, and all the lines in the generator code also have this text as a prefix, then the prefixes are removed from all the generator lines before execution. For example, in a SQL file, this:
will produce this:
Finally, a compact form can be used for single-line generators. The begin-code marker and the end-code marker can appear on the same line, and all the text between them will be taken as a single Python line:
You can also use this form to simply import a module. The top-level statements in the module can generate the code.
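One way to picture the single-line form is as a pattern match on the line. The sketch below is only an illustration under the default markers: the regular expression and the function name are mine, not cog's, and mygen is a hypothetical module name.

```python
import re

# Begin-code marker, some Python, then the end-code marker, all on one line.
SINGLE_LINE = re.compile(r"\[\[\[cog\s+(.*?)\s*\]\]\]")


def single_line_code(line):
    """Return the Python text of a single-line generator, or None."""
    match = SINGLE_LINE.search(line)
    return match.group(1) if match else None


code = single_line_code("// [[[cog import mygen ]]]")
print(code)
```

A line that carries only the begin-code marker does not match, so it would be handled as the start of a multi-line generator instead.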
If you have special requirements for the syntax of your file, you can use the --markers option to define new markers.
If there are multiple generators in the same file, they are executed with the same globals dictionary, so it is as if they were all one Python module.
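The effect of a shared globals dictionary can be seen with plain exec. This is a sketch of the idea, not of cog's internals, and the two generator strings are invented for the example.

```python
# Two generators from the same file, run in order against one dictionary.
shared_globals = {}
first_generator = "fields = ['id', 'name']"
second_generator = "columns = ', '.join(fields)"

for code in (first_generator, second_generator):
    exec(code, shared_globals)

# The second generator could see the name defined by the first.
print(shared_globals["columns"])
```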
Cog tries to do the right thing with white space. Your Python code can be block-indented to match the surrounding text in the source file, and cog will re-indent the output to fit as well. All of the output for a generator is collected as a block of text, a common whitespace prefix is removed, and then the block is indented to match the indentation of the cog generator. This means the left-most non-whitespace character in your output will have the same indentation as the begin-code marker line. Other lines in your output keep their relative indentation.
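The re-indenting rule can be approximated with the standard textwrap module. This is a sketch of the behavior described above, not cog's code, and reindent is a name chosen for the example.

```python
import textwrap


def reindent(output, marker_line):
    """Dedent the output as a block, then indent it to match the marker line."""
    indent = marker_line[: len(marker_line) - len(marker_line.lstrip())]
    return textwrap.indent(textwrap.dedent(output), indent)


generated = "    if (x) {\n        y();\n    }\n"
fitted = reindent(generated, "  //[[[cog")
print(fitted)
```

The common four-space prefix is removed, the marker's two-space indentation is applied, and the inner line keeps its extra four spaces relative to the others.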
A module called cog provides the functions you call to produce output into your file. The functions are:
Cog is a command-line utility which takes arguments in standard form.
In addition to running cog as a command on the command line:
you can also invoke it as a module with the Python interpreter:
Note that the Python module is called "cogapp".
Files on the command line are processed as input files. All input files are assumed to be UTF-8 encoded. Using a minus for a filename (-) will read the standard input.
Files can also be listed in a text file named on the command line with an @:
These @-files can be nested, and each line can contain switches as well as a file to process. For example, you can create a file cogfiles.txt:
then invoke cog like this:
Now cog will process four files, using C++ syntax for markers on all the C++ files, SQL syntax for the .sql file, and no markers at all on the readme.txt file.
As another example, cogfiles2.txt could be:
with cog invoked like this:
Cog will process template.h twice, creating both data1.h and data2.h. Both executions would define the variable version as "3.4.1", but the first run would have thefile equal to "data1.xml" and the second run would have thefile equal to "data2.xml".
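To make the expansion concrete, here is a small sketch of how an @-file could be turned into one argument list per line. It is an assumption-laden illustration, not cog's code: the function name is mine, the contents written to cogfiles2.txt are guessed from the description above, and -o is assumed to be the switch that names the output file.

```python
import shlex
import tempfile
from pathlib import Path


def expand_at_files(args, base_dir=Path(".")):
    """Return one argument list per line of work, expanding @-files recursively."""
    at_files = [arg[1:] for arg in args if arg.startswith("@")]
    if not at_files:
        return [list(args)]
    switches = [arg for arg in args if not arg.startswith("@")]
    jobs = []
    for name in at_files:
        path = base_dir / name
        for line in path.read_text(encoding="utf-8").splitlines():
            line_args = shlex.split(line)
            if line_args:  # skip blank lines
                # Outer switches apply to every line; nested @-files recurse.
                jobs.extend(expand_at_files(switches + line_args, path.parent))
    return jobs


# Hypothetical contents for cogfiles2.txt, guessed from the text.
with tempfile.TemporaryDirectory() as folder:
    listing = Path(folder) / "cogfiles2.txt"
    listing.write_text(
        "template.h -D thefile=data1.xml -o data1.h\n"
        "template.h -D thefile=data2.xml -o data2.h\n",
        encoding="utf-8",
    )
    jobs = expand_at_files(["-D", "version=3.4.1", "@" + str(listing)])

for job in jobs:
    print(job)
```

Each job carries the command-line switches plus the switches from its own line, which is why both runs see version while thefile differs.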
The -r flag tells cog to write the output back to the input file. If the input file is not writable (for example, because it has not been checked out of a source control system), a command to make the file writable can be provided with -w:
Global values can be set from the command line with the -D flag. For example, invoking Cog on mycode.txt with -D thefile=fooey.xml will run Cog over mycode.txt, but first define a global variable called thefile with a value of "fooey.xml". This variable can then be referenced in your generator code. You can provide multiple -D arguments on the command line, and all will be defined and available.
The value is always interpreted as a Python string, to simplify the problem of quoting. This means that -D NUM_TO_DO=12 will define NUM_TO_DO not as the integer 12 but as the string "12", and the two are not equal in Python. Use int(NUM_TO_DO) to get the numeric value.
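The string-only rule is easy to check in plain Python. The helper below is a sketch of how name=value pairs might be turned into globals; parse_defines is a made-up name, not part of cog.

```python
def parse_defines(pairs):
    """Turn ['NAME=value', ...] into a dict; every value stays a string."""
    defines = {}
    for pair in pairs:
        name, _, value = pair.partition("=")
        defines[name] = value
    return defines


generator_globals = parse_defines(["NUM_TO_DO=12", "thefile=fooey.xml"])

# The defined value is the string "12", so convert it before doing arithmetic.
count = int(generator_globals["NUM_TO_DO"])
print(generator_globals["NUM_TO_DO"] == 12, count == 12)
```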
If cog is run with the -c flag, then generated output is accompanied by a checksum:
If the generated code is edited by a misguided developer, the next time cog is run, the checksum won't match, and cog will stop to avoid overwriting the edited code.
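The protection can be sketched as a hash of the generated lines stored on the end-output marker line. Treat the details as assumptions made for this example: the use of MD5, the "(checksum: ...)" layout, and all three function names are illustrative rather than a statement of what cog does.

```python
import hashlib
import re

# Assumed layout: the digest sits on the end-output marker line.
CHECKSUM = re.compile(r"\(checksum: ([0-9a-f]+)\)")


def checksum(output_lines):
    """Hash the generated lines (MD5 is an assumption of this sketch)."""
    digest = hashlib.md5()
    for line in output_lines:
        digest.update(line.encode("utf-8"))
    return digest.hexdigest()


def end_marker_with_checksum(output_lines, marker="//[[[end]]]"):
    """Build the end-output marker line carrying the digest."""
    return "%s (checksum: %s)" % (marker, checksum(output_lines))


def output_was_edited(output_lines, end_line):
    """True if a stored checksum exists and no longer matches the output."""
    match = CHECKSUM.search(end_line)
    return bool(match) and match.group(1) != checksum(output_lines)


generated = ["void a();\n", "void b();\n"]
end_line = end_marker_with_checksum(generated)
print(output_was_edited(generated, end_line))
print(output_was_edited(["void a();\n", "void hacked();\n"], end_line))
```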
To make it easier to identify generated lines when grepping your source files, the -s switch provides a suffix which is appended to every non-blank text line generated by Cog. For example, with this input file (mycode.txt):
invoking cog like this:
will produce this output:
The -n option lets you tell cog what encoding to use when reading and writing files.
The --verbose option lets you control how much cog should chatter about the files it is cogging. --verbose=2 is the default: cog will name every file it considers, and whether it has changed. --verbose=1 will only name the changed files. --verbose=0 won't mention any files at all.
The --markers option lets you control the syntax of the marker lines. The value must be a string with two spaces in it. The three markers are the three pieces separated by the spaces. The default value for markers is "[[[cog ]]] [[[end]]]".
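Splitting a markers value into its three pieces looks like this in Python. It is a sketch of the rule as stated, with parse_markers as a hypothetical helper name and the curly-brace value invented for illustration.

```python
def parse_markers(value):
    """Split a markers string into (begin code, end code, end output)."""
    parts = value.split(" ")
    if len(parts) != 3:
        raise ValueError("markers value must contain exactly two spaces")
    return tuple(parts)


begin_code, end_code, end_output = parse_markers("[[[cog ]]] [[[end]]]")
custom = parse_markers("{{{cog }}} {{{end}}}")
print(begin_code, end_code, end_output)
print(custom)
```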
The -x flag tells cog to delete the old generated output without running the generators. This lets you remove all the generated output from a source file.
The -d flag tells cog to delete the generators from the output file. This lets you generate code in a public file but not have to show the generator to your customers.
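The difference between the two flags can be seen by filtering an already generated file. The sketch below is a simplification with invented names: it never runs any generators, it assumes the default markers, and it assumes the marker lines count as part of the generator.

```python
def strip_part(text, remove):
    """Drop either the 'output' (like -x) or the 'generators' (like -d)."""
    kept = []
    state = "text"
    for line in text.splitlines():
        part = None
        if state == "text":
            if "[[[cog" in line:
                state, part = "code", "generators"
        elif state == "code":
            part = "generators"
            if "]]]" in line:
                state = "output"
        elif "[[[end]]]" in line:  # state == "output"
            state, part = "text", "generators"
        else:
            part = "output"
        if part != remove:
            kept.append(line)
    return "\n".join(kept) + "\n"


cogged = """\
// header
/*[[[cog
print('void a();')
]]]*/
void a();
//[[[end]]]
// footer
"""

without_output = strip_part(cogged, "output")
without_generators = strip_part(cogged, "generators")
print(without_output)
print(without_generators)
```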
The -U flag causes the output file to use pure Unix newlines rather than the platform's native line endings. You can use this on Windows to produce Unix-style output files.
The -I flag adds a directory to the path used to find Python modules.
The -z flag lets you omit the [[[end]]] marker line, and it will be assumed at the end of the file.
Cog's change log is on a separate change page.
I'd love to hear about your successes or difficulties using cog. Comment here, or send me a note.
There are a handful of other implementations of the ideas in Cog: