Wednesday 28 March 2012
Maybe this is crazy, but I’m looking for advice.
Conceptually, coverage.py is pretty simple. First, using the sys.settrace facility in Python, record every line that is executed. Then, after the program is done, report on those lines, and especially on lines that could have been executed but were not.
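As a rough illustration of the recording half, here is a minimal sketch using sys.settrace. It is not coverage.py's actual tracer; the names `executed`, `tracer`, and `demo` are made up for this example.

```python
import sys
from collections import defaultdict

# file name (as reported by the code object) -> set of executed line numbers
executed = defaultdict(set)


def tracer(frame, event, arg):
    # Record each "line" event under the file name the frame reports.
    if event == "line":
        executed[frame.f_code.co_filename].add(frame.f_lineno)
    return tracer  # keep receiving events for this frame


def demo(n):
    if n > 0:
        return "positive"
    return "non-positive"


sys.settrace(tracer)
try:
    demo(1)
finally:
    sys.settrace(None)  # always switch tracing off again
```

After this runs, `executed` holds the lines of `demo` that ran. The final `return` is absent, which is exactly the kind of line the report would flag.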
Of course, the reality is more difficult. During execution, to record the line, we have to find the file name, which we get from the stack frame. Later, we look for that file by name to create the report. Sometimes, the file isn’t a Python file!
One reason this can happen is if the file was actually created by a tool, and the tool provides the original source file as the reported name. For example, Jinja compiles .html files to Python code, and when the code is running, it claims to be “mytemplate.html”. When coverage.py tries to report on the file, it can’t parse it as Python, and things go wrong.
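To see how a non-Python name ends up in the stack frame, here is a small sketch that imitates a template engine using only the built-in compile(). The generated source and the template text are invented for illustration; this is not Jinja's real output.

```python
import ast

# Stand-in for what a template engine generates from an .html template.
generated_source = "def render():\n    return 'hello'\n"

# The second argument to compile() is the file name the code will report.
code = compile(generated_source, "mytemplate.html", "exec")
namespace = {}
exec(code, namespace)
render = namespace["render"]

# This is the name a tracer would see in frame.f_code.co_filename.
reported_name = render.__code__.co_filename

# Later, the reporter reads the file with that name and tries to parse it.
template_text = "<h1>{{ title }}</h1>\n"
try:
    ast.parse(template_text, filename=reported_name)
    parsed_ok = True
except SyntaxError:
    parsed_ok = False
```

The code runs fine, but the file it claims to come from is HTML, so parsing it as Python fails.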
Originally, this error would be reported to the user. There’s a -i switch that shuts off all errors like this, but it seemed dumb for coverage.py to get confused by something like this. So I changed it to not trace files named “*.html”.
Of course, the world is more varied than that, so I got a report of someone with Jinja2 files named “*.jinja2” which now trip the error. So I need a more general solution.
I figure there are a few possibilities:
- Don’t measure files at all if they have an extension that isn’t “.py”. This will let us measure extension-less files, and .py files, and will ignore all the rest, on the theory that any other extension implies that we won’t be able to parse it later anyway.
- Measure all files, but during reporting, if a file can’t be parsed, ignore the error if it has an extension that isn’t “*.py”.
- (Shudder) Make a configuration option about what extensions to measure, or which to ignore.
- Some people want “ignore errors” to be the default, but if a file is missing for some reason, it’s important to know, because it will throw off the reporting, and that shouldn’t happen silently.
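The first two options come down to the same extension test applied at different times. A minimal sketch, assuming hypothetical helper names `should_trace` and `should_report_parse_error` (neither is a coverage.py function):

```python
import os.path

# Extensions treated as "probably Python": ".py" and no extension at all.
PYTHON_EXTENSIONS = ("", ".py")


def should_trace(filename):
    """Option 1: decide at measurement time whether to record this file."""
    ext = os.path.splitext(filename)[1]
    return ext in PYTHON_EXTENSIONS


def should_report_parse_error(filename):
    """Option 2: measure everything, decide at report time whether a
    parse failure for this file is worth showing to the user."""
    ext = os.path.splitext(filename)[1]
    return ext in PYTHON_EXTENSIONS
```

With either helper, "mytemplate.html" and "page.jinja2" are skipped, while "prog.py" and an extension-less "prog" are treated as Python.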
Do people ever name their Python source files something other than “*.py”? Are there weird ecosystems like this that I’ll only hear about if I make one of these changes?
Comments
There are a couple of good reasons to give your Python file a non-standard extension: (a) because of an extension-based policy, for example on a web server where only files with the .cgi extension get executed as CGI scripts; (b) for command-line tools where the user prefers to type "foo" rather than "foo.py".
How about option (5) — measure all files; try to parse them as Python; if that fails, report naïve (line-based rather than code-based) coverage metrics for them. This might give useful results even for Jinja's .html templates.
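A sketch of that fallback, under simplifying assumptions: "code-based" here just means lines where a statement starts according to the ast module, and "line-based" means every non-blank line. coverage.py's real analysis is more involved, and the function names are made up.

```python
import ast


def executable_lines(source):
    """Return (set of line numbers, method used to find them)."""
    try:
        tree = ast.parse(source)
    except SyntaxError:
        # Not Python: fall back to counting every non-blank line.
        lines = {
            number
            for number, text in enumerate(source.splitlines(), start=1)
            if text.strip()
        }
        return lines, "line-based"
    # Python: count the lines on which a statement starts.
    lines = {node.lineno for node in ast.walk(tree) if isinstance(node, ast.stmt)}
    return lines, "code-based"


def coverage_percent(source, executed):
    """Percentage of executable lines that appear in `executed`."""
    lines, method = executable_lines(source)
    if not lines:
        return 100.0, method
    return 100.0 * len(lines & executed) / len(lines), method
```

For a template, the result is only as meaningful as the line numbers the generated code reports, so this assumes the tool maps executed lines back to template lines.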
On Windows, the ".pyw" extension is used to run Python programs without creating a console window.
Another more ambitious option would be to grab the generated source at runtime: Tornado templates support the PEP 302 loader protocol so linecache works on them.
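For reference, this is roughly how linecache can get source from a loader instead of the file system. The loader class, module name, and file name below are hypothetical stand-ins, not Tornado's actual objects; the only real API used is linecache.getlines with its module_globals argument.

```python
import linecache


class GeneratedSourceLoader:
    """Hypothetical loader that keeps generated source in memory."""

    def __init__(self, sources):
        self._sources = sources  # module name -> generated Python source

    def get_source(self, fullname):
        # PEP 302 optional method: return the source text, or None.
        return self._sources.get(fullname)


loader = GeneratedSourceLoader(
    {"templates.hello": "def render():\n    return 'hi'\n"}
)

# linecache looks for __name__ and __loader__ in the globals it is given
# when the named file cannot be found on disk.
module_globals = {"__name__": "templates.hello", "__loader__": loader}
lines = linecache.getlines("hello_template_generated.html", module_globals)
```

A reporter could pass frame.f_globals the same way, so files that do not parse on disk might still yield the Python that actually ran.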
All: When I said, "if it has an extension that isn't *.py", I wasn't including "no extension" in that. Extension-less files are safe!
@Artem: "hooks" are an interesting idea, but I don't know if tool makers would be able to perform the back-mapping.
Anyone else have specific cases of files with unusual extensions?
Possibly you could treat files without an extension, or with an extension starting with ".py", differently, but I'm not sure there is a need.
1) Must be Python code, exactly like ".py" files, and it should be reported as an error if it cannot be parsed satisfactorily (unless silenced by general error suppression). For example: ".py2" & ".py3" or similar conventions, fancy extensions for application scripts or CGI-like setups.
2) Could be Python code or not; there's no way to tell in advance, and no error should be reported if it isn't. For example: files with no extension, which on POSIX systems might be executable Python scripts or executable scripts for some other interpreter or something completely different.
3) It isn't expected to be Python code; never try to parse it, as that would be a waste of time. For example, the mentioned extended-HTML templates.
I suggest a safe default (".py" in class 1, extension-less in class 2, anything else in class 2 or class 3) and two or three optional commandline options to override the default (maybe "-pythonextension", "-maybepythonextension", "-nonpythonextension").
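A sketch of how that three-class default and its overrides might look. The names `SourceClass` and `classify` are invented here, and putting "anything else" in class 3 is one of the two choices left open above, picked only for the example.

```python
import os.path
from enum import Enum


class SourceClass(Enum):
    MUST_BE_PYTHON = 1   # parse errors are reported
    MAYBE_PYTHON = 2     # try to parse, stay quiet on failure
    NOT_PYTHON = 3       # never try to parse


def classify(filename, must=(".py",), maybe=("",), other=SourceClass.NOT_PYTHON):
    """Classify a file by extension; the keyword arguments play the role
    of the proposed command-line overrides."""
    ext = os.path.splitext(filename)[1]
    if ext in must:
        return SourceClass.MUST_BE_PYTHON
    if ext in maybe:
        return SourceClass.MAYBE_PYTHON
    return other
```

Someone with ".py3" scripts would call `classify(name, must=(".py", ".py3"))`, which is what a "-pythonextension" style option would feed in.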
This policy about the "deluxe" treatment of Python sources could be combined with option 4 (reporting missing files), as checking that a file exists and contains the lines referenced in Python bytecode doesn't require parsing it. Another commandline option would be needed to reverse the default.
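The parse-free check could be as small as the sketch below. The helper name `source_matches` is hypothetical, and "contains the lines" is interpreted loosely as "the file has at least as many lines as the highest recorded line number".

```python
import os
import tempfile


def source_matches(filename, recorded_lines):
    """True if the file exists and is long enough to hold every recorded line."""
    if not os.path.isfile(filename):
        return False
    with open(filename, "rb") as f:
        line_count = sum(1 for _ in f)
    return max(recorded_lines, default=0) <= line_count


# Small demonstration with a three-line temporary file.
with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "page.html")
    with open(path, "w") as f:
        f.write("<h1>\n{{ title }}\n</h1>\n")
    fits = source_matches(path, {1, 3})
    too_short = source_matches(path, {5})
    missing = source_matches(os.path.join(tmp, "gone.html"), {1})
```

This catches a missing or truncated file without caring whether its contents are Python.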