Things I don’t like about doctest

Sunday 30 November 2008

This is nearly 16 years old. Be careful.

Python’s doctest is very cool technology, letting you write a narrative in documentation that can then be executed directly to test the code. But as cool as it is, I don’t like it very much:

  • You can’t run a subset of the tests.
  • If a failure happens in the middle of the doctest, the whole thing stops.
  • The coding style is stylized, and has to have printable results.
  • Your code is executed in a special way, so it’s harder to reason about how it will be executed, harder to add helpers, and harder to program around the tests.
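For readers who have not used it, here is a minimal sketch of what a doctest looks like and how it gets run. The function `double` and the helper `run_docstring` are made up for illustration; `DocTestParser` and `DocTestRunner` are the standard-library pieces that extract the examples from a docstring and compare printed results against the expected text.

```python
import doctest


def double(x):
    """Return twice x.

    >>> double(2)
    4
    >>> double('ab')
    'abab'
    """
    return x * 2


def run_docstring(func):
    """Run the examples in func's docstring; return (failed, attempted).

    Hypothetical helper: it only looks at this one docstring and gives
    the examples a namespace containing just the function itself.
    """
    parser = doctest.DocTestParser()
    test = parser.get_doctest(func.__doc__, {func.__name__: func},
                              func.__name__, None, 0)
    runner = doctest.DocTestRunner(verbose=False)
    # Swallow the failure report; the counts are all this sketch needs.
    results = runner.run(test, out=lambda text: None)
    return results.failed, results.attempted


if __name__ == "__main__":
    print(run_docstring(double))
```

Each example is compared by its printed representation, which is why the expected value for the string case is written with quotes.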

Most importantly, though, doctest seems like an odd way to write tests. Docstrings, and the long sequence of code they encourage, may be good ways to explain what code does, but explaining and testing are two different tasks, and the code you write for each will be different. So why try to serve two masters at once?

Either your expository text will be cluttered with uninformative edge cases, or your tests will merely skim the surface of what your code can do.

I know that doctest can be used independently of the actual docstrings in the code, but then where’s the great advantage? I’d rather use real software engineering tools to write my tests, and the idiosyncratic way doctest executes code and evaluates results doesn’t help me.

I’m not the only one who feels this way. Andrew Bennetts has two posts with much more detail about these issues: Narrative tests are lousy tests and Tests are code, doctests aren’t.

While I admire the ingenuity that went into doctest, I just don’t find it a good tool for testing real code.

Comments

[gravatar]
I think the purpose of doctest is to verify that the code examples are (still) valid rather than writing test suites through documentation.
[gravatar]
The only way I have used doctest is to make sure I am including valid code snippets in the documentation, not as a full-blown test suite. Use unit testing for that. I'm not sure what the original purpose of doctest was, but if people are using it as a full test framework, they are missing the point of docstrings.
[gravatar]
The advantage is that you can write tests a lot faster, simply by leaving out the expected output, running the tests, inspecting the output, and then using good old cut and paste to get a regression test. Why waste time on figuring out what the expected result is when you can let the computer tell you?

And the trick for using doctest efficiently is to use it to test your test program, not the docstrings in your code.
[gravatar]
I agree with AdSR: doctests are not unit tests. Or, if you like, doctests are unit tests for the documentation, not for the code.

I still think that doctests have value. That's why I ported them to Lua.
[gravatar]
I have one of my unit tests run my doctest code samples.

I like doctest because I see a lot of overlap between unit tests and documentation. I cannot grok prose descriptions of code. Only samples show me how much (or how little) that bit of code protects against edge-case abuse; or how much (or how little) that bit of code does clean-up of input and other hand-holding. Prose descriptions of all this can be misleading. English just isn't a good way to unambiguously talk about code. And documentation does not always keep up with changes to code, unless you have automated enforcement.

Do the docstrings drift into weirder and weirder edge cases? Yes, of course. When the docstring becomes too esoteric, stop reading! You are not compelled to read to the bitter end.

Responses to your other points:

*) You can't run a subset of the tests.

Even a small change in code compels me to re-run all tests. I think this is recommended practice.

*) If a failure happens in the middle of the doctest, the whole thing stops.

If you run after every small change, you have a good idea where the problem is.

*) The coding style is stylized, and has to have printable results.

Sound engineering practice, to keep code from running away from a human's ability to understand it. If the code cannot report its output clearly in a print statement, heaven help you when you return to that code months later.

*) Your code is executed in a special way, so it's harder to reason about how it will be executed, harder to add helpers, and harder to program around the tests.

Not so very special. It mimics the Python read-eval-print loop (REPL). If you have some code that you cannot interactively explore with Python's REPL, you are at a disadvantage when you come back to it months later, and want to try using that code for new problems.

My workflow:

*) Immediately set up unittest; one of the unit tests runs all doctests

*) use doctests to write unit tests as I write code (I use Fredrik's tip to quick-and-dirty cut-and-paste, technically using the doctests as regression tests)

*) if a unit test is served by non-trivial set-up or tear-down, write it solely in unittest framework

*) the docstrings all end with very boring doctest edge cases. Later, I will not compel myself to read all docstrings to the bitter end. Bad style for writing short stories for fan fiction, but acceptable style for docstrings.
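A minimal sketch of the first step of that workflow, assuming everything lives in one module: an ordinary unittest test case whose single test runs every doctest in the module. The names `add`, `AddTests`, `DocTests` and `run_all` are invented for this example; `doctest.testmod` is the standard-library call.

```python
import doctest
import sys
import unittest


def add(a, b):
    """Return the sum of a and b.

    >>> add(2, 3)
    5
    """
    return a + b


class AddTests(unittest.TestCase):
    """Ordinary unittest tests, for cases that need real set-up."""

    def test_negative_numbers(self):
        self.assertEqual(add(-2, -3), -5)


class DocTests(unittest.TestCase):
    """A single unit test that runs every doctest in this module."""

    def test_doctests(self):
        module = sys.modules[__name__]
        results = doctest.testmod(module, verbose=False)
        self.assertEqual(results.failed, 0)


def run_all():
    """Run both test cases together and return the unittest result."""
    loader = unittest.defaultTestLoader
    suite = unittest.TestSuite()
    suite.addTests(loader.loadTestsFromTestCase(AddTests))
    suite.addTests(loader.loadTestsFromTestCase(DocTests))
    return unittest.TextTestRunner(verbosity=0).run(suite)


if __name__ == "__main__":
    run_all()
```

The standard library also offers `doctest.DocTestSuite`, which turns a module's doctests into a unittest suite directly; the version above just keeps the "one test runs them all" shape described in the list.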

> While I admire the ingenuity that went into doctest, I just don't find it a good tool for testing real code.

I guess Zope3 is fake code, then. I personally would have qualified this parting statement...
[gravatar]
I think there needs to be a new tool that turns unit tests into documentation, the inverse of DocTest.

Then in the code/documentation browser enable the ability to link from the code to the unit tests / documentation about that code.

I find doctests too constraining. I simply use functions beginning with "test_" and execute all of those. No need for a framework. And they all get run each time my code executes. Think of it as a precondition before main()
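A rough sketch of that approach, under the assumption that the tests are plain functions in the same module as the code. `slugify`, the two `test_` functions and `run_self_tests` are hypothetical names chosen for the example.

```python
def slugify(title):
    """Lower-case a title and join its words with hyphens."""
    return "-".join(title.lower().split())


def test_slugify_basic():
    assert slugify("Hello World") == "hello-world"


def test_slugify_extra_spaces():
    assert slugify("  Hello   World ") == "hello-world"


def run_self_tests(namespace):
    """Call every callable in namespace whose name starts with 'test_'.

    Returns the sorted names of the tests that ran. An AssertionError
    from a failing test propagates and stops the program.
    """
    names = sorted(name for name, obj in namespace.items()
                   if name.startswith("test_") and callable(obj))
    for name in names:
        namespace[name]()
    return names


def main():
    print(slugify("Things I don't like about doctest"))


if __name__ == "__main__":
    run_self_tests(globals())  # precondition: all self-tests pass
    main()
```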
[gravatar]
I like the point many of you are making about using doctests simply to check that code samples in the docstrings are still accurate. That seems like a great use.

@manuelg: "Even a small change in code compels me to re-run all tests. I think this is recommended practice."

I find I often need to debug within a single failing test. Having that test be half-way through a long docstring transcript makes this very difficult. The unittest framework lets me run a single test by name, which makes it much easier to focus in on what I need to fix.
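To make that concrete, here is a small sketch of running one test by name. `parse_minutes`, `ParseMinutesTests` and `run_one` are invented for the example; constructing a `TestCase` with a method name is standard unittest behavior.

```python
import unittest


def parse_minutes(text):
    """Convert 'H:MM' to minutes since midnight."""
    hours, minutes = text.split(":")
    return int(hours) * 60 + int(minutes)


class ParseMinutesTests(unittest.TestCase):
    def test_simple(self):
        self.assertEqual(parse_minutes("1:30"), 90)

    def test_midnight(self):
        self.assertEqual(parse_minutes("0:00"), 0)


def run_one(test_case_class, method_name):
    """Run only the named test method and return the unittest result."""
    suite = unittest.TestSuite([test_case_class(method_name)])
    return unittest.TextTestRunner(verbosity=0).run(suite)


if __name__ == "__main__":
    run_one(ParseMinutesTests, "test_midnight")
```

From the command line the equivalent is `python -m unittest test_times.ParseMinutesTests.test_midnight`, where `test_times` is a hypothetical module name.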

@manuelg: "I guess Zope3 is fake code, then. I personally would have qualified this parting statement..."

I didn't say it couldn't be done, just that I don't like doing it that way. I think my last statement is qualified enough: "*I* just don't find it..."

I may take a look at the Zope3 code to see how they've managed it.
[gravatar]
The unfortunate side effect of doctests is that users do not get the "hidden" benefits that can be had from unit tests. More specifically, these users do not gain the design experience they would receive if they wrote unit tests before writing code. How many people would write out their doctests by hand before writing the code, given that doing so would eliminate one of the main features of doctests (copy and paste)?

I see doctests only good for creating a narrative example usage of some code base or api. Beyond that they have no other positive value IMO and wish they were never created as they do more harm than good.
[gravatar]
John - I love writing my doctests first. The order of the code on the page ends up being the order I write it in, in fact. First a new function, then some doctest examples with as much output text as I feel like writing, then I start on the real code. I might miss on my output a little bit ("2" != "2.0"), so I fix the output lines.

Recent example from a scraper. I don't even think I need to write any more docs here. The first example is pretty clear about what this function does:
def parseTimes(s):
    """
    >>> parseTimes('(1:30), (5:00), 8:30')
    ['13:30', '17:00', '20:30']
    >>> parseTimes('(2:10), 8:15')
    ['14:10', '20:15']
...

[gravatar]
> Having that test be half-way through a long docstring transcript makes this very difficult.

Doctest reports only the individual REPL lines that fail, no matter the length of transcript.

Rather than docstrings of staggering size, you have options to break the tests down. Doctest also lets you put tests into the dictionary "__test__", with a descriptive key. Doctest gives resolution all the way down to individual class methods. I have never heard of putting all of a module's doctests into a single string.
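A small sketch of the `__test__` dictionary, assuming the tests live in the same module as the code. The function `to_24h` is a made-up stand-in (it is not the scraper code from the earlier comment) and `run_named_doctests` is a hypothetical helper; `__test__` and `doctest.testmod` are standard doctest features.

```python
import doctest
import sys


def to_24h(text):
    """Convert an afternoon 'H:MM' time to 24-hour 'HH:MM'."""
    hours, minutes = text.split(":")
    return "%02d:%s" % (int(hours) % 12 + 12, minutes)


# Keys are descriptive names; string values are treated like docstrings.
__test__ = {
    "typical_times": """
        >>> to_24h('1:30')
        '13:30'
        >>> to_24h('8:15')
        '20:15'
    """,
    "noon_edge_case": """
        >>> to_24h('12:00')
        '12:00'
    """,
}


def run_named_doctests():
    """Run this module's doctests; return (failed, attempted)."""
    results = doctest.testmod(sys.modules[__name__], verbose=False)
    return results.failed, results.attempted


if __name__ == "__main__":
    print(run_named_doctests())
```

A failure is reported under a name ending in `__test__.noon_edge_case` (or whichever key failed), so the boring cases can stay out of the function's own docstring and still be identifiable.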

J. M. Camara:

> ...Beyond that they have no other positive value IMO and wish they were never created as they do more harm than good.

Lemme guess. Another adequately qualified statement about doctest? Is this the way the game is played?

N. Batchelder:

> I'd rather use real software engineering tools to write my tests...

This is the root of the matter, I believe.

You know, there are some parts of the universe where Python and other runtime-dynamically-typed languages are not considered "real software engineering tools".

Unittest is a hammer, and doctest is a nail gun. Use them in their proper domains... for great success. You can use a hammer's claw to scratch behind your ear. I recommend nail guns, but would not advise them for this use.
[gravatar]
@manuelg: "Lemme guess. Another adequately [sic] qualified statement about doctest? Is this the way the game is played?"

There's no game here. The commenter said, "IMO", and stated his opinion. People can have different opinions about the utility of a tool, and are allowed to state those opinions. This is the second time you've taken a statement of opinion as some sort of underhanded trick. Chill out.
[gravatar]
drewp - I'm glad to hear you write the doctest first. You're definitely in the minority of doctest users. I have no issue with the way you use doctest in the example you provide, but would you write doctests for invalid and corner cases (e.g. testing for 0:00, 1:62, 25:16, etc.)? I know that if I'm looking at documentation for an API, I wouldn't want to see all these corner cases and failures. They would just clutter the documentation, but they are also important test cases.

manuelg - I did say it was my opinion. I try to keep my responses to a blog short as I feel a lengthy discussion can feel like hijacking the blog. To get into all the reasons why I feel doctests are bad would require a lengthy discussion.

Anyway, over the last 10 years I have mentored something like 40+ people using Python. I'm always trying to figure out the best approach to turn them into great programmers, which generally translates into getting them to be good at design. I have found that those to whom I introduce doctests tend to develop design skills 3 to 7 times more slowly than those who use unit-style testing. They pick up design skills at a rate only slightly better than those who do no testing at all. This has been my experience, and it is the main reason I have a negative view of doctests.

Even though I have a negative view on doctests I do use them. I just use them for documentation purposes, to verify the docs are correct, rather than create them for testing my code.
[gravatar]
In the scraper case, I wrote 4 cases (each with a few example times in it) and then ended up adding one more for a corner case (12:00) and editing one for another case (leading whitespace). So for this data point, at least, there are currently 10 lines of doctest. That's probably half the number of lines (sans setup boilerplate) that I would use for the pyunit version.

I can imagine that over time the test suite would get longer, but the rule of "when it gets boring, stop reading" seems like it would work well enough. I also always have the option of porting to pyunit if I think that would help things.

Generally, I'm happy when doctest looks like it'll work, since it means I'm in for a very efficient time writing my tests. minimock is also pretty awesome at cutting out test-writing mechanics, although again, if you find you're leaning on the '...' feature too much, it's time to jump back to pyunit.
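For anyone who hasn't met it, here is a sketch of the '...' feature, on the assumption that it means doctest's `ELLIPSIS` option (minimock itself is a third-party package and is not used here). `describe` and `run_docstring` are hypothetical names for the example.

```python
import doctest


def describe(obj):
    """Return the repr of obj.

    The memory address in the output changes on every run, so the
    expected output lets an ellipsis stand in for that part.

    >>> describe(object())  # doctest: +ELLIPSIS
    '<object object at 0x...>'
    """
    return repr(obj)


def run_docstring(func):
    """Run the examples in func's docstring; return (failed, attempted)."""
    parser = doctest.DocTestParser()
    test = parser.get_doctest(func.__doc__, {func.__name__: func},
                              func.__name__, None, 0)
    runner = doctest.DocTestRunner(verbose=False)
    results = runner.run(test, out=lambda text: None)
    return results.failed, results.attempted


if __name__ == "__main__":
    print(run_docstring(describe))
```

The more of the expected output is replaced by '...', the less the example actually checks, which is the point at which a structured test starts to look better.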
[gravatar]
At a lunch chat at pycon2008, someone mentioned that they were very pleased with the "write an executable API doc first" model of using doctest - in particular, it gets a lot of the plain usability issues of the API out of the way before you even have code. His suggestion was that as far as narrative flow goes, the more obscure cases go in an Appendix - since the user may actually want to consult them (if they're not as obscure as you thought they were) and will have high confidence in their accuracy, even if they don't need to read them most of the time. It seemed like a plausible way to avoid too much "clutter" that inline doctests seem to cause...
[gravatar]
> People can have different opinions about the utility of a tool, and are allowed to state those opinions. This is the second time you've taken a statement of opinion as some sort of underhanded trick. Chill out.

Fair enough. I apologize to you and J. M. Camara. I was particularly interested to hear about John's experience with introducing different unit testing styles to beginning Python programmers. If doctests had the consistent effect of fostering poor design skills, then I could see having a strong prejudice against doctest.

My personal experience with doctest has been more in line with drewp's, and I am glad he gave example code in his short comment.

As you note in your post, there have been some recent complaints and criticisms against doctest, I feel far beyond what would be expected for what is just a single module in the standard library. Before I criticize a tool, I first assume other people are able to use it profitably, and try to find out how they are using it.

May I suggest that using doctest in concert with a structured unittest tool is the key. I would not want to defend using doctest as a drop-in replacement for unittest (or another structured unit-test tool). But I was surprised that there is so little appreciation of the overlap between documentation and unit testing. I find code is a natural medium for talking about code.
[gravatar]
I wouldn't say that doctests foster poor design; it's just that I see developers who use unit tests tend to pick up design skills at a faster rate. Anyone who writes tests before the code, whether they use unit tests or doctests, will improve their design skills, because the developer is forced to think at a layer of abstraction higher than the code to be developed, which leads to better design.

I think the issue with doctests is that while trying to write a narrative, writing "happy path" test cases, then covering corner and bad test cases, and at the same time dealing with design issues is just too much for the average programmer to deal with at the same time so something tends to give. That something that tends to give is either inadequate test coverage, poor design decisions, or the narrative is too complicated to read.

IMO doctest should only be used for documentation purposes, where the tests that are written only cover the "happy paths". This allows for a clean and simple narrative of the API, which will get other developers up to speed quickly with the code base.
[gravatar]
Consider your distaste of doctests as a dissatisfaction with the state of our editors. In the dark ages before cross reference tools were commonly integrated, I inserted error-prone verbiage (or verbage) describing "called by:" and "calls to:" for some portions of code. The goal was to efficiently communicate the mental model of the code, its usage, and its state.

Doctests satisfy those same goals with similar imperfections. Providing typical usage immediately inline with the code communicates more efficiently than test cases in another function or another file. You long for the day when our tests are correctly categorized as "typical usage", "parameter validation", "edge cases", "design narratives", etc.; when the editor shows the typical usage in an adjacent window; when other handy tabs show which tests are failing, and the code coverage of passing and failing tests separately. You feel the pain of the doctest tool being abused.

Currently, our best editors tend to manage graphical windows poorly and barely understand test cases. Doctests will eventually disappear after they serve their purpose.
[gravatar]
On further reflection, doctests could be trivially improved:

Consider this doctest:

>>> ## Typical Usage
>>> g = Frob('mark')
>>> print g.frobbed()
mark
>>> g.frob('bob')
>>> print g.frob_chain()
mark frobbed bob
>>> print g.frob('sue').frob('jenny').unfrob('mark').frob_chain()
bob frobbed sue. sue frobbed jenny.
>>> ## Edge cases
>>> # No frobs
>>> Frob().frob_chain()
no frobbing here
>>> ## Parameter checks
>>> Frob().unfrob('mark')
Traceback....


And then:

$ yank_doc_tests -not_typical frob.py tests/frob_
frob_edge_cases.py: no_frobs()
frob_parameter_checks.py: test1()
frob.py shorter by 15 lines


This handles drewp's comment, simplifies some test writing, and costs little.
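The `yank_doc_tests` tool above is imaginary, but the first step it would need, grouping doctest lines under their `## Section` markers, can be sketched in a few lines. `split_doctest_sections` is a hypothetical helper; writing the sections out to separate test files is left out.

```python
import re

# A section marker is a prompt line whose code is a '##' comment.
SECTION_RE = re.compile(r"^\s*>>>\s*##\s*(?P<title>.+?)\s*$")


def split_doctest_sections(text, default="Typical Usage"):
    """Group doctest lines under the most recent '>>> ## Title' marker.

    Lines that appear before any marker are filed under `default`.
    Blank lines are dropped and each kept line is stripped.
    """
    sections = {}
    current = default
    for line in text.splitlines():
        match = SECTION_RE.match(line)
        if match:
            current = match.group("title")
            sections.setdefault(current, [])
        elif line.strip():
            sections.setdefault(current, []).append(line.strip())
    return sections


if __name__ == "__main__":
    sample = """
    >>> ## Typical Usage
    >>> len('mark')
    4
    >>> ## Edge cases
    >>> # Empty string
    >>> len('')
    0
    """
    for title, lines in split_doctest_sections(sample).items():
        print(title, lines)
```

The splitting has to be done on the raw text because doctest's own parser discards examples that consist only of a comment, so the markers would not survive a trip through `DocTestParser`.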
