Sunday 30 November 2008
Python’s doctest is very cool technology, letting you write a narrative in documentation that can then be executed directly to test the code. But as cool as it is, I don’t like it very much:
- You can’t run a subset of the tests.
- If a failure happens in the middle of the doctest, the whole thing stops.
- The coding style is stylized, and has to have printable results (a minimal example follows this list).
- Your code is executed in a special way, so it’s harder to reason about how it will be executed, harder to add helpers, and harder to program around the tests.
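To make that concrete, here is a minimal, made-up example of the doctest style: the docstring holds a REPL transcript, and every expression has to produce printable output that matches it exactly.

def add(a, b):
    """Return the sum of a and b.

    >>> add(2, 3)
    5
    >>> add('doc', 'test')
    'doctest'
    """
    return a + b

if __name__ == '__main__':
    import doctest
    doctest.testmod()  # re-runs every docstring transcript in this module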
Most importantly, though, doctest seems like an odd way to write tests. Docstrings, and the long sequence of code they encourage, may be good ways to explain what code does, but explaining and testing are two different tasks, and the code you write for each will be different. So why try to serve two masters at once?
Either your expository text will be cluttered with uninformative edge cases, or your tests will merely skim the surface of what your code can do.
I know that doctest can be used independently of the actual docstrings in the code, but then where's the great advantage? I'd rather use real software engineering tools to write my tests, and the idiomatic way doctest executes code and evaluates results doesn't help me.
I’m not the only one who feels this way. Andrew Bennetts has two posts with much more detail about these issues: Narrative tests are lousy tests and Tests are code, doctests aren’t.
While I admire the ingenuity that went into doctest, I just don’t find it a good tool for testing real code.
Comments
And the trick for using doctest efficiently is to use it to test your test program, not the docstrings in your code.
I still think that doctests have value. That's why I ported them to Lua.
I like doctest because I see a lot of overlap between unit tests and documentation. I cannot grok prose descriptions of code. Only samples show me how much (or how little) that bit of code protects against edge-case abuse; or how much (or how little) that bit of code does clean-up of input and other hand-holding. Prose descriptions of all this can be misleading. English just isn't a good way to unambiguously talk about code. And documentation does not always keep up with changes to code, unless you have automated enforcement.
Do the docstrings drift into weirder and weirder edge cases? Yes, of course. When the docstring becomes too esoteric, stop reading! You are not compelled to read to the bitter end.
Responses to your other points:
*) You can't run a subset of the tests.
Even a small change in code compels me to re-run all tests. I think this is recommended practice.
*) If a failure happens in the middle of the doctest, the whole thing stops.
If you run after every small change, you have a good idea where the problem is.
*) The coding style is stylized, and has to have printable results.
Sound engineering practice, to keep code from running away from a human's ability to understand it. If the code cannot report its output clearly in a print statement, heaven help you when you return to that code months later.
*) Your code is executed in a special way, so it's harder to reason about how it will be executed, harder to add helpers, and harder to program around the tests.
Not so very special. It mimics the Python read-eval-print loop (REPL). If you have some code that you cannot interactively explore with Python's REPL, you are at a disadvantage when you come back to it months later, and want to try using that code for new problems.
My workflow:
*) Immediately set up unittest; one of the unit tests runs all doctests (a sketch follows this list)
*) use doctests to write unit tests as I write code (I use Fredrik's quick-and-dirty cut-and-paste tip, technically using the doctests as regression tests)
*) if a unit test is served by non-trivial set-up or tear-down, write it solely in unittest framework
*) the docstrings all end with very boring doctest edge cases. Later, I will not compel myself to read all docstrings to the bitter end. Bad style for writing short stories for fan fiction, but acceptable style for docstrings.
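A sketch of that first step, with a hypothetical mymodule standing in for the code under test:

import doctest
import unittest

import mymodule  # hypothetical: the module whose docstrings hold the doctests

def test_suite():
    # One entry in the unittest run executes every doctest in mymodule.
    suite = unittest.TestSuite()
    suite.addTest(doctest.DocTestSuite(mymodule))
    return suite

if __name__ == '__main__':
    unittest.main(defaultTest='test_suite')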
> While I admire the ingenuity that went into doctest, I just don't find it a good tool for testing real code.
I guess Zope3 is fake code, then. I personally would have qualified this parting statement...
Then, in the code/documentation browser, enable the ability to link from the code to the unit tests and documentation about that code.
I find doctests too constraining. I simply use functions beginning with "test_" and execute all of those. No need for a framework. And they all get run each time my code executes. Think of it as a precondition before main().
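A sketch of that pattern, with a stand-in frob() as the code under test:

def frob(x):
    return x * 2

def test_frob_doubles():
    assert frob(3) == 6

def test_frob_handles_zero():
    assert frob(0) == 0

def run_tests():
    # Call every module-level function whose name starts with "test_".
    for name, func in sorted(globals().items()):
        if name.startswith('test_') and callable(func):
            func()

def main():
    print(frob(21))

if __name__ == '__main__':
    run_tests()  # every test runs, every time, before main() gets control
    main()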
@manuelg: "Even a small change in code compels me to re-run all tests. I think this is recommended practice."
I find I often need to debug within a single failing test. Having that test be half-way through a long docstring transcript makes this very difficult. The unittest framework lets me run a single test by name, which makes it much easier to focus in on what I need to fix.
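For instance (with made-up test names), unittest will run exactly one test if you name it on the command line:

import unittest

class FrobTest(unittest.TestCase):
    def test_typical_usage(self):
        self.assertEqual('mark'.upper(), 'MARK')

    def test_edge_case(self):
        self.assertEqual(len(''), 0)

if __name__ == '__main__':
    unittest.main()

# Run everything, one class, or just the one failing test:
#   python test_frob.py
#   python test_frob.py FrobTest
#   python test_frob.py FrobTest.test_edge_case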
@manuelg: "I guess Zope3 is fake code, then. I personally would have qualified this parting statement..."
I didn't say it couldn't be done, just that I don't like doing it that way. I think my last statement is qualified enough: "*I* just don't find it..."
I may take a look at the Zope3 code to see how they've managed it.
I see doctests as only good for creating a narrative example usage of some code base or API. Beyond that they have no other positive value IMO, and I wish they were never created, as they do more harm than good.
Recent example from a scraper. I don't even think I need to write any more docs here. The first example is pretty clear about what this function does:
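Something along these lines (the function and HTML here are illustrative, not the original):

def extract_prices(html):
    """Pull dollar amounts out of scraped HTML.

    >>> extract_prices('<td>$4.99</td><td>$12.50</td>')
    ['4.99', '12.50']
    >>> extract_prices('<td>sold out</td>')
    []
    """
    import re
    return re.findall(r'\$(\d+\.\d\d)', html)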
Doctest reports only the individual REPL lines that fail, no matter the length of the transcript.
Rather than docstrings of staggering size, you have options to break the tests down. Doctest also lets you put tests into the dictionary "__test__", with a descriptive key. Doctest gives resolution all the way down to individual class methods. I have never heard of putting all of a module's doctests into a single string.
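A minimal sketch of that __test__ dictionary, using only builtins so it runs as-is:

# In the module under test; doctest collects these by key, so tests can be
# grouped and named without bloating any one docstring.
__test__ = {
    'edge-cases': """
        >>> sorted([3, 1, 2])
        [1, 2, 3]
        >>> sum([])
        0
    """,
    'parameter-checks': """
        >>> int('frob')
        Traceback (most recent call last):
            ...
        ValueError: invalid literal for int() with base 10: 'frob'
    """,
}

if __name__ == '__main__':
    import doctest
    doctest.testmod()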
J. M. Camara:
> ...Beyond that they have no other positive value IMO, and I wish they were never created, as they do more harm than good.
Lemme guess. Another adequately qualified statement about doctest? Is this the way the game is played?
N. Batchelder:
> I'd rather use real software engineering tools to write my tests...
This is the root of the matter, I believe.
You know, there are some parts of the universe where Python and other runtime-dynamically-typed languages are not considered "real software engineering tools".
Unittest is a hammer, and doctest is a nail gun. Use them in their proper domains... for great success. You can use a hammer's claw to scratch behind your ear. I recommend nail guns, but would not advise them for this use.
There's no game here. The commenter said, "IMO", and stated his opinion. People can have different opinions about the utility of a tool, and are allowed to state those opinions. This is the second time you've taken a statement of opinion as some sort of underhanded trick. Chill out.
manuelg - I did say it was my opinion. I try to keep my responses to a blog short as I feel a lengthy discussion can feel like hijacking the blog. To get into all the reasons why I feel doctests are bad would require a lengthy discussion.
Anyway, over the last 10 years I have mentored something like 40+ people using Python. I'm always trying to figure out the best approach to turn them into great programmers, which generally translates into getting them to be good at design. I have found that those I introduce to doctests tend to develop design skills at a rate 3-7 times slower than those who use unit-style testing. They pick up design skills at a rate only slightly better than those who do no testing at all. This has been my experience and the main reason I have a negative view of doctests.
Even though I have a negative view of doctests, I do use them. I just use them for documentation purposes, to verify the docs are correct, rather than creating them for testing my code.
I can imagine that over time the test suite would get longer, but the rule of "when it gets boring, stop reading" seems like it would work well enough. I also always have the option of porting to pyunit if I think that would help things.
Generally, I'm happy when doctest looks like it'll work, since it means I'm in for a very efficient time writing my tests. minimock is also pretty awesome at cutting out test-writing mechanics, although again, if you find you're leaning on the '...' feature too much, it's time to jump back to pyunit.
Fair enough. I apologize to you and J. M. Camara. I was particularly interested to hear about John's experience with introducing different unit testing styles to beginning Python programmers. If doctests had the consistent effect of fostering poor design skills, then I could see having a strong prejudice against doctest.
My personal experience with doctest has been more in line with drewp's, and I am glad he gave example code in his short comment.
As you note in your post, there have been some recent complaints and criticisms against doctest; far more, I feel, than would be expected for what is just a single module in the standard library. Before I criticize a tool, I first assume other people are able to use it profitably, and try to find out how they are using it.
May I suggest that using doctest in concert with a structured unittest tool is the key. I would not want to defend using doctest as a drop-in replacement for unittest (or another structured unit-test tool). But I was surprised that there is so little appreciation of the overlap between documentation and unit testing. I find code is a natural medium for talking about code.
I think the issue with doctests is that trying to write a narrative, writing "happy path" test cases, then covering corner cases and bad inputs, all while dealing with design issues, is just too much for the average programmer to handle at once, so something tends to give. The result is inadequate test coverage, poor design decisions, or a narrative too complicated to read.
IMO doctest should only be used for documentation purposes, where the tests that are written only cover the "happy paths". This allows for a clean and simple narrative of the API which will get other developers up to speed quickly with the code base.
Doctests satisfy those same goals with similar imperfections. Providing typical usage immediately inline with the code communicates more efficiently than test cases in another function or another file. We long for the day when our tests are correctly categorized as "typical usage", "parameter validation", "edge cases", "design narratives", etc.; when the editor shows the typical usage in an adjacent window; when other handy tabs show which tests are failing, and the code coverage of passing and failing tests separately. We feel the pain of the doctest tool being abused.
Currently, our best editors tend to manage graphical windows poorly and barely understand test cases. Doctests will eventually disappear after they serve their purpose.
Consider this doctest:
>>> ## Typical Usage
>>> g = Frob('mark')
>>> print g.frobbed()
mark
>>> g.frob('bob')
>>> print g.frob_chain()
mark frobbed bob
>>> print g.frob('sue').frob('jenny').unfrob('mark').frob_chain()
bob frobbed sue. sue frobbed jenny.
>>> ## Edge cases
>>> # No frobs
>>> Frob().frob_chain()
no frobbing here
>>> ## Parameter checks
>>> Frob().unfrob('mark')
Traceback (most recent call last):
    ...
And then:
$ yank_doc_tests -not_typical frob.py tests/frob_
frob_edge_cases.py: no_frobs()
frob_parameter_checks.py: test1()
frob.py shorter by 15 lines
This handles drewp's comment, simplifies some test writing, and costs little.