tl;dw: Stop mocking, start testing

Saturday 2 June 2012

At PyCon 2012, Augie Fackler and Nathaniel Manista gave a talk entitled "Stop Mocking, Start Testing." This is my textual summary of the talk, the first of a series of summaries. You can look at Augie and Nathaniel's slides themselves, or watch the video:

If you not only don't have time to watch the video, but don't even want to read this summary, here's the tl;dr:

Test doubles are useful, but can get out of hand. Use fakes, not mocks. Have one authoritative fake implementation of each service.

Here's (roughly) what Augie and Nathaniel said:

We work on Google Code, which has been a project since July 2006. There are about 50 engineer-years of work on it so far. Median time on the project is two years; people rotate in and out, which is usual for Google. Google Code offers svn, hg, git, a wiki, an issue tracker, a download service, offline batch processing, etc. They started off with a few implementation languages; now there are at least eight.

There are many servers, processes, and components, including RPC services, all talking to each other, until finally at the bottom there's persistence. Your code is probably like this too: stateless components, messages sent between components, user data stored statefully at the bottom.

What's been the evolution of the testing process? Standard operating procedure as of 2006: limited test coverage. We inherited the svn test suite, but it had to be run manually against a preconfigured dev environment, and then its output examined by hand. It took all afternoon!

"Tests? We have users to test!" An effective but stressful way to find bugs. Users are not a test infrastructure. Tests that cost more people-time than CPU time are bad. A project can't grow this way: if the feature surface area grows linearly, the time spent testing grows quadratically.

Starting to Test (2009): A new crew of engineers rolled onto the project, but they didn't understand the existing code. Policy: tests are required for new and modified code. Untouched code remained untested. The core persistence layer changed a lot, so it was well tested, but the layers above it might not be, and that untested code would break on deploy. We set up a continuous build server with a red/green light, though a few engineers are red/green colorblind, so we had to find just the right colors!

We thought we were doing well, and adding tests was helping, but the tests were becoming a problem themselves. Everyone made their own mock objects, so we had N different implementations of a mock. When the real code changed, you had to find all N mocks and update them.

It wasn't just the N mocks: even with one mock, it would tell us what we wanted to hear. The mocks did what we told them to, instead of accurately modeling the real code. Tests would pass, then the product would break on deploy, because the mocks had diverged from the real code.

Lessons so far:

  • Share mocks among test modules.
  • Maybe you don't need a mock: if an object is cheap, then don't mock it.
  • If you need a mock, have exactly one well-tested mock.
  • Test the mock against the real implementation.
  • If you don't have time to fully test the mock, at least use Python introspection to confirm that the interfaces are the same. The inspect module has the tools to do this.
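The talk doesn't show code for this, but a minimal interface check using the inspect module might look like the following sketch (the helper and the two store classes are hypothetical, invented here for illustration):

```python
import inspect

def assert_same_interface(real_cls, fake_cls):
    """Confirm every public method of real_cls exists on fake_cls
    with an identical signature.  (Hypothetical helper, not from the talk.)"""
    for name, real_method in inspect.getmembers(real_cls, inspect.isfunction):
        if name.startswith("_"):
            continue  # skip private methods and dunders
        fake_method = getattr(fake_cls, name, None)
        assert fake_method is not None, f"fake is missing {name}()"
        assert inspect.signature(fake_method) == inspect.signature(real_method), (
            f"signature mismatch on {name}()"
        )

class RealStore:
    """Stand-in for the real (expensive) implementation."""
    def get(self, key): ...
    def put(self, key, value): ...

class FakeStore:
    """In-memory fake offering the same interface."""
    def __init__(self):
        self._data = {}
    def get(self, key):
        return self._data[key]
    def put(self, key, value):
        self._data[key] = value

assert_same_interface(RealStore, FakeStore)  # raises if the fake drifts
```

Run as part of the test suite, a check like this catches interface drift automatically whenever the real class changes.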

We tried to use full Selenium system tests to make up for gaps in unit coverage, but Selenium is slow, race conditions creep in, and problems are difficult to diagnose. They weren't a good replacement for unit tests; unit tests give much better information.

Testing user stories with full system tests worked much better. We still use system tests, but to test the user story, not the edge conditions.

We went through enlightenment; now we have modern mocking:

  • A common collection of authoritative mocks.
  • The mock collection is narrow, only the things we really need to test.
  • The mock is isolated. No code dependency between the mocks and the real code. Mocks don't inherit from real implementations.
  • Mocks are actually fakes. Lots of terms: mocks, stubs, dummies, fakes, doubles, etc. Fakes are full in-memory implementations of the interface they are mocking.
  • (From a question at the end:) Mocks are works in progress; they only implement what is needed, so strong interface checking wouldn't work to confirm that they have the right interface.
  • What gets mocked? Only expensive external services. Everything else is real code.
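To make "mocks are actually fakes" concrete, here is a hedged sketch of what such a fake might look like: a complete in-memory implementation of a hypothetical issue-store interface, deliberately sharing no code with the real persistence service (all names are invented for illustration, not from the talk):

```python
class FakeIssueStore:
    """Full in-memory implementation of a (hypothetical) issue-store
    interface.  Isolated from the real service: no inheritance, no
    imports from production modules."""

    def __init__(self):
        self._issues = {}
        self._next_id = 1

    def create(self, title):
        """Store a new open issue and return its id."""
        issue_id = self._next_id
        self._next_id += 1
        self._issues[issue_id] = {"title": title, "open": True}
        return issue_id

    def close(self, issue_id):
        """Mark an existing issue as closed."""
        self._issues[issue_id]["open"] = False

    def open_issues(self):
        """Return the ids of all issues still open."""
        return [i for i, issue in self._issues.items() if issue["open"]]

# Unlike a mock scripted with canned answers, the fake actually behaves:
store = FakeIssueStore()
bug = store.create("crash on login")
store.close(bug)
assert store.open_issues() == []
```

Because the fake really implements the interface, tests exercise behavior rather than merely verifying that particular calls were made.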

Testing today: Tests are written to the interface, not the implementation. When writing a test, ask yourself, "How much could the implementation change without having to change this test?" Running against mocks in CI makes the tests faster and reduces cycle time.
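As an illustration of testing to the interface (a sketch with invented names, not code from the talk): a component depends only on a clock object's now() method, so a test can swap in a trivial fake and never touch anything behind that interface:

```python
import datetime

class Greeter:
    """Depends only on the clock interface: any object with a now() method."""
    def __init__(self, clock):
        self._clock = clock

    def greeting(self):
        hour = self._clock.now().hour
        return "Good morning" if hour < 12 else "Good afternoon"

class FakeClock:
    """Test double satisfying the same interface, frozen at a fixed time."""
    def __init__(self, fixed_time):
        self._fixed_time = fixed_time

    def now(self):
        return self._fixed_time

# The test exercises only the interface; Greeter's internals could be
# rewritten freely without the test having to change.
greeter = Greeter(FakeClock(datetime.datetime(2012, 6, 2, 9, 0)))
assert greeter.greeting() == "Good morning"
```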

We used to do bad things:

  • use a framework to inject dependencies.
  • use a framework to create mock objects.
  • have constructors automatically create resources if they were not passed in.
  • twist code into weird shapes to make it work.

Now we do good things:

  • Object dependencies are required constructor params. Implicit params are bad, because it's hard to track all those implicit connections. If you forget a required parameter, it's very apparent. If object A doesn't make sense without object B, then don't default it.
  • Separate state from behavior. (There's a code sample at 22:30 in the video.) An instance method reads an attribute, performs a calculation on it, and assigns the result back to another attribute. The calculation in the middle can be pulled into a pure function, and the method shrinks to self.b = pure_fn(self.a).
  • Classes shrink before your eyes. Functional programming is very testable.
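The state/behavior refactoring might look roughly like this (a reconstruction under assumption, since the actual slide code isn't reproduced here):

```python
# Before: state and behavior tangled together.  Testing summarize()
# means constructing the whole object and inspecting its attributes.
class ReportBefore:
    def __init__(self, scores):
        self.scores = scores
        self.summary = None

    def summarize(self):
        total = 0
        for score in self.scores:
            total += score
        self.summary = total / len(self.scores)

# After: the calculation is a pure function, testable with one-line
# asserts; the method just moves data between attributes.
def mean(scores):
    return sum(scores) / len(scores)

class ReportAfter:
    def __init__(self, scores):
        self.scores = scores
        self.summary = None

    def summarize(self):
        self.summary = mean(self.scores)   # i.e., self.b = pure_fn(self.a)
```

The pure function needs no fixtures, no setup, and no doubles at all, which is the sense in which functional programming is very testable.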

Define clear interfaces between components. If you can't figure out how to write a test, that's a code smell: you need to think more about the product code.


Yegor Bugayenko 8:17 AM on 28 Sep 2014

I wrote my own article on this subject and then I found yours :) Many thanks for a great writing. Here is mine:
