Ned Batchelder
Hunting a random() bug
February 2013
At edX, we have Python behind the scenes in courses to initialize the state of problems presented to students. Often, these problems are randomized so that different students will see different details in quantitative problems, but each student's random seed is saved so that the student will see the same problem if they revisit the page.
The seed is used to seed the random module before executing any chunk of course Python, so that you can simply use the random module and know that you'll get an appropriate value.
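As a rough sketch of that arrangement (the helper name `run_course_code` and the use of `exec` are assumptions for illustration, not edX's actual machinery), seeding the module-level generator before running a chunk of course Python makes the results repeatable for a given student:

```python
import random

def run_course_code(source, seed):
    """Seed the module-level generator, then run a chunk of course code.

    Hypothetical helper: the real edX code is not shown in this post.
    """
    random.seed(seed)
    # The course code just uses the random module directly.
    namespace = {"random": random}
    exec(source, namespace)
    return namespace

problem = "a = random.randint(1, 100)\nb = random.randint(1, 100)"

first_visit = run_course_code(problem, seed=1234)
second_visit = run_course_code(problem, seed=1234)

# Same saved seed, same problem details on every visit.
print(first_visit["a"] == second_visit["a"], first_visit["b"] == second_visit["b"])
```

The guarantee only holds if nothing else draws from the shared generator between the seeding and the course code.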
Today I found code like this in a course:
My task was to refactor how information flowed around, and the_seed wasn't going to be available, so I asked why the code was like this. It seemed odd, because the random module had just been seeded before this code was invoked, so why had the author bothered to re-seed the module with the same seed?
The answer was a mysterious bug from months earlier: the first time the code was run, it would produce a different result than on any later run, and the re-seeding solved it. The q import seemed to be messing with the random seed, but only the first time.
The "only first time" clue pointed to it being code that is run on import. Remember, Python modules are just a series of statements, and when you import a module, it really executes all the statements. There's no "import mode" that just collects function definitions. If you write a statement with a side effect at the top level of a module, that side effect will happen when you import the module.
But the statements in a module are executed only the first time the module is imported in a process. Subsequent imports simply produce another reference to the existing module object. Everything pointed to a statement running during import that stomped on the random module.
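Both halves of that behavior can be seen in a small self-contained experiment. This is only a sketch: `noisy_mod` is a made-up throwaway module, written to a temporary directory, whose top-level code calls random.random().

```python
import importlib
import os
import random
import sys
import tempfile

# Write a tiny throwaway module with a side effect at the top level.
module_dir = tempfile.mkdtemp()
with open(os.path.join(module_dir, "noisy_mod.py"), "w") as f:
    f.write("import random\nrandom.random()  # runs on import\n")
sys.path.insert(0, module_dir)
importlib.invalidate_caches()

random.seed(42)
expected_first = random.random()

random.seed(42)
importlib.import_module("noisy_mod")   # first import: the module body runs
after_first_import = random.random()

random.seed(42)
importlib.import_module("noisy_mod")   # cached in sys.modules: body is skipped
after_second_import = random.random()

print(after_first_import == expected_first)    # the first import disturbed the sequence
print(after_second_import == expected_first)   # the second import did not
```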
The q module imported a number of other modules, including numpy and sympy. But why would importing a module re-seed the random module?
A little experimenting showed that sympy was at fault here:
Looking at the values, after importing sympy, we've skipped ahead one number in our random sequence. So sympy isn't re-seeding the generator; it's consuming a random number.
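The difference between re-seeding and consuming is easy to check without sympy at all. In this sketch, `consume_one` is a hypothetical stand-in for the import: if it only consumed one number, the values seen afterwards are the baseline sequence shifted by one.

```python
import random

def consume_one():
    """Stand-in for an import that calls random.random() once (hypothetical)."""
    random.random()

random.seed(17)
baseline = [random.random() for _ in range(3)]

random.seed(17)
consume_one()
observed = [random.random() for _ in range(2)]

# A consumed number shows up as a one-position shift, not as a new sequence.
print(observed == baseline[1:])
```

A re-seed would instead produce values unrelated to the baseline.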
To find out where, we resorted to a monkey-patching trick, replacing random.random with a booby-trap:
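A minimal sketch of the booby-trap idea follows. It is not the original session: `suspect` is a hypothetical stand-in for the import under suspicion, and the choice of RuntimeError is an assumption. Any exception works, because the point is the traceback.

```python
import random
import traceback

def booby_trap(*args, **kwargs):
    raise RuntimeError("something called random.random()")

def suspect():
    """Hypothetical stand-in for the import under suspicion."""
    return random.random()

report = ""
original_random = random.random
random.random = booby_trap
try:
    suspect()
except RuntimeError:
    # The traceback names the exact call site that asked for a number.
    report = traceback.format_exc()
finally:
    random.random = original_random   # always restore the real function

print(report)
```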
OK, not sure why it's importing its tests when I try to use the package, but looking at the code, here's the culprit:
Here we can see the problem. Remember that default values for function arguments are computed once, when the function is defined. Since this function is defined when the module is imported, random.random() will be called during import, consuming one of our random numbers.
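The effect is reproducible in a few lines. This sketch uses a hypothetical `Runner` class rather than sympy's code; only the shape of the signature matters.

```python
import random

random.seed(5)
first, second = random.random(), random.random()

random.seed(5)

class Runner:
    """Hypothetical class; the default below is evaluated at definition time."""
    def __init__(self, seed=random.random()):
        self.seed = seed

# Merely defining the class consumed the first number in the sequence.
next_value = random.random()
print(next_value == second)

# Every instance that omits the argument shares that one precomputed value.
print(Runner().seed == Runner().seed == first)
```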
Better would be to define it like this:
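A minimal sketch of that kind of definition, again using a hypothetical `Runner` class rather than sympy's actual code: default the argument to None and draw the number inside the function body, so nothing happens at import time.

```python
import random

random.seed(5)
first = random.random()

random.seed(5)

class Runner:
    """Hypothetical class illustrating a lazily computed default."""
    def __init__(self, seed=None):
        if seed is None:
            # Evaluated per call, and only when the caller omits the seed.
            seed = random.random()
        self.seed = seed

# Defining the class consumed nothing, so the sequence is intact.
untouched = random.random()
print(untouched == first)

explicit = Runner(seed=0.25)   # never touches the generator
```

This variant draws a fresh seed for each instance that omits the argument. Sharing one seed across instances would need a lazily initialized class-level value instead.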
I'm not quite sure which behavior the author wanted: one seed for all the instances, or one seed per instance. I know I don't want importing this module to change my random number sequence.
Amusingly enough, the behavior of the initializer is irrelevant: it's only called in one place, and that call never relies on the default for the seed argument:
The best solution for our code would be to not rely on the module-level random number sequence, and instead use our own Random object. Come to think of it, that's what sympy should do too.
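As a sketch of what that could look like (`make_problem` is a hypothetical helper, not edX code), a private random.Random instance carries its own state and is unaffected by anything that touches the module-level generator:

```python
import random

def make_problem(seed):
    """Hypothetical helper: draw problem parameters from a private generator."""
    rng = random.Random(seed)   # independent of the module-level state
    return rng.randint(1, 100), rng.randint(1, 100)

baseline = make_problem(1234)

# Disturb the shared generator the way a misbehaving import would.
random.random()
random.seed(0)

print(make_problem(1234) == baseline)
```

The cost is that course code has to be handed the generator instead of simply using the random module.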
BTW, looking at why sympy is importing test infrastructure when I import it, there's this in sympy/utilities/__init__.py:
This makes using utilities very convenient, since it contains everything at the top level. But the downside is that you must always take everything. There is no way to import only part of utilities. Even if you use "from utilities.lambdify import lambdify", Python will execute the utilities/__init__.py file, importing everything.
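The same behavior shows up with any package laid out this way. The sketch below builds a throwaway package in a temporary directory; `demo_utils`, `light`, and `heavy` are made-up names, not sympy's modules.

```python
import importlib
import os
import sys
import tempfile

# Build a throwaway package: demo_utils/{__init__,light,heavy}.py
root = tempfile.mkdtemp()
pkg = os.path.join(root, "demo_utils")
os.mkdir(pkg)
files = {
    "__init__.py": "from .heavy import *\nfrom .light import *\n",
    "heavy.py": "def heavy_helper():\n    return 'heavy'\n",
    "light.py": "def light_helper():\n    return 'light'\n",
}
for name, body in files.items():
    with open(os.path.join(pkg, name), "w") as f:
        f.write(body)

sys.path.insert(0, root)
importlib.invalidate_caches()

# Ask for just one submodule...
light = importlib.import_module("demo_utils.light")

# ...and the package __init__ has still run, dragging in the sibling too.
print("demo_utils.heavy" in sys.modules)
```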