This is the 2500th post on this blog. That’s a lot of writing. I estimate
this site has about 480,000 words in total, enough for five books.
I’ve been writing here for more than 18 years. The pace is different than
when I started: last year I wrote 33 posts. Compare that to 2003, when I wrote
more than ten times as many.
Twitter has siphoned off some of the
short-post energy, but also interests shift over time.
Writing is a good way to understand things, and to learn things. People
mostly think of writing as a way to teach and explain, and I am glad when my
posts can do that. But I also really value the feedback loop of learning as I
explain, and the deeper understanding I find when I teach.
Here’s a common piece of advice from people who create things: to make better
things, make more things. Not only does it give you constant practice at making
things, but it gives you more chances at lucking into making a good thing.
These days I set myself a goal of writing two posts a month. I find the goal
helpful. It prods me to dig for topics. Some will be duds, but sometimes an
apparently boring idea will turn out well.
I can’t promise everything (or anything!) will be interesting or insightful.
But I’ll keep writing here. Thanks for reading.
Python’s pickle module is a very convenient way to serialize and de-serialize objects. It needs
no schema, and can handle arbitrary Python objects. But it has problems. This
post briefly explains the problems.
Some people will tell you to never use pickle because it’s bad. I won’t go
that far. I’ll say, only use pickle if you are OK with its nine flaws:
- Insecure
- Old pickles look like old code
- Implicit
- Over-serializes
- __init__ isn’t called
- Python only
- Unreadable
- Appears to pickle code
- Slow
Here is a brief explanation of each flaw, in roughly the order of
importance.
Insecure
Pickles can be hand-crafted that will have malicious effects when you
unpickle them. As a result, you should never unpickle data that you do not
trust.
The insecurity is not because pickles contain code, but because they create
objects by calling constructors named in the pickle. Any callable can be used
in place of your class name to construct objects. Malicious pickles will use
other Python callables as the “constructors.” For example, instead of executing
“models.MyObject(17)”, a dangerous pickle might execute “os.system(‘rm -rf /’)”.
The unpickler can’t tell the difference between “models.MyObject” and
“os.system”. Both are names it can resolve, producing something it can call.
The unpickler executes either of them as directed by the pickle.
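Here is a minimal sketch of that mechanism, using a harmless callable. The `Sneaky` class is hypothetical, and `sorted` stands in for something dangerous like `os.system`:

```python
import pickle

class Sneaky:
    """Hypothetical class whose pickle names an arbitrary callable."""

    def __reduce__(self):
        # Tell pickle: "to rebuild this object, call sorted([3, 1, 2])".
        # A malicious pickle would name os.system (or similar) instead.
        return (sorted, ([3, 1, 2],))

data = pickle.dumps(Sneaky())

# The unpickler resolves the name and calls it. We don't get a Sneaky
# back, we get whatever the named callable returned.
result = pickle.loads(data)
print(result)
```

Nothing in `pickle.loads` checked that `sorted` was a sensible constructor. It was simply called.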
More details, including an example, are in Supakeen’s post about dangers
in Python’s standard library.
Old pickles look like old code
Because pickles store the structure of your objects, when they are unpickled,
they have the same structure as when you pickled them. This sounds like a good
thing and is exactly what pickle is designed to do. But if your code changes
between the time you made the pickle and the time you used it, your objects may
not correspond to your code. The objects will still have the structure created
by the old code, but they will be running with the new code.
For example, if you’ve added an attribute since the pickle was made, the
objects from the pickle won’t have that attribute. Your code will be expecting
the attribute, causing problems.
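A small sketch of the mismatch, assuming it runs as a script so the class can be found by name. The `Point` class and its `label` attribute are made up for illustration, and redefining the class in one file stands in for editing your code between runs:

```python
import pickle

class Point:
    """Hypothetical class as it looked when the pickle was made."""
    def __init__(self, x, y):
        self.x = x
        self.y = y

old_pickle = pickle.dumps(Point(1, 2))

class Point:
    """The same class later, after a `label` attribute was added."""
    def __init__(self, x, y, label="origin"):
        self.x = x
        self.y = y
        self.label = label

# The object is an instance of the new class, with the old structure.
p = pickle.loads(old_pickle)
has_label = hasattr(p, "label")
print(type(p) is Point, has_label)
```

Any method on the new `Point` that reads `self.label` would fail on `p` with an AttributeError.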
Implicit
The great convenience of pickles is that they will serialize whatever
structure your object has. There’s no extra work to create a serialization
structure. But that brings problems of its own. Do you really want your
datetimes serialized as datetimes? Or as iso8601 strings? You don’t have a
choice: they will be datetimes.
Not only don’t you have to specify the serialization form, you can’t specify
it.
Over-serializes
Pickles are implicit: they serialize everything in your objects, even data
you didn’t want to serialize. For example, you might have an attribute that is
a cache of computation that you don’t want serialized. Pickle doesn’t have a
convenient way to skip that attribute.
Worse, if your object contains an attribute that can’t be pickled, like an
open file object, pickle won’t skip it, it will insist on trying to pickle it,
and then throw an exception.
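For instance, here is a sketch with a hypothetical `Report` class that keeps an open file as an attribute (assuming Python 3 and running as a script):

```python
import os
import pickle

class Report:
    """Hypothetical class holding data plus an open file."""
    def __init__(self, title):
        self.title = title
        self.log = open(os.devnull, "w")   # an open file can't be pickled

report = Report("Q3")
try:
    pickle.dumps(report)
    error = None
except TypeError as exc:
    # pickle walked every attribute and choked on the file object.
    error = exc
finally:
    report.log.close()

print(type(error).__name__)
```

The `title` attribute was perfectly picklable, but one bad attribute fails the whole object.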
__init__ isn’t called
Pickles store the entire structure of your objects. When the pickle module
recreates your objects, it does not call your __init__ method, since the object
has already been created.
This can be surprising, since nowhere else do objects come into being without
calling __init__. The logic here is that __init__ was already called when the
object was first created in the process that made the pickle.
But your __init__ method might perform some essential work, like opening file
objects. Your unpickled objects will be in a state that is inconsistent with
your __init__ method. Or your __init__ might log information about the object
being created. Unpickled objects won’t appear in the log.
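This is easy to see with a counter. The `Tracked` class below is a hypothetical example, assumed to run as a script:

```python
import pickle

class Tracked:
    """Hypothetical class that counts how often __init__ runs."""
    init_calls = 0

    def __init__(self, name):
        Tracked.init_calls += 1
        self.name = name

original = Tracked("a")
copy = pickle.loads(pickle.dumps(original))

# Two objects now exist, but __init__ only ran for the first one.
print(Tracked.init_calls, copy.name)
```

The copy has the right attributes because pickle restored them directly, not because `__init__` set them.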
Python only
Pickles are specific to Python, and are only usable by other Python programs.
This isn’t strictly true, you can find packages for other languages that can use
pickles, but they are rare. They will naturally be limited to the cross-language
generic list/dict object structures, at which point you might as well just use
JSON.
Unreadable
A pickle is a binary data stream (actually instructions for an abstract
execution engine.) If you open a pickle as a plain file, you cannot read its
contents. The only way to know what is in a pickle is to use the pickle module
to load it. This can make debugging difficult, since you might not be able to
search your pickle files for data you are interested in:
>>> import pickle
>>> pickle.dumps([123, 456])
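A quick sketch of why searching doesn't work, assuming Python 3's default (binary) pickle protocol:

```python
import pickle

data = pickle.dumps([123, 456])   # default protocol

# The numbers are stored as binary opcodes, not as digits...
found_as_text = b"123" in data or b"456" in data

# ...so the reliable way to see the contents is to load the pickle.
loaded = pickle.loads(data)
print(found_as_text, loaded)
```

A grep for "123" over a directory of pickle files like this one would come up empty.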
Appears to pickle code
Functions and classes are first-class objects in Python: you can store them
in lists, dicts, attributes, and so on. Pickle will gladly serialize objects
that contain callables like functions and classes. But it doesn’t store the
code in the pickle, just the name of the function or class.
Pickles are not a way to move or store code, though they can appear to be.
When you unpickle your data, the names of the functions are used to find
existing code in your running process.
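A small sketch of what actually gets stored, using the built-in `len` so the example doesn't depend on any of your own modules:

```python
import pickle

# Pickling a function stores only where to find it: module and name.
data = pickle.dumps(len)
stores_name = b"builtins" in data and b"len" in data

# Unpickling looks the name up in the running process.
same_object = pickle.loads(data) is len

# A function with no importable name, like a lambda, can't be pickled.
try:
    pickle.dumps(lambda x: x + 1)
    lambda_pickled = True
except (pickle.PicklingError, AttributeError):
    lambda_pickled = False

print(stores_name, same_object, lambda_pickled)
```

If the receiving process doesn't already have a function with that name, the pickle can't be loaded there.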
Slow
Compared to other serialization techniques, pickle can be slow, as Ben
Frederickson demonstrates.
Some of these problems can be addressed by adding
special methods to your class, like __getstate__ or __reduce__. But once
you start down that path, you might as well use another serialization method
that doesn’t have these flaws to begin with.
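As one illustration of that path, here is a sketch that uses `__getstate__` and `__setstate__` to keep a cache out of the pickle. The `Fibber` class and its `_cache` attribute are hypothetical, and this is only one way to arrange it:

```python
import pickle

class Fibber:
    """Hypothetical class with a cache we don't want serialized."""
    def __init__(self, n):
        self.n = n
        self._cache = {}          # derived data, cheap to rebuild

    def __getstate__(self):
        # Copy the instance dict and drop the cache before pickling.
        state = self.__dict__.copy()
        del state["_cache"]
        return state

    def __setstate__(self, state):
        # __init__ isn't called on unpickling, so restore the cache here.
        self.__dict__.update(state)
        self._cache = {}

f = Fibber(10)
f._cache[10] = 55
g = pickle.loads(pickle.dumps(f))
print(g.n, g._cache)
```

It works, but now you are writing serialization code by hand, which is the work pickle was supposed to save you.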
There are lots of other ways to serialize objects, ranging from plain-old
JSON to fancier alternatives like marshmallow, cattrs, and protocol buffers.
I don’t have a strong recommendation for any one of these. The right answer
will depend on the particulars of your problem. It might even be pickle...
58 is the sum of the first seven primes: 2 + 3 + 5 + 7 + 11 + 13 + 17.
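The arithmetic is easy to check with a few lines of Python. The `first_primes` helper is just a throwaway written for this check:

```python
def first_primes(count):
    """Return the first `count` primes by trial division."""
    primes = []
    candidate = 2
    while len(primes) < count:
        # candidate is prime if no smaller prime divides it evenly.
        if all(candidate % p for p in primes):
            primes.append(candidate)
        candidate += 1
    return primes

seven = first_primes(7)
total = sum(seven)
print(seven, total)
```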
Since Trump’s “election” in 2016, politics have been overwhelming enough that
it’s been difficult to catch my breath long enough to write anything about it.
These last few weeks have only intensified that feeling, but have also demanded
a response of some sort.
George Floyd’s killing was egregious enough to finally light a match to
tinder that had been drying and accumulating for a long time. As difficult as
it is to confront the gross injustices that run through our society, it is
encouraging to see people come together to call it out and address it.
I can try to speak up in my small way. It would be easy to sit back and say
I have not personally seen problems with police or in how society treats me.
But that is not evidence that all is well. The difference between my experience
and others’ is precisely the problem.
People with privilege, the people who can do something, are the people who
don’t experience the problems. We have to listen to others, to people not like
us. We have to face difficult truths about our place in society. It doesn’t
mean we are bad people. It doesn’t mean we have sought to subjugate others.
Privilege doesn’t mean we don’t have our own challenges and struggles. But we
benefit where others do not, and we have to act.
Trump’s incompetence, disregard, corruption, and malice are on full display
now, because of both COVID-19 and the Black Lives Matter movement. There are
signs that this could be a significant turning point. But it will not be easy,
and conflicts will get worse before they get better.
I’m looking for ways to help. I can donate money, though the current
extensive energy on the left means the progressive organization landscape is
cluttered and confusing. I wish there was more I could do. I am looking for
I’m in the #python IRC channel on Freenode a lot. The people there are often
quite opinionated. Julian had the idea
of processing the logs to see what we thought was good, and what was bad, using
sophisticated sentiment analysis.
Finding out what I liked and didn’t like wasn’t hard, since the
“sophisticated sentiment analysis” was two regexes:
“<nedbat>.* is good” and “<nedbat>.* is bad”!
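Roughly, the processing could look like this sketch. The two patterns are the ones quoted above; the log line format and the sample lines are assumptions made up for illustration:

```python
import re

# The two patterns from the post, as Python regexes.
GOOD = re.compile(r"<nedbat>.* is good")
BAD = re.compile(r"<nedbat>.* is bad")

# Made-up log lines; the real log format may differ.
log = [
    "<nedbat> textwrap.dedent is good",
    "<someone> eval is good",
    "<nedbat> import * is bad stuff",
    "<nedbat> what are you trying to do?",
]

good = [line for line in log if GOOD.search(line)]
bad = [line for line in log if BAD.search(line)]
print(good)
print(bad)
```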
Without further commentary, here is a sampling of things that I said were bad:
- vertical alignment is bad because it means you might have to change many lines just because one of them got wider.
- eval(input()) is bad.
- blindly following stuff is bad.
- trolling is bad. Find a way to use your brains for good.
- floats for currency is bad.
- any class that you can only instantiate once is bad.
- that sounds like a singleton, which is bad.
- some people say, “you should start by learning assembler,” and i think that is bad advice.
- del is fine. __del__ is bad
- implicit copying is bad
- texture (repetition you can see when you squint) is bad in code
- aligning indents with the opening delimiter is bad.
- this is bad: def main(nums=[1,2,3]).
- monkeypatching is bad
- it is bad to modify a list you are iterating.
- __import__ is bad
- import * is bad stuff
- the python doc search is bad
- and-or is bad
- singletons are hidden global state, and global state is bad
- checking types is bad
- that “:type myparam:” syntax is bad, it’s not readable. use google style instead: https://www.sphinx-doc.org/en/master/usage/extensions/napoleon.html#google-vs-numpy
- python is bad at recursion deeper than 1000
- there is a package on pypi called time, which is bad.
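To pick one item from that list, here is a sketch of why `def main(nums=[1,2,3])` is bad. The function bodies are invented for the example:

```python
def main(nums=[1, 2, 3]):
    """The default list is created once, when the function is defined."""
    nums.append(4)
    return nums

first = main()
second = main()     # same list object as before, now with two 4s

def main_fixed(nums=None):
    """Common fix: use None as the default and build the list inside."""
    if nums is None:
        nums = [1, 2, 3]
    nums.append(4)
    return nums

print(second, main_fixed(), main_fixed())
```

Every call to `main()` without an argument shares one list, so changes leak from one call to the next.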
And here are some things I said were good:
- “python -m app_name” is good, but recent.
- endswith is good
- xpath is good at selecting nodes in an xml dom tree.
- recursion is good for recursive structures. Iteration is good for iterative structures
- madlibs is good because you can do string manipulation with puerile humor
- learning is good! :)
- python is good for full applications.
- re.sub is good.
- mock(spec=thing) is good
- excitement is good :)
- pip will know how to install into virtualenvs, which is good.
- gist.github.com is good
- argparse is good for simple things
- it is good to cover your tests, though others disagree with me
- textwrap.dedent is good
- requirements.txt is good for recreating environments.
- the csv module is good at writing out dicts as rows.
- this is good for seeing how rst will be formatted: http://rst.ninjs.org/
- colorama is good
- yaml is good
- setUp is good. tearDown is better done as addCleanup
- yield from is good for when you want one generator to produce all the values from another generator.
- duck typing is good when there’s an operation supported across a number of types, and you can just use the operation without worrying about the type.
- learning is good! :)
- Django is good if you like having lots of things handled for you. Flask is good if you like to put together all the pieces yourself.
- tox is good for testing against multiple environments
- for validating email addresses, this is good: [^@ ]+@[^@ ]+\.[^@ ]+ https://nedbatchelder.com/blog/200908/humane_email_validation.html
- the python.org tutorial is good if you have programmed in other languages before.
- “pip install -e .” is good
- the interactive interpreter is good for experimentation, but isn’t good for real development.
- bpaste.net is good
- Think Python is good
- atexit is good.
- python is good, and we are helpful and friendly :)
- pandas is good if you need to manipulate tables of data. If you don’t, then don’t use pandas
- you want to do something for each thing in a list, that’s what a for loop is good for :)
- there’s an old habit of using “:type:blah:” or whatever, which is horrible. Sphinx now supports the “napoleon” style natively, which is good: http://www.sphinx-doc.org/en/1.4.8/ext/napoleon.html
- pytest does cool assert rewriting, which 99.9999% of the time is good magic.
- pudb is good in the terminal
- trying to be efficient is good.
- sql is good for some kinds of data. nosql is good for others.
- .format is good
- a decorator is good for wrapping functions in new functionality.
- the pytest -k option is good at that.
- i would not try to jam everything into setup.py. this feels like something a makefile is good at.
- pytest is good at parameterized tests.
- “if not list:” is good python
- @classmethod is good for alternate constructors, yes.
- rg is good: https://github.com/BurntSushi/ripgrep
- the prompt is good for doing small experiments. Once you have larger programs, put them in .py files, and run them: python myprog.py
- low tech is good tech
- Learning is good.
- numpy is good when you can do whole-matrix operations at once. If you need to iterate over elements and do individual operations, it doesn’t provide any benefit.
- any is good when you have an iterable of true/false
- choice is good. Why should there be only one implementation?
- gist.github.com is good, or paste.pound-python.org
- click is good
- you’re not using a shell, which is good.
- whatever helps you learn is good
- numpy is good when you are doing matrix and array operations. lists are good for ordered collections of things
- obfuscation isn’t something Python is good at.
- the ast module is good for one thing: representing python programs as a tree of nodes. It provides tools for parsing source text into that tree.
- python is good as a first language
- isolation is good, but doing it with mocks can be a problem in itself
- subclassing is good for when SubClass, by its essence is a ParentClass.
- talking is good :)
- coverage.py is good.
- recursion is good for recursive structures (trees). iteration is better for linear structures (lists)
- Jupyter is good for visualizations, graphing, tables, etc. interactive experimentation
- lxml.html is good
- sha256 is good too
- Fluent Python is good, if you like books
- for speed, PyPy is good. Or Cython. if you want to write C code, you can use cffi to call it from Python
- curiosity is good
- collections.Counter is good at counting things, and would do this in O(N).
- .encode makes the conversion explicit, which is good. my_bytes = my_unicode.encode(“utf8”)
- they is good.
- pudb is good
- in python 3, super() is good, but it doesn’t work in python 2.
- the -k option is good for this, or you can define markers.
- virtualenv is good for separating different projects’ needs
- tig is good too
- rst is good at multi-page docs, without assuming it will be html. markdown just shrugs and says, “use html when you have to”
- writing is good just for its own sake.
- bpaste.net is good.
- learning is good
- dependencies are good. using other people’s solutions to your problems is good.
- split has a much better PR agency, but partition is good too
- attribute access is good.
- i won’t say that loops should introduce scopes, but it is good to be able to understand the interplay between scoping and closur-ing
- https://pypi.org/project/appdirs/ is good for answering that question
- automate the boring stuff is good. What kind of software will you be writing?
- iso8601 is good
- prompt_toolkit is good
- numpy is good when you have an array full of data, and you can do one operation that works on all of it at once
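The email pattern quoted in the list is simple enough to try out. This sketch assumes you want the whole string to match, so it uses `fullmatch`; the `looks_like_email` helper name is made up:

```python
import re

# The pattern quoted in the list: something, "@", something, ".", something.
EMAIL = re.compile(r"[^@ ]+@[^@ ]+\.[^@ ]+")

def looks_like_email(text):
    """Hypothetical helper: True if the whole string fits the pattern."""
    return EMAIL.fullmatch(text) is not None

print(looks_like_email("ned@example.com"))
print(looks_like_email("ned@localhost"))
print(looks_like_email("ned smith@example.com"))
```

It only catches obvious typos, which is the point: the pattern doesn't try to decide whether an address really exists.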
My son Nat is 30 and has autism. His expressive language is somewhat
limited, and he relies on rote answers when he can. A few years ago, one of his
caregivers taught him that if someone asks, “What’s up?” you can answer, “Not
much.”
At first I thought, “that’s not always a good answer,” but then I started
paying attention to how people around me responded, and sure enough, not only is
it a good answer, but it’s almost always the answer people give.
It’s especially appropriate these days. Nat has been living with us through
these COVID-19 times since the middle of March, and he is frustrated at how
limited his days have become.
One of Nat’s favorite things is a three-week calendar showing what’s going to
happen. It was a regular routine when he would visit on weekends from his group
home: we’d sit down and update his calendar. Every day would be marked with
where he’d wake up, what he would do during the day, and where he would go to
bed. Unusual activities and special events in particular would be noted and
recited. He would often sit and study the calendar, or would ask to review it.
When he first moved in for the lock-down, we tried updating his calendar, but
the result was too depressingly accurate: every day was the same, and every day
was at home. The only special event was Passover, which had been changed from
in-person to Zoom.
Soon the weekly calendar was abandoned as uninteresting, and Nat started
saying, “April,” meaning, I want it to be April when this will be over and I can
go back to my regular life.
Of course, April came without a let-up of the lockdown, so he started saying,
“May.” Now that we are in May, he says, “Summer.” We can’t give him a definite
answer. The best we can do is to remind him that we have to wait, and that
everyone he knows is also at home waiting.
I have been walking with Nat, a long-favored activity of ours. We’re up to
about five miles a day, which is good for both of us.
My wife Susan has been handling most of the weekday activities other than the
walks, trying to find things for Nat to do. They have been doing a lot of
Facebook, a baking project most days, a little street basketball, and chores
around the house.
She has taken to calling this “Suzie’s
Little Day Program”:
Suzie’s Little Day Program pros and cons:
Pros: 1) Great staff-to-client ratio; 2) Lots of love;
3) Lots of napping; 4) Great treats; 5) Strong exercise
component; 6) No ABA Whatsoever.
Cons: 1) Over-reliance on sugar; 2) Over-napping;
3) Not enough variety in peer group; 4) Moody staff;
5) Unreliable hours; 6) No ABA Whatsoever; 7) Often boring
Overall, Nat is taking this very well. He has settled into this
underwhelming routine. He dutifully wears his face mask on walks, and now knows
to walk far around other people on the sidewalk without me prompting him. He
likes getting on Zoom calls with the groups he is part of
(Special Olympics, his day
program), even if it’s just to watch because jumping in is difficult in those
We have gotten used to having him around full-time. There are 12 nearly empty
bottles of shampoo in the shower (hard to explain). A few favorite Disney movies
are on tight rotation. He keeps us a little more regimented, since ad-hoc is
not his style.
I keep a close eye on him. He will not let us know if he starts to feel
sick, so we have to be alert for him. He can be very passive, so it’s easy to
feel guilty if he is staring into space or napping too much. We feel like we
should be filling his time somehow, but it’s not possible to keep him busy ten
hours a day.
Luckily, he has been in a calm period overall. There were other times in his
life when these days would have been much stormier. We hope that his even
temper continues. It’s been seven weeks so far, and we don’t know how much
longer we are going to be together like this.
This is just our life right now. We’re all doing what we can. What’s going
on? I’d have to say, not much.