War is peace

Friday 1 February 2013

This is more than ten years old. Be careful.

The Rails community has had a few high-profile security issues this week. They are well-summarized, with an alarming list of what follow-ons to expect, by Patrick McKenzie: What the Rails Security Issue Means for Your Startup.


  • Ruby’s YAML parser will execute arbitrary Ruby code,
  • YAML is parsed all over the place in Rails, including for all JSON input,
  • Pretty much every Rails app is going to be compromised soon.

The Python community is in a slightly better position. True, we have pickle in the standard library, which has exactly the same problem, but it’s rare to find applications that accept pickles from untrusted sources.

Don’t ever unpickle data you don’t trust!
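To make the warning concrete, here is a minimal sketch of why unpickling is equivalent to running code. The class name `Innocent` is made up for illustration, and the payload only evaluates a harmless arithmetic expression; an attacker would choose something far less friendly.

```python
import pickle


class Innocent:
    """Hypothetical class: its pickle tells the loader to call a function."""

    def __reduce__(self):
        # pickle records "call eval('6 * 7')" instead of any object state.
        # A real attack would name os.system or similar here.
        return (eval, ("6 * 7",))


payload = pickle.dumps(Innocent())

# The receiver only asked to load some data, but the call happens anyway.
result = pickle.loads(payload)
print(result)  # 42
```

Nothing in `pickle.loads()` lets the receiver opt out of that call, which is why the only safe policy is to not load pickles you did not produce yourself.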

The 3rd-party YAML parser PyYAML has the same issue as Ruby’s YAML parser. By default, it will let you create arbitrary Python objects, which means it can run arbitrary Python code. YAML isn’t nearly as pervasive in the Python world, and we don’t parse JSON with the YAML parser usually, but this can still create security holes.

PyYAML has a .load() method and a .safe_load() method. Why do serialization implementers do this? If you must extend the format with dangerous features, provide them in the non-obvious method. Provide a .load() method and a .dangerous_load() method instead. At least that way people would have to decide to do the dangerous thing. I would advocate for PyYAML to make this change now. Who cares if backward compatibility breaks? Most people using .load() never intended to deserialize arbitrary Python objects anyway, so they’ll never notice.

If you use the PyYAML library in your code, check now that you are using the .safe_load() method.
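As a rough sketch of what that check looks like in practice, the snippet below routes untrusted text through `yaml.safe_load()`. The function name `parse_untrusted` and the two sample documents are invented for illustration, and the import is guarded only so the snippet does not crash where PyYAML is not installed.

```python
try:
    import yaml  # third-party package: PyYAML
except ImportError:
    yaml = None

# A document that asks the loader to call a Python function.
HOSTILE = "!!python/object/apply:os.getcwd []"
# An ordinary document made of scalars, lists, and mappings.
PLAIN = "name: example\nports: [80, 443]\n"


def parse_untrusted(text):
    """Parse YAML, allowing only plain scalars, lists, and dicts."""
    return yaml.safe_load(text)


if yaml is not None:
    print(parse_untrusted(PLAIN))
    try:
        parse_untrusted(HOSTILE)
    except yaml.YAMLError as exc:
        # safe_load has no constructor for python/* tags, so it refuses.
        print("rejected:", type(exc).__name__)
```

The plain document comes back as an ordinary dict, while the hostile one is rejected before any Python callable is looked up.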

If you want automatic serialization of your user-defined classes, take a look at Cerealizer, which works similarly to pickle, but is built to be secure from the start. I’ve never used it, but it looks promising.

BTW, this whole circus reminded me of Allen Short’s excellent lightning talk from PyCon 2010: Big Brother’s Design Rules (skip to 17:30). To summarize Allen’s pithy maxims:

  • War is Peace: assume you are at war, all input is an attack, and then you can be at peace.
  • Slavery is Freedom: the more you constrain your code’s behavior, the more freedom you have to act. The smaller your interface, the smaller your attack surface.
  • Ignorance is Strength: the less your code knows about, the fewer things it can break. This is the principle of least authority.

Allen in particular mentions that adding “conveniences” to your interface can make your life harder later on. In Ruby’s case, there were two unneeded conveniences that combined to make things really bad: parse JSON with the YAML parser, and let the YAML parser construct arbitrary Ruby objects. Neither of these is actually needed by 99.999% of programs reading JSON, but now all of them are compromisable.

Think hard about what your program does. Stay safe.


TLDR: It doesn't affect the vast majority of python users in any way shape or form, you just felt the need to drag it through the mud because rails is quickly being outed as the piece of shit it is.
I would say Ned pointed out a couple very specific places where that exploit could affect python developers. You don't know what every developer out there is doing, so I think this is really good advice.
load() and dangerous_load() is a great idea
@Walls: wow, this post is "dragging it through the mud"? You might not want to venture into the rest of the internet.... :)
Python libraries which unpickle untrusted data are unfortunately in widespread use. Please don't underestimate this - *even in stdlib* Cookie.Cookie does exactly this and it is still somehow accepted in the name of backward compatibility.

Posts like Ned's acknowledging issues like this and taking them seriously are more helpful and do more credit to the Python community than just "closing the issue" would. Every language has problems, not just Ruby (and certainly not just Rails). If there's a way for the Python community to distinguish itself here it is by taking security seriously and getting out ahead of the issues instead of just getting defensive.
@Walls: Writing secure software is hard - it requires a combination of paranoia (assuming you will be attacked instead of thinking "Why would anyone ever try to exploit this?") and humility (assuming your attackers will be smarter than you instead of thinking "I don't know how to break it, therefore it is secure") that most people don't have. It also often comes at a cost in flexibility - pickle is a lot more powerful than JSON as a data format, but that power carries with it a huge increase in risk.

The core Python team tries hard to promote a culture of "use as much magic as you need, but no more" (often paraphrased as "magic is evil", and included in the Zen of Python in various guises like "explicit is better than implicit", "simple is better than complex", "complex is better than complicated" and "if the implementation is hard to explain, it's a bad idea"). However, it's always going to be tempting to make the powerful and flexible option the default, and the more restrictive option the exception.

As an example that was fixed in Python 3: Python 2 has "input()" which implicitly calls "eval()" on user supplied data. The safer alternative, which allows the use of more restrictive parsing by always returning a string, is called "raw_input()". In Python 3, the input() builtin itself has been fixed to behave like Python 2's raw_input().

However, even in Python 3, the builtin eval() is still dangerous to use on user-supplied data, as it can execute *any* Python expression. For obscure technical reasons, the safer-but-more-limited alternative, "ast.literal_eval()", isn't even a builtin the way raw_input() was.
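A small sketch of the difference, for readers who have not used it: `ast.literal_eval()` accepts Python literals and refuses everything else. The wrapper name `parse_literal` is made up for this example, and returning `None` on failure is just one possible policy.

```python
import ast


def parse_literal(text):
    """Return the Python literal in text, or None if it is anything else."""
    try:
        return ast.literal_eval(text)
    except (ValueError, SyntaxError):
        # Function calls, attribute access, names, and so on end up here.
        return None


print(parse_literal("[1, 2, {'a': 3}]"))           # [1, 2, {'a': 3}]
print(parse_literal("__import__('os').getcwd()"))  # None
```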

Only in Python 3.3 did we start shipping a comparison operation suitable for security sensitive operations (hmac.compare_digest), and there are still no suitable primitives for password hashing in the standard library (although "passlib" is just a download away on PyPI).

No Pythonista should ever feel smug about security woes in another language or runtime, whether that's Java or Ruby or something else. We have a track record of promoting "safe by default" behaviour, but our record certainly isn't perfect, and we'll almost certainly have more issues in the future. Standard library behaviours that are safe within the confines of a single system (like sharing pickled objects through a pipe) become unsafe when spanning multiple systems (like sharing pickled objects without cryptographic signatures across a network socket), and we're relying on other developers to understand that. Heck, the Rails vulnerability is overshadowing a recent MoinMoin exploit which was used to take out both Debian's main wiki and the Python wiki on python.org.
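One way to picture the "cryptographic signatures" point is the sketch below, which refuses to unpickle anything whose HMAC does not verify. The names `dumps_signed`, `loads_signed`, and `SECRET_KEY` are hypothetical, and the scheme assumes both ends already share a secret that the attacker does not know; it is an illustration, not a vetted protocol.

```python
import hashlib
import hmac
import pickle

SECRET_KEY = b"replace-with-a-real-secret"  # hypothetical shared key
TAG_SIZE = hashlib.sha256().digest_size     # 32 bytes for SHA-256


def dumps_signed(obj, key=SECRET_KEY):
    """Pickle obj and prepend an HMAC-SHA256 tag over the pickle bytes."""
    data = pickle.dumps(obj)
    tag = hmac.new(key, data, hashlib.sha256).digest()
    return tag + data


def loads_signed(blob, key=SECRET_KEY):
    """Verify the tag first; only unpickle if it matches."""
    tag, data = blob[:TAG_SIZE], blob[TAG_SIZE:]
    expected = hmac.new(key, data, hashlib.sha256).digest()
    # compare_digest avoids leaking how many leading bytes matched.
    if not hmac.compare_digest(tag, expected):
        raise ValueError("bad signature; refusing to unpickle")
    return pickle.loads(data)


blob = dumps_signed({"id": 7, "name": "job"})
print(loads_signed(blob))
```

The important detail is the order: the signature check happens before `pickle.loads()` ever sees the bytes, so a forged payload is dropped without being interpreted.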

Looking specifically at the case of the recent Rails problems, even apps written in Python may run into trouble if a related Rails app, or an unrelated Rails app on the same network, falls to an attacker. Attackers don't stop just with the first machine compromised - every compromised machine becomes a platform for launching additional attacks, often with additional data about or privileged access to subsequent target systems.

The design space available for programming languages is enormous, and we collectively still know very little about how to write large scale software sensibly. When other languages and software are attacked, it is important to reflect on it and see what lessons can be learned for our own tools (as Ned has done here), rather than arrogantly assuming ourselves to be immune from the same kinds of error.
Lise de Saint Romain 12:41 AM on 2 Feb 2013
Very good advice. Thanks.
"the less your code knows about, the fewer things it can break"

I like that. Nicely put.
Nice blog post! I'm glad to see the renewed attention to security in Python. It would absolutely be a shame if we didn't learn from the failures of others -- because then the only failures we have to learn from are our own, and when that happens we've already lost.
Change load to safe_load and safe_load to real_safe_load.
There are big problems with Rails.
There aren't any problems with Python.
Rails is not comparable to Python.
I don't see any amazing insight in this post.
As much as I love Python (my first favorite language :'-), Walls and, to a lesser degree, DC demonstrate one of the main reasons I don't use it anymore: these insufferable anti-Ruby Python trolls are omnipresent in the community, and hiring them in a polyglot team leads to totally unnecessary tension. It's to the point that for any developer position, even a non-Python one, I have interview questions to weed them out ASAP. As this post demonstrates, you can't even mention "Ruby" without these toxic individuals showing up.
Wow. So many Python fan boys.
I think we should all be thankful for the efforts of all python (cpython in particular) contributors and everyone should read Nick Coghlan's comment (found above).

There is only one thing that at the present time irks me in python and it is package management. I would love to see http://www.python.org/dev/peps/pep-0381/ implemented as a starting point and maybe even parts of the technical spec of TUF found at https://www.updateframework.com/ integrated into the code (perhaps in the 'pip' module).

BTW, if anyone is interested in a 'dumb search' for 'potentially' unsafe module/module function calls in their python code, I maintain a small grep script which can be found at https://github.com/d1b/python-check-script/blob/master/python_hunt.sh
Thanks for the write-up Ned. I was wondering about the same thing with our standard json, simplejson, pyyaml, etc. and just hadn't had the time yet to satisfy the curiosity.

* don't feed the trolls *
Thank you for writing this, I have been thinking the same thing about PyYAML.
My half-baked plan of action would be to use GitHub's code search to dig up some real-world examples of unsafe PyYAML usage, and petition the PyYAML author to

- Increase the major version
- Rename load() to unsafe_load()
- Rename safe_load() to load(), but keep safe_load() as an alias

This would break the API for some users, but I suspect many people are using YAML as a "prettier JSON", and should really be using safe_load anyway.
Chris Sattinger 10:15 AM on 3 Feb 2013
Django uses pickle for its cache implementation.

Celery by default uses pickle for sending objects through the broker. You can switch it to json, but then you need to implement json methods for any complex objects you are sending. All of my objects in celery are only ids and strings but I should go make absolutely sure.

Popular packages should get community security reviews. Maybe eyeballs are good enough.

Pip needs to be checking PGP keys and we need to all be signing our distributions when we push. That's serious and we should get on that. Gather the most paranoid dudes and fortify the castle.
For those who are able, it's time to jump on this pip ticket and start giving it some TLS cert and GPG signature verifications:

This is definitely a problem with the jsonpickle library, but has always been a documented issue. Normal json/yaml decoders in python just return arrays and dicts of the given data, so the chance of the implementation allowing any sort of execution is much, much lower.

The need for caution when using *native* serialisation seems obvious enough to me to go without saying, but perhaps the pickle/unpickle documentation should be peppered with more warnings.
> True, we have pickle in the standard library, which has
> exactly the same problem, but it's rare to find applications
> that accept pickles from untrusted sources.

There was a known issue on this.
Great post, I'm a Python guy but I do like the synergistic relationship that Ruby and Python have. Which is why I'll be learning Ruby this year.
"PyYAML has a .load() method and a .safe_load() method."

Ruby had a .load() method and... well, that's it. Pretty much every Ruby application that parsed untrusted YAML did so unsafely because there was no trivial way to parse it the correct, safe way, and the parser developers had been dragging their feet on adding one. That's a fairly fundamental difference.
Here are some solutions to security problems when serializing/deserializing data in the context of distributed systems:
