Python has an eval() function which evaluates a string of Python code:

assert eval("2 + 3 * len('hello')") == 17

This is very powerful, but it is also very dangerous if you accept strings to evaluate from untrusted input. Suppose the string being evaluated is "os.system('rm -rf /')". It will really start deleting all the files on your computer. (In the examples that follow, I'll use 'clear' instead of 'rm -rf /' to prevent accidental foot-shootings.)

Some have claimed that you can make eval safe by providing it with no globals. eval() takes a second argument, a dictionary of the global values to use during the evaluation. If you don't provide a globals dictionary, then eval uses the current globals, which is why "os" might be available. If you provide an empty dictionary, then there are no globals. This now raises a NameError, "name 'os' is not defined":

eval("os.system('clear')", {})

But we can still import modules and use them, with the builtin function __import__. This succeeds:

eval("__import__('os').system('clear')", {})
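Here is a harmless sketch of the same two steps, assuming Python 3 and reading os.sep instead of running a command. The helper name eval_with_empty_globals is made up for illustration. It also shows why __import__ is still reachable: eval() inserts a '__builtins__' entry into a globals dictionary that lacks one.

```python
import os

# Hypothetical helper: run eval() with a fresh, empty globals dictionary
# and hand back both the result and the dictionary eval() was given.
def eval_with_empty_globals(expr):
    env = {}
    result = eval(expr, env)
    return result, env

# 'os' is not a name in the empty globals, so the bare name fails...
try:
    eval_with_empty_globals("os.sep")
    bare_name_failed = False
except NameError:
    bare_name_failed = True

# ...but __import__ is a builtin, and eval() adds '__builtins__' to the
# dictionary on its own, so the import route still works.
sep, env = eval_with_empty_globals("__import__('os').sep")
builtins_were_injected = "__builtins__" in env
```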

The next attempt to make things safe is to refuse access to the builtins. Names like __import__ and open are available to you in Python 2 because they are in the __builtins__ global. We can explicitly specify that there are no builtins by defining that name as an empty dictionary in our globals. Now this raises a NameError:

eval("__import__('os').system('clear')", {'__builtins__':{}})
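A small sketch of what an empty __builtins__ does and does not take away, assuming Python 3. The helper name eval_without_builtins is hypothetical. Every builtin name disappears, while literals and operators keep working, which is all the attack that follows needs.

```python
# Hypothetical helper: evaluate with an explicitly empty set of builtins.
def eval_without_builtins(expr):
    return eval(expr, {"__builtins__": {}})

# Each expression names a builtin, so each one fails with NameError
# before anything is imported, opened, or measured.
blocked = []
for expr in ["__import__('os')", "open('x')", "len('hello')"]:
    try:
        eval_without_builtins(expr)
    except NameError:
        blocked.append(expr)

# Literal syntax and operators need no names at all.
literal_result = eval_without_builtins("2 + 3 * 4")
```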

Are we safe now? Some say yes, but they are wrong. As a demonstration, running this in CPython will segfault your interpreter:

s = """
(lambda fc=(
    lambda n: [
        c for c in 
            ().__class__.__bases__[0].__subclasses__() 
            if c.__name__ == n
        ][0]
    ):
    fc("function")(
        fc("code")(
            0,0,0,0,"KABOOM",(),(),(),"","",0,""
        ),{}
    )()
)()
"""
eval(s, {'__builtins__':{}})

Let's unpack this beast and see what's going on. At the center we find this:

().__class__.__bases__[0]

which is a fancy way of saying "object". The first base class of a tuple is "object". Remember, we can't simply say "object", since we have no builtins. But we can create objects with literal syntax, and then use attributes from there. This is the list of all the subclasses of object:

().__class__.__bases__[0].__subclasses__()

or in other words, a list of every class defined up to this point in the program that inherits directly from object. We'll come back to this at the end. If we shorthand this as ALL_CLASSES, then this is a list comprehension that examines all the classes to find one named n:

[c for c in ALL_CLASSES if c.__name__ == n][0]

We'll use this to find classes by name, and because we need to use it twice, we'll create a function for it:

lambda n: [c for c in ALL_CLASSES if c.__name__ == n][0]
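To make the pieces so far concrete, here is a sketch that runs outside of eval, assuming Python 3 (where the classes are still named "function" and "code"). The names root and find_class are hypothetical stand-ins for the expressions above.

```python
import types

# ().__class__ is tuple, and the first base class of tuple is object.
root = ().__class__.__bases__[0]

# The same lookup as in the text, given a descriptive name.
find_class = lambda n: [c for c in root.__subclasses__() if c.__name__ == n][0]

# Neither class is reachable by name without builtins, yet both turn up here.
function_class = find_class("function")
code_class = find_class("code")
```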

But we're in an eval, so we can't use the def statement, or the assignment statement to give this function a name. But default arguments to a function are also a form of assignment, and lambdas can have default arguments. So we put the rest of our code in a lambda function to get the use of the default arguments as an assignment:

(lambda fc=(
    lambda n: [
        c for c in ALL_CLASSES if c.__name__ == n
        ][0]
    ):
    # code goes here...
)()
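The default-argument trick is easier to see in a tiny case. This sketch binds a helper to the name double with no assignment statement, then uses it in the lambda body. The name double is just an illustration.

```python
# 'double' is bound through a default argument, then the outer lambda is
# called immediately with no arguments so that the default is used.
result = eval(
    "(lambda double=(lambda x: x * 2): double(double(5)))()",
    {"__builtins__": {}},
)
```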

Now that we have our "find class" function fc, what will we do with it? We can make a code object! It isn't easy: you need to provide 12 arguments to the constructor, but most can be given simple default values:

fc("code")(0,0,0,0,"KABOOM",(),(),(),"","",0,"")

The string "KABOOM" supplies the actual bytecodes to use in the code object, and as you can probably guess, "KABOOM" is not a valid sequence of bytecodes. Actually, any one of these bytecodes would be enough: they are all binary operators that will try to operate on an empty operand stack, which will segfault CPython. "KABOOM" is just more fun; thanks to lvh for it.

This gives us a code object: fc("code") finds the class "code" for us, and then we invoke it with the 12 arguments. You can't invoke a code object directly, but you can create a function with one:

fc("function")(CODE_OBJECT, {})

And of course, once you have a function, you can call it, which will run the code in its code object. In this case, that will execute our bogus bytecodes, which will segfault the CPython interpreter. Here's the dangerous string again:

(lambda fc=(lambda n: [c for c in ().__class__.__bases__[0].__subclasses__() if c.__name__ == n][0]):
    fc("function")(fc("code")(0,0,0,0,"KABOOM",(),(),(),"","",0,""),{})()
)()
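If you want to watch the mechanism work without crashing anything, here is a harmless variant, assuming Python 3. Instead of building a code object from bogus bytecodes, it borrows valid bytecode from a throwaway lambda through its __code__ attribute, then wraps that code in a new function exactly as above.

```python
# Same structure as the dangerous string, but the code object comes from
# a real lambda, so calling the rebuilt function simply returns a number.
s = """
(lambda fc=(
    lambda n: [
        c for c in
            ().__class__.__bases__[0].__subclasses__()
            if c.__name__ == n
        ][0]
    ):
    fc("function")((lambda: 6 * 7).__code__, {})()
)()
"""
result = eval(s, {'__builtins__': {}})
```

The function class was never named in our globals or builtins, yet the evaluated string found it, built a function from a code object, and ran it.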

So eval is not safe, even if you remove all the globals and the builtins!

We used the list of all subclasses of object here to make a code object and a function. You can of course find other classes and use them. Which classes you can find depends on where the eval() call actually is. In a real program, there will be many classes already created by the time the eval() happens, and all of them will be in our list of ALL_CLASSES. As an example:

s = """
[
    c for c in 
    ().__class__.__bases__[0].__subclasses__() 
    if c.__name__ == "Quitter"
][0](0)()
"""

The standard site module defines a class called Quitter. It's what the name "quit" is bound to, so that you can type quit() at the interactive prompt to exit the interpreter. So in eval we simply find Quitter, instantiate it, and call it. This string cleanly exits the Python interpreter.

Of course, in a real system, there will be all sorts of powerful classes lying around that an eval'ed string could instantiate and invoke. There's no end to the havoc that could be caused.

The problem with all of these attempts to protect eval() is that they are blacklists. They explicitly remove things that could be dangerous. That is a losing battle because if there's just one item left off the list, you can attack the system.
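For contrast, here is a rough sketch of the opposite approach, a whitelist: parse the string and evaluate only the node types you have explicitly decided to allow. It assumes Python 3.8+ (for ast.Constant), and the names eval_arithmetic and _BINARY_OPS are hypothetical. It handles only simple arithmetic and is an illustration of the idea, not a vetted sandbox.

```python
import ast
import operator

# Hypothetical whitelist: the only binary operators that get evaluated.
_BINARY_OPS = {
    ast.Add: operator.add,
    ast.Sub: operator.sub,
    ast.Mult: operator.mul,
    ast.Div: operator.truediv,
}

def eval_arithmetic(expr):
    """Evaluate +, -, *, / over numeric literals; reject everything else."""
    def visit(node):
        if isinstance(node, ast.Expression):
            return visit(node.body)
        if isinstance(node, ast.Constant) and type(node.value) in (int, float):
            return node.value
        if isinstance(node, ast.BinOp) and type(node.op) in _BINARY_OPS:
            return _BINARY_OPS[type(node.op)](visit(node.left), visit(node.right))
        if isinstance(node, ast.UnaryOp) and isinstance(node.op, ast.USub):
            return -visit(node.operand)
        # Attribute access, calls, names, lambdas, etc. all end up here.
        raise ValueError("disallowed expression: %s" % type(node).__name__)
    return visit(ast.parse(expr, mode="eval"))
```

Anything not on the list, including attribute access and function calls, is refused by default rather than having to be anticipated.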

While I was poking around on this topic, I stumbled on Python's restricted evaluation mode, which seems to be an attempt to plug some of these holes. Here we try to access the code object for a lambda, and find we aren't allowed to:

>>> eval("(lambda:0).func_code", {'__builtins__':{}})
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "<string>", line 1, in <module>
RuntimeError: function attributes not accessible in restricted mode

Restricted mode is an explicit attempt to blacklist certain "dangerous" attribute access. It's specifically triggered when executing code if your builtins are not the official builtins. There's a much more detailed explanation and links to other discussion on this topic on Tav's blog. As we've seen, the existing restricted mode isn't enough to prevent mischief.

So, can eval be made safe? Hard to say. At this point, my best guess is that you can't do any harm if you can't use any double underscores, so maybe if you exclude any string with double underscores you are safe. Maybe...

Update: from a thread on Reddit about recovering cleared globals, a similar snippet that will get you the original builtins:

[
    c for c in ().__class__.__base__.__subclasses__() 
    if c.__name__ == 'catch_warnings'
][0]()._module.__builtins__

Comments

Dan Crosta 9:05 AM on 6 Jun 2012

Great in-depth investigation. When I've talked about this in the past (and in 2 days at PyGotham) I've always kind of hand-waved about the risks of eval() and exec (which I believe has all of the same risks inherent?), but now I can just point people here. Thanks for saving me some work!

Bill Mill 10:05 AM on 6 Jun 2012

There's an interesting attempt at a restricted python: https://github.com/haypo/pysandbox .

I used it in a chatbot once to allow people to eval simple python statements: https://github.com/llimllib/pyphage/blob/master/plugins/eval.py

While it was vulnerable to DOS (the timeout function did not work properly on memory intensive functions: https://github.com/haypo/pysandbox/issues/10 ), I let some pretty serious python hackers have a go at it and they weren't able to break it.

I wouldn't trust it on a for-real server, obviously, but it was good enough for the task I set out to accomplish in that scenario.

David Goodger 10:26 AM on 6 Jun 2012

Reading this, I was confused. Of course eval() (and exec()) are dangerous if you don't trust the strings to be evaluated.

The article seems to be missing a basic stipulation, perhaps "running untrusted code is dangerous". You seem to be trying to prove that no matter what you do, you can't fully isolate and sanitize untrusted code. OK, I buy that. But I think the article needs a liberal sprinkling of "... for untrusted code/input/source".

eval() and exec() are perfectly safe for trusted code, useful and even necessary on occasion (e.g. dynamic programming). Just never *ever* use them with untrusted input.

sil 10:44 AM on 6 Jun 2012

Sadly, eval can also evaluate... eval. So the double-underscore trick doesn't work:

eval('eval("()._" + "_class_" + "_._" + "_bases_" + "_[0]")')
<type 'object'>

lahwran 10:54 AM on 6 Jun 2012

The solution I'd use if you need this is to use the PyPy sandbox. It isn't quite ready for use, but it's pretty close, and it's fully software virtualized so much safer than eval.

Incidentally, Ned himself did some work on it to speed up the startup a few months ago. That's still the main area that needs work, too.

Also, using eval to be "dynamic" is icky.

Masklinn 10:59 AM on 6 Jun 2012

> At this point, my best guess is that you can't do any harm if you can't use any double underscores, so maybe if exclude any string with double underscores you are safe.

You could always concatenate single underscores. But if you prevent all direct accesses to dunder-methods and dunder-attributes (__subclasses__ in this case, and the indirect access via __getattribute__) this threat should be mostly mitigated. Far's I can tell, this means walking the bytecode and forbidding all LOAD_ATTR to a dunder method or attribute.

> Sadly, eval can also evaluate... eval.

Eval's a builtin, if you remove all builtins you can't eval eval:

>>> eval('eval("()._" + "_class_" + "_._" + "_bases_" + "_[0]")', {'__builtins__': {}})
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "<string>", line 1, in <module>
NameError: name 'eval' is not defined

Jeff Blaine 11:22 AM on 6 Jun 2012

The easy (to type) related answer is "Use an OS with RBAC and configure it properly."

sil 11:59 AM on 6 Jun 2012

Masklinn: aha, of course! My mistake. Thanks :)

Masklinn 12:03 PM on 6 Jun 2012

> walking the bytecode and forbidding all LOAD_ATTR to a dunder method or attribute.

Thought about it for a bit longer that time, you don't actually need to walk the bytecode, just look in co_names and forbid any dunder name in it (this will catch access to user-created ones as well, but I'm not sure there's any reason to care).
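A rough sketch of that co_names check, assuming Python 3. The function name dunder_names is hypothetical. One detail matters: lambdas and (in some Python versions) comprehensions compile to nested code objects stored in co_consts, so the check has to walk those too.

```python
import types

def dunder_names(expr):
    """Collect double-underscore names referenced by expr, without running it."""
    found = set()
    pending = [compile(expr, "<string>", "eval")]
    while pending:
        code = pending.pop()
        # co_names holds attribute names and global/builtin names.
        found.update(n for n in code.co_names if n.startswith("__"))
        # Nested code objects (lambda bodies, etc.) live in co_consts.
        pending.extend(c for c in code.co_consts if isinstance(c, types.CodeType))
    return found
```

A caller would refuse to eval any string for which this returns a non-empty set. Whether that is enough on its own is exactly the open question in the post.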

Kent Johnson 12:27 PM on 6 Jun 2012

ast.literal_eval() is a useful alternative for one common use of eval() - to evaluate literal expressions. For example you can convert the string representation of a list into a real list:

>>> ast.literal_eval('[2, 3, "some list"]')
[2, 3, 'some list']
http://docs.python.org/library/ast.html#ast.literal_eval
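A short sketch of both sides of ast.literal_eval(), assuming Python 3: it builds the list from a literal, and it refuses a string containing a function call with a ValueError instead of running it.

```python
import ast

# A literal expression is converted to the real object.
parsed = ast.literal_eval('[2, 3, "some list"]')

# A call is not a literal, so it is rejected without being executed.
try:
    ast.literal_eval("__import__('os').system('clear')")
    rejected = False
except ValueError:
    rejected = True
```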

haypo 3:30 PM on 6 Jun 2012

eval() is not evil, the os module is not evil, only some functions of the os module are dangerous. I wrote the pysandbox project which installs a lot of protections in the CPython interpreter to hide dangerous functions but also any way to get access to these functions. For example, you can give access to the os.urandom() function if you enable the "random" feature of pysandbox. You get an os module with only one function: urandom().

https://github.com/haypo/pysandbox

Bill Mill already mentioned pysandbox.

"While it was vulnerable to DOS (the timeout function did not work properly on memory intensive functions: https://github.com/haypo/pysandbox/issues/10 ),"

I'm working on a new version (1.6) which runs untrusted code in a subprocess. I reintroduced the timeout option, it is now safe (and implemented completely differently).

Christian Heimes 3:55 PM on 6 Jun 2012

I'm using a modified version of seval from an example at activestate for simplistic restricted Python. The code uses the compiler package and a custom AST walker to verify the code. You can disable certain keywords like raise or forbid access to attributes and objects.

Here is my modified version: http://pastebin.mozilla.org/1657291

Wes Turner 5:59 PM on 6 Jun 2012

Hey thanks. http://lucumr.pocoo.org/2011/2/1/exec-in-python/ was also a good read. I've heard about some pretty cool things done with AST test case function invocation introspection. I believe Sage has a few examples of walking the AST.

Ned Batchelder 9:52 PM on 6 Jun 2012

@David Goodger: Yes, you are right, I meant, "safe for untrusted input." I've added that clause to the opening paragraph to make it clearer.

About sandboxes, @lahwran is right, we've worked on the PyPy sandbox, though this investigation was not to try to fix eval, but instead to find a convincing example for people who think eval can be fixed for untrusted input.

Everyone else, thanks for the interesting pointers and suggestions.

SFJulie1 4:29 AM on 7 Jun 2012

Eval'ing an arbitrary string is like sql.exec'ing an arbitrary string: an obvious security hole, since securing it means parsing and understanding the string to look for malicious code.

The philosophy for safe eval'ing should be like SQL prepare: use placeholders with validation, or ... be imaginative:
eval("dispatch_table[%r](*%r, **%r)" % (func, loads(table_as_json), loads(dict_as_json)))

If we do so, the advantages are:
- with a dispatch table we can limit the available functions to safe ones, and if dispatch keys are simple ([a-zA-Z]{8,12}), then escaping seems very hard;
- if we use named arguments serialized in JSON, it is easier to avoid an active payload in the arguments (only «passive data» is JSON-serializable).


Of course it seems very restrictive (it looks like we cannot build complex function trees), unless you notice that if the dispatch table is global, then dispatch table keys can also be passed as arguments, along with their own arguments.

Okay, it looks like LISP. But every safe parsing problem seems to point to a LISP-like solution.
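One way to sketch the dispatch-table idea is to drop eval() entirely and look the function up directly. This is a variation on the comment above, not the commenter's exact proposal, and the names DISPATCH_TABLE and dispatch, along with the request layout, are all hypothetical.

```python
import json

# Hypothetical dispatch table: the only callables a request can reach.
DISPATCH_TABLE = {
    "add": lambda a, b: a + b,
    "join": lambda sep, parts: sep.join(parts),
}

def dispatch(request_json):
    """Run a {"func": ..., "args": [...], "kwargs": {...}} request, with no eval()."""
    request = json.loads(request_json)
    func = DISPATCH_TABLE[request["func"]]  # KeyError for unknown names
    return func(*request.get("args", []), **request.get("kwargs", {}))
```

The arguments arrive as JSON, so they can only ever be plain data, and a function name that is not in the table simply fails the lookup.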

Jonathan Hartley 1:43 AM on 8 Jun 2012

Slightly off-topic, but just to confirm I understand: It's for these kinds of reasons that I shouldn't be importing a Python source file as my application's config file (because if a user lets malicious input into the config file, then my application will execute it.) So I should be using ConfigObj instead, is that still a recommended route these days?

Ned Batchelder 7:22 AM on 8 Jun 2012

@Jonathan: an important point about "untrusted input" is, who is your user, where are they, and what damage could they do if you ran code from them? For example, on a desktop application, the program is running on the user's own machine. If they try to "rm -rf /", they will be deleting their own files. Perhaps not something you need to be concerned with. If you're building a web service, then the user is anyone on the internet, and they could destroy your server, a much more dangerous proposition.

Jonathan Hartley 9:38 AM on 8 Jun 2012

@Ned, thanks for that clarification, which makes sense on one level, but on the other hand: If my application was responsible for deleting all of a user's Desktop files (or many users), then while that is less serious to me and my business, it's obviously more serious to the affected users. I'm not yet convinced that it's something I don't need to worry about. I guess it boils down to: is there ever a time when an attacker is not able to execute arbitrary (Python) code, but is able to insert arbitrary Python code into my application's appropriately-permissioned config file? My application would then execute this code, and be blamed as one link in the chain which created the exploited security hole. But if this scenario isn't actually a threat, then I'll continue to store my application config in Python source and effectively 'eval' it when my application runs.

Janus Troelsen 7:15 PM on 26 Nov 2012

Is eval without _ (underscore), . (dot) [, ] (sharp parenthesis) , " and ' (double-quote and single-quote) dangerous too?

lahwran 7:18 PM on 26 Nov 2012

Almost certainly. There are usually ways to get other values without entering, say, " - but even short of that, there's _always_ the risk of bugs in the interpreter. There are all sorts of things that aren't tested because people don't use the interpreter that way.

If you want serious, hardcore foreign-code safety, you'll need something like the pypy sandbox or an OS-level thing like virtualization. and even that is vulnerable to some things, apparently people are stealing encryption keys from neighboring VMs now.

Ned Batchelder 7:53 PM on 26 Nov 2012

@Janus: it depends what kind of risk you're talking about. For example, "9**9**9**9**9" will consume all your memory, and bugs like http://bugs.python.org/issue14010 exist and can be exploited.

Janus Troelsen 8:07 PM on 26 Nov 2012

Let's consider DoS a non-issue. But I don't need * so I'll disallow it anyway.

I currently use the following cleaner (don't worry, it's not a serious project):

cleancode = lambda x: re.sub("[^0-9A-z _\^|&()]","",x)
Every single exploit I have seen has used [ or . or *. I just want to evaluate boolean literals in a simple way. ast.literal_eval is not really evaluating, just parsing. I have yet to see a non-DoS exploit using the character set above...

I'll PayPal € 5 to the first person that shows me a non-DoS exploit in under 100 characters that survives the filter above.

Janus Troelsen 8:09 PM on 26 Nov 2012

(no builtins of course, only Python 3.2+)

haypo 10:56 AM on 27 Nov 2012

Try

((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((1))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))

s_push: parser stack overflow
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
MemoryError

It's not a crash, but a funny error, and the code matches your regex ;)

Aaron Meurer 11:15 PM on 14 Feb 2013

What would be a Python 3 version? It seems that code takes 13 arguments instead of 12. I'm curious if it still segfaults.

Ned Batchelder 6:47 AM on 15 Feb 2013

@Aaron: it does still segfault, you just have to provide one more argument, and make some of them byte strings:

(0,0,0,0,0,b"KABOOM",(),(),(),"","",0,b"")

Ned Batchelder 6:48 AM on 15 Feb 2013

Also, BTW, for Python 3, there are some more details here: http://nedbatchelder.com/blog/201302/finding_python_3_builtins.html

Vladislav Stepanov 2:14 PM on 8 May 2014

Hi, Ned!
I've translated your post into Russian, you can find it here: http://habrahabr.ru/post/221937/
Is it ok for you? Sorry for not asking in advance:(
