Eval really is dangerous

Wednesday 6 June 2012

Python has an eval() function which evaluates a string of Python code:

assert eval("2 + 3 * len('hello')") == 17

This is very powerful, but is also very dangerous if you accept strings to evaluate from untrusted input. Suppose the string being evaluated is “os.system(‘rm -rf /’)”? It will really start deleting all the files on your computer. (In the examples that follow, I’ll use ‘clear’ instead of ‘rm -rf /’ to prevent accidental foot-shootings.)

Some have claimed that you can make eval safe by providing it with no globals. eval() takes a second argument, a dictionary of the global values to use during the evaluation. If you don’t provide a globals dictionary, then eval uses the current globals, which is why “os” might be available. If you provide an empty dictionary, then there are no globals. This now raises a NameError, “name ‘os’ is not defined”:

eval("os.system('clear')", {})

But we can still import modules and use them, with the builtin function __import__. This succeeds:

eval("__import__('os').system('clear')", {})
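Why does __import__ still work with an empty globals dictionary? When the globals dictionary has no `__builtins__` key, eval() inserts a reference to the real builtins before evaluating. A minimal sketch, assuming Python 3 and using the harmless `math` module instead of `os`; `g` is just an illustrative name:

```python
# An "empty" globals dictionary is not empty for long.
g = {}
pi = eval("__import__('math').pi", g)

print(pi)                   # the value of math.pi
print("__builtins__" in g)  # True: eval() filled in the real builtins
```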

The next attempt to make things safe is to refuse access to the builtins. Names like __import__ and open are available to you in Python 2 because they are in the __builtins__ global. We can explicitly specify that there are no builtins by defining that name as an empty dictionary in our globals. Now this raises a NameError:

eval("__import__('os').system('clear')", {'__builtins__':{}})

Are we safe now? Some say yes, but they are wrong. As a demonstration, running this in CPython will segfault your interpreter:

s = """
(lambda fc=(
    lambda n: [
        c for c in
            if c.__name__ == n
eval(s, {'__builtins__':{}})

Let’s unpack this beast and see what’s going on. At the center we find this:

().__class__.__bases__[0]

which is a fancy way of saying “object”. The first base class of a tuple is “object”. Remember, we can’t simply say “object”, since we have no builtins. But we can create objects with literal syntax, and then use attributes from there.
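This is easy to confirm for yourself. A small sketch, assuming Python 3 (where `tuple.__bases__` is `(object,)`, just as in Python 2); `no_builtins` and `obj` are illustrative names:

```python
no_builtins = {"__builtins__": {}}

# With builtins emptied, __import__ can no longer be reached by name...
try:
    eval("__import__('os')", no_builtins)
except NameError as exc:
    print("blocked:", exc)

# ...but a tuple literal still leads straight to object.
obj = eval("().__class__.__bases__[0]", no_builtins)
print(obj is object)  # True
```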

Once we have object, we can get the list of all the subclasses of object:

().__class__.__bases__[0].__subclasses__()

or in other words, a list of all the classes that inherit directly from object and have been created to this point in the program. We’ll come back to this at the end. If we shorthand this as ALL_CLASSES, then this is a list comprehension that examines all the classes to find one named n:

[c for c in ALL_CLASSES if c.__name__ == n][0]

We’ll use this to find classes by name, and because we need to use it twice, we’ll create a function for it:

lambda n: [c for c in ALL_CLASSES if c.__name__ == n][0]

But we’re in an eval, so we can’t use the def statement, or the assignment statement to give this function a name. But default arguments to a function are also a form of assignment, and lambdas can have default arguments. So we put the rest of our code in a lambda function to get the use of the default arguments as an assignment:

(lambda fc=(
    lambda n: [
        c for c in ALL_CLASSES if c.__name__ == n
        ][0]
    ):
    # code goes here...
)()
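The default-argument trick can be tried on its own, without anything dangerous in the body. A sketch, assuming Python 3: inside a builtins-free eval(), it binds the class finder as the default `fc` and uses it to look up the built-in classes named "int" and "function". `finder` is an illustrative name.

```python
import types

finder = """
(lambda fc=(
    lambda n: [
        c for c in ().__class__.__bases__[0].__subclasses__()
        if c.__name__ == n
    ][0]
):
    (fc("int"), fc("function"))
)()
"""

# The default argument acts as the assignment that eval() cannot express.
int_class, function_class = eval(finder, {"__builtins__": {}})
print(int_class is int)                      # True
print(function_class is types.FunctionType)  # True
```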

Now that we have our “find class” function fc, what will we do with it? We can make a code object! It isn’t easy: you need to provide 12 arguments to the constructor, but most can be given simple default values.

fc("code")(0,0,0,0,"KABOOM",(),(),(),"","",0,"")

The string “KABOOM” is the actual bytecode to use in the code object, and as you can probably guess, “KABOOM” is not a valid sequence of bytecodes. Actually, any one of these bytecodes would be enough: they are all binary operators that will try to operate on an empty operand stack, which will segfault CPython. “KABOOM” is just more fun; thanks to lvh for it.
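To see what the interpreter is being handed: each character of “KABOOM” is simply one byte of bytecode. This sketch prints the byte values and, for whatever Python runs it, the opcode names that `dis` associates with them. Opcode numbering has changed between versions, so on Python 3 the names may not match the Python 2 binary operators described above.

```python
import dis

codes = [ord(ch) for ch in "KABOOM"]
print(codes)  # [75, 65, 66, 79, 79, 77]

# Opcode names for these numbers in the running interpreter (version-dependent).
for code in codes:
    print(code, dis.opname[code])
```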

This gives us a code object: fc(“code”) finds the class “code” for us, and then we invoke it with the 12 arguments. You can’t invoke a code object directly, but you can create a function with one:

fc("function")(CODE_OBJECT, {})

And of course, once you have a function, you can call it, which will run the code in its code object. In this case, that will execute our bogus bytecodes, which will segfault the CPython interpreter. Here’s the dangerous string again, in more compact form:

(lambda fc=(lambda n: [c for c in ().__class__.__bases__[0].__subclasses__()
    if c.__name__ == n][0]): fc("function")(fc("code")(0,0,0,0,"KABOOM",(),
    (),(),"","",0,""),{})()
)()

So eval is not safe, even if you remove all the globals and the builtins!

We used the list of all subclasses of object here to make a code object and a function. You can of course find other classes and use them. Which classes you can find depends on where the eval() call actually is. In a real program, there will be many classes already created by the time the eval() happens, and all of them will be in our list of ALL_CLASSES. As an example:

s = """
    c for c in
    if c.__name__ == "Quitter"

The standard site module defines a class called Quitter; an instance of it is what the name “quit” is bound to, so that you can type quit() at the interactive prompt to exit the interpreter. So in eval we simply find Quitter, instantiate it, and call it. This string cleanly exits the Python interpreter.

Of course, in a real system, there will be all sorts of powerful classes lying around that an eval’ed string could instantiate and invoke. There’s no end to the havoc that could be caused.
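To make that concrete: any class the host program has defined is reachable, even from a builtins-free eval(). In this Python 3 sketch, `PaymentService` and its `refund_everyone` method are hypothetical stand-ins for whatever powerful classes a real program happens to contain.

```python
class PaymentService:  # hypothetical class living in the host program
    def refund_everyone(self):
        return "all refunds issued"

attack = """
[
    c for c in ().__class__.__bases__[0].__subclasses__()
    if c.__name__ == "PaymentService"
][0]().refund_everyone()
"""

# No globals, no builtins, and the class is still found, built, and used.
outcome = eval(attack, {"__builtins__": {}})
print(outcome)  # all refunds issued
```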

The problem with all of these attempts to protect eval() is that they are blacklists. They explicitly remove things that could be dangerous. That is a losing battle because if there’s just one item left off the list, you can attack the system.

While I was poking around on this topic, I stumbled on Python’s restricted evaluation mode, which seems to be an attempt to plug some of these holes. Here we try to access the code object for a lambda, and find we aren’t allowed to:

>>> eval("(lambda:0).func_code", {'__builtins__':{}})
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "<string>", line 1, in <module>
RuntimeError: function attributes not accessible in restricted mode

Restricted mode is an explicit attempt to blacklist certain “dangerous” attribute access. It’s triggered specifically when executing code whose builtins are not the official builtins. There’s a much more detailed explanation, with links to other discussion of this topic, on Tav’s blog. As we’ve seen, the existing restricted mode isn’t enough to prevent mischief.

So, can eval be made safe? Hard to say. At this point, my best guess is that you can’t do any harm if you can’t use any double underscores, so maybe if you exclude any string with double underscores you are safe. Maybe...

Update: from a thread on Reddit about recovering cleared globals, a similar snippet that will get you the original builtins:

[
    c for c in ().__class__.__base__.__subclasses__()
    if c.__name__ == 'catch_warnings'
][0]()._module.__builtins__


Dan Crosta 9:05 AM on 6 Jun 2012

Great in-depth investigation. When I've talked about this in the past (and in 2 days at PyGotham) I've always kind of hand-waved about the risks of eval() and exec (which I believe has all of the same risks inherent?), but now I can just point people here. Thanks for saving me some work!

Bill Mill 10:05 AM on 6 Jun 2012

There's an interesting attempt at a restricted python: https://github.com/haypo/pysandbox .

I used it in a chatbot once to allow people to eval simple python statements: https://github.com/llimllib/pyphage/blob/master/plugins/eval.py

While it was vulnerable to DOS (the timeout function did not work properly on memory intensive functions: https://github.com/haypo/pysandbox/issues/10 ), I let some pretty serious python hackers have a go at it and they weren't able to break it.

I wouldn't trust it on a for-real server, obviously, but it was good enough for the task I set out to accomplish in that scenario.

David Goodger 10:26 AM on 6 Jun 2012

Reading this, I was confused. Of course eval() (and exec()) are dangerous if you don't trust the strings to be evaluated.

The article seems to be missing a basic stipulation, perhaps "running untrusted code is dangerous". You seem to be trying to prove that no matter what you do, you can't fully isolate and sanitize untrusted code. OK, I buy that. But I think the article needs a liberal sprinkling of "... for untrusted code/input/source".

eval() and exec() are perfectly safe for trusted code, useful and even necessary on occasion (e.g. dynamic programming). Just never *ever* use them with untrusted input.

sil 10:44 AM on 6 Jun 2012

Sadly, eval can also evaluate... eval. So the double-underscore trick doesn't work:

eval('eval("()._" + "_class_" + "_._" + "_bases_" + "_[0]")')
<type 'object'>

lahwran 10:54 AM on 6 Jun 2012

The solution I'd use if you need this is to use the PyPy sandbox. It isn't quite ready for use, but it's pretty close, and it's fully software virtualized so much safer than eval.

Incidentally, Ned himself did some work on it to speed up the startup a few months ago. That's still the main area that needs work, too.

Also, using eval to be "dynamic" is icky.

Masklinn 10:59 AM on 6 Jun 2012

> At this point, my best guess is that you can't do any harm if you can't use any double underscores, so maybe if exclude any string with double underscores you are safe.

You could always concatenate single underscores. But if you prevent all direct accesses to dunder-methods and dunder-attributes (__subclasses__ in this case, and the indirect access via __getattribute__) this threat should be mostly mitigated. Far's I can tell, this means walking the bytecode and forbidding all LOAD_ATTR to a dunder method or attribute.

> Sadly, eval can also evaluate... eval.

Eval's a builtin, if you remove all builtins you can't eval eval:

>>> eval('eval("()._" + "_class_" + "_._" + "_bases_" + "_[0]")', {'__builtins__': {}})
Traceback (most recent call last):
  File "", line 1, in 
  File "", line 1, in 
NameError: name 'eval' is not defined

Jeff Blaine 11:22 AM on 6 Jun 2012

The easy (to type) related answer is "Use an OS with RBAC and configure it properly."

sil 11:59 AM on 6 Jun 2012

Masklinn: aha, of course! My mistake. Thanks :)

Masklinn 12:03 PM on 6 Jun 2012

> walking the bytecode and forbidding all LOAD_ATTR to a dunder method or attribute.

Thought about it for a bit longer that time, you don't actually need to walk the bytecode, just look in co_names and forbid any dunder name in it (this will catch access to user-created ones as well, but I'm not sure there's any reason to care).
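Masklinn's co_names check can be sketched in a few lines of Python 3. One wrinkle: a lambda body (and, before Python 3.12, a comprehension) compiles to a separate code object stored in `co_consts`, so the check has to recurse into those. `has_dunder_names` is a hypothetical helper. The last line shows a limit of any name-based check: a dunder string assembled by concatenation is a constant, not a name, so it is not flagged.

```python
import types

def has_dunder_names(source):
    """Return True if the compiled expression uses any dunder name."""
    def walk(code):
        if any(name.startswith("__") for name in code.co_names):
            return True
        # Lambdas and similar constructs live in nested code objects.
        return any(walk(const) for const in code.co_consts
                   if isinstance(const, types.CodeType))
    return walk(compile(source, "<string>", "eval"))

print(has_dunder_names("().__class__"))                 # True
print(has_dunder_names("(lambda: ().__class__)()"))     # True
print(has_dunder_names("1 + 2"))                        # False
print(has_dunder_names("x['_' + '_builtins_' + '_']"))  # False
```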

Kent Johnson 12:27 PM on 6 Jun 2012

ast.literal_eval() is a useful alternative for one common use of eval() - to evaluate literal expressions. For example you can convert the string representation of a list into a real list:

>>> import ast
>>> ast.literal_eval('[2, 3, "some list"]')
[2, 3, 'some list']
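The other half of what makes literal_eval suitable is that it never executes anything: a function call, for example, is refused with a ValueError instead of being run. A quick check, assuming Python 3:

```python
import ast

data = ast.literal_eval("{'a': [1, 2.5, None], 'b': (True, 'x')}")
print(data)

try:
    ast.literal_eval("__import__('os').getcwd()")
except ValueError as exc:
    print("rejected:", exc)
```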

haypo 3:30 PM on 6 Jun 2012

eval() is not evil, and the os module is not evil; only some functions of the os module are dangerous. I wrote the pysandbox project, which installs a lot of protections in the CPython interpreter to hide dangerous functions and also every way to get access to these functions. For example, you can give access to the os.urandom() function if you enable the "random" feature of pysandbox. You get an os module with only one function: urandom().


Bill Mill already mentioned pysandbox.

"While it was vulnerable to DOS (the timeout function did not work properly on memory intensive functions: https://github.com/haypo/pysandbox/issues/10 ),"

I'm working on a new version (1.6) which runs untrusted code in a subprocess. I reintroduced the timeout option; it is now safe (and implemented completely differently).

Christian Heimes 3:55 PM on 6 Jun 2012

I'm using a modified version of seval from an example at activestate for simplistic restricted Python. The code uses the compiler package and a custom AST walker to verify the code. You can disable certain keywords like raise or forbid access to attributes and objects.

Here is my modified version: http://pastebin.mozilla.org/1657291

Wes Turner 5:59 PM on 6 Jun 2012

Hey thanks. http://lucumr.pocoo.org/2011/2/1/exec-in-python/ was also a good read. I've heard about some pretty cool things done with AST test case function invocation introspection. I believe Sage has a few examples of walking the AST.

Ned Batchelder 9:52 PM on 6 Jun 2012

@David Goodger: Yes, you are right, I meant, "safe for untrusted input." I've added that clause to the opening paragraph to make it clearer.

About sandboxes, @lahwran is right, we've worked on the PyPy sandbox, though this investigation was not to try to fix eval, but instead to find a convincing example for people who think eval can be fixed for untrusted input.

Everyone else, thanks for the interesting pointers and suggestions.

SFJulie1 4:29 AM on 7 Jun 2012

Eval'ing an arbitrary string is like sql.exec'ing an arbitrary string: an obvious security hole, since making it secure means parsing and understanding the string to look for malicious code.

The philosophy for safe eval'ing should be like SQL prepared statements: using placeholders with validation, or ... being imaginative:
eval("dispatch_table[%r](*%r, **%r)" % (func, loads(table_as_json), loads(dict_as_json)))

If we do so, the advantages are:
- with a dispatch table we can limit the available functions to safe ones, and if dispatch keys are simple ([a-zA-Z]{8,12}), then escaping seems very hard;
- if we use named arguments serialized as JSON, it is easier to avoid active payloads in the arguments (only «passive data» is JSON-able).

Of course it seems very restrictive (it looks like we cannot build complex function trees), unless you notice that if the dispatch table is global, then dispatch table keys can also be passed as arguments, along with their own arguments.

Okay, it looks like LISP. But all safe-parsing problems seem to point to a LISP-like solution.
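The dispatch-table idea does not actually need eval at all: look the function up by name in an explicit table and pass it JSON-decoded arguments. A Python 3 sketch; `DISPATCH_TABLE`, `call_from_json`, `add` and `greet` are hypothetical names chosen for illustration.

```python
import json

def add(a, b):
    return a + b

def greet(name, punctuation="!"):
    return "Hello, " + name + punctuation

# Only functions listed here can ever be called.
DISPATCH_TABLE = {"add": add, "greet": greet}

def call_from_json(func_name, args_json, kwargs_json):
    func = DISPATCH_TABLE.get(func_name)
    if func is None:
        raise ValueError("unknown function: %r" % func_name)
    # JSON decodes only to passive data: dicts, lists, strings, numbers, bools, None.
    args = json.loads(args_json)
    kwargs = json.loads(kwargs_json)
    if not isinstance(args, list) or not isinstance(kwargs, dict):
        raise ValueError("arguments must be a JSON list and a JSON object")
    return func(*args, **kwargs)

print(call_from_json("add", "[2, 3]", "{}"))                       # 5
print(call_from_json("greet", '["Ned"]', '{"punctuation": "?"}'))  # Hello, Ned?
```

Nesting calls (the LISP-like trees mentioned above) would need a small interpreter that walks the decoded structure, which this sketch leaves out.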

Jonathan Hartley 1:43 AM on 8 Jun 2012

Slightly off-topic, but just to confirm I understand: It's for these kinds of reasons that I shouldn't be importing a Python source file as my application's config file (because if a user lets malicious input into the config file, then my application will execute it.) So I should be using ConfigObj instead, is that still a recommended route these days?

Ned Batchelder 7:22 AM on 8 Jun 2012

@Jonathan: an important point about "untrusted input" is, who is your user, where are they, and what damage could they do if you ran code from them? For example, on a desktop application, the program is running on the user's own machine. If they try to "rm -rf /", they will be deleting their own files. Perhaps not something you need to be concerned with. If you're building a web service, then the user is anyone on the internet, and they could destroy your server, a much more dangerous proposition.

Jonathan Hartley 9:38 AM on 8 Jun 2012

@Ned, thanks for that clarification, which makes sense on one level, but on the other hand: If my application was responsible for deleting all of a user's Desktop files (or many users), then while that is less serious to me and my business, it's obviously more serious to the affected users. I'm not yet convinced that it's something I don't need to worry about. I guess it boils down to: is there ever a time when an attacker is not able to execute arbitrary (Python) code, but is able to insert arbitrary Python code into my application's appropriately-permissioned config file? My application would then execute this code, and be blamed as one link in the chain which created the exploited security hole. But if this scenario isn't actually a threat, then I'll continue to store my application config in Python source and effectively 'eval' it when my application runs.

Janus Troelsen 7:15 PM on 26 Nov 2012

Is eval without _ (underscore), . (dot), [ and ] (square brackets), " and ' (double quote and single quote) dangerous too?

lahwran 7:18 PM on 26 Nov 2012

Almost certainly. There are usually ways to get other values without entering, say, " - but even short of that, there's _always_ the risk of bugs in the interpreter. There are all sorts of things that aren't tested because people don't use the interpreter that way.

If you want serious, hardcore foreign-code safety, you'll need something like the pypy sandbox or an OS-level thing like virtualization. and even that is vulnerable to some things, apparently people are stealing encryption keys from neighboring VMs now.

Ned Batchelder 7:53 PM on 26 Nov 2012

@Janus: it depends what kind of risk you're talking about. For example, "9**9**9**9**9" will consume all your memory, and bugs like http://bugs.python.org/issue14010 exist and can be exploited.

Janus Troelsen 8:07 PM on 26 Nov 2012

Let's consider DoS a non-issue. But I don't need * so I'll disallow it anyway.

I currently use the following cleaner (don't worry, it's not a serious project):

cleancode = lambda x: re.sub("[^0-9A-z _\^|&()]","",x)
Every single exploit I have seen has used [ or . or *. I just want to evaluate boolean literals in a simple way. ast.literal_eval is not really evaluating, just parsing. I have yet to see a non-DoS exploit using the character set above...

I'll PayPal € 5 to the first person that shows me a non-DoS exploit in under 100 characters that survives the filter above.

Janus Troelsen 8:09 PM on 26 Nov 2012

(no builtins of course, only Python 3.2+)

haypo 10:56 AM on 27 Nov 2012



s_push: parser stack overflow
Traceback (most recent call last):
File "", line 1, in

It's not a crash, but a funny error, and the code matchs your regex ;)

Aaron Meurer 11:15 PM on 14 Feb 2013

What would be a Python 3 version? It seems that code takes 13 arguments instead of 12. I'm curious if it still segfaults.

Ned Batchelder 6:47 AM on 15 Feb 2013

@Aaron: it does still segfault, you just have to provide one more argument, and make some of them byte strings:


Ned Batchelder 6:48 AM on 15 Feb 2013

Also, BTW, for Python 3, there are some more details here: http://nedbatchelder.com/blog/201302/finding_python_3_builtins.html

Vladislav Stepanov 2:14 PM on 8 May 2014

Hi, Ned!
I've translated your post into Russian, you can find it here: http://habrahabr.ru/post/221937/
Is it ok for you? Sorry for not asking in advance:(

Valentine Gogichashvili 11:14 AM on 6 Sep 2014

Actually there is way to protect yourself against that issue and please correct me if I am wrong:

Using ast.parse and then searching for any system attribute invocation, you can detect problematic code and block it from being executed. One also has to block the exec statement, though. Here is an example:

import ast
import compiler  # Python 2 only; provides compiler.consts used below


class InvalidEvalExpression(Exception):
    pass


def check_ast_node_is_safe(node):
    """
    Check that the ast node does not contain any system attribute calls
    as well as exec call (not to construct the system attribute names with strings).

    eval() and exec() function calls should not be a problem, as they are not exposed
    in the globals and __builtins__

    >>> node = ast.parse('def __call__(): return 1')
    >>> node == check_ast_node_is_safe(node)
    True

    >>> check_ast_node_is_safe(ast.parse('def m(): return ().__class__'))
    Traceback (most recent call last):
    ...
    InvalidEvalExpression

    >>> check_ast_node_is_safe(ast.parse('def horror(g): exec "exploit = ().__" + "class" + "__" in g'))
    Traceback (most recent call last):
    ...
    InvalidEvalExpression
    """
    for n in ast.walk(node):
        if isinstance(n, ast.Attribute):
            if n.attr.startswith('__'):
                raise InvalidEvalExpression()
        elif isinstance(n, ast.Exec):
            raise InvalidEvalExpression()
    return node

def safe_eval(expr, **kwargs):
    """Safely execute expr (...an excerpt...)"""
    g = {'__builtins__': {}, 'object': object, '__name__': __name__}
    # __builtins__ should be masked away to disable builtin functions
    # object is needed if the NewStyle class is being created
    # __name__ is needed to be able to compile a class

    node = compile(expr, '', 'exec', ast.PyCF_ONLY_AST | compiler.consts.CO_FUTURE_PRINT_FUNCTION)
    node = check_ast_node_is_safe(node)
    cc = compile(node, '', 'exec')  # can be nicely cached
    l = {}
    exec (cc, g, l)

Kiran 8:48 AM on 20 Jan 2016


I wanted to use eval in my code, but I found the drawbacks of eval, so I found my own way, and I wanted to check whether this code can be broken. Here I don't want arithmetic operations.

def hack():
    print("You can't hack my code.")
    return "break";

str1 = "lambda open read + - * / % ** // "
str2 = "().__class__.__bases__[0].__subclasses__()"
try:
    print(eval(str1.replace('()', 'hacker').replace('lambda', 'hacker').replace('open', 'hacker').replace('**', 'hacker').replace('/', 'hacker').replace('%', 'hacker').replace('*', 'hacker').replace('+', 'hacker').replace('-', 'hacker'), {'__builtins__': {}}, {'hacker': hack()}))
except (IOError, AttributeError, SyntaxError, RuntimeError):
    print("Invalid string found")

Kiran 11:11 AM on 20 Jan 2016

Small change:
jsn['logic18'].replace('().', 'hacker()').replace('__class__', 'hacker() ').replace('lambda', 'hacker()').replace('open', 'hacker()').replace(
'**', 'hacker()').replace('/', 'hacker()').replace('%', 'hacker()').replace('*', 'hacker()').replace(
'+','hacker()').replace('-', 'hacker()'), {'__builtins__': {}},
{'hacker': hack}))

Ned Batchelder 12:39 PM on 20 Jan 2016

@Kiran: I don't think a blacklist approach is going to work, unless you are willing to leave very little that is possible, that is, unless you make it something so weak that it is uninteresting. Also, your examples here don't make much sense. Where is the user's untrusted input?

Kiran 5:30 PM on 20 Jan 2016

@Ned: Here, if I replace "jsn['logic18']" with anything, my program must not break. In the above operation I am not allowing anyone to import or instantiate any object except the given local objects, and if someone wants to break it using arithmetic operations like "9**9**9**9**9", "1/0", 1/(10 *0), etc., my code still must not break. I restricted access and arithmetic failures; now I can call any local function using my input string.

Ned Batchelder 6:36 PM on 20 Jan 2016

@Kiran: this technique isn't giving you any safety. You replace "().", but what about "( )."? or "( ) . "? I'm not sure you've read this blog post carefully: there are many ways to cause mischief, you can't prevent them all with a blacklist.

Kiran 6:36 AM on 21 Jan 2016

@Ned: I have added '__class__' to the blacklist. Do you have any other ways to cause mischief with the eval?

lahwran 7:32 AM on 21 Jan 2016

You are thinking about this wrong. This is not simply hard, it effectively cannot be done. do not use eval. if you want to run math, then make your own language, with your own parser, and your own runner - it isn't actually very hard with the help of something like parsley. if you eval foreign code, then you *will* lose. as long as you are using eval, I will be able to find a way through it. I ran through your code and tested attacks until I found one that worked, with all the restrictions; at least one exists, and I will not tell you what it is. This approach is broken. Do not use it. If you want to be able to run python code remotely, check out the pypy sandbox, or use one of the os-level sandboxes. Do not use eval. Presumably someone linked you to this page; they did so for a reason. Do not use eval. Do not use exec. Do not run foreign python code without manually checking it.

Abandon hope, all ye who eval foreign code.

lahwran 7:50 AM on 21 Jan 2016

despite that you claim to have disallowed these, and despite running in restricted mode, your example has ways for me to:
- make arbitrary functions
- import any module I like
- call any function that is available outside the sandbox - any function, not just built in ones
- replace any code outside the sandbox
- crash cpython in one of a variety of ways

you can't do it. if you deploy this, someone will break into it.

Here's some code you can use that actually-safely does an eval with foreign code:

def hack():
    print("You can't hack my code.")
    return "break";

str1 = jsn["logic18"]
    str1 = "" # to make sure it's safe to eval
except (IOError, AttributeError, SyntaxError, RuntimeError):
    print("Invalid string found")

Nag 2:21 AM on 10 Jan 2017

Extremely interesting article. But it begs the question:
Has this been tested on a machine?
Or is this from a purely 'mathematical'(?) standpoint?
Did you arrive at this based on your knowledge of the behaviour of python functions?

Ned Batchelder 2:35 AM on 10 Jan 2017

@Nag: of course it's been tested on a machine. You can test it yourself. There was plenty of experimentation to arrive at the final construction.

SarahToo 6:45 PM on 15 Sep 2017

I simply (naive) redefined eval(). Works for what I am up to -- there is no longer a there, there.

Ken Seehart 12:10 AM on 20 Sep 2017

Possible solution when:
- You have a small subset of the language, such as a calculator, maybe some numpy stuff, etc...
- You don't want to spend time implementing the language (because you are lazy and want it all in under 15 lines of code)

Use ast to validate, and if it passes, eval it.

Whitelist instead of blacklist (because blacklisting only works if you are comprehensive, and I don't trust myself to be comprehensive). If you accidentally omit something in a whitelist you lose functionality. If you accidentally omit something in a blacklist, you have a bigger problem. It's like trying to make a lock that specifically resists each skeleton key you know about.

You need two whitelists: an ast node whitelist and a name whitelist.

To save yourself a headache, generate your node whitelist like this:

  sample = 'Foo(a[b] + a * a / a // a ** a % a ^ a & a | a and (a != b or c == d))'
  node_whitelist = set([x.__class__.__name__ for x in ast.walk(ast.parse(sample))])
... that way if you forgot something, you just add it to your sample. Just add syntax that you actually need, and be mindful how that syntax could be used (in this example attribute access is not included, as that involves additional care).

Your symbol whitelist might come from a module or whatever. Probably best to do it by hand. Just make sure you are aware of the contents and that it doesn't have stuff like '__dict__' in there, and think whitelist, not blacklist.
  name_whitelist = ['sin', 'cos', 'pi', ...]
The validation test looks something like:
import ast

def safe_eval(expr, context):
  nodes = set([x.__class__.__name__ for x in ast.walk(ast.parse(expr))])
  names = set([x.id for x in ast.walk(ast.parse(expr)) if x.__class__.__name__ == 'Name'])
  bad = (nodes - node_whitelist) | (names - set(name_whitelist))
  if bad:
    raise ValueError('Illegal use of {} in expression'.format(list(bad)[0]))

  return eval(expr, context)
Note that if I had whitelisted attribute access (explicitly, by including "a.b" in the sample expression), I'd need to implement an attribute whitelist to check against set([x.attr for x in ast.walk(ast.parse(expr)) if x.__class__.__name__ == 'Attribute']). But for this use case, attributes are not necessary, and security seems a little simpler without them.

This won't prevent DOS attacks with various kinds of extreme math. AFAIK, that can only effectively be dealt with at the process level. I'm pretty sure there is no lexical approach to preventing extreme math in a sufficiently useful tool. And I acknowledge that this post is incomplete without a discussion of Gödel's theorem. Too bad.

So, other than DOS, is this solid? Please let me know of any inevitable hacks!
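Pulling Ken's fragments together into one runnable piece gives something like the following. It is a sketch assuming Python 3.8+ (where numeric literals parse as `ast.Constant`) and a small math-only vocabulary. `SAMPLE`, `NODE_WHITELIST` and `ALLOWED_NAMES` are illustrative names; the numeric literal in the sample is an addition so that constants are accepted, and the empty `__builtins__` is an extra layer on top of the whitelists.

```python
import ast
import math

# Syntax we accept, harvested from a sample expression (includes "1" so
# that numeric constants are allowed).
SAMPLE = "f(a[b] + 1 * a / a // a ** a % a ^ a & a | a and (a != b or c == d))"
NODE_WHITELIST = {type(n).__name__ for n in ast.walk(ast.parse(SAMPLE, mode="eval"))}

# The only names an expression may mention, with their values.
ALLOWED_NAMES = {"sin": math.sin, "cos": math.cos, "pi": math.pi}

def safe_eval(expr):
    tree = ast.parse(expr, mode="eval")
    nodes = {type(n).__name__ for n in ast.walk(tree)}
    names = {n.id for n in ast.walk(tree) if isinstance(n, ast.Name)}
    bad = (nodes - NODE_WHITELIST) | (names - set(ALLOWED_NAMES))
    if bad:
        raise ValueError("Illegal use of {} in expression".format(sorted(bad)[0]))
    env = {"__builtins__": {}}
    env.update(ALLOWED_NAMES)
    return eval(compile(tree, "<expr>", "eval"), env)

print(safe_eval("cos(0) + 1"))  # 2.0
print(safe_eval("2 ** 10"))     # 1024
```

Attribute access, tuples and lambdas are rejected simply because they never appear in the sample. As Ken says, nothing here limits resource use: an expression like `9**9**9**9` passes the whitelist.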

Stefan Pochmann 8:01 PM on 22 Dec 2017

As a corollary, `input()` in Python 2 is also dangerous, probably even *more* dangerous (because it's probably much more widely used, especially by beginners, *and* because the eval-aspect isn't as obvious as it is when explicitly calling eval). But nobody seems to be talking about that. Odd.

Stefan Pochmann 8:11 PM on 22 Dec 2017

Hmm, on the other hand, I guess `input()` is typically only used where the user is the owner of the machine, not an adversary. While `eval()` might be more likely to get used on strings coming from other people.

Ken Seehart 10:39 PM on 22 Dec 2017

lahwran or any other skeptics, if you are following, I'm curious to see your rebuttal to my previous post.

Not impossible at all. Easy to implement safe-eval with correct thinking.

It only appears to be impossible because there are so many examples of failed attempts based on a flawed approach, which is to disallow (blacklist) dangerous strings. That approach can't work because the set of hacking strategies is impossible to enumerate.

Kevin 4:29 PM on 3 Feb 2020

"my best guess is that you can’t do any harm if you can’t use any double underscores, so maybe if you exclude any string with double underscores you are safe."

Simply checking for "__" in the string doesn't seem to be sufficient. You can access the frame object of a generator expression without any double-underscores, and from there, access the globals of any frame higher up in the program. The only tricky part is that the complete frame object hierarchy is only accessible when the generator is executing, so you need a little self-referential trickery if you're in an `eval` and have no access to assignment. Proof of concept:

s = "q = (q.gi_frame.f_back.f_back.f_globals for _ in (1,)); builtins = [*q][0]['_' + '_builtins_' + '_']; builtins.print('Gotcha:', builtins.dir(builtins))"
exec(s, {'__builtins__': {}})
On my machine, this prints all the builtins that the attacker can now access, from `ArithmeticError` to `zip`.

Here's the same approach using `exec`, which is a little more readable since a self-referential generator expression is trivial if you can use an assignment statement:
s = "q = (q.gi_frame.f_back.f_back.f_globals for _ in (1,)); builtins = [*q][0]['_' + '_builtins_' + '_']; builtins.print('Gotcha:', builtins.dir(builtins))"
exec(s, {'__builtins__': {}})

Kevin 4:31 PM on 3 Feb 2020

Oops, typo in my previous message. The first code block was supposed to be:

s = "(lambda b: b.print('gotcha:', b.dir(b)))([*([x.append((x[0].gi_frame.f_back.f_back.f_globals for _ in (1,))) or x[0] for x in [[]]][0])][0]['_' + '_builtins_' + '_'])"
eval(s, {'__builtins__': {}})

Ken Seehart 4:32 AM on 4 Feb 2020

As I mentioned in my previous post, blacklisting (e.g. attempting to disallow double underscore) is futile. You have to whitelist instead (i.e. describe what is allowed, and keep that limited).

Cocoshark 9:52 PM on 29 Jun 2020

This article (and comments) is very informative! Thank you for sharing!

I have an idea for making a safe_eval: simply disallow the use of dot "." and f-strings.

I am thinking of using something like

def safe_exec(st): #oversimplified
    if ("." in st) or ("f\"" in st):
        raise ValueError
    return exec(st, {'__builtins__': {}})
This is indeed not too restrictive. Just forget about things like arr.append(ele), and use instead the purely functional programming paradigm (map, reduce, yield) + making use of lazy evaluation.

The reason this works is because ultimately you get __builtins__ from either the attributes of an object, or from a dictionary. Disallowing dot eliminates the first possibility. I don't think it is possible to directly get __builtins__ from dictionaries. If it is possible, we can statically replace the [] operator f[x] by get(f, x), and monitor the behavior of get() during run time

Cocoshark 10:19 PM on 29 Jun 2020

blacklisting f-strings is prolly unnecessary. I was thinking of f"{eval('()' + chr(46) + '__class__')}", but eval is not available without builtins

Aaron Meurer 11:05 PM on 29 Jun 2020

@Cocoshark, you are mostly OK if you disable attribute access and f-strings. You also have to make sure you disallow certain builtins like eval itself. Even so, there are still potential ways that people can do nasty things to your machine. For example, eval('10**10**100') will run until all the memory on the system is consumed and then crash. See https://stackoverflow.com/questions/35804961/python-eval-is-it-still-dangerous-if-i-disable-builtins-and-attribute-access

Cocoshark 1:28 AM on 30 Jun 2020

Ty for your reply!

An improvement to this approach is whitelisting certain attribute access.

whitelist = ["append", "split", "join", ...]

whitelist_pattern = ".(append|split|join)[^a-zA-Z0-9_]"

pattern = ".[a-zA-Z0-9_]+[^a-zA-Z0-9_]"

def safety_check(s):
return count-pattern-matches == count-whitelist-pattern-matches

Finally, run it in a separate process and use cgroups to control its memory/time consumption...

It is a very cheap option (in terms of execution time, management overhead & $$$). Running it in a separate virtual machine requires substantially more computing resources, and I simply can't afford it.
