Wednesday 6 June 2012 — This is over 12 years old, but it's still good.
Python has an eval() function which evaluates a string of Python code:
assert eval("2 + 3 * len('hello')") == 17
This is very powerful, but is also very dangerous if you accept strings to evaluate from untrusted input. Suppose the string being evaluated is “os.system('rm -rf /')”? It will really start deleting all the files on your computer. (In the examples that follow, I’ll use 'clear' instead of 'rm -rf /' to prevent accidental foot-shootings.)
Some have claimed that you can make eval safe by providing it with no globals. eval() takes a second argument: a dictionary of the global values to use during the evaluation. If you don’t provide a globals dictionary, then eval uses the current globals, which is why “os” might be available. If you provide an empty dictionary, then there are no globals. This now raises a NameError, “name 'os' is not defined”:
eval("os.system('clear')", {})
But we can still import modules and use them, with the builtin function __import__. This succeeds:
eval("__import__('os').system('clear')", {})
The next attempt to make things safe is to refuse access to the builtins. The reason names like __import__ and open are available to you in Python 2 is that they are in the __builtins__ global. We can explicitly specify that there are no builtins by defining that name as an empty dictionary in our globals. Now this raises a NameError:
eval("__import__('os').system('clear')", {"__builtins__": {}})
Are we safe now? Some say yes, but they are wrong. As a demonstration, running this in CPython will segfault your interpreter:
bomb = """
(lambda fc=(
lambda n: [
c for c in
().__class__.__bases__[0].__subclasses__()
if c.__name__ == n
][0]
):
fc("function")(
fc("code")(
# 2.7: 0,0,0,0,"BOOM",(),(),(),"","",0,""
# 3.5-3.7: 0,0,0,0,0,b"BOOM",(),(),(),"","",0,b""
# 3.8-3.10: 0,0,0,0,0,0,b"BOOM",(),(),(),"","",0,b""
# 3.11: 0,0,0,0,0,0,b"BOOM",(),(),(),"","","",0,b"",b"",(),()
),{}
)()
)()
"""
eval(bomb, {"__builtins__": {}})
The middle “BOOM” line needs to change depending on the version of Python. Uncomment the right one to see the crash.
Let’s unpack this beast and see what’s going on. At the center we find this:
().__class__.__bases__[0]
which is a fancy way of saying “object”. The first base class of a tuple is “object”. Remember, we can’t simply say “object”, since we have no builtins. But we can create objects with literal syntax, and then use attributes from there.
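Each step of that chain can be checked in ordinary code:
assert ().__class__ is tuple            # a literal tuple, then its class
assert tuple.__bases__ == (object,)     # tuple's only base class is object
assert ().__class__.__bases__[0] is object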
Once we have object, we can get the list of all the subclasses of object:
().__class__.__bases__[0].__subclasses__()
or in other words, a list of every class defined so far in the program that inherits directly from object. We’ll come back to this at the end. If we shorthand this as ALL_CLASSES, then this is a list comprehension that examines all the classes to find one named n:
[c for c in ALL_CLASSES if c.__name__ == n][0]
We’ll use this to find classes by name, and because we need to use it twice, we’ll create a function for it:
lambda n: [c for c in ALL_CLASSES if c.__name__ == n][0]
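Outside of an eval, where assignment is available, the helper is easy to write and test (spelling ALL_CLASSES out):
ALL_CLASSES = ().__class__.__bases__[0].__subclasses__()
find_class = lambda n: [c for c in ALL_CLASSES if c.__name__ == n][0]
assert find_class("dict") is dict   # dict inherits directly from object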
But we’re in an eval, so we can’t use a def statement or an assignment statement to give this function a name. Default arguments to a function are also a form of assignment, though, and lambdas can have default arguments. So we put the rest of our code in a lambda function and use a default argument as our assignment:
(lambda fc=(
lambda n: [
c for c in ALL_CLASSES if c.__name__ == n
][0]
):
# code goes here...
)()
Now that we have our “find class” function fc, what will we do with it? We can make a code object! It isn’t easy: you need to provide 12 arguments to the constructor in Python 2.7 (recent versions need more, as the version comments in the bomb above show), but most can be given simple default values.
fc("code")(0,0,0,0,"BOOM",(),(),(),"","",0,"")
The string “BOOM” is the actual bytecode to use in the code object, and as you can probably guess, “BOOM” is not a valid sequence of bytecodes. Actually, any one of those bytes would be enough: each is a binary operator that will try to operate on an empty operand stack, which will segfault CPython. “BOOM” is just more fun, thanks to lvh for it.
This gives us a code object: fc("code") finds the class “code” for us, and then we invoke it with the 12 arguments. You can’t invoke a code object directly, but you can create a function with one:
fc("function")(CODE_OBJECT, {})
And of course, once you have a function, you can call it, which will run the code in its code object. In this case, that will execute our bogus bytecodes, which will segfault the CPython interpreter. Here’s the dangerous string again, in more compact form:
(lambda fc=(lambda n: [c for c in ().__class__.__bases__[0].__subclasses__()
if c.__name__ == n][0]): fc("function")(fc("code")(0,0,0,0,"BOOM",(),
(),(),"","",0,""),{})())()
So eval is not safe, even if you remove all the globals and the builtins!
We used the list of all subclasses of object here to make a code object and a function. You can of course find other classes and use them. Which classes you can find depends on where the eval() call actually is. In a real program, there will be many classes already created by the time the eval() happens, and all of them will be in our list of ALL_CLASSES. As an example:
s = """
[
c for c in
().__class__.__bases__[0].__subclasses__()
if c.__name__ == "Quitter"
][0](0)()
"""
The standard site module defines a class called Quitter; the name “quit” is bound to an instance of it, so that you can type quit() at the interactive prompt to exit the interpreter. So in eval we simply find Quitter, instantiate it, and call it. This string cleanly exits the Python interpreter.
Of course, in a real system, there will be all sorts of powerful classes lying around that an eval’ed string could instantiate and invoke. There’s no end to the havoc that could be caused.
The problem with all of these attempts to protect eval() is that they are blacklists. They explicitly remove things that could be dangerous. That is a losing battle because if there’s just one item left off the list, you can attack the system.
While I was poking around on this topic, I stumbled on Python’s restricted evaluation mode, which seems to be an attempt to plug some of these holes. Here we try to access the code object for a lambda, and find we aren’t allowed to:
>>> eval("(lambda:0).func_code", {'__builtins__':{}})
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
File "<string>", line 1, in <module>
RuntimeError: function attributes not accessible in restricted mode
Restricted mode is an explicit attempt to blacklist certain “dangerous” attribute accesses. It’s specifically triggered when executing code if your builtins are not the official builtins. There’s a much more detailed explanation and links to other discussion on this topic on Tav’s blog. As we’ve seen, the existing restricted mode isn’t enough to prevent mischief.
So, can eval be made safe? Hard to say. At this point, my best guess is that you can’t do any harm if you can’t use any double underscores, so maybe if you exclude any string with double underscores you are safe. Maybe...
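To make that concrete, the filter would be just a few lines; a minimal sketch (which, as a comment below demonstrates, still isn’t sufficient):
def eval_no_dunders(expr):
    # Reject anything containing a double underscore before evaluating.
    if "__" in expr:
        raise ValueError("double underscores are not allowed")
    return eval(expr, {"__builtins__": {}})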
Update: from a thread on Reddit about recovering cleared globals, a similar snippet that will get you the original builtins:
[
c for c in ().__class__.__base__.__subclasses__()
if c.__name__ == 'catch_warnings'
][0]()._module.__builtins__
Comments
I used it in a chatbot once to allow people to eval simple python statements: https://github.com/llimllib/pyphage/blob/master/plugins/eval.py
While it was vulnerable to DOS (the timeout function did not work properly on memory intensive functions: https://github.com/haypo/pysandbox/issues/10 ), I let some pretty serious python hackers have a go at it and they weren't able to break it.
I wouldn't trust it on a for-real server, obviously, but it was good enough for the task I set out to accomplish in that scenario.
The article seems to be missing a basic stipulation, perhaps "running untrusted code is dangerous". You seem to be trying to prove that no matter what you do, you can't fully isolate and sanitize untrusted code. OK, I buy that. But I think the article needs a liberal sprinkling of "... for untrusted code/input/source".
eval() and exec() are perfectly safe for trusted code, useful and even necessary on occasion (e.g. dynamic programming). Just never *ever* use them with untrusted input.
Incidentally, Ned himself did some work on it to speed up the startup a few months ago. That's still the main area that needs work, too.
Also, using eval to be "dynamic" is icky.
You could always concatenate single underscores. But if you prevent all direct accesses to dunder-methods and dunder-attributes (__subclasses__ in this case, and the indirect access via __getattribute__) this threat should be mostly mitigated. Far's I can tell, this means walking the bytecode and forbidding all LOAD_ATTR to a dunder method or attribute.
> Sadly, eval can also evaluate... eval.
Eval's a builtin; if you remove all builtins, you can't eval eval:
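A minimal reconstruction of the elided demonstration:
eval("eval('1+1')", {"__builtins__": {}})
# NameError: name 'eval' is not defined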
Thought about it for a bit longer that time, you don't actually need to walk the bytecode, just look in co_names and forbid any dunder name in it (this will catch access to user-created ones as well, but I'm not sure there's any reason to care).
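A sketch of that co_names check (extended to recurse into nested code objects such as lambdas and comprehensions):
import types

def has_dunder_names(code):
    # co_names holds the global and attribute names the code refers to.
    if any("__" in name for name in code.co_names):
        return True
    return any(isinstance(c, types.CodeType) and has_dunder_names(c)
               for c in code.co_consts)

assert has_dunder_names(compile("().__class__", "<s>", "eval"))
assert not has_dunder_names(compile("2 + 3", "<s>", "eval"))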
https://github.com/haypo/pysandbox
Bill Mill already mentioned pysandbox.
"While it was vulnerable to DOS (the timeout function did not work properly on memory intensive functions: https://github.com/haypo/pysandbox/issues/10 ),"
I'm working on a new version (1.6) which runs untrusted code in a subprocess. I reintroduced the timeout option; it is now safe (and implemented completely differently).
Here is my modified version: http://pastebin.mozilla.org/1657291
About sandboxes, @lahwran is right, we've worked on the PyPy sandbox, though this investigation was not to try to fix eval, but instead to find a convincing example for people who think eval can be fixed for untrusted input.
Everyone else, thanks for the interesting pointers and suggestions.
The philosophy for safe eval'ing should be like SQL prepared statements: use placeholders with validation, or ... be imaginative:
eval("dispatch_table[%r](*%r, **%r)" % (func, loads(table_as_json), loads(dict_as_json)))
If we do so, the advantages are:
- with a dispatch table we can limit the safely available functions, and if dispatch keys are simple ([a-zA-Z]{8,12}), then escaping seems very hard;
- if we use named arguments serialized in JSON, it is easier to avoid an active payload in the arguments (only «passive data» are JSON-able).
Of course this seems very reductive (it looks like we cannot build complex function trees), unless you notice that if the dispatch table is global, then dispatch-table keys can also be passed as arguments, along with their arguments.
Okay, it looks like Lisp. But every safe-parsing problem seems to point to a Lisp-like solution.
If you want serious, hardcore foreign-code safety, you'll need something like the pypy sandbox or an OS-level thing like virtualization. And even that is vulnerable to some things; apparently people are stealing encryption keys from neighboring VMs now.
I currently use the following cleaner (don't worry, it's not a serious project): every single exploit I have seen has used [ or . or *. I just want to evaluate boolean literals in a simple way. ast.literal_eval is not really evaluating, just parsing. I have yet to see a non-DoS exploit using the character set above...
I'll PayPal €5 to the first person who shows me a non-DoS exploit in under 100 characters that survives the filter above.
((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((1))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))
s_push: parser stack overflow
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
MemoryError
It's not a crash, but a funny error, and the code matches your regex ;)
I've translated your post into Russian; you can find it here: http://habrahabr.ru/post/221937/
Is it ok for you? Sorry for not asking in advance:(
Using ast.parse and then searching for any system attribute access, you can detect problematic code and block it from being executed. One also has to block the exec statement, though. Here is an example:
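A hypothetical reconstruction of that kind of check:
import ast

def check(expr):
    # Block attribute access on dunder names, and the names eval/exec.
    for node in ast.walk(ast.parse(expr, mode="eval")):
        if isinstance(node, ast.Attribute) and node.attr.startswith("__"):
            raise ValueError("forbidden attribute: " + node.attr)
        if isinstance(node, ast.Name) and node.id in ("eval", "exec"):
            raise ValueError("forbidden name: " + node.id)
    return expr

check("2 + 3 * len('hello')")       # passes
check("().__class__.__bases__")     # raises ValueError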
I wanted to use eval in my code, but I learned about its drawbacks, so I came up with my own way; I want to check whether this code can be broken. Here I don't want arithmetic operations.
small change =>
print(eval(
    jsn['logic18'].replace('().', 'hacker()').replace('__class__', 'hacker()')
        .replace('lambda', 'hacker()').replace('open', 'hacker()')
        .replace('**', 'hacker()').replace('/', 'hacker()')
        .replace('%', 'hacker()').replace('*', 'hacker()')
        .replace('+', 'hacker()').replace('-', 'hacker()'),
    {'__builtins__': {}},
    {'hacker': hack}))
Abandon hope, all ye who eval foreign code.
Using that, I can still:
- make arbitrary functions
- import any module I like
- call any function that is available outside the sandbox - any function, not just built in ones
- replace any code outside the sandbox
- crash cpython in one of a variety of ways
you can't do it. if you deploy this, someone will break into it.
Here's some code you can use that actually-safely does an eval with foreign code:
Has this been tested on a machine?
Or is this from a purely 'mathematical'(?) standpoint?
Did you arrive at this based on your knowledge of the behaviour of python functions?
Suppose:
- You have a small subset of the language, such as a calculator, maybe some numpy stuff, etc...
- You don't want to spend time implementing the language (because you are lazy and want it all in under 15 lines of code)
Use ast to validate, and if it passes, eval it.
Whitelist instead of blacklist (because blacklisting only works if you are comprehensive, and I don't trust myself to be comprehensive). If you accidentally omit something in a whitelist you lose functionality. If you accidentally omit something in a blacklist, you have a bigger problem. It's like trying to make a lock that specifically resists each skeleton key you know about.
You need two whitelists: an ast node whitelist and a name whitelist.
To save yourself a headache, generate your node whitelist from a sample expression, as in the sketch below: that way, if you forgot something, you just add it to your sample. Only add syntax that you actually need, and be mindful of how that syntax could be used (in this example attribute access is not included, as that involves additional care).
Your symbol whitelist might come from a module or whatever. Probably best to do it by hand. Just make sure you are aware of the contents and that it doesn't have stuff like '__dict__' in there, and think whitelist, not blacklist. The validation test looks something like the sketch below. Note that if I had whitelisted attribute access (by explicitly including "a.b" in the sample expression), I'd need to implement an attribute whitelist to check against set([x.attr for x in ast.walk(ast.parse(expr)) if x.__class__.__name__ == 'Attribute']). But for this use case, attributes are not necessary, and security seems a little simpler without them.
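A hypothetical reconstruction of the two elided snippets (names are illustrative):
import ast

# Generate the node whitelist from a sample expression that uses exactly
# the syntax you want to allow (no attribute access in this sample):
sample = "f(a + b * -c) / (d ** 2) % e"
allowed_nodes = {type(n).__name__ for n in ast.walk(ast.parse(sample, mode="eval"))}

def is_safe(expr, allowed_names):
    for node in ast.walk(ast.parse(expr, mode="eval")):
        if type(node).__name__ not in allowed_nodes:
            return False
        if isinstance(node, ast.Name) and node.id not in allowed_names:
            return False
    return True

# Validate first, and only then eval:
expr = "min(x, y) + 2"
if is_safe(expr, {"min", "x", "y"}):
    result = eval(expr, {"__builtins__": {}}, {"min": min, "x": 3, "y": 5})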
This won't prevent DOS attacks with various kinds of extreme math. AFAIK, that can only effectively be dealt with at the process level. I'm pretty sure there is no lexical approach to preventing extreme math in a sufficiently useful tool. And I acknowledge that this post is incomplete without a discussion of Gödel's theorem. Too bad.
So, other that DOS, is this solid? Please let me know of any inevitable hacks!
Not impossible at all. Easy to implement safe-eval with correct thinking.
It only appears to be impossible because there are so many examples of failed attempts based on a flawed approach, which is to disallow (blacklist) dangerous strings. That approach can't work because the set of hacking strategies is impossible to enumerate.
Simply checking for "__" in the string doesn't seem to be sufficient. You can access the frame object of a generator expression without any double-underscores, and from there, access the globals of any frame higher up in the program. The only tricky part is that the complete frame object hierarchy is only accessible when the generator is executing, so you need a little self-referential trickery if you're in an `eval` and have no access to assignment. Proof of concept: On my machine, this prints all the builtins that the attacker can now access, from `ArithmeticError` to `zip`.
Here's the same approach using `exec`, which is a little more readable since a self-referential generator expression is trivial if you can use an assignment statement:
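A sketch of that exec variant (a reconstruction, not the exact code; the two f_back hops assume this exact call structure under CPython):
payload = """
g = (g.gi_frame.f_back.f_back for _ in [0])
for frame in g:
    b = frame.f_builtins
b['print']('recovered', b['len'](b), 'builtins')
"""
exec(payload, {"__builtins__": {}})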
I have an idea for making a safe_eval: simply disallow the use of the dot "." and of f-strings.
I am thinking of using something like this:
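A hypothetical reconstruction of the elided snippet:
import ast

def no_dot_eval(expr, names=None):
    # Disallow any dot (attribute access, at the cost of float literals)
    # and any f-string, then evaluate with no builtins.
    if "." in expr:
        raise ValueError("'.' is not allowed")
    if any(isinstance(n, ast.JoinedStr)
           for n in ast.walk(ast.parse(expr, mode="eval"))):
        raise ValueError("f-strings are not allowed")
    return eval(expr, {"__builtins__": {}}, dict(names or {}))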
This is indeed not too restrictive. Just forget about things like arr.append(ele), and instead use a purely functional style (map, reduce, yield) plus lazy evaluation.
The reason this works is that ultimately you get __builtins__ either from the attributes of an object or from a dictionary. Disallowing the dot eliminates the first possibility. I don't think it is possible to get __builtins__ directly from dictionaries. If it is possible, we can statically replace the [] operator f[x] by get(f, x) and monitor the behavior of get() at run time.
An improvement to this approach is whitelisting certain attribute access.
import re

whitelist = ["append", "split", "join"]  # etc.
# Escape the dot; a lookahead handles a match at the end of the string.
whitelist_pattern = r"\.(?:" + "|".join(whitelist) + r")(?![a-zA-Z0-9_])"
pattern = r"\.[a-zA-Z0-9_]+(?![a-zA-Z0-9_])"

def safety_check(s):
    # Safe only if every ".name" found is a whitelisted attribute.
    return len(re.findall(pattern, s)) == len(re.findall(whitelist_pattern, s))
Finally, run it in a separate process and use cgroups to control its memory/time consumption...
It is a very cheap option (in terms of execution time, management overhead & $$$). Running it in a separate virtual machine requires substantially more computing resources and I simply can't afford it
input_string = input_string.replace("("," ("))
result = eval(input_string,{"__builtins__":{}},{})
(That is, put a space in front of all the parenthesis.)
while allow rules (as opposed to deny rules) would be the right approach in the abstract, every single thing you allow is a potential security hole. with few enough exposed functions you may be able to reduce the risk to only that of accepting any other untrusted input, and my original recommendation was to make a compiler or interpreter whose output you can guarantee is safe; in some sense that's effectively what you're doing. however, python has many normally-hidden operations that allow unusual changes in behavior, and even if you get your allow rules to apply to the code without any correctness issues, you still end up maintaining a mental blacklist of functions not to expose.
unfortunately, the recommendation to use the pypy sandbox has to be rescinded simply because that project didn't get maintenance momentum. if you must do remote code execution as part of your normal functionality, and you cannot simply implement a hyper minimal safe language in rust or something, i would very strongly recommend thorough defense in depth. run it in an unprivileged docker container in a vm on a host that you can erase easily and does not have any authorizations to anything that is a risk if compromised.
btw how do i unsubscribe from this, ned ?
I liked @David Goodger's comment (10:26 AM on 6 Jun 2012) about untrusted text, and your response, but MDN ( https://developer.mozilla.org/en-US/docs/Web/JavaScript/Reference/Global_Objects/eval#never_use_eval! ) seems to be saying that the string can never be trusted.
As if everyone has stuff running in their browser waiting for an errant eval as the only way they can do bad stuff "as the user" (in the browser). Surely this is not true, and really there are only 3 ways things can go wrong:
* your eval gets a string from the user and does something the user could do anyway
* the user's browser is completely compromised and the bad guy can do everything anyway
* your code is going out of its way to get compromised text and then evaling it
And if you are not doing the last, there is no problem.
On the server that is totally right.
On the browser I am unsure if "using eval can perform many actions that the user would not normally be able to do" is correct. I would love an example.
It might be true that eval allows users to run code that might be more complicated to set up and run from the console as it gives access to context more easily. However, as browsers have debuggers and break points context can be got that way.
The only thing I am left with is that eval might make it easier to find that context than it is with breakpoints.
From: https://security.stackexchange.com/questions/94017/what-are-the-security-issues-with-eval-in-javascript
@StackTracer, a malicious link does not necessarily mean a link to a malicious website. In this context I believe mcgyver5 was referring to something like yourwebsite.com/?someParam=DoMaliciousThings where somewhere on your page you have a statement like eval(someParam) (highly simplified). That would be considered a reflected XSS attack, since I could send you a link like that and when you open it your browser would execute my malicious javascript through the eval. Something like that looks more reasonable when you consider it could be combined with url encoding or shortened urls. – S.C. Jul 15 '15
together they mean:
who can be bothered securing either
1 the data you accept from Alice, or
2 the shape of the text you run in eval on Bob's browser,
just to stop an eval you run on Bob's browser from containing malicious code that you accepted from Alice and used to create the text you are evaling, when you can just use Function (I added this last bit myself)
the arguments against that are :
* you always have to sanitise the text you get from Alice and store in a DB, and
* basically it is really easy to protect eval, just like you protect against sql injection: build it so that the unknown text consists of parameters that must be alphanumeric only.
With sql injection, if you stuff up then YOUR DB is screwed, but you have backups.
With eval "injection", if you stuff up then YOUR USERS are screwed, and they don't always have backups and sometimes can't have backups. For example, they can't have a backup of the money in their bank account that you might have helped compromise.
yep that was what I was focussed on.
Sorry if I missed the point.
My main reasons were (are now):
* eval in the backend is pretty much the same as sql in the backend, for which there is a simple accepted solution
* JS in the backend is not something I would ....(though I might use python)
* eval in the browser is what most discussions on this discuss
* ARRG - I saw this page as what I wanted and failed to let the first line "Python has an eval() function" sink in - sorry again
though I did work out my thoughts on MY PROBLEM :)
Try:
Plus we can spawn it in a separate process so crashes can be detected and handled. Can it damage the system with THESE precautions? Or can it at most crash the process?
@SomePythonGuy: you should try your code! After correcting a few mistakes (you need exec, not eval; you can't import sys if you don't have __import__ in your builtins), it will still crash your interpreter. The example crasher given in the blog post doesn't use any builtin names, and doesn't import anything. Removing builtins and modules won't protect you.
BTW, for recent Python versions the arguments to “code” have changed; see the version-specific argument lists in the crasher near the top of the post.
I don’t understand how @Kevin’s expression works. It seems to work only if there is nothing (or spaces) at the position of “@” below:
If I replace “@” with a list or tuple, the expression will give its first element. Is this a bug in the Python interpreter? I tested it with Python 3.6.9.
Is there a clearer way to access a generator “while it is executing”, as Kevin described, if that is the purpose of the strange expression? I imagine there may be links to all frames in the stack while a generator is executing, to deal with exception catching. But if the mechanism in this example is an exfiltration of the interpreter's internal data out to the script, with no utility other than as a hack, then I think it is a security issue.
I will answer my own question.
The key to the trick is that a generator expression binds to its closure like a function. So it can “yield” itself. This can be demonstrated simply by:
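A minimal reconstruction of that demonstration:
g = (g for _ in [0])
assert next(g) is g   # the generator expression yields itself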
The rest of the complications in Kevin’s example were to execute everything in a single “eval()”, so that the content fit exactly the original subject.
@lahwran “… that might maybe actually be good enough with enough effort maintaining the allow list, but i don’t feel up to doing a full audit of every single operation you used. the fundamental problem with eval is unintuitively high surface area combined with python’s complete lack of privacy and high runtime mutability.”
My solution already solves your objections.
Very little effort is needed for the allow list. It’s just the set of tokens sufficient to implement a subset of python consisting of simple function calls and operators, which should be sufficient for the kind of application where this might be useful (e.g. interactive interpreters, spreadsheet apps or equivalent, etc.).
Names would be implicitly whitelisted by being included in a scope, so adding features is just a matter of adding names to the scope on the backend.
“… a full audit of every single operation …": The audit has negligible cost in the kinds of applications for which eval() might be suitable.
“fundamental problem with eval is unintuitively high surface area": Huh? That’s solved by limiting the surface area. My solution does that.
See post on 20 Sep 2017. It is a correct solution without any vulnerabilities.
Well, of course you can still do things like bytes([95]*2).decode() to get a double underscore, so if there is globals() or getattr(), you can get hold of the respective members.
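For example:
dunder = bytes([95] * 2).decode()             # "__", with no literal double underscore
cls = getattr((), dunder + "class" + dunder)  # same as ().__class__
assert cls is tuple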
And in any case, you’re always vulnerable to denial of service.
eval() is fine for a non-malicious, local end user, but it simply must not be exposed to untrusted input (e.g. on a web server).