Ned Batchelder
Shell = Maybe
Created 11 April 2017
Lots of people use Python to run other programs. Sometimes this is because they are using Python to coordinate other processes. Sometimes, it's because they are coming from a shell scripting world, and running other programs to get work done feels most natural.
If you are trying to run other programs (spawn subprocesses) in Python, the first thing to do is make sure you need to. Lots of things that are done with programs in a shell script are done more naturally with Python libraries. As an example, there's no need to use "ls" to list the files in a directory when you have os.listdir().
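As a small sketch of that point, here is the directory-listing case done with the standard library instead of a subprocess. The temporary directory and the file names are made up for the example.

```python
import os
import tempfile

# Build a throwaway directory so the example is self-contained.
with tempfile.TemporaryDirectory() as tmpdir:
    for name in ("a.txt", "b.txt"):
        with open(os.path.join(tmpdir, name), "w") as f:
            f.write("hello\n")

    # os.listdir returns the entry names in arbitrary order, so sort them.
    names = sorted(os.listdir(tmpdir))

print(names)
```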
Once you decide you are going to create subprocesses, there's a common question to work through: whether to use a shell or not. To answer that, let's talk about shells.
When you type a command in your terminal, you are not typing to "the computer." You are typing to a program, called a shell. The shell's job is to interpret the command line you type, and actually do what it says. A shell is a program that is very good at running other programs.
Programs don't get the command line you type. Programs get a list of strings. This isn't a Python thing; it's the way Unix works, and the way Windows generally mimics Unix. It's the shell's job to turn the line of text you type into a list of strings.
At a very very crude level, the shell takes the line you type, and turns it into a list of strings. It uses the first string to find the program to run. Then it runs the program, giving it the list of strings as arguments.
Note about Windows: at its deepest native level, Windows is different than this. Programs get a single string, the original command line. But because of the C language's close cultural ties to Unix, C programs on Windows get a list of strings, and other languages do the same. There are some differences between Windows and Unix still, but the big picture is the same.
How does the shell turn a command line into a list of strings? For the simplest cases, it just splits the line on spaces. So this command:
is turned into:
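To make the simple case concrete, Python's standard shlex module approximates the way a POSIX shell splits a line into words. The command line below is an invented example, not output captured from a real shell.

```python
import shlex

command_line = "ls -l /tmp"  # hypothetical command line

# With no quotes or special characters, this is just a split on spaces.
args = shlex.split(command_line)
print(args)
```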
If you want to experiment with this conversion of command lines into lists of strings, put this short Python program into echo.py:
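A minimal sketch of such a program, assuming all it needs to do is print the list of strings it was given:

```python
# echo.py: show the list of strings this program received.
import sys


def describe(argv):
    """Return the argument list formatted for display."""
    return repr(argv)


if __name__ == "__main__":
    # sys.argv[0] is the script name; the rest are the arguments.
    print(describe(sys.argv))
```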
Now you can try it yourself:
But what if you need an argument to have a space? If you want to search a file for "red apple", you need a command like:
Just splitting this on spaces would give four strings, which isn't right. The shell sees the quotes and understands that the quoted string should be kept together as a single argument. The resulting list is:
Notice that the double-quotes themselves are not in the argument.
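The difference between naive splitting and shell-style splitting can be sketched with shlex, which follows POSIX quoting rules closely enough for this case. The file name fruits.txt is made up for the example.

```python
import shlex

line = 'grep "red apple" fruits.txt'  # fruits.txt is a hypothetical file

naive = line.split()            # splits inside the quotes, keeps the quote marks
shell_like = shlex.split(line)  # keeps the quoted text together, drops the quotes

print(naive)
print(shell_like)
```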
There are other ways to protect spaces. This command could have been typed any of these ways:
The grep program literally can't tell the difference between these three lines, because the shell produces the same argument list for all of them.
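Here is a rough check of that claim, again using shlex as a stand-in for a POSIX shell. The three spellings (double quotes, single quotes, and a backslash before the space) are illustrative and use a made-up file name.

```python
import shlex

variants = [
    'grep "red apple" fruits.txt',
    "grep 'red apple' fruits.txt",
    r"grep red\ apple fruits.txt",
]

# Each spelling protects the space differently, but the result is the same.
arg_lists = [shlex.split(v) for v in variants]
for args in arg_lists:
    print(args)
```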
Shells do much more than just split the line into an argument list. As we've just seen, they also deal with quoting and escaping special characters. But there's much more. When you use a wildcard pattern to do something with many files, it's the shell that expands that pattern into a list of actual files. This command:
could be turned into this argument list:
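As a sketch of that expansion done by hand, Python's glob module can turn a pattern into the matching file names. The directory, the file names, and the wc command are all invented for the example, and the command is only built, not run.

```python
import glob
import os
import tempfile

with tempfile.TemporaryDirectory() as tmpdir:
    for name in ("a.py", "b.py", "notes.txt"):
        open(os.path.join(tmpdir, name), "w").close()

    # Expand the pattern ourselves, as the shell would before starting the program.
    matches = sorted(glob.glob(os.path.join(tmpdir, "*.py")))
    args = ["wc", "-l"] + [os.path.basename(m) for m in matches]

print(args)
```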
Other, more advanced features of the command line are actually features of the shell too, such as redirecting input and output, piping one program into another, and expanding environment variables.
The Python subprocess module has a few different functions and classes you can use to run a subprocess. One thing they all have in common: you have to tell them what program to run and what arguments to give the program. There are two ways to do this, and it all comes down to shells.
The more familiar way to run a program is with a shell:
(Note: subprocess has a number of functions. I'll use check_output because it is conceptually simple, but the shell considerations I'm discussing apply equally well to run, call, check_call, Popen, and so on.)
(Also note: this is one of those commands you shouldn't use a subprocess for. Listing files is easy to do in other ways. But it's nice and short for examples. We'll get to more realistic examples in a bit.)
When you specify shell=True, the program and arguments are provided in a single string. The shell is started, and given that string as the command line to execute. This gives a very familiar interface to running programs: it's exactly what we are used to from the command line. The shell parses the command line it's given, and invokes the program.
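A small sketch of the shell=True form, with echo standing in for a real command:

```python
import subprocess

# One string, handed to the shell, which parses it and runs the program.
output = subprocess.check_output("echo hello world", shell=True)
print(output)
```

The result is a bytes object; decode it if you want text.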
The other way to run the program is with no shell, which is the default:
Here we're running the program without the help of a shell, so we provide the program arguments as an explicit list of strings.
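A sketch of the list form. To keep it self-contained, the child process is the Python interpreter itself running a one-line program that prints its arguments; that child is an assumption of this example, not a command from the text above.

```python
import subprocess
import sys

# A tiny child program that prints the arguments it received.
child = "import sys; print(sys.argv[1:])"

# No shell involved: each list item reaches the child exactly as written,
# so the space is kept and the wildcard is not expanded.
output = subprocess.check_output(
    [sys.executable, "-c", child, "red apple", "*.txt"]
)
print(output.decode().strip())
```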
If you're wondering whether to use a shell with subprocess, the answer is simple: only use one if you have to. You should use a shell if you need some of its behavior, and otherwise avoid using a shell. Most of the time, you don't need a shell.
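There are good reasons to avoid using a shell: starting one means an extra process, shell syntax and features differ from one platform to another, and, most importantly, the shell will act on any special characters in the command, including ones you didn't put there on purpose.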
Here's an example of that last point. Suppose you want to split a video into a series of images. Ffmpeg is a powerful video tool that can do that, with a command like this:
But you want to get the video file name from the user. You might do this to insert the user's filename into the command, and then run it:
This works fine, but suppose the user gave you this file name: "; rm -rf * ; ". Now the constructed command line would be:
Running this would delete a lot of files, which is definitely not what you wanted. The user has maliciously injected shell content where you didn't want it.
This is the risk of using the shell: it can do much much more than you intended it to.
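The same kind of injection can be reproduced harmlessly. This sketch assumes a POSIX shell, where ";" separates commands. It uses echo in place of both the real tool and the destructive command, and the file name is invented.

```python
import subprocess
import sys

# A hostile "file name". A harmless echo stands in for something destructive.
filename = "video.mp4; echo INJECTED"

# Unsafe: the shell parses the whole string, so the ";" starts a second command.
unsafe = subprocess.check_output("echo splitting " + filename, shell=True)

# Safer: no shell, so the file name is a single argument, punctuation and all.
child = "import sys; print(sys.argv[1])"
safe = subprocess.check_output([sys.executable, "-c", child, filename])

print(unsafe.decode())
print(safe.decode())
```

In the first call, two commands run. In the second, the child simply receives an odd-looking file name.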
If you have a command line in mind, and you want to turn it into some Python code that runs the program the same way, you have to think like the shell. When the shell runs your command line, what list of string arguments does it produce? If you have a tricky case, it can help to use the echo.py program above to experiment with your command line.
Once you understand how the shell works, and what it is doing for you, you can decide whether you want to keep the shell in the mix (carefully), or skip the shell, and do that work yourself. Often, all the shell does is split your command into words, something you can do easily yourself.
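If you already have a command line as a string, one way to do the splitting yourself is shlex.split, which follows POSIX rules. This is a sketch under that assumption, and the command being run is just the Python interpreter evaluating a small expression.

```python
import shlex
import subprocess
import sys

# A command line as you might type it. Quoting sys.executable keeps a path
# containing spaces together as one word.
command_line = shlex.quote(sys.executable) + ' -c "print(2 + 3)"'

# Do the shell's word-splitting ourselves, then run without a shell.
args = shlex.split(command_line)
output = subprocess.check_output(args)

print(args[1:])
print(output.decode().strip())
```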
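If you are using shell features like wildcards or pipes, it becomes trickier to replace the shell with your own code. But Python provides the tools you need, such as the glob module for expanding wildcards and subprocess.PIPE for connecting one process to another.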
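As one possible sketch of a pipe built without a shell, two Popen objects can be connected through subprocess.PIPE. The producer and consumer here are small invented Python one-liners standing in for real programs.

```python
import subprocess
import sys

# Two small child programs standing in for "producer | consumer".
producer_code = "print('red apple'); print('green pear'); print('red plum')"
consumer_code = (
    "import sys; print(sum(1 for line in sys.stdin if 'red' in line))"
)

producer = subprocess.Popen(
    [sys.executable, "-c", producer_code], stdout=subprocess.PIPE
)
consumer = subprocess.Popen(
    [sys.executable, "-c", consumer_code],
    stdin=producer.stdout,
    stdout=subprocess.PIPE,
)
# Close our copy of the pipe so the producer sees a broken pipe
# if the consumer exits early.
producer.stdout.close()
output, _ = consumer.communicate()
producer.wait()

print(output.decode().strip())
```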
Keep in mind that many simple commands can be avoided altogether in favor of Python libraries. For example, there's no reason to run "date +%Y%m%d" to get the current date. You can get it from datetime.datetime.now() and format it with strftime.
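For instance, the date case needs only the standard library:

```python
import datetime

# Same format as "date +%Y%m%d": four-digit year, month, day.
today = datetime.datetime.now().strftime("%Y%m%d")
print(today)
```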
There are a number of libraries to help with complex scenarios, though I have no experience with any of them, so I don't know which to recommend! If you are running complex pipelines of commands, it will be easier to use a shell to do it. Just be very very careful.