The curious case of shell commands, or how "this bug is required by POSIX"

by Ciprian Dorin Craciun (ciprian.craciun@gmail.com)

About the fatal perils and traps of many modern tools that handle "shell commands" as passed through `system(3)` or `sh -c`. Or, how by the end of 2020, we still haven't given up on shell's equivalent "SQL building", or how shell's equivalent "SQL injection" still thrives in our engineering world... Plus a `glibc` bug, then a Linux man pages bug, then a POSIX specification bug...


Some context

(Those interested only in the glibc and POSIX issues can skip to the end.)

I use Linux almost everywhere, from laptops and desktops to servers and routers, and over the course of many years I've written quite a few bash scripts that ease my interaction with all of these.

The usual "stack" that ties all of these together is composed of:

What do all of the above have in common? They all provide some sort of functionality, but by themselves they don't actually do anything; they are meant to delegate the actual work to other tools.

So what is the problem, you ask? Nothing really... Everything is fine... No planes are falling from the sky... Nobody is running around screaming while on fire... Everything is business as usual, unless you want to write some wrapper scripts that take arbitrary user input and delegate it to one of these, and many other, broken tools...

And just for completeness I'll throw in some other tools that fit the bill:

Root of all evil

As many of you have guessed, because we live in a UNIX world, many of these tools delegate the actual tool execution to the glibc system(3) function, which ends up calling sh(1p) (most likely bash) with the -c argument. As opposed to doing the correct thing and just calling execve(2)...

BTW, this is not something Linux-specific. Unfortunately it is a trait inherited from the UNIX ancestry by almost all operating systems, including all BSD variants plus OSX; and, for some tools, this behavior is even seen when running on Windows. But I digress...

Excerpts from Linux system(3) man page

Let's quote from the Linux system(3) man page:

The system(3) library function uses fork(2) to create a child process that executes the shell command specified in command using execl(3) as follows:

execl("/bin/sh", "sh", "-c", command, (char *) NULL);

[...]

Any user input that is employed as part of command should be carefully sanitized, to ensure that unexpected shell commands or command options are not executed. Such risks are especially grave when using system() from a privileged program.

Excerpts from POSIX system(3p) man page

Let's quote from the POSIX system(3p) man page:

Excerpts from POSIX sh(1p) man page

How about quoting from the POSIX sh(1p) man page regarding the -c flag:

-c -- Read commands from the command_string operand. Set the value of special parameter 0 (see Section 2.5.2, Special Parameters) from the value of the command_name operand and the positional parameters ($1, $2, and so on) in sequence from the remaining argument operands. No commands shall be read from the standard input.

Excerpts from bash(1) man page

Perhaps we'll have more luck quoting from the bash(1) man page:

-c -- If the -c option is present, then commands are read from the first non-option argument command_string. If there are arguments after the command_string, the first argument is assigned to $0 and any remaining arguments are assigned to the positional parameters. The assignment to $0 sets the name of the shell, which is used in warning and error messages.
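To make the two quoted descriptions concrete, here is a minimal sketch of how the operands that follow the command string end up in $0 and the positional parameters. The helper name show_params and the sample values are made up for illustration; they do not come from any of the man pages.

```shell
# Illustration only: `show_params` is a hypothetical helper, not part of any tool.
# Everything after the command string is handed over as data:
# the first operand becomes $0, the remaining ones become $1, $2, ...
show_params() {
    sh -c 'printf "[%s]" "$0" "$@"; echo' "$@"
}

show_params my-name 'a b' 'c ; d'
# prints: [my-name][a b][c ; d]
```

Note that these operands are never re-parsed as shell code; only the command string itself is.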

Partial conclusion

For now let's remember that initial remark:

Any user input [...] part of command should be [...] sanitized [...].

Aside from that, perhaps this "shell command" business I'm crying about isn't such a big issue? Else the POSIX manuals for system(3p) (~2K words) and sh(1p) (~8K words) -- that I assume were written by legions of committee members -- would have certainly spared at least a few words about this issue... Right?

Unintended consequences

So why do we care about system(3) and sh -c?

Because as said in the beginning, many tools accept as arguments commands which in the end are passed to sh -c via the system(3) function.

Therefore when we write our scripts and tools we need to be aware of this situation and be prepared to escape and quote our commands and arguments accordingly, else we'll be subject to shell injections... (For more scary stuff one can read about shellshock.)

A harmless example...

For example say we want to write a script that takes exactly two arguments, a command and a single argument (that we store in the _command and _argument variables):

if test "${#}" -ne 2 ; then
    printf -- '[ee] invalid arguments!\n' >&2
    exit 1
fi
_command="${1}"
_argument="${2}"
shift -- 2

And after some initial setup we want to pass our command to a tool such as watch, hyperfine, ssh and many others, as in tool "command argument".

Thus one would quickly write the following:

tool "${_command} ${_argument}"

(Granted watch does have the -x flag that solves this issue, however it is not the default; moreover ssh, hyperfine and many other tools don't have this option.)
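To see what this naive delegation does in practice, here is a small bash sketch. The broken_tool function is a hypothetical stand-in for any tool that hands its single argument to sh -c, and naive_delegate merely wraps the one-liner from above; neither name belongs to a real tool.

```shell
# Hypothetical stand-in for any tool that hands its single argument to `sh -c`.
broken_tool() {
    sh -c "${1}"
}

# The naive delegation from the text, wrapped in a function for the demo.
naive_delegate() {
    local _command="${1}" _argument="${2}"
    broken_tool "${_command} ${_argument}"
}

# One argument goes in, but the shell re-parses the joined string:
# the run of spaces collapses and everything after `;` runs as a second command.
naive_delegate echo 'a   b; echo injected'
# prints two lines: "a b" and then "injected"
```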

... with interesting failure modes

If one has written the delegation as above, then one would be in for a bit of a surprise, because trying to use a command or argument that:

Thus one asks oneself, what is the right solution?

Proper solution

The proper solution would be dropping that broken tool immediately, securely erasing it from your hard drive, then running and screaming that tool's name out loud in shame... (Something akin to Game of Thrones' walk of atonement...)

I'm not kidding... These kinds of broken tools are the cause of many stupid bugs, ranging from the funny oops-rm-with-spaces (i.e. rm -Rf / some folder with spaces /some-file), to serious security issues like the aforementioned shellshock...

So, you say someone holds you at gunpoint, thus you must use that tool? Check whether the broken tool has a flag that disables calling sh -c, and instead properly executes the given command and arguments directly via execve(2). (For example watch has the -x flag as mentioned.)

Alternatively, given that most likely the tool in question is an open-source project written by someone in their spare time, perhaps open a feature request describing the issue, and if possible contribute a patch that solves it.

Still no luck? Make some popcorn and prepare for the latest blockbuster "convoluted solutions for simple problems in UNIX town"...

Convoluted solutions

However if we are forced to use that broken tool, we'll need to apply the following countermeasures...

Yes, I've written "countermeasures", because now we have to think like an attacker and see how we can break our own script, then with each attempt we'll build layers upon layers of countermeasures. And if our eyes start hurting, it might be because of the onion we've created with all these convoluted layers...

Please note that none of the following are the correct solution. Instead they are just steps in building towards the correct solution.

Also, please note that most likely I've gotten half of it wrong!

Convoluted step 1 -- quoting the command and arguments

A first step is to use the bash-specific @Q expansion modifier (available since bash 4.4) that quotes a string according to the sh rules:

broken-tool "${_command@Q} ${_argument@Q}"

In case we have multiple arguments, as in _arguments=( "${@}" ), then one could use the following:

broken-tool "${_command@Q} ${_arguments[*]@Q}"

In case we don't need the extra variables, we could simply just write:

broken-tool "${*@Q}"
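To show what @Q actually hands to the tool, here is a minimal bash sketch. The helper quote_words is a made-up name, and the round trip uses a plain sh -c in place of a real broken tool. As far as I know, for values containing control characters such as newlines bash emits the $'...' form, which not every sh understands, so the sketch sticks to printable arguments.

```shell
# Hypothetical helper: prints its arguments as one shell-quoted string (needs bash >= 4.4).
quote_words() {
    printf '%s\n' "${*@Q}"
}

quote_words 'a b'
# prints: 'a b'

# Round trip: the quoted string is parsed back into the original three arguments,
# so the embedded `;` never gets a chance to start a second command.
sh -c "printf '[%s]' $(quote_words 'a   b' "it's" '; echo injected'); echo"
# prints: [a   b][it's][; echo injected]
```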

Convoluted step 2 -- using exec before the command

What would happen if our command happens to be named just like a bash built-in, such as kill or time, which are not compatible with our arguments? Did you know that there are executables, in the default distribution, that are named if, echo, printf and even test or [? (Just try it out with type -P if.)

In order to make sure that we are actually executing the command as available from the $PATH environment variable, we should prefix it with an exec:

broken-tool "exec ${_command@Q} ${_arguments[*]@Q}"
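A quick way to check whether a given name collides with something the shell handles itself is to compare the outputs of type -t and type -P. The helper describe_name below is a hypothetical name used only for this sketch, and it assumes bash, since both flags are bash extensions.

```shell
# Hypothetical helper: reports how bash itself would treat a name,
# and which executable (if any) the same name resolves to on $PATH.
describe_name() {
    local _name="${1}"
    printf '%s: shell sees "%s", $PATH has "%s"\n' \
        "${_name}" "$(type -t "${_name}")" "$(type -P "${_name}")"
}

# On a typical system `echo` is reported as a builtin and `if` as a keyword;
# the paths (possibly empty) depend on what is installed.
describe_name echo
describe_name if
```

Whenever the first value is not "file", a bare command name inside sh -c would not reach the executable found on $PATH.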

Convoluted step 3 -- using -- after exec

What would happen if our command happens to start with a hyphen, like for example -strange-command?

In order to make sure that the command isn't by mistake interpreted as a flag to exec itself, we should also add a -- between exec and the command:

broken-tool "exec -- ${_command@Q} ${_arguments[*]@Q}"

Please note that not all sh implementations actually support the exec -- variant.

Final convoluted solution

broken-tool "exec -- ${_command@Q} ${_arguments[*]@Q}"

In fact, many tools themselves accept the -- argument to denote the end of all "options" and the beginning of "arguments", thus my favorite final variant would be:

broken-tool -- "exec -- ${_command@Q} ${_arguments[*]@Q}"
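Putting the three steps together, a wrapper could build the command string in one place. The sketch below is only an illustration: build_command_string is a hypothetical helper, bash -c stands in for the broken tool, and it assumes bash 4.4 or newer on the building side plus a receiving shell that accepts exec --.

```shell
# Hypothetical helper: builds the command string used in the final variant.
build_command_string() {
    local _command="${1}"
    shift
    local _arguments=( "${@}" )
    printf 'exec -- %s %s' "${_command@Q}" "${_arguments[*]@Q}"
}

# Stand-in for a broken tool: here simply `bash -c`.
# `exec` bypasses the builtin, so the printf executable from $PATH does the printing.
bash -c "$(build_command_string printf '[%s]\n' 'a   b' "it's" '; echo injected')"
# prints three lines: [a   b], [it's] and [; echo injected]
```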

The road paved with good intentions

So now you ask yourself, which are the tools that should be treated with such great care?

Case study -- watch(1) man page

Let's look at the watch(1) manual:

-x, --exec -- Pass command to exec(2) instead of sh -c which reduces the need to use extra quoting to get the desired effect.

Case study -- ssh(1) man page

Let's look at the ssh(1) manual:

However by trying various experiments, we guess it might not be the case?

Spoiler: ssh is one of the tools that not only should be handled with great care, it even lacks a flag that would disable this dangerous behavior...

Case study -- i3 and i3-msg

Let's look at the i3 and i3-msg manuals; from i3-msg we have:

i3-msg [-q] [-v] [-h] [-s socket] [-t type] [message]

command -- The payload of the message is a command for i3 (like the commands you can bind to keys in the configuration file) and will be executed directly after receiving it.

From i3 we have:

exec [--no-startup-id] <command>

The exec command starts an application by passing the command you specify to a shell.

The common thread...

All of these tools behave in a misleading way:

Some experiments...

Let's experiment for example with printf (not the bash builtin, but the executable, thus the full path used): /usr/bin/printf .%q. argument-1 argument-2 would print out its arguments, quoted according to sh syntax, and joined by dots, as in:

> /usr/bin/printf .%q. '1 2' '; a' 'b ;' 'c ; d'
.'1 2'..'; a'..'b ;'..'c ; d'.

> /usr/bin/printf .%q. "'1 2' '; a' 'b ;' 'c ; d'"
.''\''1 2'\'' '\''; a'\'' '\''b ;'\'' '\''c ; d'\'''.

> /usr/bin/printf .%q. '"1 2 ' ' 3 4"'
.'"1 2 '..' 3 4"'.

Now let's try that through ssh:

> ssh user@remote "/usr/bin/printf .%q. '1 2' '; a' 'b ;' 'c ; d'"
.'1 2'..'; a'..'b ;'..'c ; d'.
> ssh user@remote /usr/bin/printf .%q. "'1 2' '; a' 'b ;' 'c ; d'"
.'1 2'..'; a'..'b ;'..'c ; d'.
> ssh user@remote /usr/bin/printf .%q. '"1 2 ' ' 3 4"'
.'1 2   3 4'.
> ssh user@remote /usr/bin/printf .%q. '1 2' '; a' 'b ;' 'c ; d'
.1..2.
bash: a: command not found
bash: c: command not found
bash: d: command not found
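The transcript above is consistent with ssh gluing all of its command arguments together with spaces and handing the resulting string to the remote shell. Assuming that is indeed what happens, the effect can be reproduced locally without any network. In this sketch fake_ssh is a made-up stand-in, and %s is used instead of %q so that any sh built-in printf can run it.

```shell
# Hypothetical local stand-in for `ssh user@remote ...`:
# join every argument with a space, then let a shell parse the result.
fake_ssh() {
    sh -c "${*}"
}

# The caller's quoting is gone by the time the shell sees the string...
fake_ssh printf '.%s.' '1 2' '3'; echo
# prints: .1..2..3.

# ...unless a second layer of quoting is embedded in the arguments themselves.
fake_ssh printf '.%s.' "'1 2'" "'3'"; echo
# prints: .1 2..3.
```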

Finally let's try that through i3-msg:

> i3-msg -t command -- exec urxvt -hold -e /usr/bin/printf .%q. '1 2' '; a' 'b ;' 'c ; d'
ERROR: Your command: exec urxvt -hold -e /usr/bin/printf .%q. 1 2 ; a b ; c ; d
ERROR:                                                              ^^^^^^^^^^^
> i3-msg -t command -- exec "urxvt -hold -e /usr/bin/printf .%q. '1 2' '; a' 'b ;' 'c ; d'"
ERROR: Your command: exec urxvt -hold -e /usr/bin/printf .%q. '1 2' '; a' 'b ;' 'c ; d'
ERROR:                                                                 ^^^^^^^^^^^^^^^^
> i3-msg -t command -- "exec \"urxvt -hold -e /usr/bin/printf .%q. '1 2' '; a' 'b ;' 'c ; d'\""
# the expected outcome
> i3-msg -t command -- "exec \"exec -- urxvt -hold -e /usr/bin/printf .%q. '1 2' '; a' 'b ;' 'c ; d'\""
# the expected outcome

Lessons learned

First of all we should stop using any tool or library that uses system(3), sh -c, or equivalent.

If we can't replace those tools or libraries, then we should be careful and:

Second, we should create a "wall of shame" for those important tools that provide no alternative. Among these, at the top of the list in large-bold-red font I would definitely include OpenSSH, which given its high profile, is perhaps responsible for countless hidden bugs that have their root cause in the way commands are handled.

In the end I acknowledge that the "technical" fault is not with these tools but with their users. But in the end their developers should know better, and try to be as safe as possible...


Wall of shame

Wall of fame

Unlike the "system" library call from C and other languages, the os/exec package intentionally does not invoke the system shell and does not expand any glob patterns or handle other expansions, pipelines, or redirections typically done by shells. The package behaves more like C's "exec" family of functions.

args is required for all calls and should be a string, or a sequence of program arguments. Providing a sequence of arguments is generally preferred, as it allows the module to take care of any required escaping and quoting of arguments (e.g. to permit spaces in file names). If passing a single string, either shell must be True (see below) or else the string must simply name the program to be executed without specifying any arguments.


The 1000 bonus points bug

Prologue for the 1000 bonus points bug

So I don't know how to say this... But I think I'm going mad... Although I'm a careful bash "programmer" and I've used bash -c 'some-script' countless times, I would have never guessed the following situation...

Let's look more closely once more at the Linux system(3) man page:

execl("/bin/sh", "sh", "-c", command, (char *) NULL);

Then let's look more closely once more at the bash(1) man page:

-c -- If the -c option is present, then commands are read from the first non-option argument command_string. [...]

Now let's suppose one has a tool named -x.

Why is it called -x? Because it is not a forbidden file name by the file-system, and for that matter by any existing standards. Moreover attackers are not "nice people" and if they can, most likely they will create such an executable...

Now let's suppose we want to call that tool from Python without any arguments:

import os
os.system("-x")

Also let's suppose we want to call that tool with two arguments:

import os
os.system("-x a b")

Surprise for the 1000 bonus points bug

Care to guess the outcome of those previous two snippets?

No, let's first strace that:

> strace -e execve -f -- python2 -c 'import os; os.system("-x")'
[...]
execve("/bin/sh", ["sh", "-c", "-x"], ...) = 0
[...]

> strace -e execve -f -- python2 -c 'import os; os.system("-x a b")'
[...]
execve("/bin/sh", ["sh", "-c", "-x a b"], ...) = 0
[...]

Now can you guess the outcome?

No, let's just try it out (I've put both the Python and the plain sh -c invocations):

> python2 -c 'import os; os.system("-x")'
> sh -c -x
sh: -c: option requires an argument
> python2 -c 'import os; os.system("-x a b")'
> sh -c '-x a b'
sh: - : invalid option
[...]

(!!! Lots of censored swear words... !!!)

Reasons for the 1000 bonus points bug

What (more censored swear words) happened?

Well, -c takes as its argument not the immediately following argument, but instead the first non-option argument. In this case there is no non-option argument, because our -x or -x a b looks like an option argument...

What would be the proper solution?

Well the system(3) implementation should actually be:

execl("/bin/sh", "sh", "-c", "--", command, (char *) NULL);
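The effect of that extra -- can be tried out from the command line. The sketch below is only an illustration under a few assumptions: it uses bash explicitly (other sh implementations may behave differently), it creates a throwaway executable literally named -x in a temporary directory, and run_with_dashdash is a made-up helper name.

```shell
# Illustration only: a throwaway directory holding an executable literally named `-x`.
demo_dir="$(mktemp -d)"
printf '#!/bin/sh\necho "ran with: $*"\n' > "${demo_dir}/-x"
chmod +x "${demo_dir}/-x"

# Hypothetical helper: roughly what the shell would receive
# if `--` were passed between `-c` and the command string.
run_with_dashdash() {
    PATH="${demo_dir}:${PATH}" bash -c -- "${1}"
}

run_with_dashdash '-x a b'
# prints: ran with: a b

# Remove the throwaway directory once done experimenting:
#   rm -rf "${demo_dir}"
```

Without the --, the very same string is taken as a bundle of options, as in the transcript above.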

Epilogue for the 1000 bonus points bug

What have I done about the problem?

As a good open-source citizen, I've opened the following bug reports (one led to another):

So in the end, let me quote one of the glibc developers (the emphasis is mine):

Unfortunately, this bug is required by POSIX, which requires passing the string as an argument to the -c option of the shell.

I'm tired... I'm going to sleep... I need a saner job...