387 lines
12 KiB
HTML
387 lines
12 KiB
HTML
<HTML
|
||
><HEAD
|
||
><TITLE
|
||
>Handle Metacharacters</TITLE
|
||
><META
|
||
NAME="GENERATOR"
|
||
CONTENT="Modular DocBook HTML Stylesheet Version 1.7"><LINK
|
||
REL="HOME"
|
||
TITLE="Secure Programming for Linux and Unix HOWTO"
|
||
HREF="index.html"><LINK
|
||
REL="UP"
|
||
TITLE="Carefully Call Out to Other Resources"
|
||
HREF="call-out.html"><LINK
|
||
REL="PREVIOUS"
|
||
TITLE="Limit Call-outs to Valid Values"
|
||
HREF="limit-call-outs.html"><LINK
|
||
REL="NEXT"
|
||
TITLE="Call Only Interfaces Intended for Programmers"
|
||
HREF="call-intentional-apis.html"></HEAD
|
||
><BODY
|
||
CLASS="SECT1"
|
||
BGCOLOR="#FFFFFF"
|
||
TEXT="#000000"
|
||
LINK="#0000FF"
|
||
VLINK="#840084"
|
||
ALINK="#0000FF"
|
||
><DIV
|
||
CLASS="NAVHEADER"
|
||
><TABLE
|
||
SUMMARY="Header navigation table"
|
||
WIDTH="100%"
|
||
BORDER="0"
|
||
CELLPADDING="0"
|
||
CELLSPACING="0"
|
||
><TR
|
||
><TH
|
||
COLSPAN="3"
|
||
ALIGN="center"
|
||
>Secure Programming for Linux and Unix HOWTO</TH
|
||
></TR
|
||
><TR
|
||
><TD
|
||
WIDTH="10%"
|
||
ALIGN="left"
|
||
VALIGN="bottom"
|
||
><A
|
||
HREF="limit-call-outs.html"
|
||
ACCESSKEY="P"
|
||
>Prev</A
|
||
></TD
|
||
><TD
|
||
WIDTH="80%"
|
||
ALIGN="center"
|
||
VALIGN="bottom"
|
||
>Chapter 8. Carefully Call Out to Other Resources</TD
|
||
><TD
|
||
WIDTH="10%"
|
||
ALIGN="right"
|
||
VALIGN="bottom"
|
||
><A
|
||
HREF="call-intentional-apis.html"
|
||
ACCESSKEY="N"
|
||
>Next</A
|
||
></TD
|
||
></TR
|
||
></TABLE
|
||
><HR
|
||
ALIGN="LEFT"
|
||
WIDTH="100%"></DIV
|
||
><DIV
|
||
CLASS="SECT1"
|
||
><H1
|
||
CLASS="SECT1"
|
||
><A
|
||
NAME="HANDLE-METACHARACTERS"
|
||
></A
|
||
>8.3. Handle Metacharacters</H1
|
||
><P
|
||
>Many systems, such as the command line shell and SQL interpreters,
|
||
have ``metacharacters'', that is, characters in their input
|
||
that are not interpreted as data.
|
||
Such characters might commands, or delimit data from commands or other data.
|
||
If there's a language specification for that system's interface
|
||
that you're using, then it certainly has metacharacters.
|
||
If your program invokes those other systems and allows attackers to
|
||
insert such metacharacters, the usual result is that an attacker can
|
||
completely control your program.</P
|
||
><P
|
||
>One of the most pervasive metacharacter problems are those involving
|
||
shell metacharacters.
|
||
The standard Unix-like command shell (stored in /bin/sh)
|
||
interprets a number of characters specially.
|
||
If these characters are sent to the shell, then their special interpretation
|
||
will be used unless escaped; this fact can be used to break programs.
|
||
According to the WWW Security FAQ [Stein 1999, Q37], these metacharacters are:
|
||
|
||
<TABLE
|
||
BORDER="1"
|
||
BGCOLOR="#E0E0E0"
|
||
WIDTH="100%"
|
||
><TR
|
||
><TD
|
||
><FONT
|
||
COLOR="#000000"
|
||
><PRE
|
||
CLASS="SCREEN"
|
||
>& ; ` ' \ " | * ? ~ < > ^ ( ) [ ] { } $ \n \r</PRE
|
||
></FONT
|
||
></TD
|
||
></TR
|
||
></TABLE
|
||
></P
|
||
><P
|
||
>I should note that in many situations you'll also want to escape
|
||
the tab and space characters, since they (and the newline) are the default
|
||
parameter separators.
|
||
The separator values can be changed by setting the IFS environment
|
||
variable, but if you can't trust the source of this variable you should
|
||
have thrown it out or reset it anyway as part of your environment
|
||
variable processing.</P
|
||
><P
|
||
>Unfortunately, in real life this isn't a complete list.
|
||
Here are some other characters that can be problematic:
|
||
<P
|
||
></P
|
||
><UL
|
||
><LI
|
||
><P
|
||
>'!' means ``not'' in an expression (as it does in C);
|
||
if the return value of a program is tested, prepending !
|
||
could fool a script into thinking something had failed when it
|
||
succeeded or vice versa.
|
||
In some shells, the "!" also accesses the command history, which can
|
||
cause real problems.
|
||
In bash, this only occurs for interactive mode, but tcsh
|
||
(a csh clone found in some Linux distributions) uses "!" even in scripts.</P
|
||
></LI
|
||
><LI
|
||
><P
|
||
>'#' is the comment character; all further text on the line is ignored.</P
|
||
></LI
|
||
><LI
|
||
><P
|
||
>'-' can be misinterpreted as leading an option (or, as -<2D>-, disabling
|
||
all further options). Even if it's in the ``middle'' of a filename,
|
||
if it's preceded by what the shell considers as whitespace you may
|
||
have a problem.</P
|
||
></LI
|
||
><LI
|
||
><P
|
||
>' ' (space), '\t' (tab), '\n' (newline), '\r' (return),
|
||
'\v' (vertical space), '\f' (form feed),
|
||
and other whitespace characters can have many dangerous effects.
|
||
They can may turn a ``single'' filename into multiple arguments, for example,
|
||
or turn a single parameter into multiple parameter when stored.
|
||
Newline and return have a number of additional dangers, for example,
|
||
they can be used to create ``spoofed'' log entries in some programs,
|
||
or inserted just before a separate command that is then executed
|
||
(if an underlying protocol uses newlines or returns as command
|
||
separators).</P
|
||
></LI
|
||
><LI
|
||
><P
|
||
>Other control characters (in particular, NIL) may cause problems for
|
||
some shell implementations.</P
|
||
></LI
|
||
><LI
|
||
><P
|
||
>Depending on your usage, it's even conceivable that ``.''
|
||
(the ``run in current shell'') and ``='' (for setting variables) might
|
||
be worrisome characters.
|
||
However, any example I've found so far where these
|
||
are issues have other (much worse) security problems.</P
|
||
></LI
|
||
></UL
|
||
> </P
|
||
><P
|
||
>What makes the shell metacharacters particularly pervasive is
|
||
that several important library calls, such as popen(3) and system(3),
|
||
are implemented by calling the command shell, meaning that they will
|
||
be affected by shell metacharacters too.
|
||
Similarly, execlp(3) and execvp(3) may cause the shell to be called.
|
||
Many guidelines suggest avoiding popen(3), system(3), execlp(3), and execvp(3)
|
||
entirely and use execve(3) directly in C when trying to spawn
|
||
a process [Galvin 1998b].
|
||
At the least, avoid using system(3) when you can use the execve(3);
|
||
since system(3) uses the shell to expand characters, there is more
|
||
opportunity for mischief in system(3).
|
||
In a similar manner the Perl and shell backtick (`) also call a command shell;
|
||
for more information on Perl see <A
|
||
HREF="perl.html"
|
||
>Section 10.2</A
|
||
>.</P
|
||
><P
|
||
>Since SQL also has metacharacters, a similar issue revolves around
|
||
calls to SQL.
|
||
When metacharacters are provided as input to trigger SQL metacharacters,
|
||
it's often called "SQL injection".
|
||
See
|
||
<A
|
||
HREF="http://www.spidynamics.com/papers/SQLInjectionWhitePaper.pdf"
|
||
TARGET="_top"
|
||
>SPI Dynamic's paper ``SQL Injection: Are your Web Applications Vulnerable?''</A
|
||
>
|
||
for further discussion on this.
|
||
As discussed in <A
|
||
HREF="input.html"
|
||
>Chapter 5</A
|
||
>,
|
||
define a very limited pattern and only allow data matching that
|
||
pattern to enter; if you limit your pattern to ^[0-9]$ or
|
||
^[0-9A-Za-z]*$ then you won't have a problem.
|
||
If you must handle data that may include SQL metacharacters, a good approach
|
||
is to convert it (as early as possible) to some other encoding before
|
||
storage, e.g.,
|
||
HTML encoding (in which case you'll need to encode any ampersand characters
|
||
too).
|
||
Also, prepend and append a quote to all user input, even
|
||
if the data is numeric; that way, insertions of white space and other
|
||
kinds of data won't be as dangerous.</P
|
||
><P
|
||
>Forgetting one of these characters can be disastrous, for example,
|
||
many programs omit backslash as a shell metacharacter [rfp 1999].
|
||
As discussed in the <A
|
||
HREF="input.html"
|
||
>Chapter 5</A
|
||
>, a recommended approach
|
||
by some
|
||
is to immediately escape at least all of these characters when they are input.
|
||
But again, by far and away the best approach is to identify which
|
||
characters you wish to permit, and use a filter to only permit
|
||
those characters.</P
|
||
><P
|
||
>A number of programs, especially those designed for human interaction,
|
||
have ``escape'' codes that perform ``extra'' activities.
|
||
One of the more common (and dangerous) escape codes is one that brings
|
||
up a command line.
|
||
Make sure that these ``escape'' commands can't be included
|
||
(unless you're sure that the specific command is safe).
|
||
For example, many line-oriented mail programs (such as mail or mailx) use
|
||
tilde (~) as an escape character, which can then be used to send a number
|
||
of commands.
|
||
As a result, apparently-innocent commands such as
|
||
``mail admin < file-from-user'' can be used to execute arbitrary programs.
|
||
Interactive programs such as vi, emacs, and ed have ``escape'' mechanisms
|
||
that allow users to run arbitrary shell commands from their session.
|
||
Always examine the documentation of programs you call to search for
|
||
escape mechanisms.
|
||
It's best if you call only programs intended for use by other programs; see
|
||
<A
|
||
HREF="call-intentional-apis.html"
|
||
>Section 8.4</A
|
||
>.</P
|
||
><P
|
||
>The issue of avoiding
|
||
escape codes even goes down to low-level hardware components
|
||
and emulators of them.
|
||
Most modems implement the so-called ``Hayes'' command set.
|
||
Unless the command set is disabled, inducing
|
||
a delay, the phrase ``+++'', and then another delay forces the modem
|
||
to interpret any following text as commands to the modem instead.
|
||
This can be used to implement denial-of-service attacks (by
|
||
sending ``ATH0'', a hang-up command) or even forcing
|
||
a user to connect to someone else (a sophisticated attacker could
|
||
re-route a user's connection through a machine under the attacker's control).
|
||
For the specific case of modems, this is easy to counter
|
||
(e.g., add "ATS2-255" in the modem initialization string), but the
|
||
general issue still holds: if you're controlling a lower-level component,
|
||
or an emulation of one, make sure that you disable or otherwise handle
|
||
any escape codes built into them.</P
|
||
><P
|
||
>Many ``terminal'' interfaces implement the escape
|
||
codes of ancient, long-gone physical terminals like the VT100.
|
||
These codes can be useful, for example, for bolding characters,
|
||
changing font color, or moving to a particular location
|
||
in a terminal interface.
|
||
However, do not allow arbitrary untrusted data to be sent directly
|
||
to a terminal screen, because some of those codes can cause serious problems.
|
||
On some systems you can remap keys (e.g., so when a user presses
|
||
"Enter" or a function key it sends the command you want them to run).
|
||
On some you can even send codes to
|
||
clear the screen, display a set of commands you'd like the victim to run,
|
||
and then send that set ``back'', forcing the victim to run
|
||
the commands of the attacker's choosing without even waiting for a keystroke.
|
||
This is typically implemented using ``page-mode buffering''.
|
||
This security problem is why emulated tty's (represented as device files,
|
||
usually in /dev/) should only be writeable by
|
||
their owners and never anyone else - they should never have
|
||
``other write'' permission set, and unless only the user is a member of
|
||
the group (i.e., the ``user-private group'' scheme), the ``group write''
|
||
permission should not be set either for the terminal [Filipski 1986].
|
||
If you're displaying data to the user at a (simulated) terminal, you probably
|
||
need to filter out all control characters (characters with values less
|
||
than 32) from data sent back to
|
||
the user unless they're identified by you as safe.
|
||
Worse comes to worse, you can identify tab and newline (and maybe
|
||
carriage return) as safe, removing all the rest.
|
||
Characters with their high bits set (i.e., values greater than 127)
|
||
are in some ways trickier to handle; some old systems implement them as
|
||
if they weren't set, but simply filtering them inhibits much international
|
||
use.
|
||
In this case, you need to look at the specifics of your situation.</P
|
||
><P
|
||
>A related problem is that the NIL character (character 0) can have
|
||
surprising effects.
|
||
Most C and C++ functions assume
|
||
that this character marks the end of a string, but string-handling routines
|
||
in other languages (such as Perl and Ada95) can handle strings containing NIL.
|
||
Since many libraries and kernel calls use the C convention, the result
|
||
is that what is checked is not what is actually used [rfp 1999].</P
|
||
><P
|
||
>When calling another program or referring to a file
|
||
always specify its full path (e.g, <TT
|
||
CLASS="FILENAME"
|
||
>/usr/bin/sort</TT
|
||
>).
|
||
For program calls,
|
||
this will eliminate possible errors in calling the ``wrong'' command,
|
||
even if the PATH value is incorrectly set.
|
||
For other file referents, this reduces problems from ``bad'' starting
|
||
directories.</P
|
||
></DIV
|
||
><DIV
|
||
CLASS="NAVFOOTER"
|
||
><HR
|
||
ALIGN="LEFT"
|
||
WIDTH="100%"><TABLE
|
||
SUMMARY="Footer navigation table"
|
||
WIDTH="100%"
|
||
BORDER="0"
|
||
CELLPADDING="0"
|
||
CELLSPACING="0"
|
||
><TR
|
||
><TD
|
||
WIDTH="33%"
|
||
ALIGN="left"
|
||
VALIGN="top"
|
||
><A
|
||
HREF="limit-call-outs.html"
|
||
ACCESSKEY="P"
|
||
>Prev</A
|
||
></TD
|
||
><TD
|
||
WIDTH="34%"
|
||
ALIGN="center"
|
||
VALIGN="top"
|
||
><A
|
||
HREF="index.html"
|
||
ACCESSKEY="H"
|
||
>Home</A
|
||
></TD
|
||
><TD
|
||
WIDTH="33%"
|
||
ALIGN="right"
|
||
VALIGN="top"
|
||
><A
|
||
HREF="call-intentional-apis.html"
|
||
ACCESSKEY="N"
|
||
>Next</A
|
||
></TD
|
||
></TR
|
||
><TR
|
||
><TD
|
||
WIDTH="33%"
|
||
ALIGN="left"
|
||
VALIGN="top"
|
||
>Limit Call-outs to Valid Values</TD
|
||
><TD
|
||
WIDTH="34%"
|
||
ALIGN="center"
|
||
VALIGN="top"
|
||
><A
|
||
HREF="call-out.html"
|
||
ACCESSKEY="U"
|
||
>Up</A
|
||
></TD
|
||
><TD
|
||
WIDTH="33%"
|
||
ALIGN="right"
|
||
VALIGN="top"
|
||
>Call Only Interfaces Intended for Programmers</TD
|
||
></TR
|
||
></TABLE
|
||
></DIV
|
||
></BODY
|
||
></HTML
|
||
> |