old-www/HOWTO/Secure-Programs-HOWTO/handle-metacharacters.html

387 lines
12 KiB
HTML
Raw Permalink Blame History

<HTML
><HEAD
><TITLE
>Handle Metacharacters</TITLE
><META
NAME="GENERATOR"
CONTENT="Modular DocBook HTML Stylesheet Version 1.7"><LINK
REL="HOME"
TITLE="Secure Programming for Linux and Unix HOWTO"
HREF="index.html"><LINK
REL="UP"
TITLE="Carefully Call Out to Other Resources"
HREF="call-out.html"><LINK
REL="PREVIOUS"
TITLE="Limit Call-outs to Valid Values"
HREF="limit-call-outs.html"><LINK
REL="NEXT"
TITLE="Call Only Interfaces Intended for Programmers"
HREF="call-intentional-apis.html"></HEAD
><BODY
CLASS="SECT1"
BGCOLOR="#FFFFFF"
TEXT="#000000"
LINK="#0000FF"
VLINK="#840084"
ALINK="#0000FF"
><DIV
CLASS="NAVHEADER"
><TABLE
SUMMARY="Header navigation table"
WIDTH="100%"
BORDER="0"
CELLPADDING="0"
CELLSPACING="0"
><TR
><TH
COLSPAN="3"
ALIGN="center"
>Secure Programming for Linux and Unix HOWTO</TH
></TR
><TR
><TD
WIDTH="10%"
ALIGN="left"
VALIGN="bottom"
><A
HREF="limit-call-outs.html"
ACCESSKEY="P"
>Prev</A
></TD
><TD
WIDTH="80%"
ALIGN="center"
VALIGN="bottom"
>Chapter 8. Carefully Call Out to Other Resources</TD
><TD
WIDTH="10%"
ALIGN="right"
VALIGN="bottom"
><A
HREF="call-intentional-apis.html"
ACCESSKEY="N"
>Next</A
></TD
></TR
></TABLE
><HR
ALIGN="LEFT"
WIDTH="100%"></DIV
><DIV
CLASS="SECT1"
><H1
CLASS="SECT1"
><A
NAME="HANDLE-METACHARACTERS"
></A
>8.3. Handle Metacharacters</H1
><P
>Many systems, such as the command line shell and SQL interpreters,
have ``metacharacters'', that is, characters in their input
that are not interpreted as data.
Such characters might commands, or delimit data from commands or other data.
If there's a language specification for that system's interface
that you're using, then it certainly has metacharacters.
If your program invokes those other systems and allows attackers to
insert such metacharacters, the usual result is that an attacker can
completely control your program.</P
><P
>One of the most pervasive metacharacter problems are those involving
shell metacharacters.
The standard Unix-like command shell (stored in /bin/sh)
interprets a number of characters specially.
If these characters are sent to the shell, then their special interpretation
will be used unless escaped; this fact can be used to break programs.
According to the WWW Security FAQ [Stein 1999, Q37], these metacharacters are:
<TABLE
BORDER="1"
BGCOLOR="#E0E0E0"
WIDTH="100%"
><TR
><TD
><FONT
COLOR="#000000"
><PRE
CLASS="SCREEN"
>&#38; ; ` ' \ " | * ? ~ &#60; &#62; ^ ( ) [ ] { } $ \n \r</PRE
></FONT
></TD
></TR
></TABLE
></P
><P
>I should note that in many situations you'll also want to escape
the tab and space characters, since they (and the newline) are the default
parameter separators.
The separator values can be changed by setting the IFS environment
variable, but if you can't trust the source of this variable you should
have thrown it out or reset it anyway as part of your environment
variable processing.</P
><P
>Unfortunately, in real life this isn't a complete list.
Here are some other characters that can be problematic:
<P
></P
><UL
><LI
><P
>'!' means ``not'' in an expression (as it does in C);
if the return value of a program is tested, prepending !
could fool a script into thinking something had failed when it
succeeded or vice versa.
In some shells, the "!" also accesses the command history, which can
cause real problems.
In bash, this only occurs for interactive mode, but tcsh
(a csh clone found in some Linux distributions) uses "!" even in scripts.</P
></LI
><LI
><P
>'#' is the comment character; all further text on the line is ignored.</P
></LI
><LI
><P
>'-' can be misinterpreted as leading an option (or, as -<2D>-, disabling
all further options). Even if it's in the ``middle'' of a filename,
if it's preceded by what the shell considers as whitespace you may
have a problem.</P
></LI
><LI
><P
>' ' (space), '\t' (tab), '\n' (newline), '\r' (return),
'\v' (vertical space), '\f' (form feed),
and other whitespace characters can have many dangerous effects.
They can may turn a ``single'' filename into multiple arguments, for example,
or turn a single parameter into multiple parameter when stored.
Newline and return have a number of additional dangers, for example,
they can be used to create ``spoofed'' log entries in some programs,
or inserted just before a separate command that is then executed
(if an underlying protocol uses newlines or returns as command
separators).</P
></LI
><LI
><P
>Other control characters (in particular, NIL) may cause problems for
some shell implementations.</P
></LI
><LI
><P
>Depending on your usage, it's even conceivable that ``.''
(the ``run in current shell'') and ``='' (for setting variables) might
be worrisome characters.
However, any example I've found so far where these
are issues have other (much worse) security problems.</P
></LI
></UL
>&#13;</P
><P
>What makes the shell metacharacters particularly pervasive is
that several important library calls, such as popen(3) and system(3),
are implemented by calling the command shell, meaning that they will
be affected by shell metacharacters too.
Similarly, execlp(3) and execvp(3) may cause the shell to be called.
Many guidelines suggest avoiding popen(3), system(3), execlp(3), and execvp(3)
entirely and use execve(3) directly in C when trying to spawn
a process [Galvin 1998b].
At the least, avoid using system(3) when you can use the execve(3);
since system(3) uses the shell to expand characters, there is more
opportunity for mischief in system(3).
In a similar manner the Perl and shell backtick (`) also call a command shell;
for more information on Perl see <A
HREF="perl.html"
>Section 10.2</A
>.</P
><P
>Since SQL also has metacharacters, a similar issue revolves around
calls to SQL.
When metacharacters are provided as input to trigger SQL metacharacters,
it's often called "SQL injection".
See
<A
HREF="http://www.spidynamics.com/papers/SQLInjectionWhitePaper.pdf"
TARGET="_top"
>SPI Dynamic's paper ``SQL Injection: Are your Web Applications Vulnerable?''</A
>
for further discussion on this.
As discussed in <A
HREF="input.html"
>Chapter 5</A
>,
define a very limited pattern and only allow data matching that
pattern to enter; if you limit your pattern to ^[0-9]$ or
^[0-9A-Za-z]*$ then you won't have a problem.
If you must handle data that may include SQL metacharacters, a good approach
is to convert it (as early as possible) to some other encoding before
storage, e.g.,
HTML encoding (in which case you'll need to encode any ampersand characters
too).
Also, prepend and append a quote to all user input, even
if the data is numeric; that way, insertions of white space and other
kinds of data won't be as dangerous.</P
><P
>Forgetting one of these characters can be disastrous, for example,
many programs omit backslash as a shell metacharacter [rfp 1999].
As discussed in the <A
HREF="input.html"
>Chapter 5</A
>, a recommended approach
by some
is to immediately escape at least all of these characters when they are input.
But again, by far and away the best approach is to identify which
characters you wish to permit, and use a filter to only permit
those characters.</P
><P
>A number of programs, especially those designed for human interaction,
have ``escape'' codes that perform ``extra'' activities.
One of the more common (and dangerous) escape codes is one that brings
up a command line.
Make sure that these ``escape'' commands can't be included
(unless you're sure that the specific command is safe).
For example, many line-oriented mail programs (such as mail or mailx) use
tilde (~) as an escape character, which can then be used to send a number
of commands.
As a result, apparently-innocent commands such as
``mail admin &#60; file-from-user'' can be used to execute arbitrary programs.
Interactive programs such as vi, emacs, and ed have ``escape'' mechanisms
that allow users to run arbitrary shell commands from their session.
Always examine the documentation of programs you call to search for
escape mechanisms.
It's best if you call only programs intended for use by other programs; see
<A
HREF="call-intentional-apis.html"
>Section 8.4</A
>.</P
><P
>The issue of avoiding
escape codes even goes down to low-level hardware components
and emulators of them.
Most modems implement the so-called ``Hayes'' command set.
Unless the command set is disabled, inducing
a delay, the phrase ``+++'', and then another delay forces the modem
to interpret any following text as commands to the modem instead.
This can be used to implement denial-of-service attacks (by
sending ``ATH0'', a hang-up command) or even forcing
a user to connect to someone else (a sophisticated attacker could
re-route a user's connection through a machine under the attacker's control).
For the specific case of modems, this is easy to counter
(e.g., add "ATS2-255" in the modem initialization string), but the
general issue still holds: if you're controlling a lower-level component,
or an emulation of one, make sure that you disable or otherwise handle
any escape codes built into them.</P
><P
>Many ``terminal'' interfaces implement the escape
codes of ancient, long-gone physical terminals like the VT100.
These codes can be useful, for example, for bolding characters,
changing font color, or moving to a particular location
in a terminal interface.
However, do not allow arbitrary untrusted data to be sent directly
to a terminal screen, because some of those codes can cause serious problems.
On some systems you can remap keys (e.g., so when a user presses
"Enter" or a function key it sends the command you want them to run).
On some you can even send codes to
clear the screen, display a set of commands you'd like the victim to run,
and then send that set ``back'', forcing the victim to run
the commands of the attacker's choosing without even waiting for a keystroke.
This is typically implemented using ``page-mode buffering''.
This security problem is why emulated tty's (represented as device files,
usually in /dev/) should only be writeable by
their owners and never anyone else - they should never have
``other write'' permission set, and unless only the user is a member of
the group (i.e., the ``user-private group'' scheme), the ``group write''
permission should not be set either for the terminal [Filipski 1986].
If you're displaying data to the user at a (simulated) terminal, you probably
need to filter out all control characters (characters with values less
than 32) from data sent back to
the user unless they're identified by you as safe.
Worse comes to worse, you can identify tab and newline (and maybe
carriage return) as safe, removing all the rest.
Characters with their high bits set (i.e., values greater than 127)
are in some ways trickier to handle; some old systems implement them as
if they weren't set, but simply filtering them inhibits much international
use.
In this case, you need to look at the specifics of your situation.</P
><P
>A related problem is that the NIL character (character 0) can have
surprising effects.
Most C and C++ functions assume
that this character marks the end of a string, but string-handling routines
in other languages (such as Perl and Ada95) can handle strings containing NIL.
Since many libraries and kernel calls use the C convention, the result
is that what is checked is not what is actually used [rfp 1999].</P
><P
>When calling another program or referring to a file
always specify its full path (e.g, <TT
CLASS="FILENAME"
>/usr/bin/sort</TT
>).
For program calls,
this will eliminate possible errors in calling the ``wrong'' command,
even if the PATH value is incorrectly set.
For other file referents, this reduces problems from ``bad'' starting
directories.</P
></DIV
><DIV
CLASS="NAVFOOTER"
><HR
ALIGN="LEFT"
WIDTH="100%"><TABLE
SUMMARY="Footer navigation table"
WIDTH="100%"
BORDER="0"
CELLPADDING="0"
CELLSPACING="0"
><TR
><TD
WIDTH="33%"
ALIGN="left"
VALIGN="top"
><A
HREF="limit-call-outs.html"
ACCESSKEY="P"
>Prev</A
></TD
><TD
WIDTH="34%"
ALIGN="center"
VALIGN="top"
><A
HREF="index.html"
ACCESSKEY="H"
>Home</A
></TD
><TD
WIDTH="33%"
ALIGN="right"
VALIGN="top"
><A
HREF="call-intentional-apis.html"
ACCESSKEY="N"
>Next</A
></TD
></TR
><TR
><TD
WIDTH="33%"
ALIGN="left"
VALIGN="top"
>Limit Call-outs to Valid Values</TD
><TD
WIDTH="34%"
ALIGN="center"
VALIGN="top"
><A
HREF="call-out.html"
ACCESSKEY="U"
>Up</A
></TD
><TD
WIDTH="33%"
ALIGN="right"
VALIGN="top"
>Call Only Interfaces Intended for Programmers</TD
></TR
></TABLE
></DIV
></BODY
></HTML
>