From 6beaab6302430a82664b609f72b5ebac22a2a231 Mon Sep 17 00:00:00 2001
From: gferg <>
Date: Tue, 22 May 2001 20:32:50 +0000
Subject: [PATCH] updated

---
 .../Unix-and-Internet-Fundamentals-HOWTO.sgml | 749 ++++++++++--------
 1 file changed, 429 insertions(+), 320 deletions(-)

diff --git a/LDP/howto/docbook/Unix-and-Internet-Fundamentals-HOWTO.sgml b/LDP/howto/docbook/Unix-and-Internet-Fundamentals-HOWTO.sgml
index f84d8151..ae28200b 100644
--- a/LDP/howto/docbook/Unix-and-Internet-Fundamentals-HOWTO.sgml
+++ b/LDP/howto/docbook/Unix-and-Internet-Fundamentals-HOWTO.sgml
@@ -1,13 +1,10 @@
-
-
-
+

The Unix and Internet Fundamentals HOWTO

@@ -21,6 +18,36 @@

+
+ 2.3
+ 22 May 2001
+ esr
+
+ Introduction to bus types.
+ Polish translation link.
+
+
+
+ 2.2
+ 5 February 2001
+ esr
+
+ New section on how DNS is organized. Corrected for new
+ location of document. Various copy-edit fixes.
+
+
+
+ 2.1
+ 29 November 2000
+ esr
+
+ Correct explanation of twos-complement numbers. Various
+ copy-edit fixes.
+
+

 2.0
 5 August 2000

@@ -77,33 +104,20 @@

-
-
-
-
- template
-
 This document describes the working basics of PC-class computers,
Unix-like operating systems, and the Internet in non-technical
language.
-
+

-Introduction
+Introduction

-Purpose of this document
+Purpose of this document

This document is intended to help Linux and Internet users who are
learning by doing. While this is a great way to acquire specific skills,
it can leave strange gaps in one's knowledge -- gaps which can make it
hard to think creatively or debug effectively,
from lack of a good mental model of what is really going on.

I'll try to describe in clear, simple language how it all works. The
presentation will be tuned for people using Unix or Linux on PC-class
-hardware. Nevertheless I'll usually refer simply to `Unix' here, as most
+hardware. Nevertheless, I'll usually refer simply to `Unix' here, as most
of what I will describe is constant across platforms and across Unix
variants.

@@ -131,15 +145,7 @@
response to user feedback, so you should come back and review it
periodically.
-Related resources - -If you're reading this in order to learn how to hack, you should also -read the -How To Become A Hacker FAQ. It has links to some other useful -resources. - - -New versions of this document +New versions of this document New versions of the Unix and Internet Fundamentals HOWTO will be periodically posted to @@ -150,12 +156,15 @@ FTP sites, including the LDP home page. You can view the latest version of this on the World Wide Web via the URL -http://metalab.unc.edu/LDP/HOWTO/Unix-Internet-Fundamentals-HOWTO.html. +url="http://www.linuxdoc.org/HOWTO/Unix-and-Internet-Fundamentals-HOWTO/index.html"> +http://www.linuxdoc.org/HOWTO/Unix-and-Internet-Fundamentals-HOWTO/index.html. +This document has been translated into Polish. + -Feedback and corrections +Feedback and corrections If you have questions or comments about this document, please feel free to mail Eric S. Raymond, at @@ -164,40 +173,56 @@ especially welcome hyperlinks to more detailed explanations of individual concepts. If you find a mistake with this document, please let me know so I can correct it in the next version. Thanks. + +Related resources + +If you're reading this in order to learn how to hack, you should also +read the +How To Become A Hacker FAQ. It has links to some other useful +resources. + -Basic anatomy of your computer +Basic anatomy of your computer Your computer has a processor chip inside it that does the actual computing. It has internal memory (what DOS/Windows people call ``RAM'' and Unix people often call ``core''; the Unix term is a folk memory from when RAM consisted of ferrite-core donuts). The processor and memory live on the -motherboardmotherboard +motherboardmotherboard, which is the heart of your computer. Your computer has a screen and keyboard. It has hard drives and -floppy disks. The screen and your disks have controller -cards that plug into the motherboard and help the computer drive -these outboard devices. 
(Your keyboard is too simple to need a separate +floppy disks. The screen and your disks have controller +cards that plug into the motherboard and help the computer drive +these devices. (Your keyboard is too simple to need a separate card; the controller is built into the keyboard chassis itself.) We'll go into some of the details of how these devices work later. For now, here are a few basic things to keep in mind about how they work together: -All the inboard parts of your computer are connected by a -busbus. +All the parts of your computer inside the case are connected by a +busbus. Physically, the bus is what you plug your controller cards into (the video card, the disk controller, a sound card if you have one). The bus is the data highway between your processor, your screen, your disk, and everything else. +(If you've seen references to `ISA', `PCI', and `PCMCIA' in connection +with PCs and have not understood them, these are bus types. ISA is, except +in minor details, the same bus that was used on IBM's original PCs in 1980; +it is passing out of use now. PCI, for Peripheral Component +Interconnection, is the bus used on most modern PCs, and on modern +Macintoshes as well. PCMCIA is a variant of ISA with smaller physical +connectors used on laptop computers.) + The processor, which makes everything else go, can't actually see any of the other pieces directly; it has to talk to them over the bus. The only -other subsystem it has really fast, immediate access to is memory (the -core). In order for programs to run, then, they have to be in -core (in memory). +other subsystem that it has really fast, immediate access to is memory (the +core). In order for programs to run, then, they have to be in +core (in memory). When your computer reads a program or data off the disk, what actually happens is that the processor uses the bus to send a disk read request @@ -211,35 +236,35 @@ bus, but in simpler ways. We'll discuss those later on. 
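The disk-read handshake just described (processor sends a request over the bus, the controller fetches the block, the data lands in memory) can be sketched as a toy simulation. This is purely illustrative: the device name, block number, and block contents are invented, and real hardware does this with electrical signals rather than method calls.

```python
# Toy model of the disk-read flow described above.  The processor can't
# see the disk directly; it sends a request over the bus, the disk
# controller fetches the block, and the data is placed in memory where
# the processor can get at it.  All names and contents here are invented.

class DiskController:
    """Pretends to be a controller card listening on the bus."""
    def __init__(self, blocks):
        self.blocks = blocks            # block number -> block of data

    def handle(self, request):
        return self.blocks[request["block"]]

class Bus:
    """The data highway connecting the processor to its controller cards."""
    def __init__(self):
        self.cards = {}

    def plug_in(self, name, card):
        self.cards[name] = card

    def send(self, name, request):
        return self.cards[name].handle(request)

memory = {}                             # core: the only storage the CPU sees directly
bus = Bus()
bus.plug_in("disk0", DiskController({7: b"hello from block 7"}))

# Processor side: request block 7 over the bus, copy the result into core.
memory[0x1000] = bus.send("disk0", {"block": 7})
print(memory[0x1000])
```

The point of the sketch is the indirection: the "processor" never touches the "disk" object, only the bus.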
For now, you know enough to understand what happens when you turn on your computer. -What happens when you switch on a computer? +What happens when you switch on a computer? A computer without a program running is just an inert hunk of electronics. The first thing a computer has to do when it is turned on is -start up a special program called an operating -system. The operating system's job is to help other computer +start up a special program called an operating +system. The operating system's job is to help other computer programs to work by handling the messy details of controlling the computer's hardware. The process of bringing up the operating system is called booting (originally this was -bootstrapping and alluded to the difficulty of pulling +id="boot"> booting (originally this was +bootstrapping and alluded to the process of pulling yourself up ``by your bootstraps''). Your computer knows how to boot because instructions for booting are built into one of its chips, the BIOS (or Basic Input/Output System) chip. -The BIOS chip tells it to look in a fixed place on the -lowest-numbered hard disk (the boot disk) for a -special program called a boot loader (under Linux the +The BIOS chip tells it to look in a fixed place, usually on the +lowest-numbered hard disk (the boot disk) for a +special program called a boot loader (under Linux the boot loader is called LILO). The boot loader is pulled into memory and started. The boot loader's job is to start the real operating system. The loader does this by looking for a -kernelkernel, +kernelkernel, loading it into memory, and starting it. When you boot Linux and see "LILO" on the screen followed by a bunch of dots, it is loading the kernel. -(Each dot means it has loaded another disk -block of kernel code.) +(Each dot means it has loaded another disk +block of kernel code.) (You may wonder why the BIOS doesn't load the kernel directly -- why the two-step process with the boot loader? Well, the BIOS isn't very smart. 
@@ -252,25 +277,25 @@ enough for you.) Once the kernel starts, it has to look around, find the rest of the hardware, and get ready to run programs. It does this by poking not at -ordinary memory locations but rather at I/O ports -- +ordinary memory locations but rather at I/O ports -- special bus addresses that are likely to have device controller cards listening at them for commands. The kernel doesn't poke at random; it has a lot of built-in knowledge about what it's likely to find where, and how controllers will respond if they're present. This process is called -autoprobingautoprobing. +autoprobingautoprobing. Most of the messages you see at boot time are the kernel autoprobing your hardware through the I/O ports, figuring out what it has available to it and adapting itself to your machine. The Linux kernel is extremely good -at this, better than most other Unixes and much better +at this, better than most other Unixes and much better than DOS or Windows. In fact, many Linux old-timers think the cleverness of Linux's boot-time probes (which made it relatively easy to install) was a major reason it broke out of the pack of free-Unix experiments to attract a critical mass of users. But getting the kernel fully loaded and running isn't the end of the -boot process; it's just the first stage (sometimes called run -level 1). After this first stage, the kernel hands control to a +boot process; it's just the first stage (sometimes called run +level 1). After this first stage, the kernel hands control to a special process called `init' which spawns several housekeeping processes. @@ -281,7 +306,7 @@ recovery steps before your Unix is all the way up. We'll go into some of this later on when we talk about how file systems can go wrong. -Init's next step is to start several daemons. A +Init's next step is to start several daemons. A daemon is a program like a print spooler, a mail listener or a WWW server that lurks in the background, waiting for things to do. 
These special programs often have to coordinate several requests that could conflict. @@ -296,12 +321,12 @@ spooler (a gatekeeper daemon for your printer). program called getty to watch your console (and maybe more copies to watch dial-in serial ports). This program is what issues the login prompt to your console. Once all daemons and -getty processes for each terminal are started, we're at run level -2. At this level, you can log in and run programs. +getty processes for each terminal are started, we're at run level +2. At this level, you can log in and run programs. But we're not done yet. The next step is to start up various daemons that support networking and other services. Once that's done, we're at -run level 3 and the system is fully ready for +run level 3 and the system is fully ready for use. @@ -313,7 +338,7 @@ identify yourself to the computer. It then runs a program called checks to see if you are authorized to be using the machine. If you aren't, your login attempt will be rejected. If you are, login does a few housekeeping things and then starts up a command interpreter, the -shell. (Yes, getty and +shell. (Yes, getty and login could be one program. They're separate for historical reasons not worth going into here.) @@ -338,7 +363,7 @@ else you choose.) Once you have successfully logged in, you get all the privileges associated with the individual account you are using. You may also be recognized as part of a -groupgroup. +groupgroup. A group is a named collection of users set up by the system administrator. Groups can have privileges independently of their members' privileges. A user can be a member of multiple groups. (For details about how Unix @@ -351,39 +376,44 @@ file maps your account name to a user ID; the file maps group names to numeric group IDs. Commands that deal with accounts and groups do the translation automatically.) 
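The account and group lookups just described are plain text-file parsing. Here is a minimal sketch of the two formats, using made-up sample lines rather than reading a live /etc/passwd or /etc/group; the field layout (colon-separated, with numeric user and group IDs) is the standard one.

```python
# Made-up sample lines in the standard /etc/passwd and /etc/group formats.
passwd_line = "esr:x:1000:1000:Eric S. Raymond:/home/esr:/bin/bash"
group_line = "wheel:x:10:esr,root"

# /etc/passwd maps an account name to numeric IDs, a home directory,
# and a shell.
name, _pw, uid, gid, gecos, home, shell = passwd_line.split(":")
print(name, uid, gid, home, shell)

# /etc/group maps a group name to a numeric group ID and a member list.
gname, _gpw, group_id, members = group_line.split(":")
member_list = members.split(",")
print(gname, group_id, member_list)
```

This is the "translation" the text mentions: commands show you names, but the kernel works with the numeric IDs.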
-Your account entry also contains your home
-directoryhome
+Your account entry also contains your home
+directoryhome
directory, the place in the Unix file
system where your personal files will live. Finally, your account entry
also sets your
-shellshell,
+shellshell,
the command interpreter that login will start up to
accept your commands.

-What happens when you run programs from the shell?
+What happens when you run programs from the shell?

The shell is Unix's interpreter for the commands you type in; it's
called a shell because it wraps around and hides the operating system
-kernel. The normal shell gives you the '$' prompt that you see after logging in
-(unless you've customized it to something else). We won't talk about
+kernel. It's an important feature of Unix that the shell and kernel are
+separate programs communicating through a small set of system calls.
+This makes it possible for there to be multiple shells, suiting different
+tastes in interfaces.
+
+The normal shell gives you the '$' prompt that you see after logging in
+(unless you've customized it to be something else). We won't talk about
shell syntax and the easy things you can see on the screen here; instead
we'll take a look behind the scenes at what's happening from the
computer's point of view.

After boot time and before you run a program, you can think of your
-computer of containing a zoo of processes that are all waiting for
-something to do. They're all waiting on events. An
+computer as containing a zoo of processes that are all waiting for
+something to do. They're all waiting on events. An
event can be you pressing a key or moving a
mouse. Or, if your machine is hooked to a network, an event can be a data
packet coming in over that network.

The kernel is one of these processes.
It's a special one, because it -controls when the other user processes can run, and it +controls when the other user processes can run, and it is normally the only process with direct access to the machine's hardware. In fact, user processes have to make requests to the kernel when they want to get keyboard input, write to your screen, read from or write to disk, or do just about anything other than crunching bits in memory. These requests -are known as system calls. +are known as system calls. Normally all I/O goes through the kernel so it can schedule the operations and prevent processes from stepping on each other. A few @@ -395,20 +425,19 @@ yet; you're looking at a shell prompt on a character console. The shell is just a user process, and not a particularly special one. It waits on your keystrokes, listening (through the kernel) to the keyboard -I/O port. As the kernel sees them, it echos them to your screen then -passes them to the shell. When the kernel sees an `Enter' it passes your -line of text to the shell. The shell tries to interpret those keystrokes as -commands. +I/O port. As the kernel sees them, it echoes them to your screen. When +the kernel sees an `Enter' it passes your line of text to the shell. The +shell tries to interpret those keystrokes as commands. Let's say you type `ls' and Enter to invoke the Unix directory lister. The shell applies its built-in rules to figure out that you want to run the executable command in the file `/bin/ls'. It makes a system call -asking the kernel to start /bin/ls as a new child -process and give it access to the screen and keyboard through the kernel. -Then the shell goes to sleep, waiting for ls to finish. +asking the kernel to start /bin/ls as a new child +process and give it access to the screen and keyboard through +the kernel. Then the shell goes to sleep, waiting for ls to finish. When /bin/ls is done, it tells the kernel it's -finished by issuing an exit system call. 
The kernel +finished by issuing an exit system call. The kernel then wakes up the shell and tells it it can continue running. The shell issues another prompt and waits for another line of input. @@ -420,18 +449,18 @@ machine might be sending or receiving mail while /bin/ls runs. -How do input devices and interrupts work? +How do input devices and interrupts work? Your keyboard is a very simple input device; simple because it generates small amounts of data very slowly (by a computer's standards). When you press or release a key, that event is signalled up the keyboard -cable to raise a hardware -interrupthardware +cable to raise a hardware +interrupthardware interrupt. It's the operating system's job to watch for such interrupts. For -each possible kind of interrupt, there will be an interrupt -handlerinterrupt +each possible kind of interrupt, there will be an interrupt +handlerinterrupt handler, a part of the operating system that stashes away any data associated with them (like your keypress/keyrelease value) until it can be processed. @@ -442,20 +471,20 @@ be available for inspection when the operating system passes control to whichever program is currently supposed to be reading from the keyboard. More complex input devices like disk or network cards work in a similar -way. Above, we referred to a disk controller using the bus to signal that +way. Earlier, I referred to a disk controller using the bus to signal that a disk request has been fulfilled. What actually happens is that the disk raises an interrupt. The disk interrupt handler then copies the retrieved data into memory, for later use by the program that made the request. -Every kind of interrupts has an associated priority -levelpriority level. +Every kind of interrupt has an associated priority +levelpriority level. Lower-priority interrupts (like keyboard events) have to wait on higher-priority interrupts (like clock ticks or disk events). 
Unix is designed to give high priority to the kinds of events that need to be processed rapidly in order to keep the machine's response smooth. -In your OS's boot-time messages, you may see references to -IRQIRQ +In your operating system's boot-time messages, you may see references +to IRQIRQ numbers. You may be aware that one of the common ways to misconfigure hardware is to have two different devices try to use the same IRQ, without understanding exactly why. @@ -469,17 +498,17 @@ lock up the device, and can sometimes confuse the OS badly enough that it will flake out or crash. -How does my computer do several things at once? +How does my computer do several things at once? It doesn't, actually. Computers can only do one task (or -process) at a time. But a computer can change tasks +process) at a time. But a computer can change tasks very rapidly, and fool slow human beings into thinking it's doing several things at once. This is called -timesharingtimesharing. +timesharingtimesharing. One of the kernel's jobs is to manage timesharing. It has a part called the -schedulerscheduler +schedulerscheduler which keeps information inside itself about all the other (non-kernel) processes in your zoo. Every 1/60th of a second, a timer goes off in the kernel, generating a clock interrupt. The scheduler stops whatever process @@ -496,7 +525,7 @@ timeslices. interrupt comes in from an I/O device, the kernel effectively stops the current task, runs the interrupt handler, and then returns to the current task. A storm of high-priority interrupts can squeeze out normal -processing; this misbehavior is called thrashing and +processing; this misbehavior is called thrashing and is fortunately very hard to induce under modern Unixes. In fact, the speed of programs is only very seldom limited by the @@ -514,40 +543,40 @@ reliable multitasking is a large part of what makes Linux superior for networking, communications, and Web service. 
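The timeslicing just described can be sketched as a toy round-robin scheduler. The process names and timeslice counts here are invented, and a real scheduler also juggles priorities and sleeping processes, which this ignores.

```python
from collections import deque

# Toy round-robin scheduler: on each clock interrupt, the scheduler
# suspends the running process and hands the CPU to the next one in line.
# Process names and the number of timeslices each needs are invented.

processes = deque([["shell", 2], ["lpd", 1], ["getty", 3]])
order = []                                  # which process ran in each timeslice

while processes:
    name, remaining = processes.popleft()   # scheduler picks the next process
    order.append(name)                      # ...it runs for one timeslice...
    remaining -= 1
    if remaining:                           # not finished: back of the queue
        processes.append([name, remaining])

print(order)   # ['shell', 'lpd', 'getty', 'shell', 'getty', 'getty']
```

Run fast enough, this interleaving is what fools a human into seeing three programs run "at once".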
-How does my computer keep processes from stepping on each other? +How does my computer keep processes from stepping on each other? The kernel's scheduler takes care of dividing processes in time. Your operating system also has to divide them in space, so that processes can't step on each others' working memory. Even if you assume that all programs are trying to be cooperative, you don't want a bug in one of them to be able to corrupt others. The things your operating system does to -solve this problem are called memory -managementmemory +solve this problem are called memory +managementmemory management. Each process in your zoo needs its own area of memory, as a place to run its code from and keep variables and results in. You can think of this -set as consisting of a read-only code -segmentcode segment -(containing the process's instructions) and a writeable data -segmentdata segment +set as consisting of a read-only code +segmentcode segment +(containing the process's instructions) and a writeable data +segmentdata segment (containing all the process's variable storage). The data segment is truly unique to each process, but if two processes are running the same code Unix automatically arranges for them to share a single code segment as an efficiency measure. -Virtual memory: the simple version +Virtual memory: the simple version Efficiency is important, because memory is expensive. Sometimes you don't have enough to hold the entirety of all the programs the machine is running, especially if you are using a large program like an X server. To get around this, Unix uses a technique called -virtual memoryvirtual +virtual memoryvirtual memory. It doesn't try to hold all the code and data for a process in memory. Instead, it keeps around only a relatively small -working setworking +working setworking set; the rest of the process's state is left in a -special swap spaceswap +special swap spaceswap space area on your hard disk. 
Note that in the past, that "Sometimes" last paragraph ago was

run X and a typical mix of jobs without ever swapping after they're
initially loaded into core.

-Virtual memory: the detailed version
+Virtual memory: the detailed version

Actually, the last section oversimplified things a bit. Yes, programs
see most of your memory as one big flat bank of addresses bigger
@@ -567,25 +596,26 @@
than physical memory, and disk swapping is used to maintain that illusion.
But your hardware actually has no fewer than five different kinds of
memory in it, and the differences between them can matter a good deal when
programs have to be tuned for maximum speed. To really understand what
-goes on in your machine, you should learn how all of them work,
+goes on in your machine, you should learn how all of them work.

The five kinds of memory are these: processor registers, internal (or
on-chip) cache, external (or off-chip) cache, main memory, and disk. And
-the reason there are so many kinds is simple; speed costs money, I listed
-these kinds of memory in decreasing order of access time and cost; register
-memory is the fastest and most expensive and can be random-accessed about a
-billion times a second, while disk is the slowest and cheapest and can do
-about 100 random accesses a second.
+the reason there are so many kinds is simple: speed costs money. I have
+listed these kinds of memory in decreasing order of access time and
+increasing order of cost. Register memory is the fastest and most
+expensive and can be random-accessed about a billion times a second, while
+disk is the slowest and cheapest and can do about 100 random accesses a
+second.

-Here's a full list reflecting early-2000 speeds and prices for a typical
-desktop machine. While speed and capacity will go up and prices will drop,
-you can expect these ratios to remain fairly constant -- and it's those
-ratios that shape the memory hierarchy.
+Here's a full list reflecting early-2000 speeds for a typical desktop +machine. While speed and capacity will go up and prices will drop, you can +expect these ratios to remain fairly constant -- and it's those ratios that +shape the memory hierarchy. Disk -Size: 13000MB Accesses: 100/sec +Size: 13000MB Accesses: 100KB/sec Main memory @@ -611,21 +641,21 @@ volatile. That is, it loses its marbles when the power goes off. Thus, computers have to have hard disks or other kinds of non-volatile storage that retains data when the power goes off. And there's a huge mismatch between the speed of processors and the speed of disks. The middle three -levels of the memory hierarchy (internal -cacheinternal -cache, external -cacheexternal +levels of the memory hierarchy (internal +cacheinternal +cache, external +cacheexternal cache, and main memory) basically exist to bridge that gap. -Linux and other Unixes have a feature called virtual -memoryvirtual memory. +Linux and other Unixes have a feature called virtual +memoryvirtual memory. What this means is that the operating system behaves as though it has much more main memory than it actually does. Your actual physical main memory behaves like a set of windows or caches on a much larger "virtual" memory space, most of which at any given time is actually stored on disk in a -special zone called the swap -areaswap area. Out of +special zone called the swap +areaswap area. Out of sight of user programs, the OS is moving blocks of data (called "pages") between memory and disk to maintain this illusion. The end result is that your virtual memory is much larger but not too much slower than real @@ -636,9 +666,9 @@ the operating system's swapping algorithms match the way your programs use virtual memory. Fortunately, memory reads and writes that are close together in time also tend to cluster in memory space. 
This tendency is called -localitylocality, -or more formally locality of -referencelocality of +localitylocality, +or more formally locality of +referencelocality of reference -- and it's a good thing. If memory references jumped around virtual space at random, you'd typically have to do a disk read and write for each new reference and virtual memory would be @@ -649,7 +679,7 @@ reference. It's been found by experience that the most effective method for a broad class of memory-usage patterns is very simple; it's called LRU or the "least recently used" algorithm. The virtual-memory system grabs disk -blocks into its working setworking +blocks into its working setworking set as it needs them. When it runs out of physical memory for the working set, it dumps the least-recently-used block. All Unixes, and most other virtual-memory operating systems, use minor @@ -659,7 +689,7 @@ variations on LRU. processor speeds. It's explicitly managed by the OS. But there is still a major gap between the speed of physical main memory and the speed at which a processor can access its register memory. The external and internal -caches address this, using a technique similar to virtual memory as we've +caches address this, using a technique similar to virtual memory as I've described it. Just as the physical main memory behaves like a set of windows or @@ -667,7 +697,7 @@ caches on the disk's swap area, the external cache acts as windows on main memory. External cache is faster (250M accesses per sec, rather than 100M) and smaller. The hardware (specifically, your computer's memory controller) does the LRU thing in the external cache on blocks of data -fetched from the main memory. For historical regions, the unit of cache +fetched from the main memory. For historical reasons, the unit of cache swapping is called a "line" rather than a page. But we're not done. 
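The LRU policy described above is simple enough to sketch directly. This toy pager uses an invented reference string and a three-frame "physical memory"; on each fault with no free frame it evicts the least-recently-used page, just as the text describes. (Real kernels use cheaper approximations of LRU rather than exact bookkeeping.)

```python
from collections import OrderedDict

FRAMES = 3                     # pretend physical memory holds 3 pages
resident = OrderedDict()       # page -> data; least recently used first
faults = 0

for page in [1, 2, 3, 1, 4, 1, 2]:        # a reference string with some locality
    if page in resident:
        resident.move_to_end(page)         # hit: now the most recently used
    else:
        faults += 1                        # page fault: fetch from swap space
        if len(resident) >= FRAMES:
            resident.popitem(last=False)   # evict the least recently used page
        resident[page] = "page contents"

print(faults, list(resident))   # 5 faults; pages 4, 1, 2 end up resident
```

Notice that the repeated touches of page 1 keep it resident throughout: that is locality of reference paying off.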
The internal cache gives us the final step-up in @@ -688,10 +718,10 @@ tutorial; by the time you need them, you'll be intimate enough with some compiler to figure out many of them yourself. -The Memory Management Unit +The Memory Management Unit Even when you have enough physical core to avoid swapping, the part -of the operating system called the memory manager +of the operating system called the memory manager still has important work to do. It has to make sure that programs can only alter their own data segments -- that is, prevent erroneous or malicious code in one program from garbaging the data in another. To do this, it @@ -701,8 +731,8 @@ when it exits). This table is used to pass commands to a specialized part of the underlying hardware called an -MMUMMU or -memory management unitmemory +MMUMMU or +memory management unitmemory management unit. Modern processor chips have MMUs built right onto them. The MMU has the special ability to put fences around areas of memory, so an out-of-bound reference will be refused and @@ -712,7 +742,7 @@ cause a special interrupt to be raised. dumped" or something similar, this is exactly what has happened; an attempt by the running program to access memory (core) outside its segment has raised a fatal interrupt. This indicates a bug in the program code; the -core dumpcore +core dumpcore dump it leaves behind is diagnostic information intended to help a programmer track it down. @@ -724,22 +754,22 @@ file permissions which we'll discuss later. -How does my computer store things in memory? +How does my computer store things in memory? You probably know that everything on a computer is stored as strings of bits (binary digits; you can think of them as lots of little on-off switches). Here we'll explain how those bits are used to represent the letters and numbers that your computer is crunching. 
-Before we can go into this, you need to understand about the the -word sizeword +Before we can go into this, you need to understand about the +word sizeword size of your computer. The word size is the computer's preferred size for moving units of information around; technically it's the width of your processor's -registersregisters, +registersregisters, which are the holding areas your processor uses to do arithmetic and logical calculations. When people write about computers having bit sizes -(calling them, say, ``32-bit'' or ``64-bit'') computers, this is what they +(calling them, say, ``32-bit'' or ``64-bit'' computers), this is what they mean. Most computers (including 386, 486, and Pentium PCs) have a word @@ -751,52 +781,55 @@ replace the Pentium series with a 64-bit chip called the `Itanium'. The computer views your memory as a sequence of words numbered from zero up to some large value dependent on your memory size. That value is -limited by your word size, which is why older machines like 286s had to go -through painful contortions to address large amounts of memory. I won't -describe them here; they still give older programmers nightmares. +limited by your word size, which is why programs on older machines like +286s had to go through painful contortions to address large amounts of +memory. I won't describe them here; they still give older programmers +nightmares. -Numbers +Numbers -Numbers are represented as either words or pairs of words, depending -on your processor's word size. One 32-bit machine word is the most common -size. +Integer numbers are represented as either words or pairs of words, +depending on your processor's word size. One 32-bit machine word is the +most common integer representation. Integer arithmetic is close to but not actually mathematical base-two. The low-order bit is 1, next 2, then 4 and so forth as in pure binary. But signed numbers are represented in -twos-complementtwos-complement -notation. 
The highest-order bit is a sign -bitsign bit which +twos-complementtwos-complement +notation. The highest-order bit is a sign +bitsign bit which makes the quantity negative, and every negative number can be obtained from -the corresponding positive value by inverting all the bits. This is why -integers on a 32-bit machine have the range -2^31 + 1 to 2^31 - 1 (where ^ -is the `power' operation, 2^3 = 8). That 32nd bit is being used for -sign. +the corresponding positive value by inverting all the bits and adding one. +This is why integers on a 32-bit machine have the range -2^31 to 2^31 - 1 +1 (where ^ is the `power' operation, 2^3 = 8). That 32nd bit is being used +for sign. -Some computer languages give you access to unsigned -arithmeticunsigned +Some computer languages give you access to unsigned +arithmeticunsigned arithmetic which is straight base 2 with zero and positive numbers only. -Most processors and some languages can do in -floating-pointfloating-point +Most processors and some languages can do operations in +floating-pointfloating-point numbers (this capability is built into all recent processor chips). Floating-point numbers give you a much wider range of values than integers -and let you express fractions. The ways this is done vary and are rather -too complicated to discuss in detail here, but the general idea is much -like so-called `scientific notation', where one might write (say) 1.234 * -10^23; the encoding of the number is split into a -mantissamantissa -(1.234) and the exponent part (23) for the power-of-ten multiplier. +and let you express fractions. 
The ways in which this is done vary and are rather too complicated
to discuss in detail here, but the general idea is much like so-called
`scientific notation', where one might write (say) 1.234 * 10^23; the
encoding of the number is split into a mantissa (1.234) and the
exponent part (23) for the power-of-ten multiplier (which means the number
multiplied out would have 20 zeros on it, 23 minus the three decimal
places).

Characters

Characters are normally represented as strings of seven bits each in
an encoding called ASCII (American Standard Code for Information
Interchange). On modern machines, each of the 128 ASCII characters is the
low seven bits of an 8-bit octet; octets are packed into memory
words so that (for example) a six-character string only takes up two
memory words. For an ASCII code chart, type `man 7 ascii' at your Unix
prompt.

The preceding paragraph was misleading in two ways. The minor one is
that the term `octet' is formally correct but seldom actually used; most
people refer to an octet as a byte and expect bytes to be eight
bits long. Strictly speaking, the term `byte' is more general; there used
to be, for example, 36-bit machines with 9-bit bytes (though there
probably never will be again).

scathing account of the trouble this causes, see the demoroniser
page).

Latin-1 handles western European languages, including English,
French, German, Spanish, Italian, Dutch, Norwegian, Swedish, Danish.
However, this isn't good enough either, and as a result there is a whole
series of Latin-2 through -9 character sets to handle things like Greek,

the Unicode Home Page.

How does my computer store things on disk?
When you look at a hard disk under Unix, you see a tree of named directories and files. Normally you won't need to look any deeper than @@ -854,21 +887,21 @@ have a disk crash and need to try to salvage files. Unfortunately, there's no good way to describe disk organization from the file level downwards, so I'll have to describe it from the hardware up. -Low-level disk and file system structure +Low-level disk and file system structure The surface area of your disk, where it stores data, is divided up something like a dartboard -- into circular tracks which are then pie-sliced into sectors. Because tracks near the outer edge have more area than those close to the spindle at the center of the disk, the outer tracks have more sector slices in them than the inner ones. Each sector (or -disk blockdisk +disk blockdisk block) has the same size, which under modern Unixes is generally 1 binary K (1024 8-bit words). Each disk block has a unique -address or disk block numberdisk +address or disk block numberdisk block number. -Unix divides the disk into disk -partitionsdisk +Unix divides the disk into disk +partitionsdisk partitions. Each partition is a continuous span of blocks that's used separately from any other partition, either as a file system or as swap space. The original reasons for partitions had to do @@ -879,48 +912,50 @@ Nowadays, it's more important that partitions can be declared read-only (preventing an intruder from modifying critical system files) or shared over a network through various means we won't discuss here. The lowest-numbered partition on a disk is often treated specially, as a -boot partitionboot +boot partitionboot partition where you can put a kernel to be booted. -Each partition is either swap -spaceswap space (used -to implement virtual memory or a file systemfile +Each partition is either swap +spaceswap space (used +to implement virtual memory) or a file systemfile system used to hold files. 
Swap-space partitions are just treated as a linear sequence of blocks. File systems, on the other hand, need a way to map file names to sequences of disk blocks. Because files grow, shrink, and change over time, a file's data blocks will not be a linear sequence but may be scattered all over its partition (from wherever the operating system can find a free block when it needs -one). +one). This scattering effect is called +fragmentation. -File names and directories +File names and directories Within each file system, the mapping from names to blocks is handled through a structure called an -i-nodei-node. +i-nodei-node. There's a pool of these things near the ``bottom'' (lowest-numbered blocks) of each file system (the very lowest ones are used for housekeeping and labeling purposes we won't describe here). Each i-node describes one file. -File data blocks live above the inodes (in higher-numbered blocks). +File data blocks (including directories) live above the i-nodes (in +higher-numbered blocks). Every i-node contains a list of the disk block numbers in the file it describes. (Actually this is a half-truth, only correct for small files, but the rest of the details aren't important here.) Note that the i-node -does not contain the name of the file. +does not contain the name of the file. -Names of files live in directory -structuresdirectory +Names of files live in directory +structuresdirectory structures. A directory structure just maps names to i-node numbers. This is why, in Unix, a file can have multiple true names -(or hard linkshard +(or hard linkshard links); they're just multiple directory entries that -happen to point to the same inode. +happen to point to the same i-node. - -Mount points +Mount points In the simplest case, your entire Unix file system lives in just one disk partition. 
While you'll see this arrangement on some small personal @@ -931,7 +966,7 @@ slightly larger one where OS utilities live, and a much bigger one where user home directories live. The only partition you'll have access to immediately after system -boot is your root partitionroot partition, +boot is your root partitionroot partition, which is (almost always) the one you booted from. It holds the root directory of the file system, the top node from which everything else hangs. @@ -940,15 +975,15 @@ hangs. in order for your entire, multiple-partition file system to be accessible. About midway through the boot process, your Unix will make these non-root partitions accessible. It will -mountmount +mountmount each one onto a directory on the root partition. For example, if you have a Unix directory called `/usr', it is probably a mount point to a partition that contains many programs installed with your Unix but not required during initial boot. - -How a file gets looked up + +How a file gets looked up Now we can look at the file system from the top down. When you open a file (such as, say, @@ -959,30 +994,31 @@ happens: partition). It looks for a directory there called `home'. Usually `home' is a mount point to a large user partition elsewhere, so it will go there. In the top-level directory structure of that user partition, it will look -for a entry called `esr' and extract an inode number. It will go to that -i-node, notice it is a directory structure, and look up `WWW'. Extracting -that i-node, it will go to the corresponding -subdirectory and look up `ldp'. That will take it to yet another directory -inode. Opening that one, it will find an i-node number for -`fundamentals.sgml'. That inode is not a directory, but instead holds the -list of disk blocks associated with the file. +for a entry called `esr' and extract an i-node number. It will go to that +i-node, notice that its associated file data blocks are a directory +structure, and look up `WWW'. 
Extracting that i-node, +it will go to the corresponding subdirectory and look up `ldp'. That will +take it to yet another directory i-node. Opening that one, it will find an +i-node number for `fundamentals.sgml'. That i-node is not a directory, but +instead holds the list of disk blocks associated with the file. -File ownership, permissions and security + +File ownership, permissions and security -To keep programs from accidentally or +To keep programs from accidentally or maliciously stepping on data they shouldn't, Unix has -permissionpermission +permissionpermission features. These were originally designed to support timesharing by protecting multiple users on the same machine from each other, back in the days when Unix ran mainly on expensive shared minicomputers. -In order to understand file permissions, you need to recall our -description of of users and groups in the section +In order to understand file permissions, you need to recall the +description of users and groups in the section What happens when you log in?. Each file has an owning user and an owning group. These are initially those of the file's creator; they can be changed with the programs -chown(1) and -chgrp(1). +chown(1)chown(1) and +chgrp(1)chgrp(1). The basic permissions that can be associated with a file are `read' (permission to read data from it), `write' (permission to modify it) and @@ -991,7 +1027,7 @@ permissions; one for its owning user, one for any user in its owning group, and one for everyone else. The `privileges' you get when you log in are just the ability to do read, write, and execute on those files for which the permission bits match your user ID or one of the groups you are -in. +in, or files that have been made accessible to the world. To see how these may interact and how Unix displays them, let's look at some file listings on a hypothetical Unix system. Here's one: @@ -1049,13 +1085,13 @@ intruders who ever come after you will be trying to get. 
that can easily be guessed, like the first name of your
girlfriend/boyfriend/spouse. This is an astonishingly common bad practice
that helps crackers no end. In general, don't pick any word in the
dictionary; there are programs called dictionary crackers that
look for likely passwords by running through word lists of common choices.
A good technique is to pick a combination consisting of a word, a digit,
and another word, such as `shark6cider' or `jump3joy'; that will make the
search space too large for a dictionary cracker. Don't use these examples,
though -- crackers might expect that after reading this document and put
them in their dictionaries.

Now let's look at a third case:

anybody else.

Read permission gives you the ability to list the directory -- that
is, to see the names of files and directories it contains. Write
permission gives you the ability to create and delete files in the
directory. If you remember that the directory includes a list of the names
of the files and subdirectories it contains, these rules will make sense.

Execute permission on a directory means you can get through the
directory to open the files and directories below it. In effect, it gives
you permission to access the i-nodes in the directory. A directory with
execute completely turned off would be useless. 
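Here is a small Python sketch that decodes a numeric mode like 0751 into
the rwx triples just described. It's a toy illustration, not how ls
itself is implemented:

```python
# Decode the nine Unix permission bits (owner/group/world, each rwx)
# from a numeric mode such as 0o751, the way `ls -l` displays them.
def rwx_string(mode):
    out = []
    for shift in (6, 3, 0):                  # owner, group, world
        triple = (mode >> shift) & 0o7
        out.append('r' if triple & 4 else '-')
        out.append('w' if triple & 2 else '-')
        out.append('x' if triple & 1 else '-')
    return ''.join(out)

print(rwx_string(0o751))   # rwxr-x--x : owner has everything, group can
                           # read and search, world can only pass through
print(rwx_string(0o644))   # rw-r--r-- : a typical plain file
```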
Occasionally you'll see a directory that is world-executable but not @@ -1088,7 +1124,7 @@ listed). It's important to remember that read, write, or execute permission on a directory is independent of the permissions on the files and directories beneath. In particular, write access on a directory means you can -create new files or delete existing files there, but ity does not +create new files or delete existing files there, but does not automatically give you write access to existing files. Finally, let's look at the permissions of the login program itself. @@ -1101,7 +1137,7 @@ snark:~$ ls -l /bin/login This has the permissions we'd expect for a system command -- except for that 's' where the owner-execute bit ought to be. This is the visible manifestation of a special permission called the `set-user-id' or -setuid bitsetuid +setuid bitsetuid bit. The setuid bit is normally attached to programs that need to give @@ -1116,7 +1152,7 @@ can use it to spawn a shell with root privileges. For this reason, opening a file to write it automatically turns off its setuid bit on most Unixes. Many attacks on Unix security try to exploit bugs in setuid programs in order to subvert them. Security-conscious system administrators are -therefore extra-careful about these programs and relucutant to install new +therefore extra-careful about these programs and reluctant to install new ones. There are a couple of important details we glossed over when @@ -1124,7 +1160,7 @@ discussing permissions above; namely, how the owning group and permissions are assigned when a file or directory is first created. The group is an issue because users can be members of multiple groups, but one of them (specified in the user's /etc/passwd entry) is the -user's default groupdefault +user's default groupdefault group and will normally own files created by the user. @@ -1132,7 +1168,7 @@ user. A program that creates a file will normally specify the permissions it is to start with. 
But these will be modified by a variable in the user's environment called the -umaskumask. +umaskumask. The umask specifies which permission bits to turn off when creating a file; the most common value, and the default on most systems, is -------w- or 002, which turns off the world-write bit. See the @@ -1146,16 +1182,11 @@ in which it's created (this is the BSD convention). On some modern Unixes, including Linux, the latter behavior can be selected by setting the set-group-ID on the directory (chmod g+s). -There is a useful discussion of file permissions in Eric -Goebelbecker's article Take -Command. - How things can go wrong -Earlier we hinted that file systems can be fragile things. -Now we know that to get to file you have to hopscotch through what may be +Earlier it was hinted that file systems can be fragile things. +Now we know that to get to a file you have to hopscotch through what may be an arbitrarily long chain of directory and i-node references. Now suppose your hard disk develops a bad spot? @@ -1163,7 +1194,7 @@ your hard disk develops a bad spot? unlucky, it could corrupt a directory structure or i-node number and leave an entire subtree of your system hanging in limbo -- or, worse, result in a corrupted structure that points multiple ways at the same disk block or -inode. Such corruption can be spread by normal file operations, trashing +i-node. Such corruption can be spread by normal file operations, trashing data that was not in the original bad spot. Fortunately, this kind of contingency has become quite uncommon as disk @@ -1176,47 +1207,54 @@ more thorough check that takes a few minutes longer. If all of this sounds like Unix is terribly complex and failure-prone, it may be reassuring to know that these boot-time checks typically catch and correct normal problems before -they become really disasterous. Other operating systems don't have these +they become really disastrous. 
Other operating systems don't have these facilities, which speeds up booting a bit but can leave you much more seriously screwed when attempting to recover by hand (and that's assuming you have a copy of Norton Utilities or whatever in the first place...). +One of the trends in current Unix designs is journalling +file systemsjournalling file +systems. These arrange traffic to the disk so that +it's guaranteed to be in a consistent state that can be recovered when the +system comes back up. This will speed up the boot-time integrity check a +lot. + -How do computer languages work? +How do computer languages work? -We've already discussed how programs are run. -Every program ultimately has to execute as a stream of bytes that are -instructions in your computer's machine -languagemachine +We've already discussed how programs +are run. Every program ultimately has to execute as a stream of +bytes that are instructions in your computer's machine +languagemachine language. But human beings don't deal with machine language very well; doing so has become a rare, black art even among hackers. Almost all Unix code except a small amount of direct hardware-interface support in the kernel itself is nowadays written in a -high-level languagehigh-level language. (The +high-level languagehigh-level language. (The `high-level' in this term is a historical relic meant to distinguish these -from `low-level' assembler -languagesassembler +from `low-level' assembler +languagesassembler languages, which are basically thin wrappers around machine code.) There are several different kinds of high-level languages. In order to talk about these, you'll find it useful to bear in mind that the -source codesource code of a program (the +source codesource code of a program (the human-created, editable version) has to go through some kind of translation into machine code that the machine can actually run. 
-Compiled languages +Compiled languages -The most conventional kind of language is a compiled -languagecompiled +The most conventional kind of language is a compiled +languagecompiled language. Compiled languages get translated into runnable files of binary machine code by a special program called (logically enough) a -compilercompiler. +compilercompiler. Once the binary has been generated, you can run it directly without looking at the source code again. (Most software is delivered as compiled binaries made from code you don't see.) @@ -1228,7 +1266,7 @@ complete access to the OS, but also to be difficult to program in. important of these (with its variant C++). FORTRAN is another compiled language still used among engineers and scientists but years older and much more primitive. In the Unix world no other compiled languages are in -mainstream use. Outide it, COBOL is very widely used for financial and +mainstream use. Outside it, COBOL is very widely used for financial and business software. There used to be many other compiler languages, but most of them have @@ -1237,10 +1275,10 @@ Unix developer using a compiled language, it is overwhelmingly likely to be C or C++. -Interpreted languages +Interpreted languages -An interpreted -languageinterpreted +An interpreted +languageinterpreted language depends on an interpreter program that reads the source code and translates it on the fly into computations and system calls. The source has to be re-interpreted (and the interpreter present) @@ -1255,7 +1293,8 @@ coding errors than compiled languages. are effectively small interpreted languages. BASICs are usually interpreted. So is Tcl. Historically, the most important interpretive language has been LISP (a major improvement over most of its successors). -Today Perl is very widely used and steadily growing more popular. +Today, Unix shells and the Lisp that lives inside the Emacs editor are +probably the most important pure interpreted languages. 
P-code languages @@ -1265,9 +1304,9 @@ interpretation has become increasingly important. P-code languages are like compiled languages in that the source is translated to a compact binary form which is what you actually execute, but that form is not machine code. Instead it's -pseudocodepseudocode +pseudocodepseudocode (or -p-codep-code), +p-codep-code), which is usually a lot simpler but more powerful than a real machine language. When you run the program, you interpret the p-code. @@ -1279,7 +1318,7 @@ the flexibility and power of a good interpreter. -How does the Internet work? +How does the Internet work? To help you understand how the Internet works, we'll look at the things that happen when you do a typical Internet operation -- pointing a browser @@ -1287,27 +1326,28 @@ at the front page of this document at its home on the Web at the Linux Documentation Project. This document is -http://metalab.unc.edu/LDP/HOWTO/Fundamentals.html +ttp://www.linuxdoc.org/HOWTO/Unix-and-Internet-Fundamentals-HOWTO/index.html -which means it lives in the file LDP/HOWTO/Fundamentals.html under -the World Wide Web export directory of the host metalab.unc.edu. +which means it lives in the file +LDP/HOWTO/Unix-and-Internet-Fundamentals-HOWTO/index.html under the World Wide Web +export directory of the host www.linuxdoc.org. -Names and locations +Names and locations The first thing your browser has to do is to establish a network connection to the machine where the document lives. To do that, it first has to find the network location of the -hosthost -metalab.unc.edu (`host' is short for `host machine' or `network host'; -metalab.unc.edu is a typical -hostnamehostname). -The corresponding location is actually a number called an IP -addressIP address +hosthost +www.linuxdoc.org (`host' is short for `host machine' or `network host'; +www.linuxdoc.org is a typical +hostnamehostname). 
The corresponding location is actually a number called an IP
address (we'll explain the `IP' part of this term later). To do
this, your browser queries a program called a name server. The
name server may live on your machine, but it's more likely to run on a
service machine that yours talks to. When you sign up with an ISP, part
of your setup procedure will almost certainly involve telling your
Internet software the IP address of a nameserver on the ISP's network.

The name servers on different machines talk to each other, exchanging
and keeping up to date all the information needed to resolve hostnames
(map them to IP addresses). Your nameserver may query three or four
different sites across the network in the process of resolving
www.linuxdoc.org, but this usually happens very quickly (as in less than
a second). We'll look at how nameservers work in detail in the next
section.

The nameserver will tell your browser that www.linuxdoc.org's IP
address is 152.19.254.81; knowing this, your machine will be able to
exchange bits with www.linuxdoc.org directly.

The Domain Name System

The whole network of programs and databases that cooperates to
translate hostnames to IP addresses is called `DNS' (Domain Name System).
When you see references to a `DNS server', that means what we just called
a nameserver. Now I'll explain how the overall system works.

Internet hostnames are composed of parts separated by dots. A
domain is a collection of machines that share a common name
suffix. Domains can live inside other domains. 
For example, the machine
www.linuxdoc.org lives in the .linuxdoc.org subdomain of the .org
domain.

Each domain is defined by an authoritative name server that
knows the IP addresses of the other machines in the domain. The
authoritative (or `primary') name server may have backups in case it goes
down; if you see references to a secondary name server (or
`secondary DNS'), it's talking about one of those. These secondaries
typically refresh their information from their primaries every few hours,
so a change made to the hostname-to-IP mapping on the primary will
automatically be propagated.

Now here's the important part. The nameservers for a domain do
not have to know the locations of all the machines in
other domains (including their own subdomains); they only have to know
the location of the nameservers. In our example, the authoritative name
server for the .org domain knows the IP address of the nameserver for
.linuxdoc.org, but not the address of all the other machines in
linuxdoc.org.

The domains in the DNS system are arranged like a big inverted tree.
At the top are the root servers. Everybody knows the IP addresses of the
root servers; they're wired into your DNS software. The root servers know
the IP addresses of the nameservers for the top-level domains like .com
and .org, but not the addresses of machines inside those domains. Each
top-level domain server knows where the nameservers for the domains
directly beneath it are, and so forth.

DNS is carefully designed so that each machine can get away with the
minimum amount of knowledge it needs to have about the shape of the tree,
and local changes to subtrees can be made simply by changing one
authoritative server's database of name-to-IP-address mappings. 
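You can model this delegation without touching the network at all. In
the Python sketch below, each `nameserver' is just a table that knows
only about the level directly beneath it; the server names are invented
for the example, and the IP address is the one used in this document:

```python
# A cartoon of DNS delegation.  Each "nameserver" is a dict that knows
# only its direct children, exactly as described above.  The server
# names are made up for illustration.
root = {'org': 'org-nameserver'}
servers = {
    'org-nameserver': {'linuxdoc.org': 'linuxdoc-nameserver'},
    'linuxdoc-nameserver': {'www.linuxdoc.org': '152.19.254.81'},
}

def resolve(hostname):
    # Start at the root and follow one referral per level: first ask
    # about 'org', then 'linuxdoc.org', then 'www.linuxdoc.org'.
    server = root
    labels = hostname.split('.')
    for i in range(len(labels) - 1, -1, -1):
        suffix = '.'.join(labels[i:])
        if suffix in server:
            answer = server[suffix]
            if answer in servers:      # a referral to another server
                server = servers[answer]
            else:                      # an actual IP address
                return answer
    raise KeyError(hostname)

print(resolve('www.linuxdoc.org'))   # 152.19.254.81
```

A real nameserver does the same walk with network queries, plus the
cacheing described below.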
When you query for the IP address of www.linuxdoc.org, what actually
happens is this: First, your nameserver asks a root server to tell it
where it can find a nameserver for .org. Once it knows that, it then asks
the .org server to tell it the IP address of a .linuxdoc.org nameserver.
Once it has that, it asks the .linuxdoc.org nameserver to tell it the
address of the host www.linuxdoc.org.

Most of the time, your nameserver doesn't actually have to work that
hard. Nameservers do a lot of cacheing; when yours resolves a hostname, it
keeps the association with the resulting IP address around in memory for a
while. This is why, when you surf to a new website, you'll usually only
see a message from your browser about "Looking up" the host for the first
page you fetch. Eventually the name-to-address mapping expires and your
DNS has to re-query -- this is important so you don't have invalid
information hanging around forever when a hostname changes addresses. Your
cached IP address for a site is also thrown out if the host is
unreachable.

Packets and routers

What the browser wants to do is send a command to the Web server on
www.linuxdoc.org that looks like this:

GET /LDP/HOWTO/Fundamentals.html HTTP/1.0

Here's how that happens. The command is made into a packet, a
block of bits like a telegram that is wrapped with three important
things; the source address (the IP address of your machine), the
destination address (152.19.254.81), and a service number
or port number (80, in this case) that indicates that it's a
World Wide Web request. 
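Here's a Python sketch of that wrapping. The three-field header layout
is invented for illustration (real IP and TCP headers carry many more
fields), but the idea of prefixing the payload with addresses and a port
number is the same:

```python
import struct
import socket

# A toy "packet": source IP, destination IP, and a 16-bit port number
# packed in front of the payload.  Real IP and TCP headers carry many
# more fields, but the wrapping idea is identical.
def make_packet(src_ip, dst_ip, port, payload):
    header = struct.pack('!4s4sH',              # network byte order
                         socket.inet_aton(src_ip),
                         socket.inet_aton(dst_ip),
                         port)
    return header + payload

pkt = make_packet('10.0.0.1', '152.19.254.81', 80,
                  b'GET /LDP/HOWTO/Fundamentals.html HTTP/1.0\r\n\r\n')
print(len(pkt))        # 10 header bytes plus the payload
```

(The 10.0.0.1 source address is a made-up private address standing in
for your machine.)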
Your machine then ships the packet down the wire (your connection to
your ISP, or local network) until it gets to a specialized machine called
a router. The router has a map of the Internet in its memory --
not always a complete one, but one that completely describes your network
neighborhood and knows how to get to the routers for other neighborhoods
on the Internet.

Your packet may pass through several routers on the way to its
destination. Routers are smart. They watch how long it takes for other
routers to acknowledge having received a packet. They also use that
information to direct traffic over fast links. They use it to notice when
another router (or a cable) has dropped off the network, and compensate
if possible by finding another route.

There's an urban legend that the Internet was designed to survive
nuclear war. This is not true, but the Internet's design is extremely good
at getting reliable performance out of flaky hardware in an uncertain
world. 
This is directly due to the fact that its intelligence is
distributed through thousands of routers rather than concentrated in a few
massive and vulnerable switches (like the phone network). This means that
failures tend to be well localized and the network can route around
them.

Once your packet gets to its destination machine, that machine uses
the service number to feed the packet to the web server. The web server
can tell where to reply to from the packet's source address; its reply may
in turn be broken up into a number of packets. The size of the packets
will vary according to the transmission media in the network and the type
of service.

TCP and IP

To understand how multiple-packet transmissions are handled, you need
to know that the Internet actually uses two protocols, stacked one on top
of the other.

The lower level, IP (Internet Protocol), knows how to get
individual packets from a source address to a destination address (this is
why these are called IP addresses). However, IP is not reliable; if a
packet gets lost or dropped, the source and destination machines may never
know it. In network jargon, IP is a connectionless protocol;
the sender just fires a packet at the receiver and doesn't expect an
acknowledgement.

IP is fast and cheap, though. Sometimes fast, cheap and unreliable is
OK. When you play networked Doom or Quake, each bullet is represented by
an IP packet. If a few of those get lost, that's OK.

The upper level, TCP (Transmission Control Protocol), gives
you reliability. When two machines negotiate a TCP connection (which they
do using IP), the receiver knows to send acknowledgements of the packets
it sees back to the sender. If the sender doesn't see an acknowledgement
for a packet within some timeout period, it resends that packet.
Furthermore, the sender gives each TCP packet a sequence number, which the
receiver can use to reassemble packets in case they show up out of
order. 
(This can easily happen if network links go up or down during a
connection.)

TCP/IP packets also contain a checksum to enable detection of data
corrupted by bad links. (The checksum is computed from the rest of the
packet in such a way that if either the rest of the packet or the checksum
is corrupted, redoing the computation and comparing is very likely to
indicate an error.) So, from the point of view of anyone using TCP/IP and
nameservers, it looks like a reliable way to pass streams of bytes between
hostname/service-number pairs. People who write network protocols almost
never have to think about all the packetizing, packet reassembly, error
checking, checksumming, and retransmission that goes on below that
level.

HTTP, an application protocol

Now let's get back to our example. Web browsers and servers speak an
application protocol that runs on top of TCP/IP, using it simply
as a way to pass strings of bytes back and forth. This protocol is called
HTTP (Hyper-Text Transfer Protocol) and we've already seen one
command in it -- the GET shown above. 
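Because HTTP commands are plain printable text ending in
carriage-return/line-feed, you can build a complete request by hand. The
Python sketch below assembles one; the Host line is strictly an HTTP/1.1
refinement, included here because real browsers already sent it:

```python
# An HTTP request is just printable text; each line ends with a
# carriage return and line feed, and a blank line ends the request.
def build_request(host, path):
    lines = [
        'GET %s HTTP/1.0' % path,   # the command we saw above
        'Host: %s' % host,          # which site we want on that machine
        '',                         # blank line: end of the request
        '',
    ]
    return '\r\n'.join(lines)

request = build_request('www.linuxdoc.org',
                        '/LDP/HOWTO/Fundamentals.html')
print(request.replace('\r\n', '<CR><LF>'))
```

Feed those bytes to port 80 of the host (by hand with telnet, for
instance) and the server daemon described below will answer.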
-When the GET command goes to metalab.unc.edu's webserver with service -number 80, it will be dispatched to a server -daemonserver +When the GET command goes to www.linuxdoc.org's webserver with service +number 80, it will be dispatched to a server +daemonserver daemon listening on port 80. Most Internet services are implemented by server daemons that do nothing but wait on ports, watching for and executing incoming commands. @@ -1434,7 +1544,7 @@ watching for and executing incoming commands. If the design of the Internet has one overall rule, it's that all the parts should be as simple and human-accessible as possible. HTTP, and its relatives (like the Simple Mail Transfer Protocol, -SMTPSMTP, +SMTPSMTP, that is used to move electronic mail between hosts) tend to use simple printable-text commands that end with a carriage-return/line feed. @@ -1470,8 +1580,7 @@ tells it the returned data is really HTML). The following sets edit modes for GNU EMACS Local Variables: fill-column:75 -compile-command: "mail -s \"HOWTO update\" ldp-submit@lists.linuxdoc.org