gferg 2004-04-04 22:37:35 +00:00
parent b6645120c1
commit 960cfb6ea3
3 changed files with 200 additions and 180 deletions

@ -4806,7 +4806,7 @@ How to change your Linux system so it uses UTF-8 as text encoding. </Para>
Unix-and-Internet-Fundamentals-HOWTO</ULink>,
<CiteTitle>The Unix and Internet Fundamentals HOWTO</CiteTitle>
</Para><Para>
<CiteTitle>Updated: Oct 2003</CiteTitle>.
<CiteTitle>Updated: Mar 2004</CiteTitle>.
Describes the working basics of PC-class computers, Unix-like
operating systems, and the Internet in non-technical language. </Para>
</ListItem>

@ -44,7 +44,7 @@ requirements, and some resources. </Para>
Unix-and-Internet-Fundamentals-HOWTO</ULink>,
<CiteTitle>The Unix and Internet Fundamentals HOWTO</CiteTitle>
</Para><Para>
<CiteTitle>Updated: Oct 2003</CiteTitle>.
<CiteTitle>Updated: Mar 2004</CiteTitle>.
Describes the working basics of PC-class computers, Unix-like
operating systems, and the Internet in non-technical language. </Para>
</ListItem>

@ -1,7 +1,6 @@
<?xml version="1.0" encoding="ISO-8859-1"?>
<?xml version="1.0"?>
<!DOCTYPE article PUBLIC "-//OASIS//DTD DocBook XML V4.1.2//EN"
"http://docbook.org/xml/4.1.2/docbookx.dtd" [
"http://docbook.org/xml/4.1.2/docbookx.dtd" [
<!ENTITY howto "http://www.tldp.org/HOWTO/">
<!ENTITY mini-howto "&howto;/mini/">
<!ENTITY howto-host "www.tldp.org">
@ -25,6 +24,16 @@
</author>
<revhistory>
<revision>
<revnumber>2.9</revnumber>
<date>2004-03-03</date>
<authorinitials>esr</authorinitials>
<revremark>
Minor updates.
</revremark>
</revision>
<!--
<revision>
<revnumber>2.8</revnumber>
<date>2003-10-04</date>
@ -141,8 +150,8 @@
<date>1999-06-27</date>
<authorinitials>esr</authorinitials>
<revremark>
The sections `What happens when you log in?' and `File
ownership, permissions and security'.
The sections &lsquo;What happens when you log in?&rsquo; and
&lsquo;File ownership, permissions and security&rsquo;.
</revremark>
</revision>
@ -151,10 +160,10 @@
<date>1998-12-26</date>
<authorinitials>esr</authorinitials>
<revremark>
The section `How does my computer store things in memory?'.
The section &lsquo;How does my computer store things in memory?&rsquo;.
</revremark>
</revision>
-->
<revision>
<revnumber>1.0</revnumber>
<date>1998-10-29</date>
@ -164,7 +173,6 @@
</revremark>
</revision>
</revhistory>
<abstract>
<para>
This document describes the working basics of PC-class computers, Unix-like
@ -188,9 +196,9 @@ from lack of a good mental model of what is really going on.</para>
<para>I'll try to describe in clear, simple language how it all works. The
presentation will be tuned for people using Unix or Linux on PC-class
hardware. Nevertheless, I'll usually refer simply to `Unix' here, as most
of what I will describe is constant across platforms and across Unix
variants.</para>
hardware. Nevertheless, I'll usually refer simply to &lsquo;Unix&rsquo;
here, as most of what I will describe is constant across platforms and
across Unix variants.</para>
<para>I'm going to assume you're using an Intel PC. The details differ
slightly if you're running an Alpha or PowerPC or some other Unix box, but
@ -247,8 +255,8 @@ resources.</para>
<sect1 id="anatomy"><title>Basic anatomy of your computer</title>
<para>Your computer has a processor chip inside it that does the actual
computing. It has internal memory (what DOS/Windows people call ``RAM''
and Unix people often call ``core''; the Unix term is a folk memory from
computing. It has internal memory (what DOS/Windows people call <quote>RAM</quote>
and Unix people often call <quote>core</quote>; the Unix term is a folk memory from
when RAM consisted of ferrite-core donuts). The processor and memory live
on the
<firstterm>motherboard</firstterm><indexterm><primary>motherboard</primary></indexterm>,
@ -273,13 +281,13 @@ card, the disk controller, a sound card if you have one). The bus is the
data highway between your processor, your screen, your disk, and everything
else.</para>
<para>(If you've seen references to `ISA', `PCI', and `PCMCIA' in connection
with PCs and have not understood them, these are bus types. ISA is, except
in minor details, the same bus that was used on IBM's original PCs in 1980;
it is passing out of use now. PCI, for Peripheral Component
Interconnection, is the bus used on most modern PCs, and on modern
Macintoshes as well. PCMCIA is a variant of ISA with smaller physical
connectors used on laptop computers.)</para>
<para>(If you've seen references to &lsquo;ISA&rsquo;, &lsquo;PCI&rsquo;,
and &lsquo;PCMCIA&rsquo; in connection with PCs and have not understood
them, these are bus types. ISA is, except in minor details, the same bus
that was used on IBM's original PCs in 1980; it is passing out of use now.
PCI, for Peripheral Component Interconnection, is the bus used on most
modern PCs, and on modern Macintoshes as well. PCMCIA is a variant of ISA
with smaller physical connectors used on laptop computers.)</para>
<para>The processor, which makes everything else go, can't actually see any of
the other pieces directly; it has to talk to them over the bus. The only
@ -329,14 +337,14 @@ loading it into memory, and starting it. When you boot Linux and see
(Each dot means it has loaded another <anchor id="diskblock"/><firstterm>disk
block</firstterm> of kernel code.)</para>
<para>(You may wonder why the BIOS doesn't load the kernel directly &mdash; why the
two-step process with the boot loader? Well, the BIOS isn't very smart.
In fact it's very stupid, and Linux doesn't use it at all after boot time.
It was originally written for primitive 8-bit PCs with tiny disks, and
literally can't access enough of the disk to load the kernel directly. The
boot loader step also lets you start one of several operating systems off
different places on your disk, in the unlikely event that Unix isn't good
enough for you.)</para>
<para>(You may wonder why the BIOS doesn't load the kernel directly &mdash;
why the two-step process with the boot loader? Well, the BIOS isn't very
smart. In fact it's very stupid, and Linux doesn't use it at all after
boot time. It was originally written for primitive 8-bit PCs with tiny
disks, and literally can't access enough of the disk to load the kernel
directly. The boot loader step also lets you start one of several
operating systems off different places on your disk, in the unlikely event
that Unix isn't good enough for you.)</para>
<para>Once the kernel starts, it has to look around, find the rest of the
hardware, and get ready to run programs. It does this by poking not at
@ -359,7 +367,7 @@ a critical mass of users.</para>
<para>But getting the kernel fully loaded and running isn't the end of the
boot process; it's just the first stage (sometimes called <firstterm>run
level 1</firstterm>). After this first stage, the kernel hands control to a
special process called `init' which spawns several housekeeping
special process called &lsquo;init&rsquo; which spawns several housekeeping
processes.</para>
<para>The init process's first job is usually to check to make sure your disks
@ -428,7 +436,7 @@ associated with the individual account you are using. You may also be
recognized as part of a
<firstterm>group</firstterm><indexterm><primary>group</primary></indexterm>.
A group is a named collection of users set up by the system administrator.
Groups can have privileges independently of their members' privileges. A
Groups can have privileges independently of their members&rsquo; privileges. A
user can be a member of multiple groups. (For details about how Unix
privileges work, see the section below on <link linkend="permissions">permissions</link>.)</para>
@ -457,11 +465,11 @@ separate programs communicating through a small set of system calls.
This makes it possible for there to be multiple shells, suiting different
tastes in interfaces.</para>
<para>The normal shell gives you the '$' prompt that you see after logging in
(unless you've customized it to be something else). We won't talk about
shell syntax and the easy things you can see on the screen here; instead
we'll take a look behind the scenes at what's happening from the
computer's point of view.</para>
<para>The normal shell gives you the &lsquo;$&rsquo; prompt that you see
after logging in (unless you've customized it to be something else). We
won't talk about shell syntax and the easy things you can see on the screen
here; instead we'll take a look behind the scenes at what's happening from
the computer's point of view.</para>
<para>After boot time and before you run a program, you can think of your
computer as containing a zoo of processes that are all waiting for
@ -482,34 +490,35 @@ are known as <firstterm>system calls</firstterm>.</para>
operations and prevent processes from stepping on each other. A few
special user processes are allowed to slide around the kernel, usually by
being given direct access to I/O ports. X servers (the programs that
handle other programs' requests to do screen graphics on most Unix boxes)
handle other programs&rsquo; requests to do screen graphics on most Unix boxes)
are the most common example of this. But we haven't gotten to an X server
yet; you're looking at a shell prompt on a character console.</para>
<para>The shell is just a user process, and not a particularly special one.
It waits on your keystrokes, listening (through the kernel) to the keyboard
I/O port. As the kernel sees them, it echoes them to your screen. When
the kernel sees an `Enter' it passes your line of text to the shell. The
shell tries to interpret those keystrokes as commands.</para>
the kernel sees an &lsquo;Enter&rsquo; it passes your line of text to the
shell. The shell tries to interpret those keystrokes as commands.</para>
<para>Let's say you type `ls' and Enter to invoke the Unix directory
lister. The shell applies its built-in rules to figure out that you want to
run the executable command in the file `/bin/ls'. It makes a system call
asking the kernel to start /bin/ls as a new <firstterm>child
process</firstterm> and give it access to the screen and keyboard through
the kernel. Then the shell goes to sleep, waiting for ls to finish.</para>
<para>Let's say you type &lsquo;ls&rsquo; and Enter to invoke the Unix
directory lister. The shell applies its built-in rules to figure out that
you want to run the executable command in the file
<filename>/bin/ls</filename>. It makes a system call asking the kernel to
start /bin/ls as a new <firstterm>child process</firstterm> and give it
access to the screen and keyboard through the kernel. Then the shell goes
to sleep, waiting for ls to finish.</para>
<para>When <command>/bin/ls</command> is done, it tells the kernel it's
finished by issuing an <firstterm>exit</firstterm> system call. The kernel
then wakes up the shell and tells it it can continue running. The shell
issues another prompt and waits for another line of input.</para>
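The cycle described above (start a child, sleep, wake up when it exits) can be sketched with Python's thin wrappers around the underlying Unix system calls `os.fork`, `os.execv`, and `os.waitpid`. This is a minimal illustration that assumes a Unix-like system, not how any particular shell is written. The function name `run_like_a_shell` is hypothetical, and a real shell also searches `$PATH`, manages job control, and handles signals.

```python
import os


def run_like_a_shell(argv):
    """Start argv[0] as a child process, wait for it to exit, return its status.

    Hypothetical helper: a sketch of the fork/exec/wait cycle only.
    """
    pid = os.fork()                      # kernel clones this process
    if pid == 0:
        # Child: replace this process image with the requested program.
        try:
            os.execv(argv[0], argv)
        except OSError:
            os._exit(127)                # conventional "command not found" status
    # Parent (the "shell"): sleep until the kernel reports the child's exit.
    _, status = os.waitpid(pid, 0)
    if os.WIFEXITED(status):
        return os.WEXITSTATUS(status)    # value the child passed to exit
    return -os.WTERMSIG(status)          # child was killed by a signal
```

For example, `run_like_a_shell(["/bin/ls", "-l"])` prints a long listing and returns 0 once `ls` has made its exit call.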
<para>Other things may be going on while your `ls' is executing, however
(we'll have to suppose that you're listing a very long directory). You
might switch to another virtual console, log in there, and start a game of
Quake, for example. Or, suppose you're hooked up to the Internet. Your
machine might be sending or receiving mail while <command>/bin/ls</command>
runs.</para>
<para>Other things may be going on while your &lsquo;ls&rsquo; is
executing, however (we'll have to suppose that you're listing a very long
directory). You might switch to another virtual console, log in there, and
start a game of Quake, for example. Or, suppose you're hooked up to the
Internet. Your machine might be sending or receiving mail while
<command>/bin/ls</command> runs.</para>
</sect1>
<sect1 id="devices"><title>How do input devices and interrupts work?</title>
@ -598,19 +607,19 @@ are caused when the program has to wait on data from a disk drive or
network connection.</para>
<para>An operating system that can routinely support many simultaneous
processes is called "multitasking". The Unix family of operating systems
was designed from the ground up for multitasking and is very good at it &mdash;
much more effective than Windows or the Mac OS, which have had multitasking
bolted into it as an afterthought and do it rather poorly. Efficient,
reliable multitasking is a large part of what makes Linux superior for
networking, communications, and Web service.</para>
processes is called <quote>multitasking</quote>. The Unix family of operating
systems was designed from the ground up for multitasking and is very good
at it &mdash; much more effective than Windows or the old Mac OS, which
had multitasking bolted into them as an afterthought and do it rather poorly.
Efficient, reliable multitasking is a large part of what makes Linux
superior for networking, communications, and Web service.</para>
</sect1>
<sect1 id="memory-management"><title>How does my computer keep processes from stepping on each other?</title>
<para>The kernel's scheduler takes care of dividing processes in time.
Your operating system also has to divide them in space, so that processes
can't step on each others' working memory. Even if you assume that all
can't step on each other's working memory. Even if you assume that all
programs are trying to be cooperative, you don't want a bug in one of them
to be able to corrupt others. The things your operating system does to
solve this problem are called <firstterm>memory
@ -642,13 +651,13 @@ set</primary></indexterm>; the rest of the process's state is left in a
special <firstterm>swap space</firstterm><indexterm><primary>swap
space</primary></indexterm> area on your hard disk.</para>
<para>Note that in the past, that "Sometimes" last paragraph ago was
"Almost always" &mdash; the size of memory was typically small relative to the
size of running programs, so swapping was frequent. Memory is far less
expensive nowadays and even low-end machines have quite a lot of it. On
modern single-user machines with 64MB of memory and up, it's possible to
run X and a typical mix of jobs without ever swapping after they're
initially loaded into core.</para>
<para>Note that in the past, the <quote>Sometimes</quote> in the last paragraph
would have been <quote>Almost always</quote> &mdash; the size of memory was typically small
relative to the size of running programs, so swapping was frequent. Memory
is far less expensive nowadays and even low-end machines have quite a lot
of it. On modern single-user machines with 64MB of memory and up, it's
possible to run X and a typical mix of jobs without ever swapping after
they're initially loaded into core.</para>
</sect2>
<sect2 id="vm-details"><title>Virtual memory: the detailed version</title>
@ -741,12 +750,12 @@ reference.</para>
<para>It's been found by experience that the most effective method for a
broad class of memory-usage patterns is very simple; it's called LRU or the
"least recently used" algorithm. The virtual-memory system grabs disk
blocks into its <firstterm>working set</firstterm><indexterm><primary>working
set</primary></indexterm> as it needs them. When it runs out of physical
memory for the working set, it dumps the least-recently-used block. All
Unixes, and most other virtual-memory operating systems, use minor
variations on LRU.</para>
<quote>least recently used</quote> algorithm. The virtual-memory system grabs disk
blocks into its <firstterm>working
set</firstterm><indexterm><primary>working set</primary></indexterm> as it
needs them. When it runs out of physical memory for the working set, it
dumps the least-recently-used block. All Unixes, and most other
virtual-memory operating systems, use minor variations on LRU.</para>
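A toy model of the LRU policy just described, written in Python for illustration. The class name `WorkingSet` and the `read_from_disk` callback are hypothetical. The sketch keeps an exact recency ordering, which real kernels only approximate (the "minor variations" mentioned above) rather than maintain precisely.

```python
from collections import OrderedDict


class WorkingSet:
    """Toy LRU working set holding at most `capacity` disk blocks in core."""

    def __init__(self, capacity):
        self.capacity = capacity
        self.blocks = OrderedDict()      # block number -> data, least recent first
        self.faults = 0                  # how many times we had to go to disk

    def access(self, block, read_from_disk):
        if block in self.blocks:
            self.blocks.move_to_end(block)       # mark as most recently used
            return self.blocks[block]
        self.faults += 1
        if len(self.blocks) >= self.capacity:
            self.blocks.popitem(last=False)      # evict the least recently used
        data = read_from_disk(block)
        self.blocks[block] = data
        return data
```

With a capacity of three, accessing blocks 1, 2, 3, 1, 4, 2 causes five faults: the second access to block 1 is a hit, block 2 is evicted to make room for 4, and block 3 is then evicted to bring 2 back.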
<para>Virtual memory is the first link in the bridge between disk and
processor speeds. It's explicitly managed by the OS. But there is still a
@ -761,7 +770,7 @@ memory. External cache is faster (250M accesses per sec, rather than 100M)
and smaller. The hardware (specifically, your computer's memory
controller) does the LRU thing in the external cache on blocks of data
fetched from the main memory. For historical reasons, the unit of cache
swapping is called a "line" rather than a page.</para>
swapping is called a <firstterm>line</firstterm> rather than a page.</para>
<para>But we're not done. The internal cache gives us the final step-up in
effective speed by caching portions of the external cache. It is faster
@ -801,11 +810,11 @@ built right onto them. The MMU has the special ability to put fences
around areas of memory, so an out-of-bound reference will be refused and
cause a special interrupt to be raised.</para>
<para>If you ever see a Unix message that says "Segmentation fault", "core
dumped" or something similar, this is exactly what has happened; an attempt
by the running program to access memory (core) outside its segment has
raised a fatal interrupt. This indicates a bug in the program code; the
<firstterm>core dump</firstterm><indexterm><primary>core
<para>If you ever see a Unix message that says <quote>Segmentation fault</quote>,
<quote>core dumped</quote> or something similar, this is exactly what has happened;
an attempt by the running program to access memory (core) outside its
segment has raised a fatal interrupt. This indicates a bug in the program
code; the <firstterm>core dump</firstterm><indexterm><primary>core
dump</primary></indexterm> it leaves behind is diagnostic information
intended to help a programmer track it down.</para>
@ -832,15 +841,13 @@ technically it's the width of your processor's
<firstterm>registers</firstterm><indexterm><primary>registers</primary></indexterm>,
which are the holding areas your processor uses to do arithmetic and
logical calculations. When people write about computers having bit sizes
(calling them, say, ``32-bit'' or ``64-bit'' computers), this is what they
mean.</para>
(calling them, say, <quote>32-bit</quote> or <quote>64-bit</quote> computers), this is what
they mean.</para>
<para>Most computers (including 386, 486, and Pentium PCs) have a word
size of 32 bits. The old 286 machines had a word size of 16. Old-style
mainframes often had 36-bit words. A few processors (like the Alpha from
what used to be DEC and is now Compaq) have 64-bit words. The 64-bit word
will become more common over the next five years; Intel is planning to
replace the Pentium series with a 64-bit chip called the `Itanium'.</para>
<para>Most computers (including 386, 486, and Pentium PCs) have a word size
of 32 bits. The old 286 machines had a word size of 16. Old-style
mainframes often had 36-bit words. The AMD Opteron, Intel Itanium, and the
Alpha from what used to be DEC and is now Compaq have 64-bit words. </para>
<para>The computer views your memory as a sequence of words numbered from
zero up to some large value dependent on your memory size. That value is
@ -863,9 +870,10 @@ notation. The highest-order bit is a <firstterm>sign
bit</firstterm><indexterm><primary>sign bit</primary></indexterm> which
makes the quantity negative, and every negative number can be obtained from
the corresponding positive value by inverting all the bits and adding one.
This is why integers on a 32-bit machine have the range -2^31 to 2^31 - 1
1 (where ^ is the `power' operation, 2^3 = 8). That 32nd bit is being used
for sign.</para>
This is why integers on a 32-bit machine have the range
-2<superscript>31</superscript> to 2<superscript>31</superscript> - 1.
That 32nd bit is being used for sign; 0 means a positive number or zero, 1
a negative number.</para>
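Because Python integers are unbounded, the sketch below masks everything to 32 bits to mimic a machine word. It shows the invert-and-add-one rule and where the -2^31 and 2^31 - 1 limits come from. The helper names `encode`, `decode`, and `negate` are made up for illustration.

```python
BITS = 32
MASK = 2 ** BITS - 1                 # 0xFFFFFFFF: all 32 bits set


def encode(value):
    """Bit pattern (as an unsigned int) a 32-bit machine stores for `value`."""
    return value & MASK


def decode(pattern):
    """Read a 32-bit pattern as signed: a set top (sign) bit means negative."""
    if pattern & 2 ** (BITS - 1):
        return pattern - 2 ** BITS
    return pattern


def negate(pattern):
    """Two's-complement negation: invert every bit, then add one."""
    return (~pattern + 1) & MASK


print(decode(negate(encode(5))))             # -5
print(hex(encode(-1)))                       # 0xffffffff
print(decode(2 ** 31 - 1), decode(2 ** 31))  # 2147483647 -2147483648
```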
<para>Some computer languages give you access to <firstterm>unsigned
arithmetic</firstterm><indexterm><primary>unsigned
@ -878,8 +886,9 @@ numbers (this capability is built into all recent processor chips).
Floating-point numbers give you a much wider range of values than integers
and let you express fractions. The ways in which this is done vary and are
rather too complicated to discuss in detail here, but the general idea is
much like so-called `scientific notation', where one might write (say)
1.234 * 10^23; the encoding of the number is split into a
much like so-called &lsquo;scientific notation&rsquo;, where one might
write (say) 1.234 * 10<superscript>23</superscript>; the encoding of the
number is split into a
<firstterm>mantissa</firstterm><indexterm><primary>mantissa</primary></indexterm>
(1.234) and the exponent part (23) for the power-of-ten multiplier (which
means the number multiplied out would have 20 zeros on it, 23 minus the
@ -895,13 +904,13 @@ low seven bits of an
<firstterm>octet</firstterm><indexterm><primary>octet</primary></indexterm>
or 8-bit byte; octets are packed into memory words so that (for example) a
six-character string only takes up two memory words. For an ASCII code
chart, type `man 7 ascii' at your Unix prompt.</para>
chart, type &lsquo;man 7 ascii&rsquo; at your Unix prompt.</para>
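To make the packing concrete, here is a sketch that stores a six-character ASCII string in 32-bit words. It assumes 4-octet words, big-endian byte order within each word, and zero padding at the end. These are illustrative choices (Intel PCs, for instance, are little-endian), and `pack_ascii` is a hypothetical helper.

```python
WORD_BYTES = 4                       # assumption: 32-bit (4-octet) words


def pack_ascii(text):
    """Pack an ASCII string into 32-bit words, one octet per character."""
    data = text.encode("ascii")      # raises if any character is not 7-bit ASCII
    data += bytes(-len(data) % WORD_BYTES)   # zero-pad to a whole word
    return [int.from_bytes(data[i:i + WORD_BYTES], "big")
            for i in range(0, len(data), WORD_BYTES)]


words = pack_ascii("kernel")
print(len(words))                    # 2
print([hex(w) for w in words])       # ['0x6b65726e', '0x656c0000']
```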
<para>The preceding paragraph was misleading in two ways. The minor one is
that the term `octet' is formally correct but seldom actually used; most
that the term &lsquo;octet&rsquo; is formally correct but seldom actually used; most
people refer to an octet as
<firstterm>byte</firstterm><indexterm><primary>byte</primary></indexterm> and
expect bytes to be eight bits long. Strictly speaking, the term `byte' is
expect bytes to be eight bits long. Strictly speaking, the term &lsquo;byte&rsquo; is
more general; there used to be, for example, 36-bit machines with 9-bit
bytes (though there probably never will be again).</para>
@ -913,7 +922,7 @@ sign.</para>
<para>There have been several attempts to fix this problem. All use the extra
high bit that ASCII doesn't, making it the low half of a 256-character set.
The most widely-used of these is the so-called `Latin-1' character set
The most widely-used of these is the so-called &lsquo;Latin-1&rsquo; character set
(more formally called ISO 8859-1). This is the default character set for
Linux, HTML, and X. Microsoft Windows uses a mutant version of Latin-1
that adds a bunch of characters such as right and left double quotes in
@ -998,11 +1007,11 @@ one). This scattering effect is called
<para>Within each file system, the mapping from names to blocks is handled
through a structure called an
<firstterm>i-node</firstterm><indexterm><primary>i-node</primary></indexterm>.
There's a pool of these things near the ``bottom'' (lowest-numbered blocks)
of each file system (the very lowest ones are used for housekeeping and
labeling purposes we won't describe here). Each i-node describes one file.
File data blocks (including directories) live above the i-nodes (in
higher-numbered blocks). </para>
There's a pool of these things near the <quote>bottom</quote>
(lowest-numbered blocks) of each file system (the very lowest ones are used
for housekeeping and labeling purposes we won't describe here). Each
i-node describes one file. File data blocks (including directories) live
above the i-nodes (in higher-numbered blocks). </para>
<para>Every i-node contains a list of the disk block numbers in the file it
describes. (Actually this is a half-truth, only correct for small files,
@ -1041,29 +1050,32 @@ partitions accessible. It will
<firstterm>mount</firstterm><indexterm><primary>mount</primary></indexterm>
each one onto a directory on the root partition.</para>
<para>For example, if you have a Unix directory called `/usr', it is probably
a mount point to a partition that contains many programs installed with
your Unix but not required during initial boot.</para>
<para>For example, if you have a Unix directory called
<filename>/usr</filename>, it is probably a mount point to a partition that
contains many programs installed with your Unix but not required during
initial boot.</para>
</sect2>
<sect2 id="iname"><title>How a file gets looked up</title>
<para>Now we can look at the file system from the top down. When you open
a file (such as, say,
<filename>/home/esr/WWW/ldp/fundamentals.sgml</filename>) here is what
<filename>/home/esr/WWW/ldp/fundamentals.xml</filename>) here is what
happens:</para>
<para>Your kernel starts at the root of your Unix file system (in the root
partition). It looks for a directory there called `home'. Usually `home'
is a mount point to a large user partition elsewhere, so it will go there.
In the top-level directory structure of that user partition, it will look
for a entry called `esr' and extract an i-node number. It will go to that
i-node, notice that its associated file data blocks are a directory
structure, and look up `WWW'. Extracting <emphasis>that</emphasis> i-node,
it will go to the corresponding subdirectory and look up `ldp'. That will
partition). It looks for a directory there called &lsquo;home&rsquo;.
Usually &lsquo;home&rsquo; is a mount point to a large user partition
elsewhere, so it will go there. In the top-level directory structure of
that user partition, it will look for an entry called &lsquo;esr&rsquo; and
extract an i-node number. It will go to that i-node, notice that its
associated file data blocks are a directory structure, and look up
&lsquo;WWW&rsquo;. Extracting <emphasis>that</emphasis> i-node, it will go
to the corresponding subdirectory and look up &lsquo;ldp&rsquo;. That will
take it to yet another directory i-node. Opening that one, it will find an
i-node number for `fundamentals.sgml'. That i-node is not a directory, but
instead holds the list of disk blocks associated with the file.</para>
i-node number for &lsquo;fundamentals.xml&rsquo;. That i-node is not a
directory, but instead holds the list of disk blocks associated with the
file.</para>
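The lookup walk above can be modeled with a made-up i-node table in Python. Everything here is hypothetical: the i-node numbers, the block numbers, and the dictionary layout. The sketch also ignores mount points, which a real kernel would cross when it reached `/home`.

```python
# Hypothetical model: i-node number -> i-node. A directory i-node maps
# names to i-node numbers; a file i-node lists its disk block numbers.
INODES = {
    2: {"type": "dir", "entries": {"home": 11}},
    11: {"type": "dir", "entries": {"esr": 12}},
    12: {"type": "dir", "entries": {"WWW": 13}},
    13: {"type": "dir", "entries": {"ldp": 14}},
    14: {"type": "dir", "entries": {"fundamentals.xml": 15}},
    15: {"type": "file", "blocks": [4711, 4712, 9050]},
}
ROOT_INODE = 2


def lookup(path):
    """Walk `path` one component at a time; return the file's block list."""
    inode = INODES[ROOT_INODE]
    for name in path.strip("/").split("/"):
        if inode["type"] != "dir":
            raise NotADirectoryError(name)
        number = inode["entries"].get(name)
        if number is None:
            raise FileNotFoundError(name)
        inode = INODES[number]           # follow the entry to the next i-node
    if inode["type"] != "file":
        raise IsADirectoryError(path)
    return inode["blocks"]


print(lookup("/home/esr/WWW/ldp/fundamentals.xml"))   # [4711, 4712, 9050]
```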
</sect2>
<sect2 id="permissions"><title>File ownership, permissions and security</title>
@ -1083,14 +1095,15 @@ changed with the programs
chown(1)<indexterm><primary>chown(1)</primary></indexterm> and
chgrp(1)<indexterm><primary>chgrp(1)</primary></indexterm>.</para>
<para>The basic permissions that can be associated with a file are `read'
(permission to read data from it), `write' (permission to modify it) and
`execute' (permission to run it as a program). Each file has three sets of
permissions; one for its owning user, one for any user in its owning group,
and one for everyone else. The `privileges' you get when you log in are
just the ability to do read, write, and execute on those files for which
the permission bits match your user ID or one of the groups you are
in, or files that have been made accessible to the world.</para>
<para>The basic permissions that can be associated with a file are
&lsquo;read&rsquo; (permission to read data from it), &lsquo;write&rsquo;
(permission to modify it) and &lsquo;execute&rsquo; (permission to run it
as a program). Each file has three sets of permissions; one for its owning
user, one for any user in its owning group, and one for everyone else. The
&lsquo;privileges&rsquo; you get when you log in are just the ability to do
read, write, and execute on those files for which the permission bits match
your user ID or one of the groups you are in, or files that have been made
accessible to the world.</para>
<para>To see how these may interact and how Unix displays them, let's look
at some file listings on a hypothetical Unix system. Here's one:</para>
@ -1100,23 +1113,25 @@ snark:~$ ls -l notes
-rw-r--r-- 1 esr users 2993 Jun 17 11:00 notes
</screen>
<para>This is an ordinary data file. The listing tells us that it's
owned by the user `esr' and was created with the owning group `users'.
Probably the machine we're on puts every ordinary user in this group by
default; other groups you commonly see on timesharing machines are `staff',
`admin', or `wheel' (for obvious reasons, groups are not very important
on single-user workstations or PCs). Your Unix may use a different default
<para>This is an ordinary data file. The listing tells us that it's owned
by the user &lsquo;esr&rsquo; and was created with the owning group
&lsquo;users&rsquo;. Probably the machine we're on puts every ordinary user in
this group by default; other groups you commonly see on timesharing
machines are &lsquo;staff&rsquo;, &lsquo;admin&rsquo;, or
&lsquo;wheel&rsquo; (for obvious reasons, groups are not very important on
single-user workstations or PCs). Your Unix may use a different default
group, perhaps one named after your user ID.</para>
<para>The string `-rw-r--r--' represents the permission bits for the file.
The very first dash is the position for the directory bit; it would show
`d' if the file were a directory, or would show `l' if the file were a
aymbolic link. After that, the first three places are user permissions,
the second three group permissions, and the third are permissions for
others (often called `world' permissions). On this file, the owning user
`esr' may read or write the file, other people in the `users' group may
read it, and everybody else in the world may read it. This is a pretty
typical set of permissions for an ordinary data file.</para>
<para>The string &lsquo;-rw-r--r--&rsquo; represents the permission bits
for the file. The very first dash is the position for the directory bit;
it would show &lsquo;d&rsquo; if the file were a directory, or would show
&lsquo;l&rsquo; if the file were a symbolic link. After that, the first
three places are user permissions, the second three group permissions, and
the third are permissions for others (often called &lsquo;world&rsquo;
permissions). On this file, the owning user &lsquo;esr&rsquo; may read or
write the file, other people in the &lsquo;users&rsquo; group may read it,
and everybody else in the world may read it. This is a pretty typical set
of permissions for an ordinary data file.</para>
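The permission string can be decoded mechanically: the low nine mode bits form three rwx triplets for owner, group, and world. A minimal Python sketch follows. `permission_string` is a hypothetical name; it ignores the setuid, setgid, and sticky bits, and shows any file type other than a directory or symbolic link as `-`. For real use, Python's standard library already provides `stat.filemode`.

```python
import stat


def permission_string(mode):
    """Render a mode the way `ls -l` does, e.g. regular 0o644 -> '-rw-r--r--'."""
    if stat.S_ISDIR(mode):
        kind = "d"
    elif stat.S_ISLNK(mode):
        kind = "l"
    else:
        kind = "-"                       # simplification: all other types
    chars = ""
    # Three triplets: owning user, owning group, everyone else ("world").
    for shift in (6, 3, 0):
        triplet = (mode >> shift) & 0o7
        chars += "r" if triplet & 4 else "-"
        chars += "w" if triplet & 2 else "-"
        chars += "x" if triplet & 1 else "-"
    return kind + chars


print(permission_string(stat.S_IFREG | 0o644))   # -rw-r--r--  (like notes)
print(permission_string(stat.S_IFREG | 0o755))   # -rwxr-xr-x  (like gcc)
print(permission_string(stat.S_IFDIR | 0o755))   # drwxr-xr-x  (like /home2/esr)
```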
<para>Now let's look at a file with very different permissions. This file
is GCC, the GNU C compiler. </para>
@ -1126,36 +1141,38 @@ snark:~$ ls -l /usr/bin/gcc
-rwxr-xr-x 3 root bin 64796 Mar 21 16:41 /usr/bin/gcc
</screen>
<para>This file belongs to a user called `root' and a group called `bin';
it can be written (modified) only by root, but read or executed by anyone.
This is a typical ownership and set of permissions for a pre-installed
system command. The `bin' group exists on some Unixes to group together
system commands (the name is a historical relic, short for `binary'). Your
Unix might use a `root' group instead (not quite the same as the `root'
<para>This file belongs to a user called &lsquo;root&rsquo; and a group
called &lsquo;bin&rsquo;; it can be written (modified) only by root, but
read or executed by anyone. This is a typical ownership and set of
permissions for a pre-installed system command. The &lsquo;bin&rsquo;
group exists on some Unixes to group together system commands (the name is
a historical relic, short for &lsquo;binary&rsquo;). Your Unix might use a
&lsquo;root&rsquo; group instead (not quite the same as the &lsquo;root&rsquo;
user!).</para>
<para>The `root' user is the conventional name for numeric user ID 0, a
special, privileged account that can override all privileges. Root access
is useful but dangerous; a typing mistake while you're logged in as root
can clobber critical system files that the same command executed from an
ordinary user account could not touch.</para>
<para>The &lsquo;root&rsquo; user is the conventional name for numeric user
ID 0, a special, privileged account that can override all privileges. Root
access is useful but dangerous; a typing mistake while you're logged in as
root can clobber critical system files that the same command executed from
an ordinary user account could not touch.</para>
<para>Because the root account is so powerful, access to it should be guarded
very carefully. Your root password is the single most critical piece of
security information on your system, and it is what any crackers and
intruders who ever come after you will be trying to get.</para>
<para>About passwords: Don't write them down &mdash; and don't pick a passwords
that can easily be guessed, like the first name of your
<para>About passwords: Don't write them down &mdash; and don't pick
passwords that can easily be guessed, like the first name of your
girlfriend/boyfriend/spouse. This is an astonishingly common bad practice
that helps crackers no end. In general, don't pick any word in the
dictionary; there are programs called <firstterm>dictionary
crackers</firstterm> that look for likely passwords by running through word
lists of common choices. A good technique is to pick a combination
consisting of a word, a digit, and another word, such as `shark6cider' or
`jump3joy'; that will make the search space too large for a dictionary
cracker. Don't use these examples, though &mdash; crackers might expect that
after reading this document and put them in their dictionaries.</para>
consisting of a word, a digit, and another word, such as
&lsquo;shark6cider&rsquo; or &lsquo;jump3joy&rsquo;; that will make the search
space too large for a dictionary cracker. Don't use these examples, though
&mdash; crackers might expect that after reading this document and put them
in their dictionaries.</para>
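<para>The search-space claim is simple arithmetic: if a cracker's word list holds N words, a one-word password falls within N guesses, while the word-digit-word pattern needs up to N times 10 times N. The sketch below assumes a hypothetical 50,000-word list; real cracker dictionaries vary widely in size, and the function names are illustrative only.</para>
<programlisting><![CDATA[
```python
# Hypothetical word-list size; real cracker dictionaries vary widely.
WORDLIST_SIZE = 50_000
DIGITS = 10

def single_word_guesses(n_words: int) -> int:
    """Worst-case guesses if the password is a single dictionary word."""
    return n_words

def word_digit_word_guesses(n_words: int, n_digits: int = DIGITS) -> int:
    """Worst-case guesses for the word + digit + word pattern ('jump3joy')."""
    return n_words * n_digits * n_words

one = single_word_guesses(WORDLIST_SIZE)
combo = word_digit_word_guesses(WORDLIST_SIZE)
print(one, combo, combo // one)  # prints 50000 25000000000 500000
```
]]></programlisting>
<para>Under that assumption the pattern multiplies the work by a factor of 10 times N, here 500,000; the more words the cracker must consider, the faster the combined space grows.</para>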
<para>Now let's look at a third case:</para>
@ -1165,9 +1182,9 @@ drwxr-xr-x 89 esr users 9216 Jun 27 11:29 /home2/esr
snark:~$
</screen>
<para>This file is a directory (note the &lsquo;d&rsquo; in the first
permissions slot). We see that it can be written only by esr, but read and
executed by anybody else.</para>
<para>Read permission gives you the ability to list the directory &mdash; that
is, to see the names of files and directories it contains. Write permission
@ -1191,18 +1208,19 @@ beneath. In particular, write access on a directory means you can
create new files or delete existing files there, but does not
automatically give you write access to existing files.</para>
<para>Finally, let's look at the permissions of the login program
itself.</para>
<screen>
snark:~$ ls -l /bin/login
-rwsr-xr-x 1 root bin 20164 Apr 17 12:57 /bin/login
</screen>
<para>This has the permissions we'd expect for a system command &mdash;
except for that &lsquo;s&rsquo; where the owner-execute bit ought to be.
This is the visible manifestation of a special permission called the
&lsquo;set-user-id&rsquo; or <firstterm>setuid
bit</firstterm><indexterm><primary>setuid bit</primary></indexterm>.</para>
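<para>Python's standard stat module can show how the setuid bit surfaces in ls output. This sketch assumes a regular file with octal mode 4755 (setuid plus rwxr-xr-x, the pattern shown for /bin/login); stat.filemode renders the owner-execute slot as &lsquo;s&rsquo;, and is_setuid is a hypothetical helper that tests the S_ISUID bit directly.</para>
<programlisting><![CDATA[
```python
import stat

def is_setuid(mode: int) -> bool:
    """Return True if the set-user-id bit (octal 4000) is set in mode."""
    return bool(mode & stat.S_ISUID)

# Hypothetical mode for a login-like program: a regular file with
# setuid (4000) plus rwxr-xr-x (755), i.e. octal 4755.
login_mode = stat.S_IFREG | stat.S_ISUID | 0o755

print(stat.filemode(login_mode))  # prints -rwsr-xr-x
print(is_setuid(login_mode))      # prints True

# Without the setuid bit the owner-execute slot shows a plain 'x'.
print(stat.filemode(stat.S_IFREG | 0o755))  # prints -rwxr-xr-x
```
]]></programlisting>
<para>On a real system you would pass os.stat(path).st_mode instead of a constructed mode. The same function renders a directory with mode 755 as &lsquo;drwxr-xr-x&rsquo;, the pattern shown for esr's home directory.</para>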
<para>The setuid bit is normally attached to programs that need to give
ordinary users the privileges of root, but in a controlled way. When it is
@ -1256,10 +1274,10 @@ your hard disk develops a bad spot?</para>
<para>If you're lucky, it will only trash some file data. If you're
unlucky, it could corrupt a directory structure or i-node number and leave
an entire subtree of your system hanging in limbo &mdash; or, worse, result
in a corrupted structure that points multiple ways at the same disk block
or i-node. Such corruption can be spread by normal file operations,
trashing data that was not in the original bad spot.</para>
<para>Fortunately, this kind of contingency has become quite uncommon as disk
hardware has become more reliable. Still, it means that your Unix will
@ -1298,18 +1316,20 @@ hackers.</para>
<para>Almost all Unix code except a small amount of direct
hardware-interface support in the kernel itself is nowadays written in a
<firstterm>high-level language</firstterm><indexterm><primary>high-level
language</primary></indexterm>. (The &lsquo;high-level&rsquo; in this term
is a historical relic meant to distinguish these from
&lsquo;low-level&rsquo; <firstterm>assembler
languages</firstterm><indexterm><primary>assembler
languages</primary></indexterm>, which are basically thin wrappers around
machine code.)</para>
<para>There are several different kinds of high-level languages. In order
to talk about these, you'll find it useful to bear in mind that the
<firstterm>source code</firstterm><indexterm><primary>source
code</primary></indexterm> of a program (the human-created, editable
version) has to go through some kind of translation into machine code that
the machine can actually run.</para>
<sect2 id="compilers"><title>Compiled languages</title>
@ -1403,12 +1423,12 @@ export directory of the host &howto-host;.</para>
connection to the machine where the document lives. To do that, it first
has to find the network location of the
<firstterm>host</firstterm><indexterm><primary>host</primary></indexterm>
&howto-host; (&lsquo;host&rsquo; is short for &lsquo;host machine&rsquo; or
&lsquo;network host&rsquo;;
&howto-host; is a typical
<firstterm>hostname</firstterm><indexterm><primary>hostname</primary></indexterm>).
The corresponding location is actually a number called an <firstterm>IP
address</firstterm><indexterm><primary>IP address</primary></indexterm>
(we'll explain the &lsquo;IP&rsquo; part of this term later).</para>
<para>To do this, your browser queries a program called a
<firstterm>name server</firstterm><indexterm><primary>name server</primary></indexterm>. The name server
@ -1432,9 +1452,10 @@ exchange bits with &howto-host; directly.</para>
<sect2 id="domains"><title>The Domain Name System</title>
<para>The whole network of programs and databases that cooperates to
translate hostnames to IP addresses is called &lsquo;DNS&rsquo; (Domain
Name System). When you see references to a &lsquo;DNS server&rsquo;, that
means what we just called a name server. Now I'll explain how the overall
system works.</para>
<para>Internet hostnames are composed of parts separated by dots. A
<firstterm>domain</firstterm><indexterm><primary>domain</primary>
@ -1446,10 +1467,10 @@ domain.</para>
<para>Each domain is defined by an <firstterm>authoritative name
server</firstterm><indexterm><primary>authoritative name server</primary>
</indexterm> that knows the IP addresses of the other machines in the
domain. The authoritative (or &lsquo;primary&rsquo;) name server may have backups in
case it goes down; if you see references to a <firstterm>secondary name
server</firstterm><indexterm><primary>secondary name server</primary>
</indexterm> (or &lsquo;secondary DNS&rsquo;), it's talking about one of those. These
secondaries typically refresh their information from their primaries every
few hours, so a change made to the hostname-to-IP mapping on the primary
will automatically be propagated.</para>
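<para>Because each dot-separated suffix of a hostname names a domain, a hostname sits inside a chain of nested domains. The following Python sketch lists that chain from the top-level domain down to the full name; the function name enclosing_domains is hypothetical, and it only manipulates strings (it performs no name server lookups).</para>
<programlisting><![CDATA[
```python
def enclosing_domains(hostname: str) -> list:
    """Return the dot-separated suffixes of hostname, top level first.

    A trailing dot (the fully qualified form) is ignored.
    """
    labels = hostname.rstrip(".").split(".")
    # labels[i:] joined back together is the suffix starting at label i;
    # walking i from the last label to the first goes from the top-level
    # domain down to the full hostname.
    return [".".join(labels[i:]) for i in range(len(labels) - 1, -1, -1)]

print(enclosing_domains("www.tldp.org"))
# prints ['org', 'tldp.org', 'www.tldp.org']
```
]]></programlisting>
<para>For &howto-host; the chain is org, then tldp.org, then the host itself; each of the enclosing domains has its own authoritative name server, as described above.</para>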
@ -1656,4 +1677,3 @@ fill-column:75
compile-command: "mail -s \"Unix and Internet Fundamentals HOWTO update\" submit@en.tldp.org <Unix-and-Internet-Fundamentals-HOWTO.xml"
End:
-->