old-www/HOWTO/Unix-and-Internet-Fundament.../internet.html

510 lines
14 KiB
HTML

<!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.0 Transitional//EN">
<HTML
><HEAD
><TITLE
>How does the Internet work?</TITLE
><META
NAME="GENERATOR"
CONTENT="Modular DocBook HTML Stylesheet Version 1.7"><LINK
REL="HOME"
TITLE="The Unix and Internet Fundamentals HOWTO"
HREF="index.html"><LINK
REL="PREVIOUS"
TITLE="How do computer languages work?"
HREF="languages.html"><LINK
REL="NEXT"
TITLE="To Learn More"
HREF="more.html"></HEAD
><BODY
CLASS="sect1"
BGCOLOR="#FFFFFF"
TEXT="#000000"
LINK="#0000FF"
VLINK="#840084"
ALINK="#0000FF"
><DIV
CLASS="NAVHEADER"
><TABLE
SUMMARY="Header navigation table"
WIDTH="100%"
BORDER="0"
CELLPADDING="0"
CELLSPACING="0"
><TR
><TH
COLSPAN="3"
ALIGN="center"
>The Unix and Internet Fundamentals HOWTO</TH
></TR
><TR
><TD
WIDTH="10%"
ALIGN="left"
VALIGN="bottom"
><A
HREF="languages.html"
ACCESSKEY="P"
>Prev</A
></TD
><TD
WIDTH="80%"
ALIGN="center"
VALIGN="bottom"
></TD
><TD
WIDTH="10%"
ALIGN="right"
VALIGN="bottom"
><A
HREF="more.html"
ACCESSKEY="N"
>Next</A
></TD
></TR
></TABLE
><HR
ALIGN="LEFT"
WIDTH="100%"></DIV
><DIV
CLASS="sect1"
><H1
CLASS="sect1"
><A
NAME="internet"
></A
>12. How does the Internet work?</H1
><P
>To help you understand how the Internet works, we'll look at the things
that happen when you do a typical Internet operation &#8212; pointing a browser
at the front page of this document at its home on the Web at the Linux
Documentation Project. This document is</P
><TABLE
BORDER="0"
BGCOLOR="#E0E0E0"
WIDTH="100%"
><TR
><TD
><FONT
COLOR="#000000"
><PRE
CLASS="screen"
>&#13;http://www.tldp.org/HOWTO/Unix-and-Internet-Fundamentals-HOWTO/index.html
</PRE
></FONT
></TD
></TR
></TABLE
><P
>which means it lives in the file
HOWTO/Unix-and-Internet-Fundamentals-HOWTO/index.html under the World Wide Web
export directory of the host www.tldp.org.</P
><DIV
CLASS="sect2"
><H2
CLASS="sect2"
><A
NAME="dns"
></A
>12.1. Names and locations</H2
><P
>The first thing your browser has to do is to establish a network
connection to the machine where the document lives. To do that, it first
has to find the network location of the
<I
CLASS="firstterm"
>host</I
>
www.tldp.org (&#8216;host&#8217; is short for &#8216;host machine&#8217; or &#8216;network host';
www.tldp.org is a typical
<I
CLASS="firstterm"
>hostname</I
>).
The corresponding location is actually a number called an <I
CLASS="firstterm"
>IP
address</I
>
(we'll explain the &#8216;IP&#8217; part of this term later).</P
><P
>To do this, your browser queries a program called a
<I
CLASS="firstterm"
>name server</I
>. The name server
may live on your machine, but it's more likely to run on a service machine
that yours talks to. When you sign up with an ISP, part of your setup
procedure will almost certainly involve telling your Internet software the
IP address of a nameserver on the ISP's network.</P
><P
>The name servers on different machines talk to each other, exchanging
and keeping up to date all the information needed to resolve hostnames (map
them to IP addresses). Your nameserver may query three or four different
sites across the network in the process of resolving www.tldp.org, but
this usually happens very quickly (as in less than a second). We'll look
at how nameservers detail in the next section.</P
><P
>The nameserver will tell your browser that www.tldp.org's IP
address is 152.19.254.81; knowing this, your machine will be able to
exchange bits with www.tldp.org directly.</P
></DIV
><DIV
CLASS="sect2"
><H2
CLASS="sect2"
><A
NAME="domains"
></A
>12.2. The Domain Name System</H2
><P
>The whole network of programs and databases that cooperates to
translate hostnames to IP addresses is called &#8216;DNS&#8217; (Domain
Name System). When you see references to a &#8216;DNS server&#8217;, that
means what we just called a nameserver. Now I'll explain how the overall
system works.</P
><P
>Internet hostnames are composed of parts separated by dots. A
<I
CLASS="firstterm"
>domain</I
> is a collection of machines that share a common name suffix.
Domains can live inside other domains. For example, the machine
www.tldp.org lives in the .tldp.org subdomain of the .org
domain.</P
><P
>Each domain is defined by an <I
CLASS="firstterm"
>authoritative name
server</I
> that knows the IP addresses of the other machines in the
domain. The authoritative (or &#8216;primary') name server may have backups in
case it goes down; if you see references to a <I
CLASS="firstterm"
>secondary name
server</I
> or (&#8216;secondary DNS') it's talking about one of those. These
secondaries typically refresh their information from their primaries every
few hours, so a change made to the hostname-to-IP mapping on the primary
will automatically be propagated.</P
><P
>Now here's the important part. The nameservers for a domain do
<EM
>not</EM
> have to know the locations of all the machines in
other domains (including their own subdomains); they only have to know the
location of the nameservers. In our example, the authoritative name server
for the .org domain knows the IP address of the nameserver for
.tldp.org but <EM
>not</EM
> the address of all the other
machines in .tldp.org. </P
><P
>The domains in the DNS system are arranged like a big inverted tree.
At the top are the root servers. Everybody knows the IP addresses of the
root servers; they're wired into your DNS software.
The root servers know the IP addresses of the nameservers for the
top-level domains like .com and .org, but not the addresses of machines
inside those domains. Each top-level domain server knows where the
nameservers for the domains directly beneath it are, and so forth.</P
><P
>DNS is carefully designed so that each machine can get away with the
minimum amount of knowledge it needs to have about the shape of the tree,
and local changes to subtrees can be made simply by changing one
authoritative server's database of name-to-IP-address mappings.</P
><P
>When you query for the IP address of www.tldp.org, what actually
happens is this: First, your nameserver asks a root server to tell it where
it can find a nameserver for .org. Once it knows that, it then
asks the .org server to tell it the IP address of a
.tldp.org nameserver. Once it has that, it asks the .tldp.org
nameserver to tell it the address of the host www.tldp.org.</P
><P
>Most of the time, your nameserver doesn't actually have to work that
hard. Nameservers do a lot of cacheing; when yours resolves a hostname, it
keeps the association with the resulting IP address around in memory for a
while. This is why, when you surf to a new website, you'll usually only
see a message from your browser about "Looking up" the host for the first
page you fetch. Eventually the name-to-address mapping expires and your
DNS has to re-query &#8212; this is important so you don't have invalid
information hanging around forever when a hostname changes addresses. Your
cached IP address for a site is also thrown out if the host is
unreachable. </P
></DIV
><DIV
CLASS="sect2"
><H2
CLASS="sect2"
><A
NAME="transport"
></A
>12.3. Packets and routers</H2
><P
>What the browser wants to do is send a command to the Web server on
www.tldp.org that looks like this:</P
><TABLE
BORDER="0"
BGCOLOR="#E0E0E0"
WIDTH="100%"
><TR
><TD
><FONT
COLOR="#000000"
><PRE
CLASS="screen"
>&#13;GET /LDP/HOWTO/Fundamentals.html HTTP/1.0
</PRE
></FONT
></TD
></TR
></TABLE
><P
>Here's how that happens. The command is made into a
<I
CLASS="firstterm"
>packet</I
>,
a block of bits like a telegram that is wrapped with three important
things; the <I
CLASS="firstterm"
>source address</I
> (the IP address of your machine), the
<I
CLASS="firstterm"
>destination address</I
> (152.19.254.81), and a <I
CLASS="firstterm"
>service
number</I
>
or <I
CLASS="firstterm"
>port number</I
> (80, in this case) that indicates that it's a
World Wide Web request.</P
><P
>Your machine then ships the packet down the wire (your connection to
your ISP, or local network) until it gets to a specialized machine called a
<I
CLASS="firstterm"
>router</I
>.
The router has a map of the Internet in its memory &#8212; not always a complete
one, but one that completely describes your network neighborhood and knows
how to get to the routers for other neighborhoods on the Internet.</P
><P
>Your packet may pass through several routers on the way to its
destination. Routers are smart. They watch how long it takes for other
routers to acknowledge having received a packet. They also use that
information to direct traffic over fast links. They use it to notice when
another router (or a cable) have dropped off the network, and compensate
if possible by finding another route.</P
><P
>There's an urban legend that the Internet was designed to survive
nuclear war. This is not true, but the Internet's design is extremely good
at getting reliable performance out of flaky hardware in an uncertain
world. This is directly due to the fact that its intelligence is
distributed through thousands of routers rather than concentrated in a few
massive and vulnerable switches (like the phone network). This means that
failures tend to be well localized and the network can route around
them.</P
><P
>Once your packet gets to its destination machine, that machine uses the
service number to feed the packet to the web server. The web server can
tell where to reply to by looking at the command packet's source IP
address. When the web server returns this document, it will be broken up
into a number of packets. The size of the packets will vary according to
the transmission media in the network and the type of service.</P
></DIV
><DIV
CLASS="sect2"
><H2
CLASS="sect2"
><A
NAME="TCP-IP"
></A
>12.4. TCP and IP</H2
><P
>To understand how multiple-packet transmissions are handled, you need to
know that the Internet actually uses two protocols, stacked one on top
of the other.</P
><P
>The lower level,
<I
CLASS="firstterm"
>IP</I
>
(Internet Protocol), is responsible for labeling
individual packets with the source address and destination address of two
computers exchanging information over a network.
For example, when you access http://www.tldp.org, the packets you send
will have your computer's IP address, such as 192.168.1.101, and the IP
address of the www.tldp.org computer, 152.2.210.81. These addresses
work in much the same way that your home address works when someone sends
you a letter. The post office can read the address and determine where
you are and how best to route the letter to you, much like a router does
for Internet traffic.</P
><P
>The upper level,
<I
CLASS="firstterm"
>TCP</I
>
(Transmission Control Protocol), gives you reliability. When two machines
negotiate a TCP connection (which they do using IP), the receiver knows to
send acknowledgements of the packets it sees back to the sender. If the
sender doesn't see an acknowledgement for a packet within some timeout
period, it resends that packet. Furthermore, the sender gives each TCP
packet a sequence number, which the receiver can use to reassemble packets
in case they show up out of order. (This can easily happen if network
links go up or down during a connection.)</P
><P
>TCP/IP packets also contain a checksum to enable detection of data
corrupted by bad links. (The checksum is computed from the rest of the
packet in such a way that if either the rest of the packet or the
checksum is corrupted, redoing the computation and comparing is very likely
to indicate an error.) So, from the point of view of anyone using TCP/IP
and nameservers, it looks like a reliable way to pass streams of bytes
between hostname/service-number pairs. People who write network protocols
almost never have to think about all the packetizing, packet reassembly,
error checking, checksumming, and retransmission that goes on below that
level.</P
></DIV
><DIV
CLASS="sect2"
><H2
CLASS="sect2"
><A
NAME="HTTP"
></A
>12.5. HTTP, an application protocol</H2
><P
>Now let's get back to our example. Web browsers and servers speak an
<I
CLASS="firstterm"
>application protocol</I
> that runs on top of TCP/IP, using it simply
as a way to pass strings of bytes back and forth. This protocol is called
<I
CLASS="firstterm"
>HTTP</I
>
(Hyper-Text Transfer Protocol) and we've already seen one command in it &#8212;
the GET shown above.</P
><P
>When the GET command goes to www.tldp.org's webserver with service
number 80, it will be dispatched to a <I
CLASS="firstterm"
>server
daemon</I
> listening on port 80. Most Internet services
are implemented by server daemons that do nothing but wait on ports,
watching for and executing incoming commands.</P
><P
>If the design of the Internet has one overall rule, it's that all the
parts should be as simple and human-accessible as possible. HTTP, and its
relatives (like the Simple Mail Transfer Protocol,
<I
CLASS="firstterm"
>SMTP</I
>,
that is used to move electronic mail between hosts) tend to use simple
printable-text commands that end with a carriage-return/line feed.</P
><P
>This is marginally inefficient; in some circumstances you could get more
speed by using a tightly-coded binary protocol. But experience has shown
that the benefits of having commands be easy for human beings to describe
and understand outweigh any marginal gain in efficiency that you might get
at the cost of making things tricky and opaque.</P
><P
>Therefore, what the server daemon ships back to you via TCP/IP is also
text. The beginning of the response will look something like this (a few
headers have been suppressed):</P
><TABLE
BORDER="0"
BGCOLOR="#E0E0E0"
WIDTH="100%"
><TR
><TD
><FONT
COLOR="#000000"
><PRE
CLASS="screen"
>&#13;HTTP/1.1 200 OK
Date: Sat, 10 Oct 1998 18:43:35 GMT
Server: Apache/1.2.6 Red Hat
Last-Modified: Thu, 27 Aug 1998 17:55:15 GMT
Content-Length: 2982
Content-Type: text/html
</PRE
></FONT
></TD
></TR
></TABLE
><P
>These headers will be followed by a blank line and the text of the
web page (after which the connection is dropped). Your browser just
displays that page. The headers tell it how (in particular, the
Content-Type header tells it the returned data is really HTML).</P
></DIV
></DIV
><DIV
CLASS="NAVFOOTER"
><HR
ALIGN="LEFT"
WIDTH="100%"><TABLE
SUMMARY="Footer navigation table"
WIDTH="100%"
BORDER="0"
CELLPADDING="0"
CELLSPACING="0"
><TR
><TD
WIDTH="33%"
ALIGN="left"
VALIGN="top"
><A
HREF="languages.html"
ACCESSKEY="P"
>Prev</A
></TD
><TD
WIDTH="34%"
ALIGN="center"
VALIGN="top"
><A
HREF="index.html"
ACCESSKEY="H"
>Home</A
></TD
><TD
WIDTH="33%"
ALIGN="right"
VALIGN="top"
><A
HREF="more.html"
ACCESSKEY="N"
>Next</A
></TD
></TR
><TR
><TD
WIDTH="33%"
ALIGN="left"
VALIGN="top"
>How do computer languages work?</TD
><TD
WIDTH="34%"
ALIGN="center"
VALIGN="top"
>&nbsp;</TD
><TD
WIDTH="33%"
ALIGN="right"
VALIGN="top"
>To Learn More</TD
></TR
></TABLE
></DIV
></BODY
></HTML
>