510 lines
14 KiB
HTML
510 lines
14 KiB
HTML
<!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.0 Transitional//EN">
|
|
<HTML
|
|
><HEAD
|
|
><TITLE
|
|
>How does the Internet work?</TITLE
|
|
><META
|
|
NAME="GENERATOR"
|
|
CONTENT="Modular DocBook HTML Stylesheet Version 1.7"><LINK
|
|
REL="HOME"
|
|
TITLE="The Unix and Internet Fundamentals HOWTO"
|
|
HREF="index.html"><LINK
|
|
REL="PREVIOUS"
|
|
TITLE="How do computer languages work?"
|
|
HREF="languages.html"><LINK
|
|
REL="NEXT"
|
|
TITLE="To Learn More"
|
|
HREF="more.html"></HEAD
|
|
><BODY
|
|
CLASS="sect1"
|
|
BGCOLOR="#FFFFFF"
|
|
TEXT="#000000"
|
|
LINK="#0000FF"
|
|
VLINK="#840084"
|
|
ALINK="#0000FF"
|
|
><DIV
|
|
CLASS="NAVHEADER"
|
|
><TABLE
|
|
SUMMARY="Header navigation table"
|
|
WIDTH="100%"
|
|
BORDER="0"
|
|
CELLPADDING="0"
|
|
CELLSPACING="0"
|
|
><TR
|
|
><TH
|
|
COLSPAN="3"
|
|
ALIGN="center"
|
|
>The Unix and Internet Fundamentals HOWTO</TH
|
|
></TR
|
|
><TR
|
|
><TD
|
|
WIDTH="10%"
|
|
ALIGN="left"
|
|
VALIGN="bottom"
|
|
><A
|
|
HREF="languages.html"
|
|
ACCESSKEY="P"
|
|
>Prev</A
|
|
></TD
|
|
><TD
|
|
WIDTH="80%"
|
|
ALIGN="center"
|
|
VALIGN="bottom"
|
|
></TD
|
|
><TD
|
|
WIDTH="10%"
|
|
ALIGN="right"
|
|
VALIGN="bottom"
|
|
><A
|
|
HREF="more.html"
|
|
ACCESSKEY="N"
|
|
>Next</A
|
|
></TD
|
|
></TR
|
|
></TABLE
|
|
><HR
|
|
ALIGN="LEFT"
|
|
WIDTH="100%"></DIV
|
|
><DIV
|
|
CLASS="sect1"
|
|
><H1
|
|
CLASS="sect1"
|
|
><A
|
|
NAME="internet"
|
|
></A
|
|
>12. How does the Internet work?</H1
|
|
><P
|
|
>To help you understand how the Internet works, we'll look at the things
|
|
that happen when you do a typical Internet operation — pointing a browser
|
|
at the front page of this document at its home on the Web at the Linux
|
|
Documentation Project. This document is</P
|
|
><TABLE
|
|
BORDER="0"
|
|
BGCOLOR="#E0E0E0"
|
|
WIDTH="100%"
|
|
><TR
|
|
><TD
|
|
><FONT
|
|
COLOR="#000000"
|
|
><PRE
|
|
CLASS="screen"
|
|
> http://www.tldp.org/HOWTO/Unix-and-Internet-Fundamentals-HOWTO/index.html
|
|
</PRE
|
|
></FONT
|
|
></TD
|
|
></TR
|
|
></TABLE
|
|
><P
|
|
>which means it lives in the file
|
|
HOWTO/Unix-and-Internet-Fundamentals-HOWTO/index.html under the World Wide Web
|
|
export directory of the host www.tldp.org.</P
|
|
><DIV
|
|
CLASS="sect2"
|
|
><H2
|
|
CLASS="sect2"
|
|
><A
|
|
NAME="dns"
|
|
></A
|
|
>12.1. Names and locations</H2
|
|
><P
|
|
>The first thing your browser has to do is to establish a network
|
|
connection to the machine where the document lives. To do that, it first
|
|
has to find the network location of the
|
|
<I
|
|
CLASS="firstterm"
|
|
>host</I
|
|
>
|
|
www.tldp.org (‘host’ is short for ‘host machine’ or ‘network host';
|
|
www.tldp.org is a typical
|
|
<I
|
|
CLASS="firstterm"
|
|
>hostname</I
|
|
>).
|
|
The corresponding location is actually a number called an <I
|
|
CLASS="firstterm"
|
|
>IP
|
|
address</I
|
|
>
|
|
(we'll explain the ‘IP’ part of this term later).</P
|
|
><P
|
|
>To do this, your browser queries a program called a
|
|
<I
|
|
CLASS="firstterm"
|
|
>name server</I
|
|
>. The name server
|
|
may live on your machine, but it's more likely to run on a service machine
|
|
that yours talks to. When you sign up with an ISP, part of your setup
|
|
procedure will almost certainly involve telling your Internet software the
|
|
IP address of a nameserver on the ISP's network.</P
|
|
><P
|
|
>The name servers on different machines talk to each other, exchanging
|
|
and keeping up to date all the information needed to resolve hostnames (map
|
|
them to IP addresses). Your nameserver may query three or four different
|
|
sites across the network in the process of resolving www.tldp.org, but
|
|
this usually happens very quickly (as in less than a second). We'll look
|
|
at how nameservers detail in the next section.</P
|
|
><P
|
|
>The nameserver will tell your browser that www.tldp.org's IP
|
|
address is 152.19.254.81; knowing this, your machine will be able to
|
|
exchange bits with www.tldp.org directly.</P
|
|
></DIV
|
|
><DIV
|
|
CLASS="sect2"
|
|
><H2
|
|
CLASS="sect2"
|
|
><A
|
|
NAME="domains"
|
|
></A
|
|
>12.2. The Domain Name System</H2
|
|
><P
|
|
>The whole network of programs and databases that cooperates to
|
|
translate hostnames to IP addresses is called ‘DNS’ (Domain
|
|
Name System). When you see references to a ‘DNS server’, that
|
|
means what we just called a nameserver. Now I'll explain how the overall
|
|
system works.</P
|
|
><P
|
|
>Internet hostnames are composed of parts separated by dots. A
|
|
<I
|
|
CLASS="firstterm"
|
|
>domain</I
|
|
> is a collection of machines that share a common name suffix.
|
|
Domains can live inside other domains. For example, the machine
|
|
www.tldp.org lives in the .tldp.org subdomain of the .org
|
|
domain.</P
|
|
><P
|
|
>Each domain is defined by an <I
|
|
CLASS="firstterm"
|
|
>authoritative name
|
|
server</I
|
|
> that knows the IP addresses of the other machines in the
|
|
domain. The authoritative (or ‘primary') name server may have backups in
|
|
case it goes down; if you see references to a <I
|
|
CLASS="firstterm"
|
|
>secondary name
|
|
server</I
|
|
> or (‘secondary DNS') it's talking about one of those. These
|
|
secondaries typically refresh their information from their primaries every
|
|
few hours, so a change made to the hostname-to-IP mapping on the primary
|
|
will automatically be propagated.</P
|
|
><P
|
|
>Now here's the important part. The nameservers for a domain do
|
|
<EM
|
|
>not</EM
|
|
> have to know the locations of all the machines in
|
|
other domains (including their own subdomains); they only have to know the
|
|
location of the nameservers. In our example, the authoritative name server
|
|
for the .org domain knows the IP address of the nameserver for
|
|
.tldp.org but <EM
|
|
>not</EM
|
|
> the address of all the other
|
|
machines in .tldp.org. </P
|
|
><P
|
|
>The domains in the DNS system are arranged like a big inverted tree.
|
|
At the top are the root servers. Everybody knows the IP addresses of the
|
|
root servers; they're wired into your DNS software.
|
|
The root servers know the IP addresses of the nameservers for the
|
|
top-level domains like .com and .org, but not the addresses of machines
|
|
inside those domains. Each top-level domain server knows where the
|
|
nameservers for the domains directly beneath it are, and so forth.</P
|
|
><P
|
|
>DNS is carefully designed so that each machine can get away with the
|
|
minimum amount of knowledge it needs to have about the shape of the tree,
|
|
and local changes to subtrees can be made simply by changing one
|
|
authoritative server's database of name-to-IP-address mappings.</P
|
|
><P
|
|
>When you query for the IP address of www.tldp.org, what actually
|
|
happens is this: First, your nameserver asks a root server to tell it where
|
|
it can find a nameserver for .org. Once it knows that, it then
|
|
asks the .org server to tell it the IP address of a
|
|
.tldp.org nameserver. Once it has that, it asks the .tldp.org
|
|
nameserver to tell it the address of the host www.tldp.org.</P
|
|
><P
|
|
>Most of the time, your nameserver doesn't actually have to work that
|
|
hard. Nameservers do a lot of cacheing; when yours resolves a hostname, it
|
|
keeps the association with the resulting IP address around in memory for a
|
|
while. This is why, when you surf to a new website, you'll usually only
|
|
see a message from your browser about "Looking up" the host for the first
|
|
page you fetch. Eventually the name-to-address mapping expires and your
|
|
DNS has to re-query — this is important so you don't have invalid
|
|
information hanging around forever when a hostname changes addresses. Your
|
|
cached IP address for a site is also thrown out if the host is
|
|
unreachable. </P
|
|
></DIV
|
|
><DIV
|
|
CLASS="sect2"
|
|
><H2
|
|
CLASS="sect2"
|
|
><A
|
|
NAME="transport"
|
|
></A
|
|
>12.3. Packets and routers</H2
|
|
><P
|
|
>What the browser wants to do is send a command to the Web server on
|
|
www.tldp.org that looks like this:</P
|
|
><TABLE
|
|
BORDER="0"
|
|
BGCOLOR="#E0E0E0"
|
|
WIDTH="100%"
|
|
><TR
|
|
><TD
|
|
><FONT
|
|
COLOR="#000000"
|
|
><PRE
|
|
CLASS="screen"
|
|
> GET /LDP/HOWTO/Fundamentals.html HTTP/1.0
|
|
</PRE
|
|
></FONT
|
|
></TD
|
|
></TR
|
|
></TABLE
|
|
><P
|
|
>Here's how that happens. The command is made into a
|
|
<I
|
|
CLASS="firstterm"
|
|
>packet</I
|
|
>,
|
|
a block of bits like a telegram that is wrapped with three important
|
|
things; the <I
|
|
CLASS="firstterm"
|
|
>source address</I
|
|
> (the IP address of your machine), the
|
|
<I
|
|
CLASS="firstterm"
|
|
>destination address</I
|
|
> (152.19.254.81), and a <I
|
|
CLASS="firstterm"
|
|
>service
|
|
number</I
|
|
>
|
|
or <I
|
|
CLASS="firstterm"
|
|
>port number</I
|
|
> (80, in this case) that indicates that it's a
|
|
World Wide Web request.</P
|
|
><P
|
|
>Your machine then ships the packet down the wire (your connection to
|
|
your ISP, or local network) until it gets to a specialized machine called a
|
|
<I
|
|
CLASS="firstterm"
|
|
>router</I
|
|
>.
|
|
The router has a map of the Internet in its memory — not always a complete
|
|
one, but one that completely describes your network neighborhood and knows
|
|
how to get to the routers for other neighborhoods on the Internet.</P
|
|
><P
|
|
>Your packet may pass through several routers on the way to its
|
|
destination. Routers are smart. They watch how long it takes for other
|
|
routers to acknowledge having received a packet. They also use that
|
|
information to direct traffic over fast links. They use it to notice when
|
|
another router (or a cable) have dropped off the network, and compensate
|
|
if possible by finding another route.</P
|
|
><P
|
|
>There's an urban legend that the Internet was designed to survive
|
|
nuclear war. This is not true, but the Internet's design is extremely good
|
|
at getting reliable performance out of flaky hardware in an uncertain
|
|
world. This is directly due to the fact that its intelligence is
|
|
distributed through thousands of routers rather than concentrated in a few
|
|
massive and vulnerable switches (like the phone network). This means that
|
|
failures tend to be well localized and the network can route around
|
|
them.</P
|
|
><P
|
|
>Once your packet gets to its destination machine, that machine uses the
|
|
service number to feed the packet to the web server. The web server can
|
|
tell where to reply to by looking at the command packet's source IP
|
|
address. When the web server returns this document, it will be broken up
|
|
into a number of packets. The size of the packets will vary according to
|
|
the transmission media in the network and the type of service.</P
|
|
></DIV
|
|
><DIV
|
|
CLASS="sect2"
|
|
><H2
|
|
CLASS="sect2"
|
|
><A
|
|
NAME="TCP-IP"
|
|
></A
|
|
>12.4. TCP and IP</H2
|
|
><P
|
|
>To understand how multiple-packet transmissions are handled, you need to
|
|
know that the Internet actually uses two protocols, stacked one on top
|
|
of the other.</P
|
|
><P
|
|
>The lower level,
|
|
<I
|
|
CLASS="firstterm"
|
|
>IP</I
|
|
>
|
|
(Internet Protocol), is responsible for labeling
|
|
individual packets with the source address and destination address of two
|
|
computers exchanging information over a network.
|
|
For example, when you access http://www.tldp.org, the packets you send
|
|
will have your computer's IP address, such as 192.168.1.101, and the IP
|
|
address of the www.tldp.org computer, 152.2.210.81. These addresses
|
|
work in much the same way that your home address works when someone sends
|
|
you a letter. The post office can read the address and determine where
|
|
you are and how best to route the letter to you, much like a router does
|
|
for Internet traffic.</P
|
|
><P
|
|
>The upper level,
|
|
<I
|
|
CLASS="firstterm"
|
|
>TCP</I
|
|
>
|
|
(Transmission Control Protocol), gives you reliability. When two machines
|
|
negotiate a TCP connection (which they do using IP), the receiver knows to
|
|
send acknowledgements of the packets it sees back to the sender. If the
|
|
sender doesn't see an acknowledgement for a packet within some timeout
|
|
period, it resends that packet. Furthermore, the sender gives each TCP
|
|
packet a sequence number, which the receiver can use to reassemble packets
|
|
in case they show up out of order. (This can easily happen if network
|
|
links go up or down during a connection.)</P
|
|
><P
|
|
>TCP/IP packets also contain a checksum to enable detection of data
|
|
corrupted by bad links. (The checksum is computed from the rest of the
|
|
packet in such a way that if either the rest of the packet or the
|
|
checksum is corrupted, redoing the computation and comparing is very likely
|
|
to indicate an error.) So, from the point of view of anyone using TCP/IP
|
|
and nameservers, it looks like a reliable way to pass streams of bytes
|
|
between hostname/service-number pairs. People who write network protocols
|
|
almost never have to think about all the packetizing, packet reassembly,
|
|
error checking, checksumming, and retransmission that goes on below that
|
|
level.</P
|
|
></DIV
|
|
><DIV
|
|
CLASS="sect2"
|
|
><H2
|
|
CLASS="sect2"
|
|
><A
|
|
NAME="HTTP"
|
|
></A
|
|
>12.5. HTTP, an application protocol</H2
|
|
><P
|
|
>Now let's get back to our example. Web browsers and servers speak an
|
|
<I
|
|
CLASS="firstterm"
|
|
>application protocol</I
|
|
> that runs on top of TCP/IP, using it simply
|
|
as a way to pass strings of bytes back and forth. This protocol is called
|
|
<I
|
|
CLASS="firstterm"
|
|
>HTTP</I
|
|
>
|
|
(Hyper-Text Transfer Protocol) and we've already seen one command in it —
|
|
the GET shown above.</P
|
|
><P
|
|
>When the GET command goes to www.tldp.org's webserver with service
|
|
number 80, it will be dispatched to a <I
|
|
CLASS="firstterm"
|
|
>server
|
|
daemon</I
|
|
> listening on port 80. Most Internet services
|
|
are implemented by server daemons that do nothing but wait on ports,
|
|
watching for and executing incoming commands.</P
|
|
><P
|
|
>If the design of the Internet has one overall rule, it's that all the
|
|
parts should be as simple and human-accessible as possible. HTTP, and its
|
|
relatives (like the Simple Mail Transfer Protocol,
|
|
<I
|
|
CLASS="firstterm"
|
|
>SMTP</I
|
|
>,
|
|
that is used to move electronic mail between hosts) tend to use simple
|
|
printable-text commands that end with a carriage-return/line feed.</P
|
|
><P
|
|
>This is marginally inefficient; in some circumstances you could get more
|
|
speed by using a tightly-coded binary protocol. But experience has shown
|
|
that the benefits of having commands be easy for human beings to describe
|
|
and understand outweigh any marginal gain in efficiency that you might get
|
|
at the cost of making things tricky and opaque.</P
|
|
><P
|
|
>Therefore, what the server daemon ships back to you via TCP/IP is also
|
|
text. The beginning of the response will look something like this (a few
|
|
headers have been suppressed):</P
|
|
><TABLE
|
|
BORDER="0"
|
|
BGCOLOR="#E0E0E0"
|
|
WIDTH="100%"
|
|
><TR
|
|
><TD
|
|
><FONT
|
|
COLOR="#000000"
|
|
><PRE
|
|
CLASS="screen"
|
|
> HTTP/1.1 200 OK
|
|
Date: Sat, 10 Oct 1998 18:43:35 GMT
|
|
Server: Apache/1.2.6 Red Hat
|
|
Last-Modified: Thu, 27 Aug 1998 17:55:15 GMT
|
|
Content-Length: 2982
|
|
Content-Type: text/html
|
|
</PRE
|
|
></FONT
|
|
></TD
|
|
></TR
|
|
></TABLE
|
|
><P
|
|
>These headers will be followed by a blank line and the text of the
|
|
web page (after which the connection is dropped). Your browser just
|
|
displays that page. The headers tell it how (in particular, the
|
|
Content-Type header tells it the returned data is really HTML).</P
|
|
></DIV
|
|
></DIV
|
|
><DIV
|
|
CLASS="NAVFOOTER"
|
|
><HR
|
|
ALIGN="LEFT"
|
|
WIDTH="100%"><TABLE
|
|
SUMMARY="Footer navigation table"
|
|
WIDTH="100%"
|
|
BORDER="0"
|
|
CELLPADDING="0"
|
|
CELLSPACING="0"
|
|
><TR
|
|
><TD
|
|
WIDTH="33%"
|
|
ALIGN="left"
|
|
VALIGN="top"
|
|
><A
|
|
HREF="languages.html"
|
|
ACCESSKEY="P"
|
|
>Prev</A
|
|
></TD
|
|
><TD
|
|
WIDTH="34%"
|
|
ALIGN="center"
|
|
VALIGN="top"
|
|
><A
|
|
HREF="index.html"
|
|
ACCESSKEY="H"
|
|
>Home</A
|
|
></TD
|
|
><TD
|
|
WIDTH="33%"
|
|
ALIGN="right"
|
|
VALIGN="top"
|
|
><A
|
|
HREF="more.html"
|
|
ACCESSKEY="N"
|
|
>Next</A
|
|
></TD
|
|
></TR
|
|
><TR
|
|
><TD
|
|
WIDTH="33%"
|
|
ALIGN="left"
|
|
VALIGN="top"
|
|
>How do computer languages work?</TD
|
|
><TD
|
|
WIDTH="34%"
|
|
ALIGN="center"
|
|
VALIGN="top"
|
|
> </TD
|
|
><TD
|
|
WIDTH="33%"
|
|
ALIGN="right"
|
|
VALIGN="top"
|
|
>To Learn More</TD
|
|
></TR
|
|
></TABLE
|
|
></DIV
|
|
></BODY
|
|
></HTML
|
|
> |