old-www/LDP/nag2/x-087-2-news.algorithm.html

376 lines
8.7 KiB
HTML

<HTML
><HEAD
><TITLE
>How Does Usenet Handle News?</TITLE
><META
NAME="GENERATOR"
CONTENT="Modular DocBook HTML Stylesheet Version 1.57"><LINK
REL="HOME"
TITLE="Linux Network Administrators Guide"
HREF="index.html"><LINK
REL="UP"
TITLE="Netnews"
HREF="x-087-2-news.html"><LINK
REL="PREVIOUS"
TITLE="What Is Usenet, Anyway?"
HREF="x-087-2-news.usenet.html"><LINK
REL="NEXT"
TITLE="C News"
HREF="x-087-2-cnews.html"></HEAD
><BODY
CLASS="SECT1"
BGCOLOR="#FFFFFF"
TEXT="#000000"
LINK="#0000FF"
VLINK="#840084"
ALINK="#0000FF"
><DIV
CLASS="NAVHEADER"
><TABLE
WIDTH="100%"
BORDER="0"
CELLPADDING="0"
CELLSPACING="0"
><TR
><TH
COLSPAN="3"
ALIGN="center"
>Linux Network Administrators Guide</TH
></TR
><TR
><TD
WIDTH="10%"
ALIGN="left"
VALIGN="bottom"
><A
HREF="x-087-2-news.usenet.html"
>Prev</A
></TD
><TD
WIDTH="80%"
ALIGN="center"
VALIGN="bottom"
>Chapter 20. Netnews</TD
><TD
WIDTH="10%"
ALIGN="right"
VALIGN="bottom"
><A
HREF="x-087-2-cnews.html"
>Next</A
></TD
></TR
></TABLE
><HR
ALIGN="LEFT"
WIDTH="100%"></DIV
><DIV
CLASS="SECT1"
><H1
CLASS="SECT1"
><A
NAME="X-087-2-NEWS.ALGORITHM"
>20.3. How Does Usenet Handle News?</A
></H1
><P
>&#13;
Today, Usenet has grown to enormous proportions. Sites that carry the
whole of Netnews usually transfer something like a paltry 60 MB a day.<A
NAME="X-087-2-FNUN02"
HREF="#FTN.X-087-2-FNUN02"
>[1]</A
> Of course, this requires much
more than pushing files around. So let's take a look at the way most Unix
systems handle Usenet news.</P
><P
>
News begins when users create and post articles. Each user enters a
message into a special application called a newsreader, which
formats it appropriately for transmission to the local
news server. In Unix environments the newsreader commonly uses the
<B
CLASS="COMMAND"
>inews</B
> command to transmit articles to the
newsserver using the TCP/IP protocol. But it's also possible to
write the article directly into a file in a special directory called
the news spool. Once the posting is delivered to the local
news server, it takes responsibility for delivering the article to
other news users.</P
><P
>&#13;News is distributed through the net by various transports. The medium used to be UUCP, but today the main traffic is carried by Internet
sites. The routing algorithm used is called <I
CLASS="EMPHASIS"
>flooding</I
>.
Each site maintains a number of links (<I
CLASS="EMPHASIS"
>news feeds</I
>) to
other sites. Any article generated or received by the local news system is
forwarded to them, unless it has already been at that site, in which case it
is discarded. A site may find out about all other sites the article has
already traversed by looking at the <TT
CLASS="LITERAL"
>Path:</TT
> header field. This
header contains a list of all systems through which the article has been
forwarded in bang path notation.</P
><P
>&#13;
To distinguish articles and recognize duplicates, Usenet articles have
to carry a message ID (specified in the <TT
CLASS="LITERAL"
>Message-Id:</TT
> header
field), which combines the posting site's name and a serial number into
&#60;<TT
CLASS="REPLACEABLE"
><I
>serial</I
></TT
>@<TT
CLASS="REPLACEABLE"
><I
>site</I
></TT
>&#8201;&#62;.
For each article processed, the news system logs this ID into a
<I
CLASS="EMPHASIS"
>history</I
> file, against which all newly arrived articles
are checked.</P
><P
>&#13;
The flow between any two sites may be limited by two criteria. For one,
an article is assigned a distribution (in the <TT
CLASS="LITERAL"
>Distribution:</TT
>
header field), which may be used to confine it to a certain group of
sites. On the other hand, the newsgroups exchanged may be limited by
both the sending and receiving systems. The set of newsgroups and
distributions allowed to be transmitted to a site are usually kept in the
<TT
CLASS="FILENAME"
>sys</TT
> file.</P
><P
>&#13;
The sheer number of articles usually requires that improvements be made
to the above scheme. On UUCP networks, systems collect articles over a period
of time and combine them into a single file, which is compressed and sent to
the remote site. This is called
<I
CLASS="EMPHASIS"
>batching</I
>.</P
><P
>&#13;
An alternative technique is the <I
CLASS="EMPHASIS"
>ihave/sendme</I
> protocol that
prevents duplicate articles from being transferred,
thus saving net bandwidth. Instead of putting all articles in batch
files and sending them along, only the message IDs of articles are
combined into a giant &#8220;ihave&#8221; message and sent to the remote
site. The remote site reads this message, compares it to its history file,
and returns the list of articles it wants in a &#8220;sendme&#8221; message.
Only the requested articles are sent.</P
><P
>Of course, ihave/sendme makes sense only if it involves two big sites
that receive news from several independent feeds each, and that poll each
other often enough for an efficient flow of news.</P
><P
>&#13;Sites that are on the Internet generally rely on TCP/IP-based software that
uses the Network News Transfer Protocol (NNTP). NNTP is described in RFC-977;
it is responsible for the transfer of news between news servers and provides
Usenet access to single users on remote hosts.</P
><P
>&#13;
NNTP knows three different ways to transfer news. One is a real-time version
of ihave/sendme, also referred to as <I
CLASS="EMPHASIS"
>pushing</I
> news. The
second technique is called <I
CLASS="EMPHASIS"
>pulling</I
> news, in which the
client requests a list of articles in a given newsgroup or hierarchy that have
arrived at the server's site after a specified date, and chooses those it
cannot find in its history file. The third technique is for interactive
newsreading and allows you or your newsreader to retrieve articles from
specified newgroups, as well as post articles with incomplete header
information. </P
><P
>&#13;
At each site, news is kept in a directory hierarchy below
<TT
CLASS="FILENAME"
>/var/spool/news</TT
>, each article in a separate file, and
each newsgroup in a separate directory. The directory name is made up of the
newsgroup name, with the components being the path components. Thus,
<SPAN
CLASS="SYSTEMITEM"
>comp.os.linux.misc</SPAN
> articles are kept
in <TT
CLASS="FILENAME"
>/var/spool/news/comp/os/linux/misc</TT
>. The articles in a
newsgroup are assigned numbers in the order they arrive. This number serves as
the file's name. The range of numbers of articles currently online is kept
in a file called <TT
CLASS="FILENAME"
>active</TT
>, which at the same time serves
as a list of newsgroups your site knows.</P
><P
>&#13;
Since disk space is a finite resource, you have to start throwing away
articles after some time.<A
NAME="X-087-2-FNUN03"
HREF="#FTN.X-087-2-FNUN03"
>[2]</A
> This is called
<I
CLASS="EMPHASIS"
>expiring</I
>. Usually, articles from certain groups and
hierarchies are expired at a fixed number of days after they arrive. This
may be overridden by the poster by specifying a date of expiration in the
<TT
CLASS="LITERAL"
>Expires:</TT
> field of the article header.&#13;</P
><P
>You now have enough information to choose what to read next. UUCP users
should read about C-News in <A
HREF="x-087-2-cnews.html"
>Chapter 21</A
>. If you're using
a TCP/IP network, read about NNTP in <A
HREF="x-087-2-nntp.html"
>Chapter 22</A
>. If you
need to transfer moderate amounts of news over TCP/IP, the server described
in that chapter may be enough for you. To install a heavy-duty news server
that can handle huge volumes of material, go on to read about InterNet News
in <A
HREF="x-087-2-inn.html"
>Chapter 23</A
>.</P
></DIV
><H3
CLASS="FOOTNOTES"
>Notes</H3
><TABLE
BORDER="0"
CLASS="FOOTNOTES"
WIDTH="100%"
><TR
><TD
ALIGN="LEFT"
VALIGN="TOP"
WIDTH="5%"
><A
NAME="FTN.X-087-2-FNUN02"
HREF="x-087-2-news.algorithm.html#X-087-2-FNUN02"
>[1]</A
></TD
><TD
ALIGN="LEFT"
VALIGN="TOP"
WIDTH="95%"
><P
> Wait a minute: 60
Megs at 9,600 bps, that's 60 million multiplied by 1,024, that
is&#8230; mutter, mutter&#8230; Hey! That's 34 hours!</P
></TD
></TR
><TR
><TD
ALIGN="LEFT"
VALIGN="TOP"
WIDTH="5%"
><A
NAME="FTN.X-087-2-FNUN03"
HREF="x-087-2-news.algorithm.html#X-087-2-FNUN03"
>[2]</A
></TD
><TD
ALIGN="LEFT"
VALIGN="TOP"
WIDTH="95%"
><P
>Some people claim that Usenet is a conspiracy by modem and hard disk vendors.</P
></TD
></TR
></TABLE
><DIV
CLASS="NAVFOOTER"
><HR
ALIGN="LEFT"
WIDTH="100%"><TABLE
WIDTH="100%"
BORDER="0"
CELLPADDING="0"
CELLSPACING="0"
><TR
><TD
WIDTH="33%"
ALIGN="left"
VALIGN="top"
><A
HREF="x-087-2-news.usenet.html"
>Prev</A
></TD
><TD
WIDTH="34%"
ALIGN="center"
VALIGN="top"
><A
HREF="index.html"
>Home</A
></TD
><TD
WIDTH="33%"
ALIGN="right"
VALIGN="top"
><A
HREF="x-087-2-cnews.html"
>Next</A
></TD
></TR
><TR
><TD
WIDTH="33%"
ALIGN="left"
VALIGN="top"
>What Is Usenet, Anyway?</TD
><TD
WIDTH="34%"
ALIGN="center"
VALIGN="top"
><A
HREF="x-087-2-news.html"
>Up</A
></TD
><TD
WIDTH="33%"
ALIGN="right"
VALIGN="top"
>C News</TD
></TR
></TABLE
></DIV
></BODY
></HTML
>