LDP/LDP/guide/docbook/nag2/ch20.sgm

382 lines
18 KiB
Plaintext

<chapter id="X-087-2-news"><title>Netnews</title>
<para>
<INDEXTERM id="usenet.news" class=startofrange><PRIMARY>Usenet</PRIMARY></INDEXTERM>
Netnews, or Usenet news, remains one of the most important and highly
valued services on computer networks today. Dismissed by some as a
mire of unsolicited commercial email and pornography, Netnews still
maintains several cases of the high-quality discussion groups that
made it a critical resource in pre-web days. Even in these times of a
billion web pages, Netnews is still a source for online help and
community on many topics.
</para>
<sect1 id="X-087-2-news.history"><title>Usenet History</title>
<para>
<indexterm id="idx-news-1" class="startofrange"><primary>news (network)</primary></indexterm>
The idea of network news was born in 1979 when two graduate students, Tom
Truscott and Jim Ellis, thought of using UUCP to connect machines for
information exchange among Unix users. They set up
a small network of three machines in North Carolina.
</para>
<para>
Initially, traffic was handled by a number of shell scripts (later
rewritten in C), but they were never released to the public. They
were quickly replaced by &ldquo;A News,&rdquo; the first public release of news
software.
</para>
<para>
A News was not designed to handle more than a few articles
per group and day. When the volume continued to grow, it was rewritten
by Mark Horton and Matt Glickman, who called it the &ldquo;B&rdquo; release
(a.k.a. B News). The first public release of B News was version 2.1
in 1982. It was expanded continuously, with several new features
added. Its current version is B News 2.11. It is slowly
becoming obsolete; its last official maintainer switched to
INN.
</para>
<para>
<indexterm><primary>Collyer, Geoff</primary></indexterm>
<indexterm><primary>Spencer, Henry</primary></indexterm>
<indexterm><primary>C News</primary></indexterm>
Geoff Collyer and Henry Spencer rewrote B News and released it in 1987; this
is release &ldquo;C,&rdquo; or C News. Since its release, there
have been a number of patches to C News, the most prominent being the
C News Performance Release. On sites that carry a large number of groups,
the overhead involved in frequently invoking <command>relaynews</command>,
which is responsible for dispatching incoming articles to other hosts, is
significant. The Performance Release adds an option to
<command>relaynews</command> that allows it to run in
<emphasis>daemon mode</emphasis>, through which the program puts
itself in the background. The Performance Release is the C News version
currently included in most Linux releases. We describe C News in detail in
<xref linkend="X-087-2-cnews">.
</para>
<para>
<indexterm><primary>NNTP (Network News Transfer Protocol)</primary></indexterm>
<indexterm><primary>news (network)</primary><secondary>NNTP</secondary></indexterm>
All news releases up to C were primarily targeted for UUCP networks,
although they could be used in other environments, as well. Efficient news
transfer over networks like TCP/IP or DECNet required a new scheme. So in
1986, the <emphasis>Network News Transfer Protocol</emphasis> (NNTP) was
introduced. It is based on network connections and specifies a number of
commands to interactively transfer and retrieve articles.
</para>
<para>
<indexterm><primary>nntpd</primary></indexterm>
There are a number of NNTP-based applications available from the Net. One of
them is the <command>nntpd</command> package by Brian Barber and Phil Lapsley,
which you can use to provide newsreading service to a number of hosts inside a
local network. <command>nntpd</command> was designed to complement news
packages, such as B News or C News, to give them NNTP features. If you want
to use NNTP with the C News server, you should read
<xref linkend="X-087-2-nntp">, which explains how to configure the
<command>nntpd</command> daemon and run it with C News.
</para>
<para>
<indexterm><primary>INN (Internet News)</primary></indexterm>
<indexterm><primary>news (network)</primary><secondary>Internet News</secondary></indexterm>
An alternative package supporting NNTP is INN, or
<emphasis>Internet News</emphasis>. It is not just a frontend, but a news
system in its own right. It comprises a sophisticated news relay daemon that
can maintain several concurrent NNTP links efficiently, and is therefore the
news server of choice for many Internet sites. We discuss it in detail in
<xref linkend="X-087-2-inn">.
</para>
</sect1>
<sect1 id="X-087-2-news.usenet"><title>What Is Usenet, Anyway?</title>
<para>
<indexterm><primary>news (network)</primary><secondary>Usenet</secondary></indexterm>
<indexterm><primary>Zen</primary></indexterm>
<indexterm><primary>news (network)</primary><secondary>exchanging</secondary></indexterm>
<indexterm><primary>exchanging</primary><secondary>news</secondary></indexterm>
<indexterm><primary>news (network)</primary><secondary>feeding</secondary></indexterm>
One of the most astounding facts about Usenet is that it isn't part of
any organization, nor does it have any sort of centralized network management
authority. In fact, it's part of Usenet lore that except for a technical
description, you cannot define <emphasis>what</emphasis> it is; at the risk of sounding stupid, one might define Usenet as a collaboration
of separate sites that exchange Usenet news. To be a Usenet site, all you
have to do is find another Usenet site and strike an agreement with its
owners and maintainers to exchange news with you. Providing another site
with news is called <emphasis>feeding</emphasis> it, whence another
common axiom of Usenet philosophy originates: &ldquo;Get a feed, and you're
on it.&rdquo;
</para>
<?troff .wcon_off>
<para>
<?troff .hw consists>
<indexterm><primary>news (network)</primary><secondary>articles</secondary></indexterm>
<indexterm><primary>articles (news)</primary></indexterm>
The basic unit of Usenet news is the
<emphasis>article</emphasis>. This is a message a user writes and
&ldquo;posts&rdquo; to the net. In order to enable news systems to
deal with it, it is prepended with administrative information, the
so-called article header. It is very similar to the mail header format
laid down in the Internet mail standard RFC-822, in that it consists
<?troff .ne 10>
of several lines of text, each beginning with a field name terminated
by a colon, which is followed by the field's value.<footnote
id="X-087-2-FNUN01"><para> The format of Usenet news messages is
specified in RFC-1036, &ldquo;Standard for interchange of USENET
messages.&rdquo;
</para>
</footnote>
</para>
<para>
<indexterm><primary>news (network)</primary><secondary>groups</secondary></indexterm>
<indexterm><primary>newsgroups</primary></indexterm>
<?troff .ffn>
Articles are submitted to one or more <emphasis>newsgroup</emphasis>. One may
consider a newsgroup a forum for articles relating to a common topic. All
newsgroups are organized in a hierarchy, with each group's name indicating
its place in the hierarchy. This often makes it easy to see what a group is
all about. For example, anybody can see from the newsgroup name that
<systemitem role="newsgroup">comp.os.linux.announce</systemitem> is used for
announcements concerning a computer operating system named Linux.
</para>
<para>
<indexterm><primary>news (network)</primary><secondary>exchanging</secondary></indexterm>
<indexterm><primary>news (network)</primary><secondary>feeding</secondary></indexterm>
These articles are then exchanged between all Usenet sites that are
willing to carry news from this group. When two sites agree to exchange
news, they are free to exchange whatever newsgroups they like, and
may even add their own local news hierarchies. For example,
<systemitem role="sitename">groucho.edu</systemitem> might have a news link
to <systemitem role="sitename">barnyard.edu</systemitem>, which
is a major news feed, and several links to minor sites which it feeds
news. Now Barnyard College might receive all Usenet groups, while GMU
only wants to carry a few major hierarchies like
<systemitem role="newsgroup">sci</systemitem>,
<systemitem role="newsgroup">comp</systemitem>, or
<systemitem role="newsgroup">rec</systemitem>. Some of the downstream
sites, say a UUCP site called <systemitem role="sitename">brewhq</systemitem>,
will want to carry even fewer groups, because they don't have the network or
hardware resources. On the other hand,
<systemitem role="sitename">brewhq</systemitem> might want to receive
newsgroups from the <systemitem role="newsgroup">fj</systemitem>
hierarchy, which GMU doesn't carry. It therefore maintains another link
with <systemitem role="sitename">gargleblaster.com</systemitem>, which carries
all <systemitem role="newsgroup">fj</systemitem> groups and feeds
them to <systemitem role="sitename">brewhq</systemitem>. The news flow is
shown in <xref linkend="X-087-2-news.fig.article-flow">.
</para>
<figure id="X-087-2-news.fig.article-flow">
<title>Usenet newsflow through Groucho Marx University</title>
<!-- <graphic fileref="lag2_2001.jpg"></graphic> -->
<!-- 2016-01-28; MAB, TLDP -->
<mediaobject>
<imageobject>
<imagedata fileref="images/lag2_2001.jpg" format="jpg">
</imageobject>
<imageobject>
<imagedata fileref="images/lag2_2001.eps" format="eps">
</imageobject>
</mediaobject>
</figure>
<para>
The labels on the arrows originating from
<systemitem role="sitename">brewhq</systemitem> may require some explanation,
though. By default, it wants all locally generated news to be sent to
<systemitem role="sitename">groucho.edu</systemitem>. However, as
<systemitem role="sitename">groucho.edu</systemitem> does not carry the
<systemitem role="newsgroup">fj</systemitem> groups, there's no point in
sending it any messages from those groups. Therefore, the feed from
<systemitem role="sitename">brewhq</systemitem> to GMU is labeled
<literal>all,!fj</literal>, meaning that all groups
except those below <systemitem role="newsgroup">fj</systemitem> are sent to it.
</para>
</sect1>
<sect1 id="X-087-2-news.algorithm"><title>How Does Usenet Handle News?</title>
<para>
<indexterm id="idx-newsexchanging-1" class="startofrange"><primary>news (network)</primary><secondary>exchanging</secondary></indexterm>
<indexterm id="idx-newsfeeding-1" class="startofrange"><primary>news (network)</primary><secondary>feeding</secondary></indexterm>
<indexterm><primary>news (network)</primary><secondary>flooding algorithm</secondary></indexterm>
<indexterm><primary>flooding algorithm</primary></indexterm>
Today, Usenet has grown to enormous proportions. Sites that carry the
whole of Netnews usually transfer something like a paltry 60 MB a day.<footnote id="X-087-2-FNUN02"><para> Wait a minute: 60
Megs at 9,600 bps, that's 60 million multiplied by 1,024, that
is&hellip; mutter, mutter&hellip; Hey! That's 34 hours!
</para>
</footnote> Of course, this requires much
more than pushing files around. So let's take a look at the way most Unix
systems handle Usenet news.
</para>
<para>
News begins when users create and post articles. Each user enters a
message into a special application called a newsreader, which
formats it appropriately for transmission to the local
news server. In Unix environments the newsreader commonly uses the
<command>inews</command> command to transmit articles to the
newsserver using the TCP/IP protocol. But it's also possible to
write the article directly into a file in a special directory called
the news spool. Once the posting is delivered to the local
news server, it takes responsibility for delivering the article to
other news users.
</para>
<para>
<indexterm><primary>news feeds</primary></indexterm>
News is distributed through the net by various transports. The medium used to be UUCP, but today the main traffic is carried by Internet
sites. The routing algorithm used is called <emphasis>flooding</emphasis>.
Each site maintains a number of links (<emphasis>news feeds</emphasis>) to
other sites. Any article generated or received by the local news system is
forwarded to them, unless it has already been at that site, in which case it
is discarded. A site may find out about all other sites the article has
already traversed by looking at the <literal>Path:</literal> header field. This
header contains a list of all systems through which the article has been
forwarded in bang path notation.
</para>
<para>
<indexterm><primary>news (network)</primary><secondary>message IDs</secondary></indexterm>
<indexterm><primary>news (network)</primary><secondary>history</secondary></indexterm>
To distinguish articles and recognize duplicates, Usenet articles have
to carry a message ID (specified in the <literal>Message-Id:</literal> header
field), which combines the posting site's name and a serial number into
&lt;<replaceable>serial</replaceable>@<replaceable>site</replaceable>&thinsp;&gt;.
For each article processed, the news system logs this ID into a
<emphasis>history</emphasis> file, against which all newly arrived articles
are checked.
</para>
<para>
<indexterm><primary>news (network)</primary><secondary>limiting a feed</secondary></indexterm>
<indexterm><primary>news (network)</primary><secondary>distribution</secondary></indexterm>
The flow between any two sites may be limited by two criteria. For one,
an article is assigned a distribution (in the <literal>Distribution:</literal>
header field), which may be used to confine it to a certain group of
sites. On the other hand, the newsgroups exchanged may be limited by
both the sending and receiving systems. The set of newsgroups and
distributions allowed to be transmitted to a site are usually kept in the
<filename>sys</filename> file.
</para>
<para>
<indexterm><primary>news (network)</primary><secondary>batching</secondary></indexterm>
<indexterm><primary>batching</primary><secondary>news</secondary></indexterm>
<indexterm id="idx-deliveringnews-1" class="startofrange"><primary>delivering</primary><secondary>news</secondary></indexterm>
The sheer number of articles usually requires that improvements be made
to the above scheme. On UUCP networks, systems collect articles over a period
of time and combine them into a single file, which is compressed and sent to
the remote site. This is called
<emphasis>batching</emphasis>.
</para>
<para>
<indexterm><primary>news (network)</primary><secondary>ihave/sendme</secondary></indexterm>
<indexterm><primary>ihave/sendme protocol (news)</primary></indexterm>
An alternative technique is the <emphasis>ihave/sendme</emphasis> protocol that
prevents duplicate articles from being transferred,
thus saving net bandwidth. Instead of putting all articles in batch
files and sending them along, only the message IDs of articles are
combined into a giant &ldquo;ihave&rdquo; message and sent to the remote
site. The remote site reads this message, compares it to its history file,
and returns the list of articles it wants in a &ldquo;sendme&rdquo; message.
Only the requested articles are sent.
</para>
<para>
Of course, ihave/sendme makes sense only if it involves two big sites
that receive news from several independent feeds each, and that poll each
other often enough for an efficient flow of news.
</para>
<para>
<indexterm><primary>news (network)</primary><secondary>NNTP</secondary></indexterm>
Sites that are on the Internet generally rely on TCP/IP-based software that
uses the Network News Transfer Protocol (NNTP). NNTP is described in RFC-977;
it is responsible for the transfer of news between news servers and provides
Usenet access to single users on remote hosts.
</para>
<para>
<indexterm><primary>news (network)</primary><secondary>pulling</secondary></indexterm>
<indexterm><primary>news (network)</primary><secondary>pushing</secondary></indexterm>
NNTP knows three different ways to transfer news. One is a real-time version
of ihave/sendme, also referred to as <emphasis>pushing</emphasis> news. The
second technique is called <emphasis>pulling</emphasis> news, in which the
client requests a list of articles in a given newsgroup or hierarchy that have
arrived at the server's site after a specified date, and chooses those it
cannot find in its history file. The third technique is for interactive
newsreading and allows you or your newsreader to retrieve articles from
specified newgroups, as well as post articles with incomplete header
information.
</para>
<indexterm class="endofrange" startref="idx-deliveringnews-1">
<indexterm class="endofrange" startref="idx-newsexchanging-1">
<indexterm class="endofrange" startref="idx-newsfeeding-1">
<para>
<?troff .hw /var/spool/news/-comp/os/linux/misc>
<indexterm><primary>news (network)</primary><secondary>spool</secondary></indexterm>
<indexterm><primary>news (network)</primary><secondary>active file</secondary></indexterm>
At each site, news is kept in a directory hierarchy below
<filename>/var/spool/news</filename>, each article in a separate file, and
each newsgroup in a separate directory. The directory name is made up of the
newsgroup name, with the components being the path components. Thus,
<systemitem role="newsgroup">comp.os.linux.misc</systemitem> articles are kept
in <filename>/var/spool/news/comp/os/linux/misc</filename>. The articles in a
newsgroup are assigned numbers in the order they arrive. This number serves as
the file's name. The range of numbers of articles currently online is kept
in a file called <filename>active</filename>, which at the same time serves
as a list of newsgroups your site knows.
</para>
<?troff .Nd 10>
<para>
<indexterm><primary>news (network)</primary><secondary>deleting old news</secondary></indexterm>
<indexterm><primary>news (network)</primary><secondary>expiration of</secondary></indexterm>
Since disk space is a finite resource, you have to start throwing away
articles after some time.<footnote id="X-087-2-FNUN03">
<para>
Some people claim that Usenet is a conspiracy by modem and hard disk vendors.
</para>
</footnote> This is called
<emphasis>expiring</emphasis>. Usually, articles from certain groups and
hierarchies are expired at a fixed number of days after they arrive. This
may be overridden by the poster by specifying a date of expiration in the
<literal>Expires:</literal> field of the article header.
</para>
<para>
You now have enough information to choose what to read next. UUCP users
should read about C-News in <xref linkend="X-087-2-cnews">. If you're using
a TCP/IP network, read about NNTP in <xref linkend="X-087-2-nntp">. If you
need to transfer moderate amounts of news over TCP/IP, the server described
in that chapter may be enough for you. To install a heavy-duty news server
that can handle huge volumes of material, go on to read about InterNet News
in <xref linkend="X-087-2-inn">.
</para>
<indexterm class="endofrange" startref="idx-news-1">
<INDEXTERM startref="usenet.news" class=endofrange>
</sect1>
</chapter>