<chapter id="X-087-2-news"><title>Netnews</title>

<para>
<INDEXTERM id="usenet.news" class=startofrange><PRIMARY>Usenet</PRIMARY></INDEXTERM>
Netnews, or Usenet news, remains one of the most important and highly
valued services on computer networks today. Dismissed by some as a
mire of unsolicited commercial messages and pornography, Netnews still
hosts a number of the high-quality discussion groups that
made it a critical resource in pre-web days. Even in these times of a
billion web pages, Netnews is still a source of online help and
community on many topics.
</para>

<sect1 id="X-087-2-news.history"><title>Usenet History</title>

<para>
<indexterm id="idx-news-1" class="startofrange"><primary>news (network)</primary></indexterm>
The idea of network news was born in 1979 when two graduate students, Tom
Truscott and Jim Ellis, thought of using UUCP to connect machines for
information exchange among Unix users. They set up
a small network of three machines in North Carolina.
</para>

<para>
Initially, traffic was handled by a number of shell scripts (later
rewritten in C), but they were never released to the public. They
were quickly replaced by “A News,” the first public release of news
software.
</para>

<para>
A News was not designed to handle more than a few articles
per group per day. When the volume continued to grow, it was rewritten
by Mark Horton and Matt Glickman, who called it the “B” release
(a.k.a. B News). The first public release of B News was version 2.1
in 1982. It was expanded continuously, with several new features
added. Its current version is B News 2.11. It is slowly
becoming obsolete; its last official maintainer switched to
INN.
</para>

<para>
<indexterm><primary>Collyer, Geoff</primary></indexterm>
<indexterm><primary>Spencer, Henry</primary></indexterm>
<indexterm><primary>C News</primary></indexterm>
Geoff Collyer and Henry Spencer rewrote B News and released it in 1987; this
is release “C,” or C News. Since its release, there
have been a number of patches to C News, the most prominent being the
C News Performance Release. On sites that carry a large number of groups,
the overhead involved in frequently invoking <command>relaynews</command>,
which is responsible for dispatching incoming articles to other hosts, is
significant. The Performance Release adds an option to
<command>relaynews</command> that allows it to run in
<emphasis>daemon mode</emphasis>, in which the program puts
itself in the background. The Performance Release is the C News version
currently included in most Linux distributions. We describe C News in detail in
<xref linkend="X-087-2-cnews">.
</para>

<para>
<indexterm><primary>NNTP (Network News Transfer Protocol)</primary></indexterm>
<indexterm><primary>news (network)</primary><secondary>NNTP</secondary></indexterm>
All news releases up to C were primarily targeted at UUCP networks,
although they could be used in other environments as well. Efficient news
transfer over networks like TCP/IP or DECnet required a new scheme. So in
1986, the <emphasis>Network News Transfer Protocol</emphasis> (NNTP) was
introduced. It is based on network connections and specifies a number of
commands to interactively transfer and retrieve articles.
</para>

<para>
<indexterm><primary>nntpd</primary></indexterm>
There are a number of NNTP-based applications available from the Net. One of
them is the <command>nntpd</command> package by Brian Barber and Phil Lapsley,
which you can use to provide newsreading service to a number of hosts inside a
local network. <command>nntpd</command> was designed to complement news
packages, such as B News or C News, to give them NNTP features. If you want
to use NNTP with the C News server, you should read
<xref linkend="X-087-2-nntp">, which explains how to configure the
<command>nntpd</command> daemon and run it with C News.
</para>

<para>
<indexterm><primary>INN (Internet News)</primary></indexterm>
<indexterm><primary>news (network)</primary><secondary>Internet News</secondary></indexterm>
An alternative package supporting NNTP is INN, or
<emphasis>Internet News</emphasis>. It is not just a frontend, but a news
system in its own right. It comprises a sophisticated news relay daemon that
can maintain several concurrent NNTP links efficiently, and is therefore the
news server of choice for many Internet sites. We discuss it in detail in
<xref linkend="X-087-2-inn">.
</para>

</sect1>

<sect1 id="X-087-2-news.usenet"><title>What Is Usenet, Anyway?</title>

<para>
<indexterm><primary>news (network)</primary><secondary>Usenet</secondary></indexterm>
<indexterm><primary>Zen</primary></indexterm>
<indexterm><primary>news (network)</primary><secondary>exchanging</secondary></indexterm>
<indexterm><primary>exchanging</primary><secondary>news</secondary></indexterm>
<indexterm><primary>news (network)</primary><secondary>feeding</secondary></indexterm>
One of the most astounding facts about Usenet is that it isn't part of
any organization, nor does it have any sort of centralized network management
authority. In fact, it's part of Usenet lore that except for a technical
description, you cannot define <emphasis>what</emphasis> it is; at the risk
of sounding stupid, one might define Usenet as a collaboration
of separate sites that exchange Usenet news. To be a Usenet site, all you
have to do is find another Usenet site and strike an agreement with its
owners and maintainers to exchange news with you. Providing another site
with news is called <emphasis>feeding</emphasis> it, whence another
common axiom of Usenet philosophy originates: “Get a feed, and you're
on it.”
</para>

<?troff .wcon_off>

<para>
<?troff .hw consists>
<indexterm><primary>news (network)</primary><secondary>articles</secondary></indexterm>
<indexterm><primary>articles (news)</primary></indexterm>
The basic unit of Usenet news is the
<emphasis>article</emphasis>. This is a message a user writes and
“posts” to the net. So that news systems can
deal with it, it is prefixed with administrative information, the
so-called article header. The header is very similar to the mail header format
laid down in the Internet mail standard RFC-822, in that it consists
<?troff .ne 10>
of several lines of text, each beginning with a field name terminated
by a colon, which is followed by the field's value.<footnote
id="X-087-2-FNUN01"><para> The format of Usenet news messages is
specified in RFC-1036, “Standard for interchange of USENET
messages.”
</para>
</footnote>

</para>
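
<para>
The header layout described above can be illustrated with a short Python
sketch. It is not taken from any news package: the function name
<literal>parse_article</literal>, the sample article, and the addresses in it
are all made up for illustration, and the sketch assumes that a blank line
separates the header from the body and that lines starting with whitespace
continue the previous field.
</para>

```python
def parse_article(text):
    """Split a raw article into (header fields, body).

    Header names are lower-cased so lookups are case-insensitive.
    """
    head, _, body = text.partition("\n\n")
    fields = {}
    last = None
    for line in head.split("\n"):
        if line[:1] in (" ", "\t") and last is not None:
            # A line starting with whitespace continues the previous field.
            fields[last] += " " + line.strip()
            continue
        name, colon, value = line.partition(":")
        if not colon:
            continue  # no colon: not a header field, ignored in this sketch
        last = name.strip().lower()
        fields[last] = value.strip()
    return fields, body


# Hypothetical article; the addresses and IDs are invented for illustration.
SAMPLE = (
    "Path: groucho.edu!brewhq!joe\n"
    "From: joe@brewhq.example\n"
    "Newsgroups: comp.os.linux.announce\n"
    "Subject: Re: a test\n"
    "Message-ID: <1234@brewhq.example>\n"
    "\n"
    "Hello, net!\n"
)

fields, body = parse_article(SAMPLE)
print(fields["newsgroups"])
print(fields["subject"])
```

<para>
Only the first colon on a line separates the field name from its value, so
a subject such as <literal>Re: a test</literal> survives intact.
</para>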

<para>
<indexterm><primary>news (network)</primary><secondary>groups</secondary></indexterm>
<indexterm><primary>newsgroups</primary></indexterm>
<?troff .ffn>
Articles are submitted to one or more <emphasis>newsgroups</emphasis>. One may
consider a newsgroup a forum for articles relating to a common topic. All
newsgroups are organized in a hierarchy, with each group's name indicating
its place in the hierarchy. This often makes it easy to see what a group is
all about. For example, anybody can see from the newsgroup name that
<systemitem role="newsgroup">comp.os.linux.announce</systemitem> is used for
announcements concerning a computer operating system named Linux.
</para>

<para>
<indexterm><primary>news (network)</primary><secondary>exchanging</secondary></indexterm>
<indexterm><primary>news (network)</primary><secondary>feeding</secondary></indexterm>
Articles are then exchanged between all Usenet sites that are
willing to carry news from the group in question. When two sites agree to exchange
news, they are free to exchange whatever newsgroups they like, and
may even add their own local news hierarchies. For example,
<systemitem role="sitename">groucho.edu</systemitem> might have a news link
to <systemitem role="sitename">barnyard.edu</systemitem>, which
is a major news feed, and several links to minor sites to which it feeds
news. Now Barnyard College might receive all Usenet groups, while GMU
only wants to carry a few major hierarchies like
<systemitem role="newsgroup">sci</systemitem>,
<systemitem role="newsgroup">comp</systemitem>, or
<systemitem role="newsgroup">rec</systemitem>. Some of the downstream
sites, say a UUCP site called <systemitem role="sitename">brewhq</systemitem>,
will want to carry even fewer groups, because they don't have the network or
hardware resources. On the other hand,
<systemitem role="sitename">brewhq</systemitem> might want to receive
newsgroups from the <systemitem role="newsgroup">fj</systemitem>
hierarchy, which GMU doesn't carry. It therefore maintains another link
with <systemitem role="sitename">gargleblaster.com</systemitem>, which carries
all <systemitem role="newsgroup">fj</systemitem> groups and feeds
them to <systemitem role="sitename">brewhq</systemitem>. The news flow is
shown in <xref linkend="X-087-2-news.fig.article-flow">.
</para>

<figure id="X-087-2-news.fig.article-flow">
<title>Usenet newsflow through Groucho Marx University</title>
<!-- <graphic fileref="lag2_2001.jpg"></graphic> -->
<!-- 2016-01-28; MAB, TLDP -->
<mediaobject>
<imageobject>
<imagedata fileref="images/lag2_2001.jpg" format="jpg">
</imageobject>
<imageobject>
<imagedata fileref="images/lag2_2001.eps" format="eps">
</imageobject>
</mediaobject>
</figure>

<para>
The labels on the arrows originating from
<systemitem role="sitename">brewhq</systemitem> may require some explanation.
By default, it wants all locally generated news to be sent to
<systemitem role="sitename">groucho.edu</systemitem>. However, as
<systemitem role="sitename">groucho.edu</systemitem> does not carry the
<systemitem role="newsgroup">fj</systemitem> groups, there's no point in
sending it any messages from those groups. Therefore, the feed from
<systemitem role="sitename">brewhq</systemitem> to GMU is labeled
<literal>all,!fj</literal>, meaning that all groups
except those below <systemitem role="newsgroup">fj</systemitem> are sent to it.
</para>
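
<para>
How a label such as <literal>all,!fj</literal> selects groups can be modeled
in a few lines of Python. This is a simplified sketch, not the matching code
of any real news package: the function name <literal>feed_wants</literal> and
the rule that the most specific matching pattern decides are assumptions made
for illustration, and the <literal>fj</literal> group name used below is
hypothetical.
</para>

```python
def feed_wants(subscription, group):
    """Return True if `group` is selected by a list such as "all,!fj".

    Simplified model: "all" matches every group, a name matches itself and
    everything below it in the hierarchy, and a leading "!" excludes.
    When several patterns match, the one naming the most hierarchy
    components decides.
    """
    best_len, wanted = -1, False
    for pattern in subscription.split(","):
        pattern = pattern.strip()
        negated = pattern.startswith("!")
        name = pattern.lstrip("!")
        if name == "all":
            length = 0
        elif group == name or group.startswith(name + "."):
            length = len(name.split("."))
        else:
            continue  # pattern does not apply to this group
        if length > best_len:
            best_len, wanted = length, not negated
    return wanted


print(feed_wants("all,!fj", "comp.os.linux.announce"))
print(feed_wants("all,!fj", "fj.example.group"))  # hypothetical group name
```

<para>
With the label from the figure, every group is offered to GMU except those
in the <systemitem role="newsgroup">fj</systemitem> hierarchy.
</para>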

</sect1>

<sect1 id="X-087-2-news.algorithm"><title>How Does Usenet Handle News?</title>

<para>
<indexterm id="idx-newsexchanging-1" class="startofrange"><primary>news (network)</primary><secondary>exchanging</secondary></indexterm>
<indexterm id="idx-newsfeeding-1" class="startofrange"><primary>news (network)</primary><secondary>feeding</secondary></indexterm>
<indexterm><primary>news (network)</primary><secondary>flooding algorithm</secondary></indexterm>
<indexterm><primary>flooding algorithm</primary></indexterm>
Today, Usenet has grown to enormous proportions. Sites that carry the
whole of Netnews usually transfer something like a paltry 60 MB a day.<footnote id="X-087-2-FNUN02"><para> Wait a minute: 60
Megs at 9,600 bps, that's 60 million divided by 1,200, that
is… mutter, mutter… Hey! That's about 14 hours!
</para>
</footnote> Of course, this requires much
more than pushing files around. So let's take a look at the way most Unix
systems handle Usenet news.
</para>
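
<para>
To get a feel for what 60 MB a day means on a 9,600 bps line, the transfer
time can be worked out directly. The sketch below rests on assumptions that
are not from the text: one megabyte is taken as one million bytes, and
compression and protocol overhead are ignored. The second call assumes ten
bits on the wire per byte, as with start and stop bits on a serial line.
</para>

```python
def transfer_hours(megabytes, bits_per_second, bits_per_byte=8):
    """Hours needed to move `megabytes` (10**6 bytes each) over a link."""
    total_bytes = megabytes * 1_000_000
    bytes_per_second = bits_per_second / bits_per_byte
    return total_bytes / bytes_per_second / 3600


print(round(transfer_hours(60, 9600), 1))       # 8 bits per byte
print(round(transfer_hours(60, 9600, 10), 1))   # 10 bits per byte on the wire
```

<para>
Either way, a full feed would keep such a line busy for well over half of
every day.
</para>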

<para>
News begins when users create and post articles. Each user enters a
message into a special application called a newsreader, which
formats it appropriately for transmission to the local
news server. In Unix environments the newsreader commonly uses the
<command>inews</command> command to transmit articles to the
news server over TCP/IP. But it's also possible to
write the article directly into a file in a special directory called
the news spool. Once the posting is delivered to the local
news server, the server takes responsibility for delivering the article to
other news users.
</para>

<para>
<indexterm><primary>news feeds</primary></indexterm>
News is distributed through the net by various transports. The medium
used to be UUCP, but today the main traffic is carried by Internet
sites. The routing algorithm used is called <emphasis>flooding</emphasis>.
Each site maintains a number of links (<emphasis>news feeds</emphasis>) to
other sites. Any article generated or received by the local news system is
forwarded to each of them, unless it has already been seen at that site, in
which case it is discarded. A site may find out about all other sites the article has
already traversed by looking at the <literal>Path:</literal> header field. This
header contains a list of all systems through which the article has been
forwarded, in bang path notation.
</para>
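
<para>
The flooding rule can be sketched as follows. This is an illustrative model
only: the function name <literal>forward_targets</literal> and the user name
<literal>joe</literal> at the end of the path are hypothetical, and the
sketch assumes that each site puts its own name at the front of the
<literal>Path:</literal> value before passing the article on.
</para>

```python
def forward_targets(path, local_site, feeds):
    """Decide where a flooded article goes next.

    `path` is the Path: value in bang notation, `feeds` the sites this
    host has links to. Returns the updated path and the feeds that have
    not yet seen the article.
    """
    seen = path.split("!")
    targets = [site for site in feeds if site not in seen]
    new_path = local_site + "!" + path
    return new_path, targets


# An article from barnyard.edu arrives at groucho.edu, which feeds
# barnyard.edu and brewhq. Only brewhq still needs it.
new_path, targets = forward_targets(
    "barnyard.edu!joe", "groucho.edu", ["barnyard.edu", "brewhq"]
)
print(new_path)
print(targets)
```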

<para>
<indexterm><primary>news (network)</primary><secondary>message IDs</secondary></indexterm>
<indexterm><primary>news (network)</primary><secondary>history</secondary></indexterm>
To distinguish articles and recognize duplicates, Usenet articles have
to carry a message ID (specified in the <literal>Message-Id:</literal> header
field), which combines the posting site's name and a serial number into
&lt;<replaceable>serial</replaceable>@<replaceable>site</replaceable>&gt;.
For each article processed, the news system logs this ID into a
<emphasis>history</emphasis> file, against which all newly arrived articles
are checked.
</para>
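
<para>
Duplicate detection by message ID amounts to a membership test. In the
sketch below an in-memory set stands in for the history file, which real
news systems keep on disk; the class name <literal>History</literal> and the
sample ID are assumptions made for illustration.
</para>

```python
class History:
    """Remembers the message IDs of all articles processed so far."""

    def __init__(self):
        self._seen = set()  # stand-in for the on-disk history file

    def accept(self, message_id):
        """Return True for a new article, False for a duplicate."""
        if message_id in self._seen:
            return False
        self._seen.add(message_id)
        return True


history = History()
print(history.accept("<1234@brewhq>"))  # first arrival
print(history.accept("<1234@brewhq>"))  # same ID again, via another feed
```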

<para>
<indexterm><primary>news (network)</primary><secondary>limiting a feed</secondary></indexterm>
<indexterm><primary>news (network)</primary><secondary>distribution</secondary></indexterm>
The flow between any two sites may be limited by two criteria. First,
an article is assigned a distribution (in the <literal>Distribution:</literal>
header field), which may be used to confine it to a certain group of
sites. Second, the newsgroups exchanged may be limited by
both the sending and receiving systems. The set of newsgroups and
distributions allowed to be transmitted to a site is usually kept in the
<filename>sys</filename> file.
</para>

<para>
<indexterm><primary>news (network)</primary><secondary>batching</secondary></indexterm>
<indexterm><primary>batching</primary><secondary>news</secondary></indexterm>
<indexterm id="idx-deliveringnews-1" class="startofrange"><primary>delivering</primary><secondary>news</secondary></indexterm>
The sheer number of articles usually requires that improvements be made
to the above scheme. On UUCP networks, systems collect articles over a period
of time and combine them into a single file, which is compressed and sent to
the remote site. This is called
<emphasis>batching</emphasis>.
</para>
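
<para>
A batch can be pictured as a simple concatenation with a length marker in
front of each article. The sketch below assumes the
<literal>#! rnews</literal> separator line described in RFC-1036, which this
chapter does not spell out, and uses Python's <literal>zlib</literal> module
purely as a stand-in for whatever compression program a site really runs.
The function names are hypothetical.
</para>

```python
import zlib


def make_batch(articles):
    """Join articles into one batch, each preceded by '#! rnews <bytes>'."""
    parts = []
    for text in articles:
        data = text.encode("utf-8")
        parts.append(b"#! rnews %d\n" % len(data))
        parts.append(data)
    return b"".join(parts)


def split_batch(batch):
    """Recover the individual articles from a batch built by make_batch."""
    articles = []
    pos = 0
    while pos < len(batch):
        end = batch.index(b"\n", pos)
        marker, keyword, count = batch[pos:end].split()
        if (marker, keyword) != (b"#!", b"rnews"):
            raise ValueError("not an rnews batch")
        start = end + 1
        size = int(count)
        articles.append(batch[start:start + size].decode("utf-8"))
        pos = start + size
    return articles


queued = ["Subject: one\n\nfirst\n", "Subject: two\n\nsecond body\n"]
packed = zlib.compress(make_batch(queued))   # zlib is only a stand-in here
print(split_batch(zlib.decompress(packed)) == queued)
```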

<para>
<indexterm><primary>news (network)</primary><secondary>ihave/sendme</secondary></indexterm>
<indexterm><primary>ihave/sendme protocol (news)</primary></indexterm>
An alternative technique is the <emphasis>ihave/sendme</emphasis> protocol,
which prevents duplicate articles from being transferred,
thus saving net bandwidth. Instead of putting all articles in batch
files and sending them along, only the message IDs of articles are
combined into a giant “ihave” message and sent to the remote
site. The remote site reads this message, compares it to its history file,
and returns the list of articles it wants in a “sendme” message.
Only the requested articles are sent.
</para>

<para>
Of course, ihave/sendme makes sense only if it involves two big sites
that receive news from several independent feeds each, and that poll each
other often enough for an efficient flow of news.
</para>
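
<para>
Stripped of the transport details, the ihave/sendme exchange is a comparison
of ID lists. The following sketch models only that logic; the function names,
the dictionary used as a spool, and the message IDs are hypothetical, and
real implementations carry the two lists in control messages rather than
function calls.
</para>

```python
def make_ihave(outgoing_ids):
    """Sender: list the message IDs it is prepared to send."""
    return list(outgoing_ids)


def make_sendme(ihave_ids, history):
    """Receiver: keep only the IDs missing from its history."""
    return [mid for mid in ihave_ids if mid not in history]


def articles_to_send(sendme_ids, spool):
    """Sender: look up the requested articles by message ID."""
    return [spool[mid] for mid in sendme_ids]


# Hypothetical sender spool and receiver history.
spool = {"<1@a>": "first", "<2@a>": "second", "<3@a>": "third"}
receiver_history = {"<2@a>"}

ihave = make_ihave(spool)
sendme = make_sendme(ihave, receiver_history)
print(sendme)
print(articles_to_send(sendme, spool))
```

<para>
The saving depends on how much of the ihave list the receiver already holds,
which is why the scheme pays off mainly between sites with several
independent feeds.
</para>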

<para>
<indexterm><primary>news (network)</primary><secondary>NNTP</secondary></indexterm>
Sites that are on the Internet generally rely on TCP/IP-based software that
uses the Network News Transfer Protocol (NNTP). NNTP is described in RFC-977;
it handles the transfer of news between news servers and provides
Usenet access to individual users on remote hosts.
</para>

<para>
<indexterm><primary>news (network)</primary><secondary>pulling</secondary></indexterm>
<indexterm><primary>news (network)</primary><secondary>pushing</secondary></indexterm>
NNTP provides three different ways to transfer news. One is a real-time version
of ihave/sendme, also referred to as <emphasis>pushing</emphasis> news. The
second technique is called <emphasis>pulling</emphasis> news, in which the
client requests a list of articles in a given newsgroup or hierarchy that have
arrived at the server's site after a specified date, and chooses those it
cannot find in its history file. The third technique is for interactive
newsreading and allows you or your newsreader to retrieve articles from
specified newsgroups, as well as post articles with incomplete header
information.
</para>
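
<para>
The three modes correspond to different NNTP commands. The sketch below only
builds the command lines a client would send; it does not open a network
connection. The command names <literal>IHAVE</literal>,
<literal>NEWNEWS</literal>, <literal>GROUP</literal>, and
<literal>ARTICLE</literal> are taken from RFC-977 rather than from this
chapter, and the Python function names are hypothetical.
</para>

```python
from datetime import datetime


def push_command(message_id):
    """Pushing: offer one article to the remote server."""
    return "IHAVE " + message_id


def pull_command(groups, since):
    """Pulling: ask for IDs of articles that arrived after `since` (GMT)."""
    return "NEWNEWS {} {} GMT".format(groups, since.strftime("%y%m%d %H%M%S"))


def reading_commands(group, article_number):
    """Newsreading: select a group, then fetch one article by number."""
    return ["GROUP " + group, "ARTICLE {}".format(article_number)]


print(push_command("<1234@brewhq>"))
print(pull_command("comp.os.linux.*", datetime(1999, 12, 31, 23, 59, 0)))
print(reading_commands("comp.os.linux.misc", 42))
```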

<indexterm class="endofrange" startref="idx-deliveringnews-1">
<indexterm class="endofrange" startref="idx-newsexchanging-1">
<indexterm class="endofrange" startref="idx-newsfeeding-1">

<para>
<?troff .hw /var/spool/news/-comp/os/linux/misc>
<indexterm><primary>news (network)</primary><secondary>spool</secondary></indexterm>
<indexterm><primary>news (network)</primary><secondary>active file</secondary></indexterm>
At each site, news is kept in a directory hierarchy below
<filename>/var/spool/news</filename>, each article in a separate file, and
each newsgroup in a separate directory. The directory name is made up of the
newsgroup name, with its dot-separated components becoming the path components. Thus,
<systemitem role="newsgroup">comp.os.linux.misc</systemitem> articles are kept
in <filename>/var/spool/news/comp/os/linux/misc</filename>. The articles in a
newsgroup are assigned numbers in the order they arrive. This number serves as
the file's name. The range of numbers of articles currently online is kept
in a file called <filename>active</filename>, which at the same time serves
as a list of newsgroups your site knows.
</para>
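
<para>
The mapping from newsgroup name to spool location is mechanical, as this
short sketch shows. The function names are hypothetical, and the article
number used in the example is made up.
</para>

```python
import posixpath

SPOOL = "/var/spool/news"


def group_directory(group, spool=SPOOL):
    """Turn a dotted newsgroup name into its spool directory."""
    return posixpath.join(spool, *group.split("."))


def article_file(group, number, spool=SPOOL):
    """Path of one article: the group directory plus the article number."""
    return posixpath.join(group_directory(group, spool), str(number))


print(group_directory("comp.os.linux.misc"))
print(article_file("comp.os.linux.misc", 1234))
```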

<?troff .Nd 10>

<para>
<indexterm><primary>news (network)</primary><secondary>deleting old news</secondary></indexterm>
<indexterm><primary>news (network)</primary><secondary>expiration of</secondary></indexterm>
Since disk space is a finite resource, you have to start throwing away
articles after some time.<footnote id="X-087-2-FNUN03">
<para>
Some people claim that Usenet is a conspiracy by modem and hard disk vendors.
</para>
</footnote> This is called
<emphasis>expiring</emphasis>. Usually, articles from certain groups and
hierarchies are expired a fixed number of days after they arrive. This
may be overridden by the poster by specifying a date of expiration in the
<literal>Expires:</literal> field of the article header.

</para>
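
<para>
The expiry rule can be written down as a small decision function. This is a
sketch under simplifying assumptions: the name <literal>is_expired</literal>
and its parameters are hypothetical, the <literal>Expires:</literal> value is
assumed to be already parsed into a date, and any limits a site may place on
poster-supplied expiry dates are left out.
</para>

```python
from datetime import datetime, timedelta


def is_expired(arrived, now, keep_days, expires=None):
    """Return True if an article should be removed from the spool.

    `keep_days` is the site's retention period for the group. A date
    from the article's Expires: field, if present, takes precedence.
    """
    if expires is not None:
        return now >= expires
    return now >= arrived + timedelta(days=keep_days)


arrived = datetime(2000, 1, 1)
now = datetime(2000, 1, 10)
print(is_expired(arrived, now, keep_days=7))
print(is_expired(arrived, now, keep_days=7, expires=datetime(2000, 2, 1)))
```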

<para>
You now have enough information to choose what to read next. UUCP users
should read about C News in <xref linkend="X-087-2-cnews">. If you're using
a TCP/IP network, read about NNTP in <xref linkend="X-087-2-nntp">. If you
need to transfer moderate amounts of news over TCP/IP, the server described
in that chapter may be enough for you. To install a heavy-duty news server
that can handle huge volumes of material, go on to read about Internet News
in <xref linkend="X-087-2-inn">.
</para>

<indexterm class="endofrange" startref="idx-news-1">
<INDEXTERM startref="usenet.news" class=endofrange>
</sect1>

</chapter>