<HTML
|
|
><HEAD
|
|
><TITLE
|
|
>Principles of Operation</TITLE
|
|
><META
|
|
NAME="GENERATOR"
|
|
CONTENT="Modular DocBook HTML Stylesheet Version 1.76b+
|
|
"><LINK
|
|
REL="HOME"
|
|
TITLE="Usenet News HOWTO "
|
|
HREF="index.html"><LINK
|
|
REL="PREVIOUS"
|
|
TITLE="What is the Usenet?"
|
|
HREF="x27.html"><LINK
|
|
REL="NEXT"
|
|
TITLE="Usenet news software"
|
|
HREF="x248.html"></HEAD
|
|
><BODY
|
|
CLASS="SECTION"
|
|
BGCOLOR="#FFFFFF"
|
|
TEXT="#000000"
|
|
LINK="#0000FF"
|
|
VLINK="#840084"
|
|
ALINK="#0000FF"
|
|
|
|
><DIV
|
|
CLASS="SECTION"
|
|
><H1
|
|
CLASS="SECTION"
|
|
><A
|
|
NAME="AEN64">2. Principles of Operation</H1
|
|
><P
|
|
>Here we discuss the basic concepts behind the operation of a Usenet news
|
|
system.</P
|
|
><DIV
|
|
CLASS="SECTION"
|
|
><H2
|
|
CLASS="SECTION"
|
|
><A
|
|
NAME="AEN67">2.1. Newsgroups and articles</H2
|
|
><P
|
|
>A Usenet news article sits in a file, or in some other data structure,
on the disks of a Usenet server, and its contents look
like this:</P
|
|
><TABLE
|
|
BORDER="0"
|
|
BGCOLOR="#E0E0E0"
|
|
WIDTH="100%"
|
|
><TR
|
|
><TD
|
|
><FONT
|
|
COLOR="#000000"
|
|
><PRE
|
|
CLASS="PROGRAMLISTING"
|
|
>Xref: news.starcomsoftware.com starcom.tech.misc:211 starcom.tech.security:452
Newsgroups: starcom.tech.misc,starcom.tech.security
Path: news.starcomsoftware.com!purva!shuvam
From: Shuvam <shuvam@starcomsoftware.com>
Subject: "You just throw up your hands and reboot" (fwd)
Content-Type: TEXT/PLAIN; charset=US-ASCII
Distribution: starcom
Organization: Starcom Software Pvt Ltd, India
Message-ID: <Pine.LNX.4.31.0107022153490.30462-100000@starcomsoftware.com>
Mime-Version: 1.0
Date: Mon, 2 Jul 2001 16:27:57 GMT

Interesting quote, and interesting article.

Incidentally, comp.risks may be an interesting newsgroup to follow. We
must be receiving the feed for this group on our server, since we
receive all groups under comp.*, unless specifically cancelled. Check it
out sometime.

comp.risks tracks risks in the use of computer technology, including
issues in protecting ourselves from failures of such stuff.

Shuvam

> Date: Thu, 14 Jun 2001 08:11:00 -0400
> From: "Chris Norloff" <cnorloff@norloff.com>
> Subject: NYSE: "Throw up your hands and reboot"
>
> When the New York Stock Exchange computer systems crashed for 85
> minutes (8 Jun 2001), Andrew Brooks, chief of equity trading at
> Baltimore mutual fund giant T. Rowe Price, was quoted as saying "Hey,
> we're all subject to the vagaries of technology. It happens on your
> own PC at home. You just throw up your hands and reboot."
>
> http://www.washingtonpost.com/ac3/ContentServer?articleid=A42885-2001Jun8&pagename=article
>
> Chris Norloff
>
>
> This is from --
>
> From: risko@csl.sri.com (RISKS List Owner)
> Newsgroups: comp.risks
> Subject: Risks Digest 21.48
> Date: Mon, 18 Jun 2001 19:14:57 +0000 (UTC)
> Organization: University of California, Berkeley
>
> RISKS-LIST: Risks-Forum Digest Monday 19 June 2001
> Volume 21 : Issue 48
>
> FORUM ON RISKS TO THE PUBLIC IN COMPUTERS AND RELATED SYSTEMS (comp.risks)
> ACM Committee on Computers and Public Policy,
> Peter G. Neumann, moderator
>
> This issue is archived at <URL:http://catless.ncl.ac.uk/Risks/21.48.html>
> and by anonymous ftp at ftp.sri.com, cd risks .
> </PRE
|
|
></FONT
|
|
></TD
|
|
></TR
|
|
></TABLE
|
|
><P
|
|
>A Usenet article's header is very interesting if you want to learn
|
|
about the functioning of the Usenet. The <TT
|
|
CLASS="LITERAL"
|
|
>From</TT
|
|
>,
|
|
<TT
|
|
CLASS="LITERAL"
|
|
>Subject</TT
|
|
>, and <TT
|
|
CLASS="LITERAL"
|
|
>Date</TT
|
|
> headers are
|
|
familiar to anyone who has used email. The <TT
|
|
CLASS="LITERAL"
|
|
>Message-ID</TT
|
|
>
|
|
header contains a unique ID for each message, and is present in each
|
|
email message, though not many non-technical email users know about it.
|
|
The <TT
|
|
CLASS="LITERAL"
|
|
>Content-Type</TT
|
|
> and <TT
|
|
CLASS="LITERAL"
|
|
>Mime-Version</TT
|
|
>
|
|
headers are used for MIME encoding of articles, carrying file
attachments, and so on, just like in email messages.</P
|
|
><P
|
|
>The <TT
|
|
CLASS="LITERAL"
|
|
>Organization</TT
|
|
> header is an informational header
|
|
which is supposed to carry some information identifying the organisation
|
|
to which the author of the article belongs. What remains now are the
|
|
<TT
|
|
CLASS="LITERAL"
|
|
>Newsgroups</TT
|
|
>, <TT
|
|
CLASS="LITERAL"
|
|
>Xref</TT
|
|
>,
|
|
<TT
|
|
CLASS="LITERAL"
|
|
>Path</TT
|
|
> and <TT
|
|
CLASS="LITERAL"
|
|
>Distribution</TT
|
|
> headers.
|
|
These are special to Usenet articles and are very important.</P
|
|
><P
|
|
>The <TT
|
|
CLASS="LITERAL"
|
|
>Newsgroups</TT
|
|
> header specifies which newsgroups
|
|
this article should belong to. The <TT
|
|
CLASS="LITERAL"
|
|
>Distribution</TT
|
|
>
|
|
header, sadly under-utilised in today's globalised Internet world,
|
|
allows the author of an article to specify how far the article will be
|
|
re-transmitted. The author of an article, working in conjunction with
|
|
well-configured networks of Usenet servers, can control the ``radius'' of
|
|
replication of his article, thus posting an article of local significance
|
|
into a newsgroup but setting the <TT
|
|
CLASS="LITERAL"
|
|
>Distribution</TT
|
|
> header to
|
|
some suitable setting, <EM
|
|
>e.g.</EM
|
|
> <TT
|
|
CLASS="LITERAL"
|
|
>local</TT
|
|
>
|
|
or <TT
|
|
CLASS="LITERAL"
|
|
>starcom</TT
|
|
>, to prevent the article from being relayed
|
|
to servers outside the specified domain.</P
|
|
><P
|
|
>The <TT
|
|
CLASS="LITERAL"
|
|
>Xref</TT
|
|
> header specifies the precise
|
|
<STRONG
|
|
>article number</STRONG
|
|
> of this article in each of the
|
|
newsgroups in which it is inserted, for the current server. When an
|
|
article is copied from one server to another as part of a newsfeed,
|
|
the receiving server throws away the old <TT
|
|
CLASS="LITERAL"
|
|
>Xref</TT
|
|
> header
|
|
and inserts its own, with its own article numbers. This indicates an
|
|
interesting feature of the Usenet system: each article in a Usenet server
|
|
has a unique number (an integer) for each newsgroup it is a part of.
|
|
Our sample above has been added to two newsgroups on our server, and has
|
|
the article numbers 211 and 452 in those groups. Therefore, any Usenet
|
|
client software can query our server and ask for article number 211 in
|
|
the newsgroup <TT
|
|
CLASS="LITERAL"
|
|
>starcom.tech.misc</TT
|
|
> and get this article.
|
|
Asking for article number 452 in <TT
|
|
CLASS="LITERAL"
|
|
>starcom.tech.security</TT
|
|
>
|
|
will fetch the article too. On another server, the numbers may be very
|
|
different.</P
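><P
>As a minimal sketch of how a script might read this header, the Python
snippet below splits the sample Xref value into the server name and a
mapping from newsgroup to article number. The function name parse_xref is
our own invention for illustration and is not part of any Usenet
software.</P
>
```python
def parse_xref(value):
    """Split an Xref header value into (server, {newsgroup: article_number})."""
    server, *locations = value.split()
    numbers = {}
    for location in locations:
        # Each location has the form "newsgroup:number".
        group, _, number = location.rpartition(":")
        numbers[group] = int(number)
    return server, numbers


xref = "news.starcomsoftware.com starcom.tech.misc:211 starcom.tech.security:452"
server, numbers = parse_xref(xref)
print(server)
print(numbers)
```
<P
>The same article therefore has two (newsgroup, number) addresses on this
server, and either of them can be used to fetch it.</P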
|
|
><P
|
|
>The <TT
|
|
CLASS="LITERAL"
|
|
>Path</TT
|
|
> specifies the list of machines through
|
|
which this article has travelled before it has reached the current
|
|
server. UUCP-style syntax is used for this string. The current
|
|
example indicates that a user called <TT
|
|
CLASS="LITERAL"
|
|
>shuvam</TT
|
|
> first
|
|
wrote this article and posted it onto a computer which calls itself
|
|
<TT
|
|
CLASS="LITERAL"
|
|
>purva</TT
|
|
>, and this computer then transferred this article
|
|
by a newsfeed to <TT
|
|
CLASS="LITERAL"
|
|
>news.starcomsoftware.com</TT
|
|
>. The
|
|
<TT
|
|
CLASS="LITERAL"
|
|
>Path</TT
|
|
> header is critical for breaking loops in
|
|
newsfeeds, and will be discussed in detail later.</P
|
|
><P
|
|
>Our sample article will sit in the two newsgroups listed above
|
|
forever, unless expired. The Usenet software on a server is usually
|
|
configured to expire articles based on certain conditions,
|
|
<EM
|
|
>e.g.</EM
|
|
> after it's older than a certain number of
|
|
days. The C-News software we use allows expiry control based on the
|
|
newsgroup hierarchy and the type of newsgroup, <EM
|
|
>i.e.</EM
|
|
>
|
|
moderated or unmoderated. Against each class of newsgroups, it allows
|
|
the administrator to specify a number of days after which the article
|
|
will be expired. It is possible for an article to control its own
|
|
expiry, by carrying an <TT
|
|
CLASS="LITERAL"
|
|
>Expires</TT
|
|
> header specifying a
|
|
date and time. Unless overridden in the Usenet server software, the
|
|
article will be expired only after its explicit expiry time is
|
|
reached.</P
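><P
>The expiry decision described above can be sketched in a few lines of
Python. This is only an illustration under simplifying assumptions: the
function is_expired and its parameters are hypothetical, a single retention
period stands in for the per-class settings, and an explicit Expires time is
always honoured.</P
>
```python
from datetime import datetime, timedelta, timezone


def is_expired(posted, now, retention_days, expires=None):
    """Return True if the article should be removed at time `now`."""
    if expires is not None:
        # An explicit Expires header takes precedence over the default period.
        return now >= expires
    return now - posted > timedelta(days=retention_days)


posted = datetime(2001, 7, 2, 16, 27, 57, tzinfo=timezone.utc)
now = datetime(2001, 7, 20, tzinfo=timezone.utc)
explicit = datetime(2001, 8, 1, tzinfo=timezone.utc)

print(is_expired(posted, now, retention_days=14))                    # True
print(is_expired(posted, now, retention_days=14, expires=explicit))  # False
```
<P
>A real server would look up the retention period from the class of
newsgroup the article belongs to, and may be configured to ignore or cap the
Expires header.</P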
|
|
></DIV
|
|
><DIV
|
|
CLASS="SECTION"
|
|
><H2
|
|
CLASS="SECTION"
|
|
><A
|
|
NAME="AEN107">2.2. Of readers and servers</H2
|
|
><P
|
|
>Computers which access Usenet articles are broadly of two classes:
|
|
the readers and the servers. A Usenet server carries a repository of
|
|
articles, manages them, handles newsfeeds, and offers its repository to
|
|
authorised readers to read. A Usenet reader is merely a computer with
|
|
the appropriate software to allow a user to access a server, fetch
articles, post new articles, and keep track of which articles the user has
|
|
read in each newsgroup. In terms of functionality, Usenet reading
|
|
software is less interesting to a Usenet administrator than Usenet
|
|
server software. However, in terms of lines of code, the Usenet reader
|
|
software can often be much larger than Usenet server software, primarily
|
|
because of the complexities of modern GUI code.</P
|
|
><P
|
|
>Modern reader software accesses Usenet servers almost exclusively using
NNTP (the Network News Transfer Protocol) for reading and posting. This
|
|
protocol can also be used for inter-server communication, but those
|
|
aspects will be discussed later. NNTP, like any other
|
|
well-designed TCP-based Internet protocol, carries ASCII commands and
|
|
responses terminated with <TT
|
|
CLASS="LITERAL"
|
|
>CR-LF</TT
|
|
>, and comprises a
|
|
sequence of commands, somewhat reminiscent of the POP3 protocol for
|
|
email. Using NNTP, a Usenet reader program connects to a Usenet server,
|
|
asks for a list of active newsgroups, and receives this (often huge)
|
|
list. It then sets the ``current newsgroup'' to one of these, depending
|
|
on what the user wants to browse through. Having done this, it gets the
|
|
meta-data of all current articles in the group, including the author,
|
|
subject line, date, and size of each article, and displays an index of
|
|
articles to the user.</P
|
|
><P
|
|
>The user then scans through this list, selects an article, and
|
|
asks the reader to fetch it. The reader gives the article number of
|
|
this article to the server, and fetches the full article for the user
|
|
to read through. Once the user finishes his NNTP session, he exits,
|
|
and the reader program closes the NNTP socket. It then (usually)
|
|
updates a local file in the user's home area, keeping track of which
|
|
news articles the user has read. These articles are typically not shown
|
|
to the user next time, thus allowing the user to progress rapidly to new
|
|
articles in each session. The reader software is helped along in this
|
|
endeavour by the <TT
|
|
CLASS="LITERAL"
|
|
>Xref</TT
|
|
> header, using which it knows
|
|
all the different identities by which a single article is identified
|
|
in the server. Thus, if you read the sample article given above by
|
|
accessing <TT
|
|
CLASS="LITERAL"
|
|
>starcom.tech.misc</TT
|
|
>, you'll never be shown
|
|
this article again when you access <TT
|
|
CLASS="LITERAL"
|
|
>starcom.tech.misc</TT
|
|
>
|
|
or <TT
|
|
CLASS="LITERAL"
|
|
>starcom.tech.security</TT
|
|
>; your reader software will
|
|
do this by tracking the <TT
|
|
CLASS="LITERAL"
|
|
>Xref</TT
|
|
> header and mapping
|
|
article numbers.</P
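><P
>A rough sketch of this bookkeeping follows. It assumes the reader keeps,
for each newsgroup, a set of article numbers already read; the names
mark_read and unread are hypothetical, and real readers store this state in
their own file formats.</P
>
```python
def mark_read(read_state, xref_numbers):
    """Record an article as read in every group listed in its Xref header."""
    for group, number in xref_numbers.items():
        read_state.setdefault(group, set()).add(number)


def unread(read_state, group, available_numbers):
    """Return the article numbers in `group` that have not been read yet."""
    seen = read_state.get(group, set())
    return [n for n in available_numbers if n not in seen]


read_state = {}
# The user reads the sample article while browsing starcom.tech.misc.
mark_read(read_state, {"starcom.tech.misc": 211, "starcom.tech.security": 452})

# Later, in the other group, article 452 is no longer offered.
print(unread(read_state, "starcom.tech.security", [451, 452, 453]))  # [451, 453]
```
<P
>Because the article numbers are local to one server, read state of this
kind cannot be carried over to a different server.</P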
|
|
><P
|
|
>When a user posts an article, he first composes his message using
|
|
the user interface of his reader software. When he finally gives the
|
|
command to send the article, the reader software contacts the Usenet
|
|
server using the pre-existing NNTP connection and sends the article to
|
|
it. The article carries a <TT
|
|
CLASS="LITERAL"
|
|
>Newsgroups</TT
|
|
> header with the
|
|
list of newsgroups to post to, often a <TT
|
|
CLASS="LITERAL"
|
|
>Distribution</TT
|
|
>
|
|
header with a distribution specification, and other headers
|
|
like <TT
|
|
CLASS="LITERAL"
|
|
>From</TT
|
|
>, <TT
|
|
CLASS="LITERAL"
|
|
>Subject</TT
|
|
>
|
|
<EM
|
|
>etc.</EM
|
|
> These headers are used by the server
|
|
software to do the right thing. Special and rare headers like
|
|
<TT
|
|
CLASS="LITERAL"
|
|
>Expires</TT
|
|
> and <TT
|
|
CLASS="LITERAL"
|
|
>Approved</TT
|
|
> are acted upon
|
|
when present. The server assigns a new article number to the article for
|
|
each newsgroup it is posted to, and creates a new <TT
|
|
CLASS="LITERAL"
|
|
>Xref</TT
|
|
>
|
|
header for the article.</P
|
|
><P
|
|
>Transfer of articles between servers is done in various ways, and
|
|
is discussed in quite a bit of detail in Section XXX titled
|
|
``Newsfeeds'' below.</P
|
|
></DIV
|
|
><DIV
|
|
CLASS="SECTION"
|
|
><H2
|
|
CLASS="SECTION"
|
|
><A
|
|
NAME="AEN128">2.3. Newsfeeds</H2
|
|
><DIV
|
|
CLASS="SECTION"
|
|
><H3
|
|
CLASS="SECTION"
|
|
><A
|
|
NAME="AEN130">2.3.1. Fundamental concepts</H3
|
|
><P
|
|
>When we try to analyse newsfeeds in real life, we begin to see
|
|
that, for most sites, traffic flow is not symmetrical in both
|
|
directions. We usually find that one server will feed the bulk
|
|
of the world's articles to one or more secondary servers every
|
|
day, and receive a few articles written by the users of those
|
|
secondary servers in exchange. Thus, we usually find that
|
|
articles flow down from the stem to the branches to the leaves
|
|
of the worldwide Usenet server network, and not exactly in a totally
|
|
balanced mesh flow pattern. Therefore, we use the term
|
|
``upstream server'' to refer to the server from which we receive
|
|
the bulk of our daily dose of articles, and ``downstream
|
|
server'' to refer to those servers which receive the bulk dose
|
|
of articles from us.</P
|
|
><P
|
|
>Newsfeeds relay articles from one server to their ``next door
|
|
neighbour'' servers, metaphorically speaking. Therefore, articles
|
|
move around the globe, not by a massive number of single-hop
|
|
transfers from the originating server to every other server in
|
|
the world, but in a sequence of hops, like passing the baton in
|
|
a relay race. This increases the latency time for an article
|
|
to reach a remote tertiary server after, say, ten hops, but
|
|
it allows tighter control of what gets relayed at every hop,
|
|
and helps in redundancy, decentralisation of server loads,
|
|
and conservation of network bandwidth. In this respect, Usenet
|
|
newsfeeds are more complex than HTTP data flows, which
|
|
typically use single-hop techniques.</P
|
|
><P
|
|
>Each Usenet news server therefore has to worry about
|
|
newsfeeds each time it receives an article, either by a fresh post
|
|
or from an incoming newsfeed. When the Usenet server digests this
|
|
article and files it away in its repository, it simultaneously
|
|
looks through its database to see which other server it should
|
|
feed the article to. In order to do this, it carries out a
|
|
sequence of checks, described below.</P
|
|
><P
|
|
>Each server knows which other servers are its ``next door
|
|
neighbours;'' this information is kept in its newsfeed
|
|
configuration information. Against each of its ``next door
|
|
neighbours,'' there will be a list of newsgroups which it
|
|
wants, and a list of distributions. The new article's list of
|
|
newsgroups will be matched against the newsgroup list of the
|
|
``next door neighbour'' to see whether there's even a single
|
|
common newsgroup which makes it necessary to feed the article to
|
|
it. If there's a matching newsgroup, and the server's distribution
|
|
list matches the article's distribution, then the article is
|
|
marked for feeding to this neighbour.</P
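><P
>The matching step can be illustrated with a small Python sketch. The
per-neighbour record used here (glob-style newsgroup patterns plus a set of
distributions) is a hypothetical format chosen for brevity, and treating an
article without a Distribution header as ``world'' is likewise an assumption;
real server software has its own configuration syntax.</P
>
```python
from fnmatch import fnmatchcase


def wants_article(neighbour, newsgroups, distribution="world"):
    """Return True if the article should be queued for this neighbour."""
    group_match = any(
        fnmatchcase(group, pattern)
        for group in newsgroups
        for pattern in neighbour["groups"]
    )
    # Both the newsgroup test and the distribution test must pass.
    return group_match and distribution in neighbour["distributions"]


neighbour = {
    "groups": ["comp.*", "starcom.*"],
    "distributions": {"world", "starcom"},
}

print(wants_article(neighbour, ["starcom.tech.misc", "starcom.tech.security"], "starcom"))  # True
print(wants_article(neighbour, ["rec.humor"], "world"))                                     # False
```
<P
>A single matching newsgroup is enough for the first test, which is why an
article cross-posted to several groups can reach a neighbour that carries
only one of them.</P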
|
|
><P
|
|
>When the neighbour receives the article as part of the
|
|
feed, it performs some sanity checks of its own. The first check
|
|
it performs is on the <TT
|
|
CLASS="LITERAL"
|
|
>Newsgroups</TT
|
|
> header of
|
|
the new article. If none of the newsgroups listed there are part
|
|
of the active newsgroups list of this server, then the article
|
|
can be rejected. An article rejected thus may even be queued for
|
|
outgoing feeds to other servers, but will not be digested for
|
|
incorporation into the local article repository.</P
|
|
><P
|
|
>The next check performed is against the
|
|
<TT
|
|
CLASS="LITERAL"
|
|
>Path</TT
|
|
> header of the incoming article. If this
|
|
header lists the name of the current Usenet server anywhere,
|
|
it indicates that it has already passed through this server at
|
|
least once before, and is now re-appearing here erroneously because
|
|
of a newsfeed loop. Such loops are quite often configured into
|
|
newsfeed topologies for redundancy: ``I'll get the articles from
|
|
Server X if not Server Y, and may the first one in win.'' The
|
|
Usenet server software automatically detects a duplicate feed
|
|
of an article and rejects it.</P
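><P
>A minimal sketch of the Path check, using the sample article's header, is
given below. The helper names are ours, and the host news.example.org is a
made-up neighbour used only for illustration.</P
>
```python
def already_seen_by(path_header, server_name):
    """Return True if `server_name` appears among the hosts in a Path header."""
    hops = path_header.split("!")
    # The last element names the posting user, not a host.
    return server_name in hops[:-1]


def add_hop(path_header, server_name):
    """Prepend this server's name before passing the article on."""
    return f"{server_name}!{path_header}"


path = "news.starcomsoftware.com!purva!shuvam"
print(already_seen_by(path, "purva"))             # True
print(already_seen_by(path, "news.example.org"))  # False
print(add_hop(path, "news.example.org"))
```
<P
>Since every server prepends its own name before relaying, the header grows
by one entry per hop, and a server only has to look for its own name to spot
a loop.</P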
|
|
><P
|
|
>The next check is against what is called the server's
|
|
<EM
|
|
>history database</EM
|
|
>. Every Usenet server has
|
|
a history database, which is a list of the message IDs of all
|
|
current articles in the local repository. Oftentimes the history
|
|
database also carries the message IDs of all messages recently
|
|
expired. If the incoming article's message ID matches any of the
|
|
entries in the database, then again it is rejected without being
|
|
filed in the local repository. This is a second loop detection
|
|
method. Sometimes, the mere checking of the article's
|
|
<TT
|
|
CLASS="LITERAL"
|
|
>Path</TT
|
|
> header does not allow detection of all
|
|
potential problems, because the problem may be a re-insertion
|
|
instead of a loop. A re-insertion happens when the same incoming
|
|
batch of news articles is re-fed into the local server, perhaps
|
|
after recovering the system's data from tapes after a system
|
|
crash. In such cases, there's no newsfeed loop, but there's
|
|
still the risk that one article may be digested into the local
|
|
server twice. The history database prevents this.</P
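><P
>In its simplest form, the history check is a set-membership test on the
message ID. The class below is a deliberately naive in-memory sketch with a
hypothetical name; real servers keep this database on disk and also prune
old entries.</P
>
```python
class History:
    """Remember which message IDs this server has already accepted."""

    def __init__(self):
        self._seen = set()

    def offer(self, message_id):
        """Return True and record the ID if it is new, False if a duplicate."""
        if message_id in self._seen:
            return False
        self._seen.add(message_id)
        return True


history = History()
sample_id = "<Pine.LNX.4.31.0107022153490.30462-100000@starcomsoftware.com>"
print(history.offer(sample_id))  # True: first arrival, article is filed
print(history.offer(sample_id))  # False: re-inserted batch, article is rejected
```
<P
>This test does not depend on the route the article took, which is why it
catches re-insertions that the Path check cannot.</P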
|
|
><P
|
|
>All these simple checks are very effective, and work
|
|
across server and software types, as per the Internet standards.
|
|
Together, they allow robust and fail-safe Usenet article flow
|
|
across the world.</P
|
|
></DIV
|
|
><DIV
|
|
CLASS="SECTION"
|
|
><H3
|
|
CLASS="SECTION"
|
|
><A
|
|
NAME="AEN144">2.3.2. Types of newsfeeds</H3
|
|
><P
|
|
>This section explains the basics of newsfeeds, without getting
|
|
into details of software and configuration files.</P
|
|
><DIV
|
|
CLASS="SECTION"
|
|
><H4
|
|
CLASS="SECTION"
|
|
><A
|
|
NAME="AEN147">2.3.2.1. Queued feeds</H4
|
|
><P
|
|
> This is the commonest method of sending articles from one server
|
|
to another, and is followed whenever large volumes of articles
|
|
are to be transferred per day. This approach needs a one-time
|
|
modification to the upstream server's configuration for each
|
|
outgoing feed, to define a new <EM
|
|
>queue.</EM
|
|
>
|
|
</P
|
|
><P
|
|
> In essence all queued feeds work in the following way. When the
|
|
sending server receives an article, it processes it for
|
|
inclusion into its local repository, and also checks through all
|
|
its outgoing feed definitions to see whether the article needs
|
|
to be queued for any of the feeds. If yes, it is added to a
|
|
<EM
|
|
>queue file</EM
|
|
> for each outgoing feed. The
|
|
precise details
|
|
of the queue file can change depending on the software
|
|
implementation, but the basic processes remain the same. A queue
|
|
file is a list of queued articles, but does not contain the
|
|
article contents. Typical queue files are ASCII text files with
|
|
one line per article giving the path to a copy of the article in
|
|
the local spool area.
|
|
</P
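><P
>As a rough illustration of such a queue file, the Python sketch below
appends one spool path per article and reads the list back. The paths shown
are invented examples, and an in-memory file object stands in for the real
queue file on disk.</P
>
```python
import io


def enqueue(queue_file, article_path):
    """Append one article's spool path to an outgoing feed's queue file."""
    queue_file.write(article_path + "\n")


def read_queue(queue_file):
    """Return the queued article paths, one per non-empty line."""
    return [line.strip() for line in queue_file if line.strip()]


queue = io.StringIO()
enqueue(queue, "starcom/tech/misc/211")
enqueue(queue, "comp/risks/1042")

queue.seek(0)
print(read_queue(queue))  # ['starcom/tech/misc/211', 'comp/risks/1042']
```
<P
>Because only paths are stored, queueing an article for several feeds costs
a few bytes per feed, and the article text is read from the spool only when
a batch is built or the article is transmitted.</P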
|
|
><P
|
|
> Later, a separate process picks up each queue file and creates
|
|
one or more <EM
|
|
>batches</EM
|
|
> for each outgoing feed.
|
|
A <EM
|
|
>batch</EM
|
|
> is a large file containing multiple
|
|
Usenet news
|
|
articles. Once the batches are created, various transport
|
|
mechanisms can be used to move the files from sending server to
|
|
receiving server. You can even use scripted FTP. You only need
|
|
to ensure that the batch is picked up from the upstream server
|
|
and somehow copied into a designated incoming batch directory in
|
|
the downstream server.
|
|
</P
|
|
><P
|
|
> UUCP has traditionally been the mechanism of choice for batch
|
|
movement, because it predates the Internet and wide availability
|
|
of fast packet-switched data networks. Today, with TCP/IP
|
|
everywhere, UUCP once again emerges as the most logical choice
|
|
of batch movement, because it too has moved with the times: it
|
|
can work over TCP.
|
|
</P
|
|
><P
|
|
> NNTP is the <EM
|
|
>de facto</EM
|
|
> mechanism of choice
|
|
for moving
|
|
queued newsfeeds for carrier-class Usenet servers on the
|
|
Internet, and unfortunately, for a lot of other Usenet servers
|
|
as well. The reason why we find this choice unfortunate is
|
|
discussed in <A
|
|
HREF="x1243.html#FEEDEFFICIENCY"
|
|
>Section 12.1</A
|
|
> below. But in NNTP
|
|
feeds, an intermediate step of building batches out of queue
|
|
files can be eliminated --- this is both its strength and its
|
|
weakness.
|
|
</P
|
|
><P
|
|
> In the case of queued NNTP feeds, articles get added to queue
|
|
files as described above. An NNTP transmit process periodically
|
|
wakes up, picks up a queue file, and makes an NNTP connection to
|
|
the downstream server. It then begins a processing loop where,
|
|
for each queued article, it uses the NNTP
|
|
<TT
|
|
CLASS="LITERAL"
|
|
>IHAVE</TT
|
|
>
|
|
command to inform the downstream server of the article's
|
|
message ID. The downstream server checks its local repository to
see whether it already has the message. If not, it responds with
a <TT
CLASS="LITERAL"
>335</TT
> (``send article'') response. The transmitting server
|
|
then pumps
|
|
out the article contents in plaintext form. When all articles
|
|
in the queue have been thus processed, the sending server closes
|
|
the connection. If the NNTP connection breaks in between due to
|
|
any reason, the sending server truncates the queue file and
|
|
retains only those articles which are yet to be transmitted,
|
|
thus minimising repeat transmissions.
|
|
</P
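><P
>The exchange can be pictured with the following Python simulation, which
uses no network at all: the downstream server is reduced to a set of message
IDs it already holds. The response codes 335, 435 and 235 are those defined
for IHAVE in RFC 977; the function name and the example message IDs are
hypothetical.</P
>
```python
def offer_articles(queue, downstream_history):
    """Simulate an IHAVE session; return (transcript, ids actually sent)."""
    transcript, sent = [], []
    for message_id, article in queue:
        transcript.append(f"C: IHAVE {message_id}")
        if message_id in downstream_history:
            transcript.append("S: 435 article not wanted")
            continue
        transcript.append("S: 335 send article")
        # The article text follows, ended by a line holding only a period.
        transcript.append(f"C: ({len(article.splitlines())} lines of article text)")
        transcript.append("C: .")
        downstream_history.add(message_id)
        transcript.append("S: 235 article transferred ok")
        sent.append(message_id)
    return transcript, sent


queue = [
    ("<a1@example.com>", "Subject: one\n\nbody\n"),
    ("<a2@example.com>", "Subject: two\n\nbody\n"),
]
history = {"<a1@example.com>"}  # downstream already holds the first article
transcript, sent = offer_articles(queue, history)
print("\n".join(transcript))
```
<P
>Note that every article costs at least one command and one response even
when it is refused, which matters when most of the queue is already present
downstream.</P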
|
|
><P
|
|
><A
|
|
NAME="DIALUPNONNTP"
|
|
></A
|
|
>
|
|
A queued NNTP feed works with the sending server making an NNTP
|
|
connection to the receiving server. This implies that the
|
|
receiving server must have an IP address which is known to the
|
|
sending server or can be looked up in the DNS. If the receiving
|
|
server connects to the Internet periodically using a dialup
|
|
connection and works with a dynamically assigned IP address,
|
|
this can get tricky. UUCP feeds suffer no such problems because
|
|
the sending server for the newsfeed can be the UUCP server,
|
|
<EM
|
|
>i.e.</EM
|
|
>
|
|
passive. The receiving server for the feed can be the UUCP
|
|
master, <EM
|
|
>i.e.</EM
|
|
> the active party. So the
|
|
receiving server can then
|
|
initiate the UUCP connection and connect to the sending server.
|
|
Thus, if even one of the two parties has a static IP address,
|
|
UUCP queued feeds can work fine.
|
|
</P
|
|
><P
|
|
> Thus, NNTP feeds can be sent out a little faster than the
|
|
batched transmission processes used for UUCP and other older
|
|
methods, because no batches need to be constructed. However,
|
|
NNTP is often used in newsfeeds where it is not necessary and it
|
|
results in colossal waste of bandwidth. Before we study
|
|
efficiency issues of NNTP versus batched feeds, we will cover
|
|
another way feeds can be organised using NNTP: the pull feeds.
|
|
</P
|
|
></DIV
|
|
><DIV
|
|
CLASS="SECTION"
|
|
><H4
|
|
CLASS="SECTION"
|
|
><A
|
|
NAME="AEN168">2.3.2.2. Pull feeds</H4
|
|
><P
|
|
> This method of transferring a set of articles works only over
|
|
NNTP, and requires absolutely no configuration on the
|
|
transmitting, or upstream, server. In fact, the upstream server
|
|
cannot even easily detect that the downstream server is pulling
|
|
out a feed --- it appears to be just a heavy and thorough
|
|
newsreader, that's all.
|
|
</P
|
|
><P
|
|
> This pull feed works by the downstream server pulling out
|
|
articles one by one, just like any NNTP newsreader, using the
|
|
NNTP <TT
|
|
CLASS="LITERAL"
|
|
>ARTICLE</TT
|
|
> command with the Message-ID as
|
|
parameter.
|
|
The interesting detail is how it gets the message IDs to begin
|
|
with. For this, it uses an NNTP command, specially designed for
|
|
pull feeds, called <TT
|
|
CLASS="LITERAL"
|
|
>NEWNEWS</TT
|
|
>. This command
|
|
takes a newsgroup pattern, a date, and a time,
<TABLE
BORDER="1"
BGCOLOR="#E0E0E0"
WIDTH="100%"
><TR
><TD
><FONT
COLOR="#000000"
><PRE
CLASS="SCREEN"
> NEWNEWS comp.* 970815 000000 GMT </PRE
|
|
></FONT
|
|
></TD
|
|
></TR
|
|
></TABLE
|
|
>
|
|
</P
|
|
><P
|
|
> This command is sent by the downstream server over NNTP to the
|
|
upstream server, and in effect asks the upstream server to list
|
|
out all news articles which are newer than 15 August 1997 in the
|
|
<TT
|
|
CLASS="LITERAL"
|
|
>comp</TT
|
|
> hierarchy. The upstream server responds
|
|
with a
|
|
(often huge) list of message IDs, one per line, ending with a
|
|
period on a line by itself.
|
|
</P
|
|
><P
|
|
> The pulling server then compares each newly received message ID
|
|
with its own article database and makes a (possibly shorter)
|
|
list of all articles which it does not have, thus eliminating
|
|
duplicate fetches. That done, it begins fetching articles one
|
|
by one, using the NNTP <TT
|
|
CLASS="LITERAL"
|
|
>ARTICLE</TT
|
|
> command as
|
|
mentioned above.
|
|
</P
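><P
>The two steps on the pulling side, reading the period-terminated list and
discarding IDs that are already held locally, might look like this in
Python. The function names and message IDs are illustrative only, and the
list of lines stands in for what would arrive over the NNTP connection.</P
>
```python
def read_id_list(lines):
    """Collect message IDs up to the line that holds only a period."""
    ids = []
    for line in lines:
        line = line.rstrip("\r\n")
        if line == ".":
            break
        ids.append(line)
    return ids


def ids_to_fetch(offered, local_history):
    """Keep only the offered IDs that the local server does not have yet."""
    return [mid for mid in offered if mid not in local_history]


response = [
    "<x1@example.com>\r\n",
    "<x2@example.com>\r\n",
    "<x3@example.com>\r\n",
    ".\r\n",
]
offered = read_id_list(response)
print(ids_to_fetch(offered, {"<x2@example.com>"}))
# ['<x1@example.com>', '<x3@example.com>']
```
<P
>Each ID that survives the filter is then requested with a separate ARTICLE
command.</P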
|
|
><P
|
|
> In addition, there is another NNTP command,
|
|
<TT
|
|
CLASS="LITERAL"
|
|
>NEWGROUPS</TT
|
|
>,
|
|
which allows the NNTP client --- <EM
|
|
>i.e.</EM
|
|
> the
|
|
downstream server in
|
|
this case --- to ask its upstream server what were the new
|
|
newsgroups created since a given date. This allows the
|
|
downstream server to add the new groups to its
|
|
<TT
|
|
CLASS="LITERAL"
|
|
>active</TT
|
|
> file.
|
|
</P
|
|
><P
|
|
> The <TT
|
|
CLASS="LITERAL"
|
|
>NEWNEWS</TT
|
|
> based approach is usually one of
|
|
the most inefficient methods of pulling out a large Usenet feed.
|
|
By inefficiency, here we refer to the CPU loads and RAM
|
|
utilisation on the upstream server, not on bandwidth usage. This
|
|
inefficiency is because most Usenet news servers do not keep
|
|
their article databases indexed by hierarchy and date; CNews
|
|
certainly does not. This means that a <TT
|
|
CLASS="LITERAL"
|
|
>NEWNEWS</TT
|
|
>
|
|
command issued to an upstream server will put that server into a
|
|
sequential search of its article database, to see which articles
|
|
fit into the hierarchy given and are newer than the given date.
|
|
</P
|
|
><P
|
|
> If pull feeds were to become the most common way of sending out
|
|
articles, then all upstream servers would badly need an
|
|
efficient way of sorting their article databases to allow each
|
|
<TT
|
|
CLASS="LITERAL"
|
|
>NEWNEWS</TT
|
|
> command to rapidly generate its list
|
|
of matching articles. A slow upstream server today might take
|
|
minutes to begin responding to a <TT
|
|
CLASS="LITERAL"
|
|
>NEWNEWS</TT
|
|
>
|
|
command, and
|
|
the downstream server may time out and close its NNTP connection
|
|
in the meantime. We have often seen this happen, until we
tweaked the timeouts.
|
|
</P
|
|
><P
|
|
> There are basic efficiency issues of bandwidth utilisation
|
|
involved in NNTP for news feeds, which are applicable for both
|
|
queued and pull feeds. But the problem with
|
|
<TT
|
|
CLASS="LITERAL"
|
|
>NEWNEWS</TT
|
|
> is unique to pull feeds, and relates
|
|
to server loads, not bandwidth wastage.
|
|
</P
|
|
></DIV
|
|
></DIV
|
|
></DIV
|
|
><DIV
|
|
CLASS="SECTION"
|
|
><H2
|
|
CLASS="SECTION"
|
|
><A
|
|
NAME="CONTROLMSG">2.4. Control messages</H2
|
|
><P
|
|
>The Usenet is a massive dispersed collection of servers which
|
|
operate almost without any supervision, provided they have adequate disk
|
|
space and do not suffer disk corruption due to power failures,
|
|
<EM
|
|
>etc.</EM
|
|
> (It is indeed surprising how self-managing a
|
|
good Usenet server is, provided these two pre-requisites are met.) These
|
|
servers are each under the control of human administrators, but it is
|
|
preferable that certain routine actions be performed across all these
|
|
servers remotely from one location, without the manual intervention of
|
|
these humans.</P
|
|
><P
|
|
>One common need for centralised operations is the creation of new
|
|
groups in the standard eight hierarchies. The Usenet follows a fairly
|
|
formal process which asks for votes from readers worldwide before
|
|
deciding on the restructuring of its newsgroups list, including merging of
|
|
low-volume groups, splitting of high-volume groups into many specialised
|
|
groups, creating new groups, and even deleting groups. Once the voting
|
|
process for a change concludes and the change action is to be carried
|
|
out, it would be extremely tedious to send email to the hundreds of
|
|
thousands of Usenet administrators and hope that they make the changes
|
|
right, and answer their doubts if they get confused. It would be much
|
|
better to have an <EM
|
|
>automatic</EM
|
|
> way to make the
|
|
changes across all servers, of course with proper authorisation.</P
|
|
><P
|
|
>The solution to this does not lie in giving some central authority
|
|
the ability to run an OS-level command of his choice on all the world's
|
|
Usenet servers, because OS commands differ from OS to OS, and because
|
|
few Usenet administrators would trust a stranger from another part of
|
|
the world with OS level access. Therefore, the solution lies in defining
|
|
a small set of common Usenet maintenance actions, and permitting only
|
|
these actions to be triggered on all servers through the passing of
|
|
special command messages, called <EM
|
|
>control
|
|
messages</EM
|
|
>.</P
|
|
><P
|
|
>Control messages look like ordinary Usenet articles, more or less.
|
|
They have an extra header line, with its value in a specific format,
|
|
but they usually carry body text which looks like a normal human-written
|
|
article. Here is a control message (a spurious one at that, but it'll
|
|
do for now):</P
|
|
><TABLE
|
|
BORDER="0"
|
|
BGCOLOR="#E0E0E0"
|
|
WIDTH="100%"
|
|
><TR
|
|
><TD
|
|
><FONT
|
|
COLOR="#000000"
|
|
><PRE
|
|
CLASS="PROGRAMLISTING"
|
|
>Xref: news.starcomsoftware.com control:814217
|
|
Path: news.starcomsoftware.com!linux594.dn.net!news.dn.hoopoo.com!
|
|
feed-out.newsfeeds.com!newsfeeds.com!feed.newsfeeds.com!
|
|
newsfeeds.com!news-spur1.maxwell.syr.edu!news.maxwell.syr.edu!
|
|
newsfeed.icl.net!newsfeed.skycache.com!Cidera!newsfeed.gamma.ru!
|
|
Gamma.RU!carrier.kiev.ua!goblin.nadrabank.kiev.ua!not-for-mail
|
|
From: tale@uunet.uu.net (David C Lawrence)
|
|
Newsgroups: news.groups,humanities.hipcrime
|
|
Subject: cmsg newgroup humanities.hipcrime
|
|
Control: newgroup humanities.hipcrime
|
|
Date: Sun, 18 Feb 2001 11:50:28 GMT
|
|
Organization: The Cabal
|
|
Lines: 20
|
|
Approved: tale@uunet.uu.net
|
|
Message-ID: <3afWYZTIR.G5YOC2@uunet.uu.net>
|
|
NNTP-Posting-Host: 203.145.147.67
|
|
X-Trace: goblin.nadrabank.kiev.ua 982528840 21455 203.145.147.67
|
|
(18 Feb 2001 20:40:40 GMT)
|
|
X-Complaints-To: usenet@nadrabank.kiev.ua
|
|
NNTP-Posting-Date: 18 Feb 2001 20:40:40 GMT
|
|
X-No-Archive: Yes
|
|
|
|
humanities.hipcrime is an unmoderated newsgroup which passed its
|
|
vote for creation by 326:10 as reported in news.announce.newgroups
|
|
on 18 Feb 2001.
|
|
|
|
For your newsgroups file:
|
|
humanities.hipcrime HipCrime for Humanity - you committed one now!
|
|
|
|
Anyone can create a newsgroup in the alt, biz, comp, earth,
|
|
humanities, misc, news, meow, rec, sci, soc, talk, us, or
|
|
any other Usenet hierarchy. New newsgroup proposals may be
|
|
optionally discussed in news.groups. Please be sure that your
|
|
/usr/lib/news/control.ctl is configured correctly:
|
|
|
|
## NEWGROUP MESSAGES
|
|
## honor them all and log in \${LOG}/newgroup.log
|
|
newgroup:*:alt.*|biz.*|comp.*|earth.*|humanities.*|misc.*|news.*|\
|
|
meow.*|rec.*|sci.*|soc.*|talk.*|us.*:doit=newgroup
|
|
|
|
## RMGROUP MESSAGES
|
|
## drop them all and don't log
|
|
rmgroup:*:*:drop
|
|
|
|
Meow!
|
|
David C Lawrence</PRE
|
|
></FONT
|
|
></TD
|
|
></TR
|
|
></TABLE
|
|
><P
|
|
>A control message must have a <TT
|
|
CLASS="LITERAL"
|
|
>Control</TT
|
|
>
|
|
header. In addition, all control messages <EM
|
|
>will</EM
|
|
>
|
|
have an <TT
|
|
CLASS="LITERAL"
|
|
>Approved</TT
|
|
> header, like messages posted
|
|
to moderated newsgroups. The <TT
|
|
CLASS="LITERAL"
|
|
>Control</TT
|
|
> header
|
|
actually specifies a command to run on the local server, and the
|
|
parameter(s) to supply to it. The local Usenet server software is
|
|
supposed to figure out its own way to get the task done. In this
|
|
example, the command in the <TT
|
|
CLASS="LITERAL"
|
|
>Control</TT
|
|
> header is
|
|
<TT
|
|
CLASS="LITERAL"
|
|
>newgroup</TT
|
|
>, which creates a new newsgroup. Its
|
|
parameter is <TT
|
|
CLASS="LITERAL"
|
|
>humanities.hipcrime</TT
|
|
>, which gives the
|
|
name of the newsgroup to create.</P
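><P
>As a rough illustration of the paragraph above, the following Python sketch pulls the command and its parameters out of the <TT CLASS="LITERAL">Control</TT> header of an article. The function name <TT CLASS="LITERAL">parse_control</TT> and the shortened sample article are assumptions made for this example; real news software does considerably more checking before it acts on a control message.</P
><TABLE
BORDER="0"
BGCOLOR="#E0E0E0"
WIDTH="100%"
><TR
><TD
><FONT
COLOR="#000000"
><PRE
CLASS="PROGRAMLISTING"
>
```python
def parse_control(article_text):
    """Return (command, parameters) from the Control header, or None."""
    # Headers end at the first blank line; the body is not examined.
    header_block = article_text.split("\n\n", 1)[0]
    for line in header_block.splitlines():
        # Lines starting with whitespace continue the previous header.
        if line[:1].isspace():
            continue
        name, sep, value = line.partition(":")
        if sep and name.strip().lower() == "control":
            words = value.split()
            if words:
                return words[0], words[1:]
            return None
    return None


# Shortened, hypothetical version of the sample article shown earlier.
sample = (
    "From: tale@uunet.uu.net (David C Lawrence)\n"
    "Newsgroups: news.groups,humanities.hipcrime\n"
    "Subject: cmsg newgroup humanities.hipcrime\n"
    "Control: newgroup humanities.hipcrime\n"
    "Approved: tale@uunet.uu.net\n"
    "\n"
    "body text\n"
)
print(parse_control(sample))  # prints ('newgroup', ['humanities.hipcrime'])
```
</PRE
></FONT
></TD
></TR
></TABLE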
|
|
><P
|
|
>In C-News, the control message implementation works through
|
|
separate shell scripts kept in a fixed directory,
|
|
<TT
|
|
CLASS="LITERAL"
|
|
>$NEWSBIN/ctl/</TT
|
|
>, as a security measure; if the
|
|
executable script isn't present there, the control message command will
|
|
be ignored. The control message types supported are:</P
|
|
><P
|
|
></P
|
|
><UL
|
|
><LI
|
|
><P
|
|
><TT
|
|
CLASS="LITERAL"
|
|
>checkgroups</TT
|
|
>: control message to
|
|
check whether the newsgroups listed in your active file are all correct
|
|
as per a master list of newsgroups sent in the control message</P
|
|
></LI
|
|
><LI
|
|
><P
|
|
><TT
|
|
CLASS="LITERAL"
|
|
>newgroup</TT
|
|
>: control message to create a
|
|
new newsgroup</P
|
|
></LI
|
|
><LI
|
|
><P
|
|
><TT
|
|
CLASS="LITERAL"
|
|
>rmgroup</TT
|
|
>: control message to delete a
|
|
newsgroup and all articles in it</P
|
|
></LI
|
|
><LI
|
|
><P
|
|
><TT
|
|
CLASS="LITERAL"
|
|
>sendsys</TT
|
|
>: control message to cause an
|
|
email response to be sent to the author with the <TT
|
|
CLASS="LITERAL"
|
|
>sys</TT
|
|
>
|
|
file of your server in it. This results in a response storm of
|
|
emails from all the Usenet servers in the world to the author. These
|
|
responses allow the sender of the control message to analyse all the
|
|
<TT
|
|
CLASS="LITERAL"
|
|
>sys</TT
|
|
> files of the world's Usenet servers and create the
|
|
directed graph of Usenet newsfeeds. Why someone would want to do this is
|
|
hard to guess, but the result is surely an awesome picture of one facet
|
|
of networked human civilisation, like looking at a giant world map.</P
|
|
><P
|
|
>Incidentally, there is no invasion of privacy here, because
|
|
your server's <TT
|
|
CLASS="LITERAL"
|
|
>sys</TT
|
|
> file is supposed to be public
|
|
information, if you take feeds from the public Usenet.</P
|
|
></LI
|
|
><LI
|
|
><P
|
|
><TT
|
|
CLASS="LITERAL"
|
|
>version</TT
|
|
>: control message which results
|
|
in your Usenet software sending an email to the author of the message,
|
|
containing the type and version of the Usenet news software you are
|
|
using. This too is not an invasion of privacy, because this information
|
|
is supposed to be public knowledge.</P
|
|
></LI
|
|
><LI
|
|
><P
|
|
>The cancel message: the most frequently occurring type of
|
|
control message. Each one specifies the message ID of an article, and results
|
|
in the cancellation (deletion) of that article. If you post an article
|
|
and regret it a moment later, your Usenet newsreader software usually
|
|
allows you to "cancel" it by generating a cancel message.</P
|
|
></LI
|
|
></UL
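><P
>The "script must be present" rule described for C-News before this list can be pictured with a small Python sketch. This is not the C-News code: the function name <TT CLASS="LITERAL">find_handler</TT>, the extra check on the command name, and the temporary directory standing in for <TT CLASS="LITERAL">$NEWSBIN/ctl/</TT> are all assumptions made for illustration. The point is only that a command with no executable script in the fixed directory is ignored.</P
><TABLE
BORDER="0"
BGCOLOR="#E0E0E0"
WIDTH="100%"
><TR
><TD
><FONT
COLOR="#000000"
><PRE
CLASS="PROGRAMLISTING"
>
```python
import os
import tempfile


def find_handler(ctl_dir, command):
    """Return the handler script path for command, or None to ignore it."""
    # Hypothetical safety check: the command must be a plain file name,
    # so that it cannot point outside the fixed directory.
    if not command or command != os.path.basename(command):
        return None
    if command.startswith("."):
        return None
    path = os.path.join(ctl_dir, command)
    # Only an existing, executable script counts as a supported command.
    if os.path.isfile(path) and os.access(path, os.X_OK):
        return path
    return None


# A throwaway directory stands in for $NEWSBIN/ctl/ in this demonstration.
with tempfile.TemporaryDirectory() as ctl_dir:
    script = os.path.join(ctl_dir, "newgroup")
    with open(script, "w") as fh:
        fh.write("#!/bin/sh\nexit 0\n")
    os.chmod(script, 0o755)
    known = find_handler(ctl_dir, "newgroup")
    unknown = find_handler(ctl_dir, "formatdisk")

print(known is not None, unknown)  # prints True None
```
</PRE
></FONT
></TD
></TR
></TABLE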
|
|
><P
|
|
>The Usenet news software maintains a pseudo-newsgroup called
|
|
<TT
|
|
CLASS="LITERAL"
|
|
>control</TT
|
|
>, where it files all control messages it
|
|
receives. If you have an incoming newsfeed from the public Usenet, your
|
|
server's <TT
|
|
CLASS="LITERAL"
|
|
>control</TT
|
|
> group will usually be full of
|
|
thousands of cancel messages from trigger-happy fingers all over the
|
|
world. Usenet news server software like C-News allows you to filter the
|
|
incoming feed based on newsgroups, and will discard articles for groups
|
|
your server does not subscribe to. But since all servers have to receive and
|
|
process control messages, they will all accept these cancel messages,
|
|
though many of them may apply to articles which are not part of your
|
|
highly-pruned subset of groups. <TT
|
|
CLASS="LITERAL"
|
|
>C'est la vie</TT
|
|
>.</P
|
|
><P
|
|
>Remember to set expiry for the <TT
|
|
CLASS="LITERAL"
|
|
>control</TT
|
|
> group to
|
|
one day or even shorter, so that the junk can be cleaned out as rapidly as
|
|
possible, just like the <TT
|
|
CLASS="LITERAL"
|
|
>junk</TT
|
|
> newsgroup.</P
|
|
><P
|
|
>The beauty of the control message architecture is that it
|
|
integrates seamlessly into the newsfeed mechanism for automatic control
|
|
of the network of servers. No separate channel of connection is needed
|
|
for the control actions. And article replication automatically
|
|
propagates control messages along with human-readable articles, thus
|
|
guaranteeing reach across heterogeneous network technologies.</P
|
|
><P
|
|
>What your Usenet server does on receiving a
|
|
control message is governed by an authorisation file:
|
|
<TT
|
|
CLASS="LITERAL"
|
|
>$NEWSCTL/controlperms</TT
|
|
> in the case of C-News
|
|
and <TT
|
|
CLASS="LITERAL"
|
|
>control.ctl</TT
|
|
> in the case of INN, for
|
|
instance. The security measures implemented by this file are
|
|
further enhanced by the <TT
|
|
CLASS="LITERAL"
|
|
>pgpcontrol</TT
|
|
> package with its
|
|
<TT
|
|
CLASS="LITERAL"
|
|
>pgpverify</TT
|
|
> script. Using <TT
|
|
CLASS="LITERAL"
|
|
>pgpverify</TT
|
|
>,
|
|
your server can check that all control messages (except for article
|
|
cancellation messages) are digitally signed by a trusted party
|
|
using PGP public-key cryptography. Our integrated
|
|
Usenet news software distribution includes integration with
|
|
<TT
|
|
CLASS="LITERAL"
|
|
>pgpverify</TT
|
|
>.</P
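><P
>To give a feel for how such an authorisation file can drive the decision, here is a minimal Python sketch. It uses a simplified four-field rule format (command, sender pattern, newsgroup patterns, action) modelled on the lines quoted in the sample control message earlier. The function name <TT CLASS="LITERAL">decide</TT>, the default of <TT CLASS="LITERAL">drop</TT> and the assumption that the last matching rule wins are choices made for this example only; check your server's documentation for the real syntax and precedence.</P
><TABLE
BORDER="0"
BGCOLOR="#E0E0E0"
WIDTH="100%"
><TR
><TD
><FONT
COLOR="#000000"
><PRE
CLASS="PROGRAMLISTING"
>
```python
from fnmatch import fnmatchcase


def decide(rules, command, sender, group):
    """Return the action of the last rule matching this control message."""
    action = "drop"  # assumed default when no rule matches
    for rule in rules:
        rule_cmd, rule_from, rule_groups, rule_action = rule.split(":", 3)
        if rule_cmd != command:
            continue
        if not fnmatchcase(sender, rule_from):
            continue
        # Newsgroup patterns are separated by "|", as in the sample message.
        if any(fnmatchcase(group, pat) for pat in rule_groups.split("|")):
            action = rule_action  # later matches override earlier ones
    return action


# Hypothetical rule set: drop everything, then trust one sender for newgroup.
rules = [
    "newgroup:*:*:drop",
    "rmgroup:*:*:drop",
    "newgroup:tale@uunet.uu.net:humanities.*|news.*:doit=newgroup",
]
print(decide(rules, "newgroup", "tale@uunet.uu.net", "humanities.hipcrime"))
print(decide(rules, "newgroup", "someone@example.com", "humanities.hipcrime"))
print(decide(rules, "rmgroup", "tale@uunet.uu.net", "humanities.hipcrime"))
```
</PRE
></FONT
></TD
></TR
></TABLE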
|
|
></DIV
|
|
></DIV
|
|
><DIV
|
|
CLASS="NAVFOOTER"
|
|
><HR
|
|
ALIGN="LEFT"
|
|
WIDTH="100%"><TABLE
|
|
SUMMARY="Footer navigation table"
|
|
WIDTH="100%"
|
|
BORDER="0"
|
|
CELLPADDING="0"
|
|
CELLSPACING="0"
|
|
><TR
|
|
><TD
|
|
WIDTH="33%"
|
|
ALIGN="left"
|
|
VALIGN="top"
|
|
><A
|
|
HREF="x27.html"
|
|
ACCESSKEY="P"
|
|
>Prev</A
|
|
></TD
|
|
><TD
|
|
WIDTH="34%"
|
|
ALIGN="center"
|
|
VALIGN="top"
|
|
><A
|
|
HREF="index.html"
|
|
ACCESSKEY="H"
|
|
>Home</A
|
|
></TD
|
|
><TD
|
|
WIDTH="33%"
|
|
ALIGN="right"
|
|
VALIGN="top"
|
|
><A
|
|
HREF="x248.html"
|
|
ACCESSKEY="N"
|
|
>Next</A
|
|
></TD
|
|
></TR
|
|
><TR
|
|
><TD
|
|
WIDTH="33%"
|
|
ALIGN="left"
|
|
VALIGN="top"
|
|
>What is the Usenet?</TD
|
|
><TD
|
|
WIDTH="34%"
|
|
ALIGN="center"
|
|
VALIGN="top"
|
|
> </TD
|
|
><TD
|
|
WIDTH="33%"
|
|
ALIGN="right"
|
|
VALIGN="top"
|
|
>Usenet news software</TD
|
|
></TR
|
|
></TABLE
|
|
></DIV
|
|
></BODY
|
|
></HTML
|
|
> |