old-www/HOWTO/Usenet-News-HOWTO/x64.html

<HTML
><HEAD
><TITLE
>Principles of Operation</TITLE
><META
NAME="GENERATOR"
CONTENT="Modular DocBook HTML Stylesheet Version 1.76b+
"><LINK
REL="HOME"
TITLE="Usenet News HOWTO "
HREF="index.html"><LINK
REL="PREVIOUS"
TITLE="What is the Usenet?"
HREF="x27.html"><LINK
REL="NEXT"
TITLE="Usenet news software"
HREF="x248.html"></HEAD
><BODY
CLASS="SECTION"
BGCOLOR="#FFFFFF"
TEXT="#000000"
LINK="#0000FF"
VLINK="#840084"
ALINK="#0000FF"
><DIV
CLASS="NAVHEADER"
><TABLE
SUMMARY="Header navigation table"
WIDTH="100%"
BORDER="0"
CELLPADDING="0"
CELLSPACING="0"
><TR
><TH
COLSPAN="3"
ALIGN="center"
>Usenet News HOWTO</TH
></TR
><TR
><TD
WIDTH="10%"
ALIGN="left"
VALIGN="bottom"
><A
HREF="x27.html"
ACCESSKEY="P"
>Prev</A
></TD
><TD
WIDTH="80%"
ALIGN="center"
VALIGN="bottom"
></TD
><TD
WIDTH="10%"
ALIGN="right"
VALIGN="bottom"
><A
HREF="x248.html"
ACCESSKEY="N"
>Next</A
></TD
></TR
></TABLE
><HR
ALIGN="LEFT"
WIDTH="100%"></DIV
><DIV
CLASS="SECTION"
><H1
CLASS="SECTION"
><A
NAME="AEN64">2. Principles of Operation</H1
><P
>Here we discuss the basic concepts behind the operation of a Usenet news
system.</P
><DIV
CLASS="SECTION"
><H2
CLASS="SECTION"
><A
NAME="AEN67">2.1. Newsgroups and articles</H2
><P
>A Usenet news article sits in a file or in some other on-disk
data structure on the disks of a Usenet server, and its contents look
like this:</P
><TABLE
BORDER="0"
BGCOLOR="#E0E0E0"
WIDTH="100%"
><TR
><TD
><FONT
COLOR="#000000"
><PRE
CLASS="PROGRAMLISTING"
>Xref: news.starcomsoftware.com starcom.tech.misc:211 starcom.tech.security:452
Newsgroups: starcom.tech.misc,starcom.tech.security
Path: news.starcomsoftware.com!purva!shuvam
From: Shuvam &#60;shuvam@starcomsoftware.com&#62;
Subject: "You just throw up your hands and reboot" (fwd)
Content-Type: TEXT/PLAIN; charset=US-ASCII
Distribution: starcom
Organization: Starcom Software Pvt Ltd, India
Message-ID: &#60;Pine.LNX.4.31.0107022153490.30462-100000@starcomsoftware.com&#62;
Mime-Version: 1.0
Date: Mon, 2 Jul 2001 16:27:57 GMT

Interesting quote, and interesting article.

Incidentally, comp.risks may be an interesting newsgroup to follow. We
must be receiving the feed for this group on our server, since we
receive all groups under comp.*, unless specifically cancelled. Check it
out sometime.

comp.risks tracks risks in the use of computer technology, including
issues in protecting ourselves from failures of such stuff.

Shuvam

&#62; Date: Thu, 14 Jun 2001 08:11:00 -0400
&#62; From: "Chris Norloff" &#60;cnorloff@norloff.com&#62;
&#62; Subject: NYSE: "Throw up your hands and reboot"
&#62;
&#62; When the New York Stock Exchange computer systems crashed for 85
&#62; minutes (8 Jun 2001), Andrew Brooks, chief of equity trading at
&#62; Baltimore mutual fund giant T. Rowe Price, was quoted as saying "Hey,
&#62; we're all subject to the vagaries of technology. It happens on your
&#62; own PC at home. You just throw up your hands and reboot."
&#62;
&#62; http://www.washingtonpost.com/ac3/ContentServer?articleid=A42885-2001Jun8&#38;pagename=article
&#62;
&#62; Chris Norloff
&#62;
&#62;
&#62; This is from --
&#62;
&#62; From: risko@csl.sri.com (RISKS List Owner)
&#62; Newsgroups: comp.risks
&#62; Subject: Risks Digest 21.48
&#62; Date: Mon, 18 Jun 2001 19:14:57 +0000 (UTC)
&#62; Organization: University of California, Berkeley
&#62;
&#62; RISKS-LIST: Risks-Forum Digest  Monday 19 June 2001
&#62; Volume 21 : Issue 48
&#62;
&#62;    FORUM ON RISKS TO THE PUBLIC IN COMPUTERS AND RELATED SYSTEMS (comp.risks)
&#62;    ACM Committee on Computers and Public Policy,
&#62;    Peter G. Neumann, moderator
&#62;
&#62; This issue is archived at &#60;URL:http://catless.ncl.ac.uk/Risks/21.48.html&#62;
&#62; and by anonymous ftp at ftp.sri.com, cd risks .
&#62; </PRE
></FONT
></TD
></TR
></TABLE
><P
>A Usenet article's header is very interesting if you want to learn
about the functioning of the Usenet. The <TT
CLASS="LITERAL"
>From</TT
>,
<TT
CLASS="LITERAL"
>Subject</TT
>, and <TT
CLASS="LITERAL"
>Date</TT
> headers are
familiar to anyone who has used email. The <TT
CLASS="LITERAL"
>Message-ID</TT
>
header contains a unique ID for each message, and is present in each
email message, though not many non-technical email users know about it.
The <TT
CLASS="LITERAL"
>Content-Type</TT
> and <TT
CLASS="LITERAL"
>Mime-Version</TT
>
headers are used for MIME encoding of articles, attaching files and
other attachments, and so on, just like in email messages.</P
><P
>The <TT
CLASS="LITERAL"
>Organisation</TT
> header is an informational header
which is supposed to carry some information identifying the organisation
to which the author of the article belongs. What remains now are the
<TT
CLASS="LITERAL"
>Newsgroups</TT
>, <TT
CLASS="LITERAL"
>Xref</TT
>,
<TT
CLASS="LITERAL"
>Path</TT
> and <TT
CLASS="LITERAL"
>Distributions</TT
> headers.
These are special to Usenet articles and are very important.</P
><P
>The <TT
CLASS="LITERAL"
>Newsgroups</TT
> header specifies which newsgroups
this article should belong to. The <TT
CLASS="LITERAL"
>Distributions</TT
>
header, sadly under-utilised in today's globalised Internet world,
allows the author of an article to specify how far the article will be
re-transmitted. The author of an article, working in conjunction with
well-configured networks of Usenet servers, can control the ``radius'' of
replication of his article, thus posting an article of local significance
into a newsgroup but setting the <TT
CLASS="LITERAL"
>Distribution</TT
> header to
some suitable setting, <EM
>e.g.</EM
> <TT
CLASS="LITERAL"
>local</TT
>
or <TT
CLASS="LITERAL"
>starcom</TT
>, to prevent the article from being relayed
to servers outside the specified domain.</P
><P
>The <TT
CLASS="LITERAL"
>Xref</TT
> header specifies the precise
<STRONG
>article number</STRONG
> of this article in each of the
newsgroups in which it is inserted, for the current server. When an
article is copied from one server to another as part of a newsfeed,
the receiving server throws away the old <TT
CLASS="LITERAL"
>Xref</TT
> header
and inserts its own, with its own article numbers. This indicates an
interesting feature of the Usenet system: each article in a Usenet server
has a unique number (an integer) for each newsgroup it is a part of.
Our sample above has been added to two newsgroups on our server, and has
the article numbers 211 and 452 in those groups. Therefore, any Usenet
client software can query our server and ask for article number 211 in
the newsgroup <TT
CLASS="LITERAL"
>starcom.tech.misc</TT
> and get this article.
Asking for article number 452 in <TT
CLASS="LITERAL"
>starcom.tech.security</TT
>
will fetch the article too. On another server, the numbers may be very
different.</P
><P
>The <TT
CLASS="LITERAL"
>Path</TT
> specifies the list of machines through
which this article has travelled before it has reached the current
server. UUCP-style syntax is used for this string. The current
example indicates that a user called <TT
CLASS="LITERAL"
>shuvam</TT
> first
wrote this article and posted it onto a computer which calls itself
<TT
CLASS="LITERAL"
>purva</TT
>, and this computer then transferred this article
by a newsfeed to <TT
CLASS="LITERAL"
>news.starcomsoftware.com</TT
>. The
<TT
CLASS="LITERAL"
>Path</TT
> header is critical for breaking loops in
newsfeeds, and will be discussed in detail later.</P
><P
>Our sample article will sit in the two newsgroups listed above
forever, unless expired. The Usenet software on a server is usually
configured to expire articles based on certain conditions,
<EM
>e.g.</EM
> after it's older than a certain number of
days. The C-News software we use allows expiry control based on the
newsgroup hierarchy and the type of newsgroup, <EM
>i.e.</EM
>
moderated or unmoderated. Against each class of newsgroups, it allows
the administrator to specify a number of days after which the article
will be expired. It is possible for an article to control its own
expiry, by carrying an <TT
CLASS="LITERAL"
>Expires</TT
> header specifying a
date and time. Unless overriden in the Usenet server software, the
article will be expired only after its explicit expiry time is
reached.</P
></DIV
><DIV
CLASS="SECTION"
><H2
CLASS="SECTION"
><A
NAME="AEN107">2.2. Of readers and servers</H2
><P
>Computers which access Usenet articles are broadly of two classes:
the readers and the servers. A Usenet server carries a repository of
articles, manages them, handles newsfeeds, and offers its repository to
authorised readers to read. A Usenet reader is merely a computer with
the appropriate software to allow a user to access a software, fetch
articles, post new articles, and keep track of which articles it has
read in each newsgroup. In terms of functionality, Usenet reading
software is less interesting to a Usenet administrator than a Usenet
server software. However, in terms of lines of code, the Usenet reader
software can often be much larger than Usenet server software, primarily
because of the complexities of modern GUI code.</P
><P
>Most modern computers almost exclusively access Usenet servers using
the NNTP (Network News Transfer Protocol) for reading and posting. This
protocol can also be used for inter-server communication, but those
aspects will be discussed later. The NNTP protocol, like any other
well-designed TCP-based Internet protocol, carries ASCII commands and
responses terminated with <TT
CLASS="LITERAL"
>CR-LF</TT
>, and comprises a
sequence of commands, somewhat reminiscent of the POP3 protocol for
email. Using NNTP, a Usenet reader program connects to a Usenet server,
asks for a list of active newsgroups, and receives this (often huge)
list. It then sets the ``current newsgroup'' to one of these, depending
on what the user wants to browse through. Having done this, it gets the
meta-data of all current articles in the group, including the author,
subject line, date, and size of each article, and displays an index of
articles to the user.</P
><P
>The user then scans through this list, selects an article, and
asks the reader to fetch it.  The reader gives the article number of
this article to the server, and fetches the full article for the user
to read through. Once the user finishes his NNTP session, he exits,
and the reader program closes the NNTP socket. It then (usually)
updates a local file in the user's home area, keeping track of which
news articles the user has read. These articles are typically not shown
to the user next time, thus allowing the user to progress rapidly to new
articles in each session. The reader software is helped along in this
endeavour by the <TT
CLASS="LITERAL"
>Xref</TT
> header, using which it knows
all the different identities by which a single article is identified
in the server. Thus, if you read the sample article given above by
accessing <TT
CLASS="LITERAL"
>starcom.tech.misc</TT
>, you'll never be shown
this article again when you access <TT
CLASS="LITERAL"
>starcom.tech.misc</TT
>
or <TT
CLASS="LITERAL"
>starcom.tech.security</TT
>; your reader software will
do this by tracking the <TT
CLASS="LITERAL"
>Xref</TT
> header and mapping
article numbers.</P
><P
>When a user posts an article, he first composes his message using
the user interface of his reader software. When he finally gives the
command to send the article, the reader software contacts the Usenet
server using the pre-existing NNTP connection and sends the article to
it. The article carries a <TT
CLASS="LITERAL"
>Newsgroups</TT
> header with the
list of newsgroups to post to, often a <TT
CLASS="LITERAL"
>Distribution</TT
>
header with a distribution specification, and other headers
like <TT
CLASS="LITERAL"
>From</TT
>, <TT
CLASS="LITERAL"
>Subject</TT
>
<EM
>etc.</EM
> These headers are used by the server
software to do the right thing. Special and rare headers like
<TT
CLASS="LITERAL"
>Expires</TT
> and <TT
CLASS="LITERAL"
>Approved</TT
> are acted upon
when present. The server assigns a new article number to the article for
each newsgroup it is posted to, and creates a new <TT
CLASS="LITERAL"
>Xref</TT
>
header for the article.</P
><P
>Transfer of articles between servers is done in various ways, and
is discussed in quite a bit of detail in Section XXX titled
``Newsfeeds'' below.</P
></DIV
><DIV
CLASS="SECTION"
><H2
CLASS="SECTION"
><A
NAME="AEN128">2.3. Newsfeeds</H2
><DIV
CLASS="SECTION"
><H3
CLASS="SECTION"
><A
NAME="AEN130">2.3.1. Fundamental concepts</H3
><P
>When we try to analyse newsfeeds in real life, we begin to see
	that, for most sites, traffic flow is not symmetrical in both
	directions. We usually find that one server will feed the bulk
	of the world's articles to one or more secondary servers every
	day, and receive a few articles written by the users of those
	secondary servers in exchange. Thus, we usually find that
	articles flow down from the stem to the branches to the leaves
	of the worldwide Usenet server network, and not exactly in a totally
	balanced mesh flow pattern. Therefore, we use the term
	``upstream server'' to refer to the server from which we receive
	the bulk of our daily dose of articles, and ``downstream
	server'' to refer to those servers which receive the bulk dose
	of articles from us.</P
><P
>Newsfeeds relay articles from one server to their ``next door
	neighbour'' servers, metaphorically speaking. Therefore, articles
	move around the globe, not by a massive number of single-hop
	transfers from the originating server to every other server in
	the world, but in a sequence of hops, like passing the baton in
	a relay race.  This increases the latency time for an article
	to reach a remote tertiary server after, say, ten hops, but
	it allows tighter control of what gets relayed at every hop,
	and helps in redundancy, decentralisation of server loads,
	and conservation of network bandwidth. In this respect, Usenet
	newsfeeds are more complex than HTTP data flows, which
	typically use single-hop techniques.</P
><P
>Each Usenet news server therefore has to worry about
	newsfeeds each time it receives an article, either by a fresh post
	or from an incoming newsfeed. When the Usenet server digests this
	article and files it away in its repository, it simultaneously
	looks through its database to see which other server it should
	feed the article to. In order to do this, it carries out a
	sequence of checks, described below.</P
><P
>Each server knows which other servers are its ``next door
	neighbours;'' this information is kept in its newsfeed
	configuration information. Against each of its ``next door
	neighbours,'' there will be a list of newsgroups which it
	wants, and a list of distributions. The new article's list of
	newsgroups will be matched against the newsgroup list of the
	``next door neighbour'' to see whether there's even a single
	common newsgroup which makes it necessary to feed the article to
	it. If there's a matching newsgroup, and the server's distribution
	list matches the article's distribution, then the article is
	marked for feeding to this neighbour.</P
><P
>When the neighbour receives the article as part of the
	feed, it performs some sanity checks of its own. The first check
	it performs is on the <TT
CLASS="LITERAL"
>Newsgroups</TT
> header of
	the new article. If none of the newsgroups listed there are part
	of the active newsgroups list of this server, then the article
	can be rejected. An article rejected thus may even be queued for
	outgoing feeds to other servers, but will not be digested for
	incorporation into the local article repository.</P
><P
>The next check performed is against the
	<TT
CLASS="LITERAL"
>Path</TT
> header of the incoming article. If this
	header lists the name of the current Usenet server anywhere,
	it indicates that it has already passed through this server at
	least once before, and is now re-appearing here erroneously because
	of a newsfeed loop. Such loops are quite often configured into
	newsfeed topologies for redundancy: ``I'll get the articles from
	Server X if not Server Y, and may the first one in win.'' The
	Usenet server software automatically detects a duplicate feed
	of an article and rejects it.</P
><P
>The next check is against what is called the server's
	<EM
>history database</EM
>. Every Usenet server has
	a history database, which is a list of the message IDs of all
	current articles in the local repository. Oftentimes the history
	database also carries the message IDs of all messages recently
	expired. If the incoming article's message ID matches any of the
	entries in the database, then again it is rejected without being
	filed in the local repository. This is a second loop detection
	method. Sometimes, the mere checking of the article's
	<TT
CLASS="LITERAL"
>Path</TT
> header does not detection of all
	potential problems, because the problem may be a re-insertion
	instead of a loop. A re-insertion happens when the same incoming
	batch of news articles is re-fed into the local server, perhaps
	after recovering the system's data from tapes after a system
	crash. In such cases, there's no newsfeed loop, but there's
	still the risk that one article may be digested into the local
	server twice. The history database prevents this.</P
><P
>All these simple checks are very effective, and work
	across server and software types, as per the Internet standards.
	Together, they allow robust and fail-safe Usenet article flow
	across the world.</P
></DIV
><DIV
CLASS="SECTION"
><H3
CLASS="SECTION"
><A
NAME="AEN144">2.3.2. Types of newsfeeds</H3
><P
>This section explains the basics of newsfeeds, without getting
	into details of software and configuration files.</P
><DIV
CLASS="SECTION"
><H4
CLASS="SECTION"
><A
NAME="AEN147">2.3.2.1. Queued feeds</H4
><P
>	    This is the commonest method of sending articles from one server
	    to another, and is followed whenever large volumes of articles
	    are to be transferred per day. This approach needs a one-time
	    modification to the upstream server's configuration for each
	    outgoing feed, to define a new <EM
>queue.</EM
>
	</P
><P
>	    In essence all queued feeds work in the following way. When the
	    sending server receives an article, it processes it for
	    inclusion into its local repository, and also checks through all
	    its outgoing feed definitions to see whether the article needs
	    to be queued for any of the feeds. If yes, it is added to a
	    <EM
>queue file</EM
> for each outgoing feed. The
	    precise details
	    of the queue file can change depending on the software
	    implementation, but the basic processes remain the same. A queue
	    file is a list of queued articles, but does not contain the
	    article contents. Typical queue files are ASCII text files with
	    one line per article giving the path to a copy of the article in
	    the local spool area.
	</P
><P
>	    Later, a separate process picks up each queue file and creates
	    one or more <EM
>batches</EM
> for each outgoing feed.
	    A <EM
>batch</EM
> is a large file containing multiple
	    Usenet news
	    articles. Once the batches are created, various transport
	    mechanisms can be used to move the files from sending server to
	    receiving server. You can even use scripted FTP.  You only need
	    to ensure that the batch is picked up from the upstream server
	    and somehow copied into a designated incoming batch directory in
	    the downstream server.
	</P
><P
>	    UUCP has traditionally been the mechanism of choice for batch
	    movement, because it predates the Internet and wide availability
	    of fast packet-switched data networks. Today, with TCP/IP
	    everywhere, UUCP once again emerges as the most logical choice
	    of batch movement, because it too has moved with the times: it
	    can work over TCP.
	</P
><P
>	    NNTP is the <EM
>de facto</EM
> mechanism of choice
	    for moving
	    queued newsfeeds for carrier-class Usenet servers on the
	    Internet, and unfortunately, for a lot of other Usenet servers
	    as well. The reason why we find this choice unfortunate is
	    discussed in <A
HREF="x1243.html#FEEDEFFICIENCY"
>Section 12.1</A
>&#62; below. But in NNTP
	    feeds, an intermediate step of building batches out of queue
	    files can be eliminated --- this is both its strength and its
	    weakness.
	</P
><P
>	    In the case of queued NNTP feeds, articles get added to queue
	    files as described above. An NNTP transmit process periodically
	    wakes up, picks up a queue file, and makes an NNTP connection to
	    the downstream server. It then begins a processing loop where,
	    for each queued article, it uses the NNTP
	    <TT
CLASS="LITERAL"
>IHAVE</TT
>
	    command to inform the downstream server of the article's
	    message~ID. The downstream server checks its local repository to
	    see whether it already has the message. If not, it responds with
	    a <TT
CLASS="LITERAL"
>SENDME</TT
> response. The transmitting server
	    then pumps
	    out the article contents in plaintext form.  When all articles
	    in the queue have been thus processed, the sending server closes
	    the connection. If the NNTP connection breaks in between due to
	    any reason, the sending server truncates the queue file and
	    retains only those articles which are yet to be transmitted,
	    thus minimising repeat transmissions.
	</P
><P
><A
NAME="DIALUPNONNTP"
></A
>&#62;
	    A queued NNTP feed works with the sending server making an NNTP
	    connection to the receiving server. This implies that the
	    receiving server must have an IP address which is known to the
	    sending server or can be looked up in the DNS. If the receiving
	    server connects to the Internet periodically using a dialup
	    connection and works with a dynamically assigned IP address,
	    this can get tricky. UUCP feeds suffer no such problems because
	    the sending server for the newsfeed can be the UUCP server,
	    <EM
>i.e.</EM
>
	    passive. The receiving server for the feed can be the UUCP
	    master, <EM
>i.e.</EM
> the active party. So the
	    receiving server can then
	    initiate the UUCP connection and connect to the sending server.
	    Thus, if even one of the two parties has a static IP address,
	    UUCP queued feeds can work fine.
	</P
><P
>	    Thus, NNTP feeds can be sent out a little faster than the
	    batched transmission processes used for UUCP and other older
	    methods, because no batches need to be constructed. However,
	    NNTP is often used in newsfeeds where it is not necessary and it
	    results in colossal waste of bandwidth.  Before we study
	    efficiency issues of NNTP versus batched feeds, we will cover
	    another way feeds can be organised using NNTP: the pull feeds.
	</P
></DIV
><DIV
CLASS="SECTION"
><H4
CLASS="SECTION"
><A
NAME="AEN168">2.3.2.2. Pull feeds</H4
><P
>	    This method of transferring a set of articles works only over
	    NNTP, and requires absolutely no configuration on the
	    transmitting, or upstream, server. In fact, the upstream server
	    cannot even easily detect that the downstream server is pulling
	    out a feed --- it appears to be just a heavy and thorough
	    newsreader, that's all.
	</P
><P
>	    This pull feed works by the downstream server pulling out
	    articles i one by one, just like any NNTP newsreader, using the
	    NNTP <TT
CLASS="LITERAL"
>ARTICLE</TT
> command with the Message-ID as
	    parameter.
	    The interesting detail is how it gets the message~IDs to begin
	    with. For this, it uses an NNTP command, specially designed for
	    pull feeds, called <TT
CLASS="LITERAL"
>NEWNEWS</TT
>. This command
	    takes a hierarchy and a date,
	    <TABLE
BORDER="1"
BGCOLOR="#E0E0E0"
WIDTH="100%"
><TR
><TD
><FONT
COLOR="#000000"
><PRE
CLASS="SCREEN"
> NEWNEWS comp 15081997 </PRE
></FONT
></TD
></TR
></TABLE
>
	</P
><P
>	    This command is sent by the downstream server over NNTP to the
	    upstream server, and in effect asks the upstream server to list
	    out all news articles which are newer than 15 August 1997 in the
	    <TT
CLASS="LITERAL"
>comp</TT
> hierarchy. The upstream server responds
	    with a
	    (often huge) list of message~IDs, one per line, ending with a
	    period on a line by itself.
	</P
><P
>	    The pulling server then compares each newly received message~ID
	    with its own article database and makes a (possibly shorter)
	    list of all articles which it does not have, thus eliminating
	    duplicate fetches.  That done, it begins fetching articles one
	    by one, using the NNTP <TT
CLASS="LITERAL"
>ARTICLE</TT
> command as
	    mentioned above.
	</P
><P
>	    In addition, there is another NNTP command,
	    <TT
CLASS="LITERAL"
>NEWGROUPS</TT
>,
	    which allows the NNTP client --- <EM
>i.e.</EM
> the
	    downstream server in
	    this case --- to ask its upstream server what were the new
	    newsgroups created since a given date. This allows the
	    downstream server to add the new groups to its
	    <TT
CLASS="LITERAL"
>active</TT
> file.
	</P
><P
>	    The <TT
CLASS="LITERAL"
>NEWNEWS</TT
> based approach is usually one of
	    the most inefficient methods of pulling out a large Usenet feed.
	    By inefficiency, here we refer to the CPU loads and RAM
	    utilisation on the upstream server, not on bandwidth usage. This
	    inefficiency is because most Usenet news servers do not keep
	    their article databases indexed by hierarchy and date; CNews
	    certainly does not. This means that a <TT
CLASS="LITERAL"
>NEWNEWS</TT
>
	    command issued to an upstream server will put that server into a
	    sequential search of its article database, to see which articles
	    fit into the hierarchy given and are newer than the given date.
	</P
><P
>	    If pull feeds were to become the most common way of sending out
	    articles, then all upstream servers would badly need an
	    efficient way of sorting their article databases to allow each
	    <TT
CLASS="LITERAL"
>NEWNEWS</TT
> command to rapidly generate its list
	    of matching articles. A slow upstream server today might take
	    minutes to begin responding to a <TT
CLASS="LITERAL"
>NEWNEWS</TT
>
	    command, and
	    the downstream server may time out and close its NNTP connection
	    in the meanwhile. We have often seen this happening, till we
	    tweak timeouts.
	</P
><P
>	    There are basic efficiency issues of bandwidth utilisation
	    involved in NNTP for news feeds, which are applicable for both
	    queued and pull feeds. But the problem with
	    <TT
CLASS="LITERAL"
>NEWNEWS</TT
> is unique to pull feeds, and relates
	    to server loads, not bandwidth wastage.
	</P
></DIV
></DIV
></DIV
><DIV
CLASS="SECTION"
><H2
CLASS="SECTION"
><A
NAME="CONTROLMSG">2.4. Control messages</H2
><P
>The Usenet is a massive dispersed collection of servers which
operate almost without any supervision, provided they have adequate disk
space and do not suffer disk corruption due to power failures,
<EM
>etc.</EM
> (It is indeed surprising how self-managing a
good Usenet server is, provided these two pre-requisites are met.) These
servers are each under the control of human administrators, but it is
preferable that certain routine actions be performed across all these
servers remotely from one location, without the manual intervention of
these humans.</P
><P
>One common need for centralised operations is the creation of new
groups in the standard eight hierarchies. The Usenet follows a fairly
formal process which asks for votes from readers worldwide before
deciding on the restructuring of its newsgroups list, including merging of
low-volume groups, splitting of high-volume groups into many specialised
groups, creating new groups, and even deleting groups. Once the voting
process for a change concludes and the change action is to be carried
out, it would be extremely tedious to send email to the hundreds of
thousands of Usenet administrators and hope that they make the changes
right, and answer their doubts if they get confused. It would be much
better to have an <EM
>automatic</EM
> way to make the
changes across all servers, of course with proper authorisation.</P
><P
>The solution to this does not lie in giving some central authority
the ability to run an OS-level command of his choice on all the world's
Usenet servers, because OS commands differ from OS to OS, and because
few Usenet administrators would trust a stranger from another part of
the world with OS level access. Therefore, the solution lay in defining
a small set of common Usenet maintenance actions, and permitting only
these actions to be triggered on all servers through the passing of
special command messages, called <EM
>control
messages</EM
>.</P
><P
>Control messages look like ordinary Usenet articles, more or less.
They have an extra header line, with its value in a specific format,
but they usually carry body text which looks like a normal human-written
article. Here is a control message (a spurious one at that, but it'll
do for now):</P
><TABLE
BORDER="0"
BGCOLOR="#E0E0E0"
WIDTH="100%"
><TR
><TD
><FONT
COLOR="#000000"
><PRE
CLASS="PROGRAMLISTING"
>Xref: news.starcomsoftware.com control:814217
Path: news.starcomsoftware.com!linux594.dn.net!news.dn.hoopoo.com!
	feed-out.newsfeeds.com!newsfeeds.com!feed.newsfeeds.com!
	newsfeeds.com!news-spur1.maxwell.syr.edu!news.maxwell.syr.edu!
	newsfeed.icl.net!newsfeed.skycache.com!Cidera!newsfeed.gamma.ru!
	Gamma.RU!carrier.kiev.ua!goblin.nadrabank.kiev.ua!not-for-mail
From: tale@uunet.uu.net (David C Lawrence)
Newsgroups: news.groups,humanities.hipcrime
Subject: cmsg newgroup humanities.hipcrime
Control: newgroup humanities.hipcrime
Date: Sun, 18 Feb 2001 11:50:28 GMT
Organization: The Cabal
Lines: 20
Approved: tale@uunet.uu.net
Message-ID: &#60;3afWYZTIR.G5YOC2@uunet.uu.net&#62;
NNTP-Posting-Host: 203.145.147.67
X-Trace: goblin.nadrabank.kiev.ua 982528840 21455 203.145.147.67
         (18 Feb 2001 20:40:40 GMT)
X-Complaints-To: usenet@nadrabank.kiev.ua
NNTP-Posting-Date: 18 Feb 2001 20:40:40 GMT
X-No-Archive: Yes

humanities.hipcrime is an unmoderated newsgroup which passed its
vote for creation by 326:10 as reported in news.announce.newgroups
on 18 Feb 2001.

For your newsgroups file:
humanities.hipcrime	HipCrime for Humanity - you committed one now!

Anyone can create a newsgroup in the alt, biz, comp, earth,
humanities, misc, news, meow, rec, sci, soc, talk, us, or
any other Usenet hierarchy.  New newsgroup proposals may be
optionally discussed in news.groups. Please be sure that your
/usr/lib/news/control.ctl is configured correctly:

##	NEWGROUP MESSAGES
## honor them all and log in \${LOG}/newgroup.log
newgroup:*:alt.*|biz.*|comp.*|earth.*|humanities.*|misc.*|news.*|\
	meaw.*|rec.*|sci.*|soc.*|talk.*|us.*:doit=newgroup

##	RMGROUP MESSAGES
## drop them all and don't log
rmgroup:*:*:drop

Meow!
David C Lawrence</PRE
></FONT
></TD
></TR
></TABLE
><P
>A control message must have a <TT
CLASS="LITERAL"
>Control</TT
>
header. Besides, all control messages <EM
>will</EM
>
have an <TT
CLASS="LITERAL"
>Approved</TT
> header, like messages posted
to moderated newsgroups. The <TT
CLASS="LITERAL"
>Control</TT
> header
actually specifies a command to run on the local server, and the
parameter(s) to supply to it. The local Usenet server software is
supposed to figure out its own way to get the task done. In this
example, the command in the <TT
CLASS="LITERAL"
>Control</TT
> header is
<TT
CLASS="LITERAL"
>newgroup</TT
>, which creates a new newsgroup. And its
parameter is <TT
CLASS="LITERAL"
>humanities.hipcrime</TT
>, which gives the
name of the newsgroup to create.</P
><P
>In C-News, the control message implementation works through
separate shellscripts kept in a fixed directory,
<TT
CLASS="LITERAL"
>$NEWSBIN/ctl/</TT
>, as a security measure; if the
executable script isn't present there, the control message command will
be ignored. The control message types supported are:</P
><P
></P
><UL
><LI
><P
><TT
CLASS="LITERAL"
>checkgroups</TT
>:  control message to
check whether the list of newsgroups in your active file are all correct
as per a master list of newsgroups sent in the control message</P
></LI
><LI
><P
><TT
CLASS="LITERAL"
>newgroup</TT
>: control message to create a
new newsgroup</P
></LI
><LI
><P
><TT
CLASS="LITERAL"
>rmgroup</TT
>: control message to delete a
newsgroup and all articles in it</P
></LI
><LI
><P
><TT
CLASS="LITERAL"
>sendsys</TT
>: control message to cause an
email response to be sent to the author with the <TT
CLASS="LITERAL"
>sys</TT
>
file of your server in it. This results in a response storm of
emails from all the Usenet servers in the world to the author. These
responses allow the sender of the control message to analyse all the
<TT
CLASS="LITERAL"
>sys</TT
> files of the world's Usenet servers and create the
directed graph of Usenet newsfeeds. Why someone would want to do this is
hard to guess, but the result is surely an awesome picture of one facet
of networked human civilisation, like looking at a giant world map.</P
><P
>Incidentally, there is no invasion of privacy here, because
your server's <TT
CLASS="LITERAL"
>sys</TT
> file is supposed to be public
information, if you take feeds from the public Usenet.</P
></LI
><LI
><P
><TT
CLASS="LITERAL"
>version</TT
>: control message which results
in your Usenet software sending an email to the author of the message,
containing the type and version of the Usenet news software you are
using. This too is not an invasion of privacy, because this information
is supposed to be public knowledge.</P
></LI
><LI
><P
>The cancel message: the most frequently occurring type of
control messages. They specify the message ID of an article, and result
in the cancellation (deletion) of that article. If you post an article
and regret it a moment later, your Usenet newsreader software usually
allows you to ``cancel'' it by generating a cancel message.</P
></LI
></UL
><P
>The Usenet news software maintains a pseudo-newsgroup called
<TT
CLASS="LITERAL"
>control</TT
>, where it files all control messages it
receives. If you have an incoming newsfeed from the public Usenet, your
server's <TT
CLASS="LITERAL"
>control</TT
> group will usually be full with
thousands of cancel messages from trigger-happy fingers all over the
world. Usenet news server software like C-News allows you to filter the
incoming feed based on newsgroups, and will discard articles for groups
they do not subscribe to. But since all servers have to receive and
process control messages, they will all accept these cancel messages,
though many of them may apply to articles which are not part of your
highly-pruned subset of groups. <TT
CLASS="LITERAL"
>C'est la vie</TT
>.</P
><P
>Remember to set expiry for the <TT
CLASS="LITERAL"
>control</TT
> group to
one day or even shorter, so that the junk can be cleaned out as rapidly as
possible, just like the <TT
CLASS="LITERAL"
>junk</TT
> newsgroup.</P
><P
>The beauty of the control message architecture is that it
integrates seamlessly into the newsfeed mechanism for automatic control
of the network of servers. No separate channel of connection is needed
for the control actions. And article replication automatically
propagates control messages with human-readable articles, thus
guaranteeing reach across heterogenous networks technologies.</P
><P
>What your Usenet server does on receiving a
control message is governed by an authorisation file:
<TT
CLASS="LITERAL"
>$NEWSCTL/controlperms</TT
> in the case of C-News
and <TT
CLASS="LITERAL"
>control.ctl</TT
> in the case of INN, for
instance. The security measures implemented by this module are
further enhanced by the <TT
CLASS="LITERAL"
>pgpcontrol</TT
> package with its
<TT
CLASS="LITERAL"
>pgpverify</TT
> script. Using <TT
CLASS="LITERAL"
>pgpverify</TT
>,
your server can check that all control messages (except for article
cancellation messages) are digitally signed by a trusted party
using military-spec public key cryptography. Our integrated
Usenet news software distribution includes integration with
<TT
CLASS="LITERAL"
>pgpverify</TT
>.</P
></DIV
></DIV
><DIV
CLASS="NAVFOOTER"
><HR
ALIGN="LEFT"
WIDTH="100%"><TABLE
SUMMARY="Footer navigation table"
WIDTH="100%"
BORDER="0"
CELLPADDING="0"
CELLSPACING="0"
><TR
><TD
WIDTH="33%"
ALIGN="left"
VALIGN="top"
><A
HREF="x27.html"
ACCESSKEY="P"
>Prev</A
></TD
><TD
WIDTH="34%"
ALIGN="center"
VALIGN="top"
><A
HREF="index.html"
ACCESSKEY="H"
>Home</A
></TD
><TD
WIDTH="33%"
ALIGN="right"
VALIGN="top"
><A
HREF="x248.html"
ACCESSKEY="N"
>Next</A
></TD
></TR
><TR
><TD
WIDTH="33%"
ALIGN="left"
VALIGN="top"
>What is the Usenet?</TD
><TD
WIDTH="34%"
ALIGN="center"
VALIGN="top"
>&nbsp;</TD
><TD
WIDTH="33%"
ALIGN="right"
VALIGN="top"
>Usenet news software</TD
></TR
></TABLE
></DIV
></BODY
></HTML
>