LDP/LDP/howto/docbook/Spam-Filtering-for-MX/glossary.xml

<glossary id="glossary" xreflabel="Glossary">
  <?dbhtml filename="gloss.html"?>

  <title>Glossary</title>

  <abstract>
    <para>
      These are definitions for some of the words and terms that are
      used throughout this document.
    </para>
  </abstract>

  <glossdiv>
    <title>B</title>

    <glossentry id="bayesian">
      <glossterm>Bayesian Filters</glossterm>
      <glossdef>
	<para>
	  A filter that assigns a probability of spam based on the
	  recurrence of words (or, more recently, word
	  constellations/phrases) between messages.
	</para>

	<para>
	  You initially train the filter by feeding it known junk mail
	  (spam) and known legitimate mail (ham).  A bayesian score
	  is then be assigned to each word (or phrase) in each
	  message, indicating whether this particular word or phrase
	  occurs most commonly in ham or in spam.  The word, along
	  with its score, is stored in a <emphasis>bayesian
	  index</emphasis>.
	</para>

	<para>
	  Such filters may catch indicators that may be missed by
	  human programmers trying to manually create keyword-based
	  filters.  At the very least, they automate this task.
	</para>

	<para>
	  Bayesian word indexes are most certainly specific to the
	  language in which they received training.  Moreover, they are
	  specific to individual users.  Thus, they are perhaps more
	  suitable for individual content filters (e.g. in <xref
	  linkend="mua"/>s) than they are for system-wide, SMTP-time
	  filtering.
	</para>

	<para>
	  Moreover, spammers have developed techniques to defeat
	  simple bayesian filters, by including random dictionary
	  words and/or short stories in their messages.  This
	  decreases the spam probability assigned by a baynesian
	  filter, and in the long run, degrades the quality of the
	  bayesian index.
	</para>

	<para>
	  See also:
	  <ulink url="http://www.everything2.com/index.pl?node=Bayesian"/>.
	</para>
      </glossdef>
    </glossentry>
  </glossdiv>


  <glossdiv>
    <title>C</title>

    <glossentry id="coldamage">
      <glossterm>Collateral Damage</glossterm>
      <glossdef>
	<para>
	  Blocking of a legitimate sender host due to an entry in a
	  DNS blocklist.
	</para>

	<para>
	  Some blocklists (like SPEWS) routinely list the entire IP
	  address space of an ISP if they feel the ISP is not
	  responsive to abuse complaints, thereby affecting
	  <emphasis>all</emphasis> its customers.
	</para>

	<para>
	  See also: <xref linkend="falsepos"/>
	</para>
      </glossdef>
    </glossentry>


    <glossentry id="colspam">
      <glossterm>Collateral Spam</glossterm>
      <glossdef>
	<para>
	  Automated messages sent in response to an original message
	  (mostly spam or malware) where the sender address is forged.
	  Typical examples of collateral spam include virus scan
	  reports (<quote>You have a virus</quote>) or other <xref
	  linkend="dsn" />s).
	</para>
      </glossdef>
    </glossentry>
  </glossdiv>


  <glossdiv>
    <title>D</title>

    <glossentry id="dns">
      <glossterm>Domain Name System</glossterm>
      <glossdef>
	<para>
	  (<emphasis>abbrev: DNS</emphasis>) The de-facto standard for
	  obtaining information about internet domain names.  Examples
	  of such information include IP addresses of its servers
	  (so-called <emphasis>A records</emphasis>), the dedication
	  of incoming mail exchangers (MX records), generic server
	  information (SRV records), and miscellaneous text
	  information (TXT records).
	</para>

	<para>
	  DNS is a hierarctical, distributed system; each domain name
	  is associated with a set of one or more DNS servers that
	  provide information about that domain - including delegation
	  of name service for its subdomains.
	</para>

	<para>
	  For instance, the top-level domain <quote>org</quote> is
	  operated by The Public Interest Registry; its DNS servers
	  delegate queries for the domain name <quote>tldp.org</quote>
	  to specific name servers for The Linux Documentation
	  Project.  In turn, TLDPs name server (actually operated by
	  UNC) may or may not delegate queries for third-level names,
	  such as <quote>www.tldp.org</quote>.
	</para>

	<para>
	  DNS lookups are usually performed by forwarding name
	  servers, such as those provided by an Internet Service
	  Provider (e.g. via DHCP).
	</para>
      </glossdef>
    </glossentry>


    <glossentry id="dsn">
      <glossterm>Delivery Status Notification</glossterm>
      <glossdef>
	<para>
	  (<emphasis>abbrev: DSN</emphasis>) A message automatically
	  created by an MTA or MDA, to inform the sender of an
	  original messsage (usually included in the DSN) about its
	  status.  For instance, DSNs may inform the sender of the
	  original message that it could not be delivered due to a
	  temporary or permanent problem, and/or whether or not and
	  for how long delivery attempts will continue.
	</para>

	<para>
	  Delivery Status Notifications are sent with an empty <xref
	  linkend="envfrom" /> address.
	</para>
      </glossdef>
    </glossentry>
  </glossdiv>


  <glossdiv>
    <title>E</title>

    <glossentry id="envfrom">
      <glossterm>Envelope Sender</glossterm>
      <glossdef>
	<para>
	  The e-mail address given as sender of a message during the
	  SMTP transaction, using the <command>MAIL FROM:</command>
	  command.  This may be different from the address provided in
	  the <quote>From:</quote> header of the message itself.
	</para>

	<para>
	  One special case is <xref linkend="dsn" /> (bounced message,
	  return receipt, vacation message..).  For such mails, the
	  <xref linkend="envfrom"/> is empty.  This is to prevent
	  <xref linkend="loop"/>s, and generally to be able to
	  distinguish these from <quote>regular</quote> mails.
	</para>

	<para>
	  See also: <xref linkend="smtpintro" />
	</para>
      </glossdef>
    </glossentry>


    <glossentry id="envto">
      <glossterm>Envelope Recipient</glossterm>
      <glossdef>
	<para>
	  The e-mail address(es) to which the message is sent.  These
	  are provided during the SMTP transaction, using the
	  <command>RCPT TO</command> command.  These may be different
	  from the addresses provided in the <quote>To:</quote> and
	  <quote>Cc:</quote> headers of the message itself.
	</para>

	<para>
	  See also: <xref linkend="smtpintro" />
	</para>
      </glossdef>
    </glossentry>
  </glossdiv>


  <glossdiv>
    <title>F</title>

    <glossentry id="falseneg">
      <glossterm>False Negative</glossterm>

      <glossdef>
	<para>
	  Junk mail (spam, virus, malware) that is misclassified as
	  legitimate mail (and consequently, not filtered out).
	</para>
      </glossdef>
    </glossentry>


    <glossentry id="falsepos">
      <glossterm>False Positive</glossterm>

      <glossdef>
	<para>
	  Legitimate mail that is misclassified as junk (and
	  consequently, blocked).
	</para>

	<para>
	  See also: <xref linkend="coldamage"/>.
	</para>
      </glossdef>
    </glossentry>


    <glossentry id="fqdn">
      <glossterm>Fully Qualified Domain Name</glossterm>
      <glossdef>
	<para>
	  (a.k.a. <quote>FQDN</quote>).  A full, globally unique,
	  internet name, including DNS domain.  For instance:
	  <quote>www.yahoo.com</quote>.
	</para>

	<para>
	  A <emphasis>FQDN</emphasis> does not always point to a single
	  host.  For instance, common service names such as
	  <quote>www</quote> often point to many IP addresses, in
	  order to provide some load balancing on the servers.
	  However, the <emphasis>primary</emphasis> host name of a
	  given machine should always be unique to that machine; for
	  instance: <quote>p16.www.scd.yahoo.com</quote>.
	</para>

	<para>
	  A <emphasis>FQDN</emphasis> always contains a period (".").
	  The part before the first period is the
	  <emphasis>unqualified name</emphasis>, and is not globally
	  unique.
	</para>
      </glossdef>
    </glossentry>
  </glossdiv>


  <glossdiv>
    <title>J</title>

    <glossentry id="joejob">
      <glossterm>Joe Job</glossterm>
      <glossdef>
	<para>
	  A spam designed to look like it came from someone else's
	  valid address, often in a malicous attempt at generating
	  complaints from third parties and/or cause other damage to
	  the owner of that address.
	</para>

	<para>
	  See also:
	  <ulink url="http://www.everything2.com/index.pl?node=Joe%20Job"/>
	</para>
      </glossdef>
    </glossentry>
  </glossdiv>


  <glossdiv>
    <title>M</title>

    <glossentry id="mda">
      <glossterm>Mail Delivery Agent</glossterm>
      <glossdef>
	<para>
	  (<emphasis>abbrev: MDA</emphasis>) Software that runs on the
	  machine where a users' mailbox is located, to deliver mail
	  into that mailbox.  Often, that delivery is performed
	  directly by the MTA <xref linkend="mta" />, which then
	  serves a secondary role as an MDA.  Examples of separate
	  Mail Delivery Agents include: Deliver, Procmail, Cyrmaster
	  and/or Cyrdeliver (from the Cyrus IMAP suite).
	</para>
      </glossdef>
    </glossentry>

    <glossentry id="loop">
      <glossterm>Mail Loop</glossterm>
      <glossdef>
	<para>
	  A situation where one automated message triggers another,
	  which directly or indirectly triggers the first message
	  over again, and so on.
	</para>

	<para>
	  Imagine a mailing list where one of the subscribers is the
	  address of the list itself.  This situation is often dealt
	  with by the list server adding an <quote>X-Loop:</quote>
	  line in the message header, and not processing mails that
	  already have one.
	</para>

	<para>
	  Another equivalent term is <emphasis>Ringing</emphasis>.
	</para>
      </glossdef>
    </glossentry>


    <glossentry id="mta">
      <glossterm>Mail Transport Agent</glossterm>
      <glossdef>
	<para>
	  (<emphasis>abbrev: MTA</emphasis>) Software that runs on a
	  mail server, such as the mail exchanger(s) of a internet
	  domain, to send mail to and receive mail from other hosts.
	  Popular MTAs include: Sendmail, Postfix, Exim, Smail.
	</para>
      </glossdef>
    </glossentry>

    <glossentry id="mua">
      <glossterm>Mail User Agent</glossterm>
      <glossdef>
	<para>
	  (<emphasis>abbrev: MUA</emphasis>; a.k.a. <emphasis>Mail
	  Reader</emphasis>) User software to access, download, read,
	  and send mail.  Examples include Microsoft Outlook/Outlook
	  Express, Apple Mail.app, Mozilla Thunderbird, Ximian
	  Evolution.
	</para>
      </glossdef>
    </glossentry>


    <glossentry id="mx">
      <glossterm>Mail Exchanger</glossterm>
      <glossdef>
	<para>
	  (<emphasis>abbrev: MX</emphasis>) A machine dedicated to
	  (sending and/or) receiving mail for an internet domain.
	</para>

	<para>
	  The DNS zone information for a internet domain normally
	  contains a list of <xref linkend="fqdn" />s that act as
	  incoming mail exchangers for that domain.  Each such listing
	  is called an <quote>MX record</quote>, and it also contains
	  a number indicating its <quote>priority</quote> among
	  several <quote>MX records</quote>.  The listing with the
	  lowest number has the first priority, and is considered the
	  <quote>primary mail exchanger</quote> for that domain.
	</para>
      </glossdef>
    </glossentry>


    <glossentry id="micropay">
      <glossterm>Micropayment Schemes</glossterm>
      <glossdef>
	<para>
	  (a.k.a. <emphasis>sender pay</emphasis> schemes).  The
	  sender of a message expends some machine resources to create
	  a virtual <emphasis>postage stamp</emphasis> for each
	  recipient of a message - usually by solving a mathematical
	  challenge that requires a large number of memory read/write
	  operations, but is relatively CPU speed insensitive.  This
	  stamp is then added to the headers of the message, and the
	  recipient would validate the stamp through a much simpler
	  decoding operation.
	</para>

	<para>
	  The idea is that because the message requires a postage
	  stamp for every recipient address, spamming hundreds or
	  thousands of users at once would be prohibitively
	  "expensive".
	</para>

	<para>
	  Two such systems are:
	</para>

	<itemizedlist>
	  <listitem>
	    <para>
	      <ulink url="http://www.camram.org/">Camram</ulink>
	    </para>
	  </listitem>

	  <listitem>
	    <para>
	      <ulink url="http://research.microsoft.com/research/sv/PennyBlack/">Microsoft's
	      Penny Black Project</ulink>
	    </para>
	  </listitem>
	</itemizedlist>
      </glossdef>
    </glossentry>

  </glossdiv>


  <glossdiv>
    <title>O</title>

    <glossentry id="openproxy">
      <glossterm>Open Proxy</glossterm>

      <glossdef>
	<para>
	  A <xref linkend="proxy" /> which openly accepts TCP/IP
	  connections from anywhere, and forwards them anywhere.
	</para>

	<para>
	  These are typically exploited by spammers and virii, who
	  use them to conceal their own IP address, and/or to more
	  effectively distribute transmission loads across several
	  hosts and networks.
	</para>

	<para>
	  See also: <xref linkend="zombie" />
	</para>
      </glossdef>
    </glossentry>


    <glossentry id="openrelay">
      <glossterm>Open Relay</glossterm>
      <glossdef>
	<para>
	  A <xref linkend="relay" /> which openly accepts mail from
	  anywhere, and forwards them to anywhere.
	</para>

	<para>
	  In the 1980s, virtually every public SMTP server was an
	  <xref linkend="openrelay" />.  Messages would often travel
	  between multiple third-party machines before it reached the
	  intended recipient.  Now, legitimate mail are almost
	  exclusively sent directly from an outgoing <xref
	  linkend="mta"/> on the sender's end to the incoming <xref
	  linkend="mx"/>(s) for the recipient's domain.
	</para>

	<para>
	  Conversely, <xref linkend="openrelay"/> servers that still
	  exist on the internet are almost exclusively exploited by
	  spammers to hide their own identity, and to perform some
	  load balancing on the task of sending out millions of
	  messages, presumably before DNS blocklists have a chance to
	  get all of these machines listed.
	</para>

	<para>
	  See also the discussion on <xref linkend="relayprevent"/>.
	</para>
      </glossdef>
    </glossentry>
  </glossdiv>


  <glossdiv>
    <title>P</title>

    <glossentry id="proxy">
      <glossterm>proxy</glossterm>
      <glossdef>
	<para>
	  A machine that acts on behalf of someone else.  It may
	  forward e.g. HTTP requests or TCP/IP connections, usually to
	  or from the internet.  For instance, companies - or
	  sometimes entire countries - often use <quote>Web Proxy
	  Servers</quote> to filter outgoing HTTP requests from their
	  internal network.  This may or may not be transparent to the
	  end user.
	</para>

	<para>
	  See also: <xref linkend="openproxy" />, <xref
	    linkend="relay" />.
	</para>
      </glossdef>
    </glossentry>
  </glossdiv>


  <glossdiv>
    <title>R</title>


    <glossentry id="ratware">
      <glossterm>Ratware</glossterm>
      <glossdef>
	<para>
	  Mass-mailing virii and e-mail software used by spammers,
	  specifically designed to deliver large amounts of mail in a
	  very short time.
	</para>

	<para>
	  Most ratware implementations incorporate only as much SMTP
	  client code as strictly neccessary to deliver mail in the
	  best-case scenario.  They provide false or inaccurate
	  information in the SMTP dialogue with the receiving host.
	  They do not wait for responses from the receiver before
	  issuing commands, and disconnect if no response has been
	  received in a few seconds.  They do not follow normal retry
	  mechanisms in case of temporary failures.
	</para>
      </glossdef>
    </glossentry>


    <glossentry id="relay">
      <glossterm>Relay</glossterm>
      <glossdef>
	<para>
	  A machine that forwards e-mail, usually to or from the
	  internet.  One example of a relay is the
	  <quote>smarthost</quote> that an ISP provides to its
	  customers for sending outgoing mail.
	</para>

	<para>
	  See also: <xref linkend="openrelay" />, <xref
	    linkend="proxy" />
	</para>
      </glossdef>
    </glossentry>

    <glossentry id="rfc">
      <glossterm>Request for Comments</glossterm>
      <glossdef>
	<para>
	  (<emphasis>abbrev: RFC</emphasis>) From

	  <ulink
	  url="http://www.rfc-editor.org/">http://www.rfc-editor.org/</ulink>:
	  <quote>
	    The Request for Comments (RFC) document series is a set of
	    technical and organizational notes about the internet
	    [...].  Memos in the RFC series discuss many aspects of
	    computer networking, incluing protocols, procedures,
	    programs, and concepts, as well as meeting notes,
	    opinions, and sometimes humor.
	  </quote>
	</para>

	<para>
	  These documents make up the <quote>rules</quote> internet
	  conduct, including descriptions of protocols and data
	  formats.  Of particular interest for mail deliveries are:

	  <itemizedlist>
	    <listitem>
	      <para>
		<ulink url="http://www.ietf.org/rfc/rfc2821">RFC
		  2821</ulink>, "Simple Mail transfer Protocol", and
	      </para>
	    </listitem>

	    <listitem>
	      <para>
		<ulink url="http://www.ietf.org/rfc/rfc2821">RFC
		  2822</ulink>, "Internet Message Format".
	      </para>
	    </listitem>
	  </itemizedlist>
	</para>
      </glossdef>
    </glossentry>
  </glossdiv>


  <glossdiv>
    <title>S</title>

    <glossentry id="spamtrap">
      <glossterm>Spam Trap</glossterm>
      <glossdef>
	<para>
	  An e-mail address that is <emphasis>seeded</emphasis> to
	  address-harvesting robots via public locations, then used to
	  feed collaborative tools such as <xref linkend="dnsbl"/> and
	  <xref linkend="jmsr"/>.
	</para>

	<para>
	  Mails sent to these addresses are normally spam or malware.
	  However, some of it will be <emphasis>collateral</emphasis>,
	  spam - i.e. <xref linkend="dsn"/> to faked sender addresses.
	  Thus, unless the spam trap has safeguards in place to
	  disregard such messages, the resulting tool may not be
	  completely reliable.
	</para>

      </glossdef>
    </glossentry>
  </glossdiv>


  <glossdiv>
    <title>Z</title>

    <glossentry id="zombie">
      <glossterm>Zombie Host</glossterm>
      <glossdef>
	<para>
	  A machine with an internet connection that is infected by a
	  mass-mailing virus or worm.  Such machines invariably run a
	  flavor of the Microsoft&reg; Windows&reg; operating system,
	  and are almost always in <quote>residential</quote> IP
	  address blocks.  Their owners either do not know or do not
	  care that the machines are infected, and often, their ISP
	  will not take any actions to shut them down.
	</para>

	<para>
	  Fortunately, there are various DNS blocklists, such as
	  <quote>dul.dnsbl.sorbs.net</quote>, that incorporate such
	  "residential" address blocks.  You should be able to use
	  these blocklists to reject incoming mail.  Legitimate mail
	  from residential users should normally go through their
	  ISP's <quote>smarthost</quote>.
	</para>
      </glossdef>
    </glossentry>
  </glossdiv>
</glossary>