LDP/LDP/howto/docbook/Spam-Filtering-for-MX/glossary.xml

676 lines
18 KiB
XML

<glossary id="glossary" xreflabel="Glossary">
<?dbhtml filename="gloss.html"?>
<title>Glossary</title>
<abstract>
<para>
These are definitions for some of the words and terms that are
used throughout this document.
</para>
</abstract>
<glossdiv>
<title>B</title>
<glossentry id="bayesian">
<glossterm>Bayesian Filters</glossterm>
<glossdef>
<para>
A filter that assigns a probability of spam based on the
recurrence of words (or, more recently, word
constellations/phrases) between messages.
</para>
<para>
You initially train the filter by feeding it known junk mail
(spam) and known legitimate mail (ham). A bayesian score
is then be assigned to each word (or phrase) in each
message, indicating whether this particular word or phrase
occurs most commonly in ham or in spam. The word, along
with its score, is stored in a <emphasis>bayesian
index</emphasis>.
</para>
<para>
Such filters may catch indicators that may be missed by
human programmers trying to manually create keyword-based
filters. At the very least, they automate this task.
</para>
<para>
Bayesian word indexes are most certainly specific to the
language in which they received training. Moreover, they are
specific to individual users. Thus, they are perhaps more
suitable for individual content filters (e.g. in <xref
linkend="mua"/>s) than they are for system-wide, SMTP-time
filtering.
</para>
<para>
Moreover, spammers have developed techniques to defeat
simple bayesian filters, by including random dictionary
words and/or short stories in their messages. This
decreases the spam probability assigned by a baynesian
filter, and in the long run, degrades the quality of the
bayesian index.
</para>
<para>
See also:
<ulink url="http://www.everything2.com/index.pl?node=Bayesian"/>.
</para>
</glossdef>
</glossentry>
</glossdiv>
<glossdiv>
<title>C</title>
<glossentry id="coldamage">
<glossterm>Collateral Damage</glossterm>
<glossdef>
<para>
Blocking of a legitimate sender host due to an entry in a
DNS blocklist.
</para>
<para>
Some blocklists (like SPEWS) routinely list the entire IP
address space of an ISP if they feel the ISP is not
responsive to abuse complaints, thereby affecting
<emphasis>all</emphasis> its customers.
</para>
<para>
See also: <xref linkend="falsepos"/>
</para>
</glossdef>
</glossentry>
<glossentry id="colspam">
<glossterm>Collateral Spam</glossterm>
<glossdef>
<para>
Automated messages sent in response to an original message
(mostly spam or malware) where the sender address is forged.
Typical examples of collateral spam include virus scan
reports (<quote>You have a virus</quote>) or other <xref
linkend="dsn" />s).
</para>
</glossdef>
</glossentry>
</glossdiv>
<glossdiv>
<title>D</title>
<glossentry id="dns">
<glossterm>Domain Name System</glossterm>
<glossdef>
<para>
(<emphasis>abbrev: DNS</emphasis>) The de-facto standard for
obtaining information about internet domain names. Examples
of such information include IP addresses of its servers
(so-called <emphasis>A records</emphasis>), the dedication
of incoming mail exchangers (MX records), generic server
information (SRV records), and miscellaneous text
information (TXT records).
</para>
<para>
DNS is a hierarctical, distributed system; each domain name
is associated with a set of one or more DNS servers that
provide information about that domain - including delegation
of name service for its subdomains.
</para>
<para>
For instance, the top-level domain <quote>org</quote> is
operated by The Public Interest Registry; its DNS servers
delegate queries for the domain name <quote>tldp.org</quote>
to specific name servers for The Linux Documentation
Project. In turn, TLDPs name server (actually operated by
UNC) may or may not delegate queries for third-level names,
such as <quote>www.tldp.org</quote>.
</para>
<para>
DNS lookups are usually performed by forwarding name
servers, such as those provided by an Internet Service
Provider (e.g. via DHCP).
</para>
</glossdef>
</glossentry>
<glossentry id="dsn">
<glossterm>Delivery Status Notification</glossterm>
<glossdef>
<para>
(<emphasis>abbrev: DSN</emphasis>) A message automatically
created by an MTA or MDA, to inform the sender of an
original messsage (usually included in the DSN) about its
status. For instance, DSNs may inform the sender of the
original message that it could not be delivered due to a
temporary or permanent problem, and/or whether or not and
for how long delivery attempts will continue.
</para>
<para>
Delivery Status Notifications are sent with an empty <xref
linkend="envfrom" /> address.
</para>
</glossdef>
</glossentry>
</glossdiv>
<glossdiv>
<title>E</title>
<glossentry id="envfrom">
<glossterm>Envelope Sender</glossterm>
<glossdef>
<para>
The e-mail address given as sender of a message during the
SMTP transaction, using the <command>MAIL FROM:</command>
command. This may be different from the address provided in
the <quote>From:</quote> header of the message itself.
</para>
<para>
One special case is <xref linkend="dsn" /> (bounced message,
return receipt, vacation message..). For such mails, the
<xref linkend="envfrom"/> is empty. This is to prevent
<xref linkend="loop"/>s, and generally to be able to
distinguish these from <quote>regular</quote> mails.
</para>
<para>
See also: <xref linkend="smtpintro" />
</para>
</glossdef>
</glossentry>
<glossentry id="envto">
<glossterm>Envelope Recipient</glossterm>
<glossdef>
<para>
The e-mail address(es) to which the message is sent. These
are provided during the SMTP transaction, using the
<command>RCPT TO</command> command. These may be different
from the addresses provided in the <quote>To:</quote> and
<quote>Cc:</quote> headers of the message itself.
</para>
<para>
See also: <xref linkend="smtpintro" />
</para>
</glossdef>
</glossentry>
</glossdiv>
<glossdiv>
<title>F</title>
<glossentry id="falseneg">
<glossterm>False Negative</glossterm>
<glossdef>
<para>
Junk mail (spam, virus, malware) that is misclassified as
legitimate mail (and consequently, not filtered out).
</para>
</glossdef>
</glossentry>
<glossentry id="falsepos">
<glossterm>False Positive</glossterm>
<glossdef>
<para>
Legitimate mail that is misclassified as junk (and
consequently, blocked).
</para>
<para>
See also: <xref linkend="coldamage"/>.
</para>
</glossdef>
</glossentry>
<glossentry id="fqdn">
<glossterm>Fully Qualified Domain Name</glossterm>
<glossdef>
<para>
(a.k.a. <quote>FQDN</quote>). A full, globally unique,
internet name, including DNS domain. For instance:
<quote>www.yahoo.com</quote>.
</para>
<para>
A <emphasis>FQDN</emphasis> does not always point to a single
host. For instance, common service names such as
<quote>www</quote> often point to many IP addresses, in
order to provide some load balancing on the servers.
However, the <emphasis>primary</emphasis> host name of a
given machine should always be unique to that machine; for
instance: <quote>p16.www.scd.yahoo.com</quote>.
</para>
<para>
A <emphasis>FQDN</emphasis> always contains a period (".").
The part before the first period is the
<emphasis>unqualified name</emphasis>, and is not globally
unique.
</para>
</glossdef>
</glossentry>
</glossdiv>
<glossdiv>
<title>J</title>
<glossentry id="joejob">
<glossterm>Joe Job</glossterm>
<glossdef>
<para>
A spam designed to look like it came from someone else's
valid address, often in a malicous attempt at generating
complaints from third parties and/or cause other damage to
the owner of that address.
</para>
<para>
See also:
<ulink url="http://www.everything2.com/index.pl?node=Joe%20Job"/>
</para>
</glossdef>
</glossentry>
</glossdiv>
<glossdiv>
<title>M</title>
<glossentry id="mda">
<glossterm>Mail Delivery Agent</glossterm>
<glossdef>
<para>
(<emphasis>abbrev: MDA</emphasis>) Software that runs on the
machine where a users' mailbox is located, to deliver mail
into that mailbox. Often, that delivery is performed
directly by the MTA <xref linkend="mta" />, which then
serves a secondary role as an MDA. Examples of separate
Mail Delivery Agents include: Deliver, Procmail, Cyrmaster
and/or Cyrdeliver (from the Cyrus IMAP suite).
</para>
</glossdef>
</glossentry>
<glossentry id="loop">
<glossterm>Mail Loop</glossterm>
<glossdef>
<para>
A situation where one automated message triggers another,
which directly or indirectly triggers the first message
over again, and so on.
</para>
<para>
Imagine a mailing list where one of the subscribers is the
address of the list itself. This situation is often dealt
with by the list server adding an <quote>X-Loop:</quote>
line in the message header, and not processing mails that
already have one.
</para>
<para>
Another equivalent term is <emphasis>Ringing</emphasis>.
</para>
</glossdef>
</glossentry>
<glossentry id="mta">
<glossterm>Mail Transport Agent</glossterm>
<glossdef>
<para>
(<emphasis>abbrev: MTA</emphasis>) Software that runs on a
mail server, such as the mail exchanger(s) of a internet
domain, to send mail to and receive mail from other hosts.
Popular MTAs include: Sendmail, Postfix, Exim, Smail.
</para>
</glossdef>
</glossentry>
<glossentry id="mua">
<glossterm>Mail User Agent</glossterm>
<glossdef>
<para>
(<emphasis>abbrev: MUA</emphasis>; a.k.a. <emphasis>Mail
Reader</emphasis>) User software to access, download, read,
and send mail. Examples include Microsoft Outlook/Outlook
Express, Apple Mail.app, Mozilla Thunderbird, Ximian
Evolution.
</para>
</glossdef>
</glossentry>
<glossentry id="mx">
<glossterm>Mail Exchanger</glossterm>
<glossdef>
<para>
(<emphasis>abbrev: MX</emphasis>) A machine dedicated to
(sending and/or) receiving mail for an internet domain.
</para>
<para>
The DNS zone information for a internet domain normally
contains a list of <xref linkend="fqdn" />s that act as
incoming mail exchangers for that domain. Each such listing
is called an <quote>MX record</quote>, and it also contains
a number indicating its <quote>priority</quote> among
several <quote>MX records</quote>. The listing with the
lowest number has the first priority, and is considered the
<quote>primary mail exchanger</quote> for that domain.
</para>
</glossdef>
</glossentry>
<glossentry id="micropay">
<glossterm>Micropayment Schemes</glossterm>
<glossdef>
<para>
(a.k.a. <emphasis>sender pay</emphasis> schemes). The
sender of a message expends some machine resources to create
a virtual <emphasis>postage stamp</emphasis> for each
recipient of a message - usually by solving a mathematical
challenge that requires a large number of memory read/write
operations, but is relatively CPU speed insensitive. This
stamp is then added to the headers of the message, and the
recipient would validate the stamp through a much simpler
decoding operation.
</para>
<para>
The idea is that because the message requires a postage
stamp for every recipient address, spamming hundreds or
thousands of users at once would be prohibitively
"expensive".
</para>
<para>
Two such systems are:
</para>
<itemizedlist>
<listitem>
<para>
<ulink url="http://www.camram.org/">Camram</ulink>
</para>
</listitem>
<listitem>
<para>
<ulink url="http://research.microsoft.com/research/sv/PennyBlack/">Microsoft's
Penny Black Project</ulink>
</para>
</listitem>
</itemizedlist>
</glossdef>
</glossentry>
</glossdiv>
<glossdiv>
<title>O</title>
<glossentry id="openproxy">
<glossterm>Open Proxy</glossterm>
<glossdef>
<para>
A <xref linkend="proxy" /> which openly accepts TCP/IP
connections from anywhere, and forwards them anywhere.
</para>
<para>
These are typically exploited by spammers and virii, who
use them to conceal their own IP address, and/or to more
effectively distribute transmission loads across several
hosts and networks.
</para>
<para>
See also: <xref linkend="zombie" />
</para>
</glossdef>
</glossentry>
<glossentry id="openrelay">
<glossterm>Open Relay</glossterm>
<glossdef>
<para>
A <xref linkend="relay" /> which openly accepts mail from
anywhere, and forwards them to anywhere.
</para>
<para>
In the 1980s, virtually every public SMTP server was an
<xref linkend="openrelay" />. Messages would often travel
between multiple third-party machines before it reached the
intended recipient. Now, legitimate mail are almost
exclusively sent directly from an outgoing <xref
linkend="mta"/> on the sender's end to the incoming <xref
linkend="mx"/>(s) for the recipient's domain.
</para>
<para>
Conversely, <xref linkend="openrelay"/> servers that still
exist on the internet are almost exclusively exploited by
spammers to hide their own identity, and to perform some
load balancing on the task of sending out millions of
messages, presumably before DNS blocklists have a chance to
get all of these machines listed.
</para>
<para>
See also the discussion on <xref linkend="relayprevent"/>.
</para>
</glossdef>
</glossentry>
</glossdiv>
<glossdiv>
<title>P</title>
<glossentry id="proxy">
<glossterm>proxy</glossterm>
<glossdef>
<para>
A machine that acts on behalf of someone else. It may
forward e.g. HTTP requests or TCP/IP connections, usually to
or from the internet. For instance, companies - or
sometimes entire countries - often use <quote>Web Proxy
Servers</quote> to filter outgoing HTTP requests from their
internal network. This may or may not be transparent to the
end user.
</para>
<para>
See also: <xref linkend="openproxy" />, <xref
linkend="relay" />.
</para>
</glossdef>
</glossentry>
</glossdiv>
<glossdiv>
<title>R</title>
<glossentry id="ratware">
<glossterm>Ratware</glossterm>
<glossdef>
<para>
Mass-mailing virii and e-mail software used by spammers,
specifically designed to deliver large amounts of mail in a
very short time.
</para>
<para>
Most ratware implementations incorporate only as much SMTP
client code as strictly neccessary to deliver mail in the
best-case scenario. They provide false or inaccurate
information in the SMTP dialogue with the receiving host.
They do not wait for responses from the receiver before
issuing commands, and disconnect if no response has been
received in a few seconds. They do not follow normal retry
mechanisms in case of temporary failures.
</para>
</glossdef>
</glossentry>
<glossentry id="relay">
<glossterm>Relay</glossterm>
<glossdef>
<para>
A machine that forwards e-mail, usually to or from the
internet. One example of a relay is the
<quote>smarthost</quote> that an ISP provides to its
customers for sending outgoing mail.
</para>
<para>
See also: <xref linkend="openrelay" />, <xref
linkend="proxy" />
</para>
</glossdef>
</glossentry>
<glossentry id="rfc">
<glossterm>Request for Comments</glossterm>
<glossdef>
<para>
(<emphasis>abbrev: RFC</emphasis>) From
<ulink
url="http://www.rfc-editor.org/">http://www.rfc-editor.org/</ulink>:
<quote>
The Request for Comments (RFC) document series is a set of
technical and organizational notes about the internet
[...]. Memos in the RFC series discuss many aspects of
computer networking, incluing protocols, procedures,
programs, and concepts, as well as meeting notes,
opinions, and sometimes humor.
</quote>
</para>
<para>
These documents make up the <quote>rules</quote> internet
conduct, including descriptions of protocols and data
formats. Of particular interest for mail deliveries are:
<itemizedlist>
<listitem>
<para>
<ulink url="http://www.ietf.org/rfc/rfc2821">RFC
2821</ulink>, "Simple Mail transfer Protocol", and
</para>
</listitem>
<listitem>
<para>
<ulink url="http://www.ietf.org/rfc/rfc2821">RFC
2822</ulink>, "Internet Message Format".
</para>
</listitem>
</itemizedlist>
</para>
</glossdef>
</glossentry>
</glossdiv>
<glossdiv>
<title>S</title>
<glossentry id="spamtrap">
<glossterm>Spam Trap</glossterm>
<glossdef>
<para>
An e-mail address that is <emphasis>seeded</emphasis> to
address-harvesting robots via public locations, then used to
feed collaborative tools such as <xref linkend="dnsbl"/> and
<xref linkend="jmsr"/>.
</para>
<para>
Mails sent to these addresses are normally spam or malware.
However, some of it will be <emphasis>collateral</emphasis>,
spam - i.e. <xref linkend="dsn"/> to faked sender addresses.
Thus, unless the spam trap has safeguards in place to
disregard such messages, the resulting tool may not be
completely reliable.
</para>
</glossdef>
</glossentry>
</glossdiv>
<glossdiv>
<title>Z</title>
<glossentry id="zombie">
<glossterm>Zombie Host</glossterm>
<glossdef>
<para>
A machine with an internet connection that is infected by a
mass-mailing virus or worm. Such machines invariably run a
flavor of the Microsoft&reg; Windows&reg; operating system,
and are almost always in <quote>residential</quote> IP
address blocks. Their owners either do not know or do not
care that the machines are infected, and often, their ISP
will not take any actions to shut them down.
</para>
<para>
Fortunately, there are various DNS blocklists, such as
<quote>dul.dnsbl.sorbs.net</quote>, that incorporate such
"residential" address blocks. You should be able to use
these blocklists to reject incoming mail. Legitimate mail
from residential users should normally go through their
ISP's <quote>smarthost</quote>.
</para>
</glossdef>
</glossentry>
</glossdiv>
</glossary>