<!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.0 Transitional//EN">
<HTML
><HEAD
><TITLE
>Glossary</TITLE
><META
NAME="GENERATOR"
CONTENT="Modular DocBook HTML Stylesheet Version 1.7"><LINK
REL="HOME"
TITLE="Spam Filtering for Mail Exchangers"
HREF="index.html"><LINK
REL="PREVIOUS"
TITLE="Final ACLs"
HREF="exim-final.html"><LINK
REL="NEXT"
TITLE="GNU General Public License"
HREF="gpl.html"></HEAD
><BODY
CLASS="glossary"
BGCOLOR="#FFFFFF"
TEXT="#000000"
LINK="#0000FF"
VLINK="#840084"
ALINK="#0000FF"
><DIV
CLASS="NAVHEADER"
><TABLE
SUMMARY="Header navigation table"
WIDTH="100%"
BORDER="0"
CELLPADDING="0"
CELLSPACING="0"
><TR
><TH
COLSPAN="3"
ALIGN="center"
>Spam Filtering for Mail Exchangers: </TH
></TR
><TR
><TD
WIDTH="10%"
ALIGN="left"
VALIGN="bottom"
><A
HREF="exim-final.html"
ACCESSKEY="P"
>Prev</A
></TD
><TD
WIDTH="80%"
ALIGN="center"
VALIGN="bottom"
></TD
><TD
WIDTH="10%"
ALIGN="right"
VALIGN="bottom"
><A
HREF="gpl.html"
ACCESSKEY="N"
>Next</A
></TD
></TR
></TABLE
><HR
ALIGN="LEFT"
WIDTH="100%"></DIV
><DIV
CLASS="GLOSSARY"
><H1
><A
NAME="glossary"
></A
>Glossary</H1
><BLOCKQUOTE
CLASS="ABSTRACT"
><DIV
CLASS="abstract"
><A
NAME="AEN2126"
></A
><P
></P
><P
>&#13; These are definitions for some of the words and terms that are
used throughout this document.
</P
><P
></P
></DIV
></BLOCKQUOTE
><DIV
CLASS="glossdiv"
><H1
CLASS="glossdiv"
><A
NAME="AEN2128"
></A
>B</H1
><DL
><DT
><A
NAME="bayesian"
></A
><B
>Bayesian Filters</B
></DT
><DD
><P
>&#13; A filter that assigns a probability of spam based on the
recurrence of words (or, more recently, word
constellations/phrases) between messages.
</P
><P
>&#13; You initially train the filter by feeding it known junk mail
(spam) and known legitimate mail (ham). A bayesian score
is then assigned to each word (or phrase) in each
message, indicating whether this particular word or phrase
occurs most commonly in ham or in spam. The word, along
with its score, is stored in a <EM
>bayesian
index</EM
>.
</P
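><P
>As a rough illustration of the training step described above, the
following Python sketch builds a toy index from a few hand-written
messages. The function names, the add-one smoothing and the naive way
of combining word scores are assumptions made for this example; real
filters differ in all of these details.</P
><PRE
CLASS="PROGRAMLISTING"
>
```python
from collections import Counter


def train(messages):
    """Count in how many messages each word occurs."""
    counts = Counter()
    for text in messages:
        counts.update(set(text.lower().split()))
    return counts


def word_scores(spam_msgs, ham_msgs):
    """Build a toy index: word -> estimated probability of spam."""
    spam_counts, ham_counts = train(spam_msgs), train(ham_msgs)
    index = {}
    for word in set(spam_counts) | set(ham_counts):
        # Add-one smoothing keeps every score strictly between 0 and 1.
        p_spam = (spam_counts[word] + 1) / (len(spam_msgs) + 2)
        p_ham = (ham_counts[word] + 1) / (len(ham_msgs) + 2)
        index[word] = p_spam / (p_spam + p_ham)
    return index


def spam_probability(text, index):
    """Combine per-word scores, treating words as independent."""
    spam_product, ham_product = 1.0, 1.0
    for word in set(text.lower().split()):
        score = index.get(word, 0.5)  # unknown words are neutral
        spam_product *= score
        ham_product *= 1.0 - score
    return spam_product / (spam_product + ham_product)


# Hypothetical training data, far too small for real use.
index = word_scores(
    ["cheap pills online", "cheap watches online"],
    ["meeting notes attached", "lunch meeting tomorrow"],
)
```
</PRE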
><P
>&#13; Such filters may catch indicators that may be missed by
human programmers trying to manually create keyword-based
filters. At the very least, they automate this task.
</P
><P
>&#13; Bayesian word indexes are most certainly specific to the
language in which they received training. Moreover, they are
specific to individual users. Thus, they are perhaps more
suitable for individual content filters (e.g. in <A
HREF="gloss.html#mua"
><I
CLASS="glossterm"
>Mail User Agent</I
></A
>s) than they are for system-wide, SMTP-time
filtering.
</P
><P
>&#13; Moreover, spammers have developed techniques to defeat
simple bayesian filters, by including random dictionary
words and/or short stories in their messages. This
decreases the spam probability assigned by a bayesian
filter, and in the long run, degrades the quality of the
bayesian index.
</P
><P
>&#13; See also:
<A
HREF="http://www.everything2.com/index.pl?node=Bayesian"
TARGET="_top"
>http://www.everything2.com/index.pl?node=Bayesian</A
>.
</P
></DD
></DL
></DIV
><DIV
CLASS="glossdiv"
><H1
CLASS="glossdiv"
><A
NAME="AEN2142"
></A
>C</H1
><DL
><DT
><A
NAME="coldamage"
></A
><B
>Collateral Damage</B
></DT
><DD
><P
>&#13; Blocking of a legitimate sender host due to an entry in a
DNS blocklist.
</P
><P
>&#13; Some blocklists (like SPEWS) routinely list the entire IP
address space of an ISP if they feel the ISP is not
responsive to abuse complaints, thereby affecting
<EM
>all</EM
> its customers.
</P
><P
>&#13; See also: <A
HREF="gloss.html#falsepos"
><I
CLASS="glossterm"
>False Positive</I
></A
>
</P
></DD
><DT
><A
NAME="colspam"
></A
><B
>Collateral Spam</B
></DT
><DD
><P
>&#13; Automated messages sent in response to an original message
(mostly spam or malware) where the sender address is forged.
Typical examples of collateral spam include virus scan
reports (<SPAN
CLASS="QUOTE"
>"You have a virus"</SPAN
>) or other <A
HREF="gloss.html#dsn"
><I
CLASS="glossterm"
>Delivery Status Notification</I
></A
>s.
</P
></DD
></DL
></DIV
><DIV
CLASS="glossdiv"
><H1
CLASS="glossdiv"
><A
NAME="AEN2158"
></A
>D</H1
><DL
><DT
><A
NAME="dns"
></A
><B
>Domain Name System</B
></DT
><DD
><P
>&#13; (<EM
>abbrev: DNS</EM
>) The de-facto standard for
obtaining information about internet domain names. Examples
of such information include IP addresses of its servers
(so-called <EM
>A records</EM
>), the designation
of incoming mail exchangers (MX records), generic server
information (SRV records), and miscellaneous text
information (TXT records).
</P
><P
>&#13; DNS is a hierarchical, distributed system; each domain name
is associated with a set of one or more DNS servers that
provide information about that domain - including delegation
of name service for its subdomains.
</P
><P
>&#13; For instance, the top-level domain <SPAN
CLASS="QUOTE"
>"org"</SPAN
> is
operated by The Public Interest Registry; its DNS servers
delegate queries for the domain name <SPAN
CLASS="QUOTE"
>"tldp.org"</SPAN
>
to specific name servers for The Linux Documentation
Project. In turn, TLDP's name server (actually operated by
UNC) may or may not delegate queries for third-level names,
such as <SPAN
CLASS="QUOTE"
>"www.tldp.org"</SPAN
>.
</P
><P
>&#13; DNS lookups are usually performed by forwarding name
servers, such as those provided by an Internet Service
Provider (e.g. via DHCP).
</P
></DD
><DT
><A
NAME="dsn"
></A
><B
>Delivery Status Notification</B
></DT
><DD
><P
>&#13; (<EM
>abbrev: DSN</EM
>) A message automatically
created by an MTA or MDA, to inform the sender of an
original message (usually included in the DSN) about its
status. For instance, DSNs may inform the sender of the
original message that it could not be delivered due to a
temporary or permanent problem, and/or whether or not and
for how long delivery attempts will continue.
</P
><P
>&#13; Delivery Status Notifications are sent with an empty <A
HREF="gloss.html#envfrom"
><I
CLASS="glossterm"
>Envelope Sender</I
></A
> address.
</P
></DD
></DL
></DIV
><DIV
CLASS="glossdiv"
><H1
CLASS="glossdiv"
><A
NAME="AEN2179"
></A
>E</H1
><DL
><DT
><A
NAME="envfrom"
></A
><B
>Envelope Sender</B
></DT
><DD
><P
>&#13; The e-mail address given as sender of a message during the
SMTP transaction, using the <B
CLASS="command"
>MAIL FROM:</B
>
command. This may be different from the address provided in
the <SPAN
CLASS="QUOTE"
>"From:"</SPAN
> header of the message itself.
</P
><P
>&#13; One special case is <A
HREF="gloss.html#dsn"
><I
CLASS="glossterm"
>Delivery Status Notification</I
></A
> (bounced message,
return receipt, vacation message, etc.). For such mails, the
<A
HREF="gloss.html#envfrom"
><I
CLASS="glossterm"
>Envelope Sender</I
></A
> is empty. This is to prevent
<A
HREF="gloss.html#loop"
><I
CLASS="glossterm"
>Mail Loop</I
></A
>s, and generally to be able to
distinguish these from <SPAN
CLASS="QUOTE"
>"regular"</SPAN
> mails.
</P
><P
>&#13; See also: <A
HREF="smtpintro.html"
>The SMTP Transaction</A
>
</P
></DD
><DT
><A
NAME="envto"
></A
><B
>Envelope Recipient</B
></DT
><DD
><P
>&#13; The e-mail address(es) to which the message is sent. These
are provided during the SMTP transaction, using the
<B
CLASS="command"
>RCPT TO:</B
> command. These may be different
from the addresses provided in the <SPAN
CLASS="QUOTE"
>"To:"</SPAN
> and
<SPAN
CLASS="QUOTE"
>"Cc:"</SPAN
> headers of the message itself.
</P
><P
>&#13; See also: <A
HREF="smtpintro.html"
>The SMTP Transaction</A
>
</P
></DD
></DL
></DIV
><DIV
CLASS="glossdiv"
><H1
CLASS="glossdiv"
><A
NAME="AEN2203"
></A
>F</H1
><DL
><DT
><A
NAME="falseneg"
></A
><B
>False Negative</B
></DT
><DD
><P
>&#13; Junk mail (spam, virus, malware) that is misclassified as
legitimate mail (and consequently, not filtered out).
</P
></DD
><DT
><A
NAME="falsepos"
></A
><B
>False Positive</B
></DT
><DD
><P
>&#13; Legitimate mail that is misclassified as junk (and
consequently, blocked).
</P
><P
>&#13; See also: <A
HREF="gloss.html#coldamage"
><I
CLASS="glossterm"
>Collateral Damage</I
></A
>.
</P
></DD
><DT
><A
NAME="fqdn"
></A
><B
>Fully Qualified Domain Name</B
></DT
><DD
><P
>&#13; (a.k.a. <SPAN
CLASS="QUOTE"
>"FQDN"</SPAN
>). A full, globally unique,
internet name, including DNS domain. For instance:
<SPAN
CLASS="QUOTE"
>"www.yahoo.com"</SPAN
>.
</P
><P
>&#13; An <EM
>FQDN</EM
> does not always point to a single
host. For instance, common service names such as
<SPAN
CLASS="QUOTE"
>"www"</SPAN
> often point to many IP addresses, in
order to provide some load balancing on the servers.
However, the <EM
>primary</EM
> host name of a
given machine should always be unique to that machine; for
instance: <SPAN
CLASS="QUOTE"
>"p16.www.scd.yahoo.com"</SPAN
>.
</P
><P
>&#13; An <EM
>FQDN</EM
> always contains a period (".").
The part before the first period is the
<EM
>unqualified name</EM
>, and is not globally
unique.
</P
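><P
>The split at the first period can be sketched in Python as follows.
The helper name is made up for this example.</P
><PRE
CLASS="PROGRAMLISTING"
>
```python
def split_fqdn(name):
    """Split a host name into (unqualified name, domain) at the first period."""
    unqualified, _dot, domain = name.partition(".")
    return unqualified, domain


host, domain = split_fqdn("p16.www.scd.yahoo.com")
```
</PRE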
></DD
></DL
></DIV
><DIV
CLASS="glossdiv"
><H1
CLASS="glossdiv"
><A
NAME="AEN2229"
></A
>J</H1
><DL
><DT
><A
NAME="joejob"
></A
><B
>Joe Job</B
></DT
><DD
><P
>&#13; A spam message designed to look like it came from someone else's
valid address, often in a malicious attempt at generating
complaints from third parties and/or causing other damage to
the owner of that address.
</P
><P
>&#13; See also:
<A
HREF="http://www.everything2.com/index.pl?node=Joe%20Job"
TARGET="_top"
>http://www.everything2.com/index.pl?node=Joe%20Job</A
>
</P
></DD
></DL
></DIV
><DIV
CLASS="glossdiv"
><H1
CLASS="glossdiv"
><A
NAME="AEN2237"
></A
>M</H1
><DL
><DT
><A
NAME="mda"
></A
><B
>Mail Delivery Agent</B
></DT
><DD
><P
>&#13; (<EM
>abbrev: MDA</EM
>) Software that runs on the
machine where a user's mailbox is located, to deliver mail
into that mailbox. Often, that delivery is performed
directly by the <A
HREF="gloss.html#mta"
><I
CLASS="glossterm"
>Mail Transport Agent</I
></A
>, which then
serves a secondary role as an MDA. Examples of separate
Mail Delivery Agents include: Deliver, Procmail, Cyrmaster
and/or Cyrdeliver (from the Cyrus IMAP suite).
</P
></DD
><DT
><A
NAME="loop"
></A
><B
>Mail Loop</B
></DT
><DD
><P
>&#13; A situation where one automated message triggers another,
which directly or indirectly triggers the first message
over again, and so on.
</P
><P
>&#13; Imagine a mailing list where one of the subscribers is the
address of the list itself. This situation is often dealt
with by the list server adding an <SPAN
CLASS="QUOTE"
>"X-Loop:"</SPAN
>
line in the message header, and not processing mails that
already have one.
</P
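><P
>A minimal Python sketch of that safeguard, using the standard
email package, might look like this. The list address and both
helper names are hypothetical, and actual list servers may apply
the check differently.</P
><PRE
CLASS="PROGRAMLISTING"
>
```python
from email.message import EmailMessage

LIST_ADDRESS = "list@example.org"  # hypothetical list address


def should_process(msg):
    """Return False if this list has already handled the message."""
    return LIST_ADDRESS not in msg.get_all("X-Loop", [])


def mark_processed(msg):
    """Stamp the message before redistributing it to subscribers."""
    msg["X-Loop"] = LIST_ADDRESS


msg = EmailMessage()
msg["From"] = "subscriber@example.com"
msg["To"] = LIST_ADDRESS
msg.set_content("Hello, list!")

first_pass = should_process(msg)   # fresh message, no X-Loop line yet
mark_processed(msg)
second_pass = should_process(msg)  # same message arriving at the list again
```
</PRE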
><P
>&#13; Another equivalent term is <EM
>Ringing</EM
>.
</P
></DD
><DT
><A
NAME="mta"
></A
><B
>Mail Transport Agent</B
></DT
><DD
><P
>&#13; (<EM
>abbrev: MTA</EM
>) Software that runs on a
mail server, such as the mail exchanger(s) of an internet
domain, to send mail to and receive mail from other hosts.
Popular MTAs include: Sendmail, Postfix, Exim, Smail.
</P
></DD
><DT
><A
NAME="mua"
></A
><B
>Mail User Agent</B
></DT
><DD
><P
>&#13; (<EM
>abbrev: MUA</EM
>; a.k.a. <EM
>Mail
Reader</EM
>) User software to access, download, read,
and send mail. Examples include Microsoft Outlook/Outlook
Express, Apple Mail.app, Mozilla Thunderbird, Ximian
Evolution.
</P
></DD
><DT
><A
NAME="mx"
></A
><B
>Mail Exchanger</B
></DT
><DD
><P
>&#13; (<EM
>abbrev: MX</EM
>) A machine dedicated to
(sending and/or) receiving mail for an internet domain.
</P
><P
>&#13; The DNS zone information for an internet domain normally
contains a list of <A
HREF="gloss.html#fqdn"
><I
CLASS="glossterm"
>Fully Qualified Domain Name</I
></A
>s that act as
incoming mail exchangers for that domain. Each such listing
is called an <SPAN
CLASS="QUOTE"
>"MX record"</SPAN
>, and it also contains
a number indicating its <SPAN
CLASS="QUOTE"
>"priority"</SPAN
> among
several <SPAN
CLASS="QUOTE"
>"MX records"</SPAN
>. The listing with the
lowest number has the first priority, and is considered the
<SPAN
CLASS="QUOTE"
>"primary mail exchanger"</SPAN
> for that domain.
</P
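><P
>The ordering rule can be sketched in Python as follows. The
records are invented for illustration and no DNS lookup is
performed; a real sender would obtain them from the DNS.</P
><PRE
CLASS="PROGRAMLISTING"
>
```python
# Hypothetical MX records for example.org as (priority number, host name).
mx_records = [
    (20, "backup.example.org"),
    (10, "mail.example.org"),
    (30, "fallback.example.net"),
]


def delivery_order(records):
    """Sort mail exchangers so the lowest number is tried first."""
    return [host for _number, host in sorted(records)]


ordered = delivery_order(mx_records)
primary = ordered[0]  # the primary mail exchanger
```
</PRE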
></DD
><DT
><A
NAME="micropay"
></A
><B
>Micropayment Schemes</B
></DT
><DD
><P
>&#13; (a.k.a. <EM
>sender pay</EM
> schemes). The
sender of a message expends some machine resources to create
a virtual <EM
>postage stamp</EM
> for each
recipient of a message - usually by solving a mathematical
challenge that requires a large number of memory read/write
operations, but is relatively CPU speed insensitive. This
stamp is then added to the headers of the message, and the
recipient would validate the stamp through a much simpler
decoding operation.
</P
><P
>&#13; The idea is that because the message requires a postage
stamp for every recipient address, spamming hundreds or
thousands of users at once would be prohibitively
"expensive".
</P
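><P
>The asymmetry between creating and checking a stamp can be
sketched in Python. Note that this example substitutes a simple
CPU-bound hash puzzle for the memory-bound challenge described
above, and it is not the algorithm used by either of the systems
listed below. The difficulty value and all function names are
assumptions.</P
><PRE
CLASS="PROGRAMLISTING"
>
```python
import hashlib
from itertools import count

DIFFICULTY = "000"  # required hex prefix; an arbitrary value for illustration


def digest(recipient, nonce):
    """Hash the recipient address together with a candidate nonce."""
    return hashlib.sha256(f"{recipient}:{nonce}".encode()).hexdigest()


def mint_stamp(recipient):
    """Sender side: search for a nonce, which takes many hash attempts."""
    for nonce in count():
        if digest(recipient, nonce).startswith(DIFFICULTY):
            return nonce


def verify_stamp(recipient, nonce):
    """Recipient side: a single hash is enough to check the stamp."""
    return digest(recipient, nonce).startswith(DIFFICULTY)


stamp = mint_stamp("user@example.org")  # one stamp per recipient address
```
</PRE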
><P
>&#13; Two such systems are:
</P
><P
></P
><UL
><LI
><P
>&#13; <A
HREF="http://www.camram.org/"
TARGET="_top"
>Camram</A
>
</P
></LI
><LI
><P
>&#13; <A
HREF="http://research.microsoft.com/research/sv/PennyBlack/"
TARGET="_top"
>Microsoft's
Penny Black Project</A
>
</P
></LI
></UL
></DD
></DL
></DIV
><DIV
CLASS="glossdiv"
><H1
CLASS="glossdiv"
><A
NAME="AEN2290"
></A
>O</H1
><DL
><DT
><A
NAME="openproxy"
></A
><B
>Open Proxy</B
></DT
><DD
><P
>&#13; A <A
HREF="gloss.html#proxy"
><I
CLASS="glossterm"
>proxy</I
></A
> which openly accepts TCP/IP
connections from anywhere, and forwards them anywhere.
</P
><P
>&#13; These are typically exploited by spammers and viruses, who
use them to conceal their own IP address, and/or to more
effectively distribute transmission loads across several
hosts and networks.
</P
><P
>&#13; See also: <A
HREF="gloss.html#zombie"
><I
CLASS="glossterm"
>Zombie Host</I
></A
>
</P
></DD
><DT
><A
NAME="openrelay"
></A
><B
>Open Relay</B
></DT
><DD
><P
>&#13; A <A
HREF="gloss.html#relay"
><I
CLASS="glossterm"
>Relay</I
></A
> which openly accepts mail from
anywhere, and forwards it to anywhere.
</P
><P
>&#13; In the 1980s, virtually every public SMTP server was an
<A
HREF="gloss.html#openrelay"
><I
CLASS="glossterm"
>Open Relay</I
></A
>. Messages would often travel
between multiple third-party machines before they reached the
intended recipient. Now, legitimate mail is almost
exclusively sent directly from an outgoing <A
HREF="gloss.html#mta"
><I
CLASS="glossterm"
>Mail Transport Agent</I
></A
> on the sender's end to the incoming <A
HREF="gloss.html#mx"
><I
CLASS="glossterm"
>Mail Exchanger</I
></A
>(s) for the recipient's domain.
</P
><P
>&#13; Conversely, <A
HREF="gloss.html#openrelay"
><I
CLASS="glossterm"
>Open Relay</I
></A
> servers that still
exist on the internet are almost exclusively exploited by
spammers to hide their own identity, and to perform some
load balancing on the task of sending out millions of
messages, presumably before DNS blocklists have a chance to
get all of these machines listed.
</P
><P
>&#13; See also the discussion on <A
HREF="smtpchecks.html#relayprevent"
>Open Relay Prevention</A
>.
</P
></DD
></DL
></DIV
><DIV
CLASS="glossdiv"
><H1
CLASS="glossdiv"
><A
NAME="AEN2313"
></A
>P</H1
><DL
><DT
><A
NAME="proxy"
></A
><B
>proxy</B
></DT
><DD
><P
>&#13; A machine that acts on behalf of someone else. It may
forward e.g. HTTP requests or TCP/IP connections, usually to
or from the internet. For instance, companies - or
sometimes entire countries - often use <SPAN
CLASS="QUOTE"
>"Web Proxy
Servers"</SPAN
> to filter outgoing HTTP requests from their
internal network. This may or may not be transparent to the
end user.
</P
><P
>&#13; See also: <A
HREF="gloss.html#openproxy"
><I
CLASS="glossterm"
>Open Proxy</I
></A
>, <A
HREF="gloss.html#relay"
><I
CLASS="glossterm"
>Relay</I
></A
>.
</P
></DD
></DL
></DIV
><DIV
CLASS="glossdiv"
><H1
CLASS="glossdiv"
><A
NAME="AEN2323"
></A
>R</H1
><DL
><DT
><A
NAME="ratware"
></A
><B
>Ratware</B
></DT
><DD
><P
>&#13; Mass-mailing viruses and e-mail software used by spammers,
specifically designed to deliver large amounts of mail in a
very short time.
</P
><P
>&#13; Most ratware implementations incorporate only as much SMTP
client code as strictly necessary to deliver mail in the
best-case scenario. They provide false or inaccurate
information in the SMTP dialogue with the receiving host.
They do not wait for responses from the receiver before
issuing commands, and disconnect if no response has been
received in a few seconds. They do not follow normal retry
mechanisms in case of temporary failures.
</P
></DD
><DT
><A
NAME="relay"
></A
><B
>Relay</B
></DT
><DD
><P
>&#13; A machine that forwards e-mail, usually to or from the
internet. One example of a relay is the
<SPAN
CLASS="QUOTE"
>"smarthost"</SPAN
> that an ISP provides to its
customers for sending outgoing mail.
</P
><P
>&#13; See also: <A
HREF="gloss.html#openrelay"
><I
CLASS="glossterm"
>Open Relay</I
></A
>, <A
HREF="gloss.html#proxy"
><I
CLASS="glossterm"
>proxy</I
></A
>
</P
></DD
><DT
><A
NAME="rfc"
></A
><B
>Request for Comments</B
></DT
><DD
><P
>&#13; (<EM
>abbrev: RFC</EM
>) From
<A
HREF="http://www.rfc-editor.org/"
TARGET="_top"
>http://www.rfc-editor.org/</A
>:
<SPAN
CLASS="QUOTE"
>"
The Request for Comments (RFC) document series is a set of
technical and organizational notes about the internet
[...]. Memos in the RFC series discuss many aspects of
computer networking, including protocols, procedures,
programs, and concepts, as well as meeting notes,
opinions, and sometimes humor.
"</SPAN
>
</P
><P
>&#13; These documents make up the <SPAN
CLASS="QUOTE"
>"rules"</SPAN
> of internet
conduct, including descriptions of protocols and data
formats. Of particular interest for mail deliveries are:
<P
></P
><UL
><LI
><P
>&#13; <A
HREF="http://www.ietf.org/rfc/rfc2821"
TARGET="_top"
>RFC
2821</A
>, "Simple Mail Transfer Protocol", and
</P
></LI
><LI
><P
>&#13; <A
HREF="http://www.ietf.org/rfc/rfc2822"
TARGET="_top"
>RFC
2822</A
>, "Internet Message Format".
</P
></LI
></UL
>
</P
></DD
></DL
></DIV
><DIV
CLASS="glossdiv"
><H1
CLASS="glossdiv"
><A
NAME="AEN2354"
></A
>S</H1
><DL
><DT
><A
NAME="spamtrap"
></A
><B
>Spam Trap</B
></DT
><DD
><P
>&#13; An e-mail address that is <EM
>seeded</EM
> to
address-harvesting robots via public locations, then used to
feed collaborative tools such as <A
HREF="dnschecks.html#dnsbl"
>DNS Blacklists</A
> and
<A
HREF="datachecks.html#jmsr"
>Junk Mail Signature Repository</A
>.
</P
><P
>&#13; Mails sent to these addresses are normally spam or malware.
However, some of it will be <EM
>collateral</EM
>
spam, i.e. <A
HREF="gloss.html#dsn"
><I
CLASS="glossterm"
>Delivery Status Notification</I
></A
> to faked sender addresses.
Thus, unless the spam trap has safeguards in place to
disregard such messages, the resulting tool may not be
completely reliable.
</P
></DD
></DL
></DIV
><DIV
CLASS="glossdiv"
><H1
CLASS="glossdiv"
><A
NAME="AEN2366"
></A
>Z</H1
><DL
><DT
><A
NAME="zombie"
></A
><B
>Zombie Host</B
></DT
><DD
><P
>&#13; A machine with an internet connection that is infected by a
mass-mailing virus or worm. Such machines invariably run a
flavor of the Microsoft&reg; Windows&trade; operating system,
and are almost always in <SPAN
CLASS="QUOTE"
>"residential"</SPAN
> IP
address blocks. Their owners either do not know or do not
care that the machines are infected, and often, their ISP
will not take any actions to shut them down.
</P
><P
>&#13; Fortunately, there are various DNS blocklists, such as
<SPAN
CLASS="QUOTE"
>"dul.dnsbl.sorbs.net"</SPAN
>, that incorporate such
"residential" address blocks. You should be able to use
these blocklists to reject incoming mail. Legitimate mail
from residential users should normally go through their
ISP's <SPAN
CLASS="QUOTE"
>"smarthost"</SPAN
>.
</P
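><P
>DNS blocklists are conventionally queried by reversing the octets
of the connecting IPv4 address and appending the blocklist zone. The
Python sketch below only builds that query name; the helper name and
the sample address (taken from a documentation-only range) are
assumptions, and no lookup is performed.</P
><PRE
CLASS="PROGRAMLISTING"
>
```python
import ipaddress


def dnsbl_query_name(ip, zone="dul.dnsbl.sorbs.net"):
    """Build the DNS name to look up for an IPv4 address in a blocklist zone."""
    octets = str(ipaddress.IPv4Address(ip)).split(".")
    return ".".join(reversed(octets)) + "." + zone


query = dnsbl_query_name("192.0.2.99")
```
</PRE
><P
>If an address lookup for the resulting name succeeds, the host is
listed; a "no such name" answer means it is not.</P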
></DD
></DL
></DIV
></DIV
><DIV
CLASS="NAVFOOTER"
><HR
ALIGN="LEFT"
WIDTH="100%"><TABLE
SUMMARY="Footer navigation table"
WIDTH="100%"
BORDER="0"
CELLPADDING="0"
CELLSPACING="0"
><TR
><TD
WIDTH="33%"
ALIGN="left"
VALIGN="top"
><A
HREF="exim-final.html"
ACCESSKEY="P"
>Prev</A
></TD
><TD
WIDTH="34%"
ALIGN="center"
VALIGN="top"
><A
HREF="index.html"
ACCESSKEY="H"
>Home</A
></TD
><TD
WIDTH="33%"
ALIGN="right"
VALIGN="top"
><A
HREF="gpl.html"
ACCESSKEY="N"
>Next</A
></TD
></TR
><TR
><TD
WIDTH="33%"
ALIGN="left"
VALIGN="top"
>Final ACLs</TD
><TD
WIDTH="34%"
ALIGN="center"
VALIGN="top"
>&nbsp;</TD
><TD
WIDTH="33%"
ALIGN="right"
VALIGN="top"
>GNU General Public License</TD
></TR
></TABLE
></DIV
></BODY
></HTML
>