
<!--startcut =======================================================  -->
<!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 3.2//EN">
<html>
<head>
<title>The Answer Guy Issue 27</title>
</head>

<BODY BGCOLOR="#FFFFFF" TEXT="#000000" LINK="#0000FF" VLINK="#A000A0"
ALINK="#FF0000">
<!--endcut =========================================================  -->
<H4>"Linux Gazette...<I>making Linux just a little more fun!</I>"
</H4>
<P> <hr> <P>

<!-- ===============================================================  -->
<center>
<H1><A NAME="answer">
<img src="../gx/ans.gif" alt="" border=0 align=middle>
The Answer Guy
<img src="../gx/ans.gif" alt="" border=0 align=middle>
</A></H1> <BR>
<H4>By James T. Dennis,
<a href="mailto:linux-questions-only@ssc.com">linux-questions-only@ssc.com</a><BR>
Starshine Technical Services, <A HREF="http://www.starshine.org/">
http://www.starshine.org/</A> </H4>
</center>

<p><hr><p>
<H3>Contents:</H3>
<ul>
<li><a HREF="./lg_answer27.html#mccluan">Regarding Compile Errors with
Tripwire 1.2</a>
<li><a HREF="./lg_answer27.html#karsh">Applix Spreadsheet ELF Macro
Language</a>
<li><a HREF="./lg_answer27.html#geene">Answer Guy Issue 18 -- Procmail Spam
Filter</a>
<li><a HREF="./lg_answer27.html#geene2">Great Procmail Article</a>
<li><a HREF="./lg_answer27.html#sindona">Linux Cluster Configuration</a>
<li><a HREF="./lg_answer27.html#holloway">IP Masquerading/Proxy?</a>
</ul>

<p><hr><p>
<!--================================================================-->

<a name="mccluan"></a>
<h3><img align=bottom alt=" " src="../gx/ques.gif">
Regarding Compile Errors with Tripwire 1.2
</h3>
<P> <B>
From: Tc McCluan, <A HREF="mailto:tc@4dcomm.com">tc@4dcomm.com</A>
</B> <P><B>
<img align=bottom alt=" " src="../gx/ans2.gif">
I was on http://www.starshine.org/linux/ and since I am unable to
compile Tripwire 1.2 on my system (redhat 4.2 with 2.0.33 kernel)
I am trying all avenues of help.
</B> <P><B>
I have tried the recommendation in the /contrib/README.linux but
I still get the same error message.  I have tried many combinations,
but still no luck.
</B> <P><B>
Following are the list of errors I am getting, hopefully you can spot where
this compile is failing.
Thanks in advance,
</B> <P>
<img align=bottom alt=" " src="../gx/ans2.gif">
 You could look for my Tripwire patch at
<P>
	http://www.starshine.org/linux/
<P>
 ... or you could grab the RPM file from any Red Hat "contrib"
 mirror like:
<P>
	ftp://ftp.redhat.com/pub/contrib/i386/tripwire-1.2-1.i386.rpm
<P>
 ... for a precompiled binary or:
<P>
	 ftp://ftp.redhat.com/pub/contrib/SRPMS/tripwire-1.2-1.src.rpm
<P>
 ... for sources that you should be able to build cleanly.
<P>
 So far I really haven't found a tripwire configuration that I really
 like.  I can never quite get the balance between what aspects to ignore
 (permission and ownership changes on /dev/tty*, /dev/pty*, etc) and
 which ones I need to watch.
<P>
 So, if anyone out there has a really good tw.config file that
 minimizes the superfluous alerts and maximizes the intrusion detection,
 I'd like to hear about it.
<P>
 Also if anyone has a YARD or other rescue disk builder that is customized
 for creating write-protected tripwire boot/root diskette sets (for
 periodic integrity auditing of Linux systems) I'd like to see a step-by-step
 Mini-HOWTO or tutorial (maybe as a submission to Linux Gazette).
<P>
-- Jim

<p><hr><p>
<!--================================================================-->

<a name="karsh"></a>
<h3><img align=bottom alt=" " src="../gx/ques.gif">
Applix Spreadsheet ELF Macro Language
</h3>
<P> <B>
From: Paul T. Karsh ITTC-237B 8-286-xxxx, <A
HREF="mailto:karchpte@acm.org">karchpte@acm.org</A>
</B> <P><B>
I happened on the Linux Gazette in the process of searching
for some information on "scripting" macros in the Applixware
spreadsheet.  Although this is not strictly a Linux question,
I hope you can help me with some "pointers" (links ?) on how to learn
this language.  The Applixware help is no help and the company
at which I consult does not have the on-line Applixware books nor
the hardcopy "macro" manual.
</B> <P>
<img align=bottom alt=" " src="../gx/ans2.gif">
	I played with Applixware a little bit -- but was highly
	discouraged to find that its file conversion package
	couldn't handle more recent versions of MS Word and Excel.
	That was my main interest in the product since I occasionally
	get file attachments in these proprietary formats -- and
	sometimes they are potential customers.
<P>
	As for learning this macro language without the
	appropriate documentation:  I would ask your
	client where their manuals and/or installation CD are --
	if they can't produce them and are unwilling to order
	replacements then I would question their decision to use
	the product.
<P>
	Applixware is a commercial product.  Assuming this is
	on a Linux system you'd probably want to contact Red Hat
	Corporation to order replacement manuals (I think RH is
	the sole Linux distributor for Applixware -- just as
	Caldera is the sole distributor for the Linux version
	of WordPerfect).
<P>
	If they have the installation CD -- borrow it and install
	its online documentation on some system somewhere (long
	enough to get the information you need).  Be sure to remove
	that installation unless the appropriate licensing arrangements
	are made, of course.
<P> <B>
<img align=bottom alt=" " src="../gx/ques.gif">
Is there somewhere on the net (FTP or anything) where I can get an intro
to this?  I tried the Applixware site; it just seems to be page after
page of PR.
</B> <P>
<img align=bottom alt=" " src="../gx/ans2.gif">
	I would like to see far more technical content on their
	web site as well.  (The same desire applies to other hardware
	and software company sites).
<P>
-- Jim

<p><hr><p>
<!--================================================================-->

<a name="geene"></a>
<h3><img align=bottom alt=" " src="../gx/ques.gif">
Answer Guy Issue 18 -- Procmail Spam Filter
</h3>
<P> <B>
From: Anthony E. Geene, <A HREF="mailto:agreene@pobox.com">
agreene@pobox.com</A>
</B> <P><B>
I'm not a procmail user, but I've found that most spam is sent using
envelope addresses; the standard recipient headers are not addressed to
the actual recipient. So I set up filters to catch my mailing list mail
and any mail that is addressed to a list of my valid addresses. Other
mail is put elsewhere for later review.
</B> <P><B>
Such a method is relatively simple and would catch all but the
more sophisticated spammers.
</B> <P>
<img align=bottom alt=" " src="../gx/ans2.gif">
	It is a good suggestion.  It doesn't work if you have
	some people that prefer to Bcc: you (use "blind carbon
	copies").  Naturally many people's mail user agents
	(MUA's) like elm, pine, etc don't have obvious options for
	Bcc:'s --  others do (and most Unix/Linux MUA's allow
	some way to do it -- even if it isn't *obvious*).
<P>
	There are probably a number of other "false positive"
	situations.  As you say most automated mailing lists
	have headers that would trigger on your criteria.  The
	obvious response to these problems is to make a list of
	all the exceptional cases (of which you are aware) and
	add appropriate rules to precede your anti-spam filter.
<P>
	In addition it is important to ensure that your disposition
	of apparently bogus messages is a refile to a specific
	mail folder.  You don't want to file it to /dev/null!
<P>
	As you check your "probably junk" folder you can manually
	refile the exceptions -- and optionally add new rules to
	"pre-approve" lists of your favorite correspondents.
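<P>
	As a rough illustration of the filter discussed above, here is a
	minimal sketch in Python rather than as a procmail recipe.  The
	addresses in MY_ADDRESSES and the folder names are made up for
	the example; the point is only that mail whose To: and Cc: headers
	never mention you gets refiled for later review, not discarded.

```python
import email
from email.utils import getaddresses

# Hypothetical list of addresses that legitimately reach this mailbox.
MY_ADDRESSES = {"me@example.org", "me@lists.example.org"}


def addressed_to_me(raw_message, my_addresses=MY_ADDRESSES):
    """Return True if any To: or Cc: address is one of ours."""
    msg = email.message_from_string(raw_message)
    headers = msg.get_all("To", []) + msg.get_all("Cc", [])
    recipients = {addr.lower() for _name, addr in getaddresses(headers)}
    return bool(recipients & my_addresses)


def choose_folder(raw_message):
    """Refile doubtful mail to a review folder; never throw it away."""
    return "inbox" if addressed_to_me(raw_message) else "probably-junk"
```

	Note that a legitimate Bcc: would land in "probably-junk" here, which
	is exactly why that folder has to be checked by hand.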
<P>
	Note:  if you keep a list of correspondents and a list of
	known spammers, and you write a recipe to check the list
	you may be concerned about the amount of time spent in
	'grep'.  Here's a hint:  keep the list sorted and use the
	'look' command.
<P>
	The advantage of 'look' is that it does a "binary" search
	(think about successive approximation to "zero in on" the
	desired lines) on a sorted file -- and returns the lines
	that match.  While the overhead of 'grep' grows in a linear
	fashion (the search doubles in time as the file doubles
	in size), that of 'look' grows much more slowly (it's
	proportional to the logarithm of the number of records/lines
	in the file).  Similar results would be attained if one
	used 'dbm' hashes (indexes) -- but there is greater overhead
	in programming.  (Perl offers modules to support dbm, gdbm,
	ndbm and other hashing libraries -- it also has much higher
	load time overhead as a result of its generality.)
<P>
	The point is that even on a small file (100 lines) I
	can see about a 10% difference in overhead.  After a
	few thousand lines the difference is substantial
	(grep takes twice as long to run).
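<P>
	To make the difference concrete, here is a small sketch of the
	two search strategies in Python.  It is not how 'look' or 'grep'
	are implemented (the function names simply borrow theirs, and the
	address list is invented); it only shows a binary search on a
	sorted list next to a straight linear scan.

```python
import bisect


def look(sorted_lines, prefix):
    """Binary-search a sorted list for the lines starting with prefix.

    Each comparison halves the range still to be searched, so finding
    the first candidate takes on the order of log2(n) steps.
    """
    start = bisect.bisect_left(sorted_lines, prefix)
    matches = []
    for line in sorted_lines[start:]:
        if not line.startswith(prefix):
            break  # sorted input: no later line can match
        matches.append(line)
    return matches


def grep(lines, prefix):
    """Linear scan: every line is examined, sorted or not."""
    return [line for line in lines if line.startswith(prefix)]


# Hypothetical list of known correspondents, kept sorted.
correspondents = sorted([
    "alice@example.org",
    "bob@example.com",
    "carol@example.net",
    "dave@example.org",
])
```

	Both functions return the same lines for the same prefix; only the
	amount of work differs as the list grows.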
<P>
	None of this matters much on your personal workstation
	which has only one active user and receives a couple
	hundred e-mail items per day.  However -- if you're
	filtering on the company mailhub, or at your ISP's
	location -- it's worth it to reduce your impact.
<P>
-- Jim

<p><hr><p>
<!--================================================================-->

<a name="geene2"></a>
<h3><img align=bottom alt=" " src="../gx/ques.gif">
Great Procmail Article!
</h3>
<P> <B>
From: Anthony E. Geene, <A HREF="mailto:agreene@pobox.com">
agreene@pobox.com</A>
</B> <P><B>
I read your procmail article in issue 14 of the Linux Gazette. It was
the best explanation of how procmail works that I've seen yet.
</B> <P><B>
I just wanted to say Thanks,<BR>
Anthony,
</B> <P>
<img align=bottom alt=" " src="../gx/ans2.gif">
	Thanks for the feedback.  BTW there is a new article on
	using TDG (The Dotfile Generator) as a GUI front end for
	creating procmail scripts.  I haven't finished reading
	it yet -- but it looks pretty good to me.
<P>
	In your earlier mail you mentioned that you aren't using
	procmail yet.  This article on TDG and my explanation of
	what's going on "under the hood" may yet change that.
	(Also, somewhere on that morass of half-baked pages that
	I keep as a "website" are some links to other procmail
	and mail filtering resources).
<P>
-- Jim

<p><hr><p>
<!--================================================================-->

<a name="sindona"></a>
<h3><img align=bottom alt=" " src="../gx/ques.gif">
Linux Cluster configuration
</h3>
<P> <B>
From: Antonio Sindona, <A HREF="mailto:Antonio.Sindona@trinacria.it">
Antonio.Sindona@trinacria.it</A>
</B> <P><B>
I'd like to create a *Linux cluster configuration* to have some degree of
fault-tolerance (Linux normally works ... hardware not always ! ;-)  ). Do
You know if somebody tried to develop something to solve this problem ?
</B> <P>
<img align=bottom alt=" " src="../gx/ans2.gif">
 The first place I'd look for info on fault tolerance for
 Linux would be:
	Linux High Availability HOWTO<BR>
        http://sunsite.unc.edu/pub/Linux/ALPHA/linux-ha/High-Availability-HOWTO.html
<P>
Then take a look at:
<P>
	Linux Parallel Processing HOWTO<BR>
        http://yara.ecn.purdue.edu/~pplinux/pphowto.html
<P>
... and:
<P>
	MP and Clustering for Linux<BR>
	http://linas.org/linux/mp.html
<P>
 One of the most famous Linux parallel computing projects (which has
 been written up in the _Linux_Journal_ among other places) is the
 Beowulf Project:
<P>
	http://sdcd.gsfc.nasa.gov/ESS/linux.html
<P>
 After you've been overwhelmed by reading all of that you can
 slog through all of the links at:
<P>
	Linux Parallel Processing Using Clusters<BR>
        http://yara.ecn.purdue.edu/~pplinux/ppcluster.html
<P>
 .... which include links to some classic Unix projects like
 "Condor," PVM, and MPI.
<P>
 After reading all of those you'll undoubtedly decide that Linux
 is years ahead of Microsoft in the field of clustering.  (MS'
 "wolfpack" project is still vaporware last I heard).  However,
 lest we grow complacent we should consider some features that Linux
 needs to compete with mainframe and mini clustering technologies
 (like those in VMS, and the ones that HP managed to eke out of
 their acquisition of Apollo -- when they gutted DomainOS, from what
 I hear).
<P>
 The two features Linux needs in order to attain the next level
 of clustering capacity are "transparent checkpointing" and
 "process migration."
<P>
 "Transparent checkpointing" allows the kernel to periodically
 take a comprehensive snapshot of a process' state (to disk or
 to some network filesystem) and allows the OS to restart a
 process "where it left off" in the event of a system failure.
<P>
 (System failures that damage the checkpoint files notwithstanding,
 of course).
<P>
 "Process Migration" allows a node's kernel to push a process
 onto another (presumably less heavily loaded) system.  The process
 continues to run on the new system without any knowledge of the
 transition.
<P>
 At first it seems like "checkpointing" would cost way too much
 in performance.  However, it turns out that relatively little
 of your system's RAM has been modified from the disk images
 (binaries and libraries) in any given time frame.  I've heard
 reliable reports that this has almost trivial overhead on a
 Unix/Linux like system.
<P>
 It's easy to see how "checkpointing" is a necessary feature
 to support process migration.  However, it's not enough.  You
 also need mechanisms to allow the target kernel to give the
 incoming process access to all of the resources that it had
 allocated (open file descriptors, other interprocess channels,
 etc).  For Unix-like systems you also have to account for
 the process structure (the PID of the process can't change)
 -- and there has to be some implicit inter-node communications
 to maintain the process groups (to get a process' exit
 status to its parent and to allow members of a process group
 to get status and send signals to it).
<P>
 There have been a number of operating systems that have implemented
 checkpointing and process migration features.  Chorus Mi/X, Berkeley
 Sprite and Amoeba (a project that the father of Minix, Andrew S.
 Tanenbaum, collaborated on) come to mind.
<blockquote>
	(see http://www.am.cs.vu.nl/ for info on Amoeba,
         http://HTTP.CS.Berkeley.EDU/projects/sprite/ for
	 Sprite, and http://www.chorus.com for Chorus Mi/X
	 info).
</blockquote>
 One Unix package that is supposed to offer these features is Softway Ltd's
 Hibernator II.  Only SGI and Fujitsu mainframe versions are supported.
 This is probably an expensive commercial package and we shouldn't
 hold our breath for a Linux port.
<P>
	* http://softway.com.au/softway/products/hibernator.html
<P>
 The MOSIX project also supports transparent process migration (imagine
 that copy of emacs being moved from one overloaded CPU to an idle
 machine while you were using it).  It is currently available on
 BSD/OS.  However we're in luck!  As I was typing this and checking
 my URL's and references I noticed the following statement on their
 pages:
<P>
	``MOSIX for Linux (RedHat) is now under development''
<P>
 (Yay!).
<P>
 You can read more about MOSIX (and see this note yourself) at:
<P>
        http://www.cs.huji.ac.il/mosix/
		(Hebrew University, Israel)<BR>
        http://www.cnds.jhu.edu/mirrors/mosix/txt_main.html
<P>
 One OS project that I've been keeping my eye on for awhile has been
 EROS (http://www.cis.upenn.edu/~eros/).  This isn't widely
 available yet -- but I have high hopes for it.  It will use a
 "persistence" model that implicitly checkpoints the state of the
 entire system (all processes and threads).
<P>
 EROS is not "Unix" though it should eventually support a Unix/Linux
 compatible subsystem (called Dionysix).  The major difference is that
 EROS is a pure "capabilities" system.   ``Capabilities'' are the key to a
 security model that is much different than the traditional identity/group
 (Unix),  process privileges (VMS and Posix.6), and ACL (NT, Netware,
 etc) that are common in other operating systems.  Read Mr. Shapiro's
 web pages for more info on that.
<P>
 I personally think we (in the Linux community) have quite a bit
 to learn from other operating systems -- their strengths and
 their weaknesses.  To any of us who would say "But those are
 just obscure systems.  Nobody is running those!"  I would point
 out that millions of PC users still have that same reaction to
 Linux.
<P>
 So, to learn *far* more than you ever wanted to know about
 operating systems *other* than DOS, MacOS, and Unix take a
 look at the links on my short page about OS':
<P>
	 http://www.starshine.org/jim/os/
<P>
-- Jim

<p><hr><p>
<!--================================================================-->

<a name="holloway"></a>
<h3><img align=bottom alt=" " src="../gx/ques.gif">
IP Masquerading/Proxy?
</h3>
<P> <B>
From: Jack Holloway
</B> <P><B>
Ok... I'm a little foggy on the terminology... if I have a machine on an
ethernet network that is hooked to the internet, and I want all of the other
machines on the network to connect to the internet THROUGH the machine
connected to the internet, I need to use IP masquerading or proxy server
stuff?
</B> <P>
<img align=bottom alt=" " src="../gx/ans2.gif">
	You can use IP Masquerading and/or any sort of proxy systems.
<P>
	IP Masquerading is a particular form of NAT (network
	address translation).
<P>
	The one machine (your Linux box) that is connected to
	your LAN and to the Internet is the "router" or "gateway."
	("Routers" work at the "network" layer, while "gateways"
	work at the "applications" layer of the OSI reference model).
	(More on that later).
<P>
	One "real" (IANA issued) IP address is assigned to the
	"outer" interface and attached to the Internet (through
	your ISP). This will typically be a PPP link through your
	router/gateway's modem -- though it might be any network
	interface that you can get Linux to use.
<P>
	On the other interface (typically an ethernet card) you
	assign one out of any of the "private" or "reserved for
	disconnected networks" IP address ranges as defined in
	RFC1918 (previously in RFC1597 and 16??).  These RFC1918
	addresses are guaranteed to never be issued to any
	Internet host (so those of us using them on our networks
	will never create an ambiguity with *our* router by attempting
	to access a machine *outside* our network that has an IP
	address that duplicates one *inside* of our network).
<P>
	The RFC1918 address blocks are:
<PRE>
		10.*.*.*				(one class A net)
		172.16.*.* through 172.31.*.*		(16 class B's)
		192.168.0.* through 192.168.255.*	(256 class C's)
</PRE>
	You can pick any of those RFC1918 address blocks and
	you can subnet them any way that's convenient.  I use
	192.168.64.0 for my home LAN.
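<P>
	If you want to check an address against those blocks
	mechanically, a short sketch using Python's standard ipaddress
	module follows.  The CIDR forms (10.0.0.0/8, 172.16.0.0/12,
	192.168.0.0/16) are the same three ranges written in prefix
	notation; the function name is just one chosen for this example.

```python
import ipaddress

# The three RFC1918 blocks, in CIDR (prefix) notation.
RFC1918_BLOCKS = [
    ipaddress.ip_network("10.0.0.0/8"),      # one class A net
    ipaddress.ip_network("172.16.0.0/12"),   # 172.16.*.* through 172.31.*.*
    ipaddress.ip_network("192.168.0.0/16"),  # 192.168.0.* through 192.168.255.*
]


def is_rfc1918(address):
    """Return True if the dotted-quad address lies in a private block."""
    addr = ipaddress.ip_address(address)
    return any(addr in block for block in RFC1918_BLOCKS)


def count_subnets(block, new_prefix):
    """Count how many /new_prefix networks fit inside the given block."""
    return sum(1 for _ in block.subnets(new_prefix=new_prefix))
```

	For example, is_rfc1918("192.168.64.1") is true, while
	is_rfc1918("172.32.0.1") is false because it falls just past the
	end of the second block.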
<P>
	Within my LAN I use the .1 address (192.168.64.1) for my
	Linux gateway/router's ethernet -- it gets its other (real)
	IP address dynamically from my ISP when 'diald' establishes
	a connection (diald is a daemon that automatically invokes my
	ppp connection whenever traffic routing to the network is
	required -- I actually have another RFC1918 address assigned
	to the SLIP connection that diald uses for internal purposes).
	I run a caching nameserver on this box (which we'll call "gw").
<P>
	All systems on my LAN execute a line like the following:
<PRE>
		route add -net 192.168.64.0 eth0
</PRE>
	... in their rc scripts at some point.  This configures
	them to all agree where packets for this network go.
	This is called a "static" route.
<P>
	I then point the /etc/resolv.conf on all of the "client" machines
	on my LAN to "gw" and add a default route to each of them that
	looks like:
<PRE>
		route add default gw 192.168.64.1
			# other traffic goes to host named "gw"
</PRE>
	(the "client" machines don't have to be Linux and don't
	have to have any special support for IP Masquerading --
	you just assign each of them an IP address like
	192.168.64.2, etc.).
<P>
	In the "gw" server I have the kernel compiled with masquerading
	and "forwarding" support enabled (of course).  I don't put in
	the default static route -- that would be a loop.  "gw" also
	has a different /etc/resolv.conf file -- one that points to
	a couple of my ISP nameservers.
<P>
	Note:  One trick I've learned about resolv.conf files --
	You only get three nameserver entries (in most versions of
	the bind libraries) -- so I repeat the first one as the last
	one.  When a query times out (for a client) it moves to the
	second nameserver.  Meanwhile the first nameserver still
	has a good chance of getting a response  (DNS over today's
	busy Internet times out more often than nameservers fail).
	So, a timeout on the second nameserver leads to a repeat
	request on the first one -- which has probably received
	and cached a response by this time.  I could explain that
	in more detail -- but the real gist is:  try it.  It helps.
<P>
	Now, back to masquerading:
<P>
	All it takes for masquerading to work is to run the command
<PRE>
		LAN="192.168.64.0/24"
		ipfwadm -F -a accept -m -S $LAN -D 0.0.0.0/0
</PRE>
	... which means:
<P>
		use the "IP firewall administrative" program to
		make the following change to the "forwarding"  (-F)
		table:
<blockquote>
			add/append (-a) a rule to accept for
			masquerading (-m) any packet from (-S
			or "source address") my LAN (which is a
			shell variable I defined in the preceding
			line) that is going to (whose "destination"
  			-D) is anywhere (0.0.0.0/0).
</blockquote>
	Here's how that works.  When the kernel receives a packet
	that's not destined for the localhost (the gateway itself)
	it checks to see if forwarding is enabled, then it looks in
	the routing table to see where the packet should go.  My
	gateway's default route is pointing to the sl0 interface
	(the SLIP interface that diald maintains to detect outgoing
	traffic) -- when diald detects traffic on sl0 -- it
	runs my PPP connection script which changes the default
	route to point to my ISP's routers (which is part of the
	information that's negotiated via PPP along with my
	dynamic IP address).  Now the packet is "forwarded" from
	one interface to the other. Assuming that the packet came
	from my LAN (via the ethernet card in "gw") the kernel's
	packet filtering ("firewall") code takes over.
<P>
	ipfw inspects the packet to see if it is part of an
	existing TCP session (part of a connection that it has
	already been working with).  If it is, then ipfw notes
	the TCP "port" that this session is assigned to; otherwise
	ipfw just picks another port.  If it picks a new port it
	adds an entry to its masquerading table that records the
	packet's original source address and source port.  The
	"client" machine on my LAN is expecting any reply packets to
	come back to the appropriate source port (which is how it
	knows which process' "socket" to write the reply packets to)
	-- ipfw then re-writes the packet headers, changing the source
	address to match the one on ppp0 (the "real" IP address for
	which my ISP knows a route), and changing the source port to
	the one it selected.
<P>
	When ipfw receives reply packets the kernel routes them to
	sockets which ipfw owns (the source port on my outgoing
	packets becomes the destination port on the reply packets).
	ipfw then looks that socket up in its table, retrieves the
	*original* source addr and port (for the outgoing packet that
	generated this reply) and rewrites the destination fields (on the
	*reply* packet).  Finally the (now re-written) packet is
	routed to the LAN.
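<P>
	The bookkeeping described above can be sketched in a few lines
	of Python.  This is a conceptual model only, not the kernel's
	code:  the class name, the starting port number and the example
	addresses are all assumptions made for the illustration, and
	packets are reduced to (source addr, source port, destination
	addr, destination port) tuples.

```python
import itertools


class MasqTable:
    """Toy model of a masquerading table on the gateway."""

    def __init__(self, public_ip, first_port=61000):
        self.public_ip = public_ip                    # the "real" address
        self._ports = itertools.count(first_port)     # next free masq port
        self._by_client = {}   # (client addr, client port) -> masq port
        self._by_port = {}     # masq port -> (client addr, client port)

    def outgoing(self, src_ip, src_port, dst_ip, dst_port):
        """Rewrite the source fields of a packet leaving the LAN."""
        key = (src_ip, src_port)
        port = self._by_client.get(key)
        if port is None:                 # new session: pick a port, record it
            port = next(self._ports)
            self._by_client[key] = port
            self._by_port[port] = key
        return (self.public_ip, port, dst_ip, dst_port)

    def reply(self, src_ip, src_port, dst_ip, dst_port):
        """Rewrite the destination fields of a reply, or return None."""
        entry = self._by_port.get(dst_port)
        if entry is None:                # no session is using this port
            return None
        client_ip, client_port = entry
        return (src_ip, src_port, client_ip, client_port)
```

	A packet from 192.168.64.2 port 1025 would leave with the gateway's
	address and the first free port; the reply to that port is mapped
	back to 192.168.64.2 port 1025.  Recomputing checksums and expiring
	old entries are left out entirely.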
<P>
	Effectively IP Masquerading makes a whole LAN full of machines
	look like one really busy one to the rest of the Internet.
	While a typical workstation might only have a few dozen
	active network connections available, a masquerading gateway
	might have hundreds or thousands.  As a practical matter
	the TCP/IP protocol provides a 16 bit field for "ports"
	(at most 65536 of them) and most Unix systems can't handle
	more than a few thousand
	concurrent open connections (sockets) and file descriptors.
	(This has to do with the tables that the kernel allocates
	for the data structures that manage all this -- regardless
	of whether masquerading is active or not).  Luckily you're
	unlikely to have enough bandwidth to approach Linux'
	capacity.
<P>
	I'm sorry for the length of that description.  Note that
	it is purely conceptual (I've never read the code, I've
	just deduced what it must be doing from what I know of
	how TCP works).
<P> <B>
<img align=bottom alt=" " src="../gx/ques.gif">
Ouch!  That's a big question there!  Ok, firstly, do you own IPs for every
machine on your network? (That is, do you have an internet unique IP for
each machine)  If so, all you want is routed.  If you don't, then to
</B> <P>
<img align=bottom alt=" " src="../gx/ans2.gif">
	'routed' is deprecated.  In addition he doesn't need routed
	or gated to talk to his ISP (and almost certainly can't
	use it with them -- they won't listen to his routes unless
	he goes out and gets an AS number and negotiates a contract
	for "peering" with them, which would be absurd unless he were
	becoming a multi-homed ISP or something like that).
<P>
	The case where routed or gated makes sense is with his
	own internetwork of LAN's.  If he has several ethernet
	segments and is moving systems around them frequently
	(or adding new IP devices to them) then it would
	be useful.  For simpler and for more structured LANs
	(each ether segment gets a subnet -- a global, static
	routing table is distributed to all routers) you don't
	need or want 'routed' or 'gated'.
<P>
	If he had a block of ISP (or IANA) issued IP addresses,
	his ISP would have to include routing to them (they
	don't make sense otherwise).  Usually this amounts to
	some static routes that they maintain in their systems
	-- specifically some entries that are invoked whenever
	your system authenticates on one of their terminal servers
	or routers.
<P>
	You don't have to run any software on your end to make
	use of this routing.  (That's a confusing statement --
	you have to run PPP or SLIP to connect to them -- but
	once you're connected they will route packets to you
	even if your routes back to them are completely missing).
<P>
	As I've described above -- you just have to have your
	own LAN routing set up properly.  That means that each
	system on your LAN has "-net" routes onto your ethernet
	and a "default gw" route to your router/gateway (masquerading
	host).
<P> <B>
<img align=bottom alt=" " src="../gx/ques.gif">
browse the web you can use a proxy server(which looks to the outside world
as if only the proxy is actually on the net.).  If you want to telnet etc.
out, you will need IP-Masquerading, which isn't the most reliable way of
doing things.  ask me further in email if you need more detail!
</B> <P>
<img align=bottom alt=" " src="../gx/ans2.gif">
	I disagree with several points here.  Both masquerading *and*
	proxying look like "only the proxy is actually on the net."
	-- because only the router/gateway has an IP address with
	valid Internet routes.  The rest of your LAN is "hidden"
	(behind your "gw") because those IP addresses don't have
	valid Internet routes.  They are IP addresses but they are
	not *Internet* addresses!
<P>
	Proxying is an applications layer solution.  Masquerading
	and NAT work lower down, on the network layer (IP) headers and
	the transport layer (TCP/UDP) port numbers.  The difference is
	what data structures the software is dealing with.
<P>
	At the data link layer we're working with "data frames."  This
	is what an ethernet bridge or switch uses -- the MAC (BIA)
	addresses.  That's also the layer at which ARP (address
	resolution protocol) works.  It's how one host
	finds the ethernet card address of another system that's
	on the same LAN (how our client machines "find" our router/gw).
<P>
	At the network layer we deal with packets. These have IP
	addresses (as opposed to the MAC -- media access control --
	addresses in the ethernet "frame" header); the port numbers
	sit just above, in the transport layer (TCP or UDP) header.
	This is where the masquerading happens.  As I've described,
	masquerading involves a relatively "dumb" (mechanical) bit of
	packet patching with some table reference and maintenance.
	Technically there are some details I left out -- like
	recomputing the packet checksums.
<P>
	The problem is that the transport layer conveys no information
	about the applications protocol for which it is a carrier.
	For "normal" TCP protocols (like HTTP and telnet) this is
	no problem.  However, FTP and a few other protocols do "bad"
	things.  In particular an FTP session consists of *two* TCP
	sessions:  a control session, which is initiated from the client
	to the server, and a data session, which is initiated from the
	server back to the client!  The IP address and port to which
	this "return connection" goes is passed to the server via
	the control connection.  This last detail has caused more
	firewall designers and admins to rip out their hair than
	all the cheap combs from China.  In the context of masquerading
	it means that the masquerading server must monitor the
	*data* (the stuff in the payloads of the packets) and make
	some selective patches therein.  In the other cases we only
	touched the headers of each packet -- never the contents of
	their payloads.
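<P>
	For FTP the address and port travel in the control connection
	as a PORT command of the form "PORT h1,h2,h3,h4,p1,p2", where
	the port number is p1*256+p2.  A minimal Python sketch of the
	payload patch a masquerading gateway has to make follows.  The
	function names and example addresses are invented for the
	illustration, and a real implementation also has to fix up TCP
	sequence numbers when the rewritten line changes length, which
	is ignored here.

```python
import re

# "PORT h1,h2,h3,h4,p1,p2" followed by CR LF, as sent by the client.
PORT_RE = re.compile(r"PORT (\d+),(\d+),(\d+),(\d+),(\d+),(\d+)\r\n")


def parse_port(line):
    """Return (address, port) from a PORT command, or None if not one."""
    m = PORT_RE.fullmatch(line)
    if m is None:
        return None
    nums = [int(n) for n in m.groups()]
    address = ".".join(str(n) for n in nums[:4])
    port = nums[4] * 256 + nums[5]
    return address, port


def build_port(address, port):
    """Format an address and port as a PORT command line."""
    return "PORT {},{},{}\r\n".format(
        address.replace(".", ","), port // 256, port % 256)


def rewrite_port(line, public_ip, public_port):
    """Patch a client's PORT command; pass any other line through."""
    if parse_port(line) is None:
        return line
    return build_port(public_ip, public_port)
```

	Every other control line passes through untouched, which is why
	"normal" protocols never need this kind of payload inspection.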
<P>
	So, this is the part of Masquerading that is unreliable.
	Linux IP Masquerading is by no means the only flavor --
	though it's probably the most widely used by now.  Linux
	has several modules for dealing with unruly protocols --
	so they usually work.
<P>
	However, I've found it more reliable to use the TIS FWTK
	ftp-gw (Trusted Information Systems http://www.tis.com,
	Firewall Toolkit).  This is a proxy.
<P>
	Proxy packages work at the applications layer.  You have
	to have support for each applications protocol (http, ftp,
	telnet, rlogin, smtp, etc) that you want to allow "through"
	your firewall.  They come in two forms:  SOCKS and FWTK
	(There are many of them besides these -- but all of them
	follow one *model* or the other).
<P>
	In the FWTK model the user opens his or her initial
	connection to the firewall (I 'ftp' to gw.starshine.org).
	The firewall (gateway) is running the FWTK proxy *instead
	of* (or *in addition to*) the normal server (ftpd).  If
	it is "in addition to" then one or the other must be on a
	different port or using a different IP Alias on the machine
	(more on that later).  Now my FTP server (ftp-gw) prompts
	me to "login."
<P>
	For a normal FTP server I'd type my name (or "ftp" or "anonymous").
	For ftp-gw I'm trying to go *through* this machine and onto
	one that's on the other side (on the Internet).  So I have to
	provide more information.  So I type:
<P>
		ftp@sunsite.unc.edu
<P>
	... or
<P>
		webauthor@www.svlug.org
<P>
	... or whatever.  The gateway ftp server then opens a connection
	to my target (everything *after* the @ sign) and passes my
	name (everything before the @ sign) to *its* login prompt.
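<P>
	The split itself is trivial.  A sketch in Python, assuming the
	proxy divides the login at the last @ sign (I haven't checked
	which @ ftp-gw itself uses when there is more than one, and the
	function name is only for this example):

```python
def split_proxy_login(login):
    """Split 'user@host' into (user, host) for a proxied login.

    Assumption: the split happens at the LAST '@' in the string.
    """
    user, at, host = login.rpartition("@")
    if not at or not user or not host:
        raise ValueError("expected user@host, got %r" % login)
    return user, host
```

	So "ftp@sunsite.unc.edu" yields the user "ftp" and the target host
	"sunsite.unc.edu".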
<P>
	The TIS FWTK comes with a number of other small proxies --
	and most of them work in a similar fashion.  (There are
	also options to limit *who* can access *what* and *when*
	via administrator edited access control lists).
<P>
	The key point here is that FWTK doesn't require any special
	client software support.  The users have to be trained how
	to traverse the firewall and they have to remember how to do it.
<P>
	FWTK is only appropriate for relatively small groups of technically
	savvy users (who are easy to train in this and won't make the
	sysadmin's life a constant hell of walking everyone through this
	extra connectivity step).
<P>
	SOCKS has a model that works for larger groups of less savvy
	users.  However, it requires that you install SOCKS aware
	versions of your client applications.  So you have to replace
	your normal telnet, ftp, rlogin, etc with a  "socksified"
	version.  In many cases it is possible to effectively
	"socksify" all of your client utilities by replacing a shared
	library (Unix/Linux) or a DLL (Windows).  Many commercial
	TCP clients and utilities are built with SOCKS support
	(Netscape Navigator and Communicator are prime examples).
	I think the Trumpet shareware utilities for Windows are
	another.
<P>
	The hassle is installing and configuring this software on every
	client system.  However, the advantage is that none of the users
	has to remember, or even know, about the firewall.  The SOCKS
	applications will automatically negotiate sessions through the
	firewall.
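<P>
	To give a feel for what a "socksified" client does behind the
	user's back, here is a rough sketch of the first message it sends
	to the SOCKS server.  It assumes the SOCKS version 4 wire format
	(version, command, port, address, user id, terminating null); the
	function names and the sample address are made up for the example,
	and a real client gets all of this from its SOCKS library.

```python
import socket
import struct

SOCKS4_VERSION = 4    # assumption: the SOCKS 4 protocol
CMD_CONNECT = 1       # "please open a TCP connection for me"
REPLY_GRANTED = 90    # reply code meaning the request was granted


def build_socks4_connect(dest_ip, dest_port, user_id=b""):
    """Build a SOCKS4 CONNECT request (hypothetical helper)."""
    header = struct.pack(
        "!BBH4s",                    # network byte order
        SOCKS4_VERSION,
        CMD_CONNECT,
        dest_port,
        socket.inet_aton(dest_ip),   # dotted quad -> 4 raw bytes
    )
    # The user id is a null-terminated string after the fixed header.
    return header + user_id + b"\x00"


def socks4_granted(reply):
    """Return True if an 8-byte SOCKS4 reply says 'request granted'."""
    if len(reply) != 8:
        raise ValueError("a SOCKS4 reply is exactly 8 bytes")
    _version, code = struct.unpack("!BB", reply[:2])
    return code == REPLY_GRANTED


if __name__ == "__main__":
    request = build_socks4_connect("192.168.64.1", 21, b"jim")
    print(request)
```

	After a granted reply the client simply uses the same TCP
	connection as if it were talking to the destination directly,
	which is why the user never has to think about the firewall.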
<P>
	There are some protocols that are inherently easy or even
	unnecessary to proxy. For example DNS doesn't need to
	be proxied.  You run your caching copy of named and let
	all of the client machines talk to and trust it.  This
	gives a great performance boost to most of the clients
	and saves quite a bit of bandwidth on the critical link
	to the ISP.  There is no reason that I can think of not
	to run a caching nameserver somewhere on your Internet
	connected LAN.
<P>
	HTTP is a protocol that benefits quite a bit from proxying.
	It is trivial to add caching features to a web proxy -- and
	I think just about all of them do so.
<P>
	SMTP is a protocol that doesn't need proxying (from the
	standpoint of the clients on your LAN).  You configure
	an internally accessible system to accept mail and
	it will relay it to your gateway via whatever means
	you configure.   A typical model would be that outgoing
	mail is collected on an internal hub, which is configured
	to relay it to the external gateway, which, in turn,
	relays it to the ISP and on to the world.  To see what
	this looks like read the "Received" headers in some of your
	mail.
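<P>
	If you'd rather let a script do the reading, the following sketch
	uses Python's standard email module to list the "Received" headers
	in the order the message actually travelled.  The sample message,
	host names and addresses are invented for the example.

```python
from email import message_from_string

# An invented message: desktop -> internal hub -> gateway -> ISP.
SAMPLE = """\
Received: from gateway.example.com (gateway.example.com [192.0.2.1])
\tby isp.example.net with SMTP; Wed, 1 Apr 1998 10:02:00 -0800
Received: from hub.example.com (hub.example.com [192.168.64.2])
\tby gateway.example.com with SMTP; Wed, 1 Apr 1998 10:01:30 -0800
Received: from desktop.example.com (desktop.example.com [192.168.64.10])
\tby hub.example.com with SMTP; Wed, 1 Apr 1998 10:01:00 -0800
From: someone@example.com
Subject: test

body
"""


def relay_path(raw_message):
    """Return the Received headers, oldest hop first."""
    msg = message_from_string(raw_message)
    hops = msg.get_all("Received") or []
    # Each relay adds its header at the top, so the list is newest
    # first; reverse it and collapse the folded whitespace.
    return [" ".join(hop.split()) for hop in reversed(hops)]


if __name__ == "__main__":
    for number, hop in enumerate(relay_path(SAMPLE), 1):
        print(number, hop)
```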
<P>
	The externally visible mail gateway can route mail
	back to the internal hub -- which can run POP and/or
	IMAP servers for the clients to use to actually get
	their mail.  (You could have the internal hub route
	all of the mail directly to people's desktops via
	SMTP too.)
<P>
	The reason you generally don't need proxying for
	SMTP is that most sites use some form of masquerading
	(mail appears to come from the "domain" rather than from
	a particular host within the domain).  FWTK includes
	smapd -- and there is an independent and free smtpd --
	which act as proxies for sendmail.  Here the intent is
	to have a small simple program receive mail and
	pass it along to the larger, vastly more complicated
	'sendmail' itself.  (I don't want to get into the
	raging debates about sendmail vs. qmail etc -- suffice
	it to say there are many alternatives).
<P>
	Note that masquerading and proxying are not mutually
	exclusive.  I use masquerading and I have ftp-gw and
	squid (caching web service) installed.  I could
	also install SOCKS on the same gateway.
<P>
	Incidentally I mentioned that it's possible to run ftpd
	and ftp-gw on the same machine without putting them on
	different ports.  Here are two ways of doing that:
<P>
	IP Aliasing method:
<ol>
<li>		You install ftpd and ftp-gw
<li>		You create an IP Alias: you add an extra address
		    to your gateway system's internal interface with
		    a command like:
<PRE>
				ifconfig eth0:1 192.168.64.129
</PRE>
<li>		You configure your TCP Wrappers to virtual host
		    a service by adding a line like this to your
		    /etc/hosts.allow file:
<PRE>
     in.ftpd@192.168.64.129: 192.168.64. : twist  /usr/local/fwtk/ftp-gw
</PRE>
		   This will "twist" any ftp request *to that IP alias*
		   into an ftp-gw session.  FTP requests to any other
		   interface address will be handled in the usual way
		   (tcpd will launch the ftp daemon that's listed in
		   inetd.conf).
</ol>
	   That's all there is to that method.  Note that you can
	   do other interesting things with this sort of virtual
	   hosting, if you're clever.
<P>
	Loopback Twist method:
<ol>
<li>		Install ftpd and ftp-gw (as you would for
		  the other method).
<P>
<li>		Configure tcp wrappers to allow normal ftp
		  access *from* the localhost address (127.0.0.1)
<P>
<li>		Configure tcp wrappers to twist any other
		  ftp requests into ftp-gw
</ol>
	   That looks like this (in the /etc/hosts.allow file):
<PRE>
in.ftpd: 127.0.0.1 : ALLOW
in.ftpd : ALL : twist  /usr/local/fwtk/ftp-gw
</PRE>
	    WARNING! This second line would allow *anyone*
	    (from inside or outside of your LAN) to access the
	    proxy.  However, ftp-gw reads a file --
            /usr/local/etc/netperm-table according to the way I
	    compiled mine -- to determine who is allowed to access
	    each of its proxy services.
<P>
	    So, this line is neither as dangerous as it looks
	    nor as safe as it should be.  Changing it to:
<PRE>
in.ftpd : LOCAL : twist  /usr/local/fwtk/ftp-gw
</PRE>
	   ... is safer and more appropriate.
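<P>
	The rules in that file are tried in order and the first match
	wins, which is why the 127.0.0.1 line has to come first.  The toy
	model below sketches that logic for the handful of pattern types
	used here.  It is a simplification written for this example, not
	tcpd's real matching code; in particular it takes the LOCAL
	wildcard to mean "a host name with no dot in it", which is how the
	hosts_access documentation describes it.

```python
def client_matches(pattern, client_addr, client_name):
    """Very simplified client matching (hypothetical helper)."""
    if pattern == "ALL":
        return True
    if pattern == "LOCAL":
        # LOCAL: host names that do not contain a dot.
        return "." not in client_name
    if pattern.endswith("."):
        # A pattern like "192.168.64." matches an address prefix.
        return client_addr.startswith(pattern)
    return pattern in (client_addr, client_name)


def first_match(rules, client_addr, client_name):
    """Return the action of the first matching rule, else None.

    None only means "nothing in this file matched"; what happens
    next is up to the rest of the tcpd configuration.
    """
    for pattern, action in rules:
        if client_matches(pattern, client_addr, client_name):
            return action
    return None


# The two variants of the loopback twist, for the in.ftpd service.
OPEN_RULES = [("127.0.0.1", "allow"), ("ALL", "twist ftp-gw")]
SAFER_RULES = [("127.0.0.1", "allow"), ("LOCAL", "twist ftp-gw")]

if __name__ == "__main__":
    outsider = ("192.0.2.7", "outsider.example.com")
    print(first_match(OPEN_RULES, *outsider))
    print(first_match(SAFER_RULES, *outsider))
```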
<P>
	One key point here is that you can use proxies on
	your masquerading router/gateway to allow access from
	the "outside" back *into* services inside your LAN.
	Usually you want to prevent this (the whole point of
	a firewall).  However you can use tcpd and netperm to
	allow specific 'friendly' networks to get to servers
	on one of your LANs, despite the fact that there are
	no routes directly to those machines.
<P>
	This brings us back to other forms of NAT.  I mentioned
	at the get-go that masquerading is one form of NAT.  It
	specifically involves a "many to one" arrangement.
	(The "many" clients on your LAN appearing as "one" connection
	to the Internet).
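<P>
	As a mental model only (this is not how the kernel implements
	it), you can picture the "many to one" bookkeeping as a pair of
	lookup tables.  The class name, the public address and the
	starting port number below are all arbitrary choices for the
	example.

```python
class MasqueradeTable:
    """Toy model of many-to-one NAT (hypothetical, for illustration)."""

    def __init__(self, public_ip, first_port=61000):
        self.public_ip = public_ip
        self.next_port = first_port   # arbitrary starting point
        self.out = {}    # (inside_ip, inside_port) -> public port
        self.back = {}   # public port -> (inside_ip, inside_port)

    def outbound(self, inside_ip, inside_port):
        """Rewrite an inside source address to the one public address."""
        key = (inside_ip, inside_port)
        if key not in self.out:
            # First packet of a new connection: hand out a fresh port.
            self.out[key] = self.next_port
            self.back[self.next_port] = key
            self.next_port += 1
        return self.public_ip, self.out[key]

    def inbound(self, public_port):
        """Find which inside client a reply belongs to, or None."""
        return self.back.get(public_port)


if __name__ == "__main__":
    table = MasqueradeTable("203.0.113.5")
    print(table.outbound("192.168.64.10", 1025))
    print(table.outbound("192.168.64.11", 1025))
    print(table.inbound(61001))
```

	Replies to a port that nobody on the inside asked for find no
	entry in the table, which is why unsolicited connections from the
	outside don't get through.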
<P>
	Another form of NAT is "many to many" -- where you have a
	table of translations.  Thus each of your systems might be
	configured to use one address, and be translated to appear as
	if it came from another.  I personally don't see much use for
	this arrangement.  The one case I could see for it might be
	if you had a net of devices that you couldn't renumber, which
	had "illegal" or "invalid" addresses.
<P>
	One other form of NAT involves a different "many to many"
	translation -- it's not currently available for Linux but
	it's used in the Cisco LocalDirector product.  This is a
	trick for doing IP level load balancing.   You have a
	"reverse masquerade" host accept requests to "a" busy
	server (one service on one IP address) and you have it
	masquerade the session to any of multiple "inside" machines
	that have the same service and content available.
<P>
	For load balancing it's trivially easy to use DNS "round
	robin records" -- so I don't see much application for this
	form of NAT either.
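<P>
	The round robin idea itself fits in a few lines.  The sketch
	below is a toy model of a nameserver rotating a set of address
	records between answers; the class name and the addresses are
	invented, and a real nameserver does this for you once you list
	several addresses under the same name.

```python
from collections import deque


class RoundRobinRecords:
    """Toy model of DNS round robin (hypothetical, for illustration)."""

    def __init__(self, addresses):
        self._addrs = deque(addresses)

    def answer(self):
        """Return all addresses, then rotate so the next answer differs."""
        current = list(self._addrs)
        self._addrs.rotate(-1)   # move the first address to the end
        return current


if __name__ == "__main__":
    www = RoundRobinRecords(["192.0.2.10", "192.0.2.11", "192.0.2.12"])
    for _ in range(4):
        # Most clients simply try the first address in the answer.
        print(www.answer()[0])
```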
<P>
	Anyway -- that's all I have the energy to type for now.
<P>
	I hope this explains the terms and concepts and gives you
	enough examples to set up what you want.  For the most part
	you can just use the one magic ipfwadm command to "turn on"
	masquerading.  The rest is just the configuration of your
	network and of your ISP connection -- which you've presumably
	already done.
<P>
-- Jim

<!--================================================================-->
<P> <hr> <P>
<center><H4>Previous "Answer Guy" Columns</H4></center>
<P>
<A HREF="../issue13/answer.html">Answer Guy #1, January 1997</A><BR>
<A HREF="../issue14/answer.html">Answer Guy #2, February 1997</A><br>
<A HREF="../issue15/answer.html">Answer Guy #3, March 1997</A><br>
<A HREF="../issue16/answer.html">Answer Guy #4, April 1997</A><br>
<A HREF="../issue17/answer.html">Answer Guy #5, May 1997</A><br>
<A HREF="../issue18/lg_answer18.html">Answer Guy #6, June 1997</A><br>
<A HREF="../issue19/lg_answer19.html">Answer Guy #7, July 1997</A><br>
<A HREF="../issue20/lg_answer20.html">Answer Guy #8, August 1997</A><br>
<A HREF="../issue21/lg_answer21.html">Answer Guy #9, September 1997</A><br>
<A HREF="../issue22/lg_answer22.html">Answer Guy #10, October 1997</A><br>
<A HREF="../issue23/lg_answer23.html">Answer Guy #11, December 1997</A><br>
<A HREF="../issue24/lg_answer24.html">Answer Guy #12, January 1998</A><br>
<A HREF="../issue25/lg_answer25.html">Answer Guy #13, February 1998</A><br>
<A HREF="../issue26/lg_answer26.html">Answer Guy #14, March 1998</A>
<P><HR><P>
<center><H5>Copyright &copy; 1998, James T. Dennis <BR>
Published in <I>Linux Gazette</I> Issue 27 April 1998</H5></center>
<P> <hr> <P>
<!--================================================================-->
<!--startcut =======================================================  -->
</body>
</html>
<!--endcut =========================================================  -->