old-www/LDP/LG/issue32/tag_cluster.html

<!--startcut ======================================================= -->
<!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 3.2//EN">
<html><head>
<META NAME="generator" CONTENT="lgazmail v1.1pre9c">
<TITLE>The Answer Guy 32:
Web server clustering project
</TITLE>
<!-- ORIGINAL SUBJECT:
Web server clustering project
JTD SUBTITLE:

-->
</head>

<BODY BGCOLOR="#FFFFFF" TEXT="#000000" LINK="#0000FF" VLINK="#A000A0"
ALINK="#FF0000">
<H4>"Linux Gazette...<I>making Linux just a little more fun!</I>"
</H4>
<P> <hr> <P>
<!-- ::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::: -->
<H1 align="center"><A NAME="answer">
	<img src="../gx/dennis/qbubble.gif" alt="" border="0" align="middle">
	<a href="./index.html">The Answer Guy</a>
	<img src="../gx/dennis/bbubble.gif" alt="" border="0" align="middle">
</A></H1>
<BR>
<H4 align="center">By James T. Dennis,
	<a href="mailto:linux-questions-only@ssc.com">linux-questions-only@ssc.com</a>
	<BR>Starshine Technical Services, <A HREF="http://www.starshine.org/">http://www.starshine.org/</A>
</H4>
<p><hr><p>
<!--endcut ========================================================= -->
<H3><img src="../gx/dennis/qbub.gif" alt="(?)"width="50" height="28"
	align="left" border="0">Web server clustering project</H3>


<p><strong>From Jim Kinney
in the</strong>
	<a href="news:comp.unix.questions">comp.unix.questions</a>
	<strong>newsgroup on 22 Jul 1998 </strong></p>

<!-- begin body -->


<STRONG><P>
I am starting the research into the design and implementation of a 3
node cluster to provide high availability web, database, and support
services to a computer based physics lab. As envisioned, the primary
interface machine will be the web server. The database that provides
the dynamic web pages will be on a separate machine. Some other
processes that accept input from the web process and output to the
database will be on the third machine.
</P></STRONG>


<BLOCKQUOTE><IMG SRC="../gx/dennis/bbub.gif" ALT="(!)" width="50" height="28" border="0" lign="bottom"
>Have you looked at the &quot;High Availability HOWTO&quot;?
</blockquote>


<code><blockquote><blockquote
><A HREF="http://sunsite.unc.edu/pub/Linux/ALPHA/linux-ha/High-Availability-HOWTO.html"
>http://sunsite.unc.edu/pub/Linux/ALPHA/linux-ha/High-Availability-HOWTO.html</A>
</blockquote></blockquote></code>

<blockquote>There's also the common "round robin DNS" model ---
which is already used by many service providers.  It
has its limitations --- but it's the first thing
to try if the clients can be configured to gracefully
retry transactions on failure.
</blockquote>

<blockquote>There's also the MOSIX project which was developed
under BSD/OS and is allegedly being ported to Linux.
This provides for process migration (again, more of
a performance clustering and load balancing feature set).
</blockquote>

<code><blockquote><blockquote
><A HREF="http://www.cs.huji.ac.il/mosix/">http://www.cs.huji.ac.il/mosix/</A>
</blockquote></blockquote></code>

<blockquote>However, there is another concept called "checkpointing."
You can think of this as having regular, transparent,
non-terminal "core dumps" (snapshots) taken of each
process (or process group).   These are written to
disk and can be reloaded and restarted at the point where
they left off.  I'm not aware of any projects to
provide checkpointing to Linux (or checkpointing subsystems).
(Obviously any application can do its own checkpointing
in a non-transparent fashion --- roughly equivalent to the
periodic automatic saves performed by '<TT>emacs</TT>' and other editors).
</blockquote>

<blockquote>I have a pointer to some miscellaneous notes on checkpointing:
</blockquote>

<code><blockquote><blockquote
	><A HREF="http://warp.dcs.st-and.ac.uk/warp/systems/checkpoint/"
	>http://warp.dcs.st-and.ac.uk/warp/systems/checkpoint/</A>
</blockquote></blockquote></code>

<blockquote>The implication here is that you could create a hybrid
checkpointing <EM>and</EM> process migration model that would
provide high availability.  In a client/server context
this would probably only be suitable for situations where
the communications protocols were very robust --- and
it might still require some IP and/or MAC address assumption
or some specialized routing tricks.
</blockquote>

<blockquote>One such routing trick might be the IP NAT project.
IP masquerading is one form of NAT (allowing many
clients to masquerade as a single proxy system).
</blockquote>


<code><blockquote><blockquote
	><A HREF="http://www.csn.tu-chemnitz.de/~mha/"
	>http://www.csn.tu-chemnitz.de/~mha/</A>
</blockquote></blockquote></code>

<blockquote>Another form of NAT is many-to-many.  Let's say you
connected two disconnected sites that both chose
<tt>10.1.*.*</tt> addresses for their use --- you could put
a NAT router between them that would bidirectionally
translate the <tt>10.1.*.*</tt> to corresponding <tt>172.16.*.*</tt>
addresses.  Thus the two sites would be able to
interoperate over a broader range of protocols than
would be the case for IP masqurading --- since the
TCP/UDP ports would not be re-written --- each <tt>10.1.*.*</tt>
address corresponds on a one-to-one basis with a <tt>172.16.*.*</tt>
counterpart.
</blockquote>

<blockquote>The one other form of NAT is one-to-many (or "load
balancing").  This makes one simple router look like a server.
In actuality that "server" is just dispatching the packets it
receives to any one of the backend servers it chooses
(statistically or based on metrics that they communicate
amongst themselves, privately).  Cisco has a product called
"Local Director" that does exactly this.  One of the experimental
versions of the Linux IP NAT code also appeared to do this
with some success.  I don't know if any further work as
progressed on these lines.
</blockquote>

<blockquote>Yet another approach that might make sense is to provide
for replication of the data (files) across servers and
to use protocols that transparent select among available
servers (mirrors).  This sounds just like CODA.
</blockquote>

<code><blockquote><blockquote>
<A HREF="http://www.coda.cs.cmu.edu/top.html"
	>http://www.coda.cs.cmu.edu/top.html</A>
</blockquote></blockquote></code>


<BLOCKQUOTE>A less sophisticated approach to replication is to
use the <TT>rsync</TT> package to maintain some failover servers
(mirrors) --- and require that writes all go to one
active server.
</blockquote>


<STRONG><P><IMG SRC="../gx/dennis/qbub.gif" ALT="(?)" width="50" height="28" border="0" lign="bottom"
>So, I am open to suggestions, comments, info, links to sites, book
titles, etc. I have proposed a one year development time for the whole
cluster, with a single machine application prototype of the user
visible/used portion by around the Jan 1999. I love my job!
</p></strong>

<P><STRONG>Jim Kinney M.S.
<BR>Educational Technology Specialist
<BR>Department of Physics
<BR>Emory University
</strong></p>


<BLOCKQUOTE><IMG SRC="../gx/dennis/bbub.gif" ALT="(!)" width="50" height="28" border="0" lign="bottom"
>Web, mail, DNS and a number of other Internet services
are naturally robust.  With DNS you normally list up to
three servers per host (in the <TT>/etc/resolv.conf</TT>) and all
of these will be checked before a name lookup will fail.
With SMTP the client will try each of the hosts listed
in the results of an MX query.  Round robin DNS will force
most clients to try multiple different IP addresses on
failure most of the time.
</blockquote>


<BLOCKQUOTE>However, the applications that really need HA (fail over)
and clustering for performance are things like db servers.
</blockquote>


<BLOCKQUOTE>Having two systems monitor and process something like
a set of db transactions in parallel (one active the
other "mimic'ing" the first but not returning results)
would be very interesting.  The "mimic" would attempt
to maintain the same applications state as the server
--- and would assume the server's IP and MAC (ethernet
media access control) addresses on failure --- to then
transparently continue the transaction processing
that was going on.
</blockquote>


<BLOCKQUOTE>You might prototype such a system using web and ftp
(the FTP application is a more dramatic demonstration
--- since a web server involves many short transactions
and mostly operates in a "disconnected" fashion).
</blockquote>


<BLOCKQUOTE>One approach might be to have a custom ethernet
driver that can be instructed to throw all of its
output into the bit bucket.  Thus the mimic is
normally silent, but following a failure on the
server it does the address assumption and rips off
the muzzle.  I suspect you'd have to have another
interface between the two servers, one which is
dedicated to maintaining the same state between
the server and the mimic.
</blockquote>


<BLOCKQUOTE>(For example if the server get a collision or
an error that wasn't sensed by the mimic -- or
vice versa -- the two might get horribly out
of sync when the upper layer protocols require
a resend.  With special drivers the two systems
might resolve these discrepancies at the kernel/driver
layer --- so that the applications will always get
the same data on their sockets).
</blockquote>


<BLOCKQUOTE>I really have no idea how much tweaking this would
take and whether or not it's even feasible.
</blockquote>


<BLOCKQUOTE>However, it seems that your intent is to provide
failover that is transparent to the applications
layer.  So, the work obviously has to happen below that.
</blockquote>


<BLOCKQUOTE>It is unclear whether you are primarily interested
in deploying a set of servers for use by your
Physics team or whether you are interested in
doing research and development in the computer science.
</blockquote>

<blockquote>In any event your project will probably involve a
hybrid of several of these approaches:
</blockquote>

<ul>
<li>Round robin DNS
<li>Failover with IP and MAC address assumption.
    (and/or load balancing NAT).
<li>Replication and or "mirroring" (more failover).
<li>Multi-initiator SCSI (where a single SCSI
    bus has multiple computers active on it, such
    that these computers have shared access to
    the attached peripheral devices).
</ul>

<blockquote>It would be very interesting to see someone develop
process migration and checkpointing features for Linux
though there doesn't seem to be any active work going
on now.
</blockquote>


<BLOCKQUOTE>I'd also love to see an
"<a href="http://google.stanford.edu/linux?query=doc:cesdis.gsfc.nasa.gov/linux-web/beowulf/beowulf.html">Beowulf</a> enabled" SQL dbserver
(where a couple of failover capable "dispatchers" could
farm out transactions to multiple clustered Linux boxes
in some sensible manner).  I'm not even sure if that's
feasible --- but it sure would knock down the scaleability
walls that I hear about from those dbadmins.
</blockquote>

<!-- end body -->

<!--startcut =======================================================  -->
<P> <hr> <P>
<H5 align="center"><a href="http://www.linuxgazette.com/copying.html"
	>Copyright &copy;</a> 1998, James T. Dennis <BR>
Published in <I>Linux Gazette</I> Issue 32 September 1998</H5>
<P> <hr> <P>

<!--::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::-->
<table width="98%"><tr valign="center" align="center">
<td rowspan="3"><A HREF="./lg_answer32.html"><IMG
        SRC="../gx/dennis/answernew.gif"
        ALT="[ Answer Guy Index ]"></A></td>
  <td><A HREF="tag_phreak.html">phreak</A>
  <td><A HREF="tag_abandon.html">abandon</A>
  <td><A HREF="tag_javaterm.html">javaterm</A>
  <td><A HREF="tag_BBS.html">BBS</A>
  <td><A HREF="tag_flaws.html">flaws</A>
  <td><A HREF="tag_doslinux.html">doslinux</A>
  <td><A HREF="tag_resume.html">resume</A>

</tr><tr valign="center" align="center">
  <td><A HREF="tag_softwindows.html">softwindows</A>
  <td><A HREF="tag_convert.html">convert</A>
  <td><A HREF="tag_apache.html">apache</A>
  <td><A HREF="tag_emulate.html">emulate</A>
  <td><A HREF="tag_database.html">database</A>
  <td><A HREF="tag_distrib.html">distrib</A>
  <td><A HREF="tag_proxy.html">proxy</A>

</tr><tr valign="center" align="center">
  <td><A HREF="tag_disable.html">disable</A>
  <td><A HREF="tag_DVI.html">DVI</A>
  <td><A HREF="tag_superblock.html">superblock</A>
  <td><A HREF="tag_serial.html">serial</A>
  <td><A HREF="tag_permission.html">permission</A>
  <td><A HREF="tag_detach.html">detach</A>
  <td><A HREF="tag_cdr.html">cdr</A>

</tr><tr valign="center" align="center">
  <td><A HREF="tag_rs422.html">rs422</A>
  <td><A HREF="tag_modem.html">modem</A>
  <td><A HREF="tag_notfound.html">notfound</A>
  <td><A HREF="tag_tuning.html">tuning</A>
  <td><A HREF="tag_libc5.html">libc5</A>
  <td><A HREF="tag_startup.html">startup</A>
  <td><A HREF="tag_clock.html">clock</A>
  <td><A HREF="tag_ping.html">ping</A>

</tr><tr valign="center" align="center">
  <td><A HREF="tag_accounts.html">accounts</A>
  <td><A HREF="tag_lilo.html">lilo</A>
  <td><A HREF="tag_NDS.html">NDS</A>
  <td><A HREF="tag_95slow.html">95slow</A>
  <td><A HREF="tag_nonlinux.html">nonlinux</A>
  <td><A HREF="tag_progenv.html">progenv</A>
  <td><A HREF="tag_cluster.html">cluster</A>
  <td><A HREF="tag_ftpd.html">ftpd</A>

</tr></table>
<P> <hr> <P>
<!--::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::-->
<A HREF="./index.html"><IMG SRC="../gx/indexnew.gif"
        ALT="[ Table Of Contents ]"></A>
<A HREF="../index.html"><IMG SRC="../gx/homenew.gif"
        ALT="[ Front Page ]"></A>
<A HREF="lg_bytes32.html"><IMG SRC="../gx/back2.gif"
        ALT="[ Previous Section ]"></A>
<A HREF="./stemen.html"><IMG SRC="../gx/fwd.gif"
        ALT="[ Next Section ]"></A>
<!--::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::-->
</body>
</html>
<!--endcut ========================================================= -->