<CHAPTER ID="WHAT">
<TITLE>So what is openMosix Anyway?</TITLE>
<SECT1><TITLE>A very, very brief introduction to clustering </TITLE>
<PARA>
Most of the time, your computer is bored. Start a program like xload
or top that monitors your system use, and you will probably find that
your processor load is not even hitting the 1.0 mark. If you have two
or more computers, chances are that at any given time, at least one
of them is doing nothing. Unfortunately, when you really do need CPU
power - during a C++ compile, or encoding Ogg Vorbis music files -
you need a lot of it at once. The idea behind clustering is to spread
these loads among all available computers, using the resources that
are free on other machines.
</PARA>
<PARA>
The basic unit of a cluster is a single computer, also called a
"node". Clusters can grow in size - they "scale" - by adding more
machines. A cluster as a whole will be more powerful the faster the
individual computers and the faster their connection speeds are. In
addition, the operating system of the cluster must make the best use
of the available hardware in response to changing conditions. This
becomes more of a challenge if the cluster is composed of different
hardware types (a "heterogeneous" cluster), if the configuration of
the cluster changes unpredictably (machines joining and leaving the
cluster), and the loads cannot be predicted ahead of time.
</PARA>
<SECT2>
<TITLE>A very, very brief introduction to clustering </TITLE>
<SECT3>
<TITLE>HPC vs Fail-over vs Load-balancing</TITLE>
<PARA>
Basically there are three types of clusters: fail-over,
load-balancing and High Performance Computing.
The most commonly deployed ones are probably the fail-over cluster and the load-balancing cluster.
</PARA>
<itemizedlist><listitem>
<PARA><emphasis>Fail-over Clusters</emphasis> consist of two or more network-connected
computers with a separate heartbeat connection between the hosts.
The heartbeat connection between the machines is used to
monitor whether all the services are still running: as soon as a
service on one machine fails, the other machines try to take
over.
</PARA> </listitem>
<listitem>
<PARA>
With load-balancing clusters the concept is that when a request comes
in, say for a web server, the cluster checks which machine is the
least busy and then sends the request to that machine. In practice a
load-balancing cluster is usually also a fail-over cluster, but with
the extra load-balancing functionality and often with more nodes.
</PARA>
</listitem>
<listitem>
<PARA>
The last variation of clustering is the High Performance Computing
cluster: the machines are configured specially to give data
centers that require extreme performance what they need. Beowulfs
have been developed especially to give research facilities the
computing speed they need. These kinds of clusters also have some
load-balancing features; they try to spread different processes over
more machines in order to gain performance. What it mainly comes
down to in this situation is that a process is parallelized, so
that routines that can run separately are spread over different
machines instead of having to wait until they are done one after
another.
</PARA>
</listitem>
</itemizedlist>
<para>The most commonly known examples of load-balancing and fail-over clusters are web farms, databases and firewalls.
People want 99.99999% uptime for their services; the Internet is open 24 hours a day, 7 days a week, 365 days a year,
unlike in the old days when you could shut down your server when the office closed.
</para><para>People who need CPU cycles, on the other hand, can often afford to schedule downtime for their environments,
as long as they can use the maximum power of their machines when they need it.
</para>
</SECT3>
<SECT3>
<TITLE>Supercomputers vs. clusters</TITLE>
<PARA>
Traditionally, supercomputers have been built only by a select
number of vendors: a company or organization that required
the performance of such a machine had to have a huge budget available
for its supercomputer. Since many universities could not afford the
costs of a supercomputer by themselves, other alternatives
were researched. The concept of a cluster was born when
people first tried to spread different jobs over more computers and
then gather back the data those jobs produced. With cheaper and more
common hardware available to everybody, results similar to real
supercomputers were only to be dreamed of during the first years, but
as the PC platform developed further, the performance gap between a
supercomputer and a cluster of multiple personal computers became
smaller.
</PARA>
</SECT3>
<SECT3>
<TITLE>Cluster models [(N)UMA, PVM/MPI]</TITLE>
<PARA>
There are different ways of doing parallel processing: (N)UMA, DSM,
PVM and MPI are all different kinds of parallel processing schemes. Some of them are implemented in hardware,
others in software, and some in both.
</PARA>
<PARA>(N)UMA ((Non-)Uniform Memory Access) machines, for example, have
shared access to the memory where they execute their code. In the
Linux kernel there is a NUMA implementation that takes into account
the different memory access times for different regions of memory.
It is then the kernel's task to use the memory that is closest to the
CPU it is running on.
</PARA>
<para><emphasis>DSM</emphasis>, aka Distributed Shared Memory, has been implemented in both software and hardware;
the concept is to provide an abstraction layer for physically distributed memory.
</para>
<PARA>
PVM and MPI are the tools most commonly used when people
talk about GNU/Linux based Beowulfs. </para><Para> MPI stands for Message Passing
Interface. It is the open standard specification for message passing
libraries. MPICH is one of the most used implementations of MPI. Next
to MPICH you can also find LAM, another implementation of MPI based on
the free reference implementation of the libraries.
</PARA>
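<PARA>
As a purely illustrative sketch (not taken from the MPICH or LAM
documentation, and with a made-up file name), a minimal MPI program in
C looks like this: every process calls MPI_Init, asks for its rank and
the total number of processes, and prints a message. The exact compile
and run commands depend on your MPI implementation.
</PARA>
<programlisting><![CDATA[
/* hello_mpi.c - minimal MPI example (illustrative sketch only).
 * Build with e.g. "mpicc hello_mpi.c -o hello_mpi" and start it
 * with "mpirun"; details vary between MPICH and LAM.
 */
#include <stdio.h>
#include <mpi.h>

int main(int argc, char *argv[])
{
    int rank, size;

    MPI_Init(&argc, &argv);                 /* set up the MPI environment    */
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);   /* which process am I?           */
    MPI_Comm_size(MPI_COMM_WORLD, &size);   /* how many processes in total?  */

    printf("process %d of %d says hello\n", rank, size);

    MPI_Finalize();                         /* shut the MPI environment down */
    return 0;
}
]]></programlisting>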
<PARA>
PVM or Parallel Virtual Machine is a cousin of MPI that is also
quite often used as a tool to create a Beowulf. PVM lives in
user space, so no special kernel modifications are required: basically
any user with enough rights can run PVM.
</PARA>
</SECT3>
<SECT3>
<TITLE>openMosix's role</TITLE>
<PARA>
The openMosix software package turns networked computers running
GNU/Linux into a cluster. It automatically balances the load between
different nodes of the cluster, and nodes can join or leave the
running cluster without disruption of the service. The load is
spread out among nodes according to their connection and CPU speeds.
</PARA>
<PARA>
Since openMosix is part of the kernel and maintains full
compatibility with Linux, a user's programs, files, and other
resources will all work as before without any further changes. The
casual user will not notice the difference between a Linux and an
openMosix system. To her, the whole cluster will function as one
(fast) GNU/Linux system.
</PARA>
<PARA>
openMosix is a Linux-kernel patch which provides full compatibility
with standard Linux for IA32-compatible platforms. The internal
load-balancing algorithm transparently migrates processes to other
cluster members. The advantage is better load-sharing between the
nodes. The cluster itself tries to optimize utilization at any time
(of course the sysadmin can influence the automatic load-balancing
through manual configuration at runtime).
</PARA>
<PARA>
This transparent process-migration feature makes the whole cluster
look like one big SMP system with as many processors as available
cluster nodes (of course multiplied by X for X-processor systems
such as dual or quad systems). openMosix also provides a
powerful optimized File System (oMFS) for HPC-applications, which
unlike NFS provides cache, time stamp and link consistency.
</PARA>
</SECT3>
</SECT2>
</SECT1>
<SECT1><TITLE>The story so far</TITLE>
<SECT2><TITLE>Historical Development</TITLE>
<PARA>
Rumours say that Mosix comes from Moshe Unix.
Initially Mosix started out as an application running on BSD/OS 3.0.
<PROGRAMLISTING>
Announcing MO6 for BSD/OS 3.0
Oren Laadan (orenl@cs.huji.ac.il)
Tue, 9 Sep 1997 19:50:12 +0300 (IDT)
Hi:
We are pleased to announce the availability of MO6 Version 3.0
Release 1.04 (beta-4) - compatible with BSD/OS 3.0, patch level
K300-001 through M300-029.
MO6 is a 6 processor version of the MOSIX multicomputer enhancements
of BSD/OS for a PC Cluster. If you have 2 to 6 PC's connected by a
LAN, you can experience truly multi-computing environment by using
the MO6 enhancements.
The MO6 Distribution
--------------------
MO6 is available either in "source" or "binary" distribution. It is
installed as a patch to BSD/OS, using an interactive installation
script.
MO6 is available at http://www.cnds.jhu.edu/mirrors/mosix/
or at our site: http://www.cs.huji.ac.il/mosix/
Main highlights of the current release:
--------------------------------------
- Memory ushering (depletion prevention) by process migration.
- Improved installation procedure.
- Enhanced migration control.
- Improved administration tools.
- More user utilities.
- More documentation and new man pages.
- Dynamic configurations.
Please send feedback and comments to mosix@cs.huji.ac.il.
-------------------
</programlisting>
GNU/Linux was chosen as the development platform for the 7th
incarnation in 1999.
In early 1999, Mosix M06 Beta was released for Linux 2.2.1.
At the end of 2001 and in early 2002 openMosix, the open version of Mosix, was
born (more on this in the next section).
</PARA>
</SECT2>
<SECT2><TITLE>openMosix</TITLE>
<!-- Contributed by Matt -->
<PARA>
openMosix exists in addition to whatever you find at mosix.org, and in full
appreciation of and respect for Prof. Barak's leadership of the
outstanding Mosix project.
</PARA>
<PARA>
Moshe Bar has been involved for a number of years with the Mosix
project (www.mosix.com) and was co-project manager of the Mosix
project and general manager of the commercial Mosix company.
</PARA>
<PARA>
After a difference of opinion on the commercial future of Mosix, he
started a new clustering company - Qlusters, Inc. - and
Prof. Barak decided not to participate for the moment in this
venture (although he did seriously consider joining) and held
long-running negotiations with investors.
It appears that Mosix is no longer openly supported as a GPL
project. Because there is a significant user base out there (about
1000 installations world-wide), Moshe Bar has decided to continue the
development and support of the Mosix project under a new name,
openMosix, and under the full GPL2 license. Whatever code in openMosix
comes from the old Mosix project is Copyright 2002 by Amnon Barak. All
the new code is Copyright 2002 by Moshe Bar.
</PARA>
<PARA>
There could (and will) be significant changes in the architecture of
future openMosix versions. New concepts for auto-configuration,
node discovery and new user-land tools are being discussed on the
openMosix mailing lists. Most of these new functionalities are already
implemented, while some of them, such as DSM (Distributed Shared
Memory), are still being worked on at the moment I write this (March
2003).
</PARA>
<PARA>
To approach standardization and future compatibility, the
proc interface has been changed from /proc/mosix to /proc/hpc, and
/etc/mosix.map has been changed to /etc/hpc.map. More recently the standard
location for the config file has been set to /etc/openmosix.map
(this is in fact the first config file the /etc/init.d/openmosix
script will look for). Adapted command-line user-space tools for
openMosix are already available on the project's web page.
</PARA>
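<PARA>
As a purely illustrative sketch (the node numbers and IP addresses
below are made up), a small /etc/openmosix.map could look like this:
each line lists an openMosix node number, the IP address (or hostname)
of that node, and the size of the address range the entry covers.
</PARA>
<programlisting>
1   192.168.1.1   1
2   192.168.1.2   1
3   192.168.1.3   2
</programlisting>
<PARA>
The last entry in this sketch would declare two consecutive nodes
(3 and 4) on consecutive IP addresses starting at 192.168.1.3.
</PARA>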
<PARA>
The openmosix.map config file can be replaced by a
node auto-discovery system called omdiscd (the openMosix auto
DISCovery Daemon), which we will discuss later.
</PARA>
<PARA>
openMosix is supported by various competent people (see
openmosix.sourceforge.net) working together around the world. The
main goal of the project is to create a standardized
clustering-environment for all kinds of HPC-applications.
</PARA>
<PARA>
openMosix has also a project web-page at <ulink
url="http://openMosix.sourceforge.net"><citetitle>http://openMosix.sourceforge.net</citetitle></ulink>
with a CVS tree and mailing-lists for developers as well as users.
</PARA>
</SECT2>
<SECT2><TITLE>Current state</TITLE>
<PARA>
Like most active Open Source programs, openMosix's rate of change
tends to outstrip the followers' ability to keep the documentation
up to date.
</para>
<para>
As I write this part in February 2003, openMosix 2.4.20 and the
openMosix Userland Tools v0.2.4 are available, including the new
autodiscovery tools.
</para>
<para>For a more recent state of development, please take a look at the <ulink
url="http://openmosix.sf.net/"><citetitle>openMosix website</citetitle></ulink>.
</para>
</sect2>
<sect2><title>Which applications work</title>
<para>It is almost impossible to give a list of all the applications that
work with openMosix. The community, however, tries to keep track of the
applications that
<ulink url="http://howto.ipng.be/openMosixWiki/index.php/work%20smoothly">
<citetitle>
migrate
</citetitle></ulink>
and the ones that
<ulink url="http://howto.ipng.be/openMosixWiki/index.php/don%27t"><citetitle>don't</citetitle></ulink>.
</para>
</SECT2>
</SECT1>
<SECT1><TITLE>openMosix in action: An example</TITLE>
<PARA>
openMosix clusters can take various forms. To demonstrate this, let's
assume you are a student and share a dorm room with a rich computer
science guy, with whom you have linked computers to form an openMosix
cluster. Let's also assume you are currently converting music files
from your CDs to Ogg Vorbis for your private use, which is legal in
your country. Your roommate is working on a project in C++ that he
says will bring World Peace. However, at just this moment he is in
the bathroom doing unspeakable things, and his computer is idle.
</PARA>
<PARA>
So when you start a program like oggenc to convert Bach's
.... from .wav to .ogg format, the openMosix routines on your
machine compare the load on both nodes and decide that things will
go faster if that process is sent from your Pentium-233 to his
Athlon XP. This happens automatically: you just type or click your
commands as you would if you were on a standalone machine. All you
notice is that when you start two more encoding runs, things go a lot
faster, and the response time doesn't go down.
</PARA>
<PARA>
Now while you're still typing ...., your roommate comes back,
mumbling something about red chili peppers in cafeteria food. He
resumes his tests, using a program called 'pmake', a version of
'make' optimized for parallel execution. Whatever he's doing, it
uses up so much CPU time that openMosix even starts to send
subprocesses to your machine to balance the load.
</PARA><PARA>
This setup is called <emphasis>single-pool</emphasis>: all computers are used as a
single cluster. The advantage/disadvantage of this is that your
computer is part of the pool: your stuff will run on other
computers, but their stuff will run on yours too.
</PARA>
</SECT1>
<SECT1><TITLE>Components</TITLE>
<SECT2><TITLE>Process migration</TITLE>
<PARA>
With openMosix you can start a process on one machine and find that it
actually runs on another machine in the cluster. Each process has its
own Unique Home Node (UHN), which is the node where it was created.
</PARA>
<PARA>
Migration means that a process is split into two parts: a user part and
a system part. The user part is moved to a remote node while the
system part stays on the UHN. The system part is sometimes
called the deputy process: this process takes care of resolving most
of the system calls.
</para><para>
openMosix takes care of the communication between these two parts.
</PARA>
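<PARA>
As a purely illustrative sketch (the program below is not part of
openMosix and the numbers are arbitrary), note that no special API is
needed to benefit from process migration: an ordinary program that
forks a few CPU-bound workers is already a candidate. Under openMosix
the user part of each worker may be migrated transparently to a less
loaded node, while its deputy stays on the home node.
</PARA>
<programlisting><![CDATA[
/* workers.c - plain C program with CPU-bound children (illustrative only).
 * There is nothing openMosix-specific here; the kernel decides at runtime
 * whether to migrate each child to another node.
 */
#include <stdio.h>
#include <stdlib.h>
#include <unistd.h>
#include <sys/types.h>
#include <sys/wait.h>

static void burn_cpu(void)
{
    volatile double x = 0.0;
    long i;
    for (i = 0; i < 200000000L; i++)   /* purely CPU-bound busy work */
        x += (double)i * 1e-6;
}

int main(void)
{
    const int workers = 4;             /* e.g. one worker per node */
    int i;

    for (i = 0; i < workers; i++) {
        pid_t pid = fork();
        if (pid == 0) {                /* child: do the work and exit */
            burn_cpu();
            exit(0);
        } else if (pid < 0) {
            perror("fork");
            exit(1);
        }
    }
    for (i = 0; i < workers; i++)      /* parent: wait for all workers */
        wait(NULL);
    printf("all workers finished\n");
    return 0;
}
]]></programlisting>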
</SECT2>
<SECT2><TITLE>The openMosix File System (oMFS)</TITLE>
<PARA>
<emphasis>Please note that oMFS has been removed from the openMosix patch as of 2.4.26-om1</emphasis>
</para>
<para>
oMFS is a feature of openMosix which allows you to access remote
filesystems in a cluster as if they were locally mounted. The
filesystems of your other nodes can be mounted on /mfs and you will,
for instance, find the files in /home on node 3 on each machine in
/mfs/3/home.
</PARA>
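<PARA>
As a purely illustrative sketch (the file name below is hypothetical),
a program on any node can reach node 3's /home through oMFS with
completely ordinary file I/O; no special calls are needed:
</PARA>
<programlisting><![CDATA[
/* read_remote.c - read a file on node 3 through oMFS (illustrative only) */
#include <stdio.h>

int main(void)
{
    /* /mfs/3/home/... is node 3's /home as seen through oMFS */
    FILE *f = fopen("/mfs/3/home/user/notes.txt", "r");
    char line[256];

    if (f == NULL) {
        perror("fopen");
        return 1;
    }
    while (fgets(line, sizeof(line), f) != NULL)
        fputs(line, stdout);
    fclose(f);
    return 0;
}
]]></programlisting>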
</SECT2>
<SECT2><TITLE>Direct File System Access (DFSA) </TITLE>
<PARA>
Both Mosix and openMosix provide a cluster-wide file-system (MFS) with
the DFSA-option (Direct File-System Access). It provides access to all
local and remote file-systems of the nodes in a Mosix or openMosix
cluster.
</PARA>
</SECT2>
</SECT1>
<!--
<SECT1><TITLE>Work in Progress</TITLE>
<SECT2><TITLE>Network RAM</TITLE>
<PARA>
</PARA>
</SECT2>
<SECT2><TITLE>Migratable sockets</TITLE>
<PARA>
</PARA>
</SECT2>
<SECT2><TITLE>High availability</TITLE>
<PARA>
</PARA>
</SECT2>
<sect2><title>Other Features</title>
<para>Other features such as ClusterMask and Autodetection have been implemented and will be discussed somewhere
else in this document.
</para>
</sect2>
</SECT1>
-->
<SECT1><title>openMosix Test Drive
</title>
<para>
In support of openMosix, Major Chai Mee Joon is giving OM users a free
trial account on his online openMosix cluster service, which users can
use to test openMosix and experiment with it.
</para>
<para>
The availability of this online openMosix cluster service will help
new users overcome the initial openMosix configuration issues,
and also provide higher computing power to openMosix users who are
developing or porting their applications.
</para>
<para>Please send an email to <email>om@majorlinux.com</email> for a trial account.
</para>
</sect1>