<CHAPTER ID="WHAT">
<TITLE>So what is openMosix Anyway?</TITLE>

<SECT1><TITLE>A very, very brief introduction to clustering</TITLE>

<PARA>
Most of the time, your computer is bored. Start a program like xload
or top that monitors your system use, and you will probably find that
your processor load is not even hitting the 1.0 mark. If you have two
or more computers, chances are that at any given time, at least one
of them is doing nothing. Unfortunately, when you really do need CPU
power - during a C++ compile, or encoding Ogg Vorbis music files -
you need a lot of it at once. The idea behind clustering is to spread
these loads among all available computers, using the resources that
are free on other machines.
</PARA>

<PARA>
The basic unit of a cluster is a single computer, also called a
"node". Clusters can grow in size - they "scale" - by adding more
machines. A cluster as a whole will be more powerful the faster the
individual computers and the faster their connection speeds are. In
addition, the operating system of the cluster must make the best use
of the available hardware in response to changing conditions. This
becomes more of a challenge if the cluster is composed of different
hardware types (a "heterogeneous" cluster), if the configuration of
the cluster changes unpredictably (machines joining and leaving the
cluster), and if the loads cannot be predicted ahead of time.
</PARA>

<SECT2>
<TITLE>Cluster types and models</TITLE>
<SECT3>
<TITLE>HPC vs Fail-over vs Load-balancing</TITLE>
<PARA>
Basically there are three types of clusters: fail-over,
load-balancing and High Performance Computing. The most commonly
deployed ones are probably the fail-over cluster and the
load-balancing cluster.
</PARA>
<itemizedlist><listitem>
<PARA><emphasis>Fail-over Clusters</emphasis> consist of two or more
network-connected computers with a separate heartbeat connection
between the hosts. The heartbeat connection between the machines is
used to monitor whether all the services are still available: as soon
as a service on one machine fails, the other machines try to take
over.
</PARA> </listitem>
<listitem>
<PARA>
With <emphasis>load-balancing clusters</emphasis> the concept is that when a request
for, say, a web server comes in, the cluster checks which machine is
the least busy and then sends the request to that machine. In practice
a load-balancing cluster is usually also a fail-over cluster, but with
the extra load-balancing functionality and often with more nodes.
</PARA>
</listitem>
<listitem>
<PARA>
The last variation of clustering is the <emphasis>High Performance Computing
Cluster</emphasis>: the machines are configured specifically to give data
centers that require extreme performance what they need. Beowulfs
have been developed especially to give research facilities the
computing speed they need. These kinds of clusters also have some
load-balancing features; they try to spread different processes over
more machines in order to gain performance. But what it mainly comes
down to in this situation is that a process is parallelized, so that
routines which can run separately are spread over different machines
instead of having to wait until they finish one after another.
</PARA>
</listitem>
</itemizedlist>
<para>The best-known examples of load-balancing and fail-over clusters are web farms, databases and firewalls.
People want 99.99999% uptime for their services; the Internet is open 24 hours a day, 365 days a year,
unlike in the old days when you could shut down your server when the office closed.
</para><para>People who need CPU cycles, on the other hand, can often afford to schedule downtime for their environments,
as long as they can use the maximum power of their machines when they need it.
</para>
</SECT3>

<SECT3>
<TITLE>Supercomputers vs. clusters</TITLE>
<PARA>
Traditionally, supercomputers have been built by only a select number
of vendors: a company or organization that required the performance
of such a machine had to have a huge budget available for its
supercomputer. Since many universities could not afford the cost of a
supercomputer by themselves, other alternatives were researched. The
concept of the cluster was born when people first tried to spread
different jobs over several computers and then gather back the data
those jobs produced. With cheaper and more common hardware available
to everybody, results similar to real supercomputers could only be
dreamed of during the first years, but as the PC platform developed
further, the performance gap between a supercomputer and a cluster of
multiple personal computers became smaller.
</PARA>
</SECT3>

<SECT3>
<TITLE>Cluster models [(N)UMA, PVM/MPI]</TITLE>
<PARA>
There are different ways of doing parallel processing: (N)UMA, DSM,
PVM and MPI are all different kinds of parallel-processing schemes.
Some of them are implemented in hardware, some in software, and some
in both.
</PARA>
<PARA>(N)UMA ((Non-)Uniform Memory Access) machines, for example, have
shared access to the memory where they can execute their code. In the
Linux kernel there is a NUMA implementation that takes the varying
memory access times for different regions of memory into account. It
is then the kernel's task to use the memory that is closest to the
CPU the code is running on.
</PARA>
<para><emphasis>DSM</emphasis>, aka Distributed Shared Memory, has been implemented in both software and hardware;
the concept is to provide an abstraction layer for physically distributed memory.
</para>

<PARA>
PVM and MPI are the tools most commonly used when people talk about
GNU/Linux based Beowulfs. </para><para> MPI stands for Message Passing
Interface. It is the open standard specification for message-passing
libraries. MPICH is one of the most widely used implementations of
MPI. Next to MPICH you can also find LAM, another implementation of
MPI based on the free reference implementation of the libraries.
</PARA>
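
<para>
To give a feel for what message passing looks like in practice, the
listing below is a minimal MPI sketch in C, written for this text as
an illustration (it is not part of openMosix or of any particular
Beowulf distribution): each process simply reports its rank and the
node it runs on. It assumes an MPI implementation such as MPICH or
LAM is installed, so that the usual mpicc compiler wrapper and the
mpirun launcher are available.
</para>

<programlisting><![CDATA[
/* hello_mpi.c - each MPI process reports its rank and host.
 * Build:  mpicc hello_mpi.c -o hello_mpi
 * Run:    mpirun -np 4 ./hello_mpi                          */
#include <stdio.h>
#include <mpi.h>

int main(int argc, char *argv[])
{
    int rank, size, len;
    char host[MPI_MAX_PROCESSOR_NAME];

    MPI_Init(&argc, &argv);                /* start the MPI runtime     */
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);  /* my rank: 0 .. size-1      */
    MPI_Comm_size(MPI_COMM_WORLD, &size);  /* total number of processes */
    MPI_Get_processor_name(host, &len);    /* node this rank runs on    */

    printf("Hello from rank %d of %d on %s\n", rank, size, host);

    MPI_Finalize();                        /* shut the runtime down     */
    return 0;
}
]]></programlisting>

<para>
Started with "mpirun -np 4 ./hello_mpi", each of the four processes
prints one line; with a suitable machine file they are spread over
the nodes of the Beowulf.
</para>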

<PARA>
PVM or Parallel Virtual Machine is another cousin of MPI that is also
quite often used as a tool to create a Beowulf. PVM lives in user
space, so no special kernel modifications are required: basically any
user with enough rights can run PVM.
</PARA>
</SECT3>

<SECT3>
<TITLE>openMosix's role</TITLE>
<PARA>
The openMosix software package turns networked computers running
GNU/Linux into a cluster. It automatically balances the load between
different nodes of the cluster, and nodes can join or leave the
running cluster without disrupting the service. The load is spread
out among the nodes according to their connection and CPU speeds.
</PARA>

<PARA>
Since openMosix is part of the kernel and maintains full
compatibility with Linux, a user's programs, files, and other
resources will all work as before without any further changes. The
casual user will not notice the difference between a Linux and an
openMosix system. To her, the whole cluster will function as one
(fast) GNU/Linux system.
</PARA>

<PARA>
openMosix is a Linux-kernel patch which provides full compatibility
with standard Linux for IA32-compatible platforms. The internal
load-balancing algorithm transparently migrates processes to other
cluster members. The advantage is better load-sharing between the
nodes. The cluster itself tries to optimize utilization at any time
(of course the sysadmin can affect the automatic load-balancing by
manual configuration during runtime).
</PARA>

<PARA>
This transparent process-migration feature makes the whole cluster
look like one big SMP system with as many processors as there are
available cluster nodes (multiplied, of course, by X for X-processor
systems such as dual or quad systems). openMosix also provides a
powerful optimized file system (oMFS) for HPC applications, which,
unlike NFS, provides cache, time stamp and link consistency.
</PARA>

</SECT3>
</SECT2>

</SECT1>

<SECT1><TITLE>The story so far</TITLE>
<SECT2><TITLE>Historical Development</TITLE>
<PARA>
Rumour has it that the name Mosix comes from "Moshe Unix".
Initially Mosix started out as an application running on BSD/OS 3.0.

<PROGRAMLISTING>
Announcing MO6 for BSD/OS 3.0
Oren Laadan (orenl@cs.huji.ac.il)
Tue, 9 Sep 1997 19:50:12 +0300 (IDT)

Hi:

We are pleased to announce the availability of MO6 Version 3.0
Release 1.04 (beta-4) - compatible with BSD/OS 3.0, patch level
K300-001 through M300-029.

MO6 is a 6 processor version of the MOSIX multicomputer enhancements
of BSD/OS for a PC Cluster. If you have 2 to 6 PC's connected by a
LAN, you can experience truly multi-computing environment by using
the MO6 enhancements.

The MO6 Distribution
--------------------
MO6 is available either in "source" or "binary" distribution. It is
installed as a patch to BSD/OS, using an interactive installation
script.

MO6 is available at http://www.cnds.jhu.edu/mirrors/mosix/
or at our site: http://www.cs.huji.ac.il/mosix/

Main highlights of the current release:
--------------------------------------
- Memory ushering (depletion prevention) by process migration.
- Improved installation procedure.
- Enhanced migration control.
- Improved administration tools.
- More user utilities.
- More documentation and new man pages.
- Dynamic configurations.

Please send feedback and comments to mosix@cs.huji.ac.il.
-------------------
</programlisting>

GNU/Linux was chosen as the development platform for the 7th
incarnation in 1999. In early 1999, Mosix M06 Beta was released for
Linux 2.2.1.

At the end of 2001 and early 2002 openMosix, the open version of
Mosix, was born (more on this in the next section).
</PARA>
</SECT2>

<SECT2><TITLE>openMosix</TITLE>
<!-- Contributed by Matt -->

<PARA>
openMosix exists in addition to whatever you find at mosix.org, and
in full appreciation of and respect for Prof. Barak's leadership of
the outstanding Mosix project.
</PARA>

<PARA>
Moshe Bar was involved with the Mosix project (www.mosix.com) for a
number of years, as co-project manager of the Mosix project and
general manager of the commercial Mosix company.
</PARA>

<PARA>
After a difference of opinion on the commercial future of Mosix, he
started a new clustering company - Qlusters, Inc. - while Prof. Barak
decided, for the moment, not to participate in this venture (although
he did seriously consider joining) and held long-running negotiations
with investors.

It appears that Mosix is no longer openly supported as a GPL
project. Because there is a significant user base out there (about
1000 installations world-wide), Moshe Bar decided to continue the
development and support of the Mosix project under a new name,
openMosix, and under the full GPL2 license. Whatever code in openMosix
comes from the old Mosix project is Copyright 2002 by Amnon Barak. All
the new code is Copyright 2002 by Moshe Bar.
</PARA>

<PARA>
There could (and will) be significant changes in the architecture of
future openMosix versions. New concepts about auto-configuration,
node-discovery and new user-land tools are being discussed on the
openMosix mailing lists. Most of these new functionalities are
already implemented, while some of them, such as DSM (Distributed
Shared Memory), are still being worked on at the moment I write this
(March 2003).
</PARA>

<PARA>
To approach standardization and future compatibility, the proc
interface has changed from /proc/mosix to /proc/hpc, and
/etc/mosix.map was changed to /etc/hpc.map. More recently the
standard location for the config file has been set to
/etc/openmosix.map (this is in fact the first config file the
/etc/init.d/openmosix script will look for). Adapted command-line
user-space tools for openMosix are already available on the web page
of the project.
</PARA>
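
<para>
As a rough illustration (the node numbers and addresses below are
made up; adapt them to your own network), a small four-node cluster
could be described in /etc/openmosix.map with one line per node,
giving the openMosix node number, the node's IP address and a range
size:
</para>

<programlisting>
1   192.168.1.101   1
2   192.168.1.102   1
3   192.168.1.103   1
4   192.168.1.104   1
</programlisting>

<para>
Nodes with consecutive IP addresses can also be folded into a single
line by using a range size larger than 1.
</para>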

<PARA>
The openmosix.map config file can be replaced by a node
auto-discovery system called omdiscd (the openMosix auto DISCovery
Daemon), which we will discuss later.
</PARA>

<PARA>
openMosix is supported by various competent people (see
openmosix.sourceforge.net) working together around the world. The
main goal of the project is to create a standardized clustering
environment for all kinds of HPC applications.
</PARA>

<PARA>
openMosix also has a project web page at <ulink
url="http://openMosix.sourceforge.net"><citetitle>http://openMosix.sourceforge.net</citetitle></ulink>
with a CVS tree and mailing lists for developers as well as users.
</PARA>

</SECT2>

<SECT2><TITLE>Current state</TITLE>
<PARA>
Like most active Open Source programs, openMosix's rate of change
tends to outstrip the followers' ability to keep the documentation
up to date.
</para>
<para>
As I write this part, in February 2003, openMosix 2.4.20 and the
openMosix Userland Tools v0.2.4 are available, including the new
autodiscovery tools.
</para>
<para>For a more recent state of development please take a look at the <ulink
url="http://openmosix.sf.net/"><citetitle>openMosix website</citetitle></ulink>.
</para>

</sect2>

<sect2><title>Which applications work</title>
<para>It is almost impossible to give a list of all the applications that
work with openMosix. The community, however, tries to keep track of the
applications that
<ulink url="http://howto.ipng.be/openMosixWiki/index.php/work%20smoothly">
<citetitle>
migrate
</citetitle></ulink>
and the ones that
<ulink url="http://howto.ipng.be/openMosixWiki/index.php/don%27t"><citetitle>don't</citetitle></ulink>.
</para>
</SECT2>

</SECT1>

<SECT1><TITLE>openMosix in action: An example</TITLE>
<PARA>
openMosix clusters can take various forms. To demonstrate this, let's
assume you are a student and share a dorm room with a rich computer
science guy, with whom you have linked computers to form an openMosix
cluster. Let's also assume you are currently converting music files
from your CDs to Ogg Vorbis for your private use, which is legal in
your country. Your roommate is working on a project in C++ that he
says will bring World Peace. However, at just this moment he is in
the bathroom doing unspeakable things, and his computer is idle.
</PARA>

<PARA>
So when you start a program like bladeenc to convert Bach's
.... from .wav to .ogg format, the openMosix routines on your
machine compare the load on both nodes and decide that things will
go faster if that process is sent from your Pentium-233 to his
Athlon XP. This happens automatically: you just type or click your
commands as you would if you were on a standalone machine. All you
notice is that when you start two more encoding runs, things go a lot
faster, and the response time doesn't suffer.
</PARA>

<PARA>
Now while you're still typing ...., your roommate comes back,
mumbling something about red chili peppers in cafeteria food. He
resumes his tests, using a program called 'pmake', a version of
'make' optimized for parallel execution. Whatever he's doing, it
uses up so much CPU time that openMosix even starts to send
subprocesses to your machine to balance the load.
</PARA><PARA>
This setup is called <emphasis>single-pool</emphasis>: all computers are used as a
single cluster. The advantage/disadvantage of this is that your
computer is part of the pool: your stuff will run on other
computers, but their stuff will run on yours too.
</PARA>
</SECT1>

<SECT1><TITLE>Components</TITLE>
<SECT2><TITLE>Process migration</TITLE>
<PARA>
With openMosix you can start a process on one machine and find out
that it actually runs on another machine in the cluster. Each process
has its own Unique Home Node (UHN) where it was created.
</PARA>

<PARA>
Migration means that a process is split into two parts, a user part
and a system part. The user part will be moved to a remote node while
the system part stays on the UHN. This system part is sometimes
called the deputy process: it takes care of resolving most of the
system calls.
</para><para>
openMosix takes care of the communication between these two processes.
</PARA>
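
<para>
There is nothing special you have to do in your code to make it
migratable; any ordinary CPU-bound process is a candidate. The small
C program below is an illustrative load generator written for this
text (it is not shipped with openMosix): it simply forks a few
CPU-hungry children, whose user parts the openMosix kernel is then
free to migrate to less busy nodes while the deputies stay on the
UHN.
</para>

<programlisting><![CDATA[
/* burn.c - fork a few CPU-bound children so the cluster has
 * something to balance.  Purely an illustrative load generator. */
#include <stdio.h>
#include <stdlib.h>
#include <unistd.h>
#include <sys/types.h>
#include <sys/wait.h>

int main(void)
{
    const int children = 4;        /* number of worker processes to fork */
    int i;

    for (i = 0; i < children; i++) {
        if (fork() == 0) {         /* child: burn CPU for a while        */
            volatile double x = 0.0;
            long j;
            for (j = 0; j < 400000000L; j++)
                x += j * 0.5;
            _exit(0);
        }
    }
    for (i = 0; i < children; i++)
        wait(NULL);                /* parent: collect all the children   */
    printf("all children finished\n");
    return 0;
}
]]></programlisting>

<para>
Watching the cluster with a monitoring tool such as mosmon while this
runs should show the extra load being spread over the other nodes.
</para>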

</SECT2>

<SECT2><TITLE>The openMosix File System (oMFS)</TITLE>
<PARA>
<emphasis>Please note that oMFS has been removed from the openMosix patch as of 2.4.26-om1.</emphasis>
</para>
<para>
oMFS is a feature of openMosix which allows you to access remote
filesystems in a cluster as if they were locally mounted. The
filesystems of the other nodes can be mounted on /mfs and you will,
for instance, find the files of /home on node 3 on each machine in
/mfs/3/home.
</PARA>
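
<para>
Because oMFS behaves like any other mounted filesystem, no special
API is needed: ordinary file I/O works on the /mfs paths. The snippet
below is only a sketch (the node number and file name are invented
examples) showing a file that physically lives in /home on node 3
being read through oMFS:
</para>

<programlisting><![CDATA[
/* read_remote.c - read a file from node 3 through oMFS.
 * The path is only an example; any file under /mfs/<node>/ would do. */
#include <stdio.h>

int main(void)
{
    FILE *f = fopen("/mfs/3/home/user/notes.txt", "r"); /* /home on node 3 */
    char line[256];

    if (f == NULL) {
        perror("fopen");
        return 1;
    }
    while (fgets(line, sizeof line, f) != NULL)
        fputs(line, stdout);                            /* copy to stdout  */
    fclose(f);
    return 0;
}
]]></programlisting>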

</SECT2>

<SECT2><TITLE>Direct File System Access (DFSA)</TITLE>
<PARA>
Both Mosix and openMosix provide a cluster-wide file system (MFS)
with the DFSA option (Direct File System Access). It provides access
to all local and remote file systems of the nodes in a Mosix or
openMosix cluster.
</PARA>
</SECT2>

</SECT1>

<!--
<SECT1><TITLE>Work in Progress</TITLE>

<SECT2><TITLE>Network RAM</TITLE>
<PARA>
</PARA>
</SECT2>

<SECT2><TITLE>Migratable sockets</TITLE>
<PARA>
</PARA>
</SECT2>

<SECT2><TITLE>High availability</TITLE>
<PARA>
</PARA>
</SECT2>

<sect2><title>Other Features</title>
<para>Other features such as ClusterMask and Autodetection have been implemented and will be discussed somewhere
else in this document.
</para>
</sect2>

</SECT1>
-->

<SECT1><title>openMosix Test Drive</title>

<para>
In support of openMosix, Major Chai Mee Joon is giving OM users a
free trial account on his online openMosix cluster service, which
users can use to test and experiment with openMosix.
</para>

<para>
The availability of this online openMosix cluster service will help
new users overcome the initial openMosix configuration issues, and it
also provides more computing power to openMosix users who are
developing or porting their applications.
</para>

<para>Please send an email to <email>om@majorlinux.com</email> for a trial account.
</para>
</sect1>