The MOSIX FAQ - Full listing
Question:
What is MOSIX?
Answer:
Please read the What is MOSIX web page.
Question:
Why this name?
Answer:
MOSIX stands for Multi-computer Operating System for UnIX.
Question:
What is the MOSIX distribution license?
Answer:
MOSIX for Linux is subject to the GNU General Public License version 2,
as published by the Free Software Foundation.
Question:
Where can I download MOSIX?
Answer:
From the MOSIX download web page.
Question:
History of MOSIX
Answer:
Version 0
Year (started/completed): 1977/79
Name: UNIX with Satellite Processors
Machines: PDP-11/45 + PDP-11/10
Operating system: Bell Lab's Unix 6
References: [1,2] *
Version 1
Year (started/completed): 1981/83
Name: MOS
Machines: PDP-11/45 + PDP-11/23
Operating system: Bell Lab's Unix 7
References: [3,4] *
Version 2
Year (started/completed): 1983/84
Name: MOS
Machines: CADMUS/PCS MC68K
Operating system: Bell Lab's Unix 7 with some BSD 4.1 extensions
References: [5,6] *
Version 3
Year (started/completed): 1987/88
Name: NSMOS
Machines: NS32332
Operating system: AT&T Unix system V release 2
References: [7] *
Version 4
Year (started/completed): 1988
Name: MOSIX
Machines: VAX-780 + VAX-750
Operating system: AT&T Unix System V release 2
Version 5
Year (started/completed): 1988/89
Name: MOSIX
Machines: NS32532
Operating system: AT&T Unix System V release 2
References: [8,9] *
Version 6
Year (started/completed): 1992/93
Name: MOSIX
Machines: 486/Pentium
Operating system: BSD/OS
References: [10,11] *
Version 7
Year (started/completed): 1998/99
Name: MOSIX
Machines: X86/Pentium
Operating system: LINUX
References: [12] *
* References in the next question.
Question:
MOSIX reference papers
Answer:
Partial list of references:
- Barak A. and Shapir A.,
"UNIX with satellite Processors,"
Software - Practice & Experience, Vol. 10, No. 5, pp. 383-392, May 1980.
- Barak A., Shapir A., Steinberg G. and Karshmer A.I.,
"A Modular, Distributed UNIX,
Proc. 14-th Hawaii Int. Conf. on System Science, pp. 740-747, January 1981.
- Barak A. and Litman A.,
"MOS - A Multicomputer Distributed Operating System,"
Software - Practice & Experience, Vol. 15, No. 8, pp. 725-737, Aug. 1985.
- Barak A. and Shiloh A.,
"A Distributed Load-balancing Policy for a Multicomputer,"
Software - Practice & Experience, Vol. 15, No. 9, pp. 901-913, Sept. 1985.
- Barak A. and Paradise G. O.,
"MOS - Scaling Up UNIX,"
Proc. Summer 1986 USENIX Conf., pp. 414-418, Atlanta, GA, June 1986.
- Barak A. and Paradise G. O.,
"MOS - a Load Balancing UNIX,"
Proc. Autumn 86 EUUG Conf., pp. 273-280, Manchester, Sept. 1986.
- Barel A.,
"NSMOS - MOS Port to the National's 32000 Family Architecture."
Proc. 2nd Israel Conf. Computer Systems and Soft. Eng., Tel-Aviv, May 1987.
- Barak A. and Wheeler R.,
"MOSIX: An Integrated Multiprocessor UNIX,"
Proc. Winter 1989 USENIX Conf., pp. 101-112, San Diego, CA, Feb. 1989.
- Barak A., Guday S. and Wheeler R.,
"The MOSIX Distributed Operating System, Load Balancing for UNIX."
Lecture Notes in Computer Science, Vol. 672, Springer-Verlag, May 1993.
- Barak A., Laden O. and Yarom Y.,
"The NOW MOSIX and its Preemptive Process Migration Scheme",
IEEE TCOS, Vol. 7, No. 2, pp. 5-11, Summer 1995.
- Barak A. and La'adan O.,
"The MOSIX Multicomputer Operating System for High Performance Cluster
Computing,"
Journal of Future Generation Computer Systems, Vol. 13, No. 4-5,
pp. 361-372, March 1998.
- Barak A., La'adan O. and Shiloh A.,
"Scalable Cluster Computing with MOSIX for LINUX,"
Proc. 5-th Annual Linux Expo, pp. 95-100, Raleigh, May 1999.
Question:
The MOSIX monitor
Answer:
The MOSIX distribution includes a built-in monitor, called "mon".
It can display the number of active nodes in the cluster (t),
relative loads (l), amount of (used/free) memory (m), utilization (u)
and relative CPU speeds (s).
Type "mon" to start it and "h" for help.
Note: the web monitor is not part of MOSIX.
We do not maintain or distribute it.
Question:
How can I help ?
Answer:
We welcome contributed (GPL) software as well as volunteers to:
- Reply to requests for "help" in our mailing list.
- Test the MOSIX kernel - just run your "usual" programs
(no need to develop new tests).
If you suspect/detect a problem, first check with a non-MOSIX kernel.
If the problem occurs only with the MOSIX kernel, turn on the debugger and send us the details.
- Improve the existing documentation.
- Join the "MOSIX how to" task force.
- Develop documentation/slide presentations in your native language.
- Improve the installation procedure, e.g. for "diskless" nodes.
- Develop installation procedures for Linux distributions that are
not supported (yet).
- Develop RPMs.
- Help the cluster installation task force (joint effort of IBM's
LUI project and VA).
- Develop "user-level" aids e.g. mosixview
(http://www.waplocater.de/mosixview/)
or even "Checkpoints" (for HA), etc.
MOSIX and Linux distributions
Question:
Why can't I install MOSIX 0.97.8 (or earlier) on RedHat 7 using the automatic installation?
Answer:
The default compiler on RedHat 7.0 is gcc 2.96, which cannot be used to compile
the kernel. On RedHat 7.0 kgcc should be used instead, so you have to manually
patch the kernel makefiles to use kgcc instead of gcc.
Hopefully the Linux 2.2.18 kernel will automatically detect whether kgcc
has to be used and select it in that case.
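As a minimal sketch of that manual patch (assuming the kernel sources are in
/usr/src/linux; the exact spacing of the variables may differ in your tree),
edit the top-level kernel Makefile so that both compiler variables point to kgcc.
In /usr/src/linux/Makefile change
    HOSTCC = gcc
    CC = $(CROSS_COMPILE)gcc
to
    HOSTCC = kgcc
    CC = $(CROSS_COMPILE)kgcc
and then rebuild the kernel as usual.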
Question:
A MOSIX kernel on RedHat 7.0 dies with "FATAL: Can't determinate library version".
Answer:
Download the latest GLIBC update from one of RedHat's mirrors.
Question:
Why can't the installer patch my kernel?
Answer:
Make sure you are using vanilla (pure) kernel sources.
Avoid the kernel sources supplied with your distribution;
they are usually heavily patched, which can cause the MOSIX patch to fail.
Question:
May I mix different versions of MOSIX in the same cluster?
Answer:
It is best to have all the nodes in the cluster run the exact same MOSIX
version. The minimum requirement is that the first two digits of the
MOSIX version are the same in all the nodes.
Question:
May I mix different RPM packages in the same cluster?
Answer:
No.
Question:
May I mix kernels with and without the DFSA option configured?
Answer:
No. Kernels with the DFSA option configured must not be mixed in the
same cluster with kernels that do not have this option.
Question:
How can I see how many nodes are in my cluster?
Answer:
Login to one of the nodes and run "mon". You should see all
the nodes in your cluster.
Question:
How do I know that the process migration works?
Answer:
Assuming that your nodes are of the same speed,
login to one of the nodes and run "mon".
Then run several copies of a test (CPU bound) program,
e.g.,
awk 'BEGIN {for(i=0;i<10000;i++)for(j=0;j<10000;j++);}'
First you should see the load increase on one node.
After a few seconds, if process migration works, you will
see the load spread among the nodes.
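For example, a minimal sketch that starts four copies of the above awk program
in the background (the count of four is arbitrary; pick roughly the number of
nodes in your cluster):
    for i in 1 2 3 4; do
        awk 'BEGIN {for(i=0;i<10000;i++)for(j=0;j<10000;j++);}' &
    done
While they run, watch the load column in "mon".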
Question:
Testing process migration on machines of different speeds
Answer:
Login to one of the faster nodes and run "mon".
Login to the slowest node and run one copy of a test (CPU bound) program,
e.g.,
awk 'BEGIN {for(i=0;i<10000;i++)for(j=0;j<10000;j++);}'
After a few seconds, if process migration works, you will
see the process migrate to a faster node.
Linux Utility for cluster Installation (LUI)
Question:
What is LUI?
Answer:
Linux Utility for cluster Installation (LUI) is a package developed by IBM
for installing Linux machines. Read about the LUI project and see its
documentation and installation manuals (which are included in the
distribution package).
Question:
Which LUI release?
Answer:
1.10.1
Question:
Installation procedure hints
Answer:
First install LUI (NFS, BOOTPD, make Ethernet boot-disk, etc) in a server
node.
Next, you need to inform LUI how to install your (client) nodes.
There are two methods: (a) by using the LUI commands (mklimm, mklimg,
allimr,
etc), or (b) by using the graphic mode, called GLUI (at {LUI-dir}/bin).
For beginners, it is recommended to use the LUI commands,
which are easier to understand and come with good man pages and sample
scripts (at {LUI-dir}/sample). After you are familiar with the LUI commands
it will be easier to understand GLUI.
Note that although GLUI looks nicer and more friendly, it is trickier
to use and there is no documentation on how to use it.
To boot a client machine you can use either BOOTPD or DHCP.
If you use BOOTP then LUI handles the /etc/bootptab file automatically.
If you use DHCP, LUI will not edit the /etc/dhcpd.conf file and you need
to edit this file manually. We recommend using the BOOTPD daemon.
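If you do choose DHCP, below is a minimal sketch of the kind of host entry you
would add to /etc/dhcpd.conf by hand (all names and addresses are placeholders,
not values supplied by LUI):
    subnet 192.168.1.0 netmask 255.255.255.0 {
        host node1 {
            hardware ethernet 00:00:00:00:00:01;   # the client's MAC address
            fixed-address 192.168.1.11;            # the IP address LUI knows the client by
        }
    }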
Question:
Defining the disk-table and its resources
Answer:
Read carefully the scripts in {LUI-dir}/sample/disktable.sample*.
After defining the disk-table file (in the server - see the samples at
{LUI-dir}/sample) remember to define the resources for each client's
partition file-system.
Note that if some of a client's partitions use NFS then you need to
specify this option in the disk-table file for that client, and also
define a resource for it although it is not a local file system.
Question:
How does LUI work?
Answer:
The following directories are built in the server (in /tftpboot):
./lim, ./tar, ./rpm. LUI also builds one directory for each client,
using the IP address of the client as its name, e.g. ./xxx.yyy.zzz.kkk.
- The /tar directory has the tar file to be installed in the client node;
- /rpm - if you use RPMs, copy the relevant RPM file(s) you want to
install in the client to this directory;
- /lim - here LUI saves its entire database. LUI builds one data file for each
resource and two data files for each client (you can view these
files' structure to understand more about the LUI configuration);
- /lim/log - here LUI saves a log file for each client you install;
- the /lim and /lim/log directories are used for debugging;
- /xxx.yyy.zzz.kkk - here LUI saves the root file system for each client.
At its first boot the client mounts the root file system from this directory.
The following procedure is executed by the `clone' script in each client node:
- Boot with BOOTP, taking the IP address from the server
- Mount server:/tftpboot/xxx.yyy.zzz.kkk on /
- Read the disk-table file and partition the client's disk
- Install the tar files
- Mount the client's new file system at /mnt
- Install the RPM (optional)
- Copy the kernel and the system.map
- Edit /etc/lilo.conf
- Run postinstall and user exit script (optional)
Note that this (clone) script is written in perl and is located in
`{LUI-dir}/bin'.
Clone writes a log file to /lim/log/{NODE NAME}.log.
If your installation fails you can either run `clone' with -d (for debug) or edit the script and add your own STDOUT & STDERR messages.
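For example, a minimal sketch for inspecting the log of a failed client
installation (assuming the LUI directories live under /tftpboot as described
above, and that the client's node name is node1):
    less /tftpboot/lim/log/node1.log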
Question:
What about LUI bugs?
Answer:
They can be viewed at the LUI Bug updates page.
Below is a collection of bugs (from the LUI web) that are relevant
to the installation procedure:
- Display isn't output in real time.
- STDOUT & STDERR should be interleaved.
- Add dhcpd.conf manipulation.
- RedHat 7 not supported (but works).
Question:
What does the number 3 in /proc/MOSIX/nodes/<n>/status
mean?
Answer:
Take a closer look at the INFORMATION section of the MOSIX (5) man page.
The status "3" means the node is configured for MOSIX and up:
0x0003 = 0x0002 | 0x0001 = DS_MOSIX_DEF | DS_MOSIX_UP
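For example, a minimal check (the node number 1 below is an arbitrary example):
    cat /proc/MOSIX/nodes/1/status
A value of 3 means that node is configured for MOSIX and up, as explained above.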
Question:
What is node number `0'? What does `0' mean in /proc/<pid>/where?
Answer:
Node number `0' is always the current node, so a `0' in /proc/<pid>/where means that the process is running on the home-node.
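For example, assuming a migratable process with PID 1234 that was started on this node:
    cat /proc/1234/where
An output of 0 means the process is still running on its home-node; a non-zero
value is presumably the number of the node it has migrated to.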
Question:
Can ReiserFS be used with MOSIX?
Answer:
Yes, for MOSIX 1.0.
For MOSIX 0.9, Bjoern Rabenstein wrote:
Thanks to the help from the list, I managed to unite MOSIX and ReiserFS.
Indeed, you simply need to modify one single line after patching.
It is better to apply the MOSIX patch first, since an error during patching
will stop the automatic install procedure. After that, apply the ReiserFS
patch and modify /usr/src/linux/include/linux/fs.h by inserting the line
after #include
Question:
Can GFS be used with MOSIX?
Answer:
GFS supports DFSA for MOSIX 0.9.
Currently there is no support for MOSIX
1.0.
Question:
I installed MOSIX but one node can't see the other (the status of one machine on another is "1").
Answer:
This problem usually means that the cluster contains two different kernels.
Make sure the same kernel is installed on all the nodes, and make sure you
copied all the modules.
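For example, a minimal sketch for comparing the running kernel on two nodes
(node1 and node2 are placeholder host names; it assumes you can rsh between
the nodes):
    rsh node1 uname -r
    rsh node2 uname -r
Both commands should print exactly the same version string.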
Question:
I installed MOSIX, but automatic migration doesn't work (manual works)
Answer:
First, remember that some processes (e.g. threads) can't be migrated,
so when testing use only simple loops.
Automatic migration failure usually means that a machine was not installed
properly. Make sure you run MOSIX.install on all nodes and that you use the
exact same kernel (with the same modules) in all the nodes.
Question:
Why won't JAVA processes migrate?
Answer:
Most JAVA VMs use shared memory, and thus cannot migrate. Try using a
"green threads" VM.