The MOSIX FAQ - Full listing

Table of Contents



General


Question:

What is MOSIX?

Answer:

Please read the What is MOSIX web page.

Question:

Why this name?

Answer:

MOSIX stands for Multi-computer Operating System for UnIX.

Question:

What is the MOSIX distribution license?

Answer:

MOSIX for Linux is subject to the GNU General Public License version 2, as published by the Free Software Foundation.

Question:

Where can I download MOSIX?

Answer:

From the MOSIX download web page.

Question:

History of MOSIX

Answer:

Version 0
Year (started/completed): 1977/79
Name: UNIX with Satellite Processors
Machines: PDP-11/45 + PDP-11/10
Operating system: Bell Lab's Unix 6
References: [1,2] *

Version 1
Year (started/completed): 1981/83
Name: MOS
Machines: PDP-11/45 + PDP-11/23
Operating system: Bell Lab's Unix 7
References: [3,4] *

Version 2
Year (started/completed): 1983/84
Name: MOS
Machines: CADMUS/PCS MC68K
Operating system: Bell Lab's Unix 7 with some BSD 4.1 extensions
References: [5,6] *

Version 3
Year(started/completed): 1987/88
Name: NSMOS
Machines: NS32332
Operating system: AT&T Unix system V release 2
References: [7] *

Version 4
Year(started/completed): 1988
Name: MOSIX
Machines: VAX-780 + VAX-750
Operating system: AT&T Unix System V release 2

Version 5
Year(started/completed): 1988/89
Name: MOSIX
Machines: NS32532
Operating system: AT&T Unix System V release 2
References: [8,9] *

Version 6
Year(started/completed): 1992/93
Name: MOSIX
Machines: 486/Pentium
Operating system: BSD/OS
References: [10,11] *

Version 7
Year(started/completed): 1998/99
Name: MOSIX
Machines: X86/Pentium
Operating system: LINUX
References: [12] *

* References in the next question.


Question:

MOSIX reference papers

Answer:

Partial list of references:

  1. Barak A. and Shapir A., "UNIX with satellite Processors," Software - Practice & Experience, Vol. 10, No. 5, pp. 383-392, May 1980.

  2. Barak A., Shapir A., Steinberg G. and Karshmer A.I., "A Modular, Distributed UNIX," Proc. 14-th Hawaii Int. Conf. on System Science, pp. 740-747, January 1981.

  3. Barak A. and Litman A., "MOS - A Multicomputer Distributed Operating System," Software - Practice & Experience, Vol. 15, No. 8, pp. 725-737, Aug. 1985.

  4. Barak A. and Shiloh A., "A Distributed Load-balancing Policy for a Multicomputer," Software - Practice & Experience, Vol. 15, No. 9, pp. 901-913, Sept. 1985.

  5. Barak A. and Paradise G. O., "MOS - Scaling Up UNIX," Proc. Summer 1986 USENIX Conf., pp. 414-418, Atlanta, GA, June 1986.

  6. Barak A. and Paradise G. O., "MOS - a Load Balancing UNIX," Proc. Autumn 86 EUUG Conf., pp. 273-280, Manchester, Sept. 1986.

  7. Barel A., "NSMOS - MOS Port to the National's 32000 Family Architecture." Proc. 2nd Israel Conf. Computer Systems and Soft. Eng., Tel-Aviv, May 1987.

  8. Barak A. and Wheeler R., "MOSIX: An Integrated Multiprocessor UNIX," Proc. Winter 1989 USENIX Conf., pp. 101-112, San Diego, CA, Feb. 1989.

  9. Barak A., Guday S. and Wheeler R., "The MOSIX Distributed Operating System, Load Balancing for UNIX." Lecture Notes in Computer Science, Vol. 672, Springer-Verlag, May 1993.

  10. Barak A., Laden O. and Yarom Y., "The NOW MOSIX and its Preemptive Process Migration Scheme", IEEE TCOS, Vol. 7, No. 2, pp. 5-11, Summer 1995.

  11. Barak A. and La'adan O., "The MOSIX Multicomputer Operating System for High Performance Cluster Computing," Journal of Future Generation Computer Systems, Vol. 13, No. 4-5, pp. 361-372, March 1998.

  12. Barak A., La'adan O. and Shiloh A., "Scalable Cluster Computing with MOSIX for LINUX," Proc. 5-th Annual Linux Expo, pp. 95-100, Raleigh, May 1999.


Question:

The MOSIX monitor

Answer:

The MOSIX distribution includes a built-in monitor, called "mon". It can display the number of active nodes in the cluster (t), relative loads (l), amount of (used/free) memory (m), utilization (u) and relative CPU speeds (s).

Type "mon" to start it and "h" for help.

Note: the web monitor is not part of MOSIX. We do not maintain or distribute it.


Question:

How can I help?

Answer:

We welcome contributed (GPL) software as well as volunteers to:

  1. Reply to requests for "help" in our mailing list.

  2. Test the MOSIX kernel - just run your "usual" programs (no need to develop new tests). If you suspect/detect a problem, first check with a non-MOSIX kernel. Otherwise, turn on the debugger and send us the details.

  3. Improve the existing documentation.

  4. Join the "MOSIX how to" task force.

  5. Develop documentation/slide presentations in your native language.

  6. Improve the installation procedure, e.g. for "diskless" nodes.

  7. Develop installation procedures for Linux distributions that are not supported (yet).

  8. Develop RPMs.

  9. Help the cluster installation task force (joint effort of IBM's LUI project and VA).

  10. Develop "user-level" aids e.g. mosixview (http://www.waplocater.de/mosixview/) or even "Checkpoints" (for HA), etc.



MOSIX and Linux distributions


Question:

Why can't I install MOSIX 0.97.8 (or earlier) on RedHat 7 using the automatic installation?

Answer:

The default compiler on RedHat 7.0 is gcc 2.96, which cannot be used to compile the kernel. On RedHat 7.0 kgcc should be used instead, so you have to manually patch the kernel makefiles to use kgcc instead of gcc.
Hopefully the Linux 2.2.18 kernel will automatically detect when kgcc has to be used and use it in this situation.

Question:

A MOSIX kernel on RedHat 7.0 dies with "FATAL: Can't determinate library version".

Answer:

Download the latest GLIBC update from one of RedHat's mirrors.


MOSIX installation


Question:

Why can't the installer patch my kernel?

Answer:

Make sure you are using vanilla (pure) kernel sources. Avoid the kernel sources supplied with your distribution; they are usually heavily patched, which can cause the MOSIX patch to fail.

Question:

May I mix different versions of MOSIX in the same cluster?

Answer:

It is best to have all the nodes in the cluster run the exact same MOSIX version. The minimum requirement is that the first two digits of the MOSIX version are the same in all the nodes.

Question:

May I mix different RPM packages in the same cluster?

Answer:

No.

Question:

May I mix kernels with and without the DFSA option configured?

Answer:

No. In the same cluster, kernels with the DFSA option configured must not be mixed with kernels without it.


Testing my cluster


Question:

How can I see how many nodes are in my cluster?

Answer:

Login to one of the nodes and run "mon". You should see all the nodes in your cluster.

Question:

How do I know that the process migration works?

Answer:

Assuming that your nodes are of the same speed, login to one of the nodes and run "mon". Then run several copies of a test (CPU bound) program, e.g.,

awk 'BEGIN {for(i=0;i<10000;i++)for(j=0;j<10000;j++);}'

First you should see the load increase on one node. After a few seconds, if process migration works you will see the load spread among the nodes.
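The "several copies" can be launched from the shell, e.g. (a minimal sketch; the copy count of 4 is an arbitrary example, and "mon" is assumed to be running in another terminal):

```shell
#!/bin/sh
# Launch 4 copies of the CPU-bound awk test in the background,
# then wait for all of them to finish.
for i in 1 2 3 4; do
    awk 'BEGIN {for(i=0;i<10000;i++)for(j=0;j<10000;j++);}' &
done
wait
echo "all test copies finished"
```

While the copies run, "mon" should first show the load rising on the local node and then spreading as processes migrate.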


Question:

Testing process migration on machines of different speeds

Answer:

Login to one of the faster nodes and run "mon". Login to the slowest node and run one copy of a test (CPU bound) program, e.g.,

awk 'BEGIN {for(i=0;i<10000;i++)for(j=0;j<10000;j++);}'

After a few seconds, if process migration works you will see the process migrate to a faster node.



Linux Utility for cluster Installation (LUI)


Question:

What is LUI?

Answer:

Linux Utility for cluster Installation (LUI) is a package developed by IBM for installing Linux machines. Read the LUI project documentation and installation manuals (included in the distribution package).

Question:

Which LUI release?

Answer:

1.10.1

Question:

Installation procedure hints

Answer:

First install LUI (NFS, BOOTPD, make Ethernet boot-disk, etc) in a server node.

Next, you need to inform LUI how to install your (client) nodes. There are two methods: (a) by using the LUI commands (mklimm, mklimg, allimr, etc), or (b) by using the graphic mode, called GLUI (at {LUI-dir}/bin).

For beginners, it is recommended to use the LUI commands, which are easier to understand and come with good man pages and sample scripts (at {LUI-dir}/sample). Once you are familiar with the LUI commands, GLUI will be easier to understand. Note that although GLUI looks nicer and friendlier, it is trickier to use and there is no documentation on how to use it.

To boot a client machine you can use either BOOTPD or DHCP. If you use BOOTP, LUI handles the /etc/bootptab file automatically. If you use DHCP, LUI will not edit the /etc/dhcpd.conf file, so you need to edit it manually. We recommend using the BOOTPD daemon.
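For reference, an /etc/bootptab entry of the kind LUI maintains looks roughly like this (a hypothetical example; the host name, hardware address, IP address and boot file are made up):

```
# shared template entry for all clients
.default:\
        :ht=ethernet:sm=255.255.255.0:hd=/tftpboot:

# one entry per client node (example values)
node1:\
        :tc=.default:ha=00A0C9112233:ip=192.168.1.11:bf=node1.boot:
```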


Question:

Defining the disk-table and its resources

Answer:

Read carefully the scripts in {LUI-dir}/sample/disktable.sample*.

After defining the disk-table file (on the server - see sample at {LUI-dir}/sample) remember to define the resources for each client's partition file-system.

Note that if some of a client's partitions use NFS, then you need to define this option in the disk-table file for that client, and also define a resource for it even though it is not a local file system.


Question:

How does LUI work?

Answer:

The following directories are built on the server (in /tftpboot): ./lim, ./tar, ./rpm. LUI also builds one directory for each client, using the IP address of the client as its name, e.g., ./xxx.yyy.zzz.kkk.

The following procedure is executed by the `clone' script in each client node:

Note that this (clone) script is written in Perl and is located in `{LUI-dir}/bin'. Clone writes a log file to /lim/log/{NODE NAME}.log.

If your installation fails you can either run `clone' with -d (for debug) or edit the script and add your own STDOUT & STDERR messages.


Question:

What about LUI bugs?

Answer:

They can be viewed at the LUI Bug updates page.

Below is a collection of bugs (from the LUI web) that are relevant to the installation procedure:




The /proc interface


Question:

What does the number 3 in /proc/MOSIX/nodes/<n>/status mean?

Answer:

Take a closer look at the INFORMATION section of the MOSIX (5) man page.
The status "3" means the node is configured for MOSIX and up: 0x0003 = 0x0002 | 0x0001 = DS_MOSIX_DEF | DS_MOSIX_UP
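The bit test can be sketched in shell (the bit values are those named above from the MOSIX (5) man page; here the status value is hard-coded, while on a real cluster it would be read from /proc/MOSIX/nodes/<n>/status):

```shell
#!/bin/sh
# Decode a MOSIX node status word:
#   0x1 = DS_MOSIX_UP, 0x2 = DS_MOSIX_DEF (configured)
status=3    # on a cluster: status=$(cat /proc/MOSIX/nodes/1/status)
[ $(( status & 0x1 )) -ne 0 ] && echo "node is up"
[ $(( status & 0x2 )) -ne 0 ] && echo "node is configured for MOSIX"
```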

Question:

What is node number `0'? What does `0' mean in /proc/<pid>/where?

Answer:

Node number `0' is always the current node, so a `0' in /proc/<pid>/where means that the process is running on the home-node.
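A quick way to see where your processes are running is to walk /proc (a sketch; the /proc/<pid>/where files only exist under a MOSIX kernel, so on other systems the loop finds nothing):

```shell
#!/bin/sh
# Print each process together with the node it currently runs on
# (node 0 is always the home-node).
found=0
for dir in /proc/[0-9]*; do
    if [ -r "$dir/where" ]; then
        echo "pid ${dir#/proc/} -> node $(cat "$dir/where")"
        found=1
    fi
done
if [ "$found" -eq 0 ]; then
    echo "no where files found (not a MOSIX kernel?)"
fi
```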


File systems


Question:

Can ReiserFS be used with MOSIX?

Answer:

Yes for MOSIX 1.0.

For MOSIX 0.9, Bjoern Rabenstein wrote: Thanks to the help from the list, I managed to unite MOSIX and ReiserFS. Indeed, you simply need to modify one single line after patching. It's better to apply the MOSIX patch first, since an error during patching will stop the automatic install procedure. After that, apply the ReiserFS patch and modify /usr/src/linux/include/linux/fs.h by inserting the line after #include


Question:

Can GFS be used with MOSIX?

Answer:

GFS supports DFSA for MOSIX 0.9.

Currently there is no support for MOSIX 1.0.



Problems


Question:

I installed MOSIX but one node can't see the other (the status of one machine on another is "1").

Answer:

This problem usually means that the cluster contains two different kernels. Make sure the same kernel is in all of them, and make sure you copied all the modules.

Question:

I installed MOSIX, but automatic migration doesn't work (manual works)

Answer:

First, remember that some processes (e.g. threads) can't be migrated, so when testing use only simple loops. Automatic migration failure usually means that a machine was not installed properly. Make sure you run MOSIX.install on all nodes and that you use the exact same kernel (with the same modules) on all the nodes.

Question:

Why won't JAVA processes migrate?

Answer:

Most JAVA VMs use shared memory, and thus cannot migrate. Try using a "green threads" VM.