Most massively parallel processors (MPPs) provide a way to start a program on a requested number of processors; mpirun makes use of the appropriate command whenever possible. In contrast, workstation clusters require that each process in a parallel job be started individually, though programs to help start these processes exist (see Using the Secure Server below). Because workstation clusters are not already organized as an MPP, additional information is required to make use of them. mpich should be installed with a list of participating workstations in the file machines.<arch> in the directory /usr/local/mpich/share. This file is used by mpirun to choose processors to run on. (Using heterogeneous clusters is discussed below.) The rest of this section discusses some of the details of this process, and how you can check for problems. These instructions apply only to the ch_p4 device.
Use the script tstmachines in /usr/local/mpich/sbin to ensure that you can use all of the machines that you have listed. This script performs an rsh and a short directory listing; this tests both that you have access to the node and that a program in the current directory is visible on the remote node. If there are any problems, they will be listed. These problems must be fixed before proceeding.
The only argument to tstmachines is the name of the architecture;
this is the same name as the extension on the machines file. For example,
    /usr/local/mpich/bin/tstmachines sun4

tests that a program in the current directory can be executed by all of the machines in the sun4 machines list. This program is silent if all is well; if you want to see what it is doing, use the -v (for verbose) argument:

    /usr/local/mpich/bin/tstmachines -v sun4

The output from this command might look like

    Trying true on host1.uoffoo.edu ...
    Trying true on host2.uoffoo.edu ...
    Trying ls on host1.uoffoo.edu ...
    Trying ls on host2.uoffoo.edu ...
    Trying user program on host1.uoffoo.edu ...
    Trying user program on host2.uoffoo.edu ...

If tstmachines finds a problem, it will suggest possible reasons and solutions.
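If tstmachines is not available, its per-host checks can be approximated by hand. The sketch below is a simplified, dry-run version of what the script does; the host name and the use of rsh are assumptions, so substitute your own hosts (and ssh, if that is what your network uses):

```shell
# Dry-run sketch of the per-host checks tstmachines performs.
# RHOST is a placeholder host name; uncomment the rsh line to run the real check.
RHOST=host1.uoffoo.edu
for cmd in true ls; do
    echo "Trying $cmd on $RHOST ..."
    # rsh "$RHOST" "$cmd"
done
```

The `true` check verifies that commands can be executed remotely at all; the `ls` check verifies that files in the current directory are visible on the remote node.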
The Installation Guide explains how to set up your environment so that the ch_p4 device on networks will use the secure shell ssh instead of rsh. This is useful on networks where for security reasons the use of rsh is discouraged or disallowed.
Because each workstation in a cluster (usually) requires that a new user log into it, and because this process can be very time-consuming, mpich provides a program that may be used to speed this process. This is the secure server; it is located in serv_p4 in the directory /usr/local/mpich/bin. The script chp4_servs in the same directory may be used to start serv_p4 on those workstations on which you can run programs with rsh. You can also start the server by hand and allow it to run in the background; this is appropriate on machines that do not accept rsh connections but on which you have accounts.
Before you start this server, check to see if the secure server has
been installed for general use; if so, the same server can be used by
everyone. Installing the server for general use requires root access. If
the server has not been installed, then you can install it for your own use
without needing any special privileges with

    chp4_servs -port=1234

This starts the secure server on all of the machines listed in the file /usr/local/mpich/share/machines.<arch>.
The port number, provided with the option -port=, must be different from any other port in use on the workstations.
To make use of the secure server for the ch_p4 device, add the
following definitions to your environment:
    setenv MPI_USEP4SSPORT yes
    setenv MPI_P4SSPORT 1234

The value of MPI_P4SSPORT must be the port with which you started the secure server. When these environment variables are set, mpirun attempts to use the secure server to start programs that use the ch_p4 device. (The command line argument -p4ssport to mpirun may be used instead of these environment variables; mpirun -help will give you more information.)
A heterogeneous network of workstations is one in which the machines
connected by the network have different architectures and/or operating
systems. For example, a network may contain 3 Sun SPARC (sun4)
workstations and 3 SGI IRIX workstations, all of which communicate via
the TCP/IP protocol. The mpirun command may be told to use all of these with
    mpirun -arch sun4 -np 3 -arch IRIX -np 3 program.%a

While the ch_p4 device supports communication between workstations in heterogeneous TCP/IP networks, it does not allow the coupling of multiple multicomputers. To support such a configuration, you should use the globus2 device. See the following section for details.
The special program name program.%a allows you to specify the different
executables for the program, since a Sun executable won't run on an SGI
workstation and vice versa. The %a is replaced with the architecture
name; in this example, program.sun4 runs on the Suns and
program.IRIX runs on the SGI IRIX workstations. You can also put the
programs into different directories; for example,
    mpirun -arch sun4 -np 3 -arch IRIX -np 3 /tmp/%a/program

For even more control over how jobs get started, we need to look at how mpirun starts a parallel program on a workstation cluster. Each time mpirun runs, it constructs and uses a new file of machine names for just that run, using the machines file as input. (The new file is called PIyyyy, where yyyy is the process identifier.) If you specify -keep_pg on your mpirun invocation, you can use this information to see where mpirun ran your last few jobs. You can construct this file yourself and specify it as an argument to mpirun. To do this for ch_p4, use
    mpirun -p4pg pgfile myprog

where pgfile is the name of the file. The file format is defined below.
This is necessary when you want closer control over the hosts you run on, or when mpirun cannot construct it automatically. Such is the case when
You want to run on a different set of machines than those listed in the machines file.
You want to run different executables on different hosts (your program is not SPMD).
You want to run on a heterogeneous network, which requires different executables.
You want to run all the processes on the same workstation, simulating parallelism by time-sharing one machine.
You want to run on a network of shared-memory multiprocessors and need to specify the number of processes that will share memory on each machine.
Each line of the file has the format

    <hostname> <#procs> <progname> [<login>]

An example of such a file, where the command is being issued from host sun1, might be
    sun1 0 /users/jones/myprog
    sun2 1 /users/jones/myprog
    sun3 1 /users/jones/myprog
    hp1 1 /home/mbj/myprog mbj

The above file specifies four processes, one on each of three Suns and one on another workstation where the user's account name is different. Note the 0 in the first line: it indicates that no additional processes are to be started on host sun1 beyond the one started directly by the user's mpirun command.
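A procgroup file is plain text and is easy to generate from a script. The sketch below writes a file equivalent to the example above with a here-document; the host names and program paths are the hypothetical ones from the example, not real machines:

```shell
# Write a ch_p4 procgroup file equivalent to the example above.
# Hosts and paths are the hypothetical ones from the example.
cat > pgfile <<'EOF'
sun1 0 /users/jones/myprog
sun2 1 /users/jones/myprog
sun3 1 /users/jones/myprog
hp1 1 /home/mbj/myprog mbj
EOF
# The job would then be started with:  mpirun -p4pg pgfile /users/jones/myprog
```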
You might want to run all the processes on your own machine, as a test.
You can do this by repeating its name in the file:
    sun1 0 /users/jones/myprog
    sun1 1 /users/jones/myprog
    sun1 1 /users/jones/myprog

This will run three processes on sun1, communicating via sockets.
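Such a single-host file can be generated for any process count. This sketch (the host name, process count, and program path are placeholders) emits one 0 entry for the process the user starts directly, plus NP-1 additional entries:

```shell
# Generate a procgroup that runs NP processes on one host (placeholders below).
NP=3
HOST=sun1
PROG=/users/jones/myprog
{
    echo "$HOST 0 $PROG"
    i=1
    while [ "$i" -lt "$NP" ]; do
        echo "$HOST 1 $PROG"
        i=$((i + 1))
    done
} > pgfile.local
```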
To run on a shared-memory multiprocessor, with 10 processes, you would use
a file like:
    sgimp 9 /u/me/prog

Note that this is for 10 processes, one of them started by the user directly, and the other nine specified in this file. This requires that mpich was configured with the option -comm=shared; see the installation manual for more information.
If you are logged into host gyrfalcon and want to start a job with
one process on gyrfalcon and three processes on alaska, where
the alaska processes communicate through shared memory, you would use
    local 0 /home/jbg/main
    alaska 3 /afs/u/graphics

It is not possible to provide different command line arguments to different MPI processes.
There are several environment variables that can be used to tune the performance of the ch_p4 device. Note that these environment variables must be defined for all processes that are created, not just the process that you are launching MPI programs from (i.e., setting these variables should be part of your .login or .cshrc startup files).
In some installations, certain
hosts can be connected in multiple ways. For example, the ``normal'' Ethernet
may be supplemented by a high-speed FDDI ring. Usually, alternate host names
are used to identify the high-speed connection. All you need to do is put
these alternate names in your machines.<arch> file.
In this case, it is important not to use the form local 0 but to use
the name of the local host. For example, if hosts host1 and
host2 have ATM connected to host1-atm and host2-atm
respectively, the correct ch_p4 procgroup file to connect them
(running the program /home/me/a.out) is
    host1-atm 0 /home/me/a.out
    host2-atm 1 /home/me/a.out
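If you already have a procgroup file that names the slow interfaces, the fast aliases can be substituted mechanically. This sketch assumes the naming convention above, where the fast interface is the host name with an -atm suffix (both file names and paths are examples):

```shell
# Rewrite the host-name field of a procgroup file to the fast-network aliases.
# Assumes the "-atm" suffix convention used in the example above.
printf 'host1 0 /home/me/a.out\nhost2 1 /home/me/a.out\n' > pg.slow
sed 's/^\([^ ]*\)/\1-atm/' pg.slow > pg.fast
cat pg.fast
```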
Shared libraries can help reduce the size of an executable. This is particularly valuable on clusters of workstations, where the executable must normally be copied over a network to each machine that is to execute the parallel program. However, there are some practical problems in using shared libraries; this section discusses some of them and how to solve most of those problems. Currently, shared libraries are not supported from C++.
In order to build shared libraries for mpich, you must have configured and built mpich with the --enable-sharedlib option. Because each Unix system and in fact each compiler uses a different and often incompatible set of options for creating shared objects and libraries, mpich may not be able to determine the correct options. Currently, mpich understands Solaris, GNU gcc (on most platforms, including LINUX and Solaris), and IRIX. Information on building shared libraries on other platforms should be sent to mpi-bugs@mcs.anl.gov.
Once the shared libraries are built, you must tell the mpich compilation and
linking commands to use shared libraries (the reason that shared libraries are
not the default will become clear below). You can do this either with the
command line option -shlib or by setting the environment variable
MPICH_USE_SHLIB to yes. For example,
    mpicc -o cpi -shlib cpi.c

or

    setenv MPICH_USE_SHLIB yes
    mpicc -o cpi cpi.c

Using the environment variable MPICH_USE_SHLIB allows you to control whether shared libraries are used without changing the compilation commands; this can be very useful for projects that use makefiles.
Running a program built with shared libraries can be tricky. Some (most?) systems do not remember where the shared library was found when the executable was linked! Instead, they depend on finding the shared library in either a default location (such as /lib) or in a directory specified by an environment variable such as LD_LIBRARY_PATH or by a command line argument such as -R or -rpath (more on this below). The mpich configure tests for this and will report whether an executable built with shared libraries remembers the location of the libraries. It also attempts to use a compiler command line argument to force the executable to remember the location of the shared library.
If you need to set an environment variable to indicate where the mpich shared libraries are, you need to ensure that both the process that you run mpirun from and any processes that mpirun starts get the environment variable. The easiest way to do this is to set the environment variable within your .cshrc (for csh or tcsh users) or .profile (for sh and ksh users) file.
However, setting the environment variable within your startup scripts can
cause problems if you use several different systems. For example, you may
have a single .cshrc file that you use with both an SGI (IRIX) and
Solaris system. You do not want to set the LD_LIBRARY_PATH to point
the SGI at the Solaris version of the mpich shared libraries. Instead, you
would like to set the environment variable before running mpirun:
    setenv LD_LIBRARY_PATH ${LD_LIBRARY_PATH}:/usr/local/mpich/lib/shared
    mpirun -np 4 cpi

Unfortunately, this won't always work. Depending on the method that mpirun and mpich use to start the processes, the environment variable may not be sent to the new process. This will cause the program to fail with a message like
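For sh or ksh users, the equivalent setting is sketched below. The sketch also guards against LD_LIBRARY_PATH being unset, since expanding an undefined variable is an error in csh and would leave a stray leading colon in sh; the library directory shown is the assumed default installation path:

```shell
# Append the (assumed) mpich shared-library directory to LD_LIBRARY_PATH,
# handling the case where the variable is not yet set.
MPICH_SHLIB_DIR=/usr/local/mpich/lib/shared
if [ -z "${LD_LIBRARY_PATH:-}" ]; then
    LD_LIBRARY_PATH=$MPICH_SHLIB_DIR
else
    LD_LIBRARY_PATH=$LD_LIBRARY_PATH:$MPICH_SHLIB_DIR
fi
export LD_LIBRARY_PATH
# mpirun -np 4 cpi   # then run as usual
```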
    ld.so.1: /home/me/cpi: fatal: libpmpich.so.1.0: open failed: No such file or directory
    Killed

To work around this problem, you should use the (new) secure server (Section Using the Secure Server). This server is built with
    make serv_p4

and can be installed on all machines in the machines file for the current architecture with
    chp4_servs -port=1234

The new secure server propagates all environment variables to the remote process, and ensures that the environment of that process (which contains your MPI program) includes all environment variables that start with LD_ (just in case the system uses LD_SEARCH_PATH or some other name for finding shared libraries).
An alternative to using LD_LIBRARY_PATH and the secure server is to add an option to the link command that provides the path to use in searching for shared libraries. Unfortunately, the option that you would like is ``append this directory to the search path'' (such as you get with -L). Instead, many compilers provide only ``replace the search path with this path.'' For example, some compilers allow -Rpath:path:...:path to specify a replacement path. Thus, if both mpich and the user provide library search paths with -R, one of the search paths will be lost. Eventually, mpicc and friends could check for -R options and create a unified version, but they do not currently do this. You can, however, provide a complete search path yourself if your compiler supports an option such as -R.
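Until such merging is automatic, a unified search path can be assembled by hand before linking. This sketch joins the mpich library directory and a user library directory (both paths are assumptions) into a single replacement-style option:

```shell
# Combine the mpich and user library directories into one -R option,
# since a second -R would replace (not extend) the first. Paths are examples.
MPICH_LIBDIR=/usr/local/mpich/lib/shared
USER_LIBDIR=$HOME/lib
RPATH_OPT="-R${MPICH_LIBDIR}:${USER_LIBDIR}"
echo "$RPATH_OPT"
# e.g.  mpicc -o cpi -shlib cpi.c $RPATH_OPT
```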
The preceding may sound like a lot of effort to go to, and in some ways it is. For large clusters, however, the effort will be worth it: programs will start faster and more reliably.