
3. Examples

This section presents examples of two alternative ways of using the PVM module. The source code for these examples is included in the examples subdirectory of the PVM module source code distribution. The first method uses PVM library routines directly to manage a simple distributed application. The second uses the higher-level master-slave interface, which provides a high degree of tolerance to the failure of slave machines; this proves useful in long-running distributed applications.

3.1 Example 1: A Simple Hello World Program

In programming language tutorials, the first example is usually a program which simply prints out a message such as Hello World and then exits. The intent of such a trivial example is to illustrate all the steps involved in writing and running a program in that language.

To write a Hello World program using the PVM module, we will write two programs: the master (hello_master) and the slave (hello_slave). The master process will spawn a slave process on a different host and then wait for a message from that slave. When the slave runs, it sends a message to the master, or parent, process and then exits. For the purposes of this example, we will assume that the PVM consists of two hosts, named vex and pirx, and that the slave process will run on pirx.

The hello_master program

First, consider the master process, hello_master. Conceptually, it must specify the full path to the slave executable and then send that information to the slave host (pirx). For this example, we assume that the master and slave executables are in the same directory and that the master process is started in that directory. With this assumption, we can construct the path to the slave executable using the getcwd and path_concat functions. We then send this information to the slave host using the pvm_spawn function:

   variable path = path_concat (getcwd (), "hello_slave");
   variable slave_tid = pvm_spawn (path, PvmTaskHost, "pirx", 1);
The first argument to pvm_spawn specifies the full path to the slave executable. The second argument is a bit mask specifying options associated with spawning the slave process. The PvmTaskHost option indicates that the slave process is to be started on a specific host. The third argument gives the name of the slave host and the last argument indicates how many copies of this process should be started. The return value of pvm_spawn is an array of task identifiers for each of the slave processes; negative values indicate that an error occurred.
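
Because errors are reported through negative task identifiers rather than through exceptions, it is prudent to check the result of pvm_spawn before waiting for messages. A minimal sketch (the recovery policy shown here is only an illustration):

   if (any (slave_tid < 0))
     {
        () = fprintf (stderr, "pvm_spawn failed (code = %d)\n", slave_tid[0]);
        pvm_exit ();
        exit (1);
     }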

Having spawned the hello_slave process on pirx, the master process calls the pvm_recv function to receive a message from the slave.

   variable bufid = pvm_recv (-1, -1);
The first argument to pvm_recv specifies the task identifier of the process expected to send the message, and the second argument specifies the type of message that is expected. A task identifier of -1 means that a message from any task will be accepted; similarly, a message identifier of -1 means that any type of message will be accepted. In this example, we could have specified the slave task id and the message identifier explicitly:
  bufid = pvm_recv (slave_tid, 1);
When a suitable message is received, the contents of the message are stored in a PVM buffer and pvm_recv returns the buffer identifier which may be used by the PVM application to retrieve the contents of the buffer.

Retrieving the contents of the buffer normally requires knowing the format in which the information is stored. In this case, because we accepted all types of messages from the slave, we may need to examine the message buffer to find out what kind of message was actually received. The pvm_bufinfo function is used to obtain information about the contents of the buffer.

   variable msgid;
   (, msgid, ) = pvm_bufinfo (bufid);
Given the buffer identifier, pvm_bufinfo returns the number of bytes in the message, the message identifier, and the task identifier of the sending process.
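
If all three values are of interest, they may be unpacked at once:

   variable nbytes, msgid, tid;
   (nbytes, msgid, tid) = pvm_bufinfo (bufid);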

Because we know that the slave process sent a single object of Struct_Type, we retrieve it by calling the pvm_recv_obj function.

   variable obj = pvm_recv_obj();
   vmessage ("%s says %s", obj.from, obj.msg);
This function is not part of the PVM package but is a higher-level function provided by the PVM module. It simplifies the process of sending S-lang objects between hosts by handling some of the bookkeeping required by the lower-level PVM interface. Having retrieved an S-lang object from the message buffer, we can then print out the message. Running hello_master, we see:
  vex> ./hello_master
  pirx says Hello World
Note that before exiting, all PVM processes should call the pvm_exit function to inform the pvmd daemon of the change in PVM status.
   pvm_exit();
   exit(0);
At this point, the script may exit normally.
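
Putting the pieces together, the entire hello_master script might look like the following sketch. The require statement assumes the base module is loaded under the name "pvm" (adjust to match your installation), and this version receives the message with pvm_recv_obj directly, omitting the explicit pvm_recv and pvm_bufinfo steps shown above:

   require ("pvm");   % assumed module name

   variable path = path_concat (getcwd (), "hello_slave");
   variable slave_tid = pvm_spawn (path, PvmTaskHost, "pirx", 1);

   % Receive the structure sent by the slave and print its contents
   variable obj = pvm_recv_obj ();
   vmessage ("%s says %s", obj.from, obj.msg);

   pvm_exit ();
   exit (0);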

The hello_slave program

Now, consider the slave process, hello_slave. Conceptually, it must first determine the location of its parent process, then create and send a message to that process.

The task identifier of the parent process is obtained using the pvm_parent function.

   variable ptid = pvm_parent();
For this example, we will send a message consisting of an S-lang structure with two fields, one containing the name of the slave host and the other containing the string "Hello World".

We use the pvm_send_obj function to send this message because it automatically handles packaging all the separate structure fields into a PVM message buffer, and it also sends along the structure field names and data types so that the structure can be automatically re-assembled by the receiving process. This makes it possible to write code which transparently sends S-lang objects from one host to another. To create and send the structure:

   variable s = struct {msg, from};
   s.msg = "Hello World";
   s.from = getenv ("HOST");

   pvm_send_obj (ptid, 1, s);
The first argument to pvm_send_obj specifies the task identifier of the destination process; the second argument is a message identifier used to indicate what kind of message has been sent. The remaining arguments contain the data objects to be included in the message.

Having sent a message to the parent process, the slave process then calls pvm_exit to inform the pvmd daemon that its work is complete. This allows pvmd to notify the parent process that a slave process has exited. The slave then exits normally.
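
For reference, the complete hello_slave script might look like this sketch (again, the module name in the require statement is an assumption):

   require ("pvm");   % assumed module name

   variable ptid = pvm_parent ();

   variable s = struct {msg, from};
   s.msg = "Hello World";
   s.from = getenv ("HOST");

   pvm_send_obj (ptid, 1, s);

   pvm_exit ();
   exit (0);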

3.2 Example 2: Using the Master-Slave Interface

The PVM module provides a higher level interface to support the master-slave paradigm for distributed computations. The symbols associated with this interface have the pvm_ms prefix to distinguish them from those symbols associated with the PVM package itself.

The pvm_ms interface provides a means of handling computations which consist of a predetermined list of tasks, each performed by running an arbitrary slave process which takes command-line arguments. The interface provides a high degree of robustness: hosts may be added to or deleted from the PVM while the distributed computation is running, and the task list will be completed even if one or more slave hosts fail (e.g. crash) along the way. Experience has shown that this fault tolerance is important: long-running distributed computations lose one or more hosts with surprising frequency, and it is essential that such failures not require restarting the entire computation from the beginning.

Scripts using this interface must initialize it by loading the pvm_ms package via, e.g.

      require ("pvm_ms");
As an example of how to use this interface, we examine the scripts master and slave.

The master program

The master script first builds a list of tasks, each consisting of an array of strings providing the command line for a slave process to be spawned on the PVM. In this simple example, the same command line is executed a specified number of times (N). First, the script constructs the path to the slave executable (Slave_Pgm) and the command line (Cmd) that each slave instance will invoke. Then the array of tasks is constructed:

 variable pgm_argvs = Array_Type[N];    % one entry per task
 variable pgm_argv = [Slave_Pgm, Cmd];  % command line for a single slave

 pgm_argvs[*] = pgm_argv;               % every task runs the same command line

The distribution of these tasks across the available PVM is automatically handled by the pvm_ms interface. The interface will simultaneously start as many tasks as possible up to some maximum number of processes per host. Here we specify that a maximum of two processes per host may run simultaneously and then submit the list of tasks to the PVM:

   pvm_ms_set_num_processes_per_host (2);
   variable exit_status = pvm_ms_run_master (pgm_argvs);

As each slave process completes, its exit status is recorded along with any messages printed to stdout during its execution. When the entire list of tasks is complete, pvm_ms_run_master returns an array of structures containing status information for each task that was executed. In this example, the master process simply prints out this information.
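
Assembled into a single script, the master might look like the sketch below. The values of N, Slave_Pgm, and Cmd are illustrative placeholders (the real script computes them as described above), and since the exact fields of the returned status structures are defined by the pvm_ms interface, the sketch simply hands each one to slsh's generic print function:

   require ("pvm_ms");

   % Illustrative placeholders; the real script computes these
   variable N = 8;
   variable Slave_Pgm = path_concat (getcwd (), "slave");
   variable Cmd = "do_one_task";

   variable pgm_argvs = Array_Type[N];
   pgm_argvs[*] = [Slave_Pgm, Cmd];

   pvm_ms_set_num_processes_per_host (2);
   variable exit_status = pvm_ms_run_master (pgm_argvs);

   % Print the status information returned for each task
   variable r;
   foreach r (exit_status)
     print (r);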

The slave program

The slave process in this example is relatively simple. Its command-line arguments specify the task to be completed. These arguments are passed directly to pvm_ms_run_slave

  pvm_ms_run_slave (__argv[[1:]]);
which spawns a subshell, runs the specified command, communicates the task completion status to the parent process and exits.
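
A complete slave script can therefore be as short as:

   require ("pvm_ms");
   pvm_ms_run_slave (__argv[[1:]]);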

