Our Books: Parallel and Distributed Programming Using C++

We have thus divided our problem into two parts. The child-programme and the education process. These two remain very closely connected. We cannot expect to find a good child-machine at the first attempt. One must experiment with teaching one such machine, and see how well it learns...

--Alan Turing, Can A Machine Think?

The PVM (Parallel Virtual Machine) is a software system that provides the software developer with the facilities to write and run programs that exploit parallelism. The PVM presents a collection of networked computers to the developer as a single logical machine with parallel capabilities. The computers in the collection can all have the same architecture, or they can have different architectures. The PVM can even be connected to computers that fall into the MPP (Massively Parallel Processor) class. Although PVM programs can be developed for a single computer, the real advantages come when two or more computers are connected.

The PVM supports the message-passing model as a means of communication between concurrently executing tasks. An application interacts with the PVM through a library that consists of APIs for process control, sending messages, receiving messages, signaling processes, and so on. A C++ program interfaces with the PVM library in the same way that it interacts with any other function library. While a program that makes PVM library calls must call certain functions to initialize the environment, nothing forces any particular form or architecture on a C++ program. This means that the C++ programmer can combine PVM capabilities with other styles of C++ programming (e.g., object-oriented programming, parameterized programming, agent-oriented programming, and structured programming). The use of libraries to provide additional functionality to C++ is considered one of its advantages. Through the use of libraries such as PVM, MPI, or Linda, a C++ developer can use different models of parallelism, whereas other languages are restricted to whatever parallel primitives are built into the language. The PVM library is perhaps the easiest way to add parallel programming capabilities to the C++ language.

6.1. The Classic Parallelism Models Supported by PVM

The PVM system supports the MIMD (Multiple Instruction Multiple Data) and SPMD (Single Program Multiple Data) models of parallelism. SPMD is actually a variation on the SIMD (Single Instruction Multiple Data) model. The models classify programs by instruction streams and data streams. In the MIMD model a program consists of two or more concurrently executing instruction streams, each with its own local data stream. Essentially each processor has its own memory. In the PVM environment MIMD is considered a distributed memory model, in contrast to a shared memory model. In shared memory models each processor can see the same memory locations. In the distributed model memory values must be communicated through message passing. The SPMD model, on the other hand, consists of a single program (the same set of instructions) concurrently executing on two or more machines, with the program on each machine processing a different data stream. In other words, the same program on each machine is working with different pieces of data. The PVM environment supports MIMD, SPMD, or a combination of these two models. Figure 6-1 shows the four classic models and where PVM programs are classified.

Figure 6-1

Notice in Figure 6-1 that the SISD and MISD models are not applicable to the PVM. The SISD model describes a uniprocessor machine, and the MISD model has not yet been practically applied. The two models in Figure 6-1 that can be used with the PVM determine how a C++ program interacts with the computers. The software developer sees one logical virtual computer that allows either two or more different concurrently executing tasks, each with access to its own data, or the same task executing as a set of concurrent clones, with each clone accessing a different piece of data. For our purposes the MI (Multiple Instruction) and SP (Single Program) in Figure 6-1 refer to PVM tasks.
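The idea of concurrent clones that each work on a different piece of data can be sketched in plain C++ without any PVM calls. The names Slice and sliceFor below are hypothetical and exist only for this illustration; how a clone learns its own index and the total number of clones is left open here. A minimal sketch, assuming the data is a sequence of n items split as evenly as possible:

```cpp
#include <cassert>
#include <cstddef>

// Half-open range [begin, end) of items assigned to one clone.
struct Slice {
    std::size_t begin;
    std::size_t end;
};

// Hypothetical helper (not part of PVM): computes which part of an n-item
// data set the clone with the given 0-based index should process. The first
// (n % taskCount) clones receive one extra item, so the slices cover the
// data exactly once with no overlap.
Slice sliceFor(std::size_t index, std::size_t taskCount, std::size_t n)
{
    assert(taskCount > 0 && index < taskCount);
    std::size_t base  = n / taskCount;
    std::size_t extra = n % taskCount;
    std::size_t begin = index * base + (index < extra ? index : extra);
    std::size_t size  = base + (index < extra ? 1 : 0);
    Slice s = { begin, begin + size };
    return s;
}
```

Every clone runs the same code and differs only in the index it passes in, which is the essence of the SPMD arrangement. In the MIMD arrangement the tasks would instead run different programs, each with its own data.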

6.2. The PVM Library for C++

The PVM functionality is accessed by C++ through a collection of library routines provided by the PVM environment. The routines are typically divided into seven categories:

· Process Management & Control

· Message Packing & Sending

· Message Unpacking & Receiving

· Task Signaling

· Message Buffer Management

· Information and Utility Functions

· Group Operations

The library routines are easy to integrate into the C++ environment. The pvm_ prefix on each function helps to keep the namespace clear. To use the PVM library routines your programs must include the pvm3.h header file and link to libpvm3. Programs 6.1 and 6.2 show how a simple PVM program works. The instructions for compiling and executing Program 6.1 are contained in Program Profile 6.1.


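// Program 6.1

#include "pvm3.h"
#include <iostream>
#include <string.h>

using namespace std;

int main(int argc,char *argv[])
{
   int RetCode,MessageId;
   int PTid, Tid;
   char Task[] = "program6-2";   // pvm_spawn() takes non-const char * arguments
   char Where[] = "";            // no particular host requested
   char Message[100];
   float Result[1];
   PTid = pvm_mytid();           // first PVM call; enrolls this task in the PVM
   if(PTid < 0){
      cerr << "Could not enroll in the PVM" << endl;
      return(1);
   }
   // Start one copy of Program 6.2; its task identifier is returned in Tid
   RetCode = pvm_spawn(Task,NULL,PvmTaskDefault,Where,1,&Tid);
   if(RetCode == 1){
      MessageId = 1;
      strcpy(Message,"22");
      pvm_initsend(PvmDataDefault);   // prepare the active message buffer
      pvm_pkstr(Message);             // pack the string for transport
      pvm_send(Tid,MessageId);        // send it to the spawned task
      pvm_recv(Tid,MessageId);        // wait for the reply from that task
      pvm_upkfloat(Result,1,1);       // unpack one float from the reply
      cout << Result[0] << endl;
      pvm_exit();
      return(0);
   }
   else{
      cerr << "Could not spawn task " << endl;
      pvm_exit();
      return(1);
   }
}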

Program Profile 6.1

Program Name:

program6-1.cc

Description:

Uses pvm_send() to send a number to another PVM task (Program 6.2) and pvm_recv() to receive a number from that task.

Libraries Required:

libpvm3

Headers Required:

<pvm3.h> <iostream> <string.h>

Compile & Link Instructions:

c++ -o program6-1 -I $PVM_ROOT/include program6-1.cc -L $PVM_ROOT/lib/$PVM_ARCH -lpvm3

Test Environment:

Solaris 8, SuSE Linux 7.1, gcc 2.95.2, PVM 3.4.3

Execution Instructions:

./program6-1

Notes:

pvmd must be running.

Program 6.1 calls eight commonly used PVM routines: pvm_mytid(), pvm_spawn(), pvm_initsend(), pvm_pkstr(), pvm_send(), pvm_recv(), pvm_upkfloat(), and pvm_exit(). The pvm_mytid() routine returns the task identifier of the calling process. The PVM system associates a task identifier with each process that it creates. The task identifier is used to send messages to other tasks, to receive messages from other tasks, to signal tasks, to interrupt tasks, and so on. Any PVM task may communicate with any other PVM task as long as it has access to the task identifier of the task it wants to communicate with. The pvm_spawn() routine is used to start new PVM processes. Program 6.1 uses the pvm_spawn() routine to start a new process to execute Program 6.2. The task identifier for the new task is returned in the &Tid parameter of the pvm_spawn() call.

The PVM environment uses message buffers to pass data between tasks. Each task can have one or more message buffers. However, only one buffer is considered the active message buffer. Prior to sending each message, the pvm_initsend() routine is called to prepare or initialize the active message buffer. The pvm_pkstr() routine is used to pack the string contained in the Message variable. This packing encodes the string for transport to another task in another process, possibly on another machine with a different architecture. The PVM environment handles the details of the architecture-to-architecture conversions. The PVM environment requires the use of a packing routine prior to sending and an unpacking routine during receiving to make the message readable by the receiver. However, there is an exception to this that we shall discuss later.

The pvm_send() and pvm_recv() routines are used to send and receive messages. The MessageId simply identifies which message the caller or sender is working with. Notice in Program 6.1 that the pvm_send() call names the task receiving the data and the pvm_recv() call names the task sending the data. The pvm_upkfloat() routine takes the message it retrieves from the active message buffer and unpacks it into an array of type float. Program 6.1 spawns a PVM task to execute Program 6.2.
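One way to picture why packing and unpacking are needed is to look at byte order. The sketch below is not PVM's implementation and makes no PVM calls; the names packFloat and unpackFloat are hypothetical. It assumes a 32-bit IEEE 754 float and writes the value's bits into a buffer in one fixed order (most significant byte first), so that a receiver with a different native byte order can still rebuild the same value:

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>
#include <stdint.h>
#include <vector>

// Hypothetical helper: appends a float to the buffer as four bytes,
// most significant byte first, regardless of the host's byte order.
void packFloat(std::vector<unsigned char>& buffer, float value)
{
    assert(sizeof(float) == sizeof(uint32_t));   // assumption of this sketch
    uint32_t bits;
    std::memcpy(&bits, &value, sizeof bits);     // reinterpret the float's bits
    for (int shift = 24; shift >= 0; shift -= 8)
        buffer.push_back(static_cast<unsigned char>((bits >> shift) & 0xFFu));
}

// Hypothetical helper: reads four bytes starting at offset, rebuilds the
// float, and advances offset past the bytes it consumed.
float unpackFloat(const std::vector<unsigned char>& buffer, std::size_t& offset)
{
    assert(offset + 4 <= buffer.size());
    uint32_t bits = 0;
    for (std::size_t i = 0; i < 4; ++i)
        bits = (bits << 8) | buffer[offset + i];
    offset += 4;
    float value;
    std::memcpy(&value, &bits, sizeof value);
    return value;
}
```

Packing 3.5f with this sketch yields the bytes 0x40 0x60 0x00 0x00 on any host, and unpacking them returns 3.5f. The PVM packing routines spare the programmer from writing this kind of conversion by hand.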

Notice that Programs 6.1 and 6.2 both contain a call to the routine pvm_exit(). It is important that this function be called when the PVM processing for a task is finished. Although this routine does not kill or stop the process, it performs PVM cleanup for the task and disconnects the task from the PVM. Notice also that Programs 6.1 and 6.2 are self-contained, standalone programs. Both contain the main() function. Program Profile 6.2 has the implementation details for Program 6.2.


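// Program 6.2

#include "pvm3.h"
#include <stdlib.h>

int main(int argc, char *argv[])
{
   int MessageId, Ptid;
   char Message[100];
   float Num,Result;
   Ptid = pvm_parent();            // task identifier of the task that spawned us
   MessageId = 1;
   pvm_recv(Ptid,MessageId);       // wait for the parent's message
   pvm_upkstr(Message);            // unpack the string it contains
   Num = atof(Message);            // convert the text to a number
   Result = Num / 7.0001;
   pvm_initsend(PvmDataDefault);   // prepare the active message buffer
   pvm_pkfloat(&Result,1,1);       // pack one float
   pvm_send(Ptid,MessageId);       // send the result back to the parent
   pvm_exit();
   return(0);
}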

Program Profile 6.2

Program Name:

program6-2.cc

Description:

This program receives a number from its parent process and divides that number by 7.0001. It sends the result to its parent process.

Libraries Required:

libpvm3

Headers Required:

<pvm3.h> <stdlib.h>

Compile & Link Instructions:

c++ -o program6-2 -I $PVM_ROOT/include program6-2.cc -L $PVM_ROOT/lib/$PVM_ARCH -lpvm3

Test Environment:

SuSE Linux 7.1, gnu C++ 2.95.2, Solaris 8, Workshop 6, PVM 3.4.3

Execution Instructions:

This program is spawned by Program 6.1

Notes:

pvmd must be running.
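Both programs pass a task identifier and a message identifier to pvm_recv(), which selects the message to receive by sender and by identifier. The plain C++ sketch below imitates that selection with an in-memory queue; it makes no PVM calls, and the names Message and Mailbox are hypothetical. It mirrors the pvm_recv() convention that -1 for either argument matches any value. Unlike pvm_recv(), which blocks until a matching message arrives, this sketch simply reports that nothing matched:

```cpp
#include <cassert>
#include <deque>

struct Message {
    int senderTid;   // task identifier of the sender
    int tag;         // message identifier (MessageId in Programs 6.1 and 6.2)
    float payload;   // a single float keeps the sketch small
};

// Hypothetical in-memory stand-in for one task's incoming message queue.
class Mailbox {
public:
    void deliver(const Message& m) { queue_.push_back(m); }

    // Removes the oldest message whose sender and tag match and copies it
    // to out. Passing -1 for tid or tag matches any sender or any tag.
    // Returns false when no queued message matches.
    bool receive(int tid, int tag, Message& out)
    {
        assert(tid >= -1 && tag >= -1);
        for (std::deque<Message>::iterator it = queue_.begin();
             it != queue_.end(); ++it) {
            bool tidOk = (tid == -1 || it->senderTid == tid);
            bool tagOk = (tag == -1 || it->tag == tag);
            if (tidOk && tagOk) {
                out = *it;
                queue_.erase(it);
                return true;
            }
        }
        return false;
    }

private:
    std::deque<Message> queue_;
};
```

Selecting on both values is what lets Program 6.1 wait specifically for the reply from the task it spawned, even if messages from other tasks were to arrive first.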

6.2.1. Compiling and Linking a C++/PVM Program

Version 3.4.x of the PVM environment packages the routines in a single library, libpvm3.a. To compile a PVM program, include the pvm3.h header file and link with libpvm3.a:

$ c++ -o mypvm_program -I $PVM_ROOT/include mypvm_program.cc -L $PVM_ROOT/lib/$PVM_ARCH -lpvm3

The $PVM_ROOT environment variable points to the directory where PVM is installed. This command will produce a binary called mypvm_program.

To execute Programs 6.1 and 6.2, you must have the PVM environment properly installed. Three basic methods can be used to execute a PVM program:

· as a standalone binary

· using the PVM console

· using XPVM

6.2.2. Executing a PVM Program as a Standalone

The pvmd program must be started and each host involved in the PVM must have the correctly compiled programs in the appropriate directory. The default directory for the compiled programs (binaries) is:

$HOME/pvm3/bin/$PVM_ARCH

where PVM_ARCH contains the name of the machine's architecture. See Table 6-2 and Items 1 and 2 in Section 6.2.3. The binaries should have the proper file permissions set to allow them to be accessed and executed. The pvmd program can be started as:

pvmd &

or:

pvmd hostfile &

where hostfile is a configuration file that contains special options to be passed to the pvmd program. See Item 5 in Section 6.2.3. After the pvmd program has been started on one of the computers involved in the PVM, a PVM program can be started simply by typing:

$ MyPvmProgram

If this program spawns any other tasks they will be started automatically.

6.2.2.1. Starting PVM Programs Using the PVM Console

To execute the programs using the PVM console, start the PVM console by typing:

$ pvm

and at the pvm> prompt, type the name of the program to be executed:

pvm> spawn -> MyPvmProgram

6.2.2.2. Starting PVM Programs Using XPVM

Besides the terminal-based PVM console, the XPVM graphical interface for X Windows can be used to start programs. Figure 6-2 shows what to type in the tasks dialog of an XPVM session.

Figure 6-2

The PVM library does not force any particular structure on a C++ program. The first PVM routine called by a program enrolls that program into the PVM. It is good practice to always call pvm_exit() for every program that is part of the PVM. If this routine is not called for every PVM task, the system will hang. It is a good rule of thumb to call pvm_mytid() and pvm_parent() early in the processing of the task. Table 6-1 contains the library routines broken down into the seven commonly used categories.

Table 6-1
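Because a task can leave main() through more than one return path, one possible way to follow the advice about always calling pvm_exit() is a small scope guard. The class below is a sketch, not part of the PVM library, and the name ExitGuard is hypothetical. It accepts any function of the form int f(), which is the form pvm_exit() has, and calls it when the guard goes out of scope. So that the sketch compiles without PVM installed, a stand-in function named fakeExit (also hypothetical) takes the place of pvm_exit():

```cpp
#include <cassert>

// Hypothetical scope guard: runs a cleanup function when it is destroyed.
class ExitGuard {
public:
    typedef int (*CleanupFn)();
    explicit ExitGuard(CleanupFn fn) : cleanup_(fn) { assert(fn != 0); }
    ~ExitGuard() { cleanup_(); }
private:
    ExitGuard(const ExitGuard&);              // not copyable
    ExitGuard& operator=(const ExitGuard&);
    CleanupFn cleanup_;
};

// Stand-in for pvm_exit() so the sketch is self-contained.
int cleanupCalls = 0;
int fakeExit() { ++cleanupCalls; return 0; }

// Shaped like Program 6.1: two return paths, one cleanup call on each.
int runTask(bool spawnFailed)
{
    ExitGuard guard(fakeExit);
    if (spawnFailed)
        return 1;   // guard runs fakeExit on this path
    return 0;       // and on this one
}
```

In a real task the guard would be created right after the first PVM call, along the lines of ExitGuard guard(pvm_exit). Whether this is preferable to the explicit calls in Programs 6.1 and 6.2 is a matter of style.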

6.2.3. A PVM Preliminary Requirements Checklist

In addition to obtaining and properly installing a PVM distribution, there are a few other minor considerations. When the PVM environment is implemented as a network of computers the following items must be handled before your C++ program can interact with the PVM environment.

Item 1

The environment variables PVM_ROOT and PVM_ARCH should be set. PVM_ROOT should be set to the directory where PVM is installed.

Using the Bourne Shell (bash):

$ PVM_ROOT=/usr/lib/pvm3

$ export PVM_ROOT

Using the C Shell:

setenv PVM_ROOT /usr/lib/pvm3

The PVM_ARCH environment variable identifies the architecture of the machine. Each machine involved in the PVM must be identified by architecture. For example, our Ultrasparcs have the designation SUN4SOL2 and our Linux machines have the designation LINUX. Table 6-2 shows the most commonly used architectures for the PVM environment. Check with your distribution of PVM if an appropriate architecture for your machines is not contained in Table 6-2.

Table 6-2

Table 6-2 shows the name and machine type associated with the name. Set your PVM_ARCH environment variable to one of the names in Table 6-2. For instance:

Using the Bourne Shell (bash):

$ PVM_ARCH=LINUX

$ export PVM_ARCH

Using the C Shell:

setenv PVM_ARCH LINUX

Item 2

The binaries (executables) for any programs participating in the PVM must be either located on all machines involved or accessible by all machines involved in the PVM. In addition to being available, each program must be compiled for the architecture it will run on. This means that if we have UltraSparcs, PowerPCs, and Intel processors involved in the PVM, then we must have a version of the program compiled for each architecture. That version must be located in a place that the PVM is aware of. The location is often $HOME/pvm3/bin/$PVM_ARCH. However, the location can be specified in a PVM configuration file, usually referred to as the hostfile, or in .xpvm_hosts if the XPVM environment is used. The hostfile would contain an entry such as:

ep=/usr/local/pvm3/bin

This specifies that any user binaries needed by the PVM can be found in the /usr/local/pvm3/bin directory.
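The rule described in this item, an explicit ep= entry if one is given and the default location otherwise, can be sketched as a small C++ function. The name binaryDir is hypothetical, and the function is only an illustration of the rule as stated here, not a description of how pvmd itself resolves paths. It assumes the ep= value names a single directory:

```cpp
#include <cassert>
#include <string>

// Hypothetical helper: returns the directory in which task binaries are
// expected. A non-empty ep= value from the hostfile takes precedence;
// otherwise the default $HOME/pvm3/bin/$PVM_ARCH is built from the
// supplied values of HOME and PVM_ARCH.
std::string binaryDir(const std::string& epOption,
                      const std::string& home,
                      const std::string& pvmArch)
{
    if (!epOption.empty())
        return epOption;
    assert(!home.empty() && !pvmArch.empty());
    return home + "/pvm3/bin/" + pvmArch;
}
```

A caller would typically obtain the last two arguments from the environment, for example with std::getenv("HOME") and std::getenv("PVM_ARCH"), after checking that both are set.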

Item 3

The user initiating the PVM program must have network access to each machine involved in the PVM. This access is typically rsh or ssh access. See the man pages for more details on the rsh and ssh programs. By default, the PVM accesses each machine using the login name of the user initiating the PVM program or the account name of the machine starting the PVM program. If another account besides the initiating login account is required, an entry must be added to the hostfile or .xpvm_hosts. For example:

lo=flashgordon

Item 4

Create a .rhosts file on each host listing all the hosts you wish to use. These are the computers that have the potential to be involved in the PVM. Depending on the settings in the .xpvm_hosts file or the pvm_hosts file, these computers will automatically be added to the PVM when the pvmd is started. Computers listed in these files can also be dynamically added to the PVM at run time.

Item 5

Create a $HOME/.xpvm_hosts and/or a $HOME/pvm_hosts file listing all the hosts you wish to use, each prepended by an &. The & means don't automatically add the host. Omitting the & causes the host to be added automatically. The pvm_hosts file is a user-created file, and its name is arbitrary. However, .xpvm_hosts is the required name when using the XPVM environment. Figure 6-3 shows an example of a PVM hostfile. The same format would be used for the PVM console hostfile or for .xpvm_hosts.

Figure 6-3
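The hostfile conventions described in Items 2, 3, and 5 (a leading & to suppress automatic adding, and options such as lo= and ep=) can be illustrated with a small parser. This is a sketch under stated assumptions, not the parser pvmd uses: the names HostEntry and parseHostLine are hypothetical, the host name is assumed to be the first field on the line, and options are assumed to be whitespace-separated key=value pairs. The host names in the usage note are made up:

```cpp
#include <cassert>
#include <map>
#include <sstream>
#include <string>

struct HostEntry {
    std::string name;                             // host name without a leading &
    bool autoAdd;                                 // false when the line starts with &
    std::map<std::string, std::string> options;   // e.g. "lo" -> "flashgordon"
};

// Hypothetical parser for one non-blank hostfile line. The caller is
// expected to skip blank lines before calling it.
HostEntry parseHostLine(const std::string& line)
{
    std::istringstream in(line);
    HostEntry entry;
    entry.autoAdd = true;
    in >> entry.name;
    assert(!entry.name.empty());
    if (entry.name[0] == '&') {        // & means: do not add automatically
        entry.autoAdd = false;
        entry.name.erase(0, 1);
    }
    std::string token;
    while (in >> token) {              // remaining fields: key=value options
        std::string::size_type eq = token.find('=');
        if (eq != std::string::npos)
            entry.options[token.substr(0, eq)] = token.substr(eq + 1);
    }
    return entry;
}
```

For a line such as "&mars lo=flashgordon ep=/usr/local/pvm3/bin", the sketch produces a host named mars that is not added automatically, with the login name and binary location stored as options. A line containing only "venus" produces a host that is added automatically and has no options.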

The primary thing to keep in mind is the network access of the user running the PVM program. The owner of the PVM program should have account access to every computer in the pool of processors that will be executing parts of the program. This access will use rsh, rlogin, or ssh. The program to be executed must be available on each host, and the PVM environment must be aware of what the hosts are and where the binaries are installed.

Part 1: Adding Parallel Programming Capabilities to C++ Through the PVM