Lec5 MPI

The document discusses the Message Passing Interface (MPI) standard, which defines communication primitives for message passing between tasks executing on multiple processors. It describes a minimum set of MPI functions for initialization, termination, process control, synchronization, and message passing. The functions support both blocking and non-blocking send and receive operations with datatypes and message tags.

Lecture 5 The Message

Passing Interface (MPI)

Parallel Computing
Fall 2022

1
The Message Passing Interface
Introduction
 The Message-Passing Interface (MPI) is an attempt to create a standard that
allows tasks executing on multiple processors to communicate through
standardized communication primitives.
 It defines a standard library for message passing that one can use to
develop message-passing programs in C or Fortran.
 The MPI standard defines both the syntax and the semantics of the
functional interface to message passing.
 MPI comes in a variety of flavors: freely available implementations such as
LAM/MPI and MPICH, and also commercial versions such as Critical Software’s WMPI.
 It supports message passing on a variety of platforms, from Linux-based or
Windows-based PCs to supercomputers and multiprocessor systems.
 After the introduction of MPI, whose functionality includes a set of 125
functions, a revision of the standard took place that added C++ support,
external memory accessibility, and also Remote Memory Access (similar to
BSP’s put and get capability) to the standard. The resulting standard is
known as MPI-2 and has grown to almost 241 functions.

2
The Message Passing Interface
A minimum set
 A minimum set of MPI functions is described below. MPI functions use the prefix
MPI_, and the keyword following the prefix starts with a capital letter.
 A brief explanation of the primitives can be found in the textbook (beginning on
page 242). A more elaborate presentation is available in the optional book.

Function Class               Standard  Function         Operation

Initialization/Termination   MPI       MPI_Init         Start of SPMD code
                             MPI       MPI_Finalize     End of SPMD code
Abnormal Stop                MPI       MPI_Abort        One process halts all
Process Control              MPI       MPI_Comm_size    Number of processes
                             MPI       MPI_Comm_rank    Identifier of calling process
                             MPI       MPI_Wtime        Local (wall-clock) time
Synchronization              MPI       MPI_Barrier      Global synchronization
DRMA                         MPI-2     MPI_Win_create   Make memory info global
                             MPI-2     MPI_Win_free     Undo global effect
                             MPI-2     MPI_Put          Copy into remote memory
                             MPI-2     MPI_Get          Copy from remote memory
Message Passing              MPI       MPI_Send         Send message to remote proc.
                             MPI       MPI_Recv         Receive message from remote proc.
3
General MPI program
#include <mpi.h>

int main(int argc, char* argv[])
{

MPI_Init(&argc, &argv); /*no MPI functions called before this*/
...
MPI_Finalize(); /*no MPI functions called after this*/

return 0;
}

4
MPI and MPI-2
Initialization and Termination
#include <mpi.h>
int MPI_Init (int *argc, char ***argv);
int MPI_Finalize(void);
 Multiple processes from the same source are created by
issuing the function MPI_Init, and these processes are safely
terminated by issuing an MPI_Finalize.
 The arguments of MPI_Init are pointers to the main function’s
parameters argc and argv; the implementation removes from
them any command-line arguments that it used/processed.
Thus command-line processing should only be performed in
the program after the execution of this function call. On
success MPI_Init returns MPI_SUCCESS; otherwise an
implementation-dependent error code is returned.
 Definitions are available in <mpi.h>.
5
MPI and MPI-2
Abort
int MPI_Abort(MPI_Comm comm, int errcode);

 Note that MPI_Abort terminates an MPI program: one
process calling it halts all processes.
 Its first argument is a communicator; the second
argument is an integer error code.
 A communicator is a collection of processes that can
send messages to each other. The default
communicator is MPI_COMM_WORLD, which consists
of all the processes running when program execution
begins.

6
MPI and MPI-2
Communicators; Process Control
 Under MPI, a communication domain is a set of processes that are
allowed to communicate with each other. Information about such a
domain is stored in a communicator that uniquely identifies the processes
that participate in a communication operation.
 The default communication domain contains all the processes of a parallel
execution; it is called MPI_COMM_WORLD. Using multiple
communicators between possibly overlapping groups of processes
ensures that messages do not interfere with each other.

#include <mpi.h>
int MPI_Comm_size ( MPI_Comm comm, int *size);
int MPI_Comm_rank ( MPI_Comm comm, int *rank);
Thus
MPI_Comm_size ( MPI_COMM_WORLD, &nprocs);
MPI_Comm_rank ( MPI_COMM_WORLD, &pid );
return the number of processes nprocs and the process id pid of the calling
process.
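
Putting the calls above together, a minimal complete program (the output format here is illustrative, not from the slides) looks like:

```c
#include <stdio.h>
#include <mpi.h>

/* Every process reports its rank and the total number of processes. */
int main(int argc, char *argv[])
{
    int nprocs, pid;

    MPI_Init(&argc, &argv);
    MPI_Comm_size(MPI_COMM_WORLD, &nprocs);
    MPI_Comm_rank(MPI_COMM_WORLD, &pid);

    printf("Process %d of %d\n", pid, nprocs);

    MPI_Finalize();
    return 0;
}
```

Compile with the implementation's wrapper compiler (e.g., mpicc) and launch with its runner (e.g., mpirun -np 4 ./a.out); the exact commands vary by implementation.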

7
MPI
Message-Passing primitives
#include <mpi.h>
/* Blocking send and receive */
int MPI_Send(void *buf, int count, MPI_Datatype dtype, int dest, int tag, MPI_Comm comm);
int MPI_Recv(void *buf, int count, MPI_Datatype dtype, int src, int tag, MPI_Comm comm, MPI_Status *stat);

/* Non-Blocking send and receive */


int MPI_Isend(void *buf, int count, MPI_Datatype dtype, int dest, int tag, MPI_Comm comm, MPI_Request *req);
int MPI_Irecv(void *buf, int count, MPI_Datatype dtype, int src, int tag, MPI_Comm comm, MPI_Request *req);
int MPI_Wait(MPI_Request *preq, MPI_Status *stat);

buf   - initial address of send/receive buffer
count - number of elements in send buffer (nonnegative integer) or maximum number of elements in receive buffer
dtype - datatype of each send/receive buffer element (handle)
dest, src - rank of destination/source (integer)
        Wild-card: MPI_ANY_SOURCE for recv only. No wildcard for dest.
tag   - message tag (integer). Range 0...32767 guaranteed; the actual upper bound is MPI_TAG_UB.
        Wild-card: MPI_ANY_TAG for recv only; send must specify a tag.
comm  - communicator (handle)
stat  - status object, which returns the source and tag of the message that was
        actually received; pass the MPI constant MPI_STATUS_IGNORE if the return
        status is not desired.
        Fields of the status struct:
        status->MPI_SOURCE,
        status->MPI_TAG,
        status->MPI_ERROR
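
A small sketch of the wildcard and status arguments (assuming at least two processes; the ranks, tag, and value are illustrative):

```c
#include <stdio.h>
#include <mpi.h>

/* Rank 1 sends an int with tag 7; rank 0 receives with wildcards and
 * then reads the actual source and tag out of the status object. */
int main(int argc, char *argv[])
{
    int pid, value = 0;
    MPI_Status stat;

    MPI_Init(&argc, &argv);
    MPI_Comm_rank(MPI_COMM_WORLD, &pid);

    if (pid == 1) {
        value = 42;
        MPI_Send(&value, 1, MPI_INT, 0, 7, MPI_COMM_WORLD);
    } else if (pid == 0) {
        MPI_Recv(&value, 1, MPI_INT, MPI_ANY_SOURCE, MPI_ANY_TAG,
                 MPI_COMM_WORLD, &stat);
        printf("got %d from rank %d with tag %d\n",
               value, stat.MPI_SOURCE, stat.MPI_TAG);
    }

    MPI_Finalize();
    return 0;
}
```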
8
data type correspondence
between MPI and C
MPI_CHAR           --> signed char
MPI_SHORT          --> signed short int
MPI_INT            --> signed int
MPI_LONG           --> signed long int
MPI_UNSIGNED_CHAR  --> unsigned char
MPI_UNSIGNED_SHORT --> unsigned short int
MPI_UNSIGNED       --> unsigned int
MPI_UNSIGNED_LONG  --> unsigned long int
MPI_FLOAT          --> float
MPI_DOUBLE         --> double
MPI_LONG_DOUBLE    --> long double
MPI_BYTE           --> (raw bytes; no C equivalent)
MPI_PACKED         --> (data packed with MPI_Pack; no C equivalent)

9
MPI
Message-Passing primitives
 The MPI_Send and MPI_Recv functions are blocking; that is, they
do not return unless it is safe to modify or use the contents of
the send/receive buffer.
 MPI also provides non-blocking send and receive primitives.
These are MPI_Isend and MPI_Irecv, where the I stands for
Immediate.
 These functions allow a process to post that it wants to send to
or receive from another process, and then allow the process to
call a function (e.g., MPI_Wait) to complete the send-receive pair.
 Non-blocking send-receives allow for the overlapping of
computation and communication. Thus MPI_Wait plays the role of
synchronizer: the send/receive calls are only advisories, and
communication is only guaranteed complete at the MPI_Wait.
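
The overlap pattern can be sketched as follows (do_local_work is a hypothetical placeholder for computation that does not touch the send buffer):

```c
#include <mpi.h>

/* Post a non-blocking send, overlap it with local computation,
 * and only then wait for completion; the buffer may be reused
 * safely once MPI_Wait has returned. */
void send_with_overlap(int *buf, int dest, void (*do_local_work)(void))
{
    MPI_Request req;
    MPI_Status  stat;

    MPI_Isend(buf, 1, MPI_INT, dest, 0, MPI_COMM_WORLD, &req);
    do_local_work();        /* overlapped with the communication */
    MPI_Wait(&req, &stat);  /* synchronization point */
}
```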

10
MPI
Message-Passing primitives: tag
 A tag is simply an integer argument that is passed to a communication function and that
can be used to identify a message. For example, in MPI, if process A sends a
message to process B, then in order for B to receive the message, the tag used in A's call
to MPI_Send must match the tag used in B's call to MPI_Recv. Thus, if the
characteristics of two messages sent by A to B are otherwise identical (i.e., same count and
datatype), then A and B can distinguish between the two by using different tags.
 For example, suppose A is sending two floats, x and y, to B. Then the processes can be
sure that the values are received correctly, regardless of the order in which A sends and B
receives, provided different tags are used:

/* Assume system provides some buffering */


if (my_rank == A) {
    tag = 0;
    MPI_Send(&x, 1, MPI_FLOAT, B, tag, MPI_COMM_WORLD);
    ...
    tag = 1;
    MPI_Send(&y, 1, MPI_FLOAT, B, tag, MPI_COMM_WORLD);
} else if (my_rank == B) {
    tag = 1;
    MPI_Recv(&y, 1, MPI_FLOAT, A, tag, MPI_COMM_WORLD, &status);
    ...
    tag = 0;
    MPI_Recv(&x, 1, MPI_FLOAT, A, tag, MPI_COMM_WORLD, &status);
}

11
MPI Message-Passing primitives:
Communicators
 Now if one message from process A to process B is being sent by the
library, and another, with identical characteristics, is being sent by the
user's code, then unless the library developer insists that user programs
refrain from using certain tag values, tags alone cannot distinguish
the two. Clearly, partitioning the set of possible tags is at best an
inconvenience: to modify an existing user code so that it
can use a library that partitions tags, each message-passing call in
the entire user code must be checked.
 The solution that was ultimately decided on was the communicator.
 Formally, a communicator is a pair of objects: the first is a group or
ordered collection of processes, and the second is a context, which can
be viewed as a unique, system-defined tag. Every communication
function in MPI takes a communicator argument, and a communication
can succeed only if all the processes participating in the communication
use the same communicator argument. Thus, a library can either
require that its functions be passed a unique library-specific
communicator, or its functions can create their own unique
communicator. In either case, it is straightforward for the library
designer and the user to make certain that their messages are not
confused.
12
For example, suppose now that the user's code is sending a float, x, from process A to process B, while the library is sending a float, y, from A to B:

/* Assume system provides some buffering */
void User_function(int my_rank, float* x) {
    MPI_Status status;
    if (my_rank == A) {
        /* MPI_COMM_WORLD is pre-defined in MPI */
        MPI_Send(x, 1, MPI_FLOAT, B, 0, MPI_COMM_WORLD);
    } else if (my_rank == B) {
        MPI_Recv(x, 1, MPI_FLOAT, A, 0, MPI_COMM_WORLD, &status);
    }
    ...
}

void Library_function(float* y) {
    MPI_Comm library_comm;
    MPI_Status status;
    int my_rank;
    /* Create a communicator with the same group */
    /* as MPI_COMM_WORLD, but a different context */
    MPI_Comm_dup(MPI_COMM_WORLD, &library_comm);
    /* Get process rank in new communicator */
    MPI_Comm_rank(library_comm, &my_rank);
    if (my_rank == A)
        MPI_Send(y, 1, MPI_FLOAT, B, 0, library_comm);
    else if (my_rank == B)
        MPI_Recv(y, 1, MPI_FLOAT, A, 0, library_comm, &status);
    ...
}

int main(int argc, char* argv[]) {
    ...
    if (my_rank == A) {
        User_function(A, &x);
        ...
        Library_function(&y);
    } else if (my_rank == B) {
        Library_function(&y);
        ...
        User_function(B, &x);
    }
    ...
}

13
MPI Message-Passing primitives:
User-defined datatypes
 The second main innovation in MPI, user-defined datatypes, allows programmers to
create messages consisting of logically unified sets of data rather than only physically
contiguous blocks of data.
 Loosely, an MPI datatype is a sequence of displacements in memory together with a
collection of basic datatypes (e.g., int, float, double, and char). Thus, an MPI datatype
specifies the layout in memory of data to be collected into a single message or data to be
distributed from a single message.
 For example, suppose we specify a sparse matrix entry with the following definition.
typedef struct {
    double entry;
    int row, col;
} mat_entry_t;
 MPI provides functions for creating a variable that stores the layout in memory of a variable
of type mat_entry_t. One does this by first defining an MPI datatype
MPI_Datatype mat_entry_mpi_t;
to be used in communication functions, and then calling various MPI functions to initialize
mat_entry_mpi_t so that it contains the required layout. Then, if we define
mat_entry_t x;
we can send x by simply calling
MPI_Send(&x, 1, mat_entry_mpi_t, dest, tag, comm);
and we can receive x with a similar call to MPI_Recv.
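
One way to initialize mat_entry_mpi_t (a sketch, not the textbook's code) uses MPI_Type_create_struct with displacements computed by offsetof; on compilers that pad the struct, an additional MPI_Type_create_resized call may be needed:

```c
#include <stddef.h>
#include <mpi.h>

typedef struct {
    double entry;
    int row, col;
} mat_entry_t;

/* Build and commit an MPI datatype describing mat_entry_t's layout. */
MPI_Datatype make_mat_entry_type(void)
{
    MPI_Datatype mat_entry_mpi_t;
    int          blocklens[2] = { 1, 2 };   /* one double, then two ints */
    MPI_Aint     displs[2]    = { offsetof(mat_entry_t, entry),
                                  offsetof(mat_entry_t, row) };
    MPI_Datatype types[2]     = { MPI_DOUBLE, MPI_INT };

    MPI_Type_create_struct(2, blocklens, displs, types, &mat_entry_mpi_t);
    MPI_Type_commit(&mat_entry_mpi_t);      /* required before use */
    return mat_entry_mpi_t;
}
```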

14
MPI: An example with the
blocking operations
#include <stdio.h>
#include <mpi.h>
#define N 10000000 // Choose N to be multiple of nprocs to avoid problems.
// Parallel sum of 1 , 2 , 3, ... , N
int main(int argc, char **argv){
    int pid, nprocs, i, j;
    int start, end;
    long long sum, total;   /* long long: 1+...+N = N(N+1)/2 exceeds INT_MAX for this N */
    MPI_Status status;
    MPI_Init(&argc, &argv);
    MPI_Comm_size(MPI_COMM_WORLD, &nprocs);
    MPI_Comm_rank(MPI_COMM_WORLD, &pid);
    sum = 0; total = 0;
    start = (N/nprocs)*pid + 1;    /* each processor sums its own block */
    end   = (N/nprocs)*(pid+1);
    for (i = start; i <= end; i++) sum += i;
    if (pid != 0) {
        MPI_Send(&sum, 1, MPI_LONG_LONG, 0, 1, MPI_COMM_WORLD);
    }
    else {
        for (j = 1; j < nprocs; j++) {
            MPI_Recv(&total, 1, MPI_LONG_LONG, j, 1, MPI_COMM_WORLD, &status);
            sum = sum + total;
        }
    }
    if (pid == 0) {
        printf("The sum from 1 to %d is %lld\n", N, sum);
    }
    MPI_Finalize();
    return 0;
}
// Note: Program neither compiled nor run!
15
Non-Blocking send and Receive
/* Non-Blocking send and receive */
int MPI_Isend(void *buf, int count, MPI_Datatype dtype, int dest, int tag,
MPI_Comm comm, MPI_Request *req);
int MPI_Irecv(void *buf, int count, MPI_Datatype dtype, int src, int tag,
MPI_Comm comm, MPI_Request *req);
int MPI_Wait(MPI_Request *preq, MPI_Status *stat);

 #include "mpi.h"
 int MPI_Wait ( MPI_Request *request, MPI_Status *status)
 Waits for an MPI send or receive to complete
 Input Parameter
 request (handle)
 Output Parameter
 status object (Status) . May be MPI_STATUS_IGNORE.

16
MPI: An example with the non-
blocking operations
#include <stdio.h>
#include <mpi.h>
#define N 10000000 // Choose N to be multiple of nprocs to avoid problems.
// Parallel sum of 1 , 2 , 3, ... , N
int main(int argc, char **argv){
    int pid, nprocs, i, j;
    int start, end;
    long long sum, total;   /* long long: the sum exceeds INT_MAX for this N */
    MPI_Status status;
    MPI_Request request;
    MPI_Init(&argc, &argv);
    MPI_Comm_size(MPI_COMM_WORLD, &nprocs);
    MPI_Comm_rank(MPI_COMM_WORLD, &pid);
    sum = 0; total = 0;
    start = (N/nprocs)*pid + 1;    /* each processor sums its own block */
    end   = (N/nprocs)*(pid+1);
    for (i = start; i <= end; i++) sum += i;
    if (pid != 0) {
        // MPI_Send(&sum,1,MPI_LONG_LONG,0,1,MPI_COMM_WORLD);
        MPI_Isend(&sum, 1, MPI_LONG_LONG, 0, 1, MPI_COMM_WORLD, &request);
        MPI_Wait(&request, &status);   /* sum must not be reused before this */
    }
    else {
        for (j = 1; j < nprocs; j++) {
            MPI_Recv(&total, 1, MPI_LONG_LONG, j, 1, MPI_COMM_WORLD, &status);
            sum = sum + total;
        }
    }
    if (pid == 0) {
        printf("The sum from 1 to %d is %lld\n", N, sum);
    }
    MPI_Finalize();
    return 0;
} // Note: Program neither compiled nor run!
17
MPI Basic Collective Operations
 One simple collective operation:

int MPI_Bcast(void *message, int count, MPI_Datatype datatype,
              int root, MPI_Comm comm);

 The routine MPI_Bcast sends data from one process (the root) to
all others.

18
MPI_Bcast

[Figure: MPI_Bcast. Process 0 holds the data and writes it to all
processes; Processes 1, 2, and 3 start with empty buffers and each
ends with the data present, at which point the call unblocks.]
19
Simple Program that Demonstrates MPI_Bcast:
#include <mpi.h>
#include <stdio.h>
int main (int argc, char *argv[]){
    int k, id, p, size;
    MPI_Init(&argc, &argv);
    MPI_Comm_rank(MPI_COMM_WORLD, &id);
    MPI_Comm_size(MPI_COMM_WORLD, &size);
    if (id == 0)
        k = 20;
    else
        k = 10;
    for (p = 0; p < size; p++){
        if (id == p)
            printf("Process %d: k= %d before\n", id, k);
    }
    // note: MPI_Bcast must be placed where all processes reach it
    MPI_Bcast(&k, 1, MPI_INT, 0, MPI_COMM_WORLD);
    for (p = 0; p < size; p++){
        if (id == p)
            printf("Process %d: k= %d after\n", id, k);
    }
    MPI_Finalize();
    return 0;
}
20
Simple Program that Demonstrates
MPI_Bcast:
 The output would look like (the order of the lines may vary between runs):
Process 0: k= 20 before
Process 0: k= 20 after
Process 3: k= 10 before
Process 3: k= 20 after
Process 2: k= 10 before
Process 2: k= 20 after
Process 1: k= 10 before
Process 1: k= 20 after
21
MPI
Timing Control
#include <mpi.h>
double MPI_Wtime(void);
 It returns the time in seconds since an arbitrary point
in the past on the calling processor. It gives wall-clock
time.
 If you want the time since the beginning of the
program under MPI, you may need to do something
like the fragment below.
double t1, mpi_basetime;
MPI_Init(&argc, &argv);
mpi_basetime = MPI_Wtime();
...
t1 = MPI_Wtime() - mpi_basetime;
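
A sketch of timing an arbitrary code region this way (work is a hypothetical function being measured):

```c
#include <stdio.h>
#include <mpi.h>

/* Measure the wall-clock time of a region on the calling process. */
void time_region(void (*work)(void))
{
    double t0 = MPI_Wtime();
    work();                              /* region being measured */
    double elapsed = MPI_Wtime() - t0;
    printf("elapsed: %f s (clock resolution %f s)\n",
           elapsed, MPI_Wtick());        /* MPI_Wtick: length of one tick */
}
```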

22
MPI
Barrier Synchronization
#include <mpi.h>
int MPI_Barrier(MPI_Comm comm);

 It blocks the caller until all group members have
called it; the call returns at any process only after all
group members have entered the call.
 MPI_Barrier is just a barrier synchronization; it has no
side effect other than the obvious one.
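
A common (if informal) use is forcing output to appear roughly in rank order; note that the MPI standard does not actually guarantee ordering of stdout across processes, so this is a debugging convenience, not a correctness tool:

```c
#include <stdio.h>
#include <mpi.h>

/* Each rank prints in turn; a barrier separates the turns. */
void print_in_rank_order(MPI_Comm comm)
{
    int pid, nprocs, p;

    MPI_Comm_rank(comm, &pid);
    MPI_Comm_size(comm, &nprocs);
    for (p = 0; p < nprocs; p++) {
        if (pid == p) {
            printf("hello from rank %d of %d\n", pid, nprocs);
            fflush(stdout);
        }
        MPI_Barrier(comm);   /* wait until rank p has printed */
    }
}
```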

23
MPI-2 in LAM MPI
Remote Memory Access: MPI Put
 The RMA (Remote Memory Access) facility provided by LAM MPI according to the MPI-2
specification is effected by MPI_Put and MPI_Get. There are multiple ways one can
use such operations under LAM MPI; we mention only one.

#include <mpi.h>
int MPI_Put(orgaddr, orgcnt, orgtype, rank, targdisp, targcnt, targtype, win)

 Input Parameters are:


void *orgaddr; - initial address of origin buffer (choice)
int orgcnt; - number of entries in origin buffer (nonnegative integer)
MPI_Datatype orgtype; - datatype of each entry in origin buffer (handle)
int rank; - rank of target (nonnegative integer)
MPI_Aint targdisp; - displacement from start of window to target buffer (nonnegative integer)
int targcnt; - number of entries in target buffer (nonnegative integer)
MPI_Datatype targtype; - datatype of each entry in target buffer (handle)
MPI_Win win; - window object used for communication (handle)

24
MPI-2 in LAM MPI
Remote Memory Access: MPI Get
 A get has a similar syntax.
#include <mpi.h>
int MPI_Get(orgaddr, orgcnt, orgtype, rank, targdisp, targcnt, targtype, win)

void *orgaddr         - initial address of origin buffer (choice)
int orgcnt            - number of entries in origin buffer (nonnegative integer)
MPI_Datatype orgtype  - datatype of each entry in origin buffer (handle)
int rank              - rank of target (nonnegative integer)
MPI_Aint targdisp     - displacement from window start to the beginning of the target buffer
                        (nonnegative integer)
int targcnt           - number of entries in target buffer (nonnegative integer)
MPI_Datatype targtype - datatype of each entry in target buffer (handle)
MPI_Win win           - window object used for communication (handle)

 Memory is copied from the target memory to the origin. The origin datatype may
not specify overlapping entries in the origin buffer. The target buffer must be
contained within the target window, and the copied data must fit, without
truncation, in the origin buffer.
25
MPI-2 in LAM MPI
Communication Windows
 MPI-2 requires that RMA requests be registered through window creation and termination operations.
#include <mpi.h>
int MPI_Win_create(void *base, MPI_Aint size, int disp_unit,
MPI_Info info, MPI_Comm comm, MPI_Win *newwin)
int MPI_Win_free(MPI_Win *newwin)
int MPI_Win_fence(int assertion, MPI_Win newwin)
 Operation MPI_Win_create creates a window for remote memory access operation.
base - initial address of window (choice)
size - size of window in bytes (nonnegative integer)
disp_unit - local unit size for displacements, in bytes (positive integer)
info - info argument (handle)
comm - communicator (handle)
newwin - window object returned by the call (handle)
assertion - a suggestion to the communication algorithm; use 0 if you don’t know what values are
acceptable
 Operation MPI_Win_free frees the window object newwin and returns a null handle (equal to
MPI_WIN_NULL).
 The MPI call MPI_Win_fence(assertion, newwin) synchronizes RMA calls on newwin. The call is collective
on the group of newwin. All RMA operations on newwin originating at a given process and started before
the fence call will complete at that process before the fence call returns. They will be completed at their
target before the fence call returns at the target. RMA operations on newwin started by a process after the
fence call returns will access their target window only after MPI_Win_fence has been called by the target
process. Calls to MPI_Win_fence should both precede and follow calls to put, get, or accumulate that are
synchronized with fence calls.
 MPI_Win_fence statements must surround an MPI-2 RMA put or get operation.

26
MPI-2 in LAM MPI
Communication Windows:Example

#include <mpi.h>
int x,y;
int pid, nprocs;
MPI_Win win;

..... We assume we have retrieved pid using an appropriate function call .....

MPI_Win_create(&y, sizeof(int), 1, MPI_INFO_NULL, MPI_COMM_WORLD, &win);
MPI_Win_fence(0, win);
if (pid == 0) {
    /* write x into the window (variable y) of the process with rank 5 */
    MPI_Put(&x, sizeof(int), MPI_CHAR, 5, 0, sizeof(int), MPI_CHAR, win);
}
MPI_Win_fence(0, win);
MPI_Win_free(&win);

27
MPI-2 in LAM MPI:
Put and Get examples
 MPI_Put *****************************************
char *des, *src;
int dp, off, size, K;

MPI_Win_create(des, K, 1, MPI_INFO_NULL, MPI_COMM_WORLD, &win);
MPI_Win_fence(0, win);
if (pid == 0) {
    MPI_Put(src, size, MPI_CHAR, dp, off, size, MPI_CHAR, win);
}
MPI_Win_fence(0, win);
MPI_Win_free(&win);

 MPI_Get *****************************************

MPI_Win_create(src, K, 1, MPI_INFO_NULL, MPI_COMM_WORLD, &win);
MPI_Win_fence(0, win);
if (pid == 0) {
    MPI_Get(des, size, MPI_CHAR, dp, off, size, MPI_CHAR, win);
}
MPI_Win_fence(0, win);
MPI_Win_free(&win);

28
