Difference between revisions of "Programming/OpenMPI"
m |
|||
Line 1: | Line 1: | ||
== Programming Details == | == Programming Details == | ||
− | '''MPI''' defines not only point-to-point communication (e.g., send and receive), it also defines other communication patterns, such as collective communication. Collective operations are where multiple processes are involved in | + | '''MPI''' defines not only point-to-point communication (e.g., send and receive), it also defines other communication patterns, such as collective communication. Collective operations are where multiple processes are involved in single communication action. Reliable broadcast, for example, is where one process has a message at the beginning of the operation, and at the end of the operation, all processes in a group have the message. |
Message-passing performance and resource utilization are the king and queen of high-performance computing. Open MPI was specifically designed in such a way that it could operate at the very bleeding edge of high performance: incredibly low latencies for sending short messages, extremely high short message injection rates on supported networks, fast ramp-ups to maximum bandwidth for large messages, etc. | Message-passing performance and resource utilization are the king and queen of high-performance computing. Open MPI was specifically designed in such a way that it could operate at the very bleeding edge of high performance: incredibly low latencies for sending short messages, extremely high short message injection rates on supported networks, fast ramp-ups to maximum bandwidth for large messages, etc. | ||
Line 18: | Line 18: | ||
* Message-Passing Parallel Programming | * Message-Passing Parallel Programming | ||
− | The message passing model can be thought of a process together with the program's own data and the parallelism is achieved by having each of these processes co-operate on the same task. This model also has some limitations these are: | + | The message-passing model can be thought of as a process together with the program's own data and the parallelism is achieved by having each of these processes co-operate on the same task. This model also has some limitations these are: |
* All variables are private to each process. | * All variables are private to each process. | ||
Line 33: | Line 33: | ||
* Sending a message can either be synchronous or asynchronous. (eg. '''MPI_Ssend''' (Synchronous) and '''MPI_Bsend''' (Asynchronous)). | * Sending a message can either be synchronous or asynchronous. (eg. '''MPI_Ssend''' (Synchronous) and '''MPI_Bsend''' (Asynchronous)). | ||
− | * | + | * Asynchronous send is not completed until the message has started to be received. |
* An asynchronous send completes as soon as the message has gone. | * An asynchronous send completes as soon as the message has gone. | ||
* Receives are usually synchronous - the receiving process must wait until the message arrives. | * Receives are usually synchronous - the receiving process must wait until the message arrives. | ||
Line 46: | Line 46: | ||
====Communication considerations==== | ====Communication considerations==== | ||
− | * Sends and receive | + | * Sends and receive calls must match. If these are not there is a danger of deadlock and your program may stall! |
* Most programs do not need to be complicated and scientific codes have a simple structure which in turn have simple communication patterns. | * Most programs do not need to be complicated and scientific codes have a simple structure which in turn have simple communication patterns. | ||
* Use collective communication. | * Use collective communication. | ||
Line 160: | Line 160: | ||
</pre> | </pre> | ||
− | '''Note''' : mpifort is a new name for the Fortran wrapper compiler that debuted in Open MPI v3.0.0 | + | '''Note''': mpifort is a new name for the Fortran wrapper compiler that debuted in Open MPI v3.0.0 |
Line 179: | Line 179: | ||
#SBATCH -p compute | #SBATCH -p compute | ||
#SBATCH --exclusive | #SBATCH --exclusive | ||
+ | #SBATCH --mail-user= your email address here | ||
echo $SLURM_JOB_NODELIST | echo $SLURM_JOB_NODELIST |
Revision as of 09:04, 24 May 2021
Programming Details
MPI defines not only point-to-point communication (e.g., send and receive), it also defines other communication patterns, such as collective communication. Collective operations are where multiple processes are involved in single communication action. Reliable broadcast, for example, is where one process has a message at the beginning of the operation, and at the end of the operation, all processes in a group have the message.
Message-passing performance and resource utilization are the king and queen of high-performance computing. Open MPI was specifically designed in such a way that it could operate at the very bleeding edge of high performance: incredibly low latencies for sending short messages, extremely high short message injection rates on supported networks, fast ramp-ups to maximum bandwidth for large messages, etc.
The Open MPI code has 3 major code modules:
- OMPI - MPI code
- ORTE - the Open Run-Time Environment
- OPAL - the Open Portable Access Layer
Programming Models
When we look at programming models we consider 2 basic ideas:
- Serial programming
- Message-Passing Parallel Programming
The message-passing model can be thought of as a process together with the program's own data and the parallelism is achieved by having each of these processes co-operate on the same task. This model also has some limitations these are:
- All variables are private to each process.
- All communication between each process by sending and receiving messages (hence the OpenMPI name).
- Most message passing programs use the Single-Program-Multiple-Data (SPMD) model.
- It is possible to run an MPI type program on one or more nodes, although if your program is only ever intended to run on one node you should consider openMP instead here.
Below is a data diagram of OpenMPI:
Communication modes
- Sending a message can either be synchronous or asynchronous. (eg. MPI_Ssend (Synchronous) and MPI_Bsend (Asynchronous)).
- Asynchronous send is not completed until the message has started to be received.
- An asynchronous send completes as soon as the message has gone.
- Receives are usually synchronous - the receiving process must wait until the message arrives.
Communication types
- Point to point - single point transfer call.
- Broadcast - all data is transmitted to all processes.
- Scatter/Gather data - parts of the data are sent to each process via a MPI_scatter call for processing. Then a MPI_gather call to bring the data back to a root process.
- Reduction - Combine data from several processes to form a single result (ie. form a global sum, product, max, min, etc.).
Communication considerations
- Sends and receive calls must match. If these are not there is a danger of deadlock and your program may stall!
- Most programs do not need to be complicated and scientific codes have a simple structure which in turn have simple communication patterns.
- Use collective communication.
Program Examples
C Example
#include <mpi.h> #include <stdio.h> int main(int argc, char** argv) { int rank; int buf; MPI_Status status; MPI_Init(&argc, &argv); MPI_Comm_rank(MPI_COMM_WORLD, &rank); if(rank == 0) { buf = 777; MPI_Bcast(&buf, 1, MPI_INT, 0, MPI_COMM_WORLD); } else { MPI_Recv(&buf, 1, MPI_INT, 0, 0, MPI_COMM_WORLD, &status); printf("rank %d receiving received %d\n", rank, buf); } MPI_Finalize(); return 0; }
Fortran example
program hello include 'mpif.h' integer rank, size, ierror, tag, status(MPI_STATUS_SIZE) call MPI_INIT(ierror) call MPI_COMM_SIZE(MPI_COMM_WORLD, size, ierror) call MPI_COMM_RANK(MPI_COMM_WORLD, rank, ierror) print*, 'node', rank, ': Hello world' call MPI_FINALIZE(ierror) end
Python example
#!/usr/bin/env python from mpi4py import MPI comm = MPI.COMM_WORLD rank = comm.Get_rank() if rank == 0: data = {'key1' : [7, 2.72, 2+3j], 'key2' : ( 'abc', 'xyz')} else: data = None data = comm.bcast(data, root=0) if rank != 0: print ("data is %s and %d" % (data,rank)) else: print ("I am master\n")
Modules Available
The following modules are available for OpenMPI:
- module add gcc/8.2.0 (GNU compiler)
- module add intel/2018 (Intel compiler)
- module add openmpi/3.0.0/gcc-8.2.0
- module add intel/mpi/64/2018
Compilation
C
[username@login01 ~]$ module add gcc/8.2.0 [username@login01 ~]$ module add openmpi/3.0.0/gcc-8.2.0 [username@login01 ~]$ gcc -o testMPI testMPI.c
Fortran
[username@login01 ~]$ module add gcc/8.2.0 [username@login01 ~]$ module add openmpi/3.0.0/gcc-8.2.0 [username@login01 ~]$ mpifort -o testMPI testMPI.f03
Note: mpifort is a new name for the Fortran wrapper compiler that debuted in Open MPI v3.0.0
Usage Examples
Batch Submission
#!/bin/bash #SBATCH -J MPI-testXX #SBATCH -N 10 #SBATCH --ntasks-per-node 28 #SBATCH -o %N.%j.%a.out #SBATCH -e %N.%j.%a.err #SBATCH -p compute #SBATCH --exclusive #SBATCH --mail-user= your email address here echo $SLURM_JOB_NODELIST module purge module add gcc/8.2.0 module add openmpi/3.0.0/gcc-8.2.0 export I_MPI_DEBUG=5 export I_MPI_FABRICS=shm:tmi export I_MPI_FALLBACK=no mpirun -mca pml cm -mca mtl psm2 /home/user/CODE_SAMPLES/OPENMPI/scatteravg 100
[username@login01 ~]$ sbatch MPI-demo.job Submitted batch job 289523
Further Information
- http://mpitutorial.com/tutorials/
- OpenMPI (Wiki)
- https://en.wikipedia.org/wiki/Open_MPI
- https://www.open-mpi.org/
- C Programming
- C++ Programming
- Fortran Programming
- Python Programming