Difference between revisions of "Programming/OpenMPI"

Latest revision as of 15:22, 23 August 2023

Programming Details

MPI defines not only point-to-point communication (e.g., send and receive) but also other communication patterns, such as collective communication. Collective operations are those in which multiple processes take part in a single communication action. In a reliable broadcast, for example, one process holds a message at the beginning of the operation, and at the end of the operation all processes in the group hold that message.

Message-passing performance and resource utilization are the king and queen of high-performance computing. Open MPI was designed to operate at the bleeding edge of high performance: very low latencies for sending short messages, very high short-message injection rates on supported networks, fast ramp-ups to maximum bandwidth for large messages, etc.

The Open MPI code has 3 major code modules:

  • OMPI - MPI code
  • ORTE - the Open Run-Time Environment
  • OPAL - the Open Portable Access Layer


Programming Models

When we look at programming models we consider 2 basic ideas:

  • Serial programming
  • Message-Passing Parallel Programming

In the message-passing model, each process runs together with its own private data, and parallelism is achieved by having these processes cooperate on the same task. The model has the following characteristics and limitations:

  • All variables are private to each process.
  • All communication between processes takes place by sending and receiving messages (hence the name Message Passing Interface).
  • Most message-passing programs use the Single-Program-Multiple-Data (SPMD) model.
  • It is possible to run an MPI program on one or more nodes, although if your program is only ever intended to run on one node you should consider OpenMP instead.
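
The points above can be sketched in plain Python. This is only a model of the SPMD idea, not MPI itself: threads stand in for processes, a `queue.Queue` per rank stands in for the message layer, and the names `run_spmd`, `program` and `mailboxes` are hypothetical helpers introduced for this illustration. Every rank runs the same function, keeps its data in local variables, and shares values only by sending a message.

```python
import queue
import threading


def run_spmd(nprocs, program):
    """Run the same `program` once per rank (hypothetical helper, not an MPI call)."""
    # One mailbox per rank stands in for the message-passing layer.
    mailboxes = [queue.Queue() for _ in range(nprocs)]
    results = [None] * nprocs

    def worker(rank):
        results[rank] = program(rank, nprocs, mailboxes)

    threads = [threading.Thread(target=worker, args=(rank,))
               for rank in range(nprocs)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return results


def program(rank, nprocs, mailboxes):
    # A local variable: each rank has its own copy, never shared directly.
    local_value = (rank + 1) * 10
    if rank == 0:
        # Rank 0 collects one message from every other rank.
        total = local_value
        for _ in range(nprocs - 1):
            total += mailboxes[0].get()
        return total
    # Every other rank sends its private value to rank 0.
    mailboxes[0].put(local_value)
    return None


results = run_spmd(4, program)
print(results)  # rank 0 holds 10 + 20 + 30 + 40 = 100, the others hold None
```

The only difference between ranks is the value of `rank`, which is how an SPMD program decides who sends and who receives.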

Below is a data diagram of OpenMPI:


MPI-01.jpg

Communication modes

  • Sending a message can be either synchronous or asynchronous (e.g. MPI_Ssend is synchronous and MPI_Bsend, a buffered send, is asynchronous).
  • A synchronous send is not completed until the message has started to be received.
  • An asynchronous send completes as soon as the message has gone.
  • Receives are usually synchronous - the receiving process must wait until the message arrives.
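
The difference between the two send modes can be modelled in a few lines of plain Python. This is a sketch of the semantics only, not of how an MPI library implements them: the `Channel` class and its `bsend`, `ssend` and `recv` methods are hypothetical names chosen to echo MPI_Bsend, MPI_Ssend and MPI_Recv, and a thread plays the role of the receiving process.

```python
import queue
import threading


class Channel:
    """Toy one-way channel (hypothetical, for illustration only)."""

    def __init__(self):
        self._q = queue.Queue()
        self.log = []  # order in which operations completed

    def bsend(self, msg):
        # Buffered (asynchronous) send: done once the message is in the buffer.
        self._q.put((msg, None))
        self.log.append(("bsend done", msg))

    def ssend(self, msg):
        # Synchronous send: done only once the receiver has taken the message.
        taken = threading.Event()
        self._q.put((msg, taken))
        taken.wait()
        self.log.append(("ssend done", msg))

    def recv(self):
        # Blocking receive: waits until a message arrives.
        msg, taken = self._q.get()
        self.log.append(("recv", msg))
        if taken is not None:
            taken.set()  # release the waiting synchronous sender
        return msg


ch = Channel()
ch.bsend("a")  # completes although no receiver is running yet


def receiver():
    ch.recv()
    ch.recv()


t = threading.Thread(target=receiver)
t.start()
ch.ssend("b")  # blocks until the receiver has picked up "b"
t.join()

for entry in ch.log:
    print(entry)
```

In the log, `bsend done` appears before any receive, while `ssend done` can only appear after the matching `recv`.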

Communication types

  • Point to point - a single sender transfers data to a single receiver.
  • Broadcast - the same data is transmitted from one root process to all processes.
  • Scatter/Gather data - parts of the data are sent to each process via an MPI_Scatter call for processing. An MPI_Gather call then brings the data back to a root process.
  • Reduction - combine data from several processes to form a single result (i.e. form a global sum, product, max, min, etc.).
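
As a rough illustration of what the collective operations above do to the data, here is a plain-Python model. It does not call MPI: lists indexed by rank stand in for the per-process buffers, and `scatter`, `gather` and `reduce_values` are hypothetical helper names. Like MPI_Scatter, the sketch assumes the data divides evenly between the processes.

```python
import operator
from functools import reduce


def scatter(data, nprocs):
    """Split the root's data into equal chunks, one per rank."""
    if len(data) % nprocs != 0:
        raise ValueError("data must divide evenly between the ranks")
    chunk = len(data) // nprocs
    return [data[r * chunk:(r + 1) * chunk] for r in range(nprocs)]


def gather(per_rank_values):
    """Collect one value from every rank, in rank order, at the root."""
    return list(per_rank_values)


def reduce_values(per_rank_values, op):
    """Combine one value from every rank into a single result."""
    return reduce(op, per_rank_values)


data = list(range(1, 9))             # held by the root: 1..8
chunks = scatter(data, 4)            # one chunk per rank
partial = [sum(c) for c in chunks]   # each rank works on its own chunk
gathered = gather(partial)           # root sees every partial result
total = reduce_values(partial, operator.add)  # global sum
largest = reduce_values(partial, max)         # global max

print(chunks)
print(gathered, total, largest)
```

Gather returns every partial result to the root, whereas a reduction returns only the combined value, which is usually what a global sum or maximum needs.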

Communication considerations

  • Send and receive calls must match. If they do not, there is a danger of deadlock and your program may stall!
  • Most programs do not need to be complicated: scientific codes usually have a simple structure, which in turn leads to simple communication patterns.
  • Use collective communication where possible.
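
The deadlock risk from mismatched calls can be shown with a small plain-Python model. Again this is a sketch, not MPI: two threads stand in for two ranks, queues stand in for the message layer, and `exchange` is a hypothetical helper. A timeout is used only so that the demonstration terminates; a real MPI program in the same situation would simply hang.

```python
import queue
import threading


def exchange(both_receive_first, timeout=0.5):
    """Two ranks swap one message each; returns what each rank ended up with."""
    inbox = [queue.Queue(), queue.Queue()]
    outcome = [None, None]

    def rank_main(rank):
        other = 1 - rank
        try:
            if rank == 1 or both_receive_first:
                # Receive first, then send.
                got = inbox[rank].get(timeout=timeout)
                inbox[other].put("hello from %d" % rank)
            else:
                # Send first, then receive.
                inbox[other].put("hello from %d" % rank)
                got = inbox[rank].get(timeout=timeout)
            outcome[rank] = got
        except queue.Empty:
            # Nothing arrived: this rank was waiting for a send that never came.
            outcome[rank] = "stalled"

    threads = [threading.Thread(target=rank_main, args=(r,)) for r in range(2)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return outcome


print(exchange(both_receive_first=True))   # both ranks wait on each other
print(exchange(both_receive_first=False))  # rank 0 sends first, so the calls match
```

When both ranks post their receive first, neither ever reaches its send. Ordering the calls so that each receive has a matching send already on its way avoids the stall.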


Program Examples

C Example


#include <mpi.h>
#include <stdio.h>

int main(int argc, char** argv)
{
        int rank;
        int buf = 0;
        MPI_Init(&argc, &argv);
        MPI_Comm_rank(MPI_COMM_WORLD, &rank);

        /* Only the root (rank 0) sets the value to be broadcast. */
        if(rank == 0)
        {
                buf = 777;
        }

        /* MPI_Bcast is collective: every rank makes the same call. */
        MPI_Bcast(&buf, 1, MPI_INT, 0, MPI_COMM_WORLD);

        if(rank != 0)
        {
                printf("rank %d received %d\n", rank, buf);
        }
        MPI_Finalize();
        return 0;
}

Fortran example


program hello
   implicit none
   include 'mpif.h'
   integer rank, size, ierror

   call MPI_INIT(ierror)
   call MPI_COMM_SIZE(MPI_COMM_WORLD, size, ierror)
   call MPI_COMM_RANK(MPI_COMM_WORLD, rank, ierror)
   print*, 'node', rank, 'of', size, ': Hello world'
   call MPI_FINALIZE(ierror)
end program hello

Python example


#!/usr/bin/env python

from mpi4py import MPI

comm = MPI.COMM_WORLD
rank = comm.Get_rank()

if rank == 0:
   data = {'key1' : [7, 2.72, 2+3j],
           'key2' : ( 'abc', 'xyz')}
else:
   data = None

data = comm.bcast(data, root=0)

if rank != 0:
        print ("data is %s and %d" % (data,rank))
else:
        print ("I am master\n")

Modules Available

The following modules are available for OpenMPI:

  • module add gcc/8.2.0 (GNU compiler)
  • module add intel/2018 (Intel compiler)
  • module add openmpi/3.0.0/gcc-8.2.0
  • module add intel/mpi/64/2018


Compilation

C


[username@login01 ~]$ module add gcc/8.2.0
[username@login01 ~]$ module add openmpi/3.0.0/gcc-8.2.0
[username@login01 ~]$ mpicc -o testMPI testMPI.c

Fortran


[username@login01 ~]$ module add gcc/8.2.0
[username@login01 ~]$ module add openmpi/3.0.0/gcc-8.2.0
[username@login01 ~]$ mpifort -o testMPI testMPI.f03

Note: mpifort is a new name for the Fortran wrapper compiler that debuted in Open MPI v1.7.


Usage Examples

Batch Submission


#!/bin/bash
#SBATCH -J MPI-testXX
#SBATCH -N 10
#SBATCH --ntasks-per-node 28
#SBATCH -o %N.%j.%a.out
#SBATCH -e %N.%j.%a.err
#SBATCH -p compute
#SBATCH --exclusive
#SBATCH --mail-user=<your email address here>

echo $SLURM_JOB_NODELIST

module purge
module add gcc/8.2.0
module add openmpi/3.0.0/gcc-8.2.0

# The I_MPI_* variables are Intel MPI settings; Open MPI's mpirun does not read them
export I_MPI_DEBUG=5
export I_MPI_FABRICS=shm:tmi
export I_MPI_FALLBACK=no

mpirun -mca pml cm -mca mtl psm2 /home/user/CODE_SAMPLES/OPENMPI/scatteravg 100


[username@login01 ~]$ sbatch MPI-demo.job
Submitted batch job 289523

Next Steps

  • http://mpitutorial.com/tutorials/
  • OpenMPI (Wiki)
  • https://en.wikipedia.org/wiki/Open_MPI
  • Fortran Programming
  • Python Programming