Training/openMP
Introduction
OpenMP allows threaded programming across a shared-memory system, so on our HPC this means using more than one processing core within a single compute node.
A shared-memory computer consists of a number of processing cores together with some memory. A shared-memory system presents a single address space across the whole memory system:
- every processing core can read and write all memory locations in the system
- one logical memory space
- all cores refer to a memory location using the same address
Programming model
Within the shared-memory model we use threads, each of which has access to memory shared with all the other threads. Threads also have the following characteristics:
- Private data can only be accessed by the thread owning it
- Each thread can run simultaneously with other threads, but also asynchronously, so we need to be careful of race conditions.
- Usually we have one thread per processing core, although there may be hardware support for more (e.g. hyper-threading)
Thread Synchronization
As previously mentioned, threads execute asynchronously, which means each thread proceeds through program instructions independently of the other threads.
Although this makes for a very flexible system, we must be very careful that actions on shared variables occur in the correct order.
- e.g. if thread 1 reads a variable before thread 2 has written to it, thread 1 will see a stale or incorrect value; likewise, if updates to a shared variable are made by different threads at the same time, one of the updates may get overwritten.
To prevent this happening we must either use variables that are independent of the different threads (i.e. different parts of an array) or perform some sort of synchronization within the code so that different threads reach the same point at the same time.
First threaded program
The most basic C program looks like the following:

    #include <stdio.h>

    int main() {
        printf("hello world\n");
        return 0;
    }
To thread this we must tell the compiler which parts of the program to make into threads:

    #include <omp.h>
    #include <stdio.h>

    int main() {
        #pragma omp parallel
        {
            printf("hello ");
            printf("world\n");
        }
        return 0;
    }
Let's look at the extra components that make this a parallel threaded program:
- We have an OpenMP include file (#include <omp.h>)
- We use #pragma omp parallel, which tells the compiler that the following region within the { } is going to be executed as threads
To compile this we use the command:
$ gcc -fopenmp myprogram.c -o myprogram (for the gcc compiler), or
$ icc -fopenmp myprogram.c -o myprogram (for the Intel compiler)
And when we run this we would get something like the following:

    $ ./myprogram
    hello hello world
    world
    hello hello world
    world
- Not very coherent, but remember the threads all execute at different times, that is, asynchronously, and this is why we must be very careful about communication between different threads of the same program.
Parallel loops
Loops are the main source of parallelism in many applications. If the iterations of a loop are independent (can be done in any order) then we can share out the iterations between different threads.
- e.g. if we have two threads and the loop
    for (i = 0; i < 100; i++) {
        a[i] += b[i];
    }

we could do iterations 0-49 on one thread and iterations 50-99 on the other. We can think of an iteration, or a set of iterations, as a task.