Training/openMP
Introduction
openMP allows threaded programming across a shared memory system, so on our HPC this means utilizing more than one processing core across one computing node.
A shared memory computer consists of a number of processing cores together with some memory. A shared memory system presents a single address space across the whole memory system:
- every processing core can read and write all memory locations in the system
- one logical memory space
- all cores refer to a memory location using the same address
Programming model
Within the shared memory model we use threads, which can share memory with all the other threads. Threads also have the following characteristics:
- Private data can only be accessed by the thread owning it
- Each thread can run simultaneously with other threads, but also asynchronously, so we need to be careful of race conditions.
- Usually we have one thread per processing core, although there may be hardware support for more (e.g. hyper-threading)
Thread Synchronization
As previously mentioned, threads execute asynchronously, which means each thread proceeds through program instructions independently of the other threads.
Although this makes for a very flexible system, we must be very careful that actions on shared variables occur in the correct order.
- e.g. if thread 1 reads a variable before thread 2 has written to it, the program will use an incorrect value; likewise, if updates to shared variables are made by different threads at the same time, one of the updates may get overwritten.
To prevent this happening we must either use variables that are independent of the different threads (i.e. different parts of an array) or perform some sort of synchronization within the code so that different threads get to the same point at the same time.
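For example, the first approach can be as simple as having each thread write to its own element of a shared array, so no two threads ever touch the same memory location and no explicit synchronization is needed. The following is a minimal sketch (it uses the #pragma omp parallel construct introduced below, and the array size and values are purely illustrative):

<pre>
#include<omp.h>
#include<stdio.h>

int main()
{
  int result[64] = {0};              /* shared array, one slot per thread (assumes at most 64 threads) */

  #pragma omp parallel
  {
    int ID = omp_get_thread_num();   /* unique ID of this thread */
    result[ID] = ID * ID;            /* each thread writes only to its own element */
  }

  /* all threads have finished at the closing brace of the parallel region,
     so it is now safe for a single thread to read every element of result */
  printf("thread 0 wrote %d\n", result[0]);
  return 0;
}
</pre>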
First threaded program
The most basic C program looks like the following:
<pre>
#include<stdio.h>
int main()
{
  printf("hello world\n");
  return 0;
}
</pre>
To thread this we must tell the compiler which parts of the program to make into threads:
<pre>
#include<omp.h>
#include<stdio.h>
int main()
{
  #pragma omp parallel
  {
    printf("hello ");
    printf("world\n");
  }
  return 0;
}
</pre>
Let's look at the extra components that make this a parallel threaded program:
- We have an openMP include file (#include <omp.h>)
- We use #pragma omp parallel, which tells the compiler that the following region within the { } is going to be executed as threads
To compile this we use the command:
$ gcc -fopenmp myprogram.c -o myprogram (for the gcc compiler), or
$ icc -fopenmp myprogram.c -o myprogram (for the Intel compiler)
And when we run this we would get something like the following:
<pre>
$ ./myprogram
hello hello world
world
hello hello world
world
</pre>
- Not very coherent, but remember the threads all executed at different times, that is, asynchronously; this is why we must be very careful about communication between different threads of the same program.
Second threaded program
Although the previous program is threaded, it does not represent a real-world example:
<pre>
#include<omp.h>
#include<stdio.h>

void pooh(int ID, double A[]);      /* user-supplied work routine (defined elsewhere) */

int main()
{
  double A[1000];
  omp_set_num_threads(4);           /* request 4 threads */
  #pragma omp parallel
  {
    int ID = omp_get_thread_num();  /* unique ID of this thread */
    pooh(ID, A);
  }
}
</pre>
Here each thread executes the same code independently; the only difference is that the openMP thread ID is passed to each call of pooh(ID,A).
All threads again wait at the closing brace of the parallel region for every thread to finish before proceeding (i.e. a synchronization barrier).
In this program we always expect 4 threads to be given to us by the underlying operating system. Unfortunately this may not happen and we will be allocated whatever the scheduler is prepared to offer. This could cause us serious program difficulties if we rely on a fixed number of threads every time.
We must ask the openMP library (at runtime) how many threads we actually got; this is done with the following code:
<pre>
#include<omp.h>
#include<stdio.h>

void pooh(int ID, double A[]);          /* user-supplied work routine (defined elsewhere) */

int main()
{
  double A[1000];
  omp_set_num_threads(4);               /* request 4 threads */
  #pragma omp parallel
  {
    int ID = omp_get_thread_num();      /* unique ID of this thread */
    int nthrds = omp_get_num_threads(); /* number of threads actually obtained */
    pooh(ID, A);
  }
}
</pre>
- Each thread calls pooh(ID,A) for ID = 0 to nthrds-1
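pooh() is not defined in these examples; a hypothetical implementation, purely for illustration, could use the thread ID and the thread count to give each thread its own block of the array A:

<pre>
#include<omp.h>

/* Hypothetical work routine: each thread fills its own contiguous block of A.
   The block boundaries are computed from the thread ID and the number of
   threads actually obtained at runtime, so no two threads ever write to the
   same element.  The length of 1000 matches the declaration of A in main(). */
void pooh(int ID, double A[])
{
  int nthrds = omp_get_num_threads();
  int chunk  = 1000 / nthrds;                              /* block size per thread          */
  int start  = ID * chunk;                                 /* first element for this thread  */
  int end    = (ID == nthrds - 1) ? 1000 : start + chunk;  /* last thread takes any remainder */
  int i;

  for (i = start; i < end; i++)
    A[i] = 2.0 * i;                                        /* independent work on this block */
}
</pre>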
Parallel loops
Loops are the main source of parallelism in many applications. If the iterations of a loop are independent (can be done in any order) then we can share out the iterations between different threads.
- e.g. if we have two threads and the loop
<pre>
for (i=0; i<100; i++)
{
  a[i] += b[i];
}
</pre>
we could do iterations 0-49 on one thread and iterations 50-99 on the other. We can think of an iteration, or a set of iterations, as a task.
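As a sketch of how that split could be coded by hand with only the constructs seen so far, each thread can work out its own range of iterations from its ID and the number of threads (the function name and arguments here are just illustrative):

<pre>
#include<omp.h>

#define N 100

void add_arrays(double a[], double b[])
{
  #pragma omp parallel
  {
    int ID     = omp_get_thread_num();
    int nthrds = omp_get_num_threads();
    int chunk  = N / nthrds;                            /* iterations per thread           */
    int start  = ID * chunk;                            /* first iteration for this thread */
    int end    = (ID == nthrds - 1) ? N : start + chunk;
    int i;

    for (i = start; i < end; i++)                       /* with 2 threads: thread 0 does   */
      a[i] += b[i];                                     /* 0-49, thread 1 does 50-99       */
  }
}
</pre>

OpenMP also provides a dedicated work-sharing directive, #pragma omp parallel for, which divides the loop iterations between the threads automatically.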