Training/openMP
Introduction
OpenMP allows threaded programming across a shared-memory system, so on our HPC this means using more than one processing core within a single compute node.
A shared-memory computer consists of a number of processing cores together with some memory. A shared-memory system presents a single address space across the whole memory system:
- every processing core can read and write all memory locations in the system
- one logical memory space
- all cores refer to a memory location using the same address
Programming model
Within the shared-memory model we use threads, each of which has access to memory shared with all the other threads. Threads also have the following characteristics:
- Private data can only be accessed by the thread owning it
- Each thread can run simultaneously with other threads, but also asynchronously, so we need to be careful of race conditions.
- Usually we have one thread per processing core, although there may be hardware support for more (e.g. hyper-threading)
Thread Synchronization
As previously mentioned, threads execute asynchronously, which means each thread proceeds through program instructions independently of the other threads.
Although this makes for a very flexible system, we must be very careful that actions on shared variables occur in the correct order.
- e.g. if thread 1 reads a variable before thread 2 has written to it, thread 1 will see a stale or incorrect value; likewise, if updates to a shared variable are made by different threads at the same time, one of the updates may get overwritten.
To prevent this happening we must either use variables that are independent of the different threads (i.e. different parts of an array) or perform some sort of synchronization within the code so that different threads reach the same point at the same time.
First threaded program
The most basic C program looks like the following:

    #include <stdio.h>

    int main() {
        printf("hello world\n");
        return 0;
    }
To thread this we must tell the compiler which parts of the program to make into threads:

    #include <omp.h>
    #include <stdio.h>

    int main() {
        #pragma omp parallel
        {
            printf("hello ");
            printf("world\n");
        }
        return 0;
    }
Let's look at the extra components that make this a parallel threaded program:
- We have an OpenMP include file (#include <omp.h>)
- We use #pragma omp parallel, which tells the compiler that the following region within the { } is going to be executed as threads
To compile this we use the command:
$ gcc -fopenmp myprogram.c -o myprogram (for the gcc compiler), or
$ icc -fopenmp myprogram.c -o myprogram (for the Intel compiler)
And when we run this we would get something like the following:

    $ ./myprogram
    hello hello world
    world
    hello hello world
    world
- Not very coherent, but remember the threads all execute at different times, that is, asynchronously, and this is why we must be very careful about communication between different threads of the same program.
Parallel loops
Loops are the main source of parallelism in many applications. If the iterations of a loop are independent (can be done in any order) then we can share out the iterations between different threads.
- e.g. if we have two threads and the loop
    for (i = 0; i < 100; i++) {
        a[i] += b[i];
    }

we could do iterations 0-49 on one thread and iterations 50-99 on the other. We can think of an iteration, or a set of iterations, as a task.