Difference between revisions of "Programming/OpenACC"
(Created page with "==Introduction to openACC== ===Example openACC C/C++ code=== <pre> #pragma acc kernels { for (int i=0; i<N; i++) { x[i] = 1.0; y[i] = 2.0; } for (int i=0;...") |
m |
||
Line 1: | Line 1: | ||
==Introduction to openACC== | ==Introduction to openACC== | ||
+ | |||
+ | The first thing to realise when trying to port a code to a GPU is that they do not share the same memory as the CPU. In other words, a GPU does not have direct access to the host memory. The host memory is generally larger, but slower than the GPU memory. To use a GPU, data must therefore be transferred from the main program to the GPU through the PCI bus, which has a much lower bandwidth than either memories. This means that managing data transfer between the host and the GPU will be of paramount importance. Transferring the data and the code onto the device is called offloading. | ||
+ | |||
+ | OpenACC directives are much like OpenMP directives. They take the form of pragma in C/C++, and comments in Fortran. There are several advantages to using directives. First, since it involves very minor modifications to the code, changes can be done incrementally, one pragma at a time. This is especially useful for debugging purpose, since making a single change at a time allows one to quickly identify which change created a bug. Second, OpenACC support can be disabled at compile time. When OpenACC support is disabled, the pragma are considered comments, and ignored by the compiler. This means that a single source code can be used to compile both an accelerated version and a normal version. Third, since all of the offloading work is done by the compiler, the same code can be compiled for various accelerator types: GPUs, or CPUs. It also means that a new generation of devices only requires one to update the compiler, not to change the code. | ||
+ | |||
+ | In the following example, we take a code comprised of two loops | ||
+ | |||
===Example openACC C/C++ code=== | ===Example openACC C/C++ code=== |
Revision as of 14:43, 24 October 2018
Introduction to openACC
The first thing to realise when trying to port a code to a GPU is that they do not share the same memory as the CPU. In other words, a GPU does not have direct access to the host memory. The host memory is generally larger, but slower than the GPU memory. To use a GPU, data must therefore be transferred from the main program to the GPU through the PCI bus, which has a much lower bandwidth than either memories. This means that managing data transfer between the host and the GPU will be of paramount importance. Transferring the data and the code onto the device is called offloading.
OpenACC directives are much like OpenMP directives. They take the form of pragma in C/C++, and comments in Fortran. There are several advantages to using directives. First, since it involves very minor modifications to the code, changes can be done incrementally, one pragma at a time. This is especially useful for debugging purpose, since making a single change at a time allows one to quickly identify which change created a bug. Second, OpenACC support can be disabled at compile time. When OpenACC support is disabled, the pragma are considered comments, and ignored by the compiler. This means that a single source code can be used to compile both an accelerated version and a normal version. Third, since all of the offloading work is done by the compiler, the same code can be compiled for various accelerator types: GPUs, or CPUs. It also means that a new generation of devices only requires one to update the compiler, not to change the code.
In the following example, we take a code comprised of two loops
Example openACC C/C++ code
#pragma acc kernels { for (int i=0; i<N; i++) { x[i] = 1.0; y[i] = 2.0; } for (int i=0; i<N; i++) { y[i] = a * x[i] + y[i]; } }
Example openACC Fortran code
!$acc kernels do i=1,N x(i) = 1.0 y(i) = 2.0 end do y(:) = a*x(:) + y(:) !$acc end kernels