Difference between revisions of "Programming/OpenCL"
==External Links==
* [https://handsonopencl.github.io/ OpenCL tutorial]
* [https://www.nersc.gov/assets/pubs_presos/MattsonTutorialSC14.pdf OpenCL course]
Revision as of 10:32, 7 February 2019
Introduction to OpenCL
OpenCL programming
The first thing to realise when porting code to a GPU is that the GPU does not share the same memory as the CPU. In other words, a GPU does not have direct access to the host memory. The host memory is generally larger, but slower, than the GPU memory. To use a GPU, data must therefore be transferred from the main program to the GPU over the PCI Express bus, which has a much lower bandwidth than either memory. Managing data transfers between the host and the GPU is therefore of paramount importance. Transferring the data and the code onto the device is called offloading.
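To see why transfer management matters, a back-of-the-envelope cost model can help. The sketch below is only an illustration: the function names `offload_time_s` and `offload_pays_off` are hypothetical, and every bandwidth or timing figure passed to them is an assumption supplied by the caller, not a measured value.

```c
#include <stddef.h>

/* Hypothetical cost model: total offload time is the time to copy the
 * input to the device and the result back, plus the kernel run time.
 * All rates and timings are caller-supplied assumptions. */
double offload_time_s(size_t bytes_in, size_t bytes_out,
                      double bus_bytes_per_s, double kernel_time_s)
{
    double transfer_s = (double)(bytes_in + bytes_out) / bus_bytes_per_s;
    return transfer_s + kernel_time_s;
}

/* Returns 1 when offloading is estimated to beat running on the host. */
int offload_pays_off(size_t bytes_in, size_t bytes_out,
                     double bus_bytes_per_s, double kernel_time_s,
                     double host_time_s)
{
    return offload_time_s(bytes_in, bytes_out, bus_bytes_per_s,
                          kernel_time_s) < host_time_s;
}
```

With made-up figures of 1 GB copied in each direction over a 10 GB/s bus, the transfers alone cost 0.2 s. A kernel that runs in 0.05 s is then only worth offloading if the host version takes longer than 0.25 s, which is why keeping data resident on the device between kernels usually pays off.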
OpenCL (Open Computing Language) is a framework for writing programs that execute across heterogeneous platforms consisting of central processing units (CPUs), graphics processing units (GPUs), digital signal processors (DSPs), field-programmable gate arrays (FPGAs) and other processors or hardware accelerators. OpenCL specifies programming languages (based on C99 and C++11) for programming these devices and application programming interfaces (APIs) to control the platform and execute programs on the compute devices. OpenCL provides a standard interface for parallel computing using task- and data-based parallelism.
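Data-based parallelism in OpenCL means that the same kernel body runs once per work-item, each identified by a global index. The plain-C sketch below emulates that execution model serially on the host, so it compiles without an OpenCL installation. The names `vec_add_item` and `run_ndrange_1d` are hypothetical; in real OpenCL C the body would be a `__kernel` function that reads its index from `get_global_id(0)`, and the runtime, not a loop, would launch the work-items.

```c
#include <stddef.h>

/* Body of one work-item: adds a single pair of elements.
 * In OpenCL C this would be a __kernel function, and gid would come
 * from get_global_id(0) instead of being passed as an argument. */
static void vec_add_item(size_t gid, const float *a, const float *b, float *c)
{
    c[gid] = a[gid] + b[gid];
}

/* Serial stand-in for a 1-D NDRange launch: calls the work-item body
 * once for every global index. A real device runs these in parallel. */
void run_ndrange_1d(size_t global_size,
                    const float *a, const float *b, float *c)
{
    for (size_t gid = 0; gid < global_size; ++gid)
        vec_add_item(gid, a, b, c);
}
```

Because each work-item touches only its own index, the iterations are independent, which is what allows a device to execute them concurrently.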
OpenCL vs CUDA
At first glance these two paradigms look quite similar; however, there are some differences:
- CUDA is mature and efficient, and it has many tools and libraries.
However, it can only be used on NVIDIA GPUs.
- OpenCL is designed for many different processors, including CPUs and GPUs from vendors such as AMD and NVIDIA, DSPs and FPGAs (heterogeneous platforms).
However, it is not as mature or as widely used as CUDA C, and its programs tend to be more verbose than their CUDA equivalents.
In the following example, we take a code composed of two loops:
Example OpenACC C/C++ code