Programming/Cuda
Programming Details
CUDA is a parallel computing platform and application programming interface (API) model created by Nvidia. It allows you to program a CUDA-enabled graphics processing unit (GPU) for general-purpose processing.
The CUDA platform is designed to work with programming languages such as C, C++, and Fortran.
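As a brief illustrative sketch (not part of the original page), the CUDA runtime API can be called directly from C/C++ to list the GPUs visible to a process; the fields used below come from the standard cudaDeviceProp structure:

#include <stdio.h>
#include <cuda_runtime.h>

int main(void)
{
  // Ask the CUDA runtime how many GPUs are visible to this process
  int count = 0;
  cudaGetDeviceCount(&count);

  for (int dev = 0; dev < count; dev++)
  {
    // Query the static properties of each device
    cudaDeviceProp prop;
    cudaGetDeviceProperties(&prop, dev);
    printf("Device %d: %s, compute capability %d.%d, %zu bytes of global memory\n",
           dev, prop.name, prop.major, prop.minor, prop.totalGlobalMem);
  }
  return 0;
}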
Programming example
#include <stdio.h>
#include <math.h>

__global__
void saxpy(int n, float a, float *x, float *y)
{
  // Global thread index; each thread handles one element
  int i = blockIdx.x*blockDim.x + threadIdx.x;
  if (i < n)
    y[i] = a*x[i] + y[i];
}

int main(void)
{
  int N = 1<<20;
  float *x, *y, *d_x, *d_y;
  x = (float*)malloc(N*sizeof(float));
  y = (float*)malloc(N*sizeof(float));

  cudaMalloc(&d_x, N*sizeof(float));
  cudaMalloc(&d_y, N*sizeof(float));

  for (int i = 0; i < N; i++)
  {
    x[i] = 1.0f;
    y[i] = 2.0f;
  }

  // Copy the input vectors from host to device memory
  cudaMemcpy(d_x, x, N*sizeof(float), cudaMemcpyHostToDevice);
  cudaMemcpy(d_y, y, N*sizeof(float), cudaMemcpyHostToDevice);

  // Perform SAXPY on 1M elements, 256 threads per block
  saxpy<<<(N+255)/256, 256>>>(N, 2.0f, d_x, d_y);

  // Copy the result back and check it against the expected value
  cudaMemcpy(y, d_y, N*sizeof(float), cudaMemcpyDeviceToHost);

  float maxError = 0.0f;
  for (int i = 0; i < N; i++)
    maxError = fmaxf(maxError, fabsf(y[i]-4.0f));
  printf("Max error: %f\n", maxError);

  cudaFree(d_x);
  cudaFree(d_y);
  free(x);
  free(y);
}
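Kernel launches and many CUDA API calls report errors asynchronously, so it is good practice to check for them explicitly. The standalone sketch below (an illustrative addition, not from the original page) shows one common pattern using cudaGetLastError and cudaDeviceSynchronize; the same checks can be applied after the saxpy launch above.

#include <stdio.h>

__global__ void dummy(void) { }

int main(void)
{
  // Launch a trivial kernel, then check both the launch and its execution
  dummy<<<1, 1>>>();

  cudaError_t err = cudaGetLastError();   // errors from the launch itself
  if (err == cudaSuccess)
    err = cudaDeviceSynchronize();        // errors raised while the kernel ran
  if (err != cudaSuccess)
  {
    fprintf(stderr, "CUDA error: %s\n", cudaGetErrorString(err));
    return 1;
  }
  printf("Kernel completed without errors\n");
  return 0;
}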
Compilation
The program is compiled as follows (an optional Intel compiler is also available):
module load cuda/7.5.18
nvcc -o testGPU testGPU.cu
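If required, nvcc can also be told to generate code for a specific GPU architecture via its -arch flag; the compute capability below (sm_35) is only an example and should be matched to the GPUs actually installed:
module load cuda/7.5.18
nvcc -arch=sm_35 -o testGPU testGPU.cu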
Usage Examples
Batch example
#!/bin/bash
#SBATCH -J gpu-cuda
#SBATCH -N 1
#SBATCH --ntasks-per-node 1
#SBATCH -D /home/user/CUDA
#SBATCH -o %N.%j.%a.out
#SBATCH -e %N.%j.%a.err
#SBATCH -p gpu
#SBATCH --gres=gpu:tesla
#SBATCH --exclusive

module load cuda/7.5.18

/home/user/CUDA/testGPU
[username@login01 ~]$ sbatch demoCUDA.job
Submitted batch job 290552
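Once submitted, the job can be monitored with the usual Slurm tools (the username below is a placeholder); the program's output appears in the files named by the #SBATCH -o and -e patterns when the job runs:
[username@login01 ~]$ squeue -u username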