Difference between revisions of "Programming/Cuda"
From HPC
Edits by MSummerbell to the sections "Programming example" and "Batch example".
Revision as of 09:27, 8 February 2017
Programming Details
CUDA is a parallel computing platform and application programming interface (API) model created by Nvidia. It allows you to program a CUDA-enabled graphics processing unit (GPU) for general purpose processing.
The CUDA platform is designed to work with programming languages such as C, C++, and Fortran.
Programming example
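#include <stdio.h>
#include <stdlib.h>
#include <math.h>

// SAXPY kernel: each thread computes one element of y = a*x + y.
__global__ void saxpy(int n, float a, float *x, float *y)
{
  int i = blockIdx.x*blockDim.x + threadIdx.x;
  // Guard: the grid is rounded up to whole blocks, so some threads fall past n.
  if (i < n) y[i] = a*x[i] + y[i];
}

int main(void)
{
  int N = 1<<20;  // 1M elements (1<<31 would overflow a 32-bit int)
  float *x, *y, *d_x, *d_y;

  // Allocate host (x, y) and device (d_x, d_y) buffers.
  x = (float*)malloc(N*sizeof(float));
  y = (float*)malloc(N*sizeof(float));
  cudaMalloc(&d_x, N*sizeof(float));
  cudaMalloc(&d_y, N*sizeof(float));

  for (int i = 0; i < N; i++) {
    x[i] = 1.0f;
    y[i] = 2.0f;
  }

  // Copy the input data to the GPU.
  cudaMemcpy(d_x, x, N*sizeof(float), cudaMemcpyHostToDevice);
  cudaMemcpy(d_y, y, N*sizeof(float), cudaMemcpyHostToDevice);

  // Perform SAXPY on 1M elements, using 256 threads per block.
  saxpy<<<(N+255)/256, 256>>>(N, 2.0f, d_x, d_y);

  // Copy the result back; this call waits for the kernel to finish.
  cudaMemcpy(y, d_y, N*sizeof(float), cudaMemcpyDeviceToHost);

  // Every element should equal 2.0*1.0 + 2.0 = 4.0.
  float maxError = 0.0f;
  for (int i = 0; i < N; i++)
    maxError = fmaxf(maxError, fabsf(y[i]-4.0f));
  printf("Max error: %f\n", maxError);

  // Release device and host memory.
  cudaFree(d_x);
  cudaFree(d_y);
  free(x);
  free(y);
  return 0;
}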
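To illustrate how the launch configuration in the example maps threads to array elements, the following plain C sketch emulates the grid on the CPU. It is only an approximation for explanation: the function names blocks_for and saxpy_emulated are hypothetical and not part of CUDA, and the loops run one after another, whereas GPU threads run in parallel.

```c
/* Hypothetical helper: number of blocks needed to cover n elements,
   rounding up, mirroring (N+255)/256 in the kernel launch. */
int blocks_for(int n, int threads_per_block)
{
    return (n + threads_per_block - 1) / threads_per_block;
}

/* CPU emulation of the saxpy launch. The outer loop plays the role of
   blockIdx.x and the inner loop the role of threadIdx.x.
   Returns how many emulated threads were skipped by the i < n guard. */
int saxpy_emulated(int n, float a, const float *x, float *y,
                   int threads_per_block)
{
    int skipped = 0;
    int num_blocks = blocks_for(n, threads_per_block);

    for (int block = 0; block < num_blocks; block++) {
        for (int thread = 0; thread < threads_per_block; thread++) {
            /* Same index formula as in the kernel. */
            int i = block * threads_per_block + thread;
            if (i < n)
                y[i] = a * x[i] + y[i];
            else
                skipped++;  /* thread lies past the end of the arrays */
        }
    }
    return skipped;
}
```

For 1000 elements and 256 threads per block, blocks_for returns 4, so 1024 threads are emulated and the guard skips the last 24. Without the guard, those threads would write past the end of y.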
Modules Available
The following modules are available:
- module load cuda/6.5.14 (or)
- module load cuda/7.5.18
Compilation
Load a CUDA module, then compile the program with nvcc, NVIDIA's CUDA compiler:
[username@login01 ~]$ module add cuda/7.5.18
[username@login01 ~]$ nvcc -o testGPU testGPU.cu
Usage Examples
Batch example
#!/bin/bash
#SBATCH -J gpu-cuda
#SBATCH -N 1
#SBATCH --ntasks-per-node 1
#SBATCH -o %N.%j.%a.out
#SBATCH -e %N.%j.%a.err
#SBATCH -p gpu
#SBATCH --gres=gpu:tesla
#SBATCH --exclusive

module add cuda/7.5.18

/home/user/CUDA/testGPU
[username@login01 ~]$ sbatch demoCUDA.job
Submitted batch job 290552