Programming/Cuda

Revision as of 11:45, 8 February 2017

Programming Details

CUDA is a parallel computing platform and application programming interface (API) model created by Nvidia. It allows you to program a CUDA-enabled graphics processing unit (GPU) for general-purpose processing.

The CUDA platform is designed to work with programming languages such as C, C++, and Fortran.


Programming example

#include <stdio.h>
#include <math.h>

__global__
void saxpy(int n, float a, float *x, float *y)
{
  int i = blockIdx.x*blockDim.x + threadIdx.x;
  if (i < n)
    y[i] = a*x[i] + y[i];
}

int main(void)
{
  int N = 1<<20;  // 1M elements (1<<31 would overflow a signed int)

  float *x, *y, *d_x, *d_y;
  x = (float*)malloc(N*sizeof(float));
  y = (float*)malloc(N*sizeof(float));

  cudaMalloc(&d_x, N*sizeof(float));
  cudaMalloc(&d_y, N*sizeof(float));

  for (int i = 0; i < N; i++)
  {
    x[i] = 1.0f;
    y[i] = 2.0f;
  }

  cudaMemcpy(d_x, x, N*sizeof(float), cudaMemcpyHostToDevice);
  cudaMemcpy(d_y, y, N*sizeof(float), cudaMemcpyHostToDevice);

  // Perform SAXPY on 1M elements
  saxpy<<<(N+255)/256, 256>>>(N, 2.0f, d_x, d_y);

  cudaMemcpy(y, d_y, N*sizeof(float), cudaMemcpyDeviceToHost);

  float maxError = 0.0f;
  for (int i = 0; i < N; i++)
    maxError = fmaxf(maxError, fabsf(y[i]-4.0f));
  printf("Max error: %f\n", maxError);

  cudaFree(d_x);
  cudaFree(d_y);
  free(x);
  free(y);
}

Modules Available

The following modules are available:

  • module add cuda/6.5.14, or
  • module add cuda/7.5.18


Compilation

The program is compiled with nvcc, NVIDIA's CUDA compiler:

[username@login01 ~]$ module add cuda/7.5.18
[username@login01 ~]$ nvcc -o testGPU testGPU.cu

Usage Examples

Batch example


#!/bin/bash

#SBATCH -J gpu-cuda
#SBATCH -N 1
#SBATCH --ntasks-per-node 1
#SBATCH -o %N.%j.%a.out
#SBATCH -e %N.%j.%a.err
#SBATCH -p gpu
#SBATCH --gres=gpu:tesla
#SBATCH --exclusive

module add cuda/7.5.18

/home/user/CUDA/testGPU


[username@login01 ~]$ sbatch demoCUDA.job
Submitted batch job 290552
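In the batch script above, the -o %N.%j.%a.out and -e %N.%j.%a.err directives name the output and error files after the node (%N), the job ID (%j), and the array task index (%a). A minimal sketch of the resulting filename, using hypothetical values for a job like the one submitted above:

```shell
# SLURM filename pattern illustration: %N = node name, %j = job ID,
# %a = array task index. The values below are hypothetical.
NODE=gpu01
JOBID=290552
ARRAYIDX=1
echo "${NODE}.${JOBID}.${ARRAYIDX}.out"
```

For a non-array job such as this one, %a expands to a placeholder value rather than a meaningful index, which is why the output filenames on the cluster can contain a large number in that position.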

Further Information