Difference between revisions of "TensorflowforGPU"

From HPC

Revision as of 08:49, 21 April 2023

Introduction

This page is for people intending to use the TensorFlow package on a GPU-based node. It also touches on the PyTorch package.

Building a Virtual Environment

To build a virtual environment for the GPU nodes, you must install the GPU-enabled builds of the packages, such as tensorflow-gpu in the example below.

[pysdlb@login01 ~]$ module load python/anaconda/20220712/3.9
[pysdlb@login01 ~]$ conda create -n tensorflow01
Collecting package metadata (current_repodata.json): done
Solving environment: done
:
:
# To activate this environment, use
#
#     $ conda activate tensorflow01

[pysdlb@login01 ~]$ conda activate tensorflow01
(tensorflow01) [pysdlb@login01 ~]$ conda install tensorflow-gpu

## Package Plan ##

  environment location: /home/pysdlb/.conda/envs/tensorflow01

  added / updated specs:
    - tensorflow-gpu


The following packages will be downloaded:
:
:
  tensorboard        conda-forge/noarch::tensorboard-2.6.0-pyhd8ed1ab_1
  tensorboard-data-~ conda-forge/linux-64::tensorboard-data-server-0.6.1-py39hd97740a_4
  tensorboard-plugi~ conda-forge/noarch::tensorboard-plugin-wit-1.8.1-pyhd8ed1ab_0
  tensorflow         conda-forge/linux-64::tensorflow-2.6.2-cuda112py39h9333c2f_1
  tensorflow-base    conda-forge/linux-64::tensorflow-base-2.6.2-cuda112py39he9472f8_1
  tensorflow-estima~ conda-forge/linux-64::tensorflow-estimator-2.6.2-cuda112py39h9333c2f_1
  tensorflow-gpu     conda-forge/linux-64::tensorflow-gpu-2.6.2-cuda112py39h0bbbad9_1
  termcolor          conda-forge/noarch::termcolor-1.1.0-pyhd8ed1ab_3
  tk                 conda-forge/linux-64::tk-8.6.12-h27826a3_0
  typing-extensions  conda-forge/noarch::typing-extensions-3.7.4.3-0
  typing_extensions  conda-forge/noarch::typing_extensions-3.7.4.3-py_0
  tzdata             conda-forge/noarch::tzdata-2023c-h71feb2d_0
  urllib3            conda-forge/noarch::urllib3-1.26.15-pyhd8ed1ab_0
  werkzeug           conda-forge/noarch::werkzeug-2.2.3-pyhd8ed1ab_0
  wheel              conda-forge/noarch::wheel-0.40.0-pyhd8ed1ab_0
  wrapt              conda-forge/linux-64::wrapt-1.12.1-py39h3811e60_3
  xz                 conda-forge/linux-64::xz-5.2.6-h166bdaf_0
  yarl               conda-forge/linux-64::yarl-1.8.2-py39hb9d737c_0
  zipp               conda-forge/noarch::zipp-3.15.0-pyhd8ed1ab_0
  zlib               conda-forge/linux-64::zlib-1.2.13-h166bdaf_4


Proceed ([y]/n)?
:
:
Preparing transaction: done
Verifying transaction: done
Executing transaction: \
:
:
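
The build strings in the package plan above record what each package was compiled against. As a rough sketch, assuming the conda-forge convention in which `cuda112` denotes CUDA 11.2 and `py39` denotes Python 3.9, the following snippet extracts both values from a package specification. The function name `parse_build` is hypothetical and used only for illustration.

```python
import re

def parse_build(spec):
    """Return (cuda_version, python_version) parsed from a conda package spec.

    Assumes a build string of the form 'cuda<digits>py<digits>', as seen in
    the package plan; returns None if the pattern is absent.
    """
    match = re.search(r"cuda(\d+)py(\d+)", spec)
    if match is None:
        return None
    cuda, py = match.groups()
    # The last digit is taken as the CUDA minor version: '112' -> '11.2'
    cuda_version = f"{cuda[:-1]}.{cuda[-1]}"
    # The first digit is taken as the Python major version: '39' -> '3.9'
    python_version = f"{py[0]}.{py[1:]}"
    return cuda_version, python_version

spec = "conda-forge/linux-64::tensorflow-2.6.2-cuda112py39h9333c2f_1"
print(parse_build(spec))
```

If the CUDA version found this way differs from the cuda module loaded on the node, it may be worth checking which CUDA libraries TensorFlow actually picks up at run time.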

Interactive Session

(tensorflow01) [pysdlb@login01 ~]$ interactive -pgpu
Job ID 3678519 connecting to gpu03, please wait...
Last login: Thu Apr 13 08:11:40 2023 from login01.cluster

[pysdlb@gpu03 ~]$ module load cuda/10.1.168
[pysdlb@gpu03 ~]$ conda activate tensorflow01

(tensorflow01) [pysdlb@gpu03 2_BasicModels]$ python logistic_regression.py

2023-04-21 09:25:02.678493: I tensorflow/core/platform/cpu_feature_guard.cc:137] Your CPU supports instructions that this TensorFlow binary was not compiled to use: SSE4.1 SSE4.2 AVX AVX2 FMA
2023-04-21 09:25:02.846792: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1105] Found device 0 with properties:
name: NVIDIA A40 major: 8 minor: 6 memoryClockRate(GHz): 1.74
pciBusID: 0000:02:00.0
totalMemory: 44.37GiB freeMemory: 521.56MiB
2023-04-21 09:25:02.846830: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1195] Creating TensorFlow device (/device:GPU:0) -> (device: 0, name: NVIDIA A40, pci bus id: 0000:02:00.0, compute capability: 8.6)

etc....
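
The device line in the log above reports total and free GPU memory in different units (GiB and MiB). A minimal sketch, assuming the log format shown above, converts both to MiB and computes the fraction of memory that is free. The helper name `free_fraction` is hypothetical. A small fraction may indicate that another process already holds most of the GPU memory.

```python
import re

# Multipliers to convert each unit that appears in the log to MiB
UNIT_TO_MIB = {"MiB": 1.0, "GiB": 1024.0}

def free_fraction(line):
    """Return freeMemory / totalMemory for a TensorFlow device log line."""
    sizes = {}
    pattern = r"(totalMemory|freeMemory): ([\d.]+)(MiB|GiB)"
    for key, value, unit in re.findall(pattern, line):
        sizes[key] = float(value) * UNIT_TO_MIB[unit]
    return sizes["freeMemory"] / sizes["totalMemory"]

line = "totalMemory: 44.37GiB freeMemory: 521.56MiB"
print(f"{free_fraction(line):.1%} of GPU memory is free")
```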

Batch Job

The same script can also be run as a batch job, using a Slurm submission script such as the following:



#!/bin/bash
#SBATCH -J dlb-nodes
#SBATCH -N 1
#SBATCH --ntasks-per-node 1
#SBATCH -D /home/<user>
#SBATCH -o debug-rnn.out
#SBATCH -e debug-rnn.err
#SBATCH -p gpu
#SBATCH --gres=gpu


module load cuda/10.1.168
module load gcc/10.2.0

source activate /home/<user>/.conda/envs/tensorflow01
export PATH=/home/<user>/.conda/envs/tensorflow01/bin:${PATH}

python logistic_regression.py
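
When a job requests a GPU through `--gres`, Slurm commonly exposes the allocated devices to the job through the `CUDA_VISIBLE_DEVICES` environment variable, although this depends on how the cluster is configured. The sketch below, which could be placed at the top of a script such as `logistic_regression.py`, reports what the job can see. The function name `describe_allocation` is hypothetical.

```python
import os

def describe_allocation(env=None):
    """Summarise the Slurm job ID and the GPUs visible to this process.

    Assumes Slurm sets SLURM_JOB_ID and, for GPU jobs, CUDA_VISIBLE_DEVICES;
    either may be absent, for example when run on a login node.
    """
    env = os.environ if env is None else env
    job_id = env.get("SLURM_JOB_ID", "not in a Slurm job")
    visible = env.get("CUDA_VISIBLE_DEVICES", "")
    # Split the comma-separated device list, dropping empty entries
    gpus = [d for d in visible.split(",") if d]
    return {"job_id": job_id, "gpus": gpus}

info = describe_allocation()
print("Job:", info["job_id"])
print("Visible GPUs:", info["gpus"] if info["gpus"] else "none")
```

Printing this at the start of a run puts the allocation details in the job's output file, next to any later GPU messages.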

Running on a GPU

Because TensorFlow and PyTorch can run on a CPU as well as a GPU, it is important to make sure the model is actually running on the GPU. Add one of the following checks to your Python program to confirm that a GPU is visible before you start training.

TensorFlow code

import tensorflow as tf

# gpu_device_name() returns an empty string when no GPU is available
if tf.test.gpu_device_name():
    print('Default GPU Device: {}'.format(tf.test.gpu_device_name()))
else:
    print("Please install GPU version of TF")

PyTorch code


import torch
# setting device on GPU if available, else CPU

device = torch.device('cuda' if torch.cuda.is_available() else 'cpu')
print('Using device:', device)
print()

Further Information