Difference between revisions of "TensorflowforGPU"
From HPC
Revision as of 08:49, 21 April 2023
Introduction
This page is for people who intend to use the TensorFlow package on a GPU-based node. It also touches on the PyTorch package.
Building a Virtual Environment
To build a virtual environment for the GPU nodes, you must install the GPU-enabled builds of the packages you need (here, tensorflow-gpu).
[pysdlb@login01 ~]$ module load python/anaconda/20220712/3.9
[pysdlb@login01 ~]$ conda create -n tensorflow01
Collecting package metadata (current_repodata.json): done
Solving environment: done
:
:
# To activate this environment, use
#
#     $ conda activate tensorflow01
[pysdlb@login01 ~]$ conda activate tensorflow01
(tensorflow01) [pysdlb@login01 ~]$ conda install tensorflow-gpu
## Package Plan ##
  environment location: /home/pysdlb/.conda/envs/tensorflow01
  added / updated specs:
    - tensorflow-gpu
The following packages will be downloaded:
:
:
  tensorboard         conda-forge/noarch::tensorboard-2.6.0-pyhd8ed1ab_1
  tensorboard-data-~  conda-forge/linux-64::tensorboard-data-server-0.6.1-py39hd97740a_4
  tensorboard-plugi~  conda-forge/noarch::tensorboard-plugin-wit-1.8.1-pyhd8ed1ab_0
  tensorflow          conda-forge/linux-64::tensorflow-2.6.2-cuda112py39h9333c2f_1
  tensorflow-base     conda-forge/linux-64::tensorflow-base-2.6.2-cuda112py39he9472f8_1
  tensorflow-estima~  conda-forge/linux-64::tensorflow-estimator-2.6.2-cuda112py39h9333c2f_1
  tensorflow-gpu      conda-forge/linux-64::tensorflow-gpu-2.6.2-cuda112py39h0bbbad9_1
  termcolor           conda-forge/noarch::termcolor-1.1.0-pyhd8ed1ab_3
  tk                  conda-forge/linux-64::tk-8.6.12-h27826a3_0
  typing-extensions   conda-forge/noarch::typing-extensions-3.7.4.3-0
  typing_extensions   conda-forge/noarch::typing_extensions-3.7.4.3-py_0
  tzdata              conda-forge/noarch::tzdata-2023c-h71feb2d_0
  urllib3             conda-forge/noarch::urllib3-1.26.15-pyhd8ed1ab_0
  werkzeug            conda-forge/noarch::werkzeug-2.2.3-pyhd8ed1ab_0
  wheel               conda-forge/noarch::wheel-0.40.0-pyhd8ed1ab_0
  wrapt               conda-forge/linux-64::wrapt-1.12.1-py39h3811e60_3
  xz                  conda-forge/linux-64::xz-5.2.6-h166bdaf_0
  yarl                conda-forge/linux-64::yarl-1.8.2-py39hb9d737c_0
  zipp                conda-forge/noarch::zipp-3.15.0-pyhd8ed1ab_0
  zlib                conda-forge/linux-64::zlib-1.2.13-h166bdaf_4
Proceed ([y]/n)?
:
:
Preparing transaction: done
Verifying transaction: done
Executing transaction: \
:
:
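Once the install has finished, it can be worth confirming from inside the activated environment that the package resolves before requesting a GPU node. The following is a minimal sketch using only the Python standard library. The helper name `check_packages` is hypothetical and is not part of the original instructions on this page.

```python
import importlib.util


def check_packages(names):
    """Map each top-level package name to True if Python can locate it."""
    return {name: importlib.util.find_spec(name) is not None for name in names}


if __name__ == "__main__":
    # "tensorflow" is the import name provided by the tensorflow-gpu conda package.
    for name, found in check_packages(["tensorflow"]).items():
        print(f"{name}: {'found' if found else 'NOT found'}")
```

This only checks that the package can be located. It does not import TensorFlow or touch a GPU, so it is safe to run on a login node.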
Interactive Session
(tensorflow01) [pysdlb@login01 ~]$ interactive -pgpu
Job ID 3678519 connecting to gpu03, please wait...
Last login: Thu Apr 13 08:11:40 2023 from login01.cluster
[pysdlb@gpu03 ~]$ module load cuda/10.1.168
[pysdlb@gpu03 ~]$ conda activate tensorflow01
(tensorflow01) [pysdlb@gpu03 2_BasicModels]$ python logistic_regression.py
2023-04-21 09:25:02.678493: I tensorflow/core/platform/cpu_feature_guard.cc:137] Your CPU supports instructions that this TensorFlow binary was not compiled to use: SSE4.1 SSE4.2 AVX AVX2 FMA
2023-04-21 09:25:02.846792: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1105] Found device 0 with properties:
name: NVIDIA A40 major: 8 minor: 6 memoryClockRate(GHz): 1.74
pciBusID: 0000:02:00.0
totalMemory: 44.37GiB freeMemory: 521.56MiB
2023-04-21 09:25:02.846830: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1195] Creating TensorFlow device (/device:GPU:0) -> (device: 0, name: NVIDIA A40, pci bus id: 0000:02:00.0, compute capability: 8.6)
etc....
Batch Job
The same program can be submitted as a batch job with the following Slurm script:
#!/bin/bash
#SBATCH -J dlb-nodes
#SBATCH -N 1
#SBATCH --ntasks-per-node 1
#SBATCH -D /home/<user>
#SBATCH -o debug-rnn.out
#SBATCH -e debug-rnn.err
#SBATCH -p gpu
#SBATCH --gres=gpu

module load cuda/10.1.168
module load gcc/10.2.0

source activate /home/<user>/.conda/envs/tensorflow01
export PATH=/home/<user>/.conda/envs/tensorflow01/bin:${PATH}

python logistic_regression.py
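Inside a job started with `--gres=gpu`, Slurm normally exports `CUDA_VISIBLE_DEVICES` with the indices of the GPUs allocated to the job. As a sketch, and assuming that behaviour holds for this cluster's configuration, a script can check the variable before loading a framework. The function name `allocated_gpus` is hypothetical.

```python
import os


def allocated_gpus(environ=None):
    """Return the GPU identifiers listed in CUDA_VISIBLE_DEVICES, or [] if unset or empty."""
    env = os.environ if environ is None else environ
    raw = env.get("CUDA_VISIBLE_DEVICES", "")
    return [item.strip() for item in raw.split(",") if item.strip()]


if __name__ == "__main__":
    gpus = allocated_gpus()
    # SLURM_JOB_ID is set by Slurm inside a job; "unknown" covers runs outside one.
    job_id = os.environ.get("SLURM_JOB_ID", "unknown")
    if gpus:
        print(f"Job {job_id}: GPU(s) visible: {', '.join(gpus)}")
    else:
        print(f"Job {job_id}: no GPU listed in CUDA_VISIBLE_DEVICES")
```

Running this at the top of a job writes the result to the job's output file, which helps separate a scheduling problem (no GPU allocated) from a framework problem (GPU allocated but not detected).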
Running on a GPU
Because TensorFlow and PyTorch can run on a CPU as well as a GPU, it is important to check that the model is actually running on the GPU. Adding one of the following code snippets to your Python program reports which device will be used.
TensorFlow code
import tensorflow as tf

if tf.test.gpu_device_name():
    print('Default GPU Device: {}'.format(tf.test.gpu_device_name()))
else:
    print("Please install GPU version of TF")
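In TensorFlow 2.x, `tf.config.list_physical_devices('GPU')` is another way to make this check, and it returns every GPU that TensorFlow can see rather than only the default one. The sketch below wraps it in a hypothetical helper, `tensorflow_gpus`, which returns an empty list when TensorFlow cannot be imported. The helper and the `summarize` function are illustrative names, not part of TensorFlow.

```python
def tensorflow_gpus():
    """Return the names of GPUs visible to TensorFlow, or [] if there are none."""
    try:
        import tensorflow as tf
    except ImportError:
        # TensorFlow is not installed in the active environment.
        return []
    return [device.name for device in tf.config.list_physical_devices("GPU")]


def summarize(gpu_names):
    """Build a one-line report from a list of device names."""
    if not gpu_names:
        return "No GPU visible to TensorFlow; the model would run on the CPU."
    return "TensorFlow GPU(s): " + ", ".join(gpu_names)


if __name__ == "__main__":
    print(summarize(tensorflow_gpus()))
```

An empty result on a GPU node usually points to a CPU-only TensorFlow build or to CUDA libraries that could not be loaded, so it is worth checking before starting a long training run.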
PyTorch code
import torch

# setting device on GPU if available, else CPU
device = torch.device('cuda' if torch.cuda.is_available() else 'cpu')
print('Using device:', device)
print()
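Selecting the device is only the first step: tensors and models also have to be moved to it with `.to(device)`. The sketch below shows one way the snippet above might be used. It is an assumption, not code from the original page. The helper `run_on_best_device` is a hypothetical name, and it returns `None` when PyTorch is not installed.

```python
def run_on_best_device():
    """Multiply two small matrices on the GPU if available, else on the CPU.

    Returns the device type used ("cuda" or "cpu"), or None if PyTorch is missing.
    """
    try:
        import torch
    except ImportError:
        return None
    device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
    # Tensors are created on the CPU by default; .to(device) moves them.
    a = torch.ones(2, 3).to(device)
    b = torch.ones(3, 2).to(device)
    product = a @ b  # computed on whichever device holds the operands
    return product.device.type


if __name__ == "__main__":
    print("Matrix product computed on:", run_on_best_device())
```

The same pattern applies to a model: calling `model.to(device)` once, and moving each input batch to the same device, keeps the computation on the GPU.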