TensorflowforGPU
From HPC
Latest revision as of 08:54, 21 April 2023
Introduction
This page is for people who intend to use the TensorFlow package on a GPU node. It also touches on the PyTorch package.
Building a Virtual Environment
To build a virtual environment for the GPU nodes you must specify the packages that will run with a GPU.
(A line containing only the : symbol marks superfluous output that has been omitted here.)
[pysdlb@login01 ~]$ module load python/anaconda/20220712/3.9
[pysdlb@login01 ~]$ conda create -n tensorflow01
Collecting package metadata (current_repodata.json): done
Solving environment: done
:
:
# To activate this environment, use
#
#     $ conda activate tensorflow01

[pysdlb@login01 ~]$ conda activate tensorflow01
(tensorflow01) [pysdlb@login01 ~]$ conda install tensorflow-gpu

## Package Plan ##

  environment location: /home/pysdlb/.conda/envs/tensorflow01

  added / updated specs:
    - tensorflow-gpu

The following packages will be downloaded:
:
:
  tensorboard         conda-forge/noarch::tensorboard-2.6.0-pyhd8ed1ab_1
  tensorboard-data-~  conda-forge/linux-64::tensorboard-data-server-0.6.1-py39hd97740a_4
  tensorboard-plugi~  conda-forge/noarch::tensorboard-plugin-wit-1.8.1-pyhd8ed1ab_0
  tensorflow          conda-forge/linux-64::tensorflow-2.6.2-cuda112py39h9333c2f_1
  tensorflow-base     conda-forge/linux-64::tensorflow-base-2.6.2-cuda112py39he9472f8_1
  tensorflow-estima~  conda-forge/linux-64::tensorflow-estimator-2.6.2-cuda112py39h9333c2f_1
  tensorflow-gpu      conda-forge/linux-64::tensorflow-gpu-2.6.2-cuda112py39h0bbbad9_1
  termcolor           conda-forge/noarch::termcolor-1.1.0-pyhd8ed1ab_3
  tk                  conda-forge/linux-64::tk-8.6.12-h27826a3_0
  typing-extensions   conda-forge/noarch::typing-extensions-3.7.4.3-0
  typing_extensions   conda-forge/noarch::typing_extensions-3.7.4.3-py_0
  tzdata              conda-forge/noarch::tzdata-2023c-h71feb2d_0
  urllib3             conda-forge/noarch::urllib3-1.26.15-pyhd8ed1ab_0
  werkzeug            conda-forge/noarch::werkzeug-2.2.3-pyhd8ed1ab_0
  wheel               conda-forge/noarch::wheel-0.40.0-pyhd8ed1ab_0
  wrapt               conda-forge/linux-64::wrapt-1.12.1-py39h3811e60_3
  xz                  conda-forge/linux-64::xz-5.2.6-h166bdaf_0
  yarl                conda-forge/linux-64::yarl-1.8.2-py39hb9d737c_0
  zipp                conda-forge/noarch::zipp-3.15.0-pyhd8ed1ab_0
  zlib                conda-forge/linux-64::zlib-1.2.13-h166bdaf_4

Proceed ([y]/n)?
:
:
Preparing transaction: done
Verifying transaction: done
Executing transaction: \
:
:
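Once the install has finished, it can be worth confirming that the packages are visible to the Python interpreter in the active environment before moving to a GPU node. The following is a minimal sketch using only the standard library; the helper names `is_installed` and `report` are illustrative and not part of any cluster tooling. It only locates the packages and does not import them, so it should be safe to run on a login node without a GPU.

```python
import importlib.util


def is_installed(package: str) -> bool:
    """Return True if the top-level `package` can be found by this interpreter."""
    # find_spec locates a package without importing it, so no GPU
    # libraries are loaded by this check.
    return importlib.util.find_spec(package) is not None


def report(packages):
    """Map each package name to whether it was found."""
    return {name: is_installed(name) for name in packages}


if __name__ == "__main__":
    # "tensorflow" is the import name provided by the tensorflow-gpu conda package.
    for name, found in report(["tensorflow", "torch"]).items():
        print(f"{name}: {'found' if found else 'MISSING'}")
```

If `tensorflow` is reported as missing, the most likely cause is that the environment was not activated before running the check.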
Interactive Session
An interactive session is a good way to test your neural network program, because any problems show up immediately in the terminal.
(tensorflow01) [pysdlb@login01 ~]$ interactive -pgpu
Job ID 3678519 connecting to gpu03, please wait...
Last login: Thu Apr 13 08:11:40 2023 from login01.cluster
[pysdlb@gpu03 ~]$ module load cuda/10.1.168
[pysdlb@gpu03 ~]$ conda activate tensorflow01
(tensorflow01) [pysdlb@gpu03 2_BasicModels]$ python logistic_regression.py
2023-04-21 09:25:02.678493: I tensorflow/core/platform/cpu_feature_guard.cc:137] Your CPU supports instructions that this TensorFlow binary was not compiled to use: SSE4.1 SSE4.2 AVX AVX2 FMA
2023-04-21 09:25:02.846792: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1105] Found device 0 with properties:
name: NVIDIA A40 major: 8 minor: 6 memoryClockRate(GHz): 1.74
pciBusID: 0000:02:00.0
totalMemory: 44.37GiB freeMemory: 521.56MiB
2023-04-21 09:25:02.846830: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1195] Creating TensorFlow device (/device:GPU:0) -> (device: 0, name: NVIDIA A40, pci bus id: 0000:02:00.0, compute capability: 8.6)
etc...
Batch Job
This is the associated batch script, which should be used for well-tested programs:
#!/bin/bash
#SBATCH -J dlb-nodes
#SBATCH -N 1
#SBATCH --ntasks-per-node 1
#SBATCH -D /home/<user>
#SBATCH -o debug-rnn.out
#SBATCH -e debug-rnn.err
#SBATCH -p gpu
#SBATCH --gres=gpu

module load python/anaconda/20220712/3.9
module load cuda/10.1.168
module load gcc/10.2.0

source activate /home/<user>/.conda/envs/tensorflow01
export PATH=/home/<user>/.conda/envs/tensorflow01/bin:${PATH}

python logistic_regression.py
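A batch job has no terminal to watch, so it can help to log which GPUs the scheduler handed to the job at the top of your Python program. The sketch below assumes that Slurm on this cluster exports `CUDA_VISIBLE_DEVICES` for jobs submitted with `--gres=gpu`; that is common behaviour but is not confirmed by this page. The function name `visible_gpus` is illustrative.

```python
import os


def visible_gpus(environ=os.environ):
    """Return the GPU identifiers listed in CUDA_VISIBLE_DEVICES, if any."""
    raw = environ.get("CUDA_VISIBLE_DEVICES", "")
    # The variable is a comma-separated list such as "0" or "0,1".
    return [item.strip() for item in raw.split(",") if item.strip()]


if __name__ == "__main__":
    gpus = visible_gpus()
    if gpus:
        print("GPUs allocated to this job:", ", ".join(gpus))
    else:
        # Assumption: an unset or empty variable means no GPU was allocated.
        print("CUDA_VISIBLE_DEVICES is unset or empty; was --gres=gpu requested?")
```

The message ends up in the file named by `#SBATCH -o`, which gives a quick first check when a job runs more slowly than expected.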
Running on a GPU
Because TensorFlow and PyTorch can run on a CPU as well as a GPU, it is important to check that your model is actually running on the GPU. Adding one of the following snippets to your Python program reports which device is in use.
TensorFlow code
import tensorflow as tf

if tf.test.gpu_device_name():
    print('Default GPU Device: {}'.format(tf.test.gpu_device_name()))
else:
    print("Please install GPU version of TF")
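In a batch job a printed warning is easy to miss, so one option is to stop the program when no GPU is found instead of silently continuing on the CPU. The sketch below is one possible way to do that, not a required pattern. It uses `tf.config.list_physical_devices`, which is part of the TensorFlow 2.x API (the install log above shows TensorFlow 2.6.2); `detect_gpus` and `require_gpu` are hypothetical helper names.

```python
def detect_gpus():
    """Return the names of the GPUs TensorFlow can see ([] if TensorFlow is missing)."""
    try:
        import tensorflow as tf
    except ImportError:
        return []
    return [device.name for device in tf.config.list_physical_devices("GPU")]


def require_gpu(gpu_names):
    """Raise RuntimeError if the list of GPU names is empty, else return it."""
    if not gpu_names:
        raise RuntimeError("No GPU visible to TensorFlow; refusing to run on the CPU.")
    return gpu_names


if __name__ == "__main__":
    try:
        print("Running on:", ", ".join(require_gpu(detect_gpus())))
    except RuntimeError as error:
        # In a real job you would exit here with a non-zero status.
        print("GPU check failed:", error)
```

Whether to abort or fall back to the CPU is a design choice: aborting suits long training runs, while a fallback can be acceptable for short tests.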
PyTorch code
import torch

# setting device on GPU if available, else CPU
device = torch.device('cuda' if torch.cuda.is_available() else 'cpu')
print('Using device:', device)
print()
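Selecting a device does not move anything onto it by itself: in PyTorch both the model and its input tensors have to be placed on the chosen device. The following is a small sketch of that step, using a throwaway linear layer and random data purely for illustration. The function name `run_on_best_device` is hypothetical, and the import guard is only there so the example degrades gracefully where PyTorch is not installed.

```python
try:
    import torch
except ImportError:  # PyTorch is not installed in this environment
    torch = None


def run_on_best_device():
    """Run a tiny model on the GPU if available, else the CPU.

    Returns (device type, output shape), or None when PyTorch is missing.
    """
    if torch is None:
        return None
    device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
    model = torch.nn.Linear(4, 2).to(device)   # move the model's parameters
    batch = torch.randn(8, 4, device=device)   # create the data on the same device
    output = model(batch)
    return output.device.type, tuple(output.shape)


if __name__ == "__main__":
    print("Result:", run_on_best_device())
```

If the model and the data end up on different devices, PyTorch raises a runtime error at the first operation that combines them, so keeping a single `device` variable and using it everywhere avoids that class of mistake.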