TensorflowforGPU
Introduction
This page is for people who intend to use the TensorFlow package on a GPU-based node. It also touches on the PyTorch package.
Building a Virtual Environment
To build a virtual environment for the GPU nodes, you must specify packages that are built to run on a GPU.
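One way to confirm that conda resolved a GPU build is to look at the build string of the package. In the package plan shown on this page, the TensorFlow packages carry a tag such as cuda112. The following is a minimal sketch of that check, not part of any conda tooling. The function name cuda_tag is hypothetical, and the second build string in the example is made up for illustration.

```python
import re
from typing import Optional

# Matches a CUDA tag such as "cuda112" anywhere inside a conda build string.
_CUDA_TAG = re.compile(r"cuda(\d+)")


def cuda_tag(build_string: str) -> Optional[str]:
    """Return the digits of the CUDA tag in a build string, or None if absent.

    Hypothetical helper for illustration; it only inspects the text.
    """
    match = _CUDA_TAG.search(build_string)
    return match.group(1) if match else None


if __name__ == "__main__":
    # Build string copied from the tensorflow package in the plan on this page.
    print(cuda_tag("cuda112py39h9333c2f_1"))
    # Made-up build string without a CUDA tag (assumption, for contrast only).
    print(cuda_tag("py39h1234567_0"))
```

A result of None for the tensorflow package would suggest that a build without a CUDA tag was selected, which is worth checking before submitting work to a GPU node.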
[pysdlb@login01 ~]$ module load python/anaconda/20220712/3.9
[pysdlb@login01 ~]$ conda create -n tensorflow01
Collecting package metadata (current_repodata.json): done
Solving environment: done
:
:
# To activate this environment, use
#
#     $ conda activate tensorflow01

[pysdlb@login01 ~]$ conda activate tensorflow01
(tensorflow01) [pysdlb@login01 ~]$

## Package Plan ##

  environment location: /home/pysdlb/.conda/envs/tensorflow01

  added / updated specs:
    - tensorflow-gpu

The following packages will be downloaded:
:
:
  tensorboard         conda-forge/noarch::tensorboard-2.6.0-pyhd8ed1ab_1
  tensorboard-data-~  conda-forge/linux-64::tensorboard-data-server-0.6.1-py39hd97740a_4
  tensorboard-plugi~  conda-forge/noarch::tensorboard-plugin-wit-1.8.1-pyhd8ed1ab_0
  tensorflow          conda-forge/linux-64::tensorflow-2.6.2-cuda112py39h9333c2f_1
  tensorflow-base     conda-forge/linux-64::tensorflow-base-2.6.2-cuda112py39he9472f8_1
  tensorflow-estima~  conda-forge/linux-64::tensorflow-estimator-2.6.2-cuda112py39h9333c2f_1
  tensorflow-gpu      conda-forge/linux-64::tensorflow-gpu-2.6.2-cuda112py39h0bbbad9_1
  termcolor           conda-forge/noarch::termcolor-1.1.0-pyhd8ed1ab_3
  tk                  conda-forge/linux-64::tk-8.6.12-h27826a3_0
  typing-extensions   conda-forge/noarch::typing-extensions-3.7.4.3-0
  typing_extensions   conda-forge/noarch::typing_extensions-3.7.4.3-py_0
  tzdata              conda-forge/noarch::tzdata-2023c-h71feb2d_0
  urllib3             conda-forge/noarch::urllib3-1.26.15-pyhd8ed1ab_0
  werkzeug            conda-forge/noarch::werkzeug-2.2.3-pyhd8ed1ab_0
  wheel               conda-forge/noarch::wheel-0.40.0-pyhd8ed1ab_0
  wrapt               conda-forge/linux-64::wrapt-1.12.1-py39h3811e60_3
  xz                  conda-forge/linux-64::xz-5.2.6-h166bdaf_0
  yarl                conda-forge/linux-64::yarl-1.8.2-py39hb9d737c_0
  zipp                conda-forge/noarch::zipp-3.15.0-pyhd8ed1ab_0
  zlib                conda-forge/linux-64::zlib-1.2.13-h166bdaf_4

Proceed ([y]/n)?
:
:
Preparing transaction: done
Verifying transaction: done
Executing transaction: \
:
:
(tensorflow01) [pysdlb@login01 ~]$ interactive -pgpu
Job ID 3678519 connecting to gpu03, please wait...
Last login: Thu Apr 13 08:11:40 2023 from login01.cluster
[pysdlb@gpu03 ~]$ module load cuda/10.1.168
[pysdlb@gpu03 ~]$ conda activate tensorflow01
(tensorflow01) [pysdlb@gpu03 2_BasicModels]$ python logistic_regression.py
2023-04-21 09:25:02.678493: I tensorflow/core/platform/cpu_feature_guard.cc:137] Your CPU supports instructions that this TensorFlow binary was not compiled to use: SSE4.1 SSE4.2 AVX AVX2 FMA
2023-04-21 09:25:02.846792: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1105] Found device 0 with properties:
name: NVIDIA A40 major: 8 minor: 6 memoryClockRate(GHz): 1.74
pciBusID: 0000:02:00.0
totalMemory: 44.37GiB freeMemory: 521.56MiB
2023-04-21 09:25:02.846830: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1195] Creating TensorFlow device (/device:GPU:0) -> (device: 0, name: NVIDIA A40, pci bus id: 0000:02:00.0, compute capability: 8.6)
etc....
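Before running a longer job, it can be useful to confirm from Python that TensorFlow sees the GPU on the node. The following is a minimal sketch, assuming a TensorFlow 2.x environment, where tf.config.list_physical_devices is available. The function name list_gpus is hypothetical, and the script returns an empty list if TensorFlow is not installed in the active environment.

```python
def list_gpus():
    """Return the names of GPUs visible to TensorFlow.

    Returns an empty list if TensorFlow is not importable or no GPU is visible.
    Hypothetical helper for illustration; assumes TensorFlow 2.x.
    """
    try:
        import tensorflow as tf
    except ImportError:
        return []
    return [gpu.name for gpu in tf.config.list_physical_devices("GPU")]


if __name__ == "__main__":
    gpus = list_gpus()
    if gpus:
        print("GPUs visible to TensorFlow:", ", ".join(gpus))
    else:
        print("No GPU visible (or TensorFlow is not installed in this environment).")
```

Run this on the GPU node with the environment activated. An empty result on a GPU node usually means it is worth checking that the environment holds a GPU build of TensorFlow and that the CUDA module is loaded.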