Quickstart/Virtual Environments

From HPC
Revision as of 10:09, 3 November 2022 by Pysdlb


What is a Virtual Environment?

A virtual environment is a named, isolated, working copy of Python that maintains its own files, directories, and paths, so that you can work with specific versions of libraries or of Python itself without affecting other Python projects. Virtual environments can be created with standard Python (using the built-in venv module) and with Conda.
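For comparison with the conda approach used on the rest of this page, a standard-Python virtual environment can be created with the venv module that ships with Python 3. This is a minimal sketch, assuming python3 is on your PATH; the directory $HOME/venvs/myproject is a hypothetical choice, and any writable directory works.

```shell
# Hypothetical location for the environment; any writable directory works
VENV_DIR="$HOME/venvs/myproject"

# Create the environment (its own python, pip and site-packages directory)
python3 -m venv "$VENV_DIR"

# Activate it in the current shell ("." is the portable spelling of "source")
. "$VENV_DIR/bin/activate"

# "python" now resolves to the environment's own interpreter
command -v python
python -c 'import sys; print(sys.prefix)'
```

Run deactivate to return to your normal shell.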

Why should you use a Virtual Environment?

Python has many modules and packages for different applications. A project may require a third-party library, which we install. Without virtual environments, every project installs packages into, and loads them from, the same directory, including projects that do not need that library at all.

A virtual environment solves this by giving each project its own isolated set of packages, so each project installs and loads packages only from its own environment.

As another example, consider a deep learning project using TensorFlow. Suppose you are working on two projects, project-01 and project-02.

If project-01 needs TensorFlow 2.0 and project-02 needs TensorFlow 2.6, both versions would have to be installed into the same directory under the same package name, so only one of them can be present at a time and the other project will fail. Virtual environments avoid this by keeping the dependencies of each project separate.
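The isolation described above can be seen directly: each environment has its own site-packages directory, so two versions of the same package never share a location. This is a minimal sketch using Python's built-in venv module; the $HOME/venvs base directory is hypothetical, and --without-pip is only there to keep the demonstration quick.

```shell
# Hypothetical base directory for per-project environments
BASE="$HOME/venvs"

# One environment per project (--without-pip only makes this demo faster)
python3 -m venv --without-pip "$BASE/project-01"
python3 -m venv --without-pip "$BASE/project-02"

# Ask each environment's own interpreter where it installs packages
SP1=$("$BASE/project-01/bin/python" -c 'import site; print(site.getsitepackages()[0])')
SP2=$("$BASE/project-02/bin/python" -c 'import site; print(site.getsitepackages()[0])')

echo "project-01 packages go to: $SP1"
echo "project-02 packages go to: $SP2"
```

A TensorFlow installed into project-01 and a different TensorFlow installed into project-02 would therefore land in two different directories.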

How to use a Virtual Environment

Creating a virtual environment is simple, saves a considerable amount of time, and avoids many dependency pitfalls when you work with Python libraries.

We will use the following steps:

  1. Load a Python module.
  2. Create the environment.
  3. Add Python packages.
  4. Use the environment in an interactive session or in a submitted batch job.


Environment

Let's load a Python module and build our virtual environment (VE) from there.


Creating a Virtual Environment Using Anaconda on Viper

To create a virtual environment on Viper, load one of the Anaconda modules (here python/anaconda/202111/3.9) and then use the conda create command as follows:

  • IMPORTANT NOTE: By default, a conda environment does not use the Python system packages. However, because of how Viper is configured, Python will still see the system packages through the PYTHONPATH environment variable. If you want a clean environment (with no system packages included), set this variable to empty as follows: export PYTHONPATH=
[user@c001 ~ ]$ module load python/anaconda/202111/3.9
[user@c001 ~ ]$ conda create -n tensorflow1

The above command creates a new virtual environment called tensorflow1.
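To see why the PYTHONPATH note matters, you can ask Python whether a directory listed in PYTHONPATH ends up on its module search path (sys.path). This is a minimal sketch; /tmp/example-site is a hypothetical directory standing in for a system package location, and it does not need to exist.

```shell
# Hypothetical directory standing in for a system package location
EXTRA=/tmp/example-site

# With PYTHONPATH set, the directory is added to Python's search path
WITH=$(PYTHONPATH="$EXTRA" python3 -c 'import sys; print(sys.argv[1] in sys.path)' "$EXTRA")

# With PYTHONPATH empty, Python ignores it and the directory is not added
WITHOUT=$(PYTHONPATH= python3 -c 'import sys; print(sys.argv[1] in sys.path)' "$EXTRA")

echo "on sys.path with PYTHONPATH set:   $WITH"
echo "on sys.path with PYTHONPATH empty: $WITHOUT"
```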

To activate this virtual environment you would issue the following command:

[user@c001 ~ ]$ conda activate tensorflow1

On successful activation of this virtual environment, you should see the name of your environment in front of your prompt, like so:

(tensorflow1)  [user@c001 ~ ]$

To leave the virtual environment, run conda deactivate. Pressing Ctrl+D instead exits the shell entirely, which also ends an interactive session.
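After activating or deactivating an environment, you can check which one your shell is actually using. This is a short sketch: CONDA_DEFAULT_ENV is the variable conda sets to the name of the active environment, and sys.prefix is the installation directory of the interpreter that will run your code.

```shell
# Name of the active conda environment (empty when none is active)
ACTIVE_ENV="${CONDA_DEFAULT_ENV:-}"
echo "active conda environment: ${ACTIVE_ENV:-none}"

# The interpreter your commands will run, and the environment it belongs to
command -v python3
PY_PREFIX=$(python3 -c 'import sys; print(sys.prefix)')
echo "python3 prefix: $PY_PREFIX"

# Should be empty if you cleared PYTHONPATH as recommended above
echo "PYTHONPATH: ${PYTHONPATH:-<empty>}"
```

When tensorflow1 is active and has Python installed in it, the prefix should end in envs/tensorflow1.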

Adding packages

Once the environment is active, you can add whatever packages you need with the conda install command. For example (the output below comes from an older conda release, so your package versions and paths will differ):

(tensorflow1) [user@c001 ~ ]$ conda install numpy
Fetching package metadata ...............
Solving package specifications:

Package plan for installation in environment /home/t01/t01/user/miniconda3:

The following NEW packages will be INSTALLED:

    blas:        1.1-openblas                  conda-forge
    libgfortran: 3.0.0-1                                  
    numpy:       1.14.0-py36_blas_openblas_200 conda-forge [blas_openblas]
    openblas:    0.2.20-7                      conda-forge

The following packages will be UPDATED:

    conda:       4.3.31-py36_0                             --> 4.3.33-py36_0 conda-forge

The following packages will be SUPERSEDED by a higher-priority channel:

    conda-env:   2.6.0-h36134e3_1                          --> 2.6.0-0       conda-forge

Proceed ([y]/n)? y
  • For some packages you may also need to specify a channel, such as conda-forge, with the -c option. For example, the following command installs the pygobject module from conda-forge:


(tensorflow1) [user@c001]$ conda install -c conda-forge pygobject 
  • To create an environment with a specific version of a package:
[user@c001]$ conda create -n myenv scipy=0.15.0
  • or to also pin the Python version (here 3.4):
[user@c001]$ conda create -n myenv python=3.4 scipy=0.15.0 astroid babel

Clone an environment

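To make a copy of an existing environment, give the new name with -n and the existing environment with --clone:

[user@c001]$ conda create -n NewENV --clone OriginalENV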

Removing an environment

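To delete a conda environment, enter the following, where EnvironmentNAME is the name of the environment you wish to delete. Deactivate the environment first, because conda will not remove the environment that is currently active.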

[user@c001]$ conda remove --name EnvironmentNAME --all


Using your environment in an interactive session

To test and debug your program, we recommend using an interactive session; interactive sessions are also useful for programs with short runtimes.

  • For a CPU-based node, use the commands:
[user@login01]$ interactive
[user@c001]$ conda activate pytorch01
(pytorch01) [user@c001]$ python mypytorchprogram.py
  • For a GPU-based node, use these commands instead:
[user@login01]$ interactive -pgpu
[user@gpu02]$ conda activate pytorch01
(pytorch01) [user@gpu02]$ python mypytorchprogram.py

If you see an error like this:

CommandNotFoundError: Your shell has not been properly configured to use 'conda activate'.
To initialize your shell, run

    $ conda init <SHELL_NAME>

Currently, supported shells are:
  - bash
  - fish
  - tcsh
  - xonsh
  - zsh
  - powershell

If you use the bash shell, type:

$ conda init bash

Then log out and start a new session, so that your shell reads the new conda settings that conda init writes to your ~/.bashrc.


Using a BATCH script with a virtual environment

Here are two example batch scripts that use a conda virtual environment.

  • Replace /home/<user> with the path to your own home directory.

Compute Node Example

#!/bin/bash
#SBATCH -J BUILDCPU
#SBATCH -N 1
#SBATCH --ntasks-per-node 12
#SBATCH -D /home/<user>/
#SBATCH -o debug.out
#SBATCH -e debug.err
#SBATCH -p compute
#SBATCH -t 00:10:00
#SBATCH --mail-user=<your email address>

echo $SLURM_JOB_NODELIST

module purge
module load python/anaconda/20220712/3.9

# Activate the environment and put its bin directory first on PATH
source activate /home/<user>/.conda/envs/bioinformatics1
export PATH=/home/<user>/.conda/envs/bioinformatics1/bin:${PATH}

python /home/<user>/TATT-CPU.py


GPU Node Example

#!/bin/bash
#SBATCH -J BIDGPU
#SBATCH -N 1
#SBATCH --ntasks-per-node 1
#SBATCH -D /home/<user>/
#SBATCH -o debug.out
#SBATCH -e debug.err
#SBATCH --gres=gpu:tesla
#SBATCH -p gpu
#SBATCH -t 00:10:00
#SBATCH --mail-user=<your email address>

echo $SLURM_JOB_NODELIST

module purge
module load gcc/5.2.0
module load  python/anaconda/20220712/3.9
module load cuda/11.5.0

# Activate the environment and put its bin directory first on PATH
source activate /home/<user>/.conda/envs/bioinformatics1
export PATH=/home/<user>/.conda/envs/bioinformatics1/bin:${PATH}

python /home/<user>/TATT-GPU.py
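As an alternative to combining source activate with a manual PATH export, you could set up conda inside the job itself and use conda activate, which puts the environment's bin directory on PATH for you. This is a sketch of the relevant lines only, assuming the same Anaconda module and the bioinformatics1 environment path from the examples above; conda shell.bash hook performs, for this job's shell only, the setup that conda init would otherwise add to ~/.bashrc. In the GPU script, keep the extra gcc and cuda module loads.

```shell
# Load the same Anaconda module as in the scripts above
module purge
module load python/anaconda/20220712/3.9

# Set up "conda activate" for this job's shell without relying on ~/.bashrc
eval "$(conda shell.bash hook)"

# Activate the environment by its path; conda prepends its bin directory to PATH
conda activate /home/<user>/.conda/envs/bioinformatics1

python /home/<user>/TATT-CPU.py
```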

Creation of a Virtual Environment in Anaconda Using a YAML File

Some projects (particularly on GitHub) provide a YAML file that describes which packages to install.

  • To create a virtual environment from a YAML file you would issue the following command:
[user@c001 ~ ]$ conda env create -f myenv.yml

The above command creates a virtual environment from the YAML file myenv.yml. The environment is named after the name: field inside the file (ytenv here), not after the filename. Below is a copy of the contents of myenv.yml.

name: ytenv
channels:
- defaults
dependencies:
- ca-certificates=2017.08.26=h1d4fec5_0
- certifi=2018.1.18=py27_0
- intel-openmp=2018.0.0=hc7b2577_8
- libedit=3.1=heed3624_0
- libffi=3.2.1=hd88cf55_4
- libgcc-ng=7.2.0=h7cc24e2_2
- libgfortran-ng=7.2.0=h9f7466a_2
- libstdcxx-ng=7.2.0=h7a57d05_2
- mkl=2018.0.1=h19d6760_4
- ncurses=6.0=h9df7e31_2
- numpy=1.14.0=py27h3dfced4_1
- openssl=1.0.2n=hb7f436b_0
- pip=9.0.1=py27ha730c48_4
- python=2.7.14=h1571d57_29
- readline=7.0=ha6073c6_4
- setuptools=38.4.0=py27_0
- sqlite=3.22.0=h1bed415_0
- tk=8.6.7=hc745277_3
- wheel=0.30.0=py27h2bc6bb2_1
- zlib=1.2.11=ha838bed_2
- pip:
  - backports.functools-lru-cache==1.5
  - backports.shutil-get-terminal-size==1.0.0
  - cycler==0.10.0
  - decorator==4.2.1
  - enum34==1.1.6
  - h5py==2.7.1
  - ipython==5.5.0
  - ipython-genutils==0.2.0
  - matplotlib==2.1.2
  - mpmath==1.0.0
  - pathlib2==2.3.0
  - pexpect==4.4.0
  - pickleshare==0.7.4
  - prompt-toolkit==1.0.15
  - ptyprocess==0.5.2
  - pygments==2.2.0
  - pyparsing==2.2.0
  - python-dateutil==2.6.1
  - pytz==2018.3
  - scandir==1.7
  - simplegeneric==0.8.1
  - six==1.11.0
  - subprocess32==3.2.7
  - sympy==1.1.1
  - traitlets==4.3.2
  - wcwidth==0.1.7
  - yt==3.4.1
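If you want to write such a file yourself rather than receive one, it can be much shorter than the example above. This is a minimal sketch that writes one from the shell; the environment name myproject and the package list are hypothetical examples.

```shell
# Write a minimal environment file (name and packages are hypothetical examples)
cat > environment.yml <<'EOF'
name: myproject
channels:
  - conda-forge
dependencies:
  - python=3.9
  - numpy
  - pip
  - pip:
      - requests
EOF

# Then create the environment from it on Viper, for example:
#   conda env create -f environment.yml
```

Leaving out version numbers lets conda pick compatible current versions; pin them, as myenv.yml does, when you need to reproduce an environment exactly.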

Exporting a Virtual Environment in Anaconda to a YAML File

To export a virtual environment to a YAML file, so that you or another researcher can recreate it with Anaconda, use the following steps:

  • Activate the virtual environment you wish to export:
[user@c001 ~ ]$ conda activate tensorflow1
  • Export the active virtual environment with the following command:
(tensorflow1) [user@c001 ~ ]$ conda env export > tensorflow1.yml

Virtual Environment Tips

  • Avoid calling pip by itself. Running python -m pip guarantees that you use the pip belonging to the python you are calling, instead of a pip that may belong to a different Python installation.
  • We recommend using a separate virtual environment for each project.
  • You should never copy or move virtual environments around. Always create new ones, or recreate them from YAML exports.
  • Keep virtual environment directories out of repositories (e.g. GitHub, GitLab), for example by listing them in .gitignore.
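The first and last tips above look like this in practice. This is a minimal sketch, run from the top of a project repository; the .venv directory name is a hypothetical example of an environment kept inside a project.

```shell
# Confirm which interpreter this pip belongs to before installing anything
python3 -m pip --version

# Install through the interpreter rather than a bare "pip", for example:
#   python3 -m pip install <package>

# Keep the environment directory out of git, adding the entry only once
grep -qxF '.venv/' .gitignore 2>/dev/null || echo '.venv/' >> .gitignore
```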
