Quickstart/Batch Jobs

From HPC
Revision as of 16:39, 9 November 2022 by Pysdlb (talk | contribs) (Example Batch Jobs)


What is a batch job?

The Slurm page introduced the Slurm scheduler, and the Interactive page covered one way of using Viper: interactive sessions. The next way of making use of an HPC system like Viper is to submit jobs that run automatically, without interaction, when the resource becomes available. We call these batch jobs.

In order to run a batch job, we need to provide Slurm with information about what we want to do. We do this via a job submission script, which is sort of like a recipe for the job.

What is a Slurm script?

The submission script is a text file that provides information to Slurm about the task you are running so that it can be allocated to the appropriate resource, and sets up the environment so the task can run. A minimal submission script has three main components:

  • A set of directives that provides Slurm with some high-level information such as what resource is required, a job name, how long the task should run for and where to log any output that would normally be shown on the screen.
  • Information about how the job environment should be set up, for example which application modules should be loaded.
  • The actual command(s) that need to be run.
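As a rough sketch of how these three components fit together, the snippet below writes a very small job script to a file and prints it. The file location and the echo command are assumptions made for illustration only; the sections that follow build up a fuller script step by step.

```shell
#!/bin/bash
# Write a minimal three-part job script to a file.
# The path below is hypothetical; on Viper you would normally create the
# file in your working directory with a text editor.
job_file="${TMPDIR:-/tmp}/helloWorld.job"

# The quoted 'EOF' stops the shell expanding anything inside the script text.
cat > "$job_file" <<'EOF'
#!/bin/bash
# 1. Directives: high-level information for Slurm
#SBATCH -J helloWorld
#SBATCH -n 1

# 2. Environment setup
module purge

# 3. The command(s) to run
echo "Hello from $(hostname)"
EOF

# Show the script that was written
cat "$job_file"
```

Nothing is submitted here; the snippet only creates the text file that would later be handed to Slurm.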

Slurm Scripts

For ease of use please give your .job files a descriptive name. While you can specify the directory, it is easier to create and submit the script from the working directory.

#!/bin/bash

All Slurm scripts must start with #!/bin/bash (called a shebang).

#!/bin/bash

#SBATCH

The following #SBATCH directives are essential to a Slurm script. There are other directives you can include, which can be found in further topics.

Please note these are case sensitive.

#SBATCH -J jobname

#SBATCH -J jobname tells Slurm the name of your job - this is what you will see when you run squeue.

#!/bin/bash
#SBATCH -J helloWorld

#SBATCH -n

#SBATCH -n Number tells Slurm how many tasks to allocate. By default each task is given one core, so this is the number of cores you would like to use.

#!/bin/bash
#SBATCH -J helloWorld
#SBATCH -n 4
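Inside a running job, Slurm exports the value requested with -n in the SLURM_NTASKS environment variable, so a script can read it rather than repeating the number by hand. The sketch below assumes a fallback of 1 when the variable is not set (for example when testing outside a job); the variable name ncores is only illustrative.

```shell
#!/bin/bash
# SLURM_NTASKS is set by Slurm inside a job submitted with -n.
# Outside a job it is unset, so fall back to 1 for local testing.
ncores="${SLURM_NTASKS:-1}"
echo "Running with ${ncores} task(s)"
```

Reading the value this way means the core count only has to be changed in one place, the #SBATCH -n line.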

#SBATCH -o

#SBATCH -o %N.%j.out tells Slurm where you would like standard output to be written. %N is the name of the node the job runs on and %j is the assigned job number. Though the file can be named anything, it must end in .out.

#!/bin/bash
#SBATCH -J helloWorld
#SBATCH -n 4
#SBATCH -o %N.%j.out

#SBATCH -e

#SBATCH -e %N.%j.err tells Slurm where you would like standard error to be written. %N is the name of the node the job runs on and %j is the assigned job number. Though the file can be named anything, it must end in .err.

#!/bin/bash
#SBATCH -J helloWorld
#SBATCH -n 4
#SBATCH -o %N.%j.out
#SBATCH -e %N.%j.err
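To make the %N and %j placeholders more concrete, the sketch below imitates the substitution for an example node and job number. Slurm performs this expansion itself when the job starts; the function expand_pattern and the node name c001 are hypothetical and are used here only to show what the resulting file names look like.

```shell
#!/bin/bash
# Illustration only: mimic how %N and %j are filled in for a file name pattern.
# Slurm does this itself; expand_pattern is a hypothetical helper.
expand_pattern() {
    local pattern="$1" node="$2" job_id="$3"
    # Replace every %N with the node name and every %j with the job number
    printf '%s\n' "$pattern" | sed -e "s/%N/${node}/g" -e "s/%j/${job_id}/g"
}

expand_pattern "%N.%j.out" "c001" "289522"   # prints c001.289522.out
expand_pattern "%N.%j.err" "c001" "289522"   # prints c001.289522.err
```

Because the job number is part of the name, each submission writes to its own pair of files rather than overwriting earlier output.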

#SBATCH -p

#SBATCH -p Partition tells Slurm what partition you would like to use: compute/highmem/gpu. Please note highmem and gpu jobs have additional #SBATCH requirements.

#!/bin/bash
#SBATCH -J helloWorld
#SBATCH -n 4
#SBATCH -o %N.%j.out
#SBATCH -e %N.%j.err
#SBATCH -p compute

#SBATCH -t

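#SBATCH -t HH:MM:SS tells Slurm the maximum time the job may run for. The job will end automatically when it has finished, or it will be terminated if it takes longer than the specified time. Note that the short option is separated from its value by a space; the equals sign is only used with the long form, --time=HH:MM:SS.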

#!/bin/bash
#SBATCH -J helloWorld
#SBATCH -n 4
#SBATCH -o %N.%j.out
#SBATCH -e %N.%j.err
#SBATCH -p compute
#SBATCH -t 02:30:00
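If it helps to sanity-check a time limit, the following sketch converts an HH:MM:SS value into whole minutes, rounding any leftover seconds up. The helper name time_to_minutes is hypothetical and is not part of Slurm.

```shell
#!/bin/bash
# Convert an HH:MM:SS time limit into whole minutes (seconds are rounded up).
# time_to_minutes is a hypothetical helper, not a Slurm command.
time_to_minutes() {
    local h m s
    IFS=: read -r h m s <<< "$1"
    # 10# forces base 10 so that values such as "08" are not read as octal
    echo $(( 10#$h * 60 + 10#$m + (10#$s + 59) / 60 ))
}

time_to_minutes "02:30:00"   # prints 150
```

So the 02:30:00 limit used in the example script corresponds to 150 minutes, that is 2 hours 30 minutes.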

Environment Setup

The next part of the script sets up the environment in which the job will run.

module purge

module purge unloads any previously loaded modules. It is good practice to start the environment section with this command.

#!/bin/bash
#SBATCH -J helloWorld
#SBATCH -n 4
#SBATCH -o %N.%j.out
#SBATCH -e %N.%j.err
#SBATCH -p compute
#SBATCH -t 02:30:00

module purge

module add modulename

Just as you would in an interactive session, add any modules you need.

#!/bin/bash
#SBATCH -J helloWorld
#SBATCH -n 4
#SBATCH -o %N.%j.out
#SBATCH -e %N.%j.err
#SBATCH -p compute
#SBATCH -t 02:30:00

module purge
module add python/anaconda/202111/3.9

Commands

This section is where you tell Slurm what you would like it to run, mostly the same as you would in an interactive session. There are some examples of basic batch jobs at the bottom of this page showing commands you may run. If you are not sure which commands to use, check the module pages under Modules Available for the specifics.

#!/bin/bash
#SBATCH -J helloWorld
#SBATCH -n 4
#SBATCH -o %N.%j.out
#SBATCH -e %N.%j.err
#SBATCH -p compute
#SBATCH -t 02:30:00

module purge
module add python/anaconda/202111/3.9

python helloWorld.py

The above batch script is now complete; this will use a compute node with 4 cores for a maximum of 2 hours 30 minutes and will load the anaconda module before running the python file helloWorld.py.
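Before submitting, it can be useful to confirm that a script contains each of the directives described on this page. The sketch below is one possible check: missing_directives is a hypothetical helper that only inspects the text of the file, and the demo file it is run against is created purely for illustration.

```shell
#!/bin/bash
# Hypothetical helper: list which of the directives described on this page
# (-J, -n, -o, -e, -p, -t) are absent from a job script.
missing_directives() {
    local script="$1" flag missing=""
    for flag in J n o e p t; do
        # Look for a line such as "#SBATCH -J name" (the match is case sensitive)
        if ! grep -Eq "^#SBATCH[[:space:]]+-${flag}([[:space:]=]|$)" "$script"; then
            missing="${missing} -${flag}"
        fi
    done
    # Strip the leading space before printing
    echo "${missing# }"
}

# Demo on a deliberately incomplete script
demo="$(mktemp)"
printf '%s\n' '#!/bin/bash' '#SBATCH -J helloWorld' '#SBATCH -n 4' > "$demo"
missing_directives "$demo"   # prints: -o -e -p -t
rm -f "$demo"
```

An empty result means all six directives were found. Scripts that request whole nodes with -N instead of -n will be reported as missing -n, which is expected for that style of job.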

How to submit a batch job to Slurm

Submitting a job is easy! Just run sbatch jobscript.job.

[username@login01 ~]$ sbatch PythonTest.job
Submitted batch job 289522

You will now be able to see the status of your batch job by running squeueme.
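If you want to use the job number in later commands, it can be pulled out of the confirmation line that sbatch prints. The sketch below works on the example line shown above rather than calling sbatch, so it can be tried anywhere; parse_job_id is a hypothetical helper name.

```shell
#!/bin/bash
# Extract the job number from sbatch's confirmation line.
# parse_job_id is a hypothetical helper name.
parse_job_id() {
    local line="$1"
    # Keep only the text after the last space
    echo "${line##* }"
}

# On the cluster this line would come from: output="$(sbatch PythonTest.job)"
output="Submitted batch job 289522"
job_id="$(parse_job_id "$output")"
echo "Job ID: ${job_id}"   # prints: Job ID: 289522
```

As an alternative, sbatch accepts a --parsable option that prints the job number without the surrounding text, which avoids the need for any parsing on a single cluster.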

Example Batch Jobs

R

R should be used on an exclusive node: this is done by the #SBATCH directives #SBATCH -N 1 and #SBATCH --exclusive

#!/bin/bash
#SBATCH -J My_R_job         # Job name, you can change it to whatever you want
#SBATCH -N 1                # Number of nodes 
#SBATCH -o %N.%j.out        # Standard output will be written here
#SBATCH -e %N.%j.err        # Standard error will be written here
#SBATCH -p compute          # Slurm partition, where you want the job to be queued 
#SBATCH -t 01:00:00         # Max time job runs for
#SBATCH --exclusive         # Run on one node without any other users

 
module purge
module add R/4.0.2
 
R CMD BATCH Random.R output.data     #The output from the R interpreter will appear in the file output.data

Matlab

Matlab requires an exclusive node: this is done by the #SBATCH directives #SBATCH -N 1 and #SBATCH --exclusive

#!/bin/bash
#SBATCH -J MATLAB           # Job name, you can change it to whatever you want
#SBATCH -N 1                # Number of nodes (for Matlab should be always one)
#SBATCH -o %N.%j.out        # Standard output will be written here
#SBATCH -e %N.%j.err        # Standard error will be written here
#SBATCH -p compute          # Slurm partition, where you want the job to be queued
#SBATCH -t 01:00:00         # Max time job runs for
#SBATCH --exclusive         # Run on one node without any other users
 
module purge
module add matlab/2016a
 
matlab -nodisplay -nojvm -nodesktop -nosplash -r my_matlab_m_file

Python Virtual Environment

#!/bin/bash
#SBATCH -J matrixMulti      # Job name, you can change it to whatever you want
#SBATCH -n 4                # Number of cores
#SBATCH -o %N.%j.out        # Standard output will be written here
#SBATCH -e %N.%j.err        # Standard error will be written here
#SBATCH -p compute          # Slurm partition, where you want the job to be queued 
#SBATCH -t 01:00:00         # Max time job runs for
#SBATCH --exclusive         # Run on one node without any other users

module purge
module add python/anaconda/202111/3.9

source activate /home/<user>/.conda/envs/numpyenv
python matrixMultiplication.py

OpenFOAM

#!/bin/bash
#SBATCH -J openfoamExample  # Job name, you can change it to whatever you want
#SBATCH -N 1                # Number of nodes 
#SBATCH -o %N.%j.out        # Standard output will be written here
#SBATCH -e %N.%j.err        # Standard error will be written here
#SBATCH -p compute          # Slurm partition, where you want the job to be queued
#SBATCH -t 01:00:00         # Max time job runs for
#SBATCH --exclusive         # Run on one node without any other users

module add openfoam/4.0

export I_MPI_DEBUG=5
export I_MPI_FABRICS=shm:tmi
export I_MPI_FALLBACK=no

mpirun -n 28 -cpu_bind=cores interFoam -parallel

More detailed scripts

For more information on batch scripts, visit: Advanced Batch Jobs. This includes information on using high memory and GPU nodes, resource management, how to use multiple nodes, and node reservations.


