Quickstart/Batch Jobs

From HPC
Revision as of 14:16, 9 November 2022 by Pysdlb (talk | contribs) (Batch Jobs!!!!)


What is a batch job?

Interactive sessions are intended for development (viewing program output and debug information) and for applications that need some interactivity, such as RStudio. An interactive session is limited to a 12-hour allocation, whereas a batch job can run for up to 5 days.

A Slurm script tells the job scheduler (Slurm) about the task you would like to run.

What is a Slurm script?

A Slurm script is a file that provides information to Slurm about the task you are running so that it can be allocated to the appropriate resource, and then sets up the environment so the task can run.

There are three main components:

  1. A set of directives starting with #SBATCH. These tell Slurm what resources are required, the job name, and where to write log and error files.
  2. Information on the environment that the job should run in, for example which modules should be loaded.
  3. The commands that you would like to run.

Slurm Scripts

For ease of use please give your .job files a descriptive name. While you can specify the directory, it is easier to create and submit the script from the working directory.

#!/bin/bash

All Slurm scripts must start with the line #!/bin/bash (called a shebang), which tells the system to run the script with the bash shell.

#!/bin/bash

#SBATCH

The following #SBATCH directives are essential to a Slurm script. Other directives that you can include are covered in further topics.

Please note these are case sensitive.
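One consequence of case sensitivity is that a line such as #sbatch -n 4 is just an ordinary comment to bash, and as far as we are aware Slurm does not treat it as a directive either, so the mistake can go unnoticed. The following is a minimal sketch of a check you could run before submitting. The helper name list_directives and the throwaway example file are assumptions made for illustration; the check only inspects text and never contacts Slurm.

```shell
# Hypothetical helper: print the lines that start with exactly "#SBATCH".
list_directives() {
    grep -E '^#SBATCH[[:space:]]' "$1"
}

# Throwaway example script; the lower-case line is deliberately wrong.
demo=$(mktemp)
cat > "$demo" <<'EOF'
#!/bin/bash
#SBATCH -J jobname
#sbatch -n 4
EOF

# Only the correctly spelled directive is listed.
list_directives "$demo"

# Count the recognised directives (grep -c prints the number of matching lines).
directive_count=$(grep -cE '^#SBATCH[[:space:]]' "$demo")
echo "directives found: $directive_count"
rm -f "$demo"
```

If the count is lower than the number of directives you wrote, look for a misspelled or lower-case #SBATCH line.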

#SBATCH -J jobname

#SBATCH -J jobname tells Slurm the name of your job - this is what you will see when you run squeue.

#!/bin/bash
#SBATCH -J jobname

#SBATCH -n

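#SBATCH -n Number tells Slurm how many tasks the job will run. By default Slurm allocates one core per task, so this is effectively how many cores you would like to use.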

#!/bin/bash
#SBATCH -J jobname
#SBATCH -n 4

#SBATCH -o

#SBATCH -o %N.%j.out tells Slurm where you would like standard output to be written. %N is replaced by the name of the node the job runs on and %j by the assigned job number. The file can otherwise be named anything, but it should end in .out.

#!/bin/bash
#SBATCH -J jobname
#SBATCH -n 4
#SBATCH -o %N.%j.out

#SBATCH -e

#SBATCH -e %N.%j.err tells Slurm where you would like standard error to be written. %N is replaced by the name of the node the job runs on and %j by the assigned job number. The file can otherwise be named anything, but it should end in .err.

#!/bin/bash
#SBATCH -J jobname
#SBATCH -n 4
#SBATCH -o %N.%j.out
#SBATCH -e %N.%j.err
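To make the placeholders concrete, here is a small sketch that imitates the substitution in plain bash. The helper expand_pattern, the node name c001 and the job number 289522 are made up for illustration; the real substitution is done by Slurm itself when the job starts.

```shell
# Hypothetical helper that mimics how the %N and %j placeholders are filled in.
# Arguments: pattern, node name, job number.
expand_pattern() {
    printf '%s\n' "$1" | sed -e "s/%N/$2/g" -e "s/%j/$3/g"
}

# c001 and 289522 are illustrative values only.
expand_pattern '%N.%j.out' c001 289522   # prints c001.289522.out
expand_pattern '%N.%j.err' c001 289522   # prints c001.289522.err
```

Because the job number is different for every submission, each run gets its own pair of log files rather than overwriting the previous ones.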

#SBATCH -p

#SBATCH -p Partition tells Slurm which partition you would like to use: compute, highmem or gpu. Please note that highmem and gpu jobs have additional #SBATCH requirements.

#!/bin/bash
#SBATCH -J jobname
#SBATCH -n 4
#SBATCH -o %N.%j.out
#SBATCH -e %N.%j.err
#SBATCH -p compute

#SBATCH --time

#SBATCH --time=HH:MM:SS (short form: -t HH:MM:SS) tells Slurm the maximum time the job may run for. The job ends automatically when it finishes, and it is terminated if it takes longer than the specified time.

#!/bin/bash
#SBATCH -J jobname
#SBATCH -n 4
#SBATCH -o %N.%j.out
#SBATCH -e %N.%j.err
#SBATCH -p compute
#SBATCH --time=02:30:00
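When choosing a time limit it can help to convert HH:MM:SS into seconds, for example to compare it with the 5-day batch limit mentioned at the top of this page. The following is a minimal sketch; the helper name hms_to_seconds is an assumption for illustration and is not part of Slurm.

```shell
# Hypothetical helper: convert an HH:MM:SS string into a number of seconds.
hms_to_seconds() {
    local h m s
    IFS=: read -r h m s <<< "$1"
    # The 10# prefix forces base 10 so values such as 08 are not read as octal.
    echo $(( 10#$h * 3600 + 10#$m * 60 + 10#$s ))
}

requested=$(hms_to_seconds 02:30:00)
limit=$(( 5 * 24 * 3600 ))   # 5 days, the batch limit stated on this page

echo "requested: $requested seconds"   # prints requested: 9000 seconds
if [ "$requested" -le "$limit" ]; then
    echo "within the batch limit"
fi
```

Asking for only as much time as the job realistically needs is usually sensible, since the job is terminated at the limit either way.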

Environment Setup

The next part of the script sets up the environment that the job will run in.

module purge

module purge unloads any previously loaded modules. It is good practice to start this section with it.

#!/bin/bash
#SBATCH -J jobname
#SBATCH -n 4
#SBATCH -o %N.%j.out
#SBATCH -e %N.%j.err
#SBATCH -p compute
#SBATCH --time=02:30:00

module purge

module add modulename

Just as you would in an interactive session, add any modules you need.

#!/bin/bash
#SBATCH -J jobname
#SBATCH -n 4
#SBATCH -o %N.%j.out
#SBATCH -e %N.%j.err
#SBATCH -p compute
#SBATCH --time=02:30:00

module purge
module add anaconda

Commands

This section is where you tell Slurm what you would like it to run, using mostly the same commands as you would in an interactive session. If you are not sure which commands to use, please check the pages listed under Modules Available for the specifics.

#!/bin/bash
#SBATCH -J helloWorld
#SBATCH -n 4
#SBATCH -o %N.%j.out
#SBATCH -e %N.%j.err
#SBATCH -p compute
#SBATCH --time=02:30:00

module purge
module add anaconda

python helloWorld.py

The batch script above is now complete. It will use a compute node with 4 cores for a maximum of 2 hours 30 minutes, and will load the anaconda module before running the Python file helloWorld.py.
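Since the #SBATCH lines are comments as far as bash is concerned, a job script can be checked for shell syntax errors before it is submitted. The sketch below writes a shortened copy of the script to a temporary file and runs bash -n on it, which parses the file without executing anything. The temporary file and the variable names are assumptions for illustration. Note that this only catches shell syntax mistakes, not incorrect Slurm directives or missing modules.

```shell
# Write a shortened copy of the job script to a temporary file.
job=$(mktemp)
cat > "$job" <<'EOF'
#!/bin/bash
#SBATCH -J helloWorld
#SBATCH -n 4

module purge
module add anaconda

python helloWorld.py
EOF

# bash -n reads and parses the file but does not run any command in it.
if bash -n "$job"; then
    syntax_ok=yes
else
    syntax_ok=no
fi
echo "syntax check passed: $syntax_ok"
rm -f "$job"
```

On a real script you would simply run bash -n followed by the name of your .job file.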

How to submit a batch job to Slurm

Submitting a job is easy! Just run sbatch followed by the name of your job script, for example sbatch PythonTest.job.

[username@login01 ~]$ sbatch PythonTest.job
Submitted batch job 289522

You will now be able to see the status of your batch job by running squeueme.
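If you want to use the job number later in a shell session, for instance to look up that one job, it can be extracted from the message that sbatch prints. The sketch below works on the sample line from the transcript above so that it runs without Slurm; the variable names are assumptions for illustration.

```shell
# Sample line copied from the transcript above. On a real system it would
# come from: submit_output=$(sbatch PythonTest.job)
submit_output='Submitted batch job 289522'

# Strip everything up to the last space, leaving only the job number.
job_id=${submit_output##* }
echo "$job_id"   # prints 289522
```

The number can then be passed to other Slurm commands, for example squeue -j followed by the job number. sbatch also has a --parsable option that prints the job number on its own, which avoids the string handling.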

More detailed scripts

For more information on batch scripts visit: Advanced Batch Jobs. This includes information on using high memory and GPU nodes, resource management, how to use multiple nodes, and node reservations.
