Difference between revisions of "Quickstart/Batch Jobs"

From HPC
m (More detailed scripts)
Revision as of 14:26, 9 November 2022

What is a batch job?

Quickstart/Slurm introduced the Slurm scheduler, and Quickstart/Interactive covered one way of using Viper: interactive sessions. The other way to use an HPC system like Viper is to submit jobs that run automatically, without interaction, when the requested resource becomes available. We call these batch jobs.

To run a batch job, we need to tell Slurm what we want to do. We do this with a job submission script, which acts as a recipe for the job.

What is a Slurm script?

The submission script is a text file that provides information to Slurm about the task you are running so that it can be allocated to the appropriate resource, and sets up the environment so the task can run. A minimal submission script has three main components:

  • A set of directives that provides Slurm with some high-level information such as what resource is required, a job name, how long the task should run for and where to log any output that would normally be shown on the screen.
  • Information about how the job environment should be set up, for example, which application modules (see Quickstart/Modules) should be loaded.
  • The actual command(s) that need to be run.

Slurm Scripts

For ease of use please give your .job files a descriptive name. While you can specify the directory, it is easier to create and submit the script from the working directory.

#!/bin/bash

All Slurm scripts must start with #!/bin/bash (called a shebang).

#!/bin/bash

#SBATCH

The following #SBATCH directives are essential to a Slurm script. Other directives you can include are covered in further topics.

Please note these are case sensitive.

#SBATCH -J jobname

#SBATCH -J jobname tells Slurm the name of your job - this is what you will see when you run squeue.

#!/bin/bash
#SBATCH -J jobname

#SBATCH -n

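#SBATCH -n Number tells Slurm how many tasks you would like to run. By default each task is given one core, so this is effectively the number of cores you would like to use.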

#!/bin/bash
#SBATCH -J jobname
#SBATCH -n 4
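Inside a running job, Slurm makes the requested task count available in the SLURM_NTASKS environment variable, which can save writing the number a second time in your commands. The following is a minimal sketch, assuming a hypothetical helper named task_count that falls back to 1 when the variable is unset (for example, when trying the script outside Slurm):

```shell
#!/bin/bash
# task_count is a hypothetical helper for illustration, not part of Slurm.
# SLURM_NTASKS is set by Slurm inside a job when -n is given;
# outside a job it is normally unset, so fall back to 1.
task_count() {
    echo "${SLURM_NTASKS:-1}"
}

echo "Running with $(task_count) task(s)"
```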

#SBATCH -o

#SBATCH -o %N.%j.out tells Slurm where you would like standard output to be written. %N is the name of the node the job runs on and %j is the assigned job number. Though this can be named anything, it must end in .out.

#!/bin/bash
#SBATCH -J jobname
#SBATCH -n 4
#SBATCH -o %N.%j.out

#SBATCH -e

#SBATCH -e %N.%j.err tells Slurm where you would like standard error to be written. %N is the name of the node the job runs on and %j is the assigned job number. Though this can be named anything, it must end in .err.

#!/bin/bash
#SBATCH -J jobname
#SBATCH -n 4
#SBATCH -o %N.%j.out
#SBATCH -e %N.%j.err
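To see what file names the %N.%j patterns produce, the substitution can be imitated in plain bash. This is only an illustration: Slurm performs the real expansion itself, and the helper expand_pattern, the node name c001 and the job number used here are made-up values.

```shell
#!/bin/bash
# expand_pattern is a hypothetical helper that mimics how Slurm fills in
# %N (node name) and %j (job number) in -o / -e file name patterns.
# Usage: expand_pattern PATTERN NODE JOBID
expand_pattern() {
    local pattern=$1 node=$2 jobid=$3
    pattern=${pattern//\%N/$node}    # replace every literal %N
    pattern=${pattern//\%j/$jobid}   # replace every literal %j
    echo "$pattern"
}

expand_pattern "%N.%j.out" c001 289522   # prints c001.289522.out
expand_pattern "%N.%j.err" c001 289522   # prints c001.289522.err
```

Because the job number is part of the name, each submission writes to its own pair of files rather than overwriting the previous run's output.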

#SBATCH -p

#SBATCH -p Partition tells Slurm what partition you would like to use: compute/highmem/gpu. Please note highmem and gpu jobs have additional #SBATCH requirements.

#!/bin/bash
#SBATCH -J jobname
#SBATCH -n 4
#SBATCH -o %N.%j.out
#SBATCH -e %N.%j.err
#SBATCH -p compute

#SBATCH -t

#SBATCH -t HH:MM:SS (long form: --time=HH:MM:SS) tells Slurm the maximum time the job may run for. The job ends automatically when it finishes, and it is terminated if it takes longer than the specified time.

#!/bin/bash
#SBATCH -J jobname
#SBATCH -n 4
#SBATCH -o %N.%j.out
#SBATCH -e %N.%j.err
#SBATCH -p compute
#SBATCH -t 02:30:00
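When choosing a time limit it can help to see the HH:MM:SS value as a plain number of seconds. The sketch below assumes a hypothetical helper called hms_to_seconds; it is not a Slurm command and only does the arithmetic.

```shell
#!/bin/bash
# hms_to_seconds is a hypothetical helper, not a Slurm command.
# It converts an HH:MM:SS time limit into seconds.
hms_to_seconds() {
    local h m s
    IFS=: read -r h m s <<< "$1"
    # 10# forces base 10 so values such as 08 are not read as octal
    echo $(( 10#$h * 3600 + 10#$m * 60 + 10#$s ))
}

hms_to_seconds 02:30:00   # prints 9000
```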

Environment Setup

The next part of the script sets up the environment that the job will run in.

module purge

module purge unloads any previously loaded modules. It is good practice to start this section with it.

#!/bin/bash
#SBATCH -J jobname
#SBATCH -n 4
#SBATCH -o %N.%j.out
#SBATCH -e %N.%j.err
#SBATCH -p compute
#SBATCH -t 02:30:00

module purge

module add modulename

Just as you would in an interactive session, add any modules you need.

#!/bin/bash
#SBATCH -J jobname
#SBATCH -n 4
#SBATCH -o %N.%j.out
#SBATCH -e %N.%j.err
#SBATCH -p compute
#SBATCH -t 02:30:00

module purge
module add anaconda

Commands

This section is where you tell Slurm what you would like it to run, mostly the same commands you would use in an interactive session. If you are not sure which commands to use for an application, check its page under Modules Available.

#!/bin/bash
#SBATCH -J helloWorld
#SBATCH -n 4
#SBATCH -o %N.%j.out
#SBATCH -e %N.%j.err
#SBATCH -p compute
#SBATCH -t 02:30:00

module purge
module add anaconda

python helloWorld.py

The above batch script is now complete. It will use a compute node with 4 cores for a maximum of 2 hours 30 minutes, and will load the anaconda module before running the Python file helloWorld.py.
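Each short directive also has a long-option spelling in Slurm, which some people find easier to read back later. As a sketch, the snippet below writes a long-option version of the same helloWorld job to a file and counts its directives. The file name helloWorld_long.job is an arbitrary choice for this example, and the snippet only creates the file; it does not submit anything.

```shell
#!/bin/bash
# Write a long-option variant of the helloWorld job script to a file.
# The file name helloWorld_long.job is an arbitrary choice for this sketch.
cat > helloWorld_long.job <<'EOF'
#!/bin/bash
#SBATCH --job-name=helloWorld
#SBATCH --ntasks=4
#SBATCH --output=%N.%j.out
#SBATCH --error=%N.%j.err
#SBATCH --partition=compute
#SBATCH --time=02:30:00

module purge
module add anaconda

python helloWorld.py
EOF

# Count the directives as a quick sanity check
grep -c '^#SBATCH' helloWorld_long.job   # prints 6
```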

How to submit a batch job to Slurm

Submitting a job is easy! Just run sbatch followed by the name of your job script.

[username@login01 ~]$ sbatch PythonTest.job
Submitted batch job 289522

You will now be able to see the status of your batch job by running squeueme.
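If you want to keep the job number for later use, for example to find the matching .out and .err files, it can be pulled out of the message that sbatch prints. A minimal sketch, assuming a hypothetical helper named extract_job_id and using the message from the example above in place of a real submission:

```shell
#!/bin/bash
# extract_job_id is a hypothetical helper, not a Slurm command.
# It takes the last word of sbatch's "Submitted batch job <id>" message.
extract_job_id() {
    local message=$1
    echo "${message##* }"
}

# On a real system the message would come from: message=$(sbatch PythonTest.job)
message="Submitted batch job 289522"
jobid=$(extract_job_id "$message")
echo "Job ID: $jobid"   # prints Job ID: 289522
```

sbatch also accepts a --parsable option that prints the job ID without the surrounding text (followed by the cluster name, if there is one), which may be simpler in scripts.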

More detailed scripts

For more information on batch scripts, visit Advanced Batch Jobs. This includes information on using high memory and GPU nodes, resource management, how to use multiple nodes, and node reservations.


