Quickstart/Batch Jobs
Latest revision as of 10:12, 3 August 2023
What is a batch job?
Having been introduced to the Slurm scheduler in Slurm, and to one way of using Viper through interactive sessions in Interactive, the next way to make use of an HPC system like Viper is to submit jobs that run automatically, without interaction, when the resource becomes available. We call these batch jobs.
In order to run a batch job, we need to provide Slurm with information about what we want to do. We do this via a job submission script, which is sort of like a recipe for the job.
What is a Slurm script?
The submission script is a text file that provides information to Slurm about the task you are running so that it can be allocated to the appropriate resource, and sets up the environment so the task can run. A minimal submission script has three main components:
- A set of directives that provides Slurm with some high-level information such as what resource is required, a job name, how long the task should run for and where to log any output that would normally be shown on the screen.
- Information about how the job environment should be set up, for example, which application modules should be loaded.
- The actual command(s) that need to be run.
Slurm Scripts
For ease of use please give your .job files a descriptive name. While you can specify the directory, it is easier to create and submit the script from the working directory.
#!/bin/bash
All Slurm scripts must start with #!/bin/bash (called a shebang).
#!/bin/bash
#SBATCH
The following #SBATCH directives are essential to a Slurm script. There are other directives you can include, which are covered in further topics.
Please note these are case sensitive.
#SBATCH -J
#SBATCH -J jobname tells Slurm the name of your job - this is what you will see when you run squeue.
#!/bin/bash
#SBATCH -J helloWorld
#SBATCH -n
#SBATCH -n Number tells Slurm how many tasks to run. By default each task is allocated one core, so this is the number of cores you would like to use.
#!/bin/bash
#SBATCH -J helloWorld
#SBATCH -n 4
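Inside a running job, Slurm exports the value given with -n as the environment variable SLURM_NTASKS, so later parts of the script can reuse it instead of repeating the number by hand. The following is a minimal sketch only: the helper name task_count is hypothetical, and the fallback of 1 (for running the script outside Slurm) is an assumption.

```shell
#!/bin/bash
# task_count is a hypothetical helper name, not part of Slurm.
# Prints the task count requested with "#SBATCH -n", which Slurm exports
# as SLURM_NTASKS; falls back to 1 when run outside a job (assumption).
task_count() {
    echo "${SLURM_NTASKS:-1}"
}

echo "Running with $(task_count) task(s)"
```

With #SBATCH -n 4 in the script, the value reported inside the job would be 4.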
#SBATCH -o
#SBATCH -o %N.%j.out tells Slurm where you would like standard output to be written. %N is the node name and %j is the assigned job number. Although the file can be given any name, it must end in .out.
#!/bin/bash
#SBATCH -J helloWorld
#SBATCH -n 4
#SBATCH -o %N.%j.out
#SBATCH -e
#SBATCH -e %N.%j.err tells Slurm where you would like standard error to be written. %N is the node name and %j is the assigned job number. Although the file can be given any name, it must end in .err.
#!/bin/bash
#SBATCH -J helloWorld
#SBATCH -n 4
#SBATCH -o %N.%j.out
#SBATCH -e %N.%j.err
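To make the %N and %j placeholders concrete, the sketch below imitates the substitution that Slurm performs when the job starts. It is illustrative only: expand_pattern is a hypothetical helper, and the node name c001 is an assumed example rather than a real Viper node.

```shell
#!/bin/bash
# expand_pattern is a hypothetical helper for illustration only; Slurm
# performs this substitution itself when the job starts.
# Usage: expand_pattern PATTERN NODE_NAME JOB_ID
expand_pattern() {
    printf '%s\n' "$1" | sed -e "s/%N/$2/g" -e "s/%j/$3/g"
}

# "c001" is an assumed node name; 289522 is an example job number.
expand_pattern "%N.%j.out" c001 289522   # prints c001.289522.out
expand_pattern "%N.%j.err" c001 289522   # prints c001.289522.err
```

Because the job number is part of the name, each submission writes to its own pair of files rather than overwriting earlier output.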
#SBATCH -p
#SBATCH -p Partition tells Slurm what partition you would like to use: compute/highmem/gpu. Please note highmem and gpu jobs have additional #SBATCH requirements.
#!/bin/bash
#SBATCH -J helloWorld
#SBATCH -n 4
#SBATCH -o %N.%j.out
#SBATCH -e %N.%j.err
#SBATCH -p compute
#SBATCH -t
#SBATCH -t HH:MM:SS tells Slurm the maximum time the job may run for. The job will end automatically when it finishes, or it will be terminated if it takes longer than the specified time.
#!/bin/bash
#SBATCH -J helloWorld
#SBATCH -n 4
#SBATCH -o %N.%j.out
#SBATCH -e %N.%j.err
#SBATCH -p compute
#SBATCH -t 02:30:00
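When choosing a time limit, it can help to see the HH:MM:SS value as a plain number of seconds and compare it with how long the task is expected to take. The sketch below does that conversion. walltime_to_seconds is a hypothetical helper, not a Slurm command, and it assumes the input is always in HH:MM:SS form.

```shell
#!/bin/bash
# walltime_to_seconds is a hypothetical helper, not a Slurm command.
# Converts a time limit in HH:MM:SS form into seconds.
walltime_to_seconds() {
    local h m s
    IFS=: read -r h m s <<< "$1"
    # 10# forces base 10 so that values such as "08" are not read as octal.
    echo $(( 10#$h * 3600 + 10#$m * 60 + 10#$s ))
}

walltime_to_seconds 02:30:00   # prints 9000
```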
Environment Setup
The next part of the script sets up the environment in which the job will run.
module purge
module purge unloads any previously loaded modules. It is good practice to start this section with it.
#!/bin/bash
#SBATCH -J helloWorld
#SBATCH -n 4
#SBATCH -o %N.%j.out
#SBATCH -e %N.%j.err
#SBATCH -p compute
#SBATCH -t 02:30:00

module purge
module add modulename
Add any modules you need, just as you would in an interactive session.
#!/bin/bash
#SBATCH -J helloWorld
#SBATCH -n 4
#SBATCH -o %N.%j.out
#SBATCH -e %N.%j.err
#SBATCH -p compute
#SBATCH -t 02:30:00

module purge
module add python/anaconda/202111/3.9
Commands
This section is where you tell Slurm what you would like it to run, mostly the same as you would in an interactive session. There are examples of basic batch jobs, with the commands you might run, at the bottom of this page. If you are not sure of the specific commands to use, check the module pages under Modules Available.
#!/bin/bash
#SBATCH -J helloWorld
#SBATCH -n 4
#SBATCH -o %N.%j.out
#SBATCH -e %N.%j.err
#SBATCH -p compute
#SBATCH -t 02:30:00

module purge
module add python/anaconda/202111/3.9

python helloWorld.py
The batch script above is now complete: it will use a compute node with 4 cores for a maximum of 2 hours 30 minutes, and will load the Anaconda module before running the Python file helloWorld.py.
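Before submitting, a quick check that the script starts with the shebang and contains each of the directives described above can catch simple omissions. The following is a rough sketch rather than an official tool: check_job_script is a hypothetical helper, and it assumes each directive is written in the short form used on this page (for example #SBATCH -t rather than a long option).

```shell
#!/bin/bash
# check_job_script is a hypothetical helper, not part of Slurm.
# Returns 0 if FILE starts with #!/bin/bash and contains each of the
# short-form directives -J -n -o -e -p -t; otherwise it reports what is
# missing on standard error and returns 1.
check_job_script() {
    local file=$1 flag status=0
    if [ "$(head -n 1 "$file")" != '#!/bin/bash' ]; then
        echo "missing #!/bin/bash on the first line" >&2
        status=1
    fi
    for flag in J n o e p t; do
        if ! grep -q "^#SBATCH -$flag " "$file"; then
            echo "missing #SBATCH -$flag" >&2
            status=1
        fi
    done
    return "$status"
}

# Example use on the login node:
#   check_job_script helloWorld.job && sbatch helloWorld.job
```

The check is case sensitive, so a script that requests nodes with -N instead of cores with -n would be reported as missing -n. Adjust the list of flags for such scripts.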
How to submit a batch job to Slurm
Submitting a job is easy! Just run sbatch jobscript.job.
[username@login01 ~]$ sbatch PythonTest.job
Submitted batch job 289522
You will now be able to see the status of your batch job by running squeue -u your_username.
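If you want to follow one particular job rather than everything under your username, the job number can be taken from the message that sbatch prints. The sketch below extracts it from a line of that form. job_id_from_output is a hypothetical helper, and it assumes the message ends with the job number, as in the example above.

```shell
#!/bin/bash
# job_id_from_output is a hypothetical helper, not a Slurm command.
# Extracts the job number from a line such as "Submitted batch job 289522"
# by keeping only the text after the last space.
job_id_from_output() {
    echo "${1##* }"
}

# On the cluster this might be used as follows (requires sbatch and squeue):
#   jobid=$(job_id_from_output "$(sbatch PythonTest.job)")
#   squeue -j "$jobid"
job_id_from_output "Submitted batch job 289522"   # prints 289522
```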
Exclusive Batch Jobs
For demanding applications such as Matlab, please use exclusive nodes. Such applications are often multithreaded, and if a job or session hasn't requested an appropriate resource, this multithreading can cause contention for CPU resources and negatively impact other users. To request an exclusive node, use #SBATCH -N 1 and #SBATCH --exclusive.
#!/bin/bash
#SBATCH -J helloWorld
#SBATCH -N 1
#SBATCH --exclusive
#SBATCH -o %N.%j.out
#SBATCH -e %N.%j.err
#SBATCH -p compute
#SBATCH -t 02:30:00
Example Batch Jobs
R
R should be used on an exclusive node. This is done with the directives #SBATCH -N 1 and #SBATCH --exclusive.
#!/bin/bash
#SBATCH -J My_R_job      # Job name, you can change it to whatever you want
#SBATCH -N 1             # Number of nodes
#SBATCH -o %N.%j.out     # Standard output will be written here
#SBATCH -e %N.%j.err     # Standard error will be written here
#SBATCH -p compute       # Slurm partition, where you want the job to be queued
#SBATCH -t 01:00:00      # Max time job runs for 1 hour
#SBATCH --exclusive      # Run on one node without any other users

module purge
module add R/4.0.2

R CMD BATCH Random.R output.data
# The output from the R interpreter will appear in the file output.data
Matlab
Matlab requires an exclusive node. This is done with the directives #SBATCH -N 1 and #SBATCH --exclusive.
#!/bin/bash
#SBATCH -J MATLAB        # Job name, you can change it to whatever you want
#SBATCH -N 1             # Number of nodes (for Matlab should be always one)
#SBATCH -o %N.%j.out     # Standard output will be written here
#SBATCH -e %N.%j.err     # Standard error will be written here
#SBATCH -p compute       # Slurm partition, where you want the job to be queued
#SBATCH -t 01:00:00      # Max time job runs for 1 hour
#SBATCH --exclusive      # Run on one node without any other users

module purge
module add matlab/2016a

matlab -nodisplay -nojvm -nodesktop -nosplash -r my_matlab_m_file
Python Virtual Environment
See Virtual Environment Quickstart for setting up a Virtual Environment.
#!/bin/bash
#SBATCH -J matrixMulti   # Job name, you can change it to whatever you want
#SBATCH -n 1             # Number of cores = 1
#SBATCH -o %N.%j.out     # Standard output will be written here
#SBATCH -e %N.%j.err     # Standard error will be written here
#SBATCH -p compute       # Slurm partition, where you want the job to be queued
#SBATCH -t 01:00:00      # Max time job runs for 1 hour
#SBATCH --exclusive      # Run on one node without any other users

module purge
module add python/anaconda/202111/3.9

source activate /home/<user>/.conda/envs/numpyenv
python matrixMultiplication.py
Openfoam
Openfoam should be used on an exclusive node. This is done with the directives #SBATCH -N 1 and #SBATCH --exclusive.
#!/bin/bash
#SBATCH -J openfoamExample  # Job name, you can change it to whatever you want
#SBATCH -N 1                # Number of nodes
#SBATCH -o %N.%j.out        # Standard output will be written here
#SBATCH -e %N.%j.err        # Standard error will be written here
#SBATCH -p compute          # Slurm partition, where you want the job to be queued
#SBATCH -t 01:00:00         # Max time job runs for 1 hour
#SBATCH --exclusive         # Run on one node without any other users

module add openfoam/4.0

export I_MPI_DEBUG=5
export I_MPI_FABRICS=shm:tmi
export I_MPI_FALLBACK=no

mpirun -n 28 -cpu_bind=cores interFoam -parallel
More detailed scripts
For more information on batch scripts, see Advanced Batch Jobs. This includes information on using high memory and GPU nodes, resource management, how to use multiple nodes, and node reservations.