Difference between revisions of "General/Batch"
(→Introduction) |
(→GPU batch job) |
||
(42 intermediate revisions by 2 users not shown) | |||
Line 1: | Line 1: | ||
== Introduction == | == Introduction == | ||
− | |||
− | A submission script is a file that provides information to Slurm about the task you are running so that it can be allocated to the appropriate resource, then sets up the environment so the task can run. A minimal submission script has three main components: | + | ===Overview=== |
+ | |||
+ | Viper uses the [[General/Slurm | Slurm]] job scheduler to provide access to compute resources. The scheduler | ||
+ | |||
+ | A submission script is a file that provides information to Slurm about the task you are running so that it can be allocated to the appropriate resource, and then sets up the environment so the task can run. A minimal submission script has three main components: | ||
+ | |||
+ | * A set of directives, starting with '''#SBATCH''', which tell the scheduler about the job such as information about resources required, job name and job log and error files. Although in a normal BASH script anything starting with a '#' would indicate a comment. However, the SLURM interpreter will recognise this as a command to pass to the scheduler. | ||
+ | * Information about how the job environment should be set up, for example, what application [[General/Modules | modules]] should be loaded. | ||
+ | * The actual command(s) that need to be run. | ||
+ | |||
+ | '''Important''' : [[General/Interactive|Interactive sessions]] are only used for development (to view program out and debug information), and applications that require some interactivity like [[Applications/RStudio|RStudio]]. An interactive session will only give 12 hours of allocation whereas a batch session is up to 5 days. | ||
+ | |||
+ | |||
+ | ===SLURM script=== | ||
+ | |||
+ | The script starts with #!/bin/bash (also called a shebang), which makes the submission script a Linux bash script. | ||
+ | |||
+ | The script continues with a series of lines starting with #, representing bash script comments. For Slurm, the lines starting with #SBATCH are directives that request job scheduling resources. (Note: it's essential that you put all the directives at the top of a script, before any other commands; any #SBATCH directive coming after a bash script command is ignored!) | ||
+ | |||
+ | The resource request #SBATCH --nodes=n determines how many compute nodes a job are allocated by the scheduler; only 1 node is allocated for this job. A note of caution is on threaded single-process applications (e.g. Matlab). These applications cannot run on more than a single compute node; allocating more (e.g. #SBATCH --nodes=2) will end up with the first node being busy and the rest idle. | ||
+ | |||
+ | The maximum walltime is specified by #SBATCH --time=T, where T has format h:m:s. Normally, a job is expected to finish before the specified maximum walltime. After the walltime reaches the maximum, the job terminates regardless of whether the job processes are still running or not. | ||
+ | |||
+ | The name of the job can be specified too with #SBATCH --job-name="name". | ||
+ | |||
+ | '''Note''': Linux and Slurm do not care about the name used for a submission script, however for ease of support, we would recommend you call your submission script a relevant name with a .job suffix, for example ''MATLABtest.job'' | ||
− | |||
− | |||
− | |||
=== Example batch scripts === | === Example batch scripts === | ||
+ | The following are a set of basic job submission scripts along with relevant additional information on how these can be adjusted to suit the task. | ||
+ | |||
+ | To submit a job, use the '''sbatch jobscript.job''' for example: | ||
+ | |||
+ | <pre style="background-color: #000000; color: white; border: 2px solid black; font-family: monospace, sans-serif;"> | ||
+ | [username@login01 ~]$ sbatch MATLABtest.job | ||
+ | Submitted batch job 289522 | ||
+ | </pre> | ||
+ | |||
==== Basic batch job ==== | ==== Basic batch job ==== | ||
<pre style="background-color: #C8C8C8; color: black; font-family: monospace, sans-serif;"> | <pre style="background-color: #C8C8C8; color: black; font-family: monospace, sans-serif;"> | ||
Line 14: | Line 44: | ||
#SBATCH -J jobname # Job name, you can change it to whatever you want | #SBATCH -J jobname # Job name, you can change it to whatever you want | ||
#SBATCH -n 1 # Number of cores | #SBATCH -n 1 # Number of cores | ||
+ | #SBATCH -N 4 # Number of nodes (eg. 4) | ||
#SBATCH -o %N.%j.out # Standard output will be written here | #SBATCH -o %N.%j.out # Standard output will be written here | ||
#SBATCH -e %N.%j.err # Standard error will be written here | #SBATCH -e %N.%j.err # Standard error will be written here | ||
#SBATCH -p compute # Slurm partition, where you want the job to be queued | #SBATCH -p compute # Slurm partition, where you want the job to be queued | ||
+ | #SBATCH --time=20:00:00 # Set max wallclock time this case 20 hours | ||
+ | #SBATCH --mail-user= your email address here | ||
module purge | module purge | ||
Line 23: | Line 56: | ||
command | command | ||
</pre> | </pre> | ||
+ | |||
+ | * Setting the time with '''#SBATCH --time=DD-HH:MM:SS''' (or -t DD-HH:MM:SS) allows the scheduler to place your job more efficiently, using the default here may cause delays when the cluster is running at a high demand rate. | ||
==== Exclusive batch job ==== | ==== Exclusive batch job ==== | ||
+ | |||
<pre style="background-color: #C8C8C8; color: black; font-family: monospace, sans-serif;"> | <pre style="background-color: #C8C8C8; color: black; font-family: monospace, sans-serif;"> | ||
#!/bin/bash | #!/bin/bash | ||
Line 32: | Line 68: | ||
#SBATCH -p compute # Slurm partition, where you want the job to be queued | #SBATCH -p compute # Slurm partition, where you want the job to be queued | ||
#SBATCH --exclusive # Request exclusive access to a node (all 28 cores, 128GB of RAM) | #SBATCH --exclusive # Request exclusive access to a node (all 28 cores, 128GB of RAM) | ||
+ | #SBATCH -t=10:00:00 # run for 10 hours | ||
+ | #SBATCH --mail-user= your email address here # Mail when completion happens (put your email address in though) | ||
module purge | module purge | ||
Line 38: | Line 76: | ||
command | command | ||
</pre> | </pre> | ||
+ | |||
+ | Examples of where exclusive access is useful: | ||
+ | * [[Programming/OpenMP | Wiki: OpenMP ]] | ||
+ | * [[Applications/Matlab#Parallel_Computing_Toolbox | Wiki: Matlab Parallel Computing Toolbox ]] | ||
==== Parallel batch jobs ==== | ==== Parallel batch jobs ==== | ||
+ | |||
===== Intel MPI parallel batch job ===== | ===== Intel MPI parallel batch job ===== | ||
<pre style="background-color: #C8C8C8; color: black; font-family: monospace, sans-serif;"> | <pre style="background-color: #C8C8C8; color: black; font-family: monospace, sans-serif;"> | ||
#!/bin/bash | #!/bin/bash | ||
#SBATCH -J jobname # Job name, you can change it to whatever you want | #SBATCH -J jobname # Job name, you can change it to whatever you want | ||
− | #SBATCH -n | + | #SBATCH -n 28 # Number of cores (eg. 28) |
− | #SBATCH - | + | #SBATCH -N 4 # Number of nodes (eg. 4) |
#SBATCH -o %N.%j.out # Standard output will be written here | #SBATCH -o %N.%j.out # Standard output will be written here | ||
#SBATCH -e %N.%j.err # Standard error will be written here | #SBATCH -e %N.%j.err # Standard error will be written here | ||
#SBATCH -p compute # Slurm partition, where you want the job to be queued | #SBATCH -p compute # Slurm partition, where you want the job to be queued | ||
+ | #SBATCH -t=1-00:00:00 # run for 1 day | ||
+ | #SBATCH --mail-user= your email address here | ||
module purge | module purge | ||
− | module add | + | module add intel/2017 |
− | command | + | mpirun command |
</pre> | </pre> | ||
Line 60: | Line 105: | ||
#!/bin/bash | #!/bin/bash | ||
#SBATCH -J jobname # Job name, you can change it to whatever you want | #SBATCH -J jobname # Job name, you can change it to whatever you want | ||
− | #SBATCH -n | + | #SBATCH -n 28 # Number of cores |
− | #SBATCH - | + | #SBATCH -N 4 # Number of nodes |
#SBATCH -o %N.%j.out # Standard output will be written here | #SBATCH -o %N.%j.out # Standard output will be written here | ||
#SBATCH -e %N.%j.err # Standard error will be written here | #SBATCH -e %N.%j.err # Standard error will be written here | ||
#SBATCH -p compute # Slurm partition, where you want the job to be queued | #SBATCH -p compute # Slurm partition, where you want the job to be queued | ||
+ | #SBATCH -t=00:10:00 # run for 10 minutes | ||
+ | #SBATCH --mail-user= your email address here | ||
module purge | module purge | ||
− | module add | + | module add mvapich2/2.2/gcc-6.3.0 |
− | command | + | mpirun command |
</pre> | </pre> | ||
+ | |||
+ | There are various options for the mvapich2 module, see [[Applications/mvapich2 | mvapich2 ]] | ||
===== OpenMPI parallel batch job ===== | ===== OpenMPI parallel batch job ===== | ||
Line 76: | Line 125: | ||
#!/bin/bash | #!/bin/bash | ||
#SBATCH -J jobname # Job name, you can change it to whatever you want | #SBATCH -J jobname # Job name, you can change it to whatever you want | ||
− | #SBATCH -n | + | #SBATCH -n 28 # Number of cores |
− | #SBATCH - | + | #SBATCH -N 4 # Number of nodes |
#SBATCH -o %N.%j.out # Standard output will be written here | #SBATCH -o %N.%j.out # Standard output will be written here | ||
#SBATCH -e %N.%j.err # Standard error will be written here | #SBATCH -e %N.%j.err # Standard error will be written here | ||
#SBATCH -p compute # Slurm partition, where you want the job to be queued | #SBATCH -p compute # Slurm partition, where you want the job to be queued | ||
+ | #SBATCH -t=10:30:00 # run for 10 hours 30 minutes | ||
+ | #SBATCH --mail-user= your email address here | ||
module purge | module purge | ||
− | module add | + | module add openmpi/1.10.5/gcc-6.3.0 |
− | command | + | mpirun command |
</pre> | </pre> | ||
+ | |||
+ | There are various options for the OpenMPI module, see [[Programming/OpenMPI | openmpi ]] | ||
==== High memory batch job ==== | ==== High memory batch job ==== | ||
− | If | + | If your task requires more memory than the standard provision (approximately 4GB). Then you need to include a directive in your submission script to request the appropriate resource. The standard compute nodes have 128GB of RAM available and there are dedicated high memory nodes that have a total of 1TB of RAM. If your job requires more than 128GB of RAM, submit it to the highmem partition. |
The following job submission script runs on the highmem partition and uses the ''#SBATCH --mem'' flag to request 500GB of RAM, which results in three things: | The following job submission script runs on the highmem partition and uses the ''#SBATCH --mem'' flag to request 500GB of RAM, which results in three things: | ||
Line 105: | Line 158: | ||
#SBATCH -p highmem # Slurm partition, where you want the job to be queued | #SBATCH -p highmem # Slurm partition, where you want the job to be queued | ||
#SBATCH --mem=500G | #SBATCH --mem=500G | ||
+ | #SBATCH -t=3-00:00:00 # run for 3 days | ||
+ | #SBATCH --mail-user= your email address here | ||
module purge | module purge | ||
Line 112: | Line 167: | ||
</pre> | </pre> | ||
− | If a job exceeds the requested about of memory, it will terminate with an error message similar to the following (a job | + | If a job exceeds the requested about of memory, it will terminate with an error message similar to the following (a job that ran with a memory limit of 2GB): |
<pre style="background-color: #f5f5dc; color: black; font-family: monospace, sans-serif;"> | <pre style="background-color: #f5f5dc; color: black; font-family: monospace, sans-serif;"> | ||
Line 122: | Line 177: | ||
==== GPU batch job ==== | ==== GPU batch job ==== | ||
+ | To make use of the GPU nodes, you need to request the GPU queue in your submission script directives with '''#SBATCH -p gpu''' and also request GPU resource using the '''#SBATCH --gres=gpu''' directive | ||
+ | |||
<pre style="background-color: #C8C8C8; color: black; font-family: monospace, sans-serif;"> | <pre style="background-color: #C8C8C8; color: black; font-family: monospace, sans-serif;"> | ||
#!/bin/bash | #!/bin/bash | ||
Line 128: | Line 185: | ||
#SBATCH -o %N.%j.out # Standard output will be written here | #SBATCH -o %N.%j.out # Standard output will be written here | ||
#SBATCH -e %N.%j.err # Standard error will be written here | #SBATCH -e %N.%j.err # Standard error will be written here | ||
+ | #SBATCH --gres=gpu # use the GPU resource not the CPU | ||
#SBATCH -p gpu # Slurm partition, where you want the job to be queued | #SBATCH -p gpu # Slurm partition, where you want the job to be queued | ||
− | + | #SBATCH -t=00:40:00 # run for 40 minutes | |
+ | #SBATCH --mail-user= your email address here | ||
+ | |||
module purge | module purge | ||
module add modulename | module add modulename | ||
Line 137: | Line 197: | ||
==== Array batch job ==== | ==== Array batch job ==== | ||
− | An array batch job allows multiple jobs to be executed with identical parameters based on | + | An array batch job allows multiple jobs to be executed with identical parameters based on single job submission. By using the directive '''#SBATCH --array 1-10''' the same job will be run 10 times. The index specification identifies what array index values should be used. Multiple values may be specified using a comma-separated list and/or a range of values with a "-" separator. For example, "--array=0-15" or "--array=0,6,16-32". |
+ | |||
+ | A step function can also be specified with a suffix containing a colon and number. For example, "--array=0-15:4" is equivalent to "--array=0,4,8,12". A maximum number of simultaneously running tasks from the job array may be specified using a "%" separator. For example "--array=0-15%4" will limit the number of simultaneously running tasks from this job array to 4. | ||
+ | |||
+ | The variable ''$SLURM_ARRAY_TASK_ID'' can be used within the batch script, being replaced by the index of the job, for example as part of the input or data filename, etc. | ||
− | + | When the batch script below is submitted, 10 jobs will run resulting in the ''command'' being run with the first argument corresponding to the array element of that task, for instance: ''command 1'', ''command 2'', through to ''command 10''. The output of each of these tasks will be logged to a different out and err file, with the format ''<node job ran on>.<job ID>.<array index>.out'' ''<node job ran on>.<job ID>.<array index>.err''. | |
− | + | * Note that '''#SBATCH --mail-user''' has not been specified here as it does not process array jobs | |
− | |||
<pre style="background-color: #C8C8C8; color: black; font-family: monospace, sans-serif;"> | <pre style="background-color: #C8C8C8; color: black; font-family: monospace, sans-serif;"> | ||
Line 159: | Line 222: | ||
command $SLURM_ARRAY_TASK_ID | command $SLURM_ARRAY_TASK_ID | ||
</pre> | </pre> | ||
+ | |||
+ | ===Next Steps=== | ||
+ | |||
+ | * [[General/Slurm | Slurm Scheduler]] | ||
+ | |||
+ | |||
+ | ==Navigation== | ||
+ | |||
+ | * [[Main_Page|Home]] | ||
+ | * [[Applications|Application support]] | ||
+ | * [[General|General]] * | ||
+ | * [[Programming|Programming support]] |
Latest revision as of 10:49, 27 January 2023
Introduction
Overview
Viper uses the Slurm job scheduler to provide access to compute resources. The scheduler
A submission script is a file that provides information to Slurm about the task you are running so that it can be allocated to the appropriate resource, and then sets up the environment so the task can run. A minimal submission script has three main components:
- A set of directives, starting with #SBATCH, which tell the scheduler about the job, such as the resources required, the job name, and the job log and error files. In a normal Bash script anything starting with '#' is a comment; Slurm, however, recognises lines starting with #SBATCH as directives to pass to the scheduler.
- Information about how the job environment should be set up, for example, what application modules should be loaded.
- The actual command(s) that need to be run.
Important: Interactive sessions are intended only for development (to view program output and debug information) and for applications that require some interactivity, such as RStudio. An interactive session gives an allocation of only 12 hours, whereas a batch session can run for up to 5 days.
SLURM script
The script starts with #!/bin/bash (also called a shebang), which makes the submission script a Linux bash script.
The script continues with a series of lines starting with #, representing bash script comments. For Slurm, the lines starting with #SBATCH are directives that request job scheduling resources. (Note: it's essential that you put all the directives at the top of a script, before any other commands; any #SBATCH directive coming after a bash script command is ignored!)
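Because a misplaced directive is silently ignored, it can help to check a script before submitting it. The following is a minimal sketch, not part of Slurm: the function name check_sbatch_order is hypothetical, and it assumes that any line which is not blank and not a comment counts as a command.

```shell
#!/bin/bash
# Report any #SBATCH line that appears after the first command in a script.
# Returns non-zero if at least one such directive is found.
check_sbatch_order() {
    awk '
        /^#SBATCH/    { if (seen_cmd) { printf "line %d ignored: %s\n", NR, $0; bad = 1 }; next }
        /^[ \t]*(#|$)/ { next }          # shebang, comments and blank lines
                      { seen_cmd = 1 }   # anything else is treated as a command
        END           { exit (bad ? 1 : 0) }
    ' "$1"
}

# Demonstration on a throwaway script with one misplaced directive.
demo=$(mktemp)
cat > "$demo" <<'EOF'
#!/bin/bash
#SBATCH -J jobname
module purge
#SBATCH -n 1
command
EOF
check_sbatch_order "$demo" || echo "fix the directive order before submitting"
rm -f "$demo"
```

In the demonstration, the directive on line 4 follows module purge, so it is reported.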
The resource request #SBATCH --nodes=n determines how many compute nodes the scheduler allocates to a job; with #SBATCH --nodes=1, only 1 node is allocated. A note of caution applies to threaded single-process applications (e.g. Matlab): these cannot run on more than a single compute node, so allocating more (e.g. #SBATCH --nodes=2) leaves the first node busy and the rest idle.
The maximum walltime is specified by #SBATCH --time=T, where T has the format h:m:s. Normally, a job is expected to finish before the specified maximum walltime. Once the walltime reaches the maximum, the job is terminated regardless of whether its processes are still running.
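To sanity-check a walltime request, the time string can be converted to seconds. This is a small sketch: the helper name walltime_to_seconds is hypothetical, and it assumes only the HH:MM:SS and D-HH:MM:SS forms used on this page.

```shell
#!/bin/bash
# Convert a walltime string (HH:MM:SS or D-HH:MM:SS) to seconds.
walltime_to_seconds() {
    local spec="$1" days=0 h m s
    if [[ "$spec" == *-* ]]; then
        days="${spec%%-*}"      # part before the dash is the day count
        spec="${spec#*-}"       # remainder is HH:MM:SS
    fi
    IFS=: read -r h m s <<< "$spec"
    # 10# forces base 10 so values such as "08" are not read as octal
    echo $(( 10#$days * 86400 + 10#$h * 3600 + 10#$m * 60 + 10#$s ))
}

walltime_to_seconds 20:00:00      # prints 72000
walltime_to_seconds 1-00:00:00    # prints 86400
```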
The name of the job can be specified too with #SBATCH --job-name="name".
Note: Linux and Slurm do not care about the name used for a submission script. However, for ease of support, we recommend giving your submission script a relevant name with a .job suffix, for example MATLABtest.job.
Example batch scripts
The following are a set of basic job submission scripts along with relevant additional information on how these can be adjusted to suit the task.
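To submit a job, run sbatch followed by the name of the script (sbatch jobscript.job), for example:
<pre style="background-color: #000000; color: white; border: 2px solid black; font-family: monospace, sans-serif;">
[username@login01 ~]$ sbatch MATLABtest.job
Submitted batch job 289522
</pre>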
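The job ID in the confirmation line is often needed later, for example to find the log files. The following sketch extracts it from the line shown above; the function name job_id_from_sbatch is hypothetical. sbatch also offers a --parsable option that prints the ID without the surrounding text.

```shell
#!/bin/bash
# Extract the job ID (the last word) from sbatch's confirmation line.
job_id_from_sbatch() {
    local line="$1"
    echo "${line##* }"
}

# Sample line copied from the example above; in real use this would be
# the captured output of: sbatch MATLABtest.job
output="Submitted batch job 289522"
job_id="$(job_id_from_sbatch "$output")"
echo "Job ID: ${job_id}"
```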
Basic batch job
<pre style="background-color: #C8C8C8; color: black; font-family: monospace, sans-serif;">
#!/bin/bash
#SBATCH -J jobname             # Job name, you can change it to whatever you want
#SBATCH -n 1                   # Number of cores
#SBATCH -N 4                   # Number of nodes (eg. 4)
#SBATCH -o %N.%j.out           # Standard output will be written here
#SBATCH -e %N.%j.err           # Standard error will be written here
#SBATCH -p compute             # Slurm partition, where you want the job to be queued
#SBATCH --time=20:00:00        # Set max wallclock time this case 20 hours
#SBATCH --mail-user= your email address here

module purge
module add modulename

command
</pre>
- Setting the time with #SBATCH --time=DD-HH:MM:SS (or -t DD-HH:MM:SS) allows the scheduler to place your job more efficiently. Relying on the default may cause delays when the cluster is in high demand.
Exclusive batch job
<pre style="background-color: #C8C8C8; color: black; font-family: monospace, sans-serif;">
#!/bin/bash
#SBATCH -J jobname             # Job name, you can change it to whatever you want
#SBATCH -o %N.%j.out           # Standard output will be written here
#SBATCH -e %N.%j.err           # Standard error will be written here
#SBATCH -p compute             # Slurm partition, where you want the job to be queued
#SBATCH --exclusive            # Request exclusive access to a node (all 28 cores, 128GB of RAM)
#SBATCH -t 10:00:00            # run for 10 hours (short options take a space, not "=")
#SBATCH --mail-user= your email address here # Mail when completion happens (put your email address in though)

module purge
module add modulename

command
</pre>
Examples of where exclusive access is useful:
- OpenMP
- Matlab Parallel Computing Toolbox
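For a threaded program running with exclusive access, the thread count can be matched to the cores Slurm reports for the node. This is a minimal sketch, assuming the program honours OMP_NUM_THREADS and that Slurm sets SLURM_CPUS_ON_NODE inside the job. The helper name threads_for_node and the fallback of 1 are assumptions for illustration.

```shell
#!/bin/bash
# Pick a thread count: use the cores Slurm reports for this node,
# falling back to 1 when run outside a job (assumed default).
threads_for_node() {
    echo "${SLURM_CPUS_ON_NODE:-1}"
}

export OMP_NUM_THREADS="$(threads_for_node)"
echo "Running with ${OMP_NUM_THREADS} thread(s)"
```

Placed before the command in an exclusive job script, this would let the program use all of the node's cores without hard-coding the number.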
Parallel batch jobs
Intel MPI parallel batch job
<pre style="background-color: #C8C8C8; color: black; font-family: monospace, sans-serif;">
#!/bin/bash
#SBATCH -J jobname             # Job name, you can change it to whatever you want
#SBATCH -n 28                  # Number of cores (eg. 28)
#SBATCH -N 4                   # Number of nodes (eg. 4)
#SBATCH -o %N.%j.out           # Standard output will be written here
#SBATCH -e %N.%j.err           # Standard error will be written here
#SBATCH -p compute             # Slurm partition, where you want the job to be queued
#SBATCH -t 1-00:00:00          # run for 1 day (short options take a space, not "=")
#SBATCH --mail-user= your email address here

module purge
module add intel/2017

mpirun command
</pre>
MVAPICH parallel batch job
<pre style="background-color: #C8C8C8; color: black; font-family: monospace, sans-serif;">
#!/bin/bash
#SBATCH -J jobname             # Job name, you can change it to whatever you want
#SBATCH -n 28                  # Number of cores
#SBATCH -N 4                   # Number of nodes
#SBATCH -o %N.%j.out           # Standard output will be written here
#SBATCH -e %N.%j.err           # Standard error will be written here
#SBATCH -p compute             # Slurm partition, where you want the job to be queued
#SBATCH -t 00:10:00            # run for 10 minutes (short options take a space, not "=")
#SBATCH --mail-user= your email address here

module purge
module add mvapich2/2.2/gcc-6.3.0

mpirun command
</pre>
There are various options for the mvapich2 module; see mvapich2.
OpenMPI parallel batch job
<pre style="background-color: #C8C8C8; color: black; font-family: monospace, sans-serif;">
#!/bin/bash
#SBATCH -J jobname             # Job name, you can change it to whatever you want
#SBATCH -n 28                  # Number of cores
#SBATCH -N 4                   # Number of nodes
#SBATCH -o %N.%j.out           # Standard output will be written here
#SBATCH -e %N.%j.err           # Standard error will be written here
#SBATCH -p compute             # Slurm partition, where you want the job to be queued
#SBATCH -t 10:30:00            # run for 10 hours 30 minutes (short options take a space, not "=")
#SBATCH --mail-user= your email address here

module purge
module add openmpi/1.10.5/gcc-6.3.0

mpirun command
</pre>
There are various options for the OpenMPI module; see openmpi.
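The parallel scripts request a total number of tasks (-n) and a number of nodes (-N). The sketch below shows how the two relate, assuming the tasks are spread evenly across the nodes, which Slurm does not guarantee. The helper name tasks_per_node is hypothetical; SLURM_NTASKS and SLURM_JOB_NUM_NODES are set by Slurm inside a job, and the fallbacks mirror -n 28 and -N 4.

```shell
#!/bin/bash
# Tasks per node if the tasks are spread evenly (rounded up).
tasks_per_node() {
    local ntasks="$1" nnodes="$2"
    echo $(( (ntasks + nnodes - 1) / nnodes ))
}

# Fall back to the values from the example scripts when run outside a job.
ntasks="${SLURM_NTASKS:-28}"
nnodes="${SLURM_JOB_NUM_NODES:-4}"
echo "${ntasks} tasks on ${nnodes} nodes: up to $(tasks_per_node "$ntasks" "$nnodes") per node"
```

With 28 tasks on 4 nodes this gives 7 tasks per node, well below the 28 cores each standard node provides.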
High memory batch job
If your task requires more memory than the standard provision (approximately 4GB), you need to include a directive in your submission script to request the appropriate resource. The standard compute nodes have 128GB of RAM available, and there are dedicated high memory nodes that have a total of 1TB of RAM. If your job requires more than 128GB of RAM, submit it to the highmem partition.
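The rule of thumb above can be written as a small helper. This is only a sketch: the function name partition_for_mem is hypothetical, and the 128GB threshold is the standard-node figure quoted above.

```shell
#!/bin/bash
# Suggest a partition from the memory a job needs, in whole GB.
# Jobs needing more than a standard node's 128GB go to highmem.
partition_for_mem() {
    local mem_gb="$1"
    if (( mem_gb > 128 )); then
        echo highmem
    else
        echo compute
    fi
}

partition_for_mem 500    # prints highmem
partition_for_mem 64     # prints compute
```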
The following job submission script runs on the highmem partition and uses the #SBATCH --mem flag to request 500GB of RAM, which results in three things:
- The job will only be allocated to a node with this much memory available
- No other jobs will be allocated to this node unless their memory requirements fit in with the remaining available memory
- If the job exceeds this requested value, the task will terminate
<pre style="background-color: #C8C8C8; color: black; font-family: monospace, sans-serif;">
#!/bin/bash
#SBATCH -J jobname             # Job name, you can change it to whatever you want
#SBATCH -n 1                   # Number of cores
#SBATCH -o %N.%j.out           # Standard output will be written here
#SBATCH -e %N.%j.err           # Standard error will be written here
#SBATCH -p highmem             # Slurm partition, where you want the job to be queued
#SBATCH --mem=500G
#SBATCH -t 3-00:00:00          # run for 3 days (short options take a space, not "=")
#SBATCH --mail-user= your email address here

module purge
module add modulename

command
</pre>
If a job exceeds the requested amount of memory, it will terminate with an error message similar to the following (from a job that ran with a memory limit of 2GB):
<pre style="background-color: #f5f5dc; color: black; font-family: monospace, sans-serif;">
slurmstepd: Step 307110.0 exceeded memory limit (23933492 > 2097152), being killed
srun: Job step aborted: Waiting up to 32 seconds for job step to finish.
srun: got SIGCONT
slurmstepd: Exceeded job memory limit
</pre>
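The two figures in the message appear to be in kilobytes, since 2097152 KB is exactly the 2GB limit mentioned above. Under that assumption, the following sketch converts them to gigabytes to show how far the job overshot. The helper name kb_to_gb is hypothetical.

```shell
#!/bin/bash
# Convert a kilobyte count to gigabytes (1 GB taken as 1024 * 1024 KB).
kb_to_gb() {
    awk -v kb="$1" 'BEGIN { printf "%.1f\n", kb / (1024 * 1024) }'
}

kb_to_gb 2097152     # the limit from the message: prints 2.0
kb_to_gb 23933492    # the usage from the message: prints 22.8
```

In this case the job used roughly 22.8GB against a 2GB limit, so the --mem request would need to be raised accordingly.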
GPU batch job
To make use of the GPU nodes, you need to request the GPU queue in your submission script directives with #SBATCH -p gpu and also request a GPU resource using the #SBATCH --gres=gpu directive.
<pre style="background-color: #C8C8C8; color: black; font-family: monospace, sans-serif;">
#!/bin/bash
#SBATCH -J jobname             # Job name, you can change it to whatever you want
#SBATCH -n 1                   # Number of cores
#SBATCH -o %N.%j.out           # Standard output will be written here
#SBATCH -e %N.%j.err           # Standard error will be written here
#SBATCH --gres=gpu             # use the GPU resource not the CPU
#SBATCH -p gpu                 # Slurm partition, where you want the job to be queued
#SBATCH -t 00:40:00            # run for 40 minutes (short options take a space, not "=")
#SBATCH --mail-user= your email address here

module purge
module add modulename

command
</pre>
Array batch job
An array batch job allows multiple jobs to be executed with identical parameters based on a single job submission. With the directive #SBATCH --array 1-10, the same job will be run 10 times. The index specification identifies what array index values should be used. Multiple values may be specified using a comma-separated list and/or a range of values with a "-" separator. For example, "--array=0-15" or "--array=0,6,16-32".
A step function can also be specified with a suffix containing a colon and number. For example, "--array=0-15:4" is equivalent to "--array=0,4,8,12". A maximum number of simultaneously running tasks from the job array may be specified using a "%" separator. For example "--array=0-15%4" will limit the number of simultaneously running tasks from this job array to 4.
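To see which task IDs a given specification produces, the rules above can be sketched as a small expander. This is an illustration rather than Slurm's own parser: the function name expand_array_spec is hypothetical, and it assumes well-formed input with plain decimal indices.

```shell
#!/bin/bash
# Expand an --array index specification into the list of task IDs.
# Handles comma lists, "a-b" ranges, ":step" suffixes and a "%limit" suffix.
expand_array_spec() {
    local spec part range step first last i
    local -a parts ids=()
    spec=${1%%\%*}                 # "%limit" only caps concurrency, not the IDs
    IFS=, read -ra parts <<< "$spec"
    for part in "${parts[@]}"; do
        step=1
        range=$part
        if [[ $part == *:* ]]; then
            step=${part##*:}       # number after the colon
            range=${part%%:*}      # "a-b" before the colon
        fi
        first=${range%%-*}
        last=${range##*-}          # equals first for a single value
        for (( i = first; i <= last; i += step )); do
            ids+=("$i")
        done
    done
    echo "${ids[*]}"
}

expand_array_spec "0-15:4"       # prints: 0 4 8 12
expand_array_spec "0,6,16-18"    # prints: 0 6 16 17 18
expand_array_spec "1-10%4"       # prints: 1 2 3 4 5 6 7 8 9 10
```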
The variable $SLURM_ARRAY_TASK_ID can be used within the batch script, being replaced by the index of the job, for example as part of the input or data filename, etc.
When the batch script below is submitted, 10 jobs will run, with command receiving the array index of each task as its first argument: command 1, command 2, through to command 10. The output of each task is logged to its own out and err files, named <node job ran on>.<job ID>.<array index>.out and <node job ran on>.<job ID>.<array index>.err.
- Note that #SBATCH --mail-user has not been specified here as it does not process array jobs
<pre style="background-color: #C8C8C8; color: black; font-family: monospace, sans-serif;">
#!/bin/bash
#SBATCH -J jobname             # Job name, you can change it to whatever you want
#SBATCH -n 1                   # Number of cores
#SBATCH -o %N.%A.%a.out        # Standard output will be written here
#SBATCH -e %N.%A.%a.err        # Standard error will be written here
#SBATCH -p compute             # Slurm partition, where you want the job to be queued
#SBATCH --array 1-10

module purge
module add modulename

command $SLURM_ARRAY_TASK_ID
</pre>
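A common use of $SLURM_ARRAY_TASK_ID is to select a different input file for each task. The sketch below assumes a hypothetical naming scheme, input_<index>.dat, and a hypothetical helper input_for_task; neither comes from this page.

```shell
#!/bin/bash
# Build a per-task input filename from the array index.
# "input_<N>.dat" is a hypothetical naming scheme used only for illustration.
input_for_task() {
    local task_id="$1"
    echo "input_${task_id}.dat"
}

# Outside an array job the variable is unset, so fall back to 1 for testing.
task_id="${SLURM_ARRAY_TASK_ID:-1}"
echo "Task ${task_id} would read $(input_for_task "$task_id")"
```

In an array job script, the final line would instead pass the filename to the real command.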
Next Steps
- Slurm Scheduler