FurtherTopics/Advanced Batch Jobs
Revision as of 11:21, 10 November 2022
Other Directives
Mail When Done
Including the directives #SBATCH --mail-user=your email and #SBATCH --mail-type=END means you will receive an email when your job has finished.
#!/bin/bash
#SBATCH -J jobname              # Job name, you can change it to whatever you want
#SBATCH -n 4                    # Number of cores
#SBATCH -o %N.%j.out            # Standard output will be written here
#SBATCH -e %N.%j.err            # Standard error will be written here
#SBATCH -p compute              # Slurm partition, where you want the job to be queued
#SBATCH -t 20:00:00             # Run for 20 hours
#SBATCH --mail-user=your email  # Mail to email address when finished
#SBATCH --mail-type=END         # Send the mail when the job ends

module purge
module add modulename

command
Time >1 Day
Use #SBATCH -t D-HH:MM:SS (days-hours:minutes:seconds) for jobs that need to run for more than one day.
#!/bin/bash
#SBATCH -J jobname      # Job name, you can change it to whatever you want
#SBATCH -n 4            # Number of cores
#SBATCH -o %N.%j.out    # Standard output will be written here
#SBATCH -e %N.%j.err    # Standard error will be written here
#SBATCH -p compute      # Slurm partition, where you want the job to be queued
#SBATCH -t 3-20:00:00   # Run for 3 days and 20 hours

module purge
module add modulename

command
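If you know a job's total run time in whole hours, the D-HH:MM:SS value can be worked out with shell arithmetic; a minimal sketch (92 hours corresponds to the 3 days and 20 hours used above):

```shell
# Convert a run time given in whole hours into Slurm's D-HH:MM:SS format.
HOURS=92
printf '%d-%02d:00:00\n' $((HOURS / 24)) $((HOURS % 24))
# prints 3-20:00:00
```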
Increase resources
Additional Memory
Use #SBATCH --mem=<amount>G to specify the amount of memory. For jobs needing more than 128GB you will need to use a high memory node.
#!/bin/bash
#SBATCH -J jobname      # Job name, you can change it to whatever you want
#SBATCH -n 1            # Number of cores
#SBATCH -o %N.%j.out    # Standard output will be written here
#SBATCH -e %N.%j.err    # Standard error will be written here
#SBATCH -p compute      # Slurm partition, where you want the job to be queued
#SBATCH --mem=50G       # 50GB of memory
#SBATCH -t 20:00:00     # Run for 20 hours

module purge
module add modulename

command
Additional Nodes
Some parallel jobs require multiple nodes to be used; the number of nodes can be specified using #SBATCH -N <number>.
#!/bin/bash
#SBATCH -J jobname      # Job name, you can change it to whatever you want
#SBATCH -N 4            # Number of nodes
#SBATCH -o %N.%j.out    # Standard output will be written here
#SBATCH -e %N.%j.err    # Standard error will be written here
#SBATCH -p compute      # Slurm partition, where you want the job to be queued
#SBATCH -t 20:00:00     # Run for 20 hours

module purge
module add modulename

command
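Note that -N requests whole nodes while -n requests individual tasks (cores); the total core count available to a multi-node job is the product of the two. A sketch of the arithmetic, assuming a hypothetical 28-core compute node (check your cluster's actual per-node core count):

```shell
# -N gives nodes, -n gives total tasks. On a cluster whose compute nodes
# have 28 cores each (an assumed figure), 4 full nodes supply:
NODES=4
CORES_PER_NODE=28
echo $((NODES * CORES_PER_NODE))   # prints 112
```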
Other Partition Queues
highmem
Standard compute nodes have 128GB of RAM available and there are dedicated high memory nodes that have a total of 1TB of RAM. If your job requires more than 128GB of RAM, submit it to the highmem partition.
Use #SBATCH --exclusive for the full 1TB, or #SBATCH --mem=<amount>G for a specific amount of RAM.
#!/bin/bash
#SBATCH -J jobname      # Job name, you can change it to whatever you want
#SBATCH -n 1            # Number of cores
#SBATCH -o %N.%j.out    # Standard output will be written here
#SBATCH -e %N.%j.err    # Standard error will be written here
#SBATCH -p highmem      # Slurm partition, where you want the job to be queued
#SBATCH --mem=500G      # 500GB of memory
#SBATCH -t 3-00:00:00   # Run for 3 days

module purge
module add modulename

command
gpu
Use #SBATCH --gres=gpu to request the GPU resource on the node, rather than running on the CPU alone.
#!/bin/bash
#SBATCH -J jobname      # Job name, you can change it to whatever you want
#SBATCH -n 1            # Number of cores
#SBATCH -o %N.%j.out    # Standard output will be written here
#SBATCH -e %N.%j.err    # Standard error will be written here
#SBATCH --gres=gpu      # Use the GPU resource, not just the CPU
#SBATCH -p gpu          # Slurm partition, where you want the job to be queued
#SBATCH -t 00:40:00     # Run for 40 minutes

module purge
module add modulename

command
Array batch job
An array batch job allows multiple jobs to be executed with identical parameters based on a single job submission.
This is implemented by using the following additional line in the job submission file:
Directive | Effect
---|---
#SBATCH --array=1-10 | (typical usage) the same job will be run 10 times
#SBATCH --array=1,6,16 | certain elements, i.e. 1, 6 and 16
#SBATCH --array=1,6,16,51-60 | certain elements and a range
#SBATCH --array=1-15%4 | a range, but limit simultaneous tasks to 4 concurrently
- The variable $SLURM_ARRAY_TASK_ID can be used within the batch script, being replaced by the index of the job, for example as part of the input or data filename, etc.
The following batch file will execute program_name/1.bin, program_name/2.bin, and so on up to program_name/10.bin:
#!/bin/bash
#SBATCH -J jobname        # Job name, you can change it to whatever you want
#SBATCH -n 1              # Number of cores
#SBATCH -o %N.%A.%a.out   # Standard output will be written here
#SBATCH -e %N.%A.%a.err   # Standard error will be written here
#SBATCH -p compute        # Slurm partition, where you want the job to be queued
#SBATCH --array=1-10      # Run tasks with indices 1 to 10

module purge
module add modulename

program_name/$SLURM_ARRAY_TASK_ID.bin
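As a sketch of how the index substitution works: Slurm exports SLURM_ARRAY_TASK_ID into each task's environment, so setting it by hand shows how per-task filenames are built (the data/ and results/ paths below are illustrative, not part of the example above):

```shell
# Slurm sets SLURM_ARRAY_TASK_ID for each array task; simulate task 3 here
# to show how the index is substituted into input/output filenames.
SLURM_ARRAY_TASK_ID=3
INPUT="data/input_${SLURM_ARRAY_TASK_ID}.dat"        # hypothetical input name
OUTPUT="results/output_${SLURM_ARRAY_TASK_ID}.out"   # hypothetical output name
echo "$INPUT $OUTPUT"   # prints data/input_3.dat results/output_3.out
```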