FurtherTopics/Advanced Batch Jobs
Latest revision as of 13:26, 3 August 2023
Other Directives
Mail When Done
Including the directive #SBATCH --mail-user=your email means you will receive an email when your job has finished.
    #!/bin/bash
    #SBATCH -J jobname                # Job name, you can change it to whatever you want
    #SBATCH -n 4                      # Number of cores
    #SBATCH -o %N.%j.out              # Standard output will be written here
    #SBATCH -e %N.%j.err              # Standard error will be written here
    #SBATCH -p compute                # Slurm partition, where you want the job to be queued
    #SBATCH -t 20:00:00               # Run for 20 hours
    #SBATCH --mail-user=your email    # Mail to email address when finished

    module purge
    module add modulename

    command
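On many Slurm installations `--mail-user` only sets the address, while `--mail-type` selects which events trigger a message. Whether this cluster applies a default is not stated on this page, so the fragment below is a sketch under that assumption rather than a required setting. The event list `END,FAIL` is one possible choice, and the `echo` line stands in for the real job commands.

```shell
#!/bin/bash
#SBATCH -J jobname                  # Job name
#SBATCH -n 1                        # Number of cores
#SBATCH -p compute                  # Slurm partition
#SBATCH -t 20:00:00                 # Run for 20 hours
#SBATCH --mail-user=your email      # Placeholder: replace with your own address
#SBATCH --mail-type=END,FAIL        # Assumption: mail when the job ends or fails

# Placeholder for the real job commands.
echo "job commands go here"
```

If mail already arrives with `--mail-user` alone, the extra line is unnecessary.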
Time >1 Day
For a time limit of more than one day, use the format D-HH:MM:SS (days-hours:minutes:seconds), for example #SBATCH -t 3-20:00:00.
    #!/bin/bash
    #SBATCH -J jobname         # Job name, you can change it to whatever you want
    #SBATCH -n 4               # Number of cores
    #SBATCH -o %N.%j.out       # Standard output will be written here
    #SBATCH -e %N.%j.err       # Standard error will be written here
    #SBATCH -p compute         # Slurm partition, where you want the job to be queued
    #SBATCH -t 3-20:00:00      # Run for 3 days and 20 hours

    module purge
    module add modulename

    command
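To check that a time limit says what you intend, it can help to convert it to seconds. The sketch below does this for the two forms used on this page, D-HH:MM:SS and HH:MM:SS. `to_seconds` is a hypothetical helper written for illustration, not a Slurm command, and it does not handle the shorter forms Slurm also accepts.

```shell
#!/bin/bash
# Convert a time limit written as D-HH:MM:SS or HH:MM:SS into seconds.
# to_seconds is a hypothetical helper, not part of Slurm.
to_seconds() {
    local spec=$1 days=0 h m s
    # Split off the optional "D-" day prefix.
    if [[ $spec == *-* ]]; then
        days=${spec%%-*}
        spec=${spec#*-}
    fi
    # Read hours, minutes and seconds from the HH:MM:SS part.
    IFS=: read -r h m s <<< "$spec"
    # 10# forces base 10 so values such as "08" are not parsed as octal.
    echo $(( 10#$days * 86400 + 10#$h * 3600 + 10#$m * 60 + 10#$s ))
}

to_seconds 3-20:00:00   # 3 days and 20 hours
to_seconds 20:00:00     # 20 hours
```

The first call covers 3 × 86400 + 20 × 3600 seconds and the second 20 × 3600 seconds.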
Increase resources
Additional Memory
Use #SBATCH --mem=<amount>G to specify the amount of memory. For jobs needing more than 128GB you will need to use a high memory node.
    #!/bin/bash
    #SBATCH -J jobname       # Job name, you can change it to whatever you want
    #SBATCH -n 1             # Number of cores
    #SBATCH -o %N.%j.out     # Standard output will be written here
    #SBATCH -e %N.%j.err     # Standard error will be written here
    #SBATCH -p compute       # Slurm partition, where you want the job to be queued
    #SBATCH --mem=50G        # 50GB of memory
    #SBATCH -t 20:00:00      # Run for 20 hours

    module purge
    module add modulename

    command
Additional Nodes
Some parallel jobs need to run across multiple nodes. The number of nodes can be specified using #SBATCH -N <number>.
    #!/bin/bash
    #SBATCH -J jobname       # Job name, you can change it to whatever you want
    #SBATCH -N 4             # Number of nodes
    #SBATCH -o %N.%j.out     # Standard output will be written here
    #SBATCH -e %N.%j.err     # Standard error will be written here
    #SBATCH -p compute       # Slurm partition, where you want the job to be queued
    #SBATCH -t 20:00:00      # Run for 20 hours

    module purge
    module add modulename

    command
Other Partition Queues
highmem
Standard compute nodes have 128GB of RAM available and there are dedicated high memory nodes that have a total of 1TB of RAM. If your job requires more than 128GB of RAM, submit it to the highmem partition.
Use #SBATCH --exclusive for the full 1TB or #SBATCH --mem=<amount>G for a specific amount of RAM.
    #!/bin/bash
    #SBATCH -J jobname       # Job name, you can change it to whatever you want
    #SBATCH -n 1             # Number of cores
    #SBATCH -o %N.%j.out     # Standard output will be written here
    #SBATCH -e %N.%j.err     # Standard error will be written here
    #SBATCH -p highmem       # Slurm partition, where you want the job to be queued
    #SBATCH --mem=500G       # 500GB of memory
    #SBATCH -t 3-00:00:00    # Run for 3 days

    module purge
    module add modulename

    command
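The rule described above (requests of more than 128GB go to highmem, everything else can stay on compute) can be written as a small check. This is only a sketch: `pick_partition` is a hypothetical helper, and the 128GB threshold is taken from this page, not queried from the scheduler.

```shell
#!/bin/bash
# pick_partition is a hypothetical helper: given a memory request in GB,
# print the partition this page suggests for it.
pick_partition() {
    local mem_gb=$1
    local compute_limit_gb=128   # RAM on a standard compute node, per this page
    if (( mem_gb > compute_limit_gb )); then
        echo highmem
    else
        echo compute
    fi
}

# Show the suggested partition for a few example requests.
for mem in 50 128 500; do
    echo "--mem=${mem}G -> -p $(pick_partition "$mem")"
done
```

In practice a request very close to 128GB may still not fit on a compute node if some memory is reserved for the system, so treat the boundary as approximate.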
GPU
Use #SBATCH --gres=gpu to request the GPU resource on the node, and submit the job to the gpu partition.
    #!/bin/bash
    #SBATCH -J jobname       # Job name, you can change it to whatever you want
    #SBATCH -n 1             # Number of cores
    #SBATCH -o %N.%j.out     # Standard output will be written here
    #SBATCH -e %N.%j.err     # Standard error will be written here
    #SBATCH --gres=gpu       # use the GPU resource not the CPU
    #SBATCH -p gpu           # Slurm partition, where you want the job to be queued
    #SBATCH -t 00:40:00      # Run for 40 minutes

    module purge
    module add modulename

    command
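Before starting a long GPU run it can be worth confirming that the job was actually given a GPU. On many clusters Slurm exports `CUDA_VISIBLE_DEVICES` for jobs that request `--gres=gpu`; whether this cluster does so is an assumption here. `gpu_count` is a hypothetical helper that counts the device IDs in that variable.

```shell
#!/bin/bash
# gpu_count is a hypothetical helper: count the comma-separated device IDs
# in CUDA_VISIBLE_DEVICES, printing 0 if the variable is unset or empty.
gpu_count() {
    local devices=${CUDA_VISIBLE_DEVICES:-}
    if [[ -z $devices ]]; then
        echo 0
        return
    fi
    local -a ids
    # Split the list on commas into an array and report its length.
    IFS=, read -r -a ids <<< "$devices"
    echo "${#ids[@]}"
}

echo "GPUs visible to this job: $(gpu_count)"
```

Placed near the top of the job script, a count of 0 in the output file is an early hint that the GPU request or partition needs checking.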
Array batch job
Array batch jobs offer a mechanism for submitting and managing collections of similar jobs quickly and easily with one batch file. This is very useful if you have multiple data sets that need processing separately but in the same way. Without this mechanism, you would have to write a separate batch file for each.
- All tasks are run by the scheduler and indexed by the variable $SLURM_ARRAY_TASK_ID.
- All tasks must have the same initial options (e.g. size, time limit, etc.)
- All array tasks can be cancelled with a single scancel command
- It is implemented by using the following additional line #SBATCH --array=... in the job submission file, see the different examples below:
  #SBATCH --array=1-10         | (typical usage) the same job will be run 10 times, with indices 1 to 10
  #SBATCH --array=1,6,16       | certain elements only, i.e. 1, 6 and 16
  #SBATCH --array=1,6,16,51-60 | certain elements and a range
  #SBATCH --array=1-15%4       | a range, with the number of tasks running at the same time limited to 4
- As mentioned, the variable $SLURM_ARRAY_TASK_ID can be used within the batch script, where it is replaced by the index of the task, for example as part of an input or data filename.
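To see which task indices an `--array` value selects, the forms listed above can be expanded by hand. The sketch below does this for comma-separated numbers and A-B ranges, ignoring any `%N` running-task limit. `expand_array` is a hypothetical helper for illustration only; it is not how Slurm parses the option, and it does not cover other forms such as stepped ranges.

```shell
#!/bin/bash
# expand_array is a hypothetical helper that lists the task indices selected
# by an --array value made of comma-separated numbers and A-B ranges.
# An optional %N suffix (the running-task limit) is ignored for the listing.
expand_array() {
    local spec=${1%%\%*}     # drop "%N" if present
    local -a parts out=()
    local part i
    IFS=, read -r -a parts <<< "$spec"
    for part in "${parts[@]}"; do
        if [[ $part == *-* ]]; then
            # A range: emit every index from the start to the end, inclusive.
            for (( i = ${part%-*}; i <= ${part#*-}; i++ )); do
                out+=("$i")
            done
        else
            # A single index.
            out+=("$part")
        fi
    done
    echo "${out[*]}"
}

expand_array 1-10
expand_array 1,6,16,51-60
expand_array 1-15%4
```

Each index printed corresponds to one task, and to one value of $SLURM_ARRAY_TASK_ID.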
The following batch file shows how this is implemented and will execute calcFFTs db1.data to calcFFTs db10.data.
- This will schedule ten instances of calcFFTs, one for each data set dbN.data, where N runs from 1 to 10.
    #!/bin/bash
    #SBATCH -J job             # Job name, you can change it to whatever you want
    #SBATCH -n 1               # Number of cores
    #SBATCH -o %N.%A.%a.out    # Standard output will be written here
    #SBATCH -e %N.%A.%a.err    # Standard error will be written here
    #SBATCH -p compute         # Slurm partition, where you want the job to be queued
    #SBATCH --array=1-10

    module purge
    module add modulename

    calcFFTs db$SLURM_ARRAY_TASK_ID.data
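When the input files are not numbered like db1.data to db10.data, one common approach is to keep a list of file names and let each task pick the line matching its index. The sketch below illustrates the idea. `input_for_task`, the temporary list file and the file names in it are all made up for illustration, and the task ID defaults to 2 so the script can also be tried outside the scheduler.

```shell
#!/bin/bash
# input_for_task is a hypothetical helper: print line number <task_id>
# of <list_file>, which holds one input file name per line.
input_for_task() {
    local task_id=$1 list_file=$2
    sed -n "${task_id}p" "$list_file"
}

# Example list of inputs (made-up names), one per line.
list_file=$(mktemp)
printf '%s\n' liver.data heart.data lung.data > "$list_file"

# Slurm sets SLURM_ARRAY_TASK_ID inside an array job; default to 2 otherwise.
task_id=${SLURM_ARRAY_TASK_ID:-2}

input=$(input_for_task "$task_id" "$list_file")
echo "task $task_id would run: calcFFTs $input"

rm -f "$list_file"
```

With a list of three files, the matching directive would be `#SBATCH --array=1-3`, so that every line has exactly one task.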
Viewing the job tasks
    $ squeue
    JOBID        PARTITION  NAME  USER  ST  TIME  NODES  NODELIST(REASON)
    1080_[6-10]  compute    job   mac   PD  0:00  1      (Resources)
    1080_1       compute    job   mac   R   0:17  1      c001
    1080_2       compute    job   mac   R   0:16  1      c002
    1080_3       compute    job   mac   R   0:03  1      c003
    1080_4       compute    job   mac   R   0:03  1      c004
- Looking at our job ID (1080), it now has multiple job tasks: 1080_1, 1080_2, etc.
- See Job arrays on slurm.schedmd.com (https://slurm.schedmd.com/job_array.html)