FurtherTopics/Advanced Batch Jobs

[[Quickstart/Batch Jobs #More detailed scripts| Back to Batch Jobs Quickstart]]

[[FurtherTopics/FurtherTopics #Batch Jobs| Back to Further Topics]]

==Other Directives==

===Mail When Done===
Including the directive ''#SBATCH --mail-user=your email'' means you will receive an email when your job has finished.
<pre class="mw-collapsible" style="background-color: #C8C8C8; color: black; font-family: monospace, sans-serif;">
#!/bin/bash
#SBATCH -J jobname                # Job name, you can change it to whatever you want
#SBATCH -n 4                      # Number of cores
#SBATCH -o %N.%j.out              # Standard output will be written here
#SBATCH -e %N.%j.err              # Standard error will be written here
#SBATCH -p compute                # Slurm partition, where you want the job to be queued
#SBATCH -t 20:00:00               # Run for 20 hours
#SBATCH --mail-user=your email    # Mail to email address when finished

module purge
module add modulename

command
</pre>
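
On a standard Slurm setup the events that should trigger mail can also be selected with ''--mail-type'' (a standard Slurm option; check your site's default behaviour), for example:
<pre class="mw-collapsible" style="background-color: #C8C8C8; color: black; font-family: monospace, sans-serif;">
#SBATCH --mail-user=your email    # Where the mail should be sent
#SBATCH --mail-type=END,FAIL      # Send mail when the job ends or fails
</pre>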
===Time >1 Day===
Use ''#SBATCH -t D-HH:MM:SS'' to request more than one day of run time, for example ''3-20:00:00'' for 3 days and 20 hours.

<pre class="mw-collapsible" style="background-color: #C8C8C8; color: black; font-family: monospace, sans-serif;">
#!/bin/bash
#SBATCH -J jobname                # Job name, you can change it to whatever you want
#SBATCH -n 4                      # Number of cores
#SBATCH -o %N.%j.out              # Standard output will be written here
#SBATCH -e %N.%j.err              # Standard error will be written here
#SBATCH -p compute                # Slurm partition, where you want the job to be queued
#SBATCH -t 3-20:00:00             # Run for 3 days and 20 hours

module purge
module add modulename

command
</pre>
==Increase resources==

===Additional Memory===
Use ''#SBATCH --mem=<amount>G'' to specify the amount of memory. For jobs needing more than 128GB you will need to use a [[#highmem| high memory node]].
<pre class="mw-collapsible" style="background-color: #C8C8C8; color: black; font-family: monospace, sans-serif;">
#!/bin/bash
#SBATCH -J jobname          # Job name, you can change it to whatever you want
#SBATCH -n 1                # Number of cores
#SBATCH -o %N.%j.out        # Standard output will be written here
#SBATCH -e %N.%j.err        # Standard error will be written here
#SBATCH -p compute          # Slurm partition, where you want the job to be queued
#SBATCH --mem=50G           # 50GB of memory
#SBATCH -t 20:00:00         # Run for 20 hours

module purge
module add modulename

command
</pre>
===Additional Nodes===
Some parallel jobs require multiple nodes; the number of nodes can be specified with ''#SBATCH -N <number>''.

<pre class="mw-collapsible" style="background-color: #C8C8C8; color: black; font-family: monospace, sans-serif;">
#!/bin/bash
#SBATCH -J jobname          # Job name, you can change it to whatever you want
#SBATCH -N 4                # Number of nodes
#SBATCH -o %N.%j.out        # Standard output will be written here
#SBATCH -e %N.%j.err        # Standard error will be written here
#SBATCH -p compute          # Slurm partition, where you want the job to be queued
#SBATCH -t 20:00:00         # Run for 20 hours

module purge
module add modulename

command
</pre>
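
Multi-node jobs are normally parallel (for example MPI) programs, so the script usually also states how many tasks to start on each node and launches the program with ''srun''. The sketch below assumes a hypothetical MPI program ''my_mpi_app'' and 28 cores per node; both are placeholders to adjust for your cluster:
<pre class="mw-collapsible" style="background-color: #C8C8C8; color: black; font-family: monospace, sans-serif;">
#!/bin/bash
#SBATCH -J jobname              # Job name, you can change it to whatever you want
#SBATCH -N 4                    # Number of nodes
#SBATCH --ntasks-per-node=28    # MPI tasks per node (assumed value, match the cores per node)
#SBATCH -o %N.%j.out            # Standard output will be written here
#SBATCH -e %N.%j.err            # Standard error will be written here
#SBATCH -p compute              # Slurm partition, where you want the job to be queued
#SBATCH -t 20:00:00             # Run for 20 hours

module purge
module add modulename

srun ./my_mpi_app               # srun starts one copy of the (hypothetical) program per task
</pre>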
==Other Partition Queues==

===highmem===
Standard compute nodes have 128GB of RAM available and there are dedicated high memory nodes that have a total of 1TB of RAM. If your job requires more than 128GB of RAM, submit it to the highmem partition.

Use ''#SBATCH --exclusive'' for the full 1TB or ''#SBATCH --mem=<amount>G'' for a specific amount of RAM.

<pre class="mw-collapsible" style="background-color: #C8C8C8; color: black; font-family: monospace, sans-serif;">
#!/bin/bash
#SBATCH -J jobname          # Job name, you can change it to whatever you want
#SBATCH -n 1                # Number of cores
#SBATCH -o %N.%j.out        # Standard output will be written here
#SBATCH -e %N.%j.err        # Standard error will be written here
#SBATCH -p highmem          # Slurm partition, where you want the job to be queued
#SBATCH --mem=500G          # 500GB of memory
#SBATCH -t 3-00:00:00       # Run for 3 days

module purge
module add modulename

command
</pre>
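
If the whole high memory node (the full 1TB mentioned above) is needed, the memory request can simply be replaced by the exclusive flag, for example:
<pre class="mw-collapsible" style="background-color: #C8C8C8; color: black; font-family: monospace, sans-serif;">
#SBATCH -p highmem          # Slurm partition, where you want the job to be queued
#SBATCH --exclusive         # Take the whole node and therefore all of its memory
</pre>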
  
===GPU===
Use ''--gres=gpu'' to use the GPU resource instead of the CPU on the node.
<pre class="mw-collapsible" style="background-color: #C8C8C8; color: black; font-family: monospace, sans-serif;">
#!/bin/bash
#SBATCH -J jobname          # Job name, you can change it to whatever you want
#SBATCH -n 1                # Number of cores
#SBATCH -o %N.%j.out        # Standard output will be written here
#SBATCH -e %N.%j.err        # Standard error will be written here
#SBATCH --gres=gpu          # use the GPU resource not the CPU
#SBATCH -p gpu              # Slurm partition, where you want the job to be queued
#SBATCH -t 00:40:00         # Run for 40 minutes

module purge
module add modulename

command
</pre>
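
Where a GPU node has more than one GPU, Slurm's generic resource syntax normally lets a count be added to the request (the GPU types and counts available depend on the cluster), for example:
<pre class="mw-collapsible" style="background-color: #C8C8C8; color: black; font-family: monospace, sans-serif;">
#SBATCH --gres=gpu:2        # request two GPUs on the node
</pre>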
  
==Array batch job==
Array batch jobs offer a mechanism for submitting and managing collections of similar jobs quickly and easily with one batch file. This is very useful if you have multiple data sets that need processing separately but in the same way. Without this mechanism, you would have to write a separate batch file for each.

* All tasks are run by the scheduler and indexed by the variable ''$SLURM_ARRAY_TASK_ID''.
* All tasks must have the same initial options (e.g. size, time limit, etc.).
* All array tasks can be cancelled with a single ''scancel'' command (see the example under [[#Viewing the job tasks|Viewing the job tasks]] below).
* It is implemented by adding the line '''#SBATCH --array=...''' to the job submission file; see the different examples below:

{| class="wikitable" style="margin:auto"
|-
| '''#SBATCH --array=1-10''' || ('''typical usage''') the same job will be run 10 times, or
|-
| '''#SBATCH --array=1,6,16''' || certain elements, i.e. 1, 6 and 16, or
|-
| '''#SBATCH --array=1,6,16,51-60''' || certain elements and a range, or
|-
| '''#SBATCH --array=1-15%4''' || a range, but limit the number of tasks running at once to 4.
|}

* As mentioned, the variable ''$SLURM_ARRAY_TASK_ID'' can be used within the batch script; it is replaced by the index of the job, for example as part of an input or data filename, etc.

The following batch file shows how this is implemented and will execute '''calcFFTs db1.data''' to '''calcFFTs db10.data'''.
* This will schedule ten instances of calcFFTs across the range of dbN.data sets (N running from 1 to 10).

<pre class="mw-collapsible" style="background-color: #C8C8C8; color: black; font-family: monospace, sans-serif;">
#!/bin/bash
#SBATCH -J job              # Job name, you can change it to whatever you want
#SBATCH -n 1                # Number of cores
#SBATCH -o %N.%A.%a.out     # Standard output will be written here (%N = node, %A = job ID, %a = array index)
#SBATCH -e %N.%A.%a.err     # Standard error will be written here
#SBATCH -p compute          # Slurm partition, where you want the job to be queued
#SBATCH --array=1-10        # Run the job as an array of 10 tasks, indices 1 to 10

module purge
module add modulename

calcFFTs db$SLURM_ARRAY_TASK_ID.data
</pre>

===Viewing the job tasks===

<pre class="mw-collapsible" style="background-color: #000000; color: white; font-family: monospace, courier;">
$ squeue
      JOBID  PARTITION  NAME  USER  ST  TIME  NODES  NODELIST(REASON)
1080_[6-10]    compute   job   mac  PD  0:00      1  (Resources)
     1080_1    compute   job   mac   R  0:17      1  c001
     1080_2    compute   job   mac   R  0:16      1  c002
     1080_3    compute   job   mac   R  0:03      1  c003
     1080_4    compute   job   mac   R  0:03      1  c004
</pre>

* Looking at our job ID (1080), it now has multiple job tasks: 1080_1, 1080_2, etc.
* See [https://slurm.schedmd.com/job_array.html Job arrays on slurm.schedmd.com]
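
To cancel the whole array, or just one of its tasks, a single ''scancel'' is enough. Using the job ID from the listing above:
<pre class="mw-collapsible" style="background-color: #000000; color: white; font-family: monospace, courier;">
$ scancel 1080      # cancels every task in the array
$ scancel 1080_3    # cancels only array task 3
</pre>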

[[FurtherTopics/FurtherTopics #Batch Jobs| Back to Further Topics]] / [[Quickstart/Batch Jobs| Back to Batch Jobs Quickstart]] / [[Main Page #Quickstart| Main Page]]
