FurtherTopics/Advanced Batch Jobs

From HPC
Revision as of 09:56, 10 November 2022 by Pysdlb (talk | contribs) (highmem)

highmem

If your task requires more memory than the standard provision (approximately 4GB), you need to include a directive in your submission script to request the appropriate resource. The standard compute nodes have 128GB of RAM available, and there are dedicated high memory nodes that have a total of 1TB of RAM. If your job requires more than 128GB of RAM, submit it to the highmem partition.

Use --exclusive for the full 1TB or --mem=<amount>G for a specific amount of RAM.

#!/bin/bash
#SBATCH -J jobname          # Job name, you can change it to whatever you want
#SBATCH -n 1                # Number of cores 
#SBATCH -o %N.%j.out        # Standard output will be written here
#SBATCH -e %N.%j.err        # Standard error will be written here
#SBATCH -p highmem          # Slurm partition, where you want the job to be queued 
#SBATCH --mem=500G          # 500GB of memory
#SBATCH -t 3-00:00:00       # Run for 3 days

 
module purge
module add modulename
 
command
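
It can be useful to confirm inside the job how much memory was actually requested. The lines below are a minimal sketch that could be placed before the real command. It assumes Slurm exports SLURM_MEM_PER_NODE (in megabytes) to jobs submitted with --mem; the variable may be absent when --exclusive is used or on differently configured clusters, so the sketch falls back to a message rather than failing.

```shell
#!/bin/bash
# Assumption: Slurm sets SLURM_MEM_PER_NODE (in MB) for jobs submitted with --mem.
mem_mb="${SLURM_MEM_PER_NODE:-}"

if [ -n "$mem_mb" ]; then
    # Convert megabytes to whole gigabytes for a readable report.
    mem_report="Memory requested per node: $((mem_mb / 1024)) GB"
else
    mem_report="SLURM_MEM_PER_NODE is not set (no --mem request, or not running under Slurm)"
fi

echo "$mem_report"
```

With --mem=500G as in the script above, this would be expected to report 500 GB.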

gpu

Use --gres=gpu to use the GPU resource instead of the CPU on the node.

#!/bin/bash
#SBATCH -J jobname          # Job name, you can change it to whatever you want
#SBATCH -n 1                # Number of cores 
#SBATCH -o %N.%j.out        # Standard output will be written here
#SBATCH -e %N.%j.err        # Standard error will be written here
#SBATCH --gres=gpu          # use the GPU resource not the CPU
#SBATCH -p gpu              # Slurm partition, where you want the job to be queued 
#SBATCH -t 00:40:00         # Run for 40 minutes

module purge
module add modulename
 
command
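
Before starting a long GPU run, it may help to check that a GPU was actually allocated. The following is a minimal sketch, assuming the cluster is configured so that Slurm exports CUDA_VISIBLE_DEVICES for jobs that request --gres=gpu (this is common but depends on the site configuration). The variable names gpu_list and gpu_status are illustrative only.

```shell
#!/bin/bash
# Assumption: Slurm exports CUDA_VISIBLE_DEVICES for jobs that request --gres=gpu.
gpu_list="${CUDA_VISIBLE_DEVICES:-}"

if [ -n "$gpu_list" ]; then
    gpu_status="GPU(s) visible to this job: $gpu_list"
else
    gpu_status="No GPU visible (CUDA_VISIBLE_DEVICES is empty or unset)"
fi

echo "$gpu_status"
```

These lines could be placed before the real command in the script above, so that the message appears at the top of the job's output file.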

Array batch job

An array batch job allows multiple jobs to be executed with identical parameters from a single job submission. With the directive #SBATCH --array 1-10, the same job will be run 10 times. The index specification identifies which array index values should be used. Multiple values may be specified using a comma-separated list and/or a range of values with a "-" separator. For example, "--array=0-15" or "--array=0,6,16-32".

A step function can also be specified with a suffix containing a colon and number. For example, "--array=0-15:4" is equivalent to "--array=0,4,8,12". A maximum number of simultaneously running tasks from the job array may be specified using a "%" separator. For example "--array=0-15%4" will limit the number of simultaneously running tasks from this job array to 4.
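
To check which indices a range with a step will produce before submitting, the expansion can be reproduced in plain shell. This is only an illustrative sketch: expand_range is a hypothetical helper, not a Slurm command, and it handles a single start-end:step range rather than comma-separated lists or the "%" limit.

```shell
#!/bin/bash
# expand_range START END [STEP]: print the indices covered by START-END:STEP.
# Hypothetical helper for illustration; it is not part of Slurm.
expand_range() {
    start=$1
    end=$2
    step=${3:-1}
    out=""
    i=$start
    while [ "$i" -le "$end" ]; do
        # Append the index, adding a separating space after the first one.
        out="${out:+$out }$i"
        i=$((i + step))
    done
    echo "$out"
}

expand_range 0 15 4    # indices of --array=0-15:4
expand_range 1 10      # indices of --array=1-10
```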

The variable $SLURM_ARRAY_TASK_ID can be used within the batch script, where it is replaced by the array index of the task, for example as part of an input or data filename.
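
As a minimal sketch of the filename use case, the lines below build per-task input and output names from the array index. The naming scheme (input_<index>.dat and result_<index>.txt) is purely hypothetical, and the fallback to index 1 is there only so the lines can be tried outside an array job.

```shell
#!/bin/bash
# Fall back to index 1 when run outside an array job, so the sketch can be tried by hand.
task_id="${SLURM_ARRAY_TASK_ID:-1}"

# Hypothetical naming scheme: one input and one output file per array index.
input_file="input_${task_id}.dat"
output_file="result_${task_id}.txt"

echo "Task ${task_id}: would read ${input_file} and write ${output_file}"
```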

When the batch script below is submitted, 10 tasks will run, each invoking the command with its own array index as the first argument: command 1, command 2, through to command 10. The output of each task is logged to its own out and err file, named <node job ran on>.<job ID>.<array index>.out and <node job ran on>.<job ID>.<array index>.err.

  • Note that #SBATCH --mail-user has not been specified here, as it does not process array jobs.

#!/bin/bash
#SBATCH -J jobname          # Job name, you can change it to whatever you want
#SBATCH -n 1                # Number of cores 
#SBATCH -o %N.%A.%a.out     # Standard output will be written here
#SBATCH -e %N.%A.%a.err     # Standard error will be written here
#SBATCH -p compute          # Slurm partition, where you want the job to be queued 
#SBATCH --array 1-10        # Run 10 array tasks, with indices 1 to 10
 
module purge
module add modulename
 
command $SLURM_ARRAY_TASK_ID
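
When each task needs more than a bare index, one common pattern is to keep one set of arguments per line in a parameter file and let each task pick the line matching its index. The sketch below illustrates this under stated assumptions: the parameter values are made up, the file is created as a temporary file only to keep the example self-contained, and the index defaults to 2 when the lines are run outside Slurm.

```shell
#!/bin/bash
# Hypothetical parameter file: one set of arguments per line.
# A temporary file is used here only to keep the sketch self-contained.
param_file=$(mktemp)
printf '%s\n' "alpha 0.1" "beta 0.2" "gamma 0.3" > "$param_file"

# Default to index 2 outside Slurm so the sketch can be tried by hand.
task_id="${SLURM_ARRAY_TASK_ID:-2}"

# Select only line number task_id from the parameter file.
params=$(sed -n "${task_id}p" "$param_file")

echo "Task ${task_id} would run: command ${params}"

rm -f "$param_file"
```

In a real job script the parameter file would already exist alongside the script, and the echo line would be replaced by the actual command. The array range should match the number of lines in the file.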