- Description: SLURM is an open-source job scheduler, used by HPCs.
- Version: 15.08.8
The SLURM (Simple Linux Utility for Resource Management) workload manager is a free and open-source job scheduler for the Linux kernel. It is used by Viper and many of the world's supercomputers (and clusters). It provides three key functions:
- First, it allocates exclusive and/or non-exclusive access to resources (compute nodes) to users for some duration of time so they can perform work.
- Second, it provides a framework for starting, executing, and monitoring work (typically a parallel job such as MPI) on a set of allocated nodes.
- Third, it arbitrates contention for resources by managing a queue of pending jobs.
Slurm takes your batch job submission and executes it across the computing nodes of Viper. How it is processed depends on a number of factors, including the queue it is submitted to and the jobs already submitted to that queue.
Common Slurm Commands
|sbatch||Submits a batch script to SLURM. The batch script may be given to sbatch through a file name on the command line, or if no file name is specified, sbatch will read in a script from standard input.|
|squeue||Used to view job and job step information for jobs managed by SLURM.|
|scancel||Used to signal or cancel jobs, job arrays or job steps.|
|sinfo||Used to view partition and node information for a system running SLURM.|
sbatch is used to submit a job to Slurm.
[username@login01 ~]$ sbatch jobfile.job
Submitted batch job 289535
|The number displayed (289535) is the Job ID|
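In scripts it is often handy to capture the Job ID so it can be passed to squeue or scancel later. The following is a minimal sketch, assuming the confirmation line has the form shown above (`Submitted batch job <id>`). The helper name `extract_job_id` is hypothetical and not part of Slurm.

```shell
#!/bin/bash
# Hypothetical helper: print the job ID from sbatch's confirmation line.
# Assumes the line looks like "Submitted batch job 289535".
extract_job_id() {
    awk '/^Submitted batch job/ { print $NF }' <<< "$1"
}

# Example with the output shown above. On a login node you would pass
# "$(sbatch jobfile.job)" instead of a literal string.
job_id=$(extract_job_id "Submitted batch job 289535")
echo "Job ID: ${job_id}"
```

If the text does not match the expected form (for example an error message), the helper prints nothing, which a calling script can test for.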
squeue shows information about jobs in the scheduling queue.
[username@login01 ~]$ squeue
 JOBID PARTITION     NAME USER ST     TIME NODES NODELIST(REASON)
306414   compute  clasDFT user  R    16:36     1 c006
306413   compute mpi_benc user  R    31:02     2 c[005,007]
306411   compute  orca_1n user  R  1:00:32     1 c004
306410   compute  orca_1n user  R  1:04:17     1 c003
306409   highmem cnv_obit user  R 11:37:17     1 c232
306407   compute  20M4_20 user  R 11:45:54     1 c012
306406   compute 20_ML_20 user  R 11:55:40     1 c012
|JOBID||The unique identifier assigned to a job|
|PARTITION||The type of node the job is running on e.g. compute, highmem, gpu|
|NAME||Name of job|
|USER||User ID of job owner|
|ST||Job state code e.g. R stands for 'Running'|
|TIME||Length of time a job has been running|
|NODES||Number of nodes a job is running on|
|NODELIST(REASON)||List of nodes a job is running on. For a job that is not running, the reason is shown instead, e.g. it is waiting on a dependency or for a node to become free.|
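Because squeue prints plain columns, its output can be summarised with standard tools. The sketch below counts jobs per partition for a given state code, using a few rows from the listing above as sample input. The function name `count_jobs_by_partition` is hypothetical, and the column positions assume the default layout shown above with job names that contain no spaces.

```shell
#!/bin/bash
# Hypothetical helper: count jobs per partition for a given state code.
# Reads squeue-style output on stdin; column 2 is PARTITION, column 5 is ST.
# Assumes the default column layout and job names without spaces.
count_jobs_by_partition() {
    local state="$1"
    awk -v st="$state" 'NR > 1 && $5 == st { n[$2]++ }
        END { for (p in n) print p, n[p] }' | sort
}

# Sample rows taken from the listing above; on the cluster, pipe in
# the output of squeue itself.
count_jobs_by_partition R <<'EOF'
JOBID PARTITION NAME USER ST TIME NODES NODELIST(REASON)
306414 compute clasDFT user R 16:36 1 c006
306413 compute mpi_benc user R 31:02 2 c[005,007]
306409 highmem cnv_obit user R 11:37:17 1 c232
EOF
```

To list only your own jobs, squeue also accepts `-u <username>`.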
scancel is used to cancel pending or running jobs. Only jobs submitted under your own user ID may be cancelled.
[username@login01 ~]$ scancel 289535
|No output is given by the command.|
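Since scancel gives no confirmation, a small wrapper can guard against typos before anything is cancelled. This is a sketch only: `cancel_job` and the `SCANCEL_CMD` override are hypothetical names introduced here so the function can be tried without Slurm installed, and the check accepts plain numeric job IDs only.

```shell
#!/bin/bash
# Hypothetical wrapper: refuse anything that is not a plain numeric job ID,
# then hand it to scancel. SCANCEL_CMD is an illustrative override so the
# function can be exercised with `echo` on a machine without Slurm.
cancel_job() {
    local job_id="$1"
    if [[ ! "$job_id" =~ ^[0-9]+$ ]]; then
        echo "cancel_job: '$job_id' is not a numeric job ID" >&2
        return 2
    fi
    ${SCANCEL_CMD:-scancel} "$job_id"
}

# Dry run: prints the command instead of cancelling anything.
SCANCEL_CMD="echo scancel" cancel_job 289535
```

Without the override, the last line would run `scancel 289535` for real.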
sinfo shows information on the partitions and nodes in the cluster.
[username@login01 ~]$ sinfo
PARTITION AVAIL  TIMELIMIT NODES STATE NODELIST
compute*     up 2-00:00:00     9   mix c[006,012,014,016,018-020,022,170]
compute*     up 2-00:00:00    11 alloc c[003-004,008,015,046,086,093,098,138,167-168]
compute*     up 2-00:00:00   156  idle c[001-002,005,007,009-011,013,017,021,023-045,047-085,087-092,094-097,099-137,139-166,169,171-176]
highmem      up 4-00:00:00     1   mix c230
highmem      up 4-00:00:00     2 alloc c[231-232]
highmem      up 4-00:00:00     1  idle c233
gpu          up 5-00:00:00     4  idle gpu[01-04]
|PARTITION||A group of nodes. On Viper, partitions are organised by node type, e.g. compute, high memory and GPU.|
|AVAIL||Availability of a specific partition|
|TIMELIMIT||Time limit for jobs running on a specific partition e.g. for the compute nodes the maximum time a job can run for is 2 days.|
|NODES||Number of nodes in a specific state/partition|
|STATE||The current status of a partition/group of nodes e.g. alloc (allocated)|
|NODELIST||List of nodes in specific state/partition|
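A common use of sinfo is to see how many nodes are free before submitting. The sketch below totals the NODES column per partition for one state, using rows from the listing above (with the long compute node list shortened). The function name `nodes_in_state` is hypothetical, and the column positions assume the default layout shown above.

```shell
#!/bin/bash
# Hypothetical helper: total the NODES column per partition for one STATE.
# Reads sinfo-style output on stdin; the trailing "*" that marks the
# default partition is stripped so "compute*" is reported as "compute".
nodes_in_state() {
    local state="$1"
    awk -v st="$state" 'NR > 1 && $5 == st { sub(/\*$/, "", $1); n[$1] += $4 }
        END { for (p in n) print p, n[p] }' | sort
}

# Sample rows from the listing above (node list shortened for brevity);
# on the cluster, pipe in the output of sinfo itself.
nodes_in_state idle <<'EOF'
PARTITION AVAIL TIMELIMIT NODES STATE NODELIST
compute* up 2-00:00:00 9 mix c[006,012,014,016,018-020,022,170]
compute* up 2-00:00:00 156 idle c[001-002,005,007]
highmem up 4-00:00:00 1 idle c233
gpu up 5-00:00:00 4 idle gpu[01-04]
EOF
```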
Common Submission Flags
For convenience and repeatability, it is much easier to build Slurm options into batch files. The following are the most commonly used flags.
Note: The --exclusive flag indicates that you require the whole node for your job. If you don't need a significant number of processing cores, omitting this flag allows other users to use the unused resources of that node.
|-J / --job-name||Specify a name for the job|
|-N / --nodes||Specifies the number of nodes to be allocated to a job|
|-n / --ntasks||Specifies the number of tasks (processes) to allocate resources for e.g. for 1 Compute Node the maximum would be 28|
|-o / --output||Specifies the name of the output file|
|-e / --error||Specifies the name of the error file|
|-p / --partition||Specifies the specific partition for the job e.g. compute, highmem, gpu|
|--exclusive||Requests exclusive access to nodes preventing other jobs from running|
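The same flags can also be given directly on the sbatch command line rather than inside the batch file. The sketch below assembles such a command from the flags in the table and only prints it, so it can be inspected before use. The function name `build_sbatch_cmd`, the job name and `jobfile.job` are illustrative; `%j` in the file names is replaced by Slurm with the Job ID.

```shell
#!/bin/bash
# Hypothetical helper: print an sbatch command line built from the flags
# in the table above. Arguments: job name, nodes, tasks, partition, script.
build_sbatch_cmd() {
    local name="$1" nodes="$2" tasks="$3" partition="$4" script="$5"
    local cmd=(sbatch -J "$name" -N "$nodes" -n "$tasks" -p "$partition"
               -o "${name}.%j.out" -e "${name}.%j.err" "$script")
    printf '%s\n' "${cmd[*]}"
}

# One compute node with 28 tasks, matching the example in the table.
build_sbatch_cmd Example_Slurm_Job 1 28 compute jobfile.job
```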
Example Job Submission Script
#!/bin/bash
#SBATCH -J Example_Slurm_Job
#SBATCH -N 1
#SBATCH -n 28
#SBATCH -o %N.%j.%a.out
#SBATCH -e %N.%j.%a.err
#SBATCH -p compute
#SBATCH --exclusive
echo $SLURM_JOB_NODELIST
module purge
module add gcc/4.9.3
export I_MPI_DEBUG=5
export I_MPI_FABRICS=shm:tmi
export I_MPI_FALLBACK=no
/home/user/slurmexample
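sbatch stops reading `#SBATCH` directives once it reaches the first executable command in the script, which is why they all sit at the top of the example. The sketch below is one way to check a job file for misplaced directives before submitting it. The function name `check_sbatch_order` is hypothetical, and the temporary job file is a deliberately broken variant of the example, written only for this demonstration.

```shell
#!/bin/bash
# Hypothetical check: report #SBATCH lines that come after the first command.
# Returns 0 if every directive sits in the header, 1 otherwise.
check_sbatch_order() {
    awk '
        /^#SBATCH/ {
            if (seen_cmd) { print "line " NR ": " $0; bad = 1 }
            next
        }
        /^[ \t]*(#|$)/ { next }          # other comments and blank lines
                       { seen_cmd = 1 }  # first real command
        END            { exit (bad ? 1 : 0) }
    ' "$1"
}

# Demonstration file with one directive placed too late (line 5).
jobfile=$(mktemp)
cat > "$jobfile" <<'EOF'
#!/bin/bash
#SBATCH -J Example_Slurm_Job
#SBATCH -N 1
echo $SLURM_JOB_NODELIST
#SBATCH -p compute
EOF
check_sbatch_order "$jobfile" || echo "the directives listed above would be ignored by sbatch"
rm -f "$jobfile"
```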
For more information on creating batch jobs, visit the Batch Jobs guide.
- Slurm Website: https://slurm.schedmd.com/
- Slurm Rosetta (Useful for converting submission scripts from other formats)
- You might find application-specific submission scripts here: Application support