General/Slurm
Application Details
- Description: SLURM is an open-source job scheduler used by many HPC systems.
- Version: 15.08.8
Introduction
The SLURM (Simple Linux Utility for Resource Management) workload manager is a free and open-source job scheduler for Linux clusters. It is used by Viper and many of the world's supercomputers and clusters. It performs three key functions:
- First, it allocates exclusive and/or non-exclusive access to resources (compute nodes) to users for some duration of time so they can perform work.
- Second, it provides a framework for starting, executing, and monitoring work (typically a parallel job such as MPI) on the set of allocated nodes.
- Third, it arbitrates contention for resources by managing a queue of pending jobs.
Slurm takes your batch job submission and executes it across the compute nodes of Viper. How a job is processed depends on a number of factors, including the partition (queue) it is submitted to and the jobs already waiting in that queue.
- The SLURM builder (https://viper.hull.ac.uk/Main/SBuilder/) is available on viper.hull.ac.uk for generating ready-made custom submission scripts.
Common Slurm Commands
Command | Description
sbatch | Submits a batch script to SLURM. The batch script may be given to sbatch as a file name on the command line, or, if no file name is specified, sbatch will read the script from standard input.
squeue | Used to view job and job step information for jobs managed by SLURM.
scancel | Used to signal or cancel jobs, job arrays or job steps.
sinfo | Used to view partition and node information for a system running SLURM.
sbatch
Used to submit a job to Slurm.
[username@login01 ~]$ sbatch jobfile.job
Submitted batch job 289535
Note: the number displayed (289535) is the job ID.
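Options can also be passed to sbatch on the command line, where they take precedence over the matching #SBATCH directives inside the script. As an illustrative sketch (jobfile.job is just the example file name used above), the first command below submits the script to the highmem partition instead of the one named in the script, and the second uses --test-only to ask Slurm to validate the script and report when it would be expected to start, without actually submitting it:

[username@login01 ~]$ sbatch -p highmem jobfile.job
[username@login01 ~]$ sbatch --test-only jobfile.job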
squeue
squeue shows information about jobs currently in the scheduling queue.
[username@login01 ~]$ squeue
   JOBID PARTITION     NAME     USER ST       TIME  NODES NODELIST(REASON)
  306414   compute  clasDFT     user  R      16:36      1 c006
  306413   compute mpi_benc     user  R      31:02      2 c[005,007]
  306411   compute  orca_1n     user  R    1:00:32      1 c004
  306410   compute  orca_1n     user  R    1:04:17      1 c003
  306409   highmem cnv_obit     user  R   11:37:17      1 c232
  306407   compute  20M4_20     user  R   11:45:54      1 c012
  306406   compute 20_ML_20     user  R   11:55:40      1 c012
Heading | Description
JOBID | The unique identifier assigned to the job
PARTITION | The partition (node type) the job is running on, e.g. compute, highmem, gpu
NAME | Name of the job
USER | User ID of the job owner
ST | Job state code, e.g. R stands for 'Running'
TIME | Length of time the job has been running
NODES | Number of nodes the job is running on
NODELIST(REASON) | List of nodes the job is running on, or the reason the job is not yet running (e.g. waiting on a dependency or on resources)
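On a busy system the queue can be very long, so it is usually easier to filter the list. As a brief illustration (the job ID is taken from the example output above), squeue can be limited to your own jobs or to a single job:

[username@login01 ~]$ squeue -u $USER
[username@login01 ~]$ squeue -j 306414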
scancel
scancel is used to cancel pending or running jobs. Only jobs submitted under your own user ID may be cancelled.
[username@login01 ~]$ scancel 289535
Note: no output is given by the command.
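scancel can also select jobs by attribute rather than by a single job ID. As an illustrative sketch (the user name and job name below are placeholders), the following cancel all of your own jobs, or all of your jobs with a particular name:

[username@login01 ~]$ scancel -u username
[username@login01 ~]$ scancel -n Example_Slurm_Job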
sinfo
sinfo shows information about the partitions and nodes in the cluster.
[username@login01 ~]$ sinfo
PARTITION AVAIL  TIMELIMIT  NODES  STATE NODELIST
compute*     up 2-00:00:00      9    mix c[006,012,014,016,018-020,022,170]
compute*     up 2-00:00:00     11  alloc c[003-004,008,015,046,086,093,098,138,167-168]
compute*     up 2-00:00:00    156   idle c[001-002,005,007,009-011,013,017,021,023-045,047-085,087-092,094-097,099-137,139-166,169,171-176]
highmem      up 4-00:00:00      1    mix c230
highmem      up 4-00:00:00      2  alloc c[231-232]
highmem      up 4-00:00:00      1   idle c233
gpu          up 5-00:00:00      4   idle gpu[01-04]
Heading | Description
PARTITION | A group of nodes; on Viper, partitions are organised by node type, e.g. compute, highmem and gpu
AVAIL | Availability of the partition
TIMELIMIT | Time limit for jobs running on the partition, e.g. on the compute nodes a job can run for at most 2 days
NODES | Number of nodes in a given state within the partition
STATE | The current state of that group of nodes, e.g. alloc (allocated)
NODELIST | List of nodes in that state within the partition
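sinfo also accepts filters and alternative layouts. As a hedged example, the commands below restrict the report to a single partition and produce a node-oriented long listing, which can be easier to read when checking the state of individual nodes:

[username@login01 ~]$ sinfo -p compute
[username@login01 ~]$ sinfo -N -l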
Common Submission Flags
Rather than typing Slurm options on the command line for every submission, it is much easier to build them into a batch file. The following are the most commonly used flags.
Note: use the --exclusive flag when you require a whole node for your job. If your job does not need a significant number of processing cores, omit this flag so that other users can make use of the node's unused resources.
Flag | Description
-J / --job-name | Specifies a name for the job
-N / --nodes | Specifies the number of nodes to be allocated to the job
-n / --ntasks | Specifies the number of tasks (cores) to allocate, e.g. a single compute node has a maximum of 28
-o / --output | Specifies the name of the output file
-e / --error | Specifies the name of the error file
-p / --partition | Specifies the partition for the job, e.g. compute, highmem or gpu
--exclusive | Requests exclusive access to the allocated nodes, preventing other jobs from sharing them
Example Job Submission Script
#!/bin/bash

#SBATCH -J Example_Slurm_Job       # job name
#SBATCH -N 1                       # number of nodes
#SBATCH -n 28                      # number of tasks (cores)
#SBATCH -o %N.%j.%a.out            # output file (node name.job ID.array index)
#SBATCH -e %N.%j.%a.err            # error file
#SBATCH -p compute                 # partition to run on
#SBATCH --exclusive                # request the whole node
#SBATCH --mail-user=<your email address here>

echo $SLURM_JOB_NODELIST

module purge
module add gcc/8.2.0

export I_MPI_DEBUG=5
export I_MPI_FABRICS=shm:tmi
export I_MPI_FALLBACK=no

/home/user/slurmexample
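As a brief, illustrative usage sketch (the script name is a placeholder, the job ID is reused from the sbatch example above, and the exact output file names depend on the %N.%j.%a pattern, i.e. node name, job ID and array index), the script would be submitted and checked like this:

[username@login01 ~]$ sbatch example_job.job
Submitted batch job 289535
[username@login01 ~]$ squeue -u $USER
[username@login01 ~]$ ls *.out *.err
[username@login01 ~]$ cat *.289535.*.out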
See Also
For more information on creating batch jobs, visit the Batch Jobs guide.
Next Steps
- Slurm Website: https://slurm.schedmd.com/
- Slurm Rosetta (https://slurm.schedmd.com/rosetta.pdf), useful for converting submission scripts from other formats
- You might find application-specific submission scripts on the Application support page (http://hpc.mediawiki.hull.ac.uk/Main_Page#Application_Support)