General/Slurm
Application Details
- Description: SLURM is an open-source job scheduler used by HPC systems.
- Version: 15.08.8
Introduction
The SLURM (Simple Linux Utility for Resource Management) workload manager is a free and open-source job scheduler for the Linux kernel. It is used by Viper and many of the world's supercomputers (and clusters).
- First, it allocates exclusive and/or non-exclusive access to resources (compute nodes) to users for some duration of time so they can perform work.
- Second, it provides a framework for starting, executing, and monitoring work (typically a parallel job such as MPI) on a set of allocated nodes.
- Third, it arbitrates contention for resources by managing a queue of pending jobs.
Slurm executes your batch job across Viper's compute nodes. When it starts depends on several factors, including the partition (queue) it is submitted to and the jobs already waiting in that queue.
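As a sketch of the typical workflow (the script name and user name below are placeholders; writing the script itself is covered later on this page):

[username@login01 ~]$ sbatch myjob.job      # submit the batch script
Submitted batch job 289535
[username@login01 ~]$ squeue -u username    # check the job's state in the queue
[username@login01 ~]$ scancel 289535        # cancel the job if something is wrong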
Common Slurm Commands
Command | Description
sbatch | Submits a batch script to SLURM. The batch script may be given to sbatch through a file name on the command line, or if no filename is specified, sbatch will read in a script from standard input.
squeue | Used to view job and job step information for jobs managed by SLURM.
scancel | Used to signal or cancel jobs, job arrays or job steps.
sinfo | Used to view partition and node information for a system running SLURM.
sbatch
Used to submit a job to Slurm.
[username@login01 ~]$ sbatch jobfile.job
Submitted batch job 289535
Note: The number displayed (289535) is the Job ID.
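The Job ID can be passed to other Slurm commands. For example, scontrol (not listed in the table above) prints the full record for a job; a quick sketch using the Job ID from the submission above:

[username@login01 ~]$ scontrol show job 289535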
squeue
squeue shows information about jobs in the scheduling queue; some useful filtering options are shown after the table below.
[username@login01 ~]$ squeue
  JOBID PARTITION     NAME     USER ST       TIME  NODES NODELIST(REASON)
 306414   compute  clasDFT     user  R      16:36      1 c006
 306413   compute mpi_benc     user  R      31:02      2 c[005,007]
 306411   compute  orca_1n     user  R    1:00:32      1 c004
 306410   compute  orca_1n     user  R    1:04:17      1 c003
 306409   highmem cnv_obit     user  R   11:37:17      1 c232
 306407   compute  20M4_20     user  R   11:45:54      1 c012
 306406   compute 20_ML_20     user  R   11:55:40      1 c012
Heading | Description
JOBID | The unique identifier assigned to a job
PARTITION | The type of node the job is running on, e.g. compute, highmem, GPU
NAME | Name of the job
USER | User ID of the job owner
ST | Job state code, e.g. R stands for 'Running'
TIME | Length of time the job has been running
NODES | Number of nodes the job is running on
NODELIST(REASON) | List of nodes the job is running on; for a pending job, the reason it is not yet running, e.g. a dependency on another job
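squeue accepts filters to narrow down the listing. Two commonly useful forms, using a placeholder user name and a Job ID taken from the listing above:

[username@login01 ~]$ squeue -u username    # show only your own jobs
[username@login01 ~]$ squeue -j 306414      # show a single job by its Job ID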
scancel
scancel is used to cancel pending or running jobs. Only jobs submitted under your own user ID may be cancelled.
[username@login01 ~]$ scancel 289535
Note: No output is given by the command.
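scancel can also select jobs by attribute rather than by a single Job ID. For example (the user name is a placeholder, and the job name matches the example script later on this page):

[username@login01 ~]$ scancel -u username                  # cancel all of your jobs
[username@login01 ~]$ scancel --name=Example_Slurm_Job     # cancel your jobs with a given name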
sinfo
sinfo shows information about the partitions and nodes in the cluster.
[username@login01 ~]$ sinfo
PARTITION AVAIL  TIMELIMIT  NODES  STATE NODELIST
compute*     up 2-00:00:00      9    mix c[006,012,014,016,018-020,022,170]
compute*     up 2-00:00:00     11  alloc c[003-004,008,015,046,086,093,098,138,167-168]
compute*     up 2-00:00:00    156   idle c[001-002,005,007,009-011,013,017,021,023-045,047-085,087-092,094-097,099-137,139-166,169,171-176]
highmem      up 4-00:00:00      1    mix c230
highmem      up 4-00:00:00      2  alloc c[231-232]
highmem      up 4-00:00:00      1   idle c233
gpu          up 5-00:00:00      4   idle gpu[01-04]
Heading | Description
PARTITION | A group of nodes; on Viper, partitions are organised by node type, e.g. compute, high memory and GPU
AVAIL | Availability of the partition
TIMELIMIT | Time limit for jobs running on the partition, e.g. on the compute nodes the maximum time a job can run for is 2 days
NODES | Number of nodes in a specific state/partition
STATE | The current status of a group of nodes, e.g. alloc (allocated)
NODELIST | List of nodes in a specific state/partition
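sinfo also accepts a partition filter, which is useful when you only care about one node type:

[username@login01 ~]$ sinfo -p highmem    # show only the highmem partition
[username@login01 ~]$ sinfo -p gpu        # show only the gpu partition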
Common Submission Flags
Rather than typing options on the command line every time, it is easier to build them into a batch script as #SBATCH directives. The table below lists the most commonly used options; they can also be passed directly to sbatch on the command line, as shown after the table.
Note: The --exclusive flag indicates that you require the whole node for your job. If you do not need a significant number of processing cores, omit this flag so that other users can use the node's unused resources (a shared-node example follows the job script below).
Flag | Description
-J / --job-name | Specifies a name for the job
-N / --nodes | Specifies the number of nodes to be allocated to the job
-n / --ntasks | Specifies the number of tasks (cores) to allocate, e.g. on a single compute node the maximum is 28
-o / --output | Specifies the name of the output file
-e / --error | Specifies the name of the error file
-p / --partition | Specifies the partition for the job, e.g. compute, highmem or gpu
--exclusive | Requests exclusive access to the node, preventing other jobs from running on it
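As noted above, the same options can be given directly on the sbatch command line, where they override any matching #SBATCH directives in the script. A quick sketch (jobfile.job is a placeholder):

[username@login01 ~]$ sbatch -J test_run -p compute -n 4 jobfile.job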
Example Job Submission Script
#!/bin/bash

#SBATCH -J Example_Slurm_Job
#SBATCH -N 1
#SBATCH -n 28
#SBATCH -o %N.%j.%a.out
#SBATCH -e %N.%j.%a.err
#SBATCH -p compute
#SBATCH --exclusive
#SBATCH --mail-user=<your email address>

echo $SLURM_JOB_NODELIST

module purge
module add gcc/8.2.0

export I_MPI_DEBUG=5
export I_MPI_FABRICS=shm:tmi
export I_MPI_FALLBACK=no

/home/user/slurmexample
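By contrast, a job that only needs a few cores can omit --exclusive so that the remaining cores on the node stay available to other users. A minimal sketch, assuming a small serial or lightly threaded program (the program path is a placeholder):

#!/bin/bash

#SBATCH -J Small_Shared_Job
#SBATCH -N 1
#SBATCH -n 4
#SBATCH -o %N.%j.out
#SBATCH -e %N.%j.err
#SBATCH -p compute

module purge
module add gcc/8.2.0

/home/user/smallprogram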
See Also
For more information on creating batch jobs, visit the Batch Jobs guide.
Next Steps
- Slurm Website: https://slurm.schedmd.com/
- Slurm Rosetta (Useful for converting submission scripts from other formats)
- Application-specific submission scripts can be found on the Application support page.