__TOC__
 
==Application Details==

* Description: SLURM is an open-source job scheduler and workload manager used by HPC clusters.
* Version: 15.08.8
* Further information: [https://slurm.schedmd.com/ https://slurm.schedmd.com/]
* [https://slurm.schedmd.com/rosetta.pdf Slurm Rosetta] (useful for converting submission scripts from other formats)
 
==Introduction==

The SLURM (Simple Linux Utility for Resource Management) workload manager is a free and open-source job scheduler for Linux. It is used by Viper and many of the world's supercomputers and clusters.

* First, it allocates exclusive and/or non-exclusive access to resources (compute nodes) to users for some duration of time so they can perform work.
* Second, it provides a framework for starting, executing and monitoring work (typically a parallel job, such as an MPI application) on the set of allocated nodes.
* Third, it arbitrates contention for resources by managing a queue of pending jobs.

Slurm executes your batch job submission across Viper's compute nodes. How a job is processed depends on several factors, including the partition (queue) it is submitted to and the jobs already waiting in that queue.
 
==Common Slurm Commands==

{| class="wikitable"
| style="width:25%" | <Strong>Command</Strong>
| style="width:75%" | <Strong>Description</Strong>
|-
| sbatch
| Submits a batch script to SLURM. The batch script may be given to sbatch as a file name on the command line; if no file name is specified, sbatch reads the script from standard input.
|-
| squeue
| Used to view job and job step information for jobs managed by SLURM.
|-
| scancel
| Used to signal or cancel jobs, job arrays or job steps.
|-
| sinfo
| Used to view partition and node information for a system running SLURM.
|}
===sbatch===

sbatch is used to submit a job script to Slurm.

<pre style="background-color: #000000; color: white; border: 2px solid black; font-family: monospace, sans-serif;">
[username@login01 ~]$ sbatch jobfile.job
Submitted batch job 289535
</pre>

{|
|style="width:5%; border-width: 0;cellpadding=0" | [[File:icon_exclam3.png]]
|style="width:95%; border-width: 0;cellpadding=0" | The number displayed (289535) is the Job ID.
|-
|}
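If you want to check a job script before it joins the queue, or capture the job ID for later use, sbatch has a couple of helpful options. A minimal sketch (--test-only and --parsable are standard sbatch flags, although their behaviour can vary slightly between Slurm versions; jobfile.job is just a placeholder):

<pre style="background-color: #000000; color: white; border: 2px solid black; font-family: monospace, sans-serif;">
# Validate the job script and estimate when it would start, without actually submitting it
[username@login01 ~]$ sbatch --test-only jobfile.job

# Submit the job and capture just the job ID, e.g. for use with scancel later
[username@login01 ~]$ JOBID=$(sbatch --parsable jobfile.job)
[username@login01 ~]$ echo $JOBID
</pre>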
===squeue===
 
squeue shows information about jobs in the scheduling queue.

<pre style="background-color: #000000; color: white; border: 2px solid black; font-family: monospace, sans-serif;">
[username@login01 ~]$ squeue
             JOBID PARTITION     NAME   USER ST       TIME  NODES NODELIST(REASON)
            306414   compute  clasDFT   user  R      16:36      1 c006
            306413   compute mpi_benc   user  R      31:02      2 c[005,007]
            306411   compute  orca_1n   user  R    1:00:32      1 c004
            306410   compute  orca_1n   user  R    1:04:17      1 c003
            306409   highmem cnv_obit   user  R   11:37:17      1 c232
            306407   compute  20M4_20   user  R   11:45:54      1 c012
            306406   compute 20_ML_20   user  R   11:55:40      1 c012
</pre>

{| class="wikitable"
| style="width:25%" | <Strong>Heading</Strong>
| style="width:75%" | <Strong>Description</Strong>
|-
| JOBID
| The unique identifier assigned to the job
|-
| PARTITION
| The type of node the job is running on, e.g. compute, highmem or gpu
|-
| NAME
| The name of the job
|-
| USER
| The user ID of the job owner
|-
| ST
| The job state code, e.g. R stands for 'Running'
|-
| TIME
| The length of time the job has been running
|-
| NODES
| The number of nodes the job is running on
|-
| NODELIST(REASON)
| The list of nodes the job is running on; for a pending job, the reason it is not yet running, e.g. a dependency
|}
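On a busy cluster the full queue can be very long, so it is usually more convenient to restrict squeue to your own jobs or to a single job. These are standard squeue options, and the columns in the output are the same as above:

<pre style="background-color: #000000; color: white; border: 2px solid black; font-family: monospace, sans-serif;">
# Show only your own jobs
[username@login01 ~]$ squeue -u $USER

# Show a single job by its job ID
[username@login01 ~]$ squeue -j 289535

# Show the queue in long format (includes the time limit of each job)
[username@login01 ~]$ squeue -l
</pre>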
  
===scancel===

scancel is used to cancel running or queued jobs. Only jobs submitted under your own user ID may be cancelled.

<pre style="background-color: #000000; color: white; border: 2px solid black; font-family: monospace, sans-serif;">
[username@login01 ~]$ scancel 289535
</pre>

{|
|style="width:5%; border-width: 0;cellpadding=0" | [[File:icon_exclam3.png]]
|style="width:95%; border-width: 0;cellpadding=0" | No output is given by the command.
|-
|}
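scancel can also select jobs by attribute rather than by a single job ID. A few common variations (all standard scancel options, and they only affect jobs belonging to your own user ID):

<pre style="background-color: #000000; color: white; border: 2px solid black; font-family: monospace, sans-serif;">
# Cancel all of your jobs
[username@login01 ~]$ scancel -u $USER

# Cancel a job by its name rather than its job ID
[username@login01 ~]$ scancel --name=Example_Slurm_Job

# Cancel only your pending (not yet running) jobs
[username@login01 ~]$ scancel -u $USER --state=PENDING
</pre>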
===sinfo===

sinfo shows information on the partitions and nodes in the cluster.

<pre style="background-color: #000000; color: white; border: 2px solid black; font-family: monospace, sans-serif;">
[username@login01 ~]$ sinfo
PARTITION AVAIL  TIMELIMIT  NODES  STATE NODELIST
compute*     up 2-00:00:00      9    mix c[006,012,014,016,018-020,022,170]
compute*     up 2-00:00:00     11  alloc c[003-004,008,015,046,086,093,098,138,167-168]
compute*     up 2-00:00:00    156   idle c[001-002,005,007,009-011,013,017,021,023-045,047-085,087-092,094-097,099-137,139-166,169,171-176]
highmem      up 4-00:00:00      1    mix c230
highmem      up 4-00:00:00      2  alloc c[231-232]
highmem      up 4-00:00:00      1   idle c233
gpu          up 5-00:00:00      4   idle gpu[01-04]
</pre>

{| class="wikitable"
| style="width:25%" | <Strong>Heading</Strong>
| style="width:75%" | <Strong>Description</Strong>
|-
| PARTITION
| A group of nodes; on Viper, partitions are organised by node type, e.g. compute, high memory (highmem) and GPU
|-
| AVAIL
| The availability of the partition
|-
| TIMELIMIT
| The maximum run time for jobs on the partition, e.g. jobs on the compute nodes can run for at most 2 days
|-
| NODES
| The number of nodes in the given state within the partition
|-
| STATE
| The current state of that group of nodes, e.g. alloc (allocated) or idle
|-
| NODELIST
| The list of nodes in the given state within the partition
|}
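sinfo can also be filtered or switched to a node-oriented view, which is useful when checking the state of a particular partition or node. For example, using standard sinfo options:

<pre style="background-color: #000000; color: white; border: 2px solid black; font-family: monospace, sans-serif;">
# Show only the compute partition
[username@login01 ~]$ sinfo -p compute

# Node-oriented, long format (one line per node with CPU and memory details)
[username@login01 ~]$ sinfo -N -l
</pre>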
  
 
==Common Submission Flags==

Rather than typing resource options on the command line each time, it is much easier to build them into a batch (job) script. The following are the most commonly used submission flags.

'''Note''': Use the --exclusive flag when your job requires a whole node to itself. If you do not need a significant number of processor cores, omit this flag so that other users can make use of the node's unused resources.

{| class="wikitable"
| style="width:25%" | <Strong>Flag</Strong>
| style="width:75%" | <Strong>Description</Strong>
|-
| -J / --job-name
| Specifies a name for the job
|-
| -N / --nodes
| Specifies the number of nodes to be allocated to the job
|-
| -n / --ntasks
| Specifies the number of tasks (cores) to allocate, e.g. for one compute node the maximum is 28
|-
| -o / --output
| Specifies the name of the output file
|-
| -e / --error
| Specifies the name of the error file
|-
| -p / --partition
| Specifies the partition for the job, e.g. compute, highmem or gpu
|-
| --exclusive
| Requests exclusive access to the allocated nodes, preventing other jobs from running on them
|}
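The same flags can also be passed to sbatch on the command line, where they take precedence over the corresponding #SBATCH lines in the script. For example, to send an existing job script to the highmem partition under a different job name (jobfile.job is a placeholder):

<pre style="background-color: #000000; color: white; border: 2px solid black; font-family: monospace, sans-serif;">
# Command-line options override #SBATCH directives inside the script
[username@login01 ~]$ sbatch -p highmem -J highmem_test jobfile.job
</pre>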
 
===Example Job Submission Script===

<pre style="background-color: #E5E4E2; color: black; font-family: monospace, sans-serif;">
#!/bin/bash

#SBATCH -J Example_Slurm_Job
#SBATCH -N 1
#SBATCH -n 28
#SBATCH -o %N.%j.%a.out
#SBATCH -e %N.%j.%a.err
#SBATCH -p compute
#SBATCH --exclusive
#SBATCH --mail-user=<your email address>

# Print the node(s) allocated to the job
echo $SLURM_JOB_NODELIST

# Start from a clean environment and load the required compiler module
module purge
module add gcc/8.2.0

# Intel MPI settings
export I_MPI_DEBUG=5
export I_MPI_FABRICS=shm:tmi
export I_MPI_FALLBACK=no

# Run the application
/home/user/slurmexample
</pre>
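Assuming the script above is saved as example.job (the file name and the job ID shown are only illustrative), a typical submit-and-monitor cycle looks like this; the output and error files named by the -o and -e patterns appear in the directory the job was submitted from:

<pre style="background-color: #000000; color: white; border: 2px solid black; font-family: monospace, sans-serif;">
# Submit the job script
[username@login01 ~]$ sbatch example.job
Submitted batch job 289540

# Check its progress in the queue
[username@login01 ~]$ squeue -u $USER

# Once it has finished, inspect the output and error files (%N = node, %j = job ID, %a = array index)
[username@login01 ~]$ ls *.out *.err
</pre>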

===See Also===

For more information on creating batch jobs, visit the [[General/Batch|Batch Jobs]] guide.

==Next Steps==

* Slurm website: [https://slurm.schedmd.com/ https://slurm.schedmd.com/]
* [https://slurm.schedmd.com/rosetta.pdf Slurm Rosetta] (useful for converting submission scripts from other formats)
* Application-specific submission scripts can be found under [http://hpc.mediawiki.hull.ac.uk/Main_Page#Application_Support Application support]

==Navigation==

* [[Main_Page|Home]]
* [[Applications|Application support]]
* [[General|General]]
* [[Programming|Programming support]]