General/Slurm

Application Details

Introduction

SLURM (Simple Linux Utility for Resource Management) is a free and open-source workload manager and job scheduler for Linux. It is used by Viper and many of the world's supercomputers and clusters.

Common Slurm Commands

Command   Description
sbatch    Submits a batch script to SLURM. The batch script may be given to sbatch through a file name on the command line, or if no file name is specified, sbatch will read in a script from standard input.
squeue    Used to view job and job step information for jobs managed by SLURM.
scancel   Used to signal or cancel jobs, job arrays or job steps.
sinfo     Used to view partition and node information for a system running SLURM.

sbatch

[username@login01 ~]$ sbatch jobfile.job
Submitted batch job 289535
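
When submitting from a script, the --parsable option makes sbatch print only the job ID, so it can be captured in a shell variable. A minimal sketch, using the same job file as above:

[username@login01 ~]$ jobid=$(sbatch --parsable jobfile.job)
[username@login01 ~]$ echo $jobid
289535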

squeue

squeue shows information about jobs in the scheduling queue. In the output below, the ST column gives the job state (R = running) and TIME shows how long each job has been running.

[username@login01 ~]$ squeue
             JOBID PARTITION     NAME     USER ST       TIME  NODES NODELIST(REASON)
            306414   compute  clasDFT   user  R      16:36      1 c006
            306413   compute mpi_benc   user  R      31:02      2 c[005,007]
            306411   compute  orca_1n   user  R    1:00:32      1 c004
            306410   compute  orca_1n   user  R    1:04:17      1 c003
            306409   highmem cnv_obit   user  R   11:37:17      1 c232
            306407   compute  20M4_20   user  R   11:45:54      1 c012
            306406   compute 20_ML_20   user  R   11:55:40      1 c012
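
By default squeue lists every job on the system. It can be restricted to your own jobs with -u, or to a single job with -j, for example:

[username@login01 ~]$ squeue -u $USER
[username@login01 ~]$ squeue -j 306414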

scancel

Note: the command gives no output; a job's removal from the queue can be confirmed with squeue.

[username@login01 ~]$ scancel 289535
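
scancel also accepts filters, so, for example, all of your own queued and running jobs can be cancelled at once:

[username@login01 ~]$ scancel -u $USER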

sinfo

[username@login01 ~]$ sinfo
PARTITION AVAIL  TIMELIMIT  NODES  STATE NODELIST
compute*     up 2-00:00:00      9    mix c[006,012,014,016,018-020,022,170]
compute*     up 2-00:00:00     11  alloc c[003-004,008,015,046,086,093,098,138,167-168]
compute*     up 2-00:00:00    156   idle c[001-002,005,007,009-011,013,017,021,023-045,047-085,087-092,094-097,099-137,139-166,169,171-176]
highmem      up 4-00:00:00      1    mix c230
highmem      up 4-00:00:00      2  alloc c[231-232]
highmem      up 4-00:00:00      1   idle c233
gpu          up 5-00:00:00      4   idle gpu[01-04]
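
The output can be limited to a single partition with -p, which is useful on systems with many partitions. For example, using the gpu partition shown above:

[username@login01 ~]$ sinfo -p gpu
PARTITION AVAIL  TIMELIMIT  NODES  STATE NODELIST
gpu          up 5-00:00:00      4   idle gpu[01-04]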

Common Submission Flags

Flag              Description
-J / --job-name   Specifies a name for the job
-N / --nodes      Specifies the number of nodes to be allocated to the job
-n / --ntasks     Specifies the number of tasks (cores) to allocate, e.g. on one compute node the maximum is 28
-o / --output     Specifies the name of the output file
-e / --error      Specifies the name of the error file
-p / --partition  Specifies the partition to run the job on, e.g. compute, highmem or gpu
--exclusive       Requests exclusive access to nodes, preventing other jobs from sharing them
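
These flags can be given either as #SBATCH directives inside the job script (as in the example below) or directly on the sbatch command line, where they override the values set in the script. For example, to override the job name and partition at submission time (test_run is just an illustrative name):

[username@login01 ~]$ sbatch -J test_run -p highmem jobfile.job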

Example Job Submission Script

#!/bin/bash

# Job name
#SBATCH -J Example_Slurm_Job
# One node, using all 28 cores
#SBATCH -N 1
#SBATCH -n 28
# Output and error file names: %N = node name, %j = job ID, %a = job array index
#SBATCH -o %N.%j.%a.out
#SBATCH -e %N.%j.%a.err
# Run on the compute partition with exclusive access to the node
#SBATCH -p compute
#SBATCH --exclusive

# Print the node(s) allocated to this job
echo $SLURM_JOB_NODELIST

# Start from a clean environment and load the required compiler
module purge
module load gcc/4.9.3

# Intel MPI settings: verbose start-up output, shared-memory/TMI fabrics, no fallback
export I_MPI_DEBUG=5
export I_MPI_FABRICS=shm:tmi
export I_MPI_FALLBACK=no

/home/user/slurmexample
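
Assuming the script above is saved as example.job (an illustrative file name), it would be submitted and then monitored with:

[username@login01 ~]$ sbatch example.job
[username@login01 ~]$ squeue -u $USER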