General/Slurm
Application Details
- Version: 15.08.8
- Further information: https://slurm.schedmd.com/
- Slurm Rosetta (https://slurm.schedmd.com/rosetta.pdf), useful for converting submission scripts from other formats
Introduction
The SLURM (Simple Linux Utility for Resource Management) workload manager is a free and open-source job scheduler for Linux. It is used by Viper and many of the world's supercomputers and clusters.
Common Slurm Commands
Command | Description
sbatch | Submits a batch script to SLURM. The batch script may be given to sbatch through a file name on the command line, or if no file name is specified, sbatch will read in a script from standard input.
squeue | Used to view job and job step information for jobs managed by SLURM.
scancel | Used to signal or cancel jobs, job arrays or job steps.
sinfo | Used to view partition and node information for a system running SLURM.
sbatch
[username@login01 ~]$ sbatch jobfile.job
Submitted batch job 289535
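A batch script is an ordinary shell script with #SBATCH directives at the top describing the resources the job needs, followed by the commands to run. As a rough sketch (the job name, output file and program below are placeholders, not taken from this page), jobfile.job might look like:

#!/bin/bash
#SBATCH -J example_job          # name the job (placeholder)
#SBATCH -N 1                    # request a single node
#SBATCH -o example_job.%j.out   # output file; %j expands to the job ID
#SBATCH -p compute              # run in the compute partition

echo "Running on $SLURM_NODELIST"
./my_program                    # placeholder for the real executable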
squeue
[username@login01 ~]$ squeue
  JOBID PARTITION     NAME   USER ST       TIME NODES NODELIST(REASON)
 306414   compute  clasDFT 495711  R      16:36     1 c006
 306413   compute mpi_benc 535286  R      31:02     2 c[005,007]
 306411   compute  orca_1n 442104  R    1:00:32     1 c004
 306410   compute  orca_1n 442104  R    1:04:17     1 c003
 306409   highmem cnv_obit 524274  R   11:37:17     1 c232
 306407   compute  20M4_20 535822  R   11:45:54     1 c012
 306406   compute 20_ML_20 535822  R   11:55:40     1 c012
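On a busy system the full queue can be long, so it is often useful to filter the output. For example (username is a placeholder; the job ID is taken from the listing above):

[username@login01 ~]$ squeue -u username    # show only jobs belonging to 'username'
[username@login01 ~]$ squeue -j 306414      # show only the job with this job ID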
scancel
[username@login01 ~]$ scancel 289535
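scancel can also select jobs by attribute rather than by job ID, for example cancelling all of your own queued and running jobs at once (username is a placeholder):

[username@login01 ~]$ scancel -u username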
sinfo
[username@login01 ~]$ sinfo
PARTITION AVAIL  TIMELIMIT  NODES  STATE NODELIST
compute*     up 2-00:00:00      9    mix c[006,012,014,016,018-020,022,170]
compute*     up 2-00:00:00     11  alloc c[003-004,008,015,046,086,093,098,138,167-168]
compute*     up 2-00:00:00    156   idle c[001-002,005,007,009-011,013,017,021,023-045,047-085,087-092,094-097,099-137,139-166,169,171-176]
highmem      up 4-00:00:00      1    mix c230
highmem      up 4-00:00:00      2  alloc c[231-232]
highmem      up 4-00:00:00      1   idle c233
gpu          up 5-00:00:00      4   idle gpu[01-04]
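The output can be narrowed to a single partition with -p, for example to check only the gpu nodes:

[username@login01 ~]$ sinfo -p gpu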
Common Submission Flags
Flag | Description
-J / --job-name | Specifies a name for the job
-N / --nodes | Specifies the number of nodes to be allocated to the job
-n / --ntasks | Specifies the number of tasks to allocate; for a single compute node the maximum is 28
-o / --output | Specifies the name of the output file
-e / --error | Specifies the name of the error file
-p / --partition | Specifies the partition the job should run in, e.g. compute, highmem or gpu
--exclusive | Requests exclusive use of the allocated nodes, preventing other jobs from sharing them
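These flags can be given to sbatch on the command line, but they are more usually placed at the top of the job script as #SBATCH directives. A sketch using the long-form options (the job name, file names and task count are placeholders, and 28 tasks per compute node is assumed from the --ntasks description above):

#!/bin/bash
#SBATCH --job-name=my_analysis        # -J
#SBATCH --nodes=2                     # -N: request two nodes
#SBATCH --ntasks=56                   # -n: 28 tasks per node across two nodes
#SBATCH --output=my_analysis.%j.out   # -o: %j expands to the job ID
#SBATCH --error=my_analysis.%j.err    # -e
#SBATCH --partition=compute           # -p
#SBATCH --exclusive                   # do not share the allocated nodes with other jobs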