Difference between revisions of "Quickstart/Batch Jobs"
Latest revision as of 10:12, 3 August 2023
What is a batch job?
Having been introduced to the Slurm scheduler in Slurm, and to one way of using Viper through interactive sessions in Interactive, the next way of making use of an HPC system like Viper is to submit jobs that run automatically, without interaction, when the resource becomes available. We call these batch jobs.
In order to run a batch job, we need to provide Slurm with information about what we want to do. We do this via a job submission script, which is sort of like a recipe for the job.
What is a Slurm script?
The submission script is a text file that provides information to Slurm about the task you are running so that it can be allocated to the appropriate resource, and sets up the environment so the task can run. A minimal submission script has three main components:
- A set of directives that provides Slurm with some high-level information such as what resource is required, a job name, how long the task should run for and where to log any output that would normally be shown on the screen.
- Information about how the job environment should be set up, for example, which application modules should be loaded.
- The actual command(s) that need to be run.
Slurm Scripts
For ease of use please give your .job files a descriptive name. While you can specify the directory, it is easier to create and submit the script from the working directory.
#!/bin/bash
All Slurm scripts must start with #!/bin/bash (called a shebang).
    #!/bin/bash
#SBATCH
The following #SBATCH directives are essential to a Slurm script. Other directives you can include are described in further topics.
Please note these are case sensitive.
#SBATCH -J
#SBATCH -J jobname tells Slurm the name of your job - this is what you will see when you run squeue.
    #!/bin/bash
    #SBATCH -J helloWorld
#SBATCH -n
#SBATCH -n Number tells Slurm how many cores you would like to use.
    #!/bin/bash
    #SBATCH -J helloWorld
    #SBATCH -n 4
#SBATCH -o
#SBATCH -o %N.%j.out tells Slurm where you would like standard output to be written. %N is the node name (its short hostname) and %j is the assigned job number. Though the file can be named anything, it must end in .out.
    #!/bin/bash
    #SBATCH -J helloWorld
    #SBATCH -n 4
    #SBATCH -o %N.%j.out
#SBATCH -e
#SBATCH -e %N.%j.err tells Slurm where you would like standard error to be written. %N is the node name (its short hostname) and %j is the assigned job number. Though the file can be named anything, it must end in .err.
    #!/bin/bash
    #SBATCH -J helloWorld
    #SBATCH -n 4
    #SBATCH -o %N.%j.out
    #SBATCH -e %N.%j.err
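As a rough illustration of how the %N.%j pattern used by -o and -e expands, the sketch below builds the same file names by hand. The node name c001 is a hypothetical value chosen for illustration (real node names will differ), and the job number is simply an example. Slurm performs this substitution itself, so nothing like this needs to go in a job script.

```shell
# Hypothetical values: Slurm fills these in itself when the job starts.
node_name="c001"     # assumed short hostname of the node (%N)
job_id="289522"      # example job number assigned at submission (%j)

out_file="${node_name}.${job_id}.out"   # what -o %N.%j.out would produce
err_file="${node_name}.${job_id}.err"   # what -e %N.%j.err would produce

echo "$out_file"
echo "$err_file"
```

Because the job number is part of the name, each submission writes to its own pair of files rather than overwriting the previous run.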
#SBATCH -p
#SBATCH -p Partition tells Slurm what partition you would like to use: compute/highmem/gpu. Please note highmem and gpu jobs have additional #SBATCH requirements.
    #!/bin/bash
    #SBATCH -J helloWorld
    #SBATCH -n 4
    #SBATCH -o %N.%j.out
    #SBATCH -e %N.%j.err
    #SBATCH -p compute
#SBATCH -t
#SBATCH -t HH:MM:SS tells Slurm the maximum time the job may run for. The job will end automatically when it finishes, or it will be terminated if it takes longer than the specified time.
    #!/bin/bash
    #SBATCH -J helloWorld
    #SBATCH -n 4
    #SBATCH -o %N.%j.out
    #SBATCH -e %N.%j.err
    #SBATCH -p compute
    #SBATCH -t 02:30:00
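To make the HH:MM:SS format concrete, here is a small sketch that converts a time limit into seconds. The function name hms_to_seconds is made up for this illustration and is not part of Slurm.

```shell
# Convert an HH:MM:SS time limit into a number of seconds.
# hms_to_seconds is a hypothetical helper, not a Slurm command.
hms_to_seconds() {
    echo "$1" | awk -F: '{ print $1 * 3600 + $2 * 60 + $3 }'
}

limit_seconds=$(hms_to_seconds "02:30:00")
echo "$limit_seconds"
```

For the 02:30:00 limit used on this page, this gives 9000 seconds, that is, 2 hours 30 minutes.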
Environment Setup
The next part of the script sets up the environment in which the job will run.
module purge
module purge unloads any previously loaded modules. It is good practice to start the environment setup with this command.
    #!/bin/bash
    #SBATCH -J helloWorld
    #SBATCH -n 4
    #SBATCH -o %N.%j.out
    #SBATCH -e %N.%j.err
    #SBATCH -p compute
    #SBATCH -t 02:30:00

    module purge
module add modulename
Just as you would in an interactive session, add any modules you need.
    #!/bin/bash
    #SBATCH -J helloWorld
    #SBATCH -n 4
    #SBATCH -o %N.%j.out
    #SBATCH -e %N.%j.err
    #SBATCH -p compute
    #SBATCH -t 02:30:00

    module purge
    module add python/anaconda/202111/3.9
Commands
This section is where you tell Slurm what you would like it to run - mostly the same commands as you would use in an interactive session. There are examples of basic batch jobs, with the commands you might run, at the bottom of this page. If you are not sure which commands to use, check the module pages listed under Modules Available.
    #!/bin/bash
    #SBATCH -J helloWorld
    #SBATCH -n 4
    #SBATCH -o %N.%j.out
    #SBATCH -e %N.%j.err
    #SBATCH -p compute
    #SBATCH -t 02:30:00

    module purge
    module add python/anaconda/202111/3.9

    python helloWorld.py
The above batch script is now complete. It will use a compute node with 4 cores for a maximum of 2 hours 30 minutes, and will load the anaconda module before running the Python file helloWorld.py.
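One possible way to create the finished script from the command line is with a here-document, followed by a quick check that all six directives are present. This is only a sketch: the temporary directory is an assumption made so the example does not overwrite anything, and the file name PythonTest.job is just an example name for the script.

```shell
# Write the finished script to a file in a scratch directory (assumed location).
work_dir=$(mktemp -d)
job_file="$work_dir/PythonTest.job"

cat > "$job_file" <<'EOF'
#!/bin/bash
#SBATCH -J helloWorld
#SBATCH -n 4
#SBATCH -o %N.%j.out
#SBATCH -e %N.%j.err
#SBATCH -p compute
#SBATCH -t 02:30:00

module purge
module add python/anaconda/202111/3.9

python helloWorld.py
EOF

# Count the #SBATCH directives; the script above has six.
directive_count=$(grep -c '^#SBATCH' "$job_file")
echo "$directive_count"
```

The quoted 'EOF' stops the shell from expanding anything inside the here-document, so the script is written to the file exactly as typed.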
How to submit a batch job to Slurm
Submitting a job is easy! Just run sbatch jobscript.job.
    [username@login01 ~]$ sbatch PythonTest.job
    Submitted batch job 289522
You will now be able to see the status of your batch job by running squeue -u your_username.
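If you want to use the job number in later commands, it can be pulled out of the message that sbatch prints. The sketch below works on a sample "Submitted batch job" message rather than calling sbatch, so it can be tried anywhere; the variable names are illustrative.

```shell
# Sample message in the form sbatch prints after a successful submission.
sbatch_output="Submitted batch job 289522"

# Strip everything up to and including the last space, leaving the job number.
job_id=${sbatch_output##* }
echo "$job_id"

# On the cluster, the job could then be checked with:
#   squeue -j "$job_id"
```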
Exclusive Batch Jobs
For demanding tasks such as Matlab, please use exclusive nodes. Such tasks are often multithreaded, and if a job or session hasn't requested an appropriate resource, the multithreading can cause contention for CPU resources and negatively impact other users. To request an exclusive node, use #SBATCH -N 1 and #SBATCH --exclusive.
    #!/bin/bash
    #SBATCH -J helloWorld
    #SBATCH -N 1
    #SBATCH --exclusive
    #SBATCH -o %N.%j.out
    #SBATCH -e %N.%j.err
    #SBATCH -p compute
    #SBATCH -t 02:30:00
Example Batch Jobs
R
R should be used on an exclusive node: this is done by the #SBATCH directives #SBATCH -N 1 and #SBATCH --exclusive.
    #!/bin/bash
    #SBATCH -J My_R_job      # Job name, you can change it to whatever you want
    #SBATCH -N 1             # Number of nodes
    #SBATCH -o %N.%j.out     # Standard output will be written here
    #SBATCH -e %N.%j.err     # Standard error will be written here
    #SBATCH -p compute       # Slurm partition, where you want the job to be queued
    #SBATCH -t 01:00:00      # Max time job runs for 1 hour
    #SBATCH --exclusive      # Run on one node without any other users

    module purge
    module add R/4.0.2

    # The output from the R interpreter will appear in the file output.data
    R CMD BATCH Random.R output.data
Matlab
Matlab requires an exclusive node: this is done by the #SBATCH directives #SBATCH -N 1 and #SBATCH --exclusive.
    #!/bin/bash
    #SBATCH -J MATLAB        # Job name, you can change it to whatever you want
    #SBATCH -N 1             # Number of nodes (for Matlab should be always one)
    #SBATCH -o %N.%j.out     # Standard output will be written here
    #SBATCH -e %N.%j.err     # Standard error will be written here
    #SBATCH -p compute       # Slurm partition, where you want the job to be queued
    #SBATCH -t 01:00:00      # Max time job runs for 1 hour
    #SBATCH --exclusive      # Run on one node without any other users

    module purge
    module add matlab/2016a

    matlab -nodisplay -nojvm -nodesktop -nosplash -r my_matlab_m_file
Python Virtual Environment
See Virtual Environment Quickstart for setting up a Virtual Environment.
    #!/bin/bash
    #SBATCH -J matrixMulti   # Job name, you can change it to whatever you want
    #SBATCH -n 1             # Number of cores = 1
    #SBATCH -o %N.%j.out     # Standard output will be written here
    #SBATCH -e %N.%j.err     # Standard error will be written here
    #SBATCH -p compute       # Slurm partition, where you want the job to be queued
    #SBATCH -t 01:00:00      # Max time job runs for 1 hour
    #SBATCH --exclusive      # Run on one node without any other users

    module purge
    module add python/anaconda/202111/3.9

    source activate /home/<user>/.conda/envs/numpyenv
    python matrixMultiplication.py
Openfoam
Openfoam should be used on an exclusive node: this is done by the #SBATCH directives #SBATCH -N 1 and #SBATCH --exclusive.
    #!/bin/bash
    #SBATCH -J openfoamExample   # Job name, you can change it to whatever you want
    #SBATCH -N 1                 # Number of nodes
    #SBATCH -o %N.%j.out         # Standard output will be written here
    #SBATCH -e %N.%j.err         # Standard error will be written here
    #SBATCH -p compute           # Slurm partition, where you want the job to be queued
    #SBATCH -t 01:00:00          # Max time job runs for 1 hour
    #SBATCH --exclusive          # Run on one node without any other users

    module add openfoam/4.0

    export I_MPI_DEBUG=5
    export I_MPI_FABRICS=shm:tmi
    export I_MPI_FALLBACK=no

    mpirun -n 28 -cpu_bind=cores interFoam -parallel
More detailed scripts
For more information on batch scripts, visit Advanced Batch Jobs. This includes information on using high memory and GPU nodes, resource management, how to use multiple nodes, and node reservations.