General/Interactive

Introduction

An interactive session can be started on Viper for any task that requires interaction, and should be used in preference to the login node for any computationally demanding work.

Examples of interactive usage include:

  • Code compilation
  • Data analysis
  • Basic visualisation
  • Console-based interactive applications such as Python, R, Matlab, SAS or Stata (see the sketch after this list)
  • Graphical user interfaces such as Matlab, SAS or Stata
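
For example, a console application such as Python would be loaded and run inside the interactive session itself, not on the login node. A minimal sketch, assuming software on Viper is provided through environment modules (the module name is purely illustrative; the salloc output is omitted here and shown in full below):

[username@login01 ~]$ interactive                # request a session as described below
[username@c068 ~]$ module avail                  # list the modules actually installed
[username@c068 ~]$ module load python            # illustrative module name
[username@c068 ~]$ python                        # run the console application on the compute node
[username@c068 ~]$ exit                          # release the allocation when finished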


Interactive Sessions

Starting an interactive session

An interactive session can be started by using the interactive command:

[username@login01 ~]$ interactive
salloc: Granted job allocation 306844
Job ID 306844 connecting to c068, please wait...
[username@c068 ~]$
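
An interactive session is a normal Slurm job, so while it is running it can be seen from another terminal on the login node with squeue. A sketch (replace username with your own account):

[username@login01 ~]$ squeue -u username         # shows the interactive job and the node it is running on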

To exit from an interactive session, just type exit:

[username@c068 ~]$ exit
logout
salloc: Relinquishing job allocation 306844
[username@login01 ~]$ 

By default the interactive command will give you an allocation of a single compute core on a node for 12 hours, with a standard 4GB of RAM. This can be adjusted in the following ways:

Exclusive interactive session
[username@login01 ~]$ interactive --exclusive
salloc: Granted job allocation 306848
Job ID 306848 connecting to c174, please wait...

Note: this will give you the whole node for your job exclusively; if it is not specified, other jobs may be running on the allocated node at the same time. For example, if your job requires a significant number of processing cores this should be specified; a single-core task would not require it and would leave the other processing cores idle (see Interactive session with additional CPU cores below).

Interactive session with additional CPU cores
[username@login01 ~]$ interactive -n24
salloc: Granted job allocation 306849
Job ID 306849 connecting to c174, please wait...
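
Within a multi-core allocation such as this, parallel work is normally launched with srun, which inherits the cores granted to the session. A minimal sketch (the executable name is hypothetical):

[username@c174 ~]$ srun -n 24 ./my_parallel_program   # runs 24 tasks across the allocated cores
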
Interactive session with additional RAM
[username@login01 ~]$ interactive --mem=24G
salloc: Granted job allocation 306852
Job ID 306852 connecting to c068, please wait...

Note: if a job exceeds the requested amount of memory, it will terminate with an error message similar to the following (in this case from a job which ran with a memory limit of 2GB):

slurmstepd: Step 307110.0 exceeded memory limit (23933492 > 2097152), being killed
srun: Job step aborted: Waiting up to 32 seconds for job step to finish.
srun: got SIGCONT
slurmstepd: Exceeded job memory limit
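
If Slurm accounting is enabled, the memory a job actually used can be checked afterwards with sacct, which helps when choosing a sensible --mem value. A sketch using the job ID from the message above:

[username@login01 ~]$ sacct -j 307110 --format=JobID,MaxRSS,ReqMem,State
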
Interactive session using high memory partition

High memory node

[username@login01 ~]$ interactive -phighmem
salloc: Granted job allocation 306153
Job ID 306153 connecting to c233, please wait...
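
Once on the high memory node, its total physical memory can be confirmed with a standard tool such as free (output omitted):

[username@c233 ~]$ free -h
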
Interactive session using GPU partition

GPU node with a single GPU and non-exclusive

[username@login01 ~]$ interactive -pgpu
salloc: Granted job allocation 306855
Job ID 306855 connecting to gpu02, please wait...

GPU node with a single GPU and exclusive

[username@login01 ~]$ interactive -pgpu --exclusive
salloc: Granted job allocation 306856
Job ID 306856 connecting to gpu03, please wait...

GPU node with all 4 GPUs and exclusive

[username@login01 ~]$ interactive -pgpu --gres=gpu:tesla:4 --exclusive
salloc: Granted job allocation 1043984
Job ID 1043984 connecting to gpu02, please wait...
Last login: Fri May 18 11:03:13 2018 from login01
[username@gpu02 ~]$
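
Once connected, nvidia-smi can be used to confirm the GPUs visible to the session; with all four requested it should list four Tesla devices (output omitted):

[username@gpu02 ~]$ nvidia-smi
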
Interactive session with a node reservation

This example is for a reservation named 327889 and the gpu partition (queue); omitting the partition name will default to the compute queue.

[username@login01 ~]$ interactive -pgpu --reservation=327889
salloc: Granted job allocation 306353
Job ID 306353 connecting to gpu04, please wait...
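
The reservations currently defined, together with their nodes, partitions and time windows, can be listed with scontrol before requesting one:

[username@login01 ~]$ scontrol show reservation
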

More Information

More information can be found by typing the following (based on Slurm 15.08.8):

[username@login01 ~]$ interactive --help

Parallel run options:
  -A, --account=name          charge job to specified account
      --begin=time            defer job until HH:MM MM/DD/YY
      --bell                  ring the terminal bell when the job is allocated
      --bb=<spec>             burst buffer specifications
      --bbf=<file_name>       burst buffer specification file
  -c, --cpus-per-task=ncpus   number of cpus required per task
      --comment=name          arbitrary comment
      --cpu-freq=min[-max[:gov]] requested cpu frequency (and governor)
  -d, --dependency=type:jobid defer job until condition on jobid is satisfied
  -D, --chdir=path            change working directory
      --get-user-env          used by Moab.  See srun man page.
      --gid=group_id          group ID to run job as (user root only)
      --gres=list             required generic resources
  -H, --hold                  submit job in held state
  -I, --immediate[=secs]      exit if resources not available in "secs"
      --jobid=id              specify jobid to use
  -J, --job-name=jobname      name of job
  -k, --no-kill               do not kill job on node failure
  -K, --kill-command[=signal] signal to send terminating job
  -L, --licenses=names        required license, comma separated
  -m, --distribution=type     distribution method for processes to nodes
                              (type = block|cyclic|arbitrary)
      --mail-type=type        notify on state change: BEGIN, END, FAIL or ALL
      --mail-user=user        who to send email notification for job state
                              changes
  -n, --tasks=N               number of processors required
      --nice[=value]          decrease scheduling priority by value
      --no-bell               do NOT ring the terminal bell
      --ntasks-per-node=n     number of tasks to invoke on each node
  -N, --nodes=N               number of nodes on which to run (N = min[-max])
  -O, --overcommit            overcommit resources
      --power=flags           power management options
      --priority=value        set the priority of the job to value
      --profile=value         enable acct_gather_profile for detailed data
                              value is all or none or any combination of
                              energy, lustre, network or task
  -p, --partition=partition   partition requested
      --qos=qos               quality of service
  -Q, --quiet                 quiet mode (suppress informational messages)
      --reboot                reboot compute nodes before starting job
  -s, --share                 share nodes with other jobs
      --sicp                  If specified, signifies job is to receive
                              job id from the incluster reserve range.
      --signal=[B:]num[@time] send signal when time limit within time seconds
      --switches=max-switches{@max-time-to-wait}
                              Optimum switches and max time to wait for optimum
  -S, --core-spec=cores       count of reserved cores
      --thread-spec=threads   count of reserved threads
  -t, --time=minutes          time limit
      --time-min=minutes      minimum time limit (if distinct)
      --uid=user_id           user ID to run job as (user root only)
  -v, --verbose               verbose mode (multiple -v's increase verbosity)
      --wckey=wckey           wckey to run job under

Constraint options:
      --contiguous            demand a contiguous range of nodes
  -C, --constraint=list       specify a list of constraints
  -F, --nodefile=filename     request a specific list of hosts
      --mem=MB                minimum amount of real memory
      --mincpus=n             minimum number of logical processors (threads)
                              per node
      --reservation=name      allocate resources from named reservation
      --tmp=MB                minimum amount of temporary disk
  -w, --nodelist=hosts...     request a specific list of hosts
  -x, --exclude=hosts...      exclude a specific list of hosts

Consumable resources related options:
      --exclusive[=user]      allocate nodes in exclusive mode when
                              cpu consumable resource is enabled
      --mem-per-cpu=MB        maximum amount of real memory per allocated
                              cpu required by the job.
                              --mem >= --mem-per-cpu if --mem is specified.

Affinity/Multi-core options: (when the task/affinity plugin is enabled)
  -B  --extra-node-info=S[:C[:T]]            Expands to:
       --sockets-per-node=S   number of sockets per node to allocate
       --cores-per-socket=C   number of cores per socket to allocate
       --threads-per-core=T   number of threads per core to allocate
                              each field can be 'min' or wildcard '*'
                              total cpus requested = (N x S x C x T)

      --ntasks-per-core=n     number of tasks to invoke on each core
      --ntasks-per-socket=n   number of tasks to invoke on each socket


Help options:
  -h, --help                  show this help message
  -u, --usage                 display brief usage message

Other options:
  -V, --version               output version information and exit
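
Several of these options can be combined on a single interactive command line; the values below are purely illustrative:

[username@login01 ~]$ interactive -n4 --mem=8G -t 120 -J mytest   # 4 tasks, 8GB of RAM, a 2 hour limit and a job name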

Further Information

  • https://slurm.schedmd.com/