General/Interactive
Introduction
An interactive session can be started on Viper for any task that requires interaction, and it should be used for any computationally demanding task rather than the login node.
Examples of interactive usage include:
- Code compilation
- Data analysis
- Basic visualisation
- Console-based interactive applications such as Python, R, Matlab, SAS or Stata
- Graphical user interfaces such as Matlab, SAS or Stata
Interactive Sessions
Starting an interactive session
An interactive session can be started by using the interactive command:
[username@login01 ~]$ interactive
salloc: Granted job allocation 306844
Job ID 306844 connecting to c068, please wait...
[username@c068 ~]$
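While the session is running it appears in the queue like any other job. As a quick check, it can be listed from a second terminal on the login node with Slurm's squeue command (the username here is illustrative):
[username@login01 ~]$ squeue -u username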
To exit from an interactive session, just type exit:
[username@c068 ~]$ exit
logout
salloc: Relinquishing job allocation 306844
[username@login01 ~]$
By default, the interactive command will give you an allocation of a single compute core on a node for 12 hours with the standard 4GB of RAM. This can be adjusted in the following ways:
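One adjustment that applies to all of the examples below is the session length: the default 12 hours can be changed with the -t (--time=minutes) option listed in the help output at the end of this page. A sketch requesting a four-hour (240 minute) session; the job allocation number and node name are illustrative:
[username@login01 ~]$ interactive -t 240
salloc: Granted job allocation 306846
Job ID 306846 connecting to c068, please wait...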
Exclusive interactive session
[username@login01 ~]$ interactive --exclusive
salloc: Granted job allocation 306848
Job ID 306848 connecting to c174, please wait...
Note: this will give you the whole node for your job exclusively; if it is not specified, other jobs may be running on the allocated node at the same time. For example, if your job requires a significant number of processing cores this should be specified, whereas single-core tasks do not require it and would leave the other processor cores idle (see Interactive session with additional CPU cores below).
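Inside an exclusive session you can confirm what has been allocated using Slurm's standard job environment variables, assuming they are exported into the session as usual; the node name and values shown are illustrative:
[username@c174 ~]$ echo $SLURM_JOB_ID $SLURM_CPUS_ON_NODE
306848 28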
Interactive session with additional CPU cores
[username@login01 ~]$ interactive -n24
salloc: Granted job allocation 306849
Job ID 306849 connecting to c174, please wait...
Interactive session with additional RAM
[username@login01 ~]$ interactive --mem=24G
salloc: Granted job allocation 306852
Job ID 306852 connecting to c068, please wait...
Note: if a job exceeds the requested amount of memory, it will terminate with an error message similar to the following (from a job which ran with a memory limit of 2GB):
slurmstepd: Step 307110.0 exceeded memory limit (23933492 > 2097152), being killed
srun: Job step aborted: Waiting up to 32 seconds for job step to finish.
srun: got SIGCONT
slurmstepd: Exceeded job memory limit
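One way to see how much memory the job actually used is Slurm's accounting command, assuming job accounting is enabled on the system (the job ID matches the error above):
[username@login01 ~]$ sacct -j 307110 --format=JobID,ReqMem,MaxRSS,State
MaxRSS reports the largest resident memory used by each job step, which gives an indication of how much to request with --mem next time.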
Interactive session using a different partition
High memory node
[username@login01 ~]$ interactive -phighmem
salloc: Granted job allocation 306153
Job ID 306153 connecting to c233, please wait...
GPU node with a single GPU
[username@login01 ~]$ interactive -pgpu
salloc: Granted job allocation 306855
Job ID 306855 connecting to gpu02, please wait...
GPU node with all four GPUs and exclusive access
[username@login01 ~]$ interactive -pgpu --gres=gpu:tesla:4 --exclusive
salloc: Granted job allocation 1043984
Job ID 1043984 connecting to gpu02, please wait...
Last login: Fri May 18 11:03:13 2018 from login01
[username@gpu02 ~]$
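Once connected to the GPU node, the devices visible to the session can be listed with NVIDIA's nvidia-smi tool, assuming the driver utilities are installed on the GPU nodes; output is omitted here:
[username@gpu02 ~]$ nvidia-smi -L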
Interactive session with a node reservation
This example is for a reservation named 327889 on the gpu partition (queue); omitting the partition name will default to the compute queue.
[username@login01 ~]$ interactive -pgpu --reservation=327889
salloc: Granted job allocation 306353
Job ID 306353 connecting to gpu04, please wait...
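If you are unsure of the reservation name, active reservations can usually be listed from the login node with Slurm's scontrol command before requesting one:
[username@login01 ~]$ scontrol show reservation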
More Information
More information can be found by typing the following (output based on Slurm 15.08.8):
[username@login01 ~]$ interactive --help
Parallel run options:
  -A, --account=name          charge job to specified account
      --begin=time            defer job until HH:MM MM/DD/YY
      --bell                  ring the terminal bell when the job is allocated
      --bb=<spec>             burst buffer specifications
      --bbf=<file_name>       burst buffer specification file
  -c, --cpus-per-task=ncpus   number of cpus required per task
      --comment=name          arbitrary comment
      --cpu-freq=min[-max[:gov]] requested cpu frequency (and governor)
  -d, --dependency=type:jobid defer job until condition on jobid is satisfied
  -D, --chdir=path            change working directory
      --get-user-env          used by Moab. See srun man page.
      --gid=group_id          group ID to run job as (user root only)
      --gres=list             required generic resources
  -H, --hold                  submit job in held state
  -I, --immediate[=secs]      exit if resources not available in "secs"
      --jobid=id              specify jobid to use
  -J, --job-name=jobname      name of job
  -k, --no-kill               do not kill job on node failure
  -K, --kill-command[=signal] signal to send terminating job
  -L, --licenses=names        required license, comma separated
  -m, --distribution=type     distribution method for processes to nodes
                              (type = block|cyclic|arbitrary)
      --mail-type=type        notify on state change: BEGIN, END, FAIL or ALL
      --mail-user=user        who to send email notification for job state changes
  -n, --tasks=N               number of processors required
      --nice[=value]          decrease scheduling priority by value
      --no-bell               do NOT ring the terminal bell
      --ntasks-per-node=n     number of tasks to invoke on each node
  -N, --nodes=N               number of nodes on which to run (N = min[-max])
  -O, --overcommit            overcommit resources
      --power=flags           power management options
      --priority=value        set the priority of the job to value
      --profile=value         enable acct_gather_profile for detailed data
                              value is all or none or any combination of
                              energy, lustre, network or task
  -p, --partition=partition   partition requested
      --qos=qos               quality of service
  -Q, --quiet                 quiet mode (suppress informational messages)
      --reboot                reboot compute nodes before starting job
  -s, --share                 share nodes with other jobs
      --sicp                  If specified, signifies job is to receive
                              job id from the incluster reserve range.
      --signal=[B:]num[@time] send signal when time limit within time seconds
      --switches=max-switches{@max-time-to-wait}
                              Optimum switches and max time to wait for optimum
  -S, --core-spec=cores       count of reserved cores
      --thread-spec=threads   count of reserved threads
  -t, --time=minutes          time limit
      --time-min=minutes      minimum time limit (if distinct)
      --uid=user_id           user ID to run job as (user root only)
  -v, --verbose               verbose mode (multiple -v's increase verbosity)
      --wckey=wckey           wckey to run job under

Constraint options:
      --contiguous            demand a contiguous range of nodes
  -C, --constraint=list       specify a list of constraints
  -F, --nodefile=filename     request a specific list of hosts
      --mem=MB                minimum amount of real memory
      --mincpus=n             minimum number of logical processors (threads) per node
      --reservation=name      allocate resources from named reservation
      --tmp=MB                minimum amount of temporary disk
  -w, --nodelist=hosts...     request a specific list of hosts
  -x, --exclude=hosts...      exclude a specific list of hosts

Consumable resources related options:
      --exclusive[=user]      allocate nodes in exclusive mode when
                              cpu consumable resource is enabled
      --mem-per-cpu=MB        maximum amount of real memory per allocated
                              cpu required by the job.
                              --mem >= --mem-per-cpu if --mem is specified.

Affinity/Multi-core options: (when the task/affinity plugin is enabled)
  -B, --extra-node-info=S[:C[:T]]  Expands to:
      --sockets-per-node=S    number of sockets per node to allocate
      --cores-per-socket=C    number of cores per socket to allocate
      --threads-per-core=T    number of threads per core to allocate
                              each field can be 'min' or wildcard '*'
                              total cpus requested = (N x S x C x T)
      --ntasks-per-core=n     number of tasks to invoke on each core
      --ntasks-per-socket=n   number of tasks to invoke on each socket

Help options:
  -h, --help                  show this help message
  -u, --usage                 display brief usage message

Other options:
  -V, --version               output version information and exit
Further Information