Difference between revisions of "Applications/SAS"

(Created page with "__TOC__ === Application Details === * Description: SAS (Statistical Analysis System) is a software suite developed by SAS Institute for advanced analytics, business intelligen...")
 

Revision as of 09:19, 31 January 2017

Application Details

  • Description: SAS (Statistical Analysis System) is a software suite developed by SAS Institute for advanced analytics, business intelligence, data management, and predictive analytics.
  • Versions: 9.4
  • Module names: SAS/9.4
  • License: Departmental research-only, multi-platform license, restricted to the Accounting and Finance department
  • Forum support: https://www.hpc.hull.ac.uk/forum/viewforum.php?f=22
  • Further information: http://www.sas.com/en_gb/home.html


Usage Examples

Interactive

[username@login01 ~]$ interactive
salloc: Granted job allocation 289669
Job ID 289669 connecting to c170, please wait...
Last login: Thu Jan 26 12:59:11 2017 from 10.254.5.246
[username@c170 ~]$ module add test-modules SAS/9.4

To use the full SAS GUI, make sure your environment is set up as detailed in LINK, then run sas_en:

[username@c170 ~]$ sas_en

To use interactive line mode on the command line instead of the GUI, run sas_en -nodms:

[username@c170 ~]$ sas_en -nodms

Some functions invoke a SAS window even in interactive line mode. For full line mode, add the -noterminal flag, i.e.:

[username@c170 ~]$ sas_en -nodms -noterminal

You can then run a task from a SAS script with the following command:

[username@c170 ~]$ sas_en -noterminal sas_file.sas
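
After a script has been run this way, it is worth checking the SAS log for problems. The following is a minimal sketch only. It assumes that sas_en follows the usual SAS batch behaviour of writing a log named after the script (sas_file.log for sas_file.sas) into the current working directory, and that problem lines begin with ERROR or WARNING. The helper name sas_log_problems is hypothetical and not part of SAS or the cluster setup.

```shell
# Hypothetical helper: list ERROR and WARNING lines from a SAS log.
# Assumption: the log is in the current directory and shares the
# script's base name (sas_file.sas -> sas_file.log).
sas_log_problems() {
    local script="$1"
    local log
    log="$(basename "${script%.sas}").log"
    if [ ! -f "$log" ]; then
        echo "no log found: $log" >&2
        return 2
    fi
    # -n prefixes each match with its line number in the log.
    # grep exits with status 1 when nothing matches, i.e. a clean log.
    grep -nE '^(ERROR|WARNING)' "$log"
}
```

For example, running sas_log_problems sas_file.sas in the directory where the task was started prints any matching lines with their line numbers.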


Batch Submission

A better approach is to submit tasks to run automatically without any interaction. To do this, you need a job submission script, an example of which is:

#!/bin/bash
#SBATCH -J SASjob
#SBATCH -N 1
#SBATCH -n 1
#SBATCH -o %N-%j.log
#SBATCH -e %N-%j.err
#SBATCH -p compute
#SBATCH --mem=10G
 
module add test-modules SAS/9.4
 
sas_en -memsize 10G -cpucount 1 -nothreads -noterminal ms_sur_LCT.sas

Details (line numbers count the non-blank lines of the script):

  • Line 1 - the standard interpreter line, which must be at the top of the file.
  • Line 2 - -J sets the name of the job, in this case SASjob. The name has no effect on how the job runs and doesn't have to be unique; it helps distinguish tasks when looking in squeue (LINK).
  • Line 3 - requests 1 compute node. SAS should only be run on 1 node.
  • Line 4 - requests one slot on the node. Some SAS tasks use functions that may benefit from parallel computation, so this (and other settings) may change at a later date. For now, leave it as 1.
  • Lines 5 and 6 - set the output and error files. The log file will contain the SAS console output; the error file will contain information that may be useful (at a cluster level) if things don't work as expected.
  • Line 7 - requests that the job runs on one of the compute nodes in the compute queue.
  • Line 8 - requests 10GB of RAM for the task. Change this as necessary for the task.
  • Line 9 - provides access to the SAS software module.
  • Line 10 - the run command. The -memsize value should match the memory requested in line 8, and -cpucount should match line 4. Change the final .sas filename as appropriate.
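
One way to keep -memsize and -cpucount in step with the #SBATCH requests is to read them from the environment Slurm provides inside a job. The sketch below assumes that SLURM_MEM_PER_NODE (in megabytes) and SLURM_NTASKS are set when --mem and -n are given, and that -memsize accepts a value with an M suffix. The function name build_sas_cmd is hypothetical, and the fallback values simply mirror the example script.

```shell
# Hypothetical helper: print a sas_en command line whose -memsize and
# -cpucount follow the Slurm allocation rather than hand-copied values.
# Assumption: Slurm sets SLURM_MEM_PER_NODE (MB) and SLURM_NTASKS in the job.
build_sas_cmd() {
    local script="$1"
    # Fall back to the example's 10G / 1 slot when run outside a job.
    local mem_mb="${SLURM_MEM_PER_NODE:-10240}"
    local cpus="${SLURM_NTASKS:-1}"
    echo "sas_en -memsize ${mem_mb}M -cpucount ${cpus} -nothreads -noterminal ${script}"
}
```

In a job script, the final line could then become $(build_sas_cmd ms_sur_LCT.sas). This relies on shell word splitting, so it is only suitable for file names without spaces.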

This submission script can be found at /path/to/sample/script

[username@login01 ~]$ sbatch SAStest.job
Submitted batch job 289552
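
The job number in that message is what squeue and other Slurm commands expect. As a rough sketch, the ID can be pulled out of the message with a small helper. The name job_id_from_sbatch is hypothetical, and the parsing assumes the single-cluster message format shown above.

```shell
# Hypothetical helper: extract the numeric job ID from sbatch's
# "Submitted batch job <id>" message.
job_id_from_sbatch() {
    local line="$1"
    case "$line" in
        "Submitted batch job "*)
            # Strip everything up to and including the last space.
            echo "${line##* }"
            ;;
        *)
            echo "unexpected sbatch output: $line" >&2
            return 1
            ;;
    esac
}
```

For example, jobid=$(job_id_from_sbatch "$(sbatch SAStest.job)") followed by squeue -j "$jobid" shows the state of just that job. sbatch also has a --parsable option that prints the job ID without the surrounding text, which may be simpler where it is available.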


Change Log