FurtherTopics/Additional Information On Viper

==What is Viper==

=== Physical Hardware ===

Viper is based on the Linux operating system and is composed of approximately 5,500 processing cores, with the following specialised areas:

* 180 compute nodes, each with 2x 14-core Intel Broadwell E5-2680v4 processors (2.4–3.3 GHz) and 128 GB DDR4 RAM
* 4 high-memory nodes, each with 4x 10-core Intel Haswell E5-4620v3 processors (2.0 GHz) and 1 TB DDR4 RAM
* 4 GPU nodes, each identical to a compute node with the addition of an Nvidia Ampere A40 GPU per node
* 2 visualisation nodes, each with 2x Nvidia GTX 980 Ti
* Intel [http://www.intel.com/content/www/us/en/high-performance-computing-fabrics/omni-path-architecture-fabric-overview.html Omni-Path] interconnect (100 Gb/s node-switch and switch-switch)
* 500 TB parallel file system ([http://www.beegfs.com/ BeeGFS])

Viper is the University of Hull's High-Performance Computer (HPC). Often called a supercomputer, it is more accurately a High-Performance Computer: about 200 separate computers linked together by a very high-speed network. The network lets the machines exchange data and gives the appearance that they are acting as one large computer. The whole of Viper runs the Linux operating system, as do the majority of research systems, including all of the top 500 supercomputers in the world.

The compute nodes ('''compute''' and '''highmem''') are '''stateless''', while the GPU and visualisation nodes are stateful.

* Note: "stateless" means a node keeps no persistent operating-system storage between boots (the term originates from hard drives being the usual persistent storage mechanism). Stateless does not necessarily imply diskless: our compute nodes each also have 128 GB of SSD temporary space.

== Infrastructure ==

* 4 racks with dedicated cooling and hot-aisle containment (see diagram below)
* An additional rack for storage and management components
* A dedicated, high-efficiency chiller on the AS3 roof for cooling
* UPS and generator power failover

[[File:Rack-diagram.jpg]]

{| class="wikitable"
| style="width:20%" | '''Image'''
| style="width:5%" | '''Quantity'''
| style="width:75%" | '''Description'''
|-
| [[File:Node-compute.png]]
| '''180'''
| '''Compute nodes''': the main processing cluster nodes, each with 28 Intel Broadwell E5-2680v4 cores and 128 GB of RAM. These make up most of the cluster and perform most of the standard computing work within the HPC.
|-
| [[File:Node-himem.jpg]]
| '''4'''
| '''High-memory nodes''': similar to the compute nodes but with much more memory; each has 40 Intel Haswell E5-4620v3 cores and 1 TB of RAM, making them ideal for research involving large memory models, such as engineering or DNA analysis in biology.
|-
| [[File:Node-visualisation.jpg]]
| '''4'''
| '''Accelerator (GPU) nodes''': identical to the compute nodes with the addition of an Nvidia A40 GPU per node and 128 GB of RAM. These GPUs are similar to high-end graphics cards found in gaming rigs; they contain thousands of very small processing cores, which makes them well suited to running small amounts of code in a massively parallel way, including machine learning and deep learning workloads.
|-
| [[File:Node-visualisation.jpg]]
| '''1'''
| '''Accelerator (GPU) node''': a specialised node with 2x Nvidia P100 GPUs and 128 GB of RAM.
|-
| [[File:Node-visualisation.jpg]]
| '''2'''
| '''Visualisation nodes''': allow remote graphical viewing of data using Nvidia GeForce GTX 980 Ti cards (2x per node). Users connect from remote computers such as desktops and view rendered output on their local machine.
|}

All of these nodes are connected by a very fast Intel Omni-Path network running at 100 Gbit/s, which allows the compute nodes to act together. Storage nodes, servers in their own right, provide access to the storage arrays (the disks held in a chassis that make up the whole file storage); Viper accesses its disk storage via these nodes, giving a total file space of 0.5 PB (500 TB). Controller nodes manage all the compute nodes, handling the loading, termination and completion of jobs via the job scheduler, which for Viper is SLURM. Finally, login nodes allow users to log in to the cluster and prepare jobs; although a login node is a server in its own right, it is not used for computing work.
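Since SLURM is the scheduler named above, users typically inspect the cluster with the standard SLURM command-line tools. A minimal sketch follows; the node name is a placeholder, and the partitions you see will be whatever the site has configured:

```shell
# List partitions and node states across the cluster
sinfo

# Show your own queued and running jobs
squeue -u "$USER"

# Show detailed information about one node, e.g. its core and memory counts
# (replace <nodename> with a real node name from the sinfo output)
scontrol show node <nodename>
```

These commands are read-only, so they are safe to run from a login node while deciding which node type fits a job.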
  
==How can Viper be used?==

* In '''Parallel (Single Node)''': where a job can run on a single node with 28 cores.
* In '''Parallel (Multiple Nodes)''': where a job needs more than 28 cores.
* Using a '''High Memory''' node: where a job needs a large amount of memory (currently these nodes have 1 TB).
* Using '''GPU''' node(s): where a job can use fast GPU-accelerated calculations (for example, high-end graphics manipulation).
* Using the '''Visualisation''' nodes: for interactive visualisation and viewing 3D models.
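The usage modes above map onto SLURM batch scripts submitted with <code>sbatch</code>. A minimal sketch for a single-node parallel job follows; the partition name <code>compute</code> and the executable <code>./my_program</code> are illustrative assumptions, not taken from this page:

```shell
#!/bin/bash
#SBATCH --job-name=example        # job name shown in the queue
#SBATCH --nodes=1                 # single-node parallel job
#SBATCH --ntasks-per-node=28      # one task per core on a 28-core node
#SBATCH --mem=120G                # stay within the node's 128 GB of RAM
#SBATCH --partition=compute       # illustrative partition name

# Run the program across the allocated cores
srun ./my_program
```

A multi-node job raises <code>--nodes</code> (for example, <code>--nodes=4</code> gives 112 cores), a high-memory job targets the 1 TB nodes via the appropriate partition, and GPU jobs typically add a directive such as <code>#SBATCH --gres=gpu:1</code>.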
[[Main Page|Main Page]] / [[FurtherTopics/FurtherTopics#Further Reading|Further Topics]]

Latest revision as of 09:41, 16 November 2022
