CS-Storm Hardware Guide H-2003 (Rev E)
Contents
About the CS-Storm Hardware Guide ........ 4
CS-Storm System Description ........ 5
CS-Storm Power Distribution ........ 9
CS-Storm System Cooling ........ 12
    CS-Storm Chilled Door Cooling System ........ 12
CS-Storm Rack Conversion Kits ........ 14
CS-Storm Environmental Requirements ........ 19
CS-Storm Chassis Components ........ 20
    Front and Rear Panel Controls and Indicators ........ 26
    Hard Drive Support ........ 28
    1630W Power Supplies ........ 29
    Power Backplane ........ 31
    GPU Power Connections ........ 32
    Add-in Card LED Indicators ........ 34
    PCI Riser Interface Boards ........ 35
    Flex-Foil PCIe Interface Cables ........ 36
CS-Storm GPU Sleds ........ 38
    GPU Trays ........ 39
    Right and Left PCI Riser Boards ........ 41
NVIDIA Tesla GPUs ........ 45
    NVIDIA Tesla PH400 ........ 48
    NVIDIA GPU Boost and Autoboost ........ 48
    CS-Storm Fan Control Utility ........ 52
S2600WP Motherboard Description ........ 63
    Component Locations ........ 65
    Architecture ........ 66
    E5-2600 v2 Processor Features ........ 67
    Integrated Memory Controller (IMC) ........ 68
    RAS Modes ........ 71
    Integrated I/O Module ........ 72
    Riser Card Slots ........ 74
    Integrated BMC ........ 75
S2600TP Motherboard Description ........ 79
    Component Locations ........ 81
    Architecture ........ 84
    E5-2600 v3 Processor Features ........ 85
    Intel E5-2600 v4 Processor Features ........ 86
S2600x Processor Support ........ 89
    Motherboard System Software ........ 91
    Memory Population Rules ........ 92
    Motherboard Accessory Options ........ 93
    BIOS Security Features ........ 96
    Quickpath Interconnect ........ 98
    InfiniBand Controllers ........ 98
Motherboard BIOS Upgrade ........ 101
About the CS-Storm Hardware Guide
The CS-Storm Hardware Guide describes the components in the 2626X and 2826X server chassis and provides system-level information about the CS-Storm platform.
Hardware Releases

H-2003 (Rev E)   December 2016. Changed the publication number format to match a new convention created for both hardware and software manuals (H-xxxx and S-yyyy). Added information about NVIDIA® M40 and Pascal PH400 GPUs. Reverted to the original GPU support statements: up to 8 GPUs in the compute server and up to 4 GPUs in the login server.

HR90-2003-D   April 2016. Added clarification that the 2826X compute chassis supports 4 or 8 GPUs and the I/O/login chassis supports 4 GPUs. Added a section describing Intel® Xeon E5-2600 v4 processor features.

HR90-2003-C   March 2016. Added notes that the Intel® Xeon E5-2600 v4 processor family and DDR4 2400 MT/s memory are available. Included a reference/link to the new Cray Chilled Door Operator Guide in the System Cooling section.

HR90-2003-B   August 2015. Added shipping, operating, and storage environment requirements. Information on the following topics was also added: K80 GPUs, Intel S2600TP motherboard, and additional motherboard I/O capabilities through a 24-lane PCI slot.

HR90-2003-A   Added CS-Storm rack conversion kit information.

HR90-2003   The initial release of the CS-Storm Hardware Guide, including the 2626X server chassis and supported Intel S2600WP motherboards with NVIDIA® K40 GPUs.
Scope and Audience This publication does not include information about peripheral I/O switches or network fabric components. Refer to the manufacturer's documentation for that equipment. This document assumes the user has attended Cray hardware training courses and is experienced in maintaining HPC equipment.
Feedback Visit the Cray Publications Portal at https://pubs.cray.com and use the "Contact us" link in the upper-right corner to make comments online. Comments can also be emailed using the [email protected] address. Your comments and suggestions are important to us. We will respond to them within 24 hours.
CS-Storm System Description
The Cray® CS-Storm cluster supercomputer is an air-cooled, rackmounted, high-density system based on 2U, 24-in wide servers mounted in a 48U rack. Features:
● Each CS-Storm rack can hold up to 22 rackmounted servers, models 2626X and 2826X.
● Delivers up to 329 double-precision GPU teraflops of compute performance in one 48U rack.
● Completely air-cooled platform.
● Single 100A, 480V, 3-phase power feed to a custom PDU in each cabinet.
● Optional custom liquid-cooled 62kW rear-door heat exchanger.
● Multiple interconnect topology options, including 3D Torus/fat tree, single/dual rail, and QDR/FDR InfiniBand.
● 2626X and 2826X compute and I/O servers are 2U, standard EIA, 24-in wide:
  ○ Compute nodes host up to 8 NVIDIA® GPU accelerators per node: K40, K80, M40 and PH400 (PCIe cards).
  ○ Based on Intel® S2600WP and S2600TP motherboards, 16 DIMM slots.
  ○ Support for up to 6 x 2.5-in solid-state hard drives.
  ○ Support for up to three 1630W power supplies and N+1 redundancy.
  ○ Optional QDR/FDR/EDR InfiniBand host channel adapters.
Figure 1. CS-Storm Front View
A 48U rack includes a single power distribution unit (PDU) that provides the electrical connections for equipment in the rack. A single facility power connection supplies 480V, 3-phase, 100A power (up to 62kW per rack). The capability of the power supplies to accept 277V input power enables the rack to support 480V facility power without an optional rackmounted power transformer.

The system supports a comprehensive HPC software stack, including tools that are customizable to work with most open-source and commercial compilers, schedulers, and libraries. The Cray HPC cluster software stack includes Cray's Advanced Cluster Engine (ACE™) management software, which provides network, server, cluster, and storage management capabilities with easy system administration and maintenance. The system also supports the optional Cray Programming Environment on Cluster Systems (Cray PE on CS), which includes the Cray Compiling Environment, Cray Scientific and Math Libraries, and Performance Measurement and Analysis Tools.
Figure 2. Cray CS-Storm Rear View
Callouts: rear and front views; 42U or 48U, 19-in or 24-in EIA standard rack (24-in, 48U shown); power distribution unit (PDU); optional input power step-down transformer for 208V equipment and water-cooled door; facility 480V/100A power connection.
Table 1. Cray CS-Storm General Specifications

Architecture
● Air cooled, up to 22 servers per 48U rack
● Supported rack options and corresponding maximum number of server nodes:
  ○ 24-in rack, 42U and 48U options – 18 and 22 nodes, respectively
  ○ 19-in rack, 42U and 48U options – 10 and 15 nodes, respectively

Power
● Up to 63 kW in a 48U standard cabinet, depending on configuration
● 480 V power supplied to the rack with a choice of 208VAC or 277VAC 3-phase power supplies. Optional rack-mounted transformer required for 208V equipment

Cooling
● Air cooled
● Airflow: 3,000 cfm; intake: front; exhaust: back
● Optional passive or active chilled cooling rear-door heat exchanger

Cabinet Weight
● 2,529 lbs.; 248 lbs./sq. ft. per cabinet (48U standard air-cooled door)
● 2,930 lbs.; 287 lbs./sq. ft. per cabinet (48U with optional rear-door heat exchanger)

Cabinet Dimensions
● 88.5" x 30" x 49" (88.5" x 30" x 65" with optional rear-door heat exchanger)

Processors (per node)
● One or two 64-bit Intel Xeon E5-2600 processors (v2 on S2600WP; v3 [Haswell] and v4 [Broadwell] on S2600TP)

Memory (per node)
● Sixteen DIMM slots across eight memory channels
● S2600WP:
  ○ 512 GB, registered DDR3 (RDIMM), load reduced DDR3 (LRDIMM), unregistered DDR3 (UDIMM)
  ○ DDR3 transfer rates of 800/1066/1333/1600/1867 MT/s
● S2600TP:
  ○ 1,024 GB, registered DDR4 (RDIMM), load reduced DDR4 (LRDIMM)
  ○ DDR4 transfer rates of 1600/1866/2133/2400 MT/s
● NVIDIA Tesla GPU accelerators (memory listed below)

Chipset
● S2600WP: Intel C600-A Platform Controller Hub (PCH)
● S2600TP: Intel C610

Accelerators (per node)
● Support for up to 8 NVIDIA® Tesla® K40, K80, M40 or PH400 GPU accelerators
● K40: One GK110 GPU and 12 GB of GDDR5 on-board memory
● K80: Two GK210 GPUs and 24 GB of GDDR5 on-board memory (12 GB per GPU)
● M40: One GM200 GPU and 24 GB of GDDR5 on-board memory
● PH400: One GP100 GPU and 16 GB of HBM2 stacked memory

Interconnect
● Optional InfiniBand with Mellanox ConnectX®-3/Connect-IB, or Intel True Scale host channel adapters
● Options for single- or dual-rail fat tree or 3D Torus

External I/O Connections
● DB-15 video connector
● Two RJ-45 network interfaces for 10/100/1000 LAN
● One stacked two-port USB 2.0 (Port 0/1) connector
● Optional InfiniBand QSFP port

Internal I/O connectors/headers
● Bridge slot to extend board I/O:
  ○ Four SATA/SAS ports for backplane
  ○ Front control panel signals
  ○ One SATA 6Gb/s port for Disk on Module (DOM)
● One USB 2.0 connector
● One 2x7-pin header for system FAN module
● One DH-10 serial Port A connector
● One SATA 6Gb/s port (Port 1)
● One 2x4-pin header for Intel RMM4 Lite
● One 1x4-pin header for Storage Upgrade Key
● One 1x8-pin connector for backup power control (S2600TP)

Power Connections
● Two sets of 2x3-pin connectors

System Fan Support
● Three sets of dual-rotor fans, software controlled using the hydrad daemon

Riser Support
● Four PCIe 3.0 riser slots:
  ○ Riser slot 1 – x16 PCIe 3.0
  ○ Riser slot 2 – S2600TP: x24 PCIe 3.0, x16 with InfiniBand
  ○ Riser slot 2 – S2600WP: one x16 PCIe 3.0 and one x8 PCIe 3.0 in one physical slot, or one x8 PCIe 3.0 with InfiniBand
  ○ Riser slot 3 – S2600WP: x16, S2600TP: x24
  ○ Riser slot 4 – x16 PCIe 3.0
● One bridge slot for board I/O expansion

Video
● Integrated 2D video graphics controller
● DDR3 memory (S2600WP – 128MB, S2600TP – 16MB)

Hard Drive Support
● S2600WP: One SATA port at 6Gb/s on the motherboard. Four SATA/SAS ports (from SCU0; SAS support needs a storage upgrade key) and one SATA 6Gb/s port (for DOM) are supported through the motherboard bridge board (SATA backplane). Six solid-state hard drives are supported in each chassis.
● S2600TP: Ten SATA 6Gb/s ports, two of them SATA DOM compatible.

RAID Support
● Intel RSTe RAID 0/1/10/5 for SATA mode
● Intel ESRT2 RAID 0/1/10/5 for SAS/SATA mode

Server Management
● Cray Advanced Cluster Engine (ACE™): complete remote management capability
● On-board ServerEngines® LLC Pilot III® Controller
● Support for Intel Remote Management Module 4 Lite solutions
● Intel Light-Guided Diagnostics on field replaceable units
● Support for Intel System Management Software
● Support for Intel Intelligent Power Node Manager (PMBus®)
CS-Storm Power Distribution
CS-Storm systems that are fully deployed at data centers with a 480V power source typically use the 72-outlet rack PDU shown below. The 1630W power supplies in each CS-Storm server connect to the 277VAC PDU outlets using 1.5 m power cords.
Rack PDU

Figure 3. CS-Storm 48U Rack PDU

The CS-Storm rack PDU receives 480VAC 3-phase facility power through a single AC input connector as shown. Each phase supports 100A maximum. The output voltage to each AC output connector on the PDU is 277VAC. A 60A, 36-outlet version of the rack PDU is also available for less populated configurations.

Rack PDU features:
● Input current: 100A maximum per phase
● Output voltage: 277VAC or 208VAC
● Output power: 1.8kW per port (6.5A nominal; designed for a 10A maximum)
● Frequency: 50-60Hz
● One AC input connector: 480VAC – Hubbell HBL5100P7W
● 72 output connectors: 277VAC or 208VAC – RongFeng RF-203P-HP
● 104.3A per phase @ 277VAC output
● A circuit breaker at the bottom of the PDU protects and applies power to all outlets

Figure callouts: 72 277VAC 10A outlets; phases L1/L2/L3, neutral, and ground; PDU circuit breaker; power cable (5-core, 2 AWG); facility AC power connection (Hubbell HBL5100P7W, 3-phase 277/480VAC, 4P5W).
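As a rough capacity check, outlet current and per-phase loading for a given configuration can be estimated from the figures above (1.8 kW and 6.5 A nominal per outlet, roughly 100 A per phase). The short Python sketch below is illustrative only, not a Cray tool; the server wattage and power-cord count are example values taken from the features table later in this guide, and the calculation ignores power factor and supply efficiency.

```python
# Illustrative per-phase load estimate for a CS-Storm rack PDU (not a Cray tool).
# Assumptions: each server's draw is split evenly across its power cords, and
# outlets are assigned to phases L1/L2/L3 round-robin at 277 VAC output.

OUTLET_VOLTAGE = 277.0     # VAC, phase-to-neutral
PHASE_LIMIT_A = 100.0      # approximate per-phase limit from the PDU specification

def per_phase_current(server_watts, cords_per_server=2):
    """server_watts: list of per-server wattages; returns amps on L1, L2, L3."""
    phases = [0.0, 0.0, 0.0]
    outlet = 0
    for watts in server_watts:
        amps_per_cord = watts / cords_per_server / OUTLET_VOLTAGE
        for _ in range(cords_per_server):
            phases[outlet % 3] += amps_per_cord
            outlet += 1
    return phases

# Example: 22 compute nodes at the 2745 W heavy-load figure, two power cords each.
for name, amps in zip(("L1", "L2", "L3"), per_phase_current([2745] * 22)):
    print(f"{name}: {amps:.1f} A ({'OK' if amps <= PHASE_LIMIT_A else 'over limit'})")
```

For that example, 22 eight-GPU nodes at 2745 W each total about 60.4 kW, consistent with the 63 kW rack budget listed in Table 1.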
Cray TRS277 Step-Down Transformer

The TRS277 step-down, 2U rackmount transformer can power other 208V switches and equipment that cannot accept 277VAC input power.

Figure 4. Cray TRS277 Step-Down Transformer

Specifications:
● Output power: 1.2kW (1.5kVA)
● Frequency: 50-60Hz
● Input: 277VAC
● Output: 220VAC (10 outlets, C13)

PDU Options

Cray offers PDU options other than the rack PDU and transformer described above. PDU choices may be based on data center facilities/requirements, customer preferences, and system/rack equipment configurations.
CS-Storm Chassis Power Distribution

Three (N+1) power supplies in the 2626X/2826X chassis receive power from the rack PDU. The power supplies are installed in the rear of the chassis and distribute power to all the components in the chassis through the power backplane. The power backplane is located at the bottom of the chassis, below the motherboard. Each PCI riser receives power from the power backplane and supports the PCI add-on cards and 4 GPUs in its GPU sled. 12V auxiliary power from the right and left PCI risers connects to each GPU tray through a blind connector when the tray is installed in the GPU sled.
Figure 5. CS-Storm Power Subsystem Major Components

Callouts: left and right PCI risers (provide PCIe bus power and control to GPUs and add-on cards); fan power and tachometer connectors to the GPU fans; 12V auxiliary power connectors to the GPUs; 12V power to the motherboard; hard drive power; power button; 1630W power supplies (receive 277VAC or 208VAC from the rack PDU); power backplane.
CS-Storm System Cooling
The CS-Storm 2626X/2826X server chassis is air-cooled by two central chassis fans, power supply fans, and a fan at each end of each GPU sled. These fans pull air in from the front of the chassis and push air out the back as shown below. The central chassis fans 1A and 1B pull cool air in from the front across the hard disk drives and direct it over the motherboard, power backplane, and 1630W power supplies. These central chassis fans send tachometer signals to, and receive power from, the power backplane. The central chassis fan speed is controlled by the baseboard management controller (BMC) integrated on the motherboard. GPU fans 1-4 receive power and send tachometer signals through the PCI riser and power backplane. GPU fan speeds are controlled by the hydrad fan speed control utility.

Figure 6. CS-Storm 2626X/2826X Chassis Cooling Subsystem

Callouts: chassis fans 1A and 1B; power supply cooling fans; GPU fans 1-4; fan power and tachometer cable to the PCI riser; airflow from front (intake) to rear (exhaust).
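The hydrad utility itself is documented in the CS-Storm Fan Control Utility chapter. The fragment below is only a minimal sketch of the kind of closed-loop control it performs, assuming a Linux host with nvidia-smi available; the thresholds, polling interval, and duty-cycle mapping are made-up example values, and the final write to the chassis fan controller is hardware-specific, so the sketch just prints the computed duty cycle.

```python
# Illustrative GPU-temperature-to-fan-duty loop (not the hydrad utility itself).
import subprocess, time

LOW_C, HIGH_C = 40, 80          # example thresholds: minimum duty below LOW_C, full duty at HIGH_C
MIN_DUTY, MAX_DUTY = 30, 100    # example duty-cycle range, percent

def hottest_gpu_temp_c():
    """Return the highest GPU temperature reported by nvidia-smi, in degrees C."""
    out = subprocess.check_output(
        ["nvidia-smi", "--query-gpu=temperature.gpu",
         "--format=csv,noheader,nounits"], text=True)
    return max(int(line) for line in out.split() if line.strip())

def duty_for(temp_c):
    """Linear ramp between MIN_DUTY and MAX_DUTY over the LOW_C..HIGH_C range."""
    frac = min(max((temp_c - LOW_C) / (HIGH_C - LOW_C), 0.0), 1.0)
    return round(MIN_DUTY + frac * (MAX_DUTY - MIN_DUTY))

while True:
    temp = hottest_gpu_temp_c()
    print(f"hottest GPU: {temp} C -> GPU fan duty {duty_for(temp)}%")
    # A real utility would write the duty cycle to the chassis fan controller here.
    time.sleep(10)
```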
CS-Storm Chilled Door Cooling System

An optional Motivair® ChilledDoor® rack cooling system is available for attaching to the back of CS-Storm racks. The 48U rack supports an optional 62kW chilled-water heat exchanger. The 42U rack supports a 57kW heat exchanger. The chilled door is hinged and replaces the rear door of the CS-Storm rack. The chilled door uses 65°F facility-supplied water or a coolant distribution unit (CDU) provided with the cooling door system. The chilled door removes heat from the air exhausted out the back of the rack. Fans inside the chilled door draw the heated air through a heat exchanger where the heat load is removed and transferred to the cooled water system. A menu-driven programmable logic controller (PLC) with a built-in screen and alarm system is included in the chilled door. The PLC gives access to all controls, alarms, and event history. The LCD display indicates normal operating conditions with an override display for alarms. Parameters monitored and controlled include fan speed, water flow, inlet air, outlet air, inlet water, and outlet water temperatures.
Refer to the Cray Chilled Door Operator Guide for detailed information about the ChilledDoor and CDU control and monitoring systems, programming displays, and set points and alarms. That guide also includes maintenance and operation procedures, and a list of spare parts at the end of the document.

Figure 7. CS-Storm Rack Rear Door Heat Exchanger

Figure 8. CS-Storm Rear Door Heat Exchanger Cooling System
CS-Storm Rack Conversion Kits
There are two CS-Storm rack conversion kits:
● 24-to-19: Mounts 24-in CS-Storm servers in a 19-in rack
● 19-to-24: Mounts 19-in servers/switches in a CS-Storm 24-in rack
CS-Storm 24-to-19in Vertical-Mount Conversion Kit

A custom 14U rack conversion kit is available for mounting CS-Storm servers vertically in a 19-in rack rather than in their normal horizontal position in a 24-in rack. This assembly is shown in CS-Storm 24- to 19-inch Conversion Assembly on page 16 and is also referred to as the 14U vertical-mounting kit. The assembly has five 2U-wide slots for mounting up to five CS-Storm 2626X or 2826X servers. The front-right handle and bracket of the 2626X/2826X server must be removed and replaced with a server bracket. The server is slid into the 14U rack assembly on its side and is secured to the lower tray with a single screw. Another bracket is attached to the rear of the 2626X/2826X server. This rear bracket acts as a safety stop block to prevent the 2626X/2826X server from unintentionally being removed from the rack. To remove the server, press the release pin on the front of the assembly to disengage the locking clip built into the roof of the conversion rack.

DANGER: Heavy object. A mechanical lift or two-person lift is required depending on your equipment. Serious injury, death, or equipment damage can occur with failure to follow these instructions:
● Each CS-Storm server can weigh up to 93 lbs (42 kg).
● When installing these servers below 28U, a server lift is required to remove and/or install them from a rack. If a lift is not available, two or more people must use safe lifting techniques to remove/install these servers.
● When installing these servers at or above 28U, a server lift is required to remove/install a 2626X or 2826X server.
● Personnel handling this equipment must be trained to follow these instructions. They are responsible for determining if additional requirements are necessary under applicable workplace safety laws and regulations.
A CS-Storm Server Lift video is available that shows how to assemble the lift and use it to remove or install a CS-Storm server.
Configuration Rules for CS-Storm Servers in 19-in Racks
● Top-of-rack (TOR) switches need to be contained in the 2U air shroud
● Switches need to be from the Cray approved list (front-to-back/port-side airflow only)
● 42U CS-Storm rack:
  ○ Maximum of two vertical-mounting kits per rack (up to 10 CS-Storm servers)
  ○ Other 19-in wide servers can be installed in the empty 12U space
● 48U CS-Storm rack:
  ○ Maximum of three vertical-mounting kits per rack (up to 15 CS-Storm servers)
  ○ Other 19-in wide servers can be installed in the empty 4U space
Installation Recommendations
● Install conversion kit parts from the front/rear of the cabinet. There is no need to remove any cabinet side panels.
● Use two people, one front and one rear, to install the upper and lower trays from the front of the rack.
● Install the lower tray, then the four braces, then the upper tray, working bottom to top.
● Position the screws in the rear mounting brackets to set the tray length to fit the desired post-to-post spread, 30-in recommended (see figure). Don't fully tighten these screws until the trays and upper-to-lower braces are installed.
● Tilt the trays at an angle, side to side, when installing them from the front of the cabinet. The front cutout on each side provides clearance around the front posts so the trays can then be leveled and attached to the posts.
Figure 9. CS-Storm 24- to 19-inch Conversion Assembly

Callouts: upper and lower 14U tray assemblies (14U measured top to bottom); five 2U server slots; CS-Storm 2626X/2826X server; server release pin (spring loaded, press to release the server locking clip and remove the server); server mounting screw; server removal safety bracket/ear (beveled edge faces rear), which functions as a safety stop so the server cannot be removed from the rack unintentionally until the release pin is pressed to disengage the locking clip in the roof of the upper tray; M4 x 6.0 flush-head screws (3); 2626X/2826X front bracket with handle (replaces the rack handle/bracket); server bracket mounting screw; screw positions for a mounting depth of 30 in. (post to post); upper-to-lower front braces (2, mounted flush to the back of the post) and rear braces (2); tray cutouts (provide clearance around the posts when installing the trays); front and rear vertical mounting rails; rear tray mounting brackets (2R, 2L) that mount the tray assembly to the rear vertical mounting rail; M4 x 6.0 screws (16, for attaching the upper-to-lower braces to the upper/lower trays).

Parts for the 2626X/2826X server (front lock bracket and handle, safety bracket/ear, and screws) are provided in a hardware conversion bag, separate from the rack conversion kit. Hardware for the 24- to 19-inch conversion kit is provided with the kit.
CS-Storm 19- to 24-inch Conversion Kit

A custom 10U conversion kit is available for mounting standard 19-inch servers/switches in a CS-Storm 24-inch wide rack. This assembly is shown in CS-Storm 19- to 24-inch Conversion Assembly on page 17 and is also referred to as the 10U conversion kit. This kit has a load limit of 600 lbs (270 kg).
Figure 10. CS-Storm 19- to 24-inch Conversion Assembly

Callouts: 10U front frames (2) and rear frames (2); frame alignment tabs (4); horizontal bars (4); standard 19-inch 1U server/switch; front and rear vertical-mounting rails (24-inch wide, 42U or 48U rack); 10U measured top to bottom; load limit 600 lbs (270 kg). The distance from the outside edges of the front and back 10U frames should be set at 29.5 inches (75 cm) for 19-inch equipment.
CS-Storm Server Support Brackets

Each CS-Storm 2626X and 2826X rackmount server sits on a pair of support brackets mounted to the 24-inch rack. These support brackets are included in all preconfigured CS-Storm cabinets. A set of support bracket/angle assemblies should be ordered when additional or separate CS-Storm servers are ordered. Support bracket/angle part numbers for mounting 2626X and 2826X servers in 24-inch racks:
● 101072200: left support angle assembly
● 101072201: right support angle assembly
Figure 11. CS-Storm Server Support Brackets

Callouts: front and rear angles forming a shelf to support the server; M6 x 16.0 pan screws (4); left support angle assembly (101072200); right support angle assembly (101072201).
CCS Environmental Requirements
The following table lists shipping, operating, and storage environment requirements for Cray Cluster Systems (CCS).

Table 2. CCS Environmental Requirements

Operating
● Operating temperature: 41° to 95° F (5° to 35° C) up to 5,000 ft (1,500 m)
  ○ Derate the maximum temperature of 95° F (35° C) by 1.8° F (1° C) per 1,000 ft (305 m) of altitude above 5,000 ft (1,525 m)
  ○ Temperature rate of change must not exceed 18° F (10° C) per hour
● Operating humidity: 8% to 80% non-condensing; humidity rate of change must not exceed 10% relative humidity per hour
● Operating altitude: up to 10,000 ft (3,050 m)

Shipping
● Shipping temperature: -40° to 140° F (-40° to 60° C); temperature rate of change must not exceed 36° F (20° C) per hour
● Shipping humidity: 10% to 95% non-condensing
● Shipping altitude: up to 40,000 ft (12,200 m)

Storage
● Storage temperature: 41° to 113° F (5° to 45° C); temperature rate of change must not exceed 36° F (20° C) per hour
● Storage humidity: 8% to 80% non-condensing
● Storage altitude: up to 40,000 ft (12,200 m)
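The altitude derating rule above reduces to a one-line calculation. The helper below is a minimal sketch of that arithmetic; the function name is illustrative, and the 10,000 ft cap simply mirrors the operating-altitude limit in the table.

```python
# Maximum allowed operating temperature vs. altitude, per the derating rule above:
# 35 C up to 5,000 ft, then minus 1 C per 1,000 ft (305 m) of additional altitude.

def max_operating_temp_c(altitude_ft: float) -> float:
    if altitude_ft > 10_000:
        raise ValueError("operating altitude limit is 10,000 ft")
    derate_c = max(0.0, (altitude_ft - 5_000) / 1_000)  # 1 C per 1,000 ft above 5,000 ft
    return 35.0 - derate_c

print(max_operating_temp_c(4_000))    # 35.0 C
print(max_operating_temp_c(8_000))    # 32.0 C
print(max_operating_temp_c(10_000))   # 30.0 C
```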
2626X/2826X Chassis Components
Cray CS-Storm 2626X/2826X rackmount servers use an EIA standard 24-inch wide chassis. They are configured as compute nodes or I/O (login/service) nodes. The nomenclature 2626X/2826X is used to describe features common to both node types. Major components of both node types are shown below after the features table.

Table 3. CS-Storm Node Types

Product Name          Description
2626X / 2826X         Compute node and I/O or login node
2626X8 / 2826X8       Compute node
2626X8N / 2826X8N     Compute node with up to 8 NVIDIA GPUs
2626X2 / 2826X2       I/O or login node
2626X2N / 2826X2N     I/O or login node with 2 NVIDIA GPUs
Figure 12. 2626X/2826X Chassis

Callouts: standard 24-inch wide 2U rackmount chassis; input power from the rack PDU; six hard drive bays; front panel controls and indicators.
Table 4. CS-Storm 2626X/2826X Features

Architecture
● 2U rackmounted servers in a 24-in wide chassis
● One Intel S2600WP or S2600TP motherboard per rackmount server

Power
● Dual 1630W redundant power supplies (optional N+1)
● Power input of 277 VAC at 10A from the rack PDU
● Compute node with eight GPUs measured at 2745W under heavy load

Cooling
● Six 80mm x 80mm x 38mm fans
● Passive heat sinks on motherboard and GPUs
● Operating temperature: 10°C to 30°C
● Storage temperature: -40°C to 70°C

Weight
● Up to 93 lbs (42 kg)
● Mechanical lift required for safe installation and removal

Motherboard
● 1 per chassis:
  ○ 2626X – Washington Pass (S2600WP)
  ○ 2826X – Taylor Pass (S2600TP)

Memory Capacity
● 2626X (S2600WP) – up to 512 GB DDR3
● 2826X (S2600TP) – up to 1,024 GB DDR4

Disk Subsystem
● On-board SATA 6 Gb/s, optional HW RAID w/BBU (N/A with on-board InfiniBand)
● Up to six 2.5-inch removable SATA/SAS solid-state drives

Expansion Slots
● Compute node:
  ○ 1 riser card slot: x8 PCIe 3.0
● I/O, login, or service node:
  ○ 1 riser card slot: x8 PCIe 3.0 (external)
  ○ 1 riser card slot: x16 PCIe 3.0 (external)
  ○ 1 riser card slot: x16 PCIe 3.0 (internal)

System Management
● Cray Advanced Cluster Engine (ACE™): complete remote management capability
● Integrated BMC with IPMI 2.0 support
● Remote server control (power on/off, cycle) and remote server initialization (reset, reboot, shut down)
● On-board ServerEngines® LLC Pilot III® Controller
● Support for Intel Remote Management Module 4 Lite solutions
● Intel Light-Guided Diagnostics on field replaceable units
● Support for Intel System Management Software
● Support for Intel Intelligent Power Node Manager (the CS-Storm 1630W power supply is a PMBus®-compliant power supply)
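Because the integrated BMC supports IPMI 2.0 over LAN, the remote power and reset controls listed above can be exercised with any standard IPMI client. The sketch below drives ipmitool through Python's subprocess module; the BMC address and credentials are placeholders, and site policy may require a different interface or credentials handling.

```python
# Minimal example of remote power control through the integrated BMC (IPMI 2.0 over LAN).
# BMC_HOST, BMC_USER, and BMC_PASS are placeholders for site-specific values.
import subprocess

BMC_HOST, BMC_USER, BMC_PASS = "10.0.0.50", "admin", "changeme"

def ipmi(*args):
    """Run an ipmitool command against the node's BMC and return its output."""
    cmd = ["ipmitool", "-I", "lanplus", "-H", BMC_HOST,
           "-U", BMC_USER, "-P", BMC_PASS, *args]
    return subprocess.check_output(cmd, text=True).strip()

print(ipmi("chassis", "power", "status"))   # e.g. "Chassis Power is on"
# ipmi("chassis", "power", "cycle")         # remote power cycle
# ipmi("chassis", "power", "reset")         # hard reset
```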
Figure 13. 2626X/2826X Compute Node Components

Callouts: motherboard; slot 2 and slot 3 PCI riser interface boards; left PCI riser and left GPU sled; right GPU sled; GPU fans; flex-foil PCI cables; SATA backplane; 1630W power supplies (N+1); chassis fans; solid-state hard drives; hard drive backplanes.

Figure 14. 2626X2 I/O or Login Node Components

Callouts: motherboard (S2600WP); slot 2 and slot 3 PCI riser interface boards; left PCI riser and left GPU sled (4 GPUs); GPU fan; x8 Gen3 PCIe slot; InfiniBand, FC, or 10GbE add-on cards; RAID add-on card; flex-foil PCI cables to the right PCI riser; SATA backplane; 1630W power supplies; fans; hard drives HDD 1-6; hard drive backplanes.
Figure 15. 2826X2 I/O or Login Node Components

Callouts: motherboard (S2600TP) with slots 1-4; slot 2 and slot 4 PCI riser interface boards; left PCI riser and left GPU sled (4 GPUs); GPU fan; x8 Gen3 PCIe slot; InfiniBand, FC, or 10GbE add-on card; RAID add-on card; flex-foil PCI cables to the right PCI riser; SATA backplane; 1630W power supplies; fans; hard drives HDD 1-6; hard drive backplanes; slot 3 PCI cable assembly (wraps under the motherboard), providing a Gen3 x16 connection to the right PCI riser plus a Gen3 x8 connection.
Figure 16. 2626X8 CPU Node Block Diagram

Diagram summary: the S2600WP motherboard's PCI slots 1-4 connect to the right PCI riser (slots 1 and 4, via flex-foil PCI cables) and the left PCI riser (slots 2 and 3, via riser interface boards). Each riser carries two PLX PCIe switches (SMBUS_S1-S4), each fanning a x16 connection out to two accelerator/GPU positions in the right and left GPU sleds, along with 12V auxiliary power and 12V/tachometer/power-management signals. The bridge board provides four SAS ports and a SATA port to the SATA backplane and hard drive backplanes (HDD 1-6); SATA port 1 is cabled separately. The power backplane distributes 12V from the 1630W PSUs (277 or 208VAC from the rack PDU, phases L1/L2/L3) to the motherboard, risers, and drives, and carries the power management bus.
Figure 17. 2826X8 CPU Node Block Diagram

Diagram summary: same topology as the 2626X8, based on the S2600TP motherboard. GPU groups 1-4 connect through the right PCI riser (slots 1 and 3, via flex-foil PCI cables) and the left PCI riser (slots 2 and 4, via riser interface boards), each riser carrying two PLX PCIe switches (SMBUS_S1-S4) with 12V auxiliary power and 12V/tachometer/power-management signals. The bridge board provides SAS/SATA connections to the SATA backplane and hard drive backplanes (HDD 1-6), and the power backplane distributes 12V from the 1630W PSUs (277 or 208VAC from the rack PDU) and carries the power management bus.
2626X/2826X Front and Rear Panel Controls and Indicators

The front panel controls and indicators are identical for the 2626X/2826X CPU and I/O nodes.

Figure 18. 2626X/2826X Front Panel Controls and Indicators

Callouts: GPU status – Green/Red; GPU power status – Green/Red; ID LED – Off/White; system status LED – Off/Red; power status LED – Off/Red; ID button/LED – Off/White; Reset button/LED – Off/White; Power button/LED – Off/Blue.
Power Button
The power button is used to apply power to chassis components. Pressing the power button initiates a request to the BMC integrated into the S2600 motherboard, which forwards the request to the ACPI power state machines in the S2600 chipset. The button is monitored by the BMC and does not directly control power on the power supply.
● Off-to-On sequence: The integrated BMC monitors the power button and any wake-up event signals from the chipset. A transition from either source results in the integrated BMC starting the power-up sequence. Since the processors are not executing, the BIOS does not participate in this sequence. The hardware receives the power good and reset signals from the integrated BMC and then transitions to an ON state.
● On-to-Off (operating system down) sequence: The System Control Interrupt (SCI) is masked. The BIOS sets up the power button event to generate an SMI and checks the power button status bit in the ACPI hardware registers when an SMI occurs. If the status bit is set, the BIOS sets the ACPI power state of the machine in the chipset to the OFF state. The integrated BMC monitors power state signals from the chipset and de-asserts PS_PWR_ON to the power supply backplane. As a safety mechanism, if the BIOS fails to service the request, the integrated BMC automatically powers off the system in four to five seconds.
● On-to-Off (operating system up) sequence: The power button switch generates a request through SCI to the operating system to shut down the system if an ACPI operating system is running. The operating system retains control of the system, and the operating system policy determines the sleep state into which the system transitions, if any. Otherwise, the BIOS turns off the system.

Reset Button
The Reset button initiates a reset request forwarded by the integrated BMC to the chipset. The BIOS does not affect the behavior of the reset button.

ID Button
The ID button toggles the state of the chassis ID LED. If the LED is off, pushing the ID button lights the ID LED. It remains lit until the button is pushed again or until a chassis identify command is received to change the state of the LED.

GPU Status
The GPU status LED indicates whether fatal errors have occurred on a PLX chip or GPU. Green/ON: all GPUs and PLX chips are working normally. Red/ON: a fatal error has occurred on a GPU or PLX chip.

GPU Power Status
The GPU power status LED indicates the power status for the PLX chips on the right and left PCI risers. Green/ON: GPU power normal. Red/ON: one or more GPU power failures.

ID LED
The ID LED is used to visually identify a specific server installed in the rack or among several racks of servers. The ID LED can be illuminated by pushing the ID button or by using a chassis identification utility. White ON: identifies the server.

System Status
The system status LED indicates a fatal or non-fatal error in the system. OFF: normal operation. Amber (solid ON): fatal error. Amber (blinking): non-fatal error.

System Power
The system power status LED indicates S2600 motherboard power status. OFF: power OFF. Blue ON: power ON.
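The same ACPI shutdown path described above for the front-panel power button, and the chassis identify command mentioned for the ID LED, can also be requested remotely through the BMC. The commands below are a minimal, hedged illustration using ipmitool; the BMC address and credentials are placeholders.

```python
# Remote equivalents of the front-panel sequences (BMC address/credentials are placeholders).
import subprocess

base = ["ipmitool", "-I", "lanplus", "-H", "10.0.0.50", "-U", "admin", "-P", "changeme"]

# Graceful shutdown: asks the ACPI-aware OS to shut down, like a short power-button press.
subprocess.run(base + ["chassis", "power", "soft"], check=True)

# Hard off (only if the OS is unresponsive), and ID LED on for 60 seconds.
# subprocess.run(base + ["chassis", "power", "off"], check=True)
# subprocess.run(base + ["chassis", "identify", "60"], check=True)
```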
Rear Panel Controls and Indicators

Rear panel controls and indicators are shown in the following figure. A red/green LED on each power supply indicates a failed condition. The power supplies are designated as shown in the figure.
Figure 19. 2626X/2826X Rear Panel Controls and Indicators

Callouts, 1630W power supplies (2+1): PSU 1, PSU 2, PSU 3.
S2600WP rear panel: ID LED; NIC 1 and NIC 2 (RJ45, LAN1/LAN2); video (DB15); status LED; POST code LEDs (8); stacked 2-port USB 2.0; InfiniBand QSFP port, link LED, and activity LED (only on S2600WPQ/WPF).
S2600TP rear panel: ID LED; NIC 1 and NIC 2 (RJ45); video (DB15); status LED; POST code LEDs (8); stacked 2-port USB 2.0; dedicated management port (RJ45); InfiniBand QSFP+ port, link LED, and activity LED (only on S2600TPF).
2626X and 2826X Hard Drive Support

The S2600WP/S2600TP motherboards support one SATA port at 6 Gb/s for DOM; four SATA/SAS ports (3 Gb/s on the S2600WP, 6 Gb/s on the S2600TP) to the backplane are supported through the bridge board slot on the motherboard. The SATA backplane PCB is installed in the bridge board slot and supports 5 of the 6 solid-state drives mounted in the front of the chassis (HDD1-HDD6). HDD2 is cabled to SATA port 1 on the motherboard. On the S2600WP, SAS support needs a storage upgrade key.
Figure 20. Disk Drive Latch
Figure 21. 2626X/2826X Disk Subsystem

Callouts: SATA backplane in the bridge board I/O expansion slot (one SATA III 6 Gb/s port or four 6 Gb/s SAS ports); SAS 0 to HDD 1, SAS 1 to HDD 2, SAS 2 to HDD 3, SAS 3 to HDD 4, SATA 0 to HDD 5, SATA port 1 to HDD 6; SATA backplane control and status to/from the power backplane; hard drive power from the power backplane; hard drives HDD 1 through HDD 6.
1630W Power Supplies

Cray CS-Storm 2626X and 2826X servers use 1630 W high-efficiency power supplies. The server can operate normally from 2 power supplies. A third power supply is optional to provide a redundant N+1 configuration. Each power supply receives power from the rack PDU and plugs into the power backplane assembly in the server chassis. The power supplies support Power Management Bus (PMBus™) technology and are managed over this bus. The power supplies can receive 277 VAC or 208 VAC (200-277 VAC input). An optional rackmounted transformer steps down 480 VAC facility power to 208 VAC for use by other switches/equipment in the rack.
1630W power supply features:
● 1630W (including 5V stand-by, 5VSB) continuous power: 12V/133A, 5VSB/6A
● N+1 redundant operation and hot-swap capability
● High line input voltage operation with Active Power Factor Correction
● Power Management Bus (PMBus™)
● Dimensions: 40.0mm (H) x 76.0mm (W) x 336mm (L)
● High efficiency: CSCI-2010 80 PLUS Gold compliant
Figure 22. 2626X/2826X 1630W Power Supplies

Callouts: 2 or 3 (N+1) 1630W power supplies; 277 VAC from the rack PDU or 208 VAC*; power and PMBus connections to the power backplane. (*An optional rackmount step-down transformer or other 208 V source is needed.)
1630W Power Supply LED Status

Green power LED: A green/amber LED pair indicates the power supply status. A slow-blinking green POWER (PWR) LED indicates that AC is applied to the PSU and that standby voltage is available. The same LED illuminates steady green to indicate that all of the power outputs are available.

Amber failure LED: The amber LED blinks slowly or illuminates solid ON to indicate that the power supply has failed or reached a warning status and must be replaced.

Table 5. 1630W Power Supply LED Status
● No AC power to all power supplies: green PWR LED Off; amber FAIL LED Off
● Power supply failure (includes over voltage, over current, over temperature, and fan failure): green Off; amber On
● Power supply warning events where the power supply continues to operate (high temperature, high power, slow fan): green Off; amber 1Hz blinking
● AC present / 5VSB on (PSU OFF): green 1Hz blinking; amber Off
● Power supply ON and OK: green On; amber Off
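Since the 1630W supplies are PMBus-compliant and report into the BMC, their status and the node's power draw can usually be read through standard IPMI sensor and DCMI queries. The commands below are a hedged example with placeholder BMC address and credentials; the exact sensor names depend on the sensor data records the BMC exposes, and DCMI power readings are only available if the BMC implements them.

```python
# Read power-supply sensors and node power draw through the BMC (placeholders shown).
import subprocess

base = ["ipmitool", "-I", "lanplus", "-H", "10.0.0.50", "-U", "admin", "-P", "changeme"]

# List all sensors of type "Power Supply" (presence, failure, predictive-failure events).
print(subprocess.check_output(base + ["sdr", "type", "Power Supply"], text=True))

# If the BMC implements DCMI, report instantaneous/average system power readings.
print(subprocess.check_output(base + ["dcmi", "power", "reading"], text=True))
```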
2626X/2826X Power Backplane

Each power supply in a 2626X/2826X chassis plugs into the power backplane, located at the bottom of the chassis below the motherboard. The power backplane provides all the power and control connections for the motherboard, front panel controls and indicators, fans, PCIe bus power, and 12V auxiliary power for the GPUs. PCI bus power, 12V auxiliary power, fan power, tachometer signals, and system management bus (SMB) control signals are routed through the left and right PCI risers and distributed to the GPUs, fans, and PCI add-on cards. Fan power (12V) and tachometer signals are provided to each fan through two 4-pin connectors on the power backplane. A 14-pin header provides power good and fan tachometer signals to the motherboard SMB. Three disk drive power connectors supply 5V and 12V to the hard drive backplanes at the rear of each disk drive. Motherboard SMB control and status signals, LED indicators, and the power and reset buttons on the front panel also connect to a 14-pin header on the power backplane assembly.

Figure 23. 2626X/2826X Power Backplane Assembly

Callouts: 1630W power supply connectors; left PCI riser power and control for motherboard PCI slots 2 and 3; right PCI riser power and control for PCI slots 1 and 4; 12V fan power and tachometer connectors; connection to the front panel assembly (SMB control and monitoring, fault, and power status); power good, SMB, and fan tachometer signals to the motherboard fan header; 12V@55A power to the motherboard; front panel control and status to the SATA backplane; 5V and 12V to the hard drive backplanes.
Figure 24. 2626X Power Backplane Block Diagram
GPU Power Connections

GPUs receive power from their respective right or left PCI riser. The power backplane provides 12V power to each PCI riser. A blind mate connector on the GPU interface board (IFB), attached to the GPU tray, plugs into the PCI riser. The GPU IFB uses either PCIe 6-pin and 8-pin connectors (K40) or an 8-pin connector (K80/M40/PH400) to provide power to each GPU. The fan control daemon (hydrad) provides an active fan control utility that enables adjustment of fan speeds and powering GPUs on or off.
Figure 25. 2626X/2826X 12V Auxiliary Power Connectors

Callouts: GPU tray; GPU PCI connector; blind mate power connector (to the PCI riser); GPU IFB (different IFBs for K40 and K80/M40/PH400); original (first-generation) GPU tray [K40: 101154601 | K80: 101192400] with K40 PCIe 6-pin and 8-pin 12V auxiliary power connectors or K80 8-pin power connector; new (second-generation) GPU tray [K40: 101154602 | K80/M40/PH400: 101192401] with a K40/K80/M40/PH400 8-pin power connector.
Figure 26. 2626X/2826X PCI Riser Block Diagram

Diagram summary: the power backplane feeds the riser's 12V power connector (12V@58A) and the 12V auxiliary power outputs; on-riser DC-DC converters derive 3.3V, 1.8V, and 0.9V, supervised by a voltage monitor. The PLX chip (with EEPROM, I2C, and JTAG connections) switches the motherboard x16 connector to the riser's two Gen3 x16 PCI slots (slot 1 and slot 2 TX/RX data), with clock, reset buffer, and an SMB mux on I2C.
2626X and 2826X Plug-in Card LED Indicators

There are four plug-in cards that may be installed in 2626X/2826X servers. The meaning of the LEDs on each card is described below.

Table 6. 2626X Plug-in Cards
● Fibre Channel: dual-port 8Gb Fibre Channel to PCIe host bus adapter
● RAID Controller: eight-port PCIe RAID controller
● InfiniBand: Connect-IB (InfiniBand FDR) single-port QSFP+ host channel adapter card
● 10GbE: ConnectX-3 10GbE Ethernet dual SFP+ PCIe adapter card
Fibre Channel Adapter:
Table 7. Dual-port Fibre Channel HBA LED (Light Pipe) Scheme
● Yellow Off, green Off, amber Off: power off
● Yellow On, green On, amber On: power on (before firmware initialization)
● Yellow flashing, green flashing, amber flashing: power on (after firmware initialization)
● Yellow, green, and amber LEDs flashing alternately: firmware error
● Yellow Off, green Off, amber on and flashing: online, 2Gbps link/I/O activity
● Yellow Off, green on and flashing, amber Off: online, 4Gbps link/I/O activity
● Yellow on and flashing, green Off, amber Off: online, 8Gbps link/I/O activity
● Yellow flashing, green Off, amber flashing: beacon
RAID Controller: The LEDs on the RAID card (one per port) are not visible from outside the server chassis. When lit, each LED indicates the corresponding drive has failed or is in an unconfigured-bad state.

InfiniBand: There are two LEDs on the I/O panel. When data is transferring, normal behavior is solid green and flashing yellow.
● Yellow - physical link
  ○ Constant yellow indicates a good physical link
  ○ Blinking indicates a problem with the physical link
  ○ If neither color is on, the physical link has not been established
  ○ When the logical link is established, the yellow LED turns off
● Green - logical (data activity) link
  ○ Constant green indicates a valid logical (data activity) link without data transfer
  ○ Blinking green indicates a valid logical link with data transfer
  ○ If the LED only lights yellow, not green, the logical link has not been established

ConnectX-3 10GbE Ethernet: There are two I/O LEDs per port in dual-port designs (four LEDs between the two ports).
● Green - physical link
  ○ Constant on indicates a good physical link
  ○ If neither LED is lit, the physical link has not been established
● Yellow - logical (data activity) link
  ○ Blinking yellow indicates data is being transferred
  ○ Stays off when there is no activity
2626X/2826X PCI Riser Interface Boards

CS-Storm riser slot 2 and riser slot 3 (slot 4 on the S2600TP) support PCIe add-on cards when the left GPU cage is removed. Add-on cards in riser slot 2 are secured to the rear panel of the blade. I/O or login servers provide openings on the rear panel for add-on cards. Add-on cards in slot 3 (slot 4) are supported by a bracket and must connect to the rear panel with a cable assembly. The slot 2 PCI riser interface board is a x24 PCIe Gen3 bus. It supports a Gen3 x16 bus and also a Gen3 x8 PCIe slot for an add-on card mounted to the rear panel.

Figure 27. Slot 2 and Slot 3 PCI Riser Interface Boards

Callouts: slot 2 PCI riser interface board; PCI add-on card slot (PCI x8); PCI riser interface board (S2600WP: slot 3, S2600TP: slot 4); motherboard.
2626X/2826X Flex-Foil PCIe Interface Cables

PCIe slot 1 and slot 4 on the S2600WP (slot 3 on the S2600TP) motherboards connect to the right PCI riser through flex-foil cables. For I/O/login nodes (or when the right GPU sled is removed in compute nodes), add-on cards are supported through slot 1 and slot 4/slot 3. These cards plug into the same right PCI riser board used for GPUs. An optional RAID card connects to the hard drive backplanes on the front panel. Cards in slot 4 are mounted to a bracket inside the chassis. Add-on cards in slot 1 are secured to the rear panel.
Figure 28. 2626X/2826X Slot 1 and Slot 4 Flex-Foil Cables

Callouts: PCI flex-foil cable under the motherboard to PCI slot 4 on the S2600WP (slot 3 on the S2600TP*); PCI flex-foil cable to motherboard PCI slot 1; add-on card secured to the rear panel; optional RAID add-on card to the hard drive backplanes.
* On the S2600TP, the slot 3 (x24) PCI cable assembly includes a flex cable (x16) to the right PCI riser card, just like the S2600WP. The S2600TP PCI cable assembly includes an additional x8 flex cable for an optional low-profile add-on card that is mounted to the chassis rear panel (not shown).
GPU Sleds
The right and left GPU sleds each support 4 NVIDIA® Tesla® K40, K80, M40 or PH400 GPUs (8 GPUs per 2626X8N/2826X8N node). The GPU sleds are secured to the chassis with 2 thumb screws and lift straight out of the chassis. A fan at each end of the GPU sled draws air in from the front of the chassis and pushes air out the rear. The GPU fans receive power and tachometer signals through 4-pin connectors on the PCI riser. The right GPU riser is connected to the motherboard using a flex-foil cable. The left GPU sled is connected to slots 2 and 3 on the motherboard using PCIe interface PCBs.

Figure 29. 2626X8/2826X8 Right and Left GPU Sled Components

Callouts: PCI interface boards to motherboard PCIe slots 2 and 3; PCI flex-foil cables to motherboard PCI slots 1 and 4 (S2600WP*); GPU slot groups 1-4, each with trays 0 and 1; PLX PCIe switch devices; GPU tray (group 4, tray 0); fan power and tachometer; push-pull fan configuration; airflow front to rear.
* On the S2600TP, a different PCI cable assembly connects to slot 3. It includes a flex cable (x16) to the right PCI riser card, just like the S2600WP. The S2600TP PCI cable assembly includes an additional x8 flex cable for an optional low-profile add-on card that is mounted to the chassis rear panel (not shown). The GPU group numbering is the same for both S2600WP and S2600TP motherboards.
Figure 30. Remove Right GPU Sled from 2626X Chassis

Callouts: flex-foil cable connectors from motherboard PCIe slots 1 and 4; handle.
GPU Trays

GPU trays are easily removed from the GPU sled. The trays are attached to the sled by a screw at each end of the tray. Two handles are provided to remove each tray. Power (12V) to the GPU or accelerator card is provided from the GPU riser card through a blind mate connector. The blind mate connector routes 12V power through a GPU interface board (IFB) attached to the GPU tray. Power from the IFB connects to the GPU through different power connectors, as shown in the following figure.
Figure 31. 2626X and 2826X GPU Tray Power Connectors

Callouts: GPU tray; GPU PCI connector; blind mate power connector (to the PCI riser); GPU IFB (different IFBs for K40 and K80/M40/PH400); original (first-generation) GPU tray [K40: 101154601 | K80: 101192400] with K40 PCIe 6-pin and 8-pin 12V auxiliary power connectors or K80 8-pin power connector; new (second-generation) GPU tray [K40: 101154602 | K80/M40/PH400: 101192401] with a K40/K80/M40/PH400 8-pin power connector.
Custom Accelerator Cards

A GPU tray supports different-sized GPUs or custom-designed accelerator cards. The largest accelerator card dimensions that the GPU tray can support are 39.06mm x 132.08mm x 313.04mm. Each compute node can support up to 8 accelerator cards (up to 300W per card) per chassis.
Figure 32. Accelerator Card Tray Dimensions
Right and Left PCI Riser Boards

Two PCI riser boards (left and right) connect the motherboard PCI slots to the GPU, accelerator card, or I/O add-on card. The right PCI riser used in compute servers supports 4 GPUs or accelerator cards. The right PCI riser used in I/O or login servers supports 2 add-on PCI cards. Each PCI riser receives power from the power backplane and provides 12V power and control to the Gen3 x16 PCI slots on the riser. PCI riser edge connectors plug into the power backplane assembly to receive 12V power. Voltage regulators provide 0.9V, 1.8V, and 3.3V to the Gen3 x16 PCI slots. 12V power and control signals from the power backplane are connected to the Gen3 x16 PCI slots and to two blind power connectors that supply 12V auxiliary power to the GPUs. Two Gen3 x16 PCI slots on the PCI riser support 4 GPU or accelerator cards. The PCI riser includes two ExpressLane™ PEX 8747 devices (PLX chips). The PLX chip is a 48-lane, 5-port PCIe Gen3 switch device and supports multi-host PCIe switching capability. Each PLX chip multiplexes the single Gen3 x16 PCI slot from the motherboard into two Gen3 x16 PCI buses to support 2 GPUs or accelerator cards. The PLX chip supports peer-to-peer traffic and multicast for maximum performance. (A short sketch for inspecting this switch topology from the host follows the list below.)
● Flex-foil cables connect motherboard slots 1 and 4 to the right PCI riser
● Interface PCBs connect motherboard slots 2 and 3 to the left PCI riser
● Login or I/O nodes use a different right PCI riser and mounting hardware to support 2 PCI add-on cards. One card (FC or GbE) is secured to the rear panel, and an internal RAID controller add-on card is mounted internally and connects to the hard drive backplanes.
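From a running Linux node, the PLX switches and the GPUs behind them appear as ordinary PCIe devices, so the riser topology can be sanity-checked from sysfs. The sketch below simply counts device functions by vendor ID (0x10b5 for PLX/Broadcom PEX switches, 0x10de for NVIDIA); it is an illustration, not a Cray diagnostic tool, and each PEX 8747 shows up as several bridge functions.

```python
# Count PLX (PEX 8747) switch functions and NVIDIA devices visible on the PCIe bus.
# Reads the standard Linux sysfs PCI tree; run on the node itself.
import pathlib

VENDORS = {"0x10b5": "PLX/Broadcom PCIe switch", "0x10de": "NVIDIA"}
counts = {name: 0 for name in VENDORS.values()}

for dev in pathlib.Path("/sys/bus/pci/devices").iterdir():
    vendor = (dev / "vendor").read_text().strip()
    if vendor in VENDORS:
        counts[VENDORS[vendor]] += 1

for name, n in counts.items():
    print(f"{name}: {n} device(s)")
# `lspci -tv` shows the same information as a tree, with the GPUs sitting
# behind the PEX 8747 upstream/downstream ports on each riser.
```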
Figure 33. PCI Riser Block Diagram - Single PLX Chip

Diagram summary: same single-PLX view as Figure 26. The power backplane supplies 12V@58A and 12V auxiliary power; on-riser DC-DC converters produce 3.3V, 1.8V, and 0.9V with a voltage monitor; and the PLX chip (EEPROM, I2C, JTAG) switches the motherboard x16 connector, clock, and reset to the riser's two Gen3 x16 PCI slots through the reset buffer and SMB mux.
Figure 34. 2626X/2826X Left GPU PCI Riser Components

Callouts: Gen3 x16 PCIe connectors to motherboard PCI slots 2 and 3; two PLX chips; PCI connectors to the GPU trays/accelerator cards (4x); fan power and tachometer; blind power connector; edge connector to the power backplane.
The right PCI riser used for compute nodes supports 4 GPUs or accelerator cards. The right PCI riser for I/O or login nodes supports 2 add-on PCI cards.

Figure 35. 2626X/2826X Right GPU PCI Riser Components

Callouts: Gen3 x16 PCIe connector to motherboard slot 1 via flex-foil cable; Gen3 x16 PCIe connector to the motherboard via flex-foil cable (slot 4 on S2600WP, slot 3 on S2600TP); two PLX chips; 12V auxiliary power connectors; Gen3 x16 PCI slots to the accelerator cards; edge connector to the power backplane.

Figure 36. Add-on Card Right Riser Card

Callouts: I/O and login node right PCI riser; FC, IB, or GbE add-on card; RAID add-on card.
GPU Fault Conditions
The front panel indicators include a GPU power status indicator (green = good, red = fault) and a GPU fault status indicator. Each PCI riser supports two high-performance, low-latency PCIe switch devices (PLX chip, PEX8747) that provide multi-host PCIe switching capabilities. Each PLX chip provides end-to-end cyclic redundancy checking (ECRC) and poison bit support to ensure data path integrity.
The front panel GPU status indicates whether fatal errors have occurred on a PLX chip or GPU:
Green On    All GPUs and PLX chips are working normally
Red On      A fatal error has occurred on a GPU or PLX chip
The front panel GPU power status indicates the power status from the PLX chips on the right and left PCI risers:
Green On    GPU power normal
Red On      One or more GPU power failures
Figure 37. PLX Chip Error Indicators
NVIDIA Tesla GPUs: K40, K80, M40 and PH400
The NVIDIA® Tesla® K40, K80 and M40 graphics processing units (GPUs) are dual-slot computing modules that use the Tesla (267 mm length) form factor. These GPUs support PCI Express Gen3 and use passive heat sinks for cooling. Tesla K40/K80 modules ship with ECC enabled by default to protect the register files, cache, and DRAM. Tesla M40 boards ship with EDC and ECC enabled by default to protect the GPU's memory interface and the on-board DRAM memories. With ECC enabled, some of the memory is used for the ECC bits, so the available memory is reduced by ~6.25%. Processors and memory for these GPU modules are:
● K40
  ○ One GK110B GPU
  ○ 12 GB of GDDR5 on-board memory
  ○ ~11.25 GB available memory with ECC on
● K80
  ○ Two GK210B GPUs
  ○ 24 GB of GDDR5 on-board memory (12 GB per GPU)
  ○ ~22.5 GB available memory with ECC on
● M40
  ○ One GM200 GPU
  ○ 24 GB of GDDR5 on-board memory
  ○ ~22.5 GB available memory with ECC on
Figure 38. NVIDIA K40, K80 and M40 GPUs (panels: K40, K80, M40)
Table 8. NVIDIA K40, K80 and M40 Features

GPU
  K40: Processor cores: 2880. Core clocks: base clock 745 MHz; boost clocks 810 MHz and 875 MHz. Package size: 45 mm × 45 mm, 2397-pin ball grid array (SFCBGA).
  K80: Processor cores: 4992 (2496 per GPU). Core clocks: base clock 560 MHz; boost clocks 562 MHz to 875 MHz. Package size: 45 mm × 45 mm, 2397-pin ball grid array (SFCBGA).
  M40: Processor cores: 3072. Core clocks: base clock 948 MHz; boost clock 1114 MHz. Package size: 45 mm × 45 mm, 2397-pin ball grid array (SFCBGA).

Board (all three GPUs)
  PCI Express Gen3 ×16 system interface. Physical dimensions: 111.15 mm (height) × 267 mm (length), dual-slot.

Memory
  K40: Memory clock: 3.0 GHz. Memory bandwidth: 288 GB/s. Interface: 384-bit. Total board memory: 12 GB (24 pieces of 256M × 16 GDDR5 SDRAM).
  K80: Memory clock: 2.5 GHz. Memory bandwidth: 480 GB/s (cumulative). Interface: 384-bit. Total board memory: 24 GB (48 pieces of 256M × 16 GDDR5 SDRAM).
  M40: Memory clock: 3.0 GHz. Memory bandwidth: 288 GB/s (cumulative). Interface: 384-bit. Total board memory: 24 GB (24 pieces of 512M × 16 GDDR5 SDRAM).

BIOS
  K40: 2 Mbit serial ROM. BAR1 size: 16 GB.
  K80: 2 Mbit serial ROM. BAR1 size: 16 GB per GPU.
  M40: 4 Mbit EEPROM. BAR0: 16 MB, non-prefetchable, 32-bit. BAR1: 32 GB, prefetchable, 64-bit. BAR2: 32 MB, prefetchable, 64-bit. I/O: disabled.

Power connectors
  K40: One PCIe 6-pin and one PCIe 8-pin power connector.
  K80: One 8-pin CPU power connector.
  M40: One 8-pin CPU power connector.
Table 9. NVIDIA Tesla K40, K80 and M40 Board Configuration

Graphics processor
  K40: One GK110B    K80: Two GK210B    M40: One GM200
Core clocks
  K40: Base clock 745 MHz; boost clocks 810 MHz and 875 MHz
  K80: Base clock 560 MHz; boost clocks 562 to 875 MHz
  M40: Base clock 948 MHz; boost clock 1,114 MHz
Memory clock
  K40: 3.0 GHz    K80: 2.5 GHz    M40: 3.0 GHz
Memory size
  K40: Up to 12 GB    K80: 24 GB per board (12 GB per GPU)    M40: Up to 24 GB
Memory I/O
  K40: 384-bit GDDR5    K80: 384-bit GDDR5    M40: 384-bit GDDR5
Memory bandwidth
  K40: 288 GB/s per board    K80: 480 GB/s per board (240 GB/s per GPU)    M40: 288 GB/s
Memory configuration
  K40: 24 pieces of 256M × 16 GDDR5 SDRAM
  K80: 48 pieces of 256M × 16 GDDR5 SDRAM
  M40: 24 pieces of 512M × 16 GDDR5 SDRAM
Display connectors
  K40: None    K80: None    M40: None
Power connectors
  K40: PCIe 6-pin and PCIe 8-pin    K80: EPS-12V 8-pin    M40: 12V 8-pin
Board power/TDP
  K40: 235 W    K80: 300 W    M40: 250 W
Power cap level
  K40: 235 W    K80: 150 W per GPU (300 W per board)    M40: 250 W
BAR1 size
  K40: 16 GB    K80: 16 GB (per GPU)    M40: 32 GB
Extender support
  K40: Straight extender is the default; the long offset extender is available as an option
  K80: Straight extender or long offset extender
  M40: Straight extender or long offset extender
Cooling
  K40: Passive heat sink    K80: Passive heat sink    M40: Passive heat sink
ASPM
  K40: Off    K80: Off
K40 and K80 Connectors and Block Diagrams
The K40 receives power through PCIe 6-pin and 8-pin connectors. A Y-cable from these connectors plugs into a K40 interface board (IFB) attached to the GPU tray. The K80 uses a single EPS-12V 8-pin connector/cable that plugs into a K80 IFB. Each IFB uses a blind power connector that plugs into the GPU riser card.
Figure 39. K40 and K80 Block Diagrams
(Block diagram summary: The K40 contains one GK110B GPU with 12 GB of GDDR5 (24 pieces of 256M × 16) on a 384-bit interface and a 2 Mbit BIOS ROM. 12V power enters through PCIe 6-pin and 8-pin inputs on the K40 IFB, which delivers 12V auxiliary power through a blind power connector to the GPU riser card; the GPU connects to the riser over a Gen3 x16 PCI bus through its PCI edge connector. The K80 contains two GK210B GPUs, each with 12 GB of GDDR5 (24 pieces of 256M × 16) on a 384-bit interface, behind an on-board PLX PCIe switch, with a 2 Mbit BIOS ROM. 12V power enters through an EPS-12V 8-pin input on the K80 IFB, which delivers 12V auxiliary power through a blind power connector to the GPU riser card; the board connects to the riser over a Gen3 x16 PCI bus through its PCI edge connector.)
NVIDIA Tesla PH400
The NVIDIA® Tesla® PH400 GPU is a dual-slot computing module that supports PCI Express Gen3 and uses a passive heat sink for cooling. The PH400 ships with ECC enabled to protect the GPU memory interface and the on-board memories. EDC protects the memory interface by detecting single, double, and all odd-bit errors. Because the PH400 uses HBM2 memory with native ECC support, there is no ECC overhead in either memory capacity or bandwidth (an ECC query example follows the feature list). Features and specifications:
● One Pascal™ GP100 GPU
● 16 GB Chip on Wafer on Substrate (CoWoS) HBM2 stacked memory
● Processor cores: 3584
● TDP: 250 W
● Base clock: 1328 MHz
● Boost clock: 1480 MHz
● Memory clock: 1.4 Gbps HBM2
● Memory bandwidth: 720 GB/s
● GPU Boost: 5.3 teraflops double-precision, 10.6 teraflops single-precision, and 21.2 teraflops half-precision performance
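As noted above, these modules ship with ECC enabled. A minimal check with nvidia-smi applies to any of the Tesla boards in this chapter; the GPU index 0 is an illustrative assumption.
# nvidia-smi -q -i 0 -d ECC      (report the current and pending ECC mode for GPU 0)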
NVIDIA GPU Boost and Autoboost
NVIDIA GPU Boost™ is a feature available on NVIDIA Tesla products that makes use of any power and thermal headroom to boost application performance by increasing GPU core and memory clock rates. GPU Boost is customized for compute-intensive workloads running on clusters. Application workloads that have headroom can
run at higher GPU clocks to boost application performance. If power or thermal limits are reached, the GPU clock scales down to the next available clock setting so that the GPU board remains below the power and thermal limit. The GPU clocks available under NVIDIA GPU Boost are:
● Base clock: The clock defined to run the thermal design power (TDP) application under TDP test conditions (worst-case board under worst-case test conditions). For Tesla products, the TDP application is typically specified to be a variation of DGEMM.
● Boost clock(s): Clocks above the base clock that are available to the GPU when there is power headroom. The number of boost clocks supported varies between the different GPUs.
NVIDIA GPU Boost gives end users full control to select the core clock frequency that best fits their workload. The workload may have one or more of the following characteristics:
● The problem set is spread across multiple GPUs and requires periodic synchronization.
● The problem set is spread across multiple GPUs and each GPU runs independently of the others.
● The workload has "compute spikes": some portions are extremely compute intensive, pushing power higher, while other portions are moderate.
● The workload is compute intensive throughout, without spikes.
● The workload requires fixed clocks and is sensitive to clock fluctuations during execution.
● The workload runs in a cluster where all GPUs need to start, finish, and run at the same clocks.
● The workload or end user requires predictable performance and repeatable results.
● The data center runs different types of workloads at different hours of the day to better manage power consumption.
GPU Boost on K40
The K40 ships with the GPU clock set to the base clock. To enable GPU Boost, a user or system administrator selects one of the available GPU boost clocks, or disables autoboost and manually sets the right clocks for an application, by either running the nvidia-smi command-line tool or using the NVIDIA Management Library (NVML). nvidia-smi can control application clocks without any changes to the application.
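For example, the supported application clocks on a K40 can be listed and then pinned with nvidia-smi. This is a minimal sketch; the 3004 MHz memory / 875 MHz graphics pair is an illustrative K40 setting and should be confirmed against the SUPPORTED_CLOCKS output for the installed board.
# nvidia-smi -q -d SUPPORTED_CLOCKS      (list the memory/graphics clock pairs the board supports)
# nvidia-smi -ac 3004,875                (set application clocks to <memory clock>,<graphics clock>)
# nvidia-smi -q -d CLOCK                 (confirm the clocks currently in use)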
GPU Boost on K80
The K80 ships with Autoboost enabled by default. In Autoboost mode, the GPUs start at the base clock when the Tesla K80 is first used and automatically raise the core clock to higher levels as long as the board stays within the 300 W power limit. Tesla K80 autoboost can automatically match the performance of explicitly controlled application clocks. If automatic boosting is not wanted, the Autoboost feature can be disabled and the module locked to a clock supported by the GPU. The K80 autoboost feature lets GPUs work independently rather than running in lock step with all the GPUs in the cluster. A sketch of toggling autoboost follows; the table after it summarizes the GPU Boost behavior and features for K40 and K80.
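A minimal sketch of disabling and re-enabling K80 autoboost with nvidia-smi. The -i 0 device index and the 2505 MHz memory / 562 MHz graphics pair are illustrative assumptions; use the index of the target GPU and a clock pair reported by the SUPPORTED_CLOCKS query.
# nvidia-smi --auto-boost-default=DISABLED -i 0    (disable autoboost on the selected GPU)
# nvidia-smi -ac 2505,562                          (lock the module to a supported clock)
# nvidia-smi --auto-boost-default=ENABLED -i 0     (re-enable autoboost later)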
Table 10. K40 and K80 Boost Features

GPU clocks
  K40: 745 MHz, 810 MHz, 875 MHz
  K80: 562 MHz to 875 MHz in 13 MHz increments
Base clock
  K40: 745 MHz
  K80: 560 MHz
Autoboost (NVIDIA GPU Boost enabled by default)
  K40: No. The end user has to explicitly select a clock using nvidia-smi/NVML.
  K80: Yes. Enabled by default to boost the clock based on power headroom.
Ability to select clocks via nvidia-smi/NVML
  K40: Yes
  K80: Yes
Ability to disable NVIDIA GPU Boost
  K40: Yes, using nvidia-smi/NVML
  K80: Yes, using nvidia-smi/NVML
API for GPU Boost
NVML is a C-based API for monitoring and managing the various states of Tesla products. It provides direct access to the queries and commands that are also available via nvidia-smi. NVML documentation is available from: https://developer.nvidia.com/nvidia-management-library-nvml. The following table summarizes the nvidia-smi commands for using GPU Boost.

Table 11. nvidia-smi Command Summary

Purpose                                                        Command
View the supported clocks                                      nvidia-smi -q -d SUPPORTED_CLOCKS
Set one of the supported clocks                                nvidia-smi -ac <memory clock,graphics clock>
Make the clock settings persistent across driver unload       nvidia-smi -pm 1
Revert the clock settings to base clocks after the driver     nvidia-smi -pm 0
unloads (turn off persistence mode)
View the clocks in use                                         nvidia-smi -q -d CLOCK
Reset clocks back to the base clock (as specified in the       nvidia-smi -rac
board specification)
Allow "non-root" access to change the graphics clock           nvidia-smi -acp 0
Enable auto boosting the GPU clocks                            nvidia-smi --auto-boost-default=ENABLED -i 0
Disable auto boosting the GPU clocks                           nvidia-smi --auto-boost-default=DISABLED -i 0
Allow "non-root" access to set autoboost                       nvidia-smi --auto-boost-permission=UNRESTRICTED -i 0
When using non-default application clocks, driver persistence mode should be enabled. Persistence mode ensures that the driver stays loaded even when no NVIDIA® CUDA® or X applications are running on the GPU, which maintains current state, including requested application clocks. If persistence mode is not enabled and no applications are using the GPU, the driver unloads and any current user settings revert to the defaults for the next application. To enable persistence mode run:
# sudo nvidia-smi -pm 1
The driver attempts to maintain requested application clocks whenever a CUDA context is running on the GPU. However, if no contexts are running, the GPU reverts to idle clocks to save power and stays there until the next context is created. Thus, if the GPU is not busy, you may see idle clocks even though the requested application clocks are much higher.
NOTE: By default, changing the application clocks requires root access. A user without root access can ask the cluster administrator to allow non-root control over application clocks. Once changed, this setting persists for the life of the driver before reverting to the root-only default. Persistence mode should always be enabled whenever changing application clocks or enabling non-root permissions to do so.
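A minimal sketch of the sequence described above, using commands from Table 11; the final -ac values must come from the SUPPORTED_CLOCKS query on the target board.
# sudo nvidia-smi -pm 1                            (enable persistence mode so clock settings are retained)
# sudo nvidia-smi -acp 0                           (optional: allow non-root users to change application clocks)
# nvidia-smi -ac <memory clock,graphics clock>     (set the desired application clocks)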
Using GPU Boost
● The K40 runs at a base clock of 745 MHz. Run a workload at the base clock and check the power draw using an NVML or nvidia-smi query (see the sketch after this list). If the power draw is less than 235 W, select a higher boost clock and run the application again. A few iterations and some experimentation may be needed to find the boost clock that works best for a specific workload.
● The K80 ships with Autoboost enabled. The GPUs boost the clock automatically, depending on the available power headroom.
● If K40 and K80 GPUs are used with several others in a cluster, root access may be needed to try different clocks. The nvidia-smi -acp 0 command grants non-root permission to set different boost clocks.
● Experimentation may be needed to find the clock speed that works best for a workload running on multiple GPUs at the same clock speed.
● Selecting the highest boost clock on a K40 or K80 is likely the best option when each GPU works independently on a problem set and there is little interaction or collaboration between GPUs.
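A minimal sketch of the K40 iteration described in the first bullet; the 235 W limit is the K40 board power from Table 9, and the clock values shown are illustrative.
# nvidia-smi -q -d POWER                 (check the power draw while the workload runs at the base clock)
# nvidia-smi -ac 3004,810                (if power draw is well below 235 W, select a higher boost clock and rerun)
# nvidia-smi -ac 3004,875                (repeat; back off if the board approaches the power limit)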
Hydra Fan Control Utility
The hydra fan control utility monitors and controls GPUs and fans in CS-Storm servers. The utility controls Cray-designed PCIe expansion and fan control logic through the motherboard BMC. It runs as a Linux service daemon (hydrad) and is distributed as an RPM package. Fan control utility (hydrad) features:
● Supports 8 GPUs or customer accelerators
● Supports Intel motherboards
● Active/manual fan control for 8x GPUs with fan localization (left or right)
● Supports Red Hat Enterprise Linux (RHEL) 6
● GPU power on/off for energy saving
● User-programmable fan control parameters
● Power data monitoring with energy counter for PSU, motherboard, and GPU
Figure 40. 2626X/2826X Fan Control Block Diagram
(Block diagram summary: hydra CLI commands start/stop and query the hydrad daemon, which reads the config file, writes the data file and status updates, and talks to the motherboard BMC over IPMI. Through the BMC I2C interface, hydrad reaches the ADT7462 fan controller (Fan1 through Fan4) and, via I2C MUXes on PCI1 through PCI4 (PCI3 and PCI4 are swapped on the S2600TP), the SMBPBI/IPMB interfaces of GPU1 through GPU8.)
The CS-Storm fan control utility RPM package includes the following:
/usr/sbin/hydrad
The hydrad daemon is the main part of the hydra utility and runs as a service daemon on the Linux OS. It is started and stopped by the init script at runlevels 3, 4, and 5.
When the service starts, hydrad parses the /etc/hydra.conf file for runtime environment information, then identifies/discovers the motherboard BMC, GPU, and fan control hardware logic on the system. The service then monitors the GPU status and fan speed every second. The fan speed varies according to GPU temperature, or as defined in hydra.conf. The hydrad service updates the data file /var/tmp/hydra_self whenever the GPU or fan status changes.
/usr/sbin/hydrad.sh
This script is called by /etc/rc.d/init.d/hydra and invokes the hydrad service. It generates a /tmp/hydrad.log file.
/usr/sbin/hydra
The hydra fan utility provides the following command-line interface (CLI) functions:
● Show GPU status
● Control GPU power on/off
● Show fan status
● Set active/manual fan control mode
● Set fan speed under manual mode
/etc/hydra.conf
This file contains the running environment for the hydrad service. The running parameters for fan speed and GPU temperature can be adjusted for the system. Restart the hydrad service to apply changes made to the hydra.conf file.
RPM Package
After the hydra RPM package is installed, the hydra utility automatically registers and starts the service daemon. To change any parameters, modify the /etc/hydra.conf file, then stop and start the hydra service.
Install:
# rpm -ihv ./hydra-0.4-0.x86_64.rpm
The hydrad service starts automatically during installation and keeps running as a service daemon unless the package is removed.
Remove:
# rpm -e hydra
The /etc/hydra.conf file is moved to /etc/hydra.conf.rpmsave for the next installation.
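A typical update cycle after installation, using the service commands described later in this chapter:
# vi /etc/hydra.conf       (adjust fan control or monitoring parameters)
# service hydra stop
# service hydra start      (restart hydrad so the new settings take effect)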
Data File
Once hydrad starts up, a data file is created at /var/tmp/hydra_self. It contains all GPU and fan information that hydrad collects. Both hydrad and hydra use this data file to monitor and control the hydra system. This file can be used as a snapshot image of the latest system status.
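Both the data file and the hydrad log can be inspected directly; a quick check, assuming the default paths listed above:
# cat /var/tmp/hydra_self      (snapshot of the latest GPU and fan status collected by hydrad)
# tail /tmp/hydrad.log         (recent hydrad messages, including debug output when debug=on)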
Configuration Parameters
The hydrad runtime environment is modified using the /etc/hydra.conf configuration file, which contains the following parameters. Use the hydra config command to display and verify the current settings. Modify /etc/hydra.conf, then restart the hydrad service to apply the changes. (An illustrative configuration file follows the parameter list.)
● activefan (on, off; default is on). Selects active or manual fan control mode.
● debug (on, off; default is off). When this option is set to on, hydrad writes debug messages to /tmp/hydrad.log.
● discover (on, off; default is on). hydrad responds if a broadcast packet is issued from hscan.py on the network UDP 38067 port.
● fanhigh (fannormal to 100%; default is 85%). The PWM duty value for high speed. If the GPU maximum temperature is higher than hightemp, the fan speed is set to this high duty value. The default setting is full speed.
● fanlow (5% to fannormal; default is 10%). The pulse-width modulation (PWM) duty value for low speed. If the GPU maximum temperature is lower than normaltemp, the fan speed is set to this low duty value. The default value of 10%, set for the idle state of the GPU, reduces fan power consumption.
● fannormal (fanlow to fanhigh; default is 65%). The PWM duty value for normal fan speed. If the GPU maximum temperature is higher than normaltemp and lower than hightemp, the fan speed is set to this normal duty value.
● fanspeed (5 - 100%; default is 85%). The default fan speed after manual fan control mode is set.
  CAUTION: GPU Overheating. Manually setting the default fan speed to low can overheat the GPU. Monitor GPU temperature after manually setting the fan speed to avoid damage to the GPU or accelerator.
● gpuhealth (on, off; default is on). Set gpuhealth to off to disable the GPU monitoring function if GPUs are not installed in the system.
● gpumax (0°C - 127°C; default is 90°C). The maximum GPU temperature allowed. If a GPU exceeds the gpumax value, hydrad issues an event in the event log. Set the proper gpumax temperature for the type of GPU installed in the system.
● gpu_type (auto, K10, K20, K40, K80, MIC; default is auto). Defines the type of GPU/MIC. If set to auto, hydrad automatically detects the type of GPU (requires additional time).
● hightemp (normaltemp - 127°C; default is 75°C). The minimum temperature at which the fan runs at high speed. If a GPU exceeds this value, the fan runs at high speed.
● login_node (on, off; default is off). When this option is set to on, hydrad operates for a login or I/O node. The I/O or login nodes do not support the Group A components:
  ○ PCI1, PCI4
  ○ FAN1, FAN4
● loglevel (info, warning, critical; default is info). Controls which events hydrad logs to the /tmp/hydrad.log file.
●
nodepower (on, off, default is off). hydrad monitors motherboard power consumption.
●
normaltemp (0°C - hightemp, default is 60°C). The minimum temperature where the fan runs at normal speed. If a GPU temperature exceeds the normal temperature value, the fan runs at normal speed.
●
polling (1 - 100 seconds, default is 2 seconds). Controls how often hydrad service accesses the GPU and fan controller
●
psu_health (on, off, default is off). hydrad monitors GPU power consumption.
●
psupower (on, off, default is on). hydrad checks and monitors power status and consumption of the three PSUs.
●
sysloglevel (info, warning, critical, default is warning). The hydrad service also supports the syslog facility using this log level. hydrad event logs are written to /var/log/messages.
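An illustrative /etc/hydra.conf sketch, assuming a plain key=value layout and using the parameter names documented above; the values shown are the documented defaults except gpu_type, and a real file should be adjusted to the installed GPUs and desired thresholds.
activefan=on
gpu_type=K80
normaltemp=60
hightemp=75
gpumax=90
fanlow=10
fannormal=65
fanhigh=85
polling=2
loglevel=info
Restart the hydrad service after editing the file so the new parameters take effect.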
hydra Commands
To start the fan control service:
# service hydra start
To stop the fan control service:
# service hydra stop
Fan control utility settings are read from the /etc/hydra.conf configuration file when hydrad is started.
To disable or enable active fan control:
# hydra fan [on|off]
on: active fan control by GPU temperature
off: manual fan control
To set manual fan control to a specific PWM duty value (% = 10 to 100):
# hydra fan off
# hydra fan [%]
Command line options (examples shown below):
# hydra
Usage: hydra [options]
Options:
  -D : display debug message
  -f : use specific hydrad data file. default: /var/tmp/hydra_self
  -v : display hydra version
Commands:
  config          : display running hydrad settings
  gpu [on|off]    : display or control GPU power
  node            : display node status
  sensor          : display GPU temperatures
  fan [%|on|off]  : display fan status, set duty cycle, active control, manual control
  power [node|gpu|clear] : display PSU, motherboard and GPU power status, or reset the energy counter

hydra config: Display Configuration Parameters
The hydra config command displays the parameter values that the hydra service is currently using. Values can be changed in the /etc/hydra.conf file and applied by stopping and starting the hydra service.
[root@hydra3]# hydra config
uid=0 cid=0 id=0 gpu_map=00000000
gpu_type=auto normaltemp=60 hightemp=75 gpumax=90
fanspeed=100 low=50 normal=80 high=100
polling=2 loglevel=info sysloglevel=warning
activefan=on gpu_health=on psu_health=on
nodepower=off gpupower=off login_node=off debug=off
ok
[root@hydra3]#

hydra gpu: GPU Power Control
The CS-Storm has power control logic for all GPUs that can be controlled using the hydra CLI. GPU power can be disabled to reduce power consumption. The default initial power state for GPUs is on. If GPU power has been turned off, the GPU does not power on when the system is powered on, unless GPU power is enabled again using the CLI command. The following limitations exist for GPU power control:
● The OS may crash if GPU power is set to off while the operating system is active, due to the disabled PCI link.
● Reboot the operating system after enabling power to a GPU so that the GPU is recognized.
The hydra gpu command shows GPU status or turns GPU power on or off. The power operation is performed for all installed GPUs; individual GPU control is not allowed. Status information includes bus number, PCI slot, Mux, power status, GPU type, product ID, firmware version, GPU slave address, temperature, and status. Use the following arguments to display status or to enable or disable power to the GPUs:
Args:
      : Display GPU (or MIC) status: Bus (PCI#, Mux), Power, Type, Product ID, FWVer, slave address, Temperature, and Status.
  on  : Turn on all GPU power.
  off : Turn off all GPU power.
[root@hydra3]# hydra gpu
#  Slot Mux Power Type PID   FWVer Addr Temp Status
0  1    1   on    K40  1023        9eH  31   ok
1  1    2   on    K40  1023        9eH  33   ok
2  2    1   on    K40  1023        9eH  32   ok
3  2    2   on    K40  1023        9eH  33   ok
4  3    1   on    K40  1023        9eH  31   ok
5  3    2   on    K40  1023        9eH  32   ok
6  4    1   on    auto
7  4    2   on    auto
ok
[root@hydra3]# hydra gpu off
ok
[root@hydra3]# hydra gpu on
ok
[root@hydra3]#
hydra node: Motherboard BMC Status
The hydra node command displays motherboard BMC status, product ID, BMC firmware version, and IP settings.
[root@hydra3]# hydra node
Prod-ID: 004e
BMC Ver: 1.20
BMC CH1: 00:1e:67:76:4e:91
  ipaddr:  192.168.1.57
  netmask: 255.255.255.0
  gateway: 192.168.1.254
BMC CH2: 00:1e:67:76:4e:92
  ipaddr:  0.0.0.0
  netmask: 0.0.0.0
  gateway: 0.0.0.0
Sensors: 4
  p1_margin: ok ( -49.0 'C)
  p2_margin: ok ( -55.0 'C)
  inlet:     ok (  31.0 'C)
  outlet:    ok (  45.0 'C)
ok
[root@hydra3]#
hydra fan: Display Fan Status and Set Control Mode
The hydra fan command displays fan status and changes the fan control mode and speed. When active fan control is disabled, the fan speed is automatically set to the default manual fan speed. Use the hydra fan command to display the controller chip revision, slave address, control mode, and fan status.
Args:
      : Display fan status: Chip Rev, slave addr, control mode and FAN status.
  on  : Set Active Fan control mode.
  off : Set Manual Fan control mode.
  %   : Set FAN speed duty, 5-100 (%).
[root@hydra3]# hydra fan
ADT7462 Rev : 04h
ADT7462 Addr: b0h
Active Fan  : on
Fan   Stat  RPM   Duty
FAN1  ok    9591  50
FAN2  ok    9574  50
FAN3  ok    9574  50
FAN4  ok    9574  50
Ok
Set fan control mode to manual:
[root@hydra3]# hydra fan off
ok
[root@hydra3]# hydra fan
ADT7462 Rev : 04h
ADT7462 Addr: b0h
Active Fan  : off
Fan   Stat  RPM    Duty
FAN1  ok    13300  100
FAN2  ok    12980  100
FAN3  ok    13106  100
FAN4  ok    13466  100
Ok
Set fan duty cycle to 70%:
[root@hydra3]# hydra fan 70
ok
[root@hydra3]# hydra fan
ADT7462 Rev : 04h
ADT7462 Addr: b0h
Active Fan  : off
Fan   Stat  RPM    Duty
FAN1  ok    12356  70
FAN2  ok    12300  70
FAN3  ok    12300  70
FAN4  ok    12244  70
Ok
Set fan control mode to active:
[root@hydra3 ~]# hydra fan on
Ok

hydra sensor: Display GPU Temperatures
The hydra sensor command displays GPU temperatures.
[root@hydra3 ~]# hydra sensor
PCI1-A  PCI1-B  PCI2-A  PCI2-B  PCI3-A  PCI3-B  PCI4-A  PCI4-B
31      33      32      33      31      32
ok
[root@hydra3 ~]#
hydra power: Display Power Values
The hydra power command displays PSU, motherboard, and GPU power status and can be used to reset the peak/average and energy counters.
Args:
       : Display PSU power status
  node : Display Motherboard power status
  gpu  : Display GPU power status
  clear: Reset all Peak/Average and Energy Counters

[root@hydra]# hydra power
No  Pwr  Stat  Temp  Fan1  Fan2  +12V  Curr  ACIn  Watt  Model
00  on   ok    27    7776  6656  11.9  42    207   572   PSSH16220 H
01  on   ok    27    6144  5248  11.9  43    207   572   PSSH16220 H
02
Power : 84.0 A 1122 W (Peak 1226 W, Average 1129 W)
Energy: 3457.5 Wh in last 11013secs(3h 3m 33s)
ok

[root@hydra]# hydra power node
PMDev : ADM1276-3 0 (ok) p: 368.0 a: 187.6
Power : 12.2 V 10.5 A 193 W (Peak 228 W, Average 188 W)
Energy: 576.1 Wh in last 11011secs(3h 3m 31s)
ok

[root@hydra]# hydra power gpu
No  Slot  Stat  +12V  Curr  Watt   Peak   Avrg   Model
1   PCI1  ok    12.2  20.0  366.5  495.0  367.8  ADM1276-3 0
2   PCI2  ok    12.2  20.8  386.9  485.2  387.7  ADM1276-3 0
3   PCI3  ok    12.1  20.0  365.5  480.6  364.3  ADM1276-3 0
4   PCI4  ok    12.2  18.4  339.3  483.2  340.6  ADM1276-3 0
Power : 78.9 A 1450 W (Peak 1534 W, Average 1407 W)
Energy: 4310.9 Wh in last 11019secs(3h 3m 39s)
ok

[root@hydra]# hydra power clear
ok
[root@hydra]# hydra power
No  Pwr  Stat  Temp  Fan1  Fan2  +12V  Curr  ACIn  Watt  Model
00  on   ok    27    7776  6656  11.9  42    207   560   PSSH16220 H
01  on   ok    27    6144  5248  11.9  42    207   558   PSSH16220 H
02
Power : 84.0 A 1118 W (Peak 1118 W, Average 1129 W)
Energy: 1.9 Wh in last 1secs(0h 0m 1s)
ok
[root@hydra]#
Fan Speeds by GPU Temperature
As described above, fan speeds increase and decrease based on GPU temperatures. If a GPU gets hot and its temperature crosses into the next temperature region, hydrad immediately raises the fan speed to the target speed for that region. As the GPU cools back below the region boundary, hydrad decreases the fan speed step by step.
(Diagram: fan duty cycle versus GPU temperature. The duty cycle is fanlow below normaltemp, fannormal between normaltemp and hightemp, and fanhigh above hightemp; increases are immediate when a threshold is crossed, decreases are stepped.)
GPU and Fan Localization
GPUs and fans are separated into two groups, and each group of fans is controlled independently. The GPU temperatures in one group do not affect the other group's fan speed; fan speeds are determined by the GPU temperatures within the same group. The two motherboard-dedicated chassis fans (fan 1A and 1B) are not controlled by hydrad; they are controlled by the motherboard BMC.
● Group A components (right GPU sled):
  ○ PCI1 - GPU1, GPU2
  ○ PCI4 - GPU7, GPU8
  ○ FAN1, FAN4
● Group B components (left GPU sled):
  ○ PCI2 - GPU3, GPU4
  ○ PCI3 - GPU5, GPU6
  ○ FAN2, FAN3
Note: The I/O and login nodes (2626X2 and 2826X2) do not have Group A components.
No Power or Unknown GPU States
If there is no power or the GPU state is unknown, hydrad sets the fan speeds to either:
● Idle speed (10%), if all GPUs are powered off
● Full speed (100%), if any GPU is unidentified or in an abnormal state (for example, no thermal status reported)
Fan Control Watchdog Timeout Condition
The CS-Storm system includes hardware watchdog timeout logic to protect the GPUs from overheating if hydrad malfunctions. The fan speed is set to full speed after 5-10 seconds if any of the following conditions occur:
● System crash
● BMC crash
● hydrad crash
● hydrad service is stopped
● hydra fan utility package is removed
Discover Utility
A discovery utility (hscan.py) identifies all CS-Storm systems/nodes that are running hydrad. (hydrad contains the internal identification/discovery service and provides information through UDP port 38067.) The discover capability can be turned off by setting the discover=off option in hydra.conf on each CS-Storm system. The hscan.py utility provides the following information from hydrad:
● system: IP address, MAC of eth0, hostname, node type, hydrad version
● gpu: GPU temperature and type
● fan: fan status, PWM (L/R) and running speed (RPM)
● power: PSU, node, GPU power status
If the hydra service is not running on the system, hscan.py will not display any information, even though the system is running and online.
NOTE: The fan, power, and temperature information cannot be displayed together, so the -T, -S, -P, -N, -G and -F options cannot be combined.
Usage: ./hscan.py [options]
Options:
  -h : display this message
  -w