
LRZ SuperMUC Phase 2 – The Story Continues with Lenovo (Torsten Bloth, Lenovo)




Transcript

That's me
– Torsten Bloth, HPC Systems Architect, leading architect for the LRZ SuperMUC Phase 2 supercomputer
– Started with IBM back in 2005 and was part of the IBM System x transition into Lenovo in 2014
– Lives in beautiful Potsdam, and it took him a while to travel to Lugano ;)

There is No Agenda
– Lenovo and HPC
– LRZ SuperMUC Phase 1 review
– LRZ SuperMUC Phase 2: overview and current status, technology used, advantages of Water Cooling Technology

Lenovo – Who we are

Who is Lenovo?
– A $39 billion, Fortune 500 technology company: publicly listed/traded on the Hong Kong Stock Exchange; 60,000+ employees serving clients in 160+ countries
– A global company: two headquarters in Raleigh, N.C. and Beijing, China; major research centers in the U.S., Japan, and China; manufacturing in the U.S., China, India, Brazil, and Mexico
– Invests in innovation: ranked as one of the Top 25 most innovative companies; #1 in worldwide PC market share; #2 in worldwide PC & tablet market share

Lenovo Enterprise System Portfolio (System x, Cloud, Solutions, Analytics, Technical Computing)
– High-end rack systems: 4/8-socket enterprise-class x86 performance, resiliency, security
– Dense systems: optimize space-constrained data centers with extreme performance and energy efficiency
– Converged/blade systems: integration across IBM assets in systems and software for maximum client optimization and value
– 1P & 2P rack & tower systems: broad rack and tower portfolio to meet a wide range of client needs, from infrastructure to technical computing

HPC Storage Portfolio (home grown + legacy Lenovo + IBM OEM offerings)
– Direct attach: GSS 21/22
– Network attach: GSS 24/26
– IBM Storwize V3700 / V7000 Unified

HPC Center
– Opening a permanent HPC benchmarking and R&D center in Stuttgart, Germany
– Technology partners will gain access to a state-of-the-art benchmarking facility
– The center will provide HPC support to Lenovo partners worldwide
– Industry partners: Intel, IBM, Mellanox, NVIDIA
– Client partners: Leibniz-Rechenzentrum (LRZ), Science & Technology Facilities Council: Hartree Centre, Barcelona Supercomputing Centre (BSC), Cineca, Rechenzentrum Garching (RZG), Forschungszentrum Jülich (FZJ), Distributed Research utilizing Advanced Computing (DiRAC)
– ISV partners: ScaleMP, Allinea, PathScale

SuperMUC Phase 1 Review

LRZ SuperMUC – Introduction
– Two earlier HPC AC presentations from Klaus Gottschalk: 2011 "SuperMUC HPC System" and 2013 "LRZ SuperMUC – One Year of Operation"
SuperMUC Phase 1 Review
– 3 PetaFLOP/s peak performance
– 9288 water-cooled nodes with 2 Intel SB-EP CPUs
– 207 nodes with 4 Intel WSM-EX CPUs
– 324 TB memory
– Mellanox InfiniBand FDR10 interconnect
– Large file space for multiple purposes: 10 PiB at 200 GiB/s GPFS on DDN SFA12K; 2 PiB NAS storage at 10 GiB/s
– Innovative technology for energy-efficient computing: warm water cooling; energy-aware scheduling with xCAT and LL; huge monitoring instances; PUE ~1.15 through all-season free-cooling capability
– No GPGPUs or other accelerator technology

What do 10k IB cables look like? [photo]

SuperMUC Phase 1 in numbers
– 9288 IBM System x iDataPlex dx360 M4 nodes
– 43,997,256 components
– 18,576 Intel SandyBridge-EP CPUs, 148,608 CPU cores, 8.08 m² of CMOS
– 74,304 4 GB DIMMs
– 11,868 optical cables (192,640 m), 23,736 QSFP connectors
– 5,040 3 TB disks, RAID6
– Tubes and hoses: 690 m stainless steel, 4,039 m EPDM, 34,153 m copper, 7.9 m³ water
– Mass: 194,100 kg
– Hardware plus 5-year maintenance, energy, and support: €83,000,000

SuperMUC Phase 2 – Solution Overview

Heck! Moore's Law?
– Adding ~3 PFLOP/s, direct water cooled
– Island design almost equal to SuperMUC Phase 1: 6 compute islands + 1 I/O island
– 3096 compute nodes, nx360 M5 WCT
– InfiniBand FDR14
– I/O island with GPFS backend and management servers
– $WORK parallel file system on GSS technology: adding 6 PB net capacity at 100 GB/s
– $HOME file system: adding 2 PB

From Chip to System
– System: 6 domains, 3096 servers
– Domain: 8 racks, 516 servers, 508 TF/s, 33.024 TB RAM
– Chassis: 12 servers, ~12 TF/s, 768 GB RAM
– Server: 2 chips, 986 GF/s, 64 GB RAM
– Chip: ~493 GF/s
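To make the roll-up above concrete, here is a minimal sketch in Python that reproduces the chassis, domain, and system figures from just the per-chip peak (~493 GF/s) and per-server memory (64 GB) quoted on the slide; the variable names are mine, and small rounding differences against the slide are expected.

```python
# Roll up per-chip peak performance and per-server memory to the full system,
# following the composition given on the "From Chip to System" slide.

chip_gflops = 493           # ~GF/s peak per chip (slide figure)
server_gb = 64              # GB RAM per server (slide figure)

chips_per_server = 2
servers_per_chassis = 12
servers_per_domain = 516    # one island = 8 racks
domains_per_system = 6

server_gflops = chips_per_server * chip_gflops              # ~986 GF/s
chassis_tflops = servers_per_chassis * server_gflops / 1e3  # ~11.8 TF/s ("~12 TF/s")
domain_tflops = servers_per_domain * server_gflops / 1e3    # ~508 TF/s
system_pflops = domains_per_system * domain_tflops / 1e3    # ~3.05 PF/s ("adding ~3 PFLOP/s")

chassis_ram_gb = servers_per_chassis * server_gb            # 768 GB
domain_ram_tb = servers_per_domain * server_gb / 1e3        # 33.024 TB
total_nodes = domains_per_system * servers_per_domain       # 3096 compute nodes

print(f"server:  {server_gflops} GF/s, {server_gb} GB")
print(f"chassis: {chassis_tflops:.1f} TF/s, {chassis_ram_gb} GB")
print(f"domain:  {domain_tflops:.1f} TF/s, {domain_ram_tb:.3f} TB")
print(f"system:  {system_pflops:.2f} PF/s, {total_nodes} nodes")
```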
Compute Server
– 3096 compute nodes
– Lenovo NeXtScale nx360 M5 WCT
– 2× Intel E5-2697 v3, 2.6 GHz, 14 cores
– 64 GB memory
– Mellanox Connect-IB, single port
– Diskless
– Direct water cooled

InfiniBand Concept
[Diagram: island-level InfiniBand topology – 516 compute nodes per island, 1:1 (non-blocking) within an island and 4:1 between islands, switches #1 … #42]

InfiniBand Concept – Phase 1 + Phase 2
[Diagram: combined fabric built from two core fabrics (Fabric #1, Fabric #2) of 36-port FDR10/FDR switches plus per-island edge switches with 18-port leaf modules; thin-compute and I/O edge switches, each island spanning nodes n01 … n516. SuperMUC Phase I: 20 islands (1 + 18 + 1). SuperMUC Phase II: 6 + 1 islands. GPFS is available everywhere via multi-homed GPFS.]

SuperMUC Phase 2 – The Technology

Technology Overview – NeXtScale
– Chassis
– Standard rack
– Compute
– Storage
– Acceleration
– Water cooled node

The Chassis – IBM NeXtScale n1200 WCT Enclosure
– System infrastructure, simple architecture: Water Cool Chassis
– 6U chassis with 6 bays; each bay houses a full-wide, 2-node tray (12 nodes per 6U chassis)
– Up to 6× 900 W or 1300 W power supplies, N+N or N+1 configurations
– No fans except in the PSUs; Fan and Power Controller
– Drip sensor, error LEDs, and web link for detecting water leaks
– No built-in networking
– Views: front view shown with 12 compute nodes installed (6 trays); rear view shows the 3× power supply banks, rear fillers/EMC shields, and the Fan and Power Controller

The Node – nx360 M5 WCT
– System infrastructure, simple architecture: Water Cool Compute Node
– 2 compute nodes per full-wide 1U tray
– Water circulated through cooling tubes for component-level cooling
– Dual-socket Intel E5-2600 v3 processors (up to 165 W)
– 16× DIMM slots (DDR4, 2133 MHz)
– InfiniBand support: FDR ConnectX-3 (ML2), FDR Connect-IB (PCIe), QDR (PCIe)
– Onboard GbE NICs
– Tray callouts (nx360 M5 WCT compute tray, 2 nodes): dual-port ML2 (IB/Ethernet), x16 ML2 slot, PCI slot for Connect-IB, CPUs with liquid-cooled heatsinks, 16× DIMM slots, 1 GbE ports, power button and LEDs, cooling tubes, water inlet and outlet, labeling tag

How it works [illustration]

SuperMUC Phase 2 – Why Water Cool Technology

Temperature Impact on Processor Performance
– E5-2697 v3 (145 W): junction temperature vs. performance on NeXtScale WCT
– Example: Linpack scores across a range of temperatures for 12 sample processors (n101 … n112) running on NeXtScale System WCT
– Linpack scores remain mostly flat for junction temperatures in the range where water cooling operates
– Linpack scores drop significantly when the junction temperature is in the range where air cooling operates
– Conclusion: water cooling enables the highest possible performance for each processor SKU at any water inlet temperature under 45 °C
[Chart: Linpack score (Gflops) vs. junction temperature Tj (°C, 20–100) for 18 °C inlet, 45 °C inlet, and air cooled cases, with Tj,max marked. Chart: Vinod Kamath]

Chip Junction vs Water Inlet Temperature – nx360 M5 WCT
– Linpack scores remain high and relatively stable for chip junction temperatures below 60 °C, achieved with inlet water below 45 °C
– Performance drops off significantly at higher junction temperatures, as is typical for air cooling

  T_water_inlet (°C) | n101 Tj (°C) | n101 Linpack (Gflops) | n102 Tj (°C) | n102 Linpack (Gflops)
  8                  | 28           | 934.5                 | 27           | 948.4
  18                 | 35           | 935.2                 | 33           | 948.3
  24                 | 41           | 934.7                 | 42           | 948.1
  35                 | 54           | 931.9                 | 52           | 946.3
  45                 | 60           | 926.7                 | 60           | 944.7
  55                 | 68           | 921.3                 | 73           | 938.5
  55¹                | 75           | 918.9                 | 79           | 936.1

  Note: typical flow is 0.5 lpm per node; ¹ 0.25 lpm per node. (Data: Vinod Kamath)
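As a quick reading of the table above, this minimal Python sketch (values copied from the n101 column, typical 0.5 lpm rows only; the per-row percentages are derived here, not taken from the slide) computes how much Linpack performance is retained as the inlet water warms.

```python
# Relative Linpack retention for node n101 from the "Chip Junction vs Water
# Inlet Temperature" table above (typical 0.5 lpm rows only).
# Illustrative arithmetic; the raw figures come straight from the slide.

inlet_c = [8, 18, 24, 35, 45, 55]
linpack = [934.5, 935.2, 934.7, 931.9, 926.7, 921.3]  # Gflops, node n101

baseline = max(linpack)
for t, score in zip(inlet_c, linpack):
    loss_pct = (baseline - score) / baseline * 100
    print(f"{t:2d} degC inlet: {score:6.1f} Gflops ({loss_pct:.1f}% below best)")

# Up to the 45 degC inlet limit the worst case is ~0.9% below the best score,
# which is the "remains high and relatively stable" point made on the slide.
```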
High Power Processor Support in WCT
– E5-2698A v3: junction temperature vs. performance on NeXtScale WCT
– NeXtScale WCT can cool the Xeon E5-2698A processor thanks to its greater thermal capability
– 165 W, 2.8 GHz, 16 cores
– Highest Linpack performance per node of any Xeon processor: 1.083 TFlops
– Highest Gflops/Watt of any Xeon processor: 2.45 GFlops/Watt
– Cannot be cooled by air: the processor throttles at a 65 °C junction temperature, which is below the air cooling range
– NeXtScale WCT cools 165 W processors with inlet temperatures up to 35 °C; no chillers required
[Chart: Linpack score (Gflops) vs. junction temperature Tj (°C, 20–100) for nodes n125 … n130 at 18 °C inlet, 35 °C inlet, and air cooled, with Tj,max marked. Chart: Vinod Kamath]

                    | 2× E5-2698A v3 | 2× E5-2697 v3 | 2× E5-2690 v3
  HPL (GF)          | 1083           | 907           | 813
  Power (W)         | 441            | 402           | 385
  Perf/Watt (MF/W)  | 2457           | 2256          | 2111

Lower Power Consumption with WCT
– E5-2695 v2 CPU, 115 W
– Socket power is relatively flat for junction temperatures in the range where water cooling operates (18–45 °C) and increases significantly when the junction temperature is in the air cooling range
– Result: about 5% power saved per node when cooling with water up to 45 °C
[Chart: socket power vs. junction temperature for air cooling, 45 °C water, and 18 °C water. Chart: Vinod Kamath]

Hot Water Cooling – How Direct Water Cooling Makes Your Data Center More Efficient and Lowers Costs
– Chillers are not required for most geographies, thanks to inlet water temperatures of 18 °C to 45 °C; this reduces CAPEX for new data centers
– 40% energy savings in the data center due to the absence of fans and chillers
– Compute node power consumption reduced by ~10%: lower component temperatures (~5%) and no fans (~5%)
– Power Usage Effectiveness, PUE = P_Total / P_IT: ~1.1 is possible with NeXtScale WCT; 1.1 PUE achieved at the LRZ installation; 1.5 PUE is typical of a very efficient air-cooled data center
– 85–90% heat recovery is enabled by the compute node design; the absorbed heat energy may be reused for heating buildings in the winter
– Energy Reuse Effectiveness, ERE = (P_Total – P_Reuse) / P_IT: ~0.3
(Slides: Vinod Kamath)
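The two efficiency metrics quoted above can be made concrete with a short sketch. Only the definitions PUE = P_Total / P_IT and ERE = (P_Total − P_Reuse) / P_IT and the ~1.1 / ~0.3 targets come from the slide; the wattages below are illustrative placeholders, not LRZ measurements.

```python
# Power Usage Effectiveness and Energy Reuse Effectiveness as defined on the
# slide:  PUE = P_total / P_IT,  ERE = (P_total - P_reuse) / P_IT.
# The kW values below are illustrative placeholders, not LRZ data.

def pue(p_total_kw: float, p_it_kw: float) -> float:
    return p_total_kw / p_it_kw

def ere(p_total_kw: float, p_it_kw: float, p_reuse_kw: float) -> float:
    return (p_total_kw - p_reuse_kw) / p_it_kw

p_it = 1000.0          # kW drawn by the IT equipment (assumed)
p_total = 1.1 * p_it   # ~10% overhead for cooling and power distribution (PUE ~1.1)
p_reuse = 0.8 * p_it   # heat recovered for building heating (assumed 80% of IT power)

print(f"PUE = {pue(p_total, p_it):.2f}")           # ~1.10
print(f"ERE = {ere(p_total, p_it, p_reuse):.2f}")  # ~0.30
```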