
NEC - T-Systems


Low Power and High Performance
HPCN-Workshop 2016 - Göttingen
Fredrik Unger, 10 May 2016
©NEC Corporation 2016

20 years of server innovation
NEC has been the market share leader for x86 servers in Japan for 19 consecutive years.

NEC Server Lineup
[Figure: NEC server lineup (Enterprise, Blade, Scalable Modular, Modular, Standard) positioned by scale-up node performance and scale-out performance density]

Scalable Modular Server DX2000
Extremely high-density compute platform based on the Intel® Xeon® D SoC. Ideal for lightweight scale-out workloads.

Massive scalability
▌Ideal scalable architecture for distributed parallel computing and intensive scale-out compute requirements
▌Easy deployment (mount, connect, start): mount the chassis and go
▌Easy scale-out performance: from 14 to thousands of nodes
▌Low-latency, high-speed node-to-node interconnect

Easy maintenance
▌Modular component design: easy parts maintenance via hot-plug, dynamic component replacement
▌Integrated NEC EXPRESSSCOPE Engine: NEC's BMC functionality for overall system management and monitoring
▌Integrated switch: reduced cabling

NEC EXPRESSSCOPE Engine
Simple and smart remote server management: remote monitoring, remote control, remote operation.
A specially designed baseboard management controller (BMC) chipset provides extensive remote management capabilities, from monitoring the health of server components such as CPUs, memory, and cooling fans to remotely controlling the servers and powering them on and off, regardless of the state of the server's power or operating system.
NEC DX2000 Hardware System Details

DX2000 at a glance
▌3U standard rack enclosure with up to 44* server nodes
▌Shared components including power, cooling, management, and networking
▌Support for the latest Intel® Xeon® D processor family
▌Up to 64GB of DDR4 memory per server node
▌Up to 512GB SSD per server node
▌Two or four 10GbE links per server node
▌1502GB/s of total memory bandwidth per enclosure
▌Full manageability with the integrated NEC EXPRESSSCOPE Engine 3
*Chassis layouts depend on the individual module configuration.

DX2000 Product Detail – front view
▌Shared hot-plug components including power, cooling, management, and networking
▌3U standard chassis form factor
▌Up to 44 hot-plug modules

DX2000 Product Detail – rear view
▌3x hot-plug PSU (2+1)
▌Chassis Sensor Card (CSC)
▌2x network switch module: 8x 40Gbps (external, x2), 1x management LAN (external, x2), 44x 10Gbps (internal)

DX2000 Product Detail – top view
▌8x hot-plug chassis fans
▌44 module slots
▌2x network switch module
▌3x hot-plug PSU

DX2000 Product Detail – compute module
▌Intel® Xeon® D processor (currently 4 and 8 cores)
▌2x DDR4 2133MHz slots
▌2x 10GbE
▌PCIe 3.0 x8
▌M.2 SATA SSD
▌NEC EXPRESSSCOPE Engine (BMC)

DX2000 Product Detail – 10GbE expansion card
▌2x 10GbE
▌PCIe 3.0 x8
▌Intel Fortville controller

DX2000 Product Detail – example layouts
▌Empty node chassis
▌Chassis with 14 nodes: minimum configuration, available with 4-core and 8-core modules
▌Chassis with 34 nodes: available with 4-core and 8-core modules (maximum quantity for the 8-core module)
▌Full chassis with 44 nodes: only available with the 4-core module
▌22 nodes and 22 LAN modules: each CPU module works as a pair with an expansion card (see appendix)

DX2000
▌Standard rack enclosure (fits a standard 19" rack with 1 m depth)
▌48 kg weight (about the same as three 1U rack servers)
▌High-speed networking: 10GbE x44 x2 down, 40GbE x8 x2 up
▌Up to 512GB SSD and 64GB memory per module
▌Dynamic module replacement
▌Up to 40°C operating temperature supported in certain configurations

DX2000 HPC Performance

HPL performance (Linpack), Intel Xeon D-1541 ($581* per CPU)
Nodes  Linpack (GFLOPS)  Power (W)  Perf/W (GFLOPS/W)  Peak perf. (GFLOPS)  Efficiency
    1            235.50         70               3.36                268.8       87.6%
    4            759.00        280               2.71               1075.2       70.6%
   16           2953.04       1120               2.64               4300.8       68.6%
DP FLOPS/clock = (AVX register width in bits / 64) * MulAdd * AVX pipelines
16 = 256/64 * 2 * 2
Peak performance = clock (GHz) * DP FLOPS/clock * cores * nodes
Peak performance = 2.1 * 16 * 8 * nodes (Intel Xeon D-1541)
* Source: Intel ARK list price, CPU only

HPL performance (Linpack), 2x Intel Xeon E5-2600 v4 per node
CPU (2x)  Linpack (GFLOPS)  Power (W)  Perf/W (GFLOPS/W)  Peak perf. (GFLOPS)  Price (2 CPUs)
2630v4              633.60        330               1.92                704.0         $1,334*
2650v4              760.32        370               2.05                844.8         $2,332*
2680v4              967.68        400               2.41               1075.2         $3,490*
Approximately 90% efficiency
Peak performance = clock * DP FLOPS/clock * CPUs * cores * nodes
Peak performance = clock * 16 * 2 * cores * nodes
* Source: Intel ARK list price, CPU only

CFD proxy code – 96 ranks
Test                               DX2000    2680v3    Diff      %
comm_free                          0.208362  0.173053  0.035309  20.4%
exchange_dbl_mpi_bulk_sync         0.285602  0.202987  0.082615  40.7%
exchange_dbl_mpi_early_recv        0.285919  0.203073  0.082846  40.8%
exchange_dbl_mpi_async             0.250358  0.188662  0.061696  32.7%
exchange_dbl_gaspi_bulk_sync       0.266797  0.195907  0.070890  36.2%
exchange_dbl_gaspi_async           0.235841  0.188466  0.047375  25.1%
exchange_dbl_mpi_fence_bulk_sync   0.289356  0.212169  0.077187  36.4%
exchange_dbl_mpi_fence_async       0.288645  0.212144  0.076501  36.1%
exchange_dbl_mpi_pscw_bulk_sync    0.294345  0.211813  0.082532  39.0%
exchange_dbl_mpi_pscw_async        0.270004  0.214176  0.055828  26.1%
Average time difference: 33%

4 nodes 2680v4: 1600 W, $13,960*
12 nodes Xeon D: 840 W, $6,972*
* Source: Intel ARK list price, CPU only

CPU performance comparison
[Chart: Performance of 1 CPU (8 cores), y-axis 0 to 1.5: D-1541, C2750, E5-2630Lv3, E5-2630v3, E5-2640v3 (all 8C); annotation: "Same as E5-2630Lv3"]
[Chart: Performance of 1 CPU (4-6 cores), y-axis 0 to 1.5: D-1527 (4C), E5-2603v3 (6C), E3-1240Lv3 (4C), E3-1220v3 (4C), E3-1275Lv3 (4C); annotation: "Same as E3-1240Lv3"]
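Most derived numbers on the HPL and "at a glance" slides follow from a few inputs (clock, cores, node count, memory speed). The Python sketch below recomputes them from the slide formulas as a cross-check. The helper names are hypothetical. The memory-bandwidth function rests on an assumption the slides do not state: each node's two DDR4-2133 slots are taken to sit on two independent 8-byte channels. All results are theoretical peaks, not measurements.

```python
"""Recompute derived DX2000 figures from the slide formulas (a sketch)."""


def dp_flops_per_clock(avx_bits: int = 256, mul_add: int = 2, pipelines: int = 2) -> int:
    """DP FLOPs per clock per core: doubles per AVX register * MulAdd * AVX pipelines."""
    return (avx_bits // 64) * mul_add * pipelines


def peak_gflops(clock_ghz: float, cores: int, cpus: int = 1, nodes: int = 1) -> float:
    """Theoretical peak in GFLOPS: clock * DP FLOPs/clock * CPUs * cores * nodes."""
    return clock_ghz * dp_flops_per_clock() * cpus * cores * nodes


# Assumption (hypothetical model): two DDR4-2133 channels per node,
# each moving 8 bytes per transfer.
def enclosure_bandwidth_gbs(nodes: int = 44, mt_per_s: int = 2133,
                            channels: int = 2, bytes_per_transfer: int = 8) -> float:
    """Aggregate peak DRAM bandwidth of one enclosure in GB/s."""
    return mt_per_s * bytes_per_transfer * channels * nodes / 1000


# Xeon D-1541 HPL rows from the slide: nodes -> (Linpack GFLOPS, power in W)
XEON_D_HPL = {1: (235.50, 70), 4: (759.00, 280), 16: (2953.04, 1120)}

if __name__ == "__main__":
    print(f"DP FLOPs/clock: {dp_flops_per_clock()}")
    for nodes, (linpack, watts) in XEON_D_HPL.items():
        peak = peak_gflops(2.1, 8, nodes=nodes)  # D-1541: 2.1 GHz, 8 cores
        print(f"{nodes:>2} nodes: peak {peak:7.1f} GFLOPS, "
              f"efficiency {linpack / peak:6.1%}, {linpack / watts:.2f} GFLOPS/W")
    # One dual-socket E5-2680v4 node: 2.4 GHz, 14 cores per CPU
    print(f"2x E5-2680v4 peak: {peak_gflops(2.4, 14, cpus=2):.1f} GFLOPS")
    print(f"Enclosure memory bandwidth: {enclosure_bandwidth_gbs():.0f} GB/s")
    # Totals behind the power/price comparison (CPU list prices only)
    print(f"4x 2680v4 nodes: {4 * 400} W, ${4 * 3490:,}")
    print(f"12x Xeon D nodes: {12 * 70} W, ${12 * 581:,}")
```

Run as a script, it prints the recomputed HPL rows, the 1075.2 GFLOPS peak of one dual E5-2680v4 node, about 1502 GB/s per enclosure, and the power and price totals of the comparison slide. Recomputed efficiencies can differ from the slides in the last digit because of rounding.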
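The Diff and % columns of the CFD proxy table are consistent with a simple reading: Diff is the DX2000 time minus the 2680v3 time, and % is that difference divided by the 2680v3 time. Every row matches this, and the mean of the ten percentages gives the quoted 33%. A minimal Python sketch, with the timings copied from the table (the slide states no unit) and a hypothetical helper name:

```python
"""Reproduce the Diff and % columns of the CFD proxy table (a sketch)."""

# test name -> (DX2000 time, 2680v3 time), copied from the slide; unit not stated
CFD_TIMES = {
    "comm_free": (0.208362, 0.173053),
    "exchange_dbl_mpi_bulk_sync": (0.285602, 0.202987),
    "exchange_dbl_mpi_early_recv": (0.285919, 0.203073),
    "exchange_dbl_mpi_async": (0.250358, 0.188662),
    "exchange_dbl_gaspi_bulk_sync": (0.266797, 0.195907),
    "exchange_dbl_gaspi_async": (0.235841, 0.188466),
    "exchange_dbl_mpi_fence_bulk_sync": (0.289356, 0.212169),
    "exchange_dbl_mpi_fence_async": (0.288645, 0.212144),
    "exchange_dbl_mpi_pscw_bulk_sync": (0.294345, 0.211813),
    "exchange_dbl_mpi_pscw_async": (0.270004, 0.214176),
}


def relative_difference(dx2000: float, baseline: float) -> float:
    """Extra time of DX2000 relative to the 2680v3 baseline time."""
    return (dx2000 - baseline) / baseline


if __name__ == "__main__":
    for test, (dx, base) in CFD_TIMES.items():
        print(f"{test:<36} {dx - base:.6f} {relative_difference(dx, base):6.1%}")
    average = sum(relative_difference(dx, b) for dx, b in CFD_TIMES.values()) / len(CFD_TIMES)
    print(f"Average time difference: {average:.0%}")
```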