Transcript
Low Power and High Performance HPCN-Workshop 2016 - Göttingen Fredrik Unger 10 May 2016
20 years of server innovation
NEC has been the market share leader for x86 servers in Japan for 19 consecutive years
©NEC Corporation 2016
NEC Server Lineup
[Figure: NEC server families positioned by node performance and performance density. Labels: ENTERPRISE, BLADE, SCALE-UP, MODULAR, SCALABLE MODULAR, STANDARD SCALE-OUT]
SCALABLE MODULAR SERVER DX2000
Extremely high-density compute platform based on the Intel® Xeon® D SoC. Ideal for lightweight scale-out workloads.
Massive scalability
Ideal scalable architecture for distributed parallel computing: built for intensive scale-out compute requirements
Easy deployment: mount the chassis, connect, start
Easy scale-out performance: from 14 to thousands of nodes
Low-latency, high-speed node-to-node interconnect
Easy Maintenance
Module-based component design: easy parts maintenance via hot-plug component replacement
Integrated NEC EXPRESSScope Engine: NEC's BMC functionality for overall system management and monitoring
Integrated switch: reduced cabling
NEC EXPRESSScope Engine
Simple and smart remote server management.
Remote Monitoring
Remote Control
Remote Operation
A specially designed baseboard management controller (BMC) chipset provides extensive remote management capabilities, from monitoring the health of remote server components (CPUs, memory, and cooling fans) to remotely controlling the servers and powering them on or off, regardless of the state of the server's power or operating system.
NEC DX2000 Hardware System Details
DX2000 at a glance
▌3U standard rack enclosure with up to 44* server nodes
▌Supports shared components including power, cooling, management and networking
▌Support for the latest Intel® Xeon® D Processor Family
▌Up to 64GB of DDR4 memory per server node
▌Up to 512GB SSD per server node
▌Two or four 10GbE links per server node
▌1502GB/s of total memory bandwidth per enclosure
▌Full manageability with integrated NEC EXPRESSSCOPE Engine 3
*Chassis layouts depend on individual module configuration
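The 1502GB/s enclosure figure is consistent with simple arithmetic. The sketch below reproduces it under assumptions that the slide does not state: each of the two DDR4-2133 slots per module sits on its own 64-bit channel, and the enclosure is fully populated with 44 nodes. The function and parameter names are illustrative only.

```python
def enclosure_memory_bandwidth_gbs(nodes=44, channels_per_node=2,
                                   transfers_per_sec=2133e6, bus_bytes=8):
    """Theoretical peak memory bandwidth of one enclosure, in GB/s.

    Assumes one 64-bit (8-byte) channel per DIMM slot; this is an
    assumption, not a figure taken from the slides.
    """
    per_channel_gbs = transfers_per_sec * bus_bytes / 1e9
    return nodes * channels_per_node * per_channel_gbs


print(f"{enclosure_memory_bandwidth_gbs():.1f} GB/s")
```

Under these assumptions the result lands within 1 GB/s of the quoted 1502GB/s; the small gap comes from rounding the DDR4-2133 transfer rate.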
DX2000 Product Detail – front view
▌Supports shared components (hot plug) including power, cooling, management and networking
3U standard chassis form factor
Up to 44 hot plug modules
DX2000 Product Detail – rear view
3x Hot-Plug PSU (2+1)
Chassis Sensor Card (CSC)
2x Network Switch Module: 8x 40Gbps (external, x2); 1x Management LAN (external, x2); 44x 10Gbps (internal)
DX2000 Product Detail – top view
8x Hot-Plug Chassis Fans
44 Module slots
2x Network Switch Module
3x Hot-Plug PSU
DX2000 Product Detail – compute module
PCIe 3.0 x8
NEC EXPRESSScope Engine™ (BMC)
2x DDR4 2133MHz slot
Intel® Xeon® D Processor (currently 4 and 8 cores)
2x 10GbE
DX2000 Product Detail – compute module
2x 10GbE
2x DDR4 2133MHz slot
PCIe 3.0 x8
M.2 SATA SSD
DX2000 Product Detail – 10GbE expansion card
2x 10GbE
PCIe 3.0 x8
FORTVILLE
DX2000 Product Detail – example layouts
▌Empty Node Chassis
DX2000 Product Detail – example layouts
▌Chassis with 14 nodes: minimum configuration, available with 4-core and 8-core modules
DX2000 Product Detail – example layouts
▌Chassis with 34 nodes: available with 4-core and 8-core modules (34 is the maximum quantity for 8-core modules)
DX2000 Product Detail – example layouts
▌Full 44 nodes: only available with 4-core modules
DX2000 Product Detail – example layouts
▌22x Nodes & 22x LAN modules
CPU Module and Expansion Card: these work as a pair (see appendix)
DX2000
▌Standard rack enclosure (fits a standard 19-inch rack with 1 m depth)
▌48 kg weight (comparable to three 1U rack servers)
▌High-speed networking: 10GbE x44 x2 down, 40GbE x8 x2 up
▌Up to 512GB SSD and 64GB memory per module
▌Dynamic module replacement
▌Up to 40°C operating temperature supported in certain configurations
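The networking figures above imply a fixed ratio between node-facing and uplink bandwidth on each switch module. A rough sketch of that arithmetic follows, assuming the "x2" factor refers to the two switch modules and that every port runs at line rate. The function name is illustrative and does not come from the slides.

```python
def switch_oversubscription(down_ports=44, down_gbps=10,
                            up_ports=8, up_gbps=40):
    """Return (downlink Gbps, uplink Gbps, down/up ratio) for one switch module."""
    down = down_ports * down_gbps
    up = up_ports * up_gbps
    return down, up, down / up


down, up, ratio = switch_oversubscription()
print(f"{down} Gbps down, {up} Gbps up, {ratio:.3f}:1")
```

With the slide's port counts this gives 440 Gbps towards the nodes against 320 Gbps of uplink per switch module.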
DX2000 HPC Performance
HPL Performance Linpack
Intel XEON-D 1541 ($581* per CPU)

Nodes   Linpack (GFLOPS)   Power (W)   Perf/W (GFLOPS/W)   Peak Perf. (GFLOPS)   Efficiency
1       235.50             70          3.36                268.8                 87.6%
4       759.00             280         2.71                1075.2                70.6%
16      2953.04            1120        2.64                4300.8                68.6%

DP FLOPS/Clock = AVX Data BW/64 * MulAdd * AVX Pipeline
16 = 256/64 * 2 * 2

Peak Performance = Clock * DP FLOPS/Clock * Cores * Nodes
Peak Performance = 2.1 * 16 * 8 * Nodes

* Source: Intel ARK list price, CPU
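The peak, efficiency and Perf/W columns follow directly from the formulas on this slide. The short sketch below recomputes them from the measured Linpack and power values; the variable and function names are illustrative, and the measured numbers are copied from the slide.

```python
CLOCK_GHZ = 2.1                        # Xeon D-1541 clock used on the slide
FLOPS_PER_CLOCK = (256 // 64) * 2 * 2  # AVX width / 64-bit * mul-add * pipelines = 16
CORES_PER_NODE = 8

# nodes -> (measured Linpack in GFLOPS, power in W), as listed on the slide
MEASURED = {1: (235.50, 70), 4: (759.00, 280), 16: (2953.04, 1120)}


def peak_gflops(nodes):
    """Peak Performance = Clock * DP FLOPS/Clock * Cores * Nodes."""
    return CLOCK_GHZ * FLOPS_PER_CLOCK * CORES_PER_NODE * nodes


def summarize(nodes):
    """Recompute peak, efficiency and GFLOPS/W for one row of the table."""
    linpack, power = MEASURED[nodes]
    peak = peak_gflops(nodes)
    return {
        "peak": peak,
        "efficiency_pct": 100 * linpack / peak,
        "gflops_per_watt": linpack / power,
    }


for n in MEASURED:
    s = summarize(n)
    print(f"{n:>2} nodes: peak {s['peak']:.1f} GFLOPS, "
          f"efficiency {s['efficiency_pct']:.1f}%, "
          f"{s['gflops_per_watt']:.2f} GFLOPS/W")
```

Last-digit differences from the slide can appear depending on whether values are rounded or truncated.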
HPL Performance Linpack – E5 2600v4 CPU

CPU      Linpack (GFLOPS)   Power (W)   Perf/W (GFLOPS/W)   Peak Perf. (GFLOPS)   Price 2xCPU
2630v4   633.60             330         1.92                704.0                 $1334*
2650v4   760.32             370         2.05                844.8                 $2332*
2680v4   967.68             400         2.41                1075.2                $3490*

Approximately 90% efficiency

Peak Performance = Clock * DP FLOPS/Clock * CPUs * Cores * Nodes
Peak Performance = Clock * 16 * 2 * Cores * Nodes

* Source: Intel ARK list price, CPU
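The dual-socket peak formula can be checked against the E5 2600v4 figures. The slide does not list clock speeds or core counts; the values in `SPECS` below are the base clocks and core counts from Intel's published specifications and should be treated as an assumption of this sketch. All names are illustrative.

```python
# model -> (base clock in GHz, cores per CPU); taken from Intel's public
# specifications, not from the slide
SPECS = {"2630v4": (2.2, 10), "2650v4": (2.2, 12), "2680v4": (2.4, 14)}

# Linpack results in GFLOPS, as listed on the slide
LINPACK = {"2630v4": 633.60, "2650v4": 760.32, "2680v4": 967.68}


def peak_gflops(model, cpus=2, nodes=1, flops_per_clock=16):
    """Peak Performance = Clock * DP FLOPS/Clock * CPUs * Cores * Nodes."""
    clock_ghz, cores = SPECS[model]
    return clock_ghz * flops_per_clock * cpus * cores * nodes


for model, linpack in LINPACK.items():
    peak = peak_gflops(model)
    print(f"{model}: peak {peak:.1f} GFLOPS, Linpack/peak {linpack / peak:.2f}")
```

With these inputs the computed peaks match the Peak Perf. column, and the Linpack-to-peak ratio agrees with the stated efficiency of approximately 90%.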
CFD Proxy code – 96 Ranks Test

Test                               DX2000     2680v3     Diff       %
comm_free                          0.208362   0.173053   0.035309   20.4%
exchange_dbl_mpi_bulk_sync         0.285602   0.202987   0.082615   40.7%
exchange_dbl_mpi_early_recv        0.285919   0.203073   0.082846   40.8%
exchange_dbl_mpi_async             0.250358   0.188662   0.061696   32.7%
exchange_dbl_gaspi_bulk_sync       0.266797   0.195907   0.070890   36.2%
exchange_dbl_gaspi_async           0.235841   0.188466   0.047375   25.1%
exchange_dbl_mpi_fence_bulk_sync   0.289356   0.212169   0.077187   36.4%
exchange_dbl_mpi_fence_async       0.288645   0.212144   0.076501   36.1%
exchange_dbl_mpi_pscw_bulk_sync    0.294345   0.211813   0.082532   39.0%
exchange_dbl_mpi_pscw_async        0.270004   0.214176   0.055828   26.1%

Average time difference: 33%

4 Nodes 2680v4: 1600 W, $13,960*
12 Nodes Xeon-D: 840 W, $6,972*
* Source: Intel ARK list price, CPU
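The percentage column is the DX2000 time relative to the 2680v3 time, and the cluster totals are per-node figures multiplied by the node count. The sketch below reproduces both. It assumes the per-node power and list prices are the ones given on the HPL slides (400 W and $3490 per dual-CPU E5 node, 70 W and $581 per Xeon-D node); all names are illustrative.

```python
# test -> (DX2000 time, 2680v3 time), as listed in the table
TIMINGS = {
    "comm_free": (0.208362, 0.173053),
    "exchange_dbl_mpi_bulk_sync": (0.285602, 0.202987),
    "exchange_dbl_mpi_early_recv": (0.285919, 0.203073),
    "exchange_dbl_mpi_async": (0.250358, 0.188662),
    "exchange_dbl_gaspi_bulk_sync": (0.266797, 0.195907),
    "exchange_dbl_gaspi_async": (0.235841, 0.188466),
    "exchange_dbl_mpi_fence_bulk_sync": (0.289356, 0.212169),
    "exchange_dbl_mpi_fence_async": (0.288645, 0.212144),
    "exchange_dbl_mpi_pscw_bulk_sync": (0.294345, 0.211813),
    "exchange_dbl_mpi_pscw_async": (0.270004, 0.214176),
}


def relative_difference_pct(dx2000, reference):
    """Extra time on the DX2000 as a percentage of the reference time."""
    return 100 * (dx2000 - reference) / reference


def cluster_totals(nodes, watts_per_node, price_per_node):
    """Total power (W) and CPU list price ($) for a group of nodes."""
    return nodes * watts_per_node, nodes * price_per_node


diffs = {name: relative_difference_pct(*t) for name, t in TIMINGS.items()}
average = sum(diffs.values()) / len(diffs)

print(f"average time difference: {average:.0f}%")
print("E5 nodes:    ", cluster_totals(4, 400, 3490))
print("Xeon-D nodes:", cluster_totals(12, 70, 581))
```

The average is taken over the per-test percentages rather than over the summed times, which is what matches the 33% on the slide.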
CPU performance comparison
[Figure: two bar charts of per-CPU performance, vertical axis from 0 to 1.5]
Performance of 1 CPU (8 cores): D-1541, C2750, E5-2630Lv3, E5-2630v3, E5-2640v3 (all 8C). Callout: same as E5-2630Lv3.
Performance of 1 CPU (4-6 cores): D-1527 (4C), E5-2603v3 (6C), E3-1240Lv3 (4C), E3-1220v3 (4C), E3-1275Lv3 (4C). Callout: same as E3-1240Lv3.