
KINGSTON.COM
Best Practices
Server: Performance Benchmark
Memory channels, frequency and performance

Although most people don’t realize it, the world runs on many different types of databases, all of which have one thing in common: the need for high-performance memory to deliver data quickly and reliably. From the moment we wake up to a phone call processed by our cellular service provider’s customer record database, to our weekly online shopping payment processed by a financial institution’s transaction database, to a late-night movie streaming service recommending titles from a database built on our viewing habits, databases serve many of our daily queries and must perform consistently fast and scale dynamically to meet customer demand. [1]

Serving data with consistent performance and transaction integrity is no easy task, and it often requires in-memory databases to deliver viewing recommendations and relational data to multiple users near-instantaneously. In-memory databases (IMDBs) rely primarily on high-capacity and, most importantly, high-performance DRAM (Dynamic Random Access Memory). They can service a high volume of requests up to x times faster than traditional disk-bound databases, serve as the backbone in any scenario that requires fast response times when querying useful data, and can be used to complement big data applications.

DDR3 SDRAM (Double Data Rate Synchronous Dynamic Random Access Memory) DIMMs (Dual In-line Memory Modules) are available in different capacities and speeds. The speed of a memory module is often referred to as its memory frequency and is expressed in megahertz (MHz). Memory frequency has a direct relationship with memory performance: as the memory frequency increases, so does memory performance. DRAM is, however, only one piece of the puzzle in achieving optimal memory subsystem performance.
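To make the frequency-to-performance relationship concrete, the short Python sketch below computes the theoretical peak transfer rate of a single DDR3 module. It assumes the standard 64-bit (8-byte) DDR3 data bus and treats the speed grades quoted in this paper in "MHz" as data rates in megatransfers per second (MT/s); DDR3-1600, for example, performs 1600 MT/s on an 800 MHz I/O clock. The function name is illustrative only.

```python
# Theoretical peak bandwidth of one DDR3 module (sketch).
# Assumption: standard 64-bit (8-byte) DDR3 data bus per DIMM.
BUS_WIDTH_BYTES = 8


def module_peak_gb_s(data_rate_mts: float, bus_width_bytes: int = BUS_WIDTH_BYTES) -> float:
    """Return peak bandwidth in GB/s (10**9 bytes/s) for a data rate given in MT/s."""
    # MT/s multiplied by bytes per transfer gives MB/s; divide by 1000 for GB/s.
    return data_rate_mts * bus_width_bytes / 1000


if __name__ == "__main__":
    # Nominal speed grades; 1066 and 1333 are rounded (true rates are 1066.67 and 1333.33 MT/s).
    for speed in (800, 1066, 1333, 1600):
        print(f"DDR3-{speed}: {module_peak_gb_s(speed):.1f} GB/s peak per module")
```

The ceiling doubles from 6.4 GB/s at 800 to 12.8 GB/s at 1600, which is why performance tracks frequency; sustained results measured by a benchmark such as STREAM will always sit below these theoretical figures.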
A memory controller is needed to manage the memory subsystem, and the population rules governing that controller affect the frequency (speed) and latency at which a memory module can be addressed. Newer-generation memory controllers are embedded in the processor for best performance, but they require attention, as some memory controllers can only run the memory subsystem at a maximum memory frequency of 800 MHz.

Using the 24 DIMM sockets available on the Intel® Romley platform, connected to the memory subsystems of the Intel® Xeon® E5 family processors, we can gauge sustained memory bandwidth in different memory configurations, varying the memory channel population and memory clock speed, with the STREAM memory benchmark integrated into SiSoft Sandra 2012. [2]

The Intel® Xeon® E5 family features numerous performance improvements over the previous-generation Xeon® 5500 and Xeon® 5600 server processors, including two important performance-related upgrades discussed in this paper: quad-channel memory addressing and support for 1600 MHz DDR3 memory speeds. It also adds a faster 8 GT/s (gigatransfers per second) QuickPath Interconnect (QPI), which increases interconnect bandwidth and reduces latency to the memory array. [3]

Channel population performance

Figure 1: Channel population performance measured using SiSoft Sandra 2012.
Test configuration: SiSoftware Sandra 2012 Memory benchmark on Intel® Romley platform S2600GZ with two Xeon E5-2665 2.40GHz processors and 64GB of memory (2 x KVR16R11D4K4/32 @1600 MHz) installed. CPU Hyper-Threading and power saving features disabled.
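The channel arithmetic behind this test platform can be sketched the same way. Assuming two sockets with four DDR3 channels each and an 8-byte bus per channel, the 24 DIMM sockets work out to three slots per channel, and the theoretical peak grows in proportion to the number of populated channels. The constants and helper name below are illustrative assumptions, not values reported by SiSoft Sandra.

```python
# Channel-population arithmetic for a two-socket Xeon E5 platform (sketch).
# Assumptions: 2 sockets, 4 DDR3 channels per socket, 8-byte bus per channel.
SOCKETS = 2
CHANNELS_PER_SOCKET = 4
DIMM_SOCKETS_TOTAL = 24
BUS_WIDTH_BYTES = 8

# 24 DIMM sockets spread over 8 channels gives 3 slots per channel.
SLOTS_PER_CHANNEL = DIMM_SOCKETS_TOTAL // (SOCKETS * CHANNELS_PER_SOCKET)


def system_peak_gb_s(channels_per_socket: int, data_rate_mts: float) -> float:
    """Theoretical peak in GB/s when each socket populates `channels_per_socket` channels."""
    if not 1 <= channels_per_socket <= CHANNELS_PER_SOCKET:
        raise ValueError("channels_per_socket must be between 1 and 4")
    per_channel = data_rate_mts * BUS_WIDTH_BYTES / 1000
    return SOCKETS * channels_per_socket * per_channel


if __name__ == "__main__":
    print(f"Slots per channel: {SLOTS_PER_CHANNEL}")
    for ch in range(1, CHANNELS_PER_SOCKET + 1):
        print(f"{ch} channel(s) per socket @1600: {system_peak_gb_s(ch, 1600):.1f} GB/s peak")
```

Going from one to four populated channels per socket quadruples the theoretical peak (25.6 to 102.4 GB/s at 1600), which is consistent with the near-fourfold scaling reported for Figure 1; the sustained ~70 GB/s measured with all four channels populated remains below this ceiling, as expected for a real benchmark.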
As seen in Figure 1, memory subsystem performance increases near-linearly from the slowest configuration, a single memory channel populated on each Xeon processor’s memory controller with one 8 gigabyte (GB) DDR3 1600 MHz module, to the fastest, a quad-channel configuration at one DIMM per channel (1 DPC), with four 8 GB 1600 MHz modules occupying the first memory bank of each processor.

Even with the increased electrical load of a quad-channel (1 DPC) configuration, memory subsystem performance rises nearly fourfold compared to the single-channel configuration, to ~70 GB/s. This makes it an ideal configuration for resource-intensive applications such as IMDBs.

Memory frequency performance

Figure 2: Relative memory frequency performance measured using SiSoft Sandra 2012.
Test configuration: SiSoftware Sandra 2012 Memory benchmark on Intel® Romley platform S2600GZ with two Xeon E5-2665 2.40GHz processors and 64GB of memory (2 x KVR16R11D4K4/32) installed. CPU Hyper-Threading and power saving features disabled.

In Figure 2, we run the same eight 8 GB DDR3 memory modules at four different memory speeds (MHz), installed symmetrically across the memory subsystems of both Intel® Xeon® E5 family processors to achieve a balanced configuration that shows the best-case performance at each memory speed. Running the modules at 800 MHz yields the slowest performance, ~40 GB/s sustained, as measured with the STREAM memory benchmark integrated into SiSoft Sandra 2012. As the frequency is scaled higher, memory performance increases near-linearly up to a maximum of ~70 GB/s at 1600 MHz, ideal for scenarios where data written to memory requires the highest achievable performance to remain efficient.
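One way to read Figure 2 is to compare each sustained result with the theoretical ceiling at that speed. The sketch below uses only the two approximate values stated in the text (~40 GB/s at 800 MHz and ~70 GB/s at 1600 MHz) and assumes eight populated channels (four per processor, 1 DPC) with an 8-byte bus each; "efficiency" here is simply the measured value divided by the theoretical peak.

```python
# Compare reported sustained bandwidth with theoretical peak (sketch).
# Assumptions: 8 populated channels (4 per socket), 8-byte bus per channel.
POPULATED_CHANNELS = 8
BUS_WIDTH_BYTES = 8

# Approximate sustained results quoted in the text for Figure 2 (GB/s).
REPORTED = {800: 40.0, 1600: 70.0}


def theoretical_peak_gb_s(data_rate_mts: float, channels: int = POPULATED_CHANNELS) -> float:
    """Peak bandwidth in GB/s across all populated channels."""
    return data_rate_mts * BUS_WIDTH_BYTES * channels / 1000


def efficiency(data_rate_mts: float, sustained_gb_s: float) -> float:
    """Fraction of the theoretical peak that the benchmark sustained."""
    return sustained_gb_s / theoretical_peak_gb_s(data_rate_mts)


if __name__ == "__main__":
    for speed, measured in REPORTED.items():
        peak = theoretical_peak_gb_s(speed)
        print(f"{speed} MHz: ~{measured:.0f} of {peak:.1f} GB/s peak ({efficiency(speed, measured):.0%})")
    # Factor by which sustained bandwidth rose when the speed grade doubled.
    print(f"Scaling 800 -> 1600: {REPORTED[1600] / REPORTED[800]:.2f}x")
```

With these inputs the ceiling doubles (51.2 to 102.4 GB/s) while the sustained figure rises about 1.75x, so efficiency drops from roughly 78% to 68%. Because the inputs are the rounded approximations from the text, the percentages are indicative only.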
Memory capacities versus frequency performance

Figure 3: Memory capacities versus frequency performance measured using SiSoft Sandra 2012.
Test configuration: SiSoftware Sandra 2012 Memory benchmark on Intel® Romley platform S2600GZ with two Xeon E5-2665 2.40GHz processors and 192GB of memory (KVR16R11D4K4/32) installed. CPU Hyper-Threading and power saving features disabled.

To conclude our research into memory performance, Figure 3 compares a memory subsystem populated with 192 GB running at 1066 MHz against configurations using 128 GB and 64 GB, both running at 1600 MHz. At the same 1600 MHz memory speed, both the 128 GB (16 x 8 GB) and 64 GB (8 x 8 GB) configurations, spread symmetrically across both memory subsystems, show approximately the same ~70 GB/s sustained performance. The larger 192 GB configuration (24 x 8 GB), running at a slower 1066 MHz, shows a ~17 GB/s (roughly 24%) drop in sustained performance as the trade-off for the increased memory capacity.

Conclusion

Obeying the channel population rules specific to the server processor and memory controller makes it easy to strike the right balance when optimizing memory for best performance. Simple steps, such as populating all four memory channels, can increase memory performance up to fourfold, increasing ROI (return on investment) while simultaneously reducing TCO (total cost of ownership) over the life cycle of the server.
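The capacity trade-off shown in Figure 3 can also be expressed in DIMMs per channel (DPC). Assuming the same eight channels (four per processor) and 8 GB modules spread evenly, the sketch below derives the DPC of each configuration and the relative bandwidth cost of the 192 GB setup. The ~53 GB/s value is not reported directly in this paper; it is simply ~70 GB/s minus the ~17 GB/s drop stated in the text.

```python
# Capacity versus bandwidth for the Figure 3 configurations (sketch).
# Assumptions: 8 channels total (4 per socket), 8 GB modules spread evenly.
from dataclasses import dataclass

TOTAL_CHANNELS = 8
MODULE_GB = 8


@dataclass
class Config:
    capacity_gb: int
    speed_mhz: int
    sustained_gb_s: float  # approximate values derived from the text

    @property
    def dimms(self) -> int:
        return self.capacity_gb // MODULE_GB

    @property
    def dimms_per_channel(self) -> float:
        return self.dimms / TOTAL_CHANNELS


CONFIGS = [
    Config(64, 1600, 70.0),
    Config(128, 1600, 70.0),
    Config(192, 1066, 70.0 - 17.0),  # ~17 GB/s drop stated in the text
]

if __name__ == "__main__":
    baseline = CONFIGS[1]  # 128 GB at 1600 MHz
    for cfg in CONFIGS:
        drop = 1 - cfg.sustained_gb_s / baseline.sustained_gb_s
        print(f"{cfg.capacity_gb} GB: {cfg.dimms} DIMMs, {cfg.dimms_per_channel:g} DPC, "
              f"{cfg.speed_mhz} MHz, ~{cfg.sustained_gb_s:.0f} GB/s ({drop:.0%} below 128 GB)")
```

Under these assumptions, moving to three modules per channel provides 50% more capacity than the 128 GB configuration at the cost of roughly a quarter of sustained bandwidth, which is the trade-off to weigh for capacity-bound workloads such as large in-memory databases.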
References:
[1] Predicting User Preference for Movies using NetFlix database, Department of Electrical and Computer Engineering, Carnegie Mellon University. http://users.ece.cmu.edu/~dbatra/publications/assets/goel_batra_netflix.pdf
[2] SiSoft Sandra Q & A - Memory Benchmark, SiSoftware. http://www.sisoftware.co.uk/?d=qa&f=ben_mem&l=en&a=
[3] Intel® Xeon® Processor E5-2600 Product Family News Fact Sheet, Intel®. http://download.intel.com/newsroom/kits/xeon/e5/pdfs/Intel_Xeon_E5_Factsheet.pdf

©2013 Kingston Technology Corporation, 17600 Newhope Street, Fountain Valley, CA 92708 USA. All rights reserved. All trademarks and registered trademarks are the property of their respective owners. MKF-549