TECHMAN Electronics
TECHMAN XC100 NVMe SSD
Technical White Paper v1.0
April 2016

Techman reserves the right to change products, information, and specifications without notice. Information in this document is provided in connection with Techman products. No license, express or implied, by estoppel or otherwise, to any intellectual property rights is granted by this document. Except as provided in Techman's terms and conditions of sale for such products, Techman assumes no liability whatsoever, and Techman disclaims any express or implied warranty relating to the sale and/or use of Techman products, including liability or warranties relating to fitness for a particular purpose, merchantability, or infringement of any patent, copyright, or other intellectual property right. Unless otherwise agreed in writing by Techman, Techman products are not designed or intended for any application in which the failure of the Techman product could create a situation where personal injury or death may occur. All brand names, trademarks, and registered trademarks belong to their respective owners.
Revision History
Version: 1.0
Date: April 2016
Authors: Ilong.Hsiao, Ted.Hsieh
Approver: Ilong.Hsiao
Amendment: Robert.Hsiao

Confidential TECHMAN Electronics

Contents
Overview
Part 1: High Performance Hardware
  1-1: Multi-core Computing
  1-2: Multi-channel Flash Controller
  1-3: Multi-queue Engines
  1-4: Embedded XOR & Randomizer
  1-5: Strong BCH ECC
Part 2: Advanced NAND Flash Management
  2-1: Bad Block Management
  2-2: Read Disturb Policy
  2-3: Data Retention Policy
  2-4: Smart Read Retry Policy
Part 3: Data Integrity Guarantee
  3-1: End-to-end Data Protection
  3-2: Adaptive RAID Data Protection
  3-3: Thermal Throttling Protection
  3-4: Power Loss Protection
  3-5: Firmware & Metadata Protection
Part 4: Intelligent Firmware Management
  4-1: High Performance FTL
  4-2: Global Wear Leveling
  4-3: Efficient Garbage Collection
  4-4: Fast Power-on Rebuild
  4-5: TRIM Support
  4-6: Intelligent Write Flow Control
  4-7: Intelligent Read Sequence Control
Part 5: Dual Port for High Availability

OVERVIEW

The digital universe is exploding. Worldwide data is expected to reach 17 zettabytes in 2017 and 44 zettabytes by 2020, driven by the emergence of the Internet of Things (IoT), and roughly 90% of the data on Earth was generated within the last two years. According to IDC, every 60 seconds there are 72 hours of video uploaded to YouTube, 350 GB of data generated on Facebook, 571 new websites created, 277,000 tweets posted on Twitter, 100 million emails sent, and over 2 million Google search queries issued. Whether for a YouTube broadcaster showcasing a live game stream or a seismology professor analyzing earthquake data, fast and stable processing, i.e., consistent, low-latency IO, is required. To meet this requirement, the server systems handling these workloads must strengthen both their computing cores and their storage devices. The traditional HDD is becoming a performance bottleneck because of its extremely high latency.
A PCIe SSD, whose latency is second only to DRAM, is the best option for avoiding such an IO performance bottleneck. To be ready for the era of high-speed computing, including cloud services, big data analysis, online transaction processing, and high-frequency financial trading, storage devices must evolve in step with computing processors so they do not become an obstacle to overall performance. Techman SSD has therefore chosen to focus on enterprise-grade storage design and development. Built on PCI Express Gen3 x4, the Techman XC100 further builds on the NVM Express (NVMe) protocol to support a much higher volume of command sequences. With our optimized designs, the XC100 guarantees high-speed processing with very stable response times over long periods of operation. The following chapters describe the technologies the XC100 has designed and delivered to achieve the highest performance with great consistency.

PART 1: HIGH PERFORMANCE HARDWARE

1-1: Multi-core Computing

The evolution of the SSD controller closely mirrors that of the computer industry as a whole: the more processor cores an SSD controller provides, the higher the performance it delivers. To keep up with ever-faster host systems, an SSD must scale its core count accordingly. The controller adopted by the XC100 uses a 16-core architecture, allowing commands and threads from the host system to be processed in parallel at high speed. Among these 16 cores, certain cores host dedicated managers, e.g., a Boot Processor Core with a ROM manager function, to handle specific duties. All threads communicate through high-speed Inter-Process Communication (IPC), which allows efficient information sharing. Furthermore, the SRAM and DRAM inside the XC100 are shared by all 16 cores and their threads, freeing up cache memory and CPU resources.
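As a conceptual illustration only, the dispatch-over-IPC idea can be sketched as below. The 16-core count comes from the text; everything else (function names, the queue-based IPC model, the dictionary standing in for shared SRAM/DRAM) is a hypothetical simplification, not XC100 internals.

```python
# Sketch: fan host commands out to worker "cores" over a queue (IPC-style)
# and collect results in a cache shared by all workers.
import queue
import threading

NUM_CORES = 16  # the XC100 controller's 16-core architecture

def process_commands(commands):
    ipc = queue.Queue()      # stands in for the inter-process communication channel
    shared_cache = {}        # stands in for SRAM/DRAM shared by all cores
    lock = threading.Lock()

    def core_worker():
        while True:
            cmd = ipc.get()
            if cmd is None:          # sentinel: this core shuts down
                break
            with lock:               # results are visible to every core
                shared_cache[cmd] = f"done:{cmd}"

    workers = [threading.Thread(target=core_worker) for _ in range(NUM_CORES)]
    for w in workers:
        w.start()
    for cmd in commands:             # host commands are latched into the IPC channel
        ipc.put(cmd)
    for _ in workers:                # one sentinel per core
        ipc.put(None)
    for w in workers:
        w.join()
    return shared_cache
```

Because the queue is shared, any idle core can pick up the next command, which is the property that lets many host threads be served in parallel.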
Together, the 16-core architecture, IPC, dedicated manager functions, and the shared cache design allow multiple requests and commands from the host system to be handled quickly and efficiently.

1-2: Multi-channel Flash Controller

For an SSD, the more flash channels it controls, the higher the performance it delivers. The XC100 supports up to 16 channels of NAND flash and utilizes all 16 simultaneously during operation. All Read/Program/Erase commands and data from the host system are coordinated and distributed evenly across these 16 channels through the XC100's flash interface. This multi-channel coordination and distribution helps guarantee Quality of Service (QoS).

1-3: Multi-queue Engines

Today, multiple high-performance processors are a baseline requirement not only for server systems but also for personal computers. Meanwhile, thanks to the development of NVM Express (NVMe), the interface protocol can now carry far more command queues from host to storage device than before, so the storage device risks becoming the bottleneck of overall performance. To absorb this flood of rapid IOs from the host, a storage device must handle them at maximum speed. The XC100 pairs its 16-core controller and NVMe support with purpose-built Multi-queue Engines to process these high-speed, high-frequency IOs. The Multi-queue Engines of the XC100 comprise 1 Admin Queue, 128 Submission Queues, and 128 Completion Queues, each supporting up to 1,024 entries (a queue depth of 1,024). IOs from the host system's multiple cores are first latched into the Submission Queue Engine and then distributed to the XC100's multi-core controller for processing.
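The queue topology just described (1 Admin Queue, 128 Submission Queues, 128 Completion Queues, queue depth 1,024) can be sketched as follows. The class names and the simplified submit/complete handshake are illustrative assumptions, not NVMe driver code.

```python
# Sketch of the NVMe-style multi-queue layout: bounded submission and
# completion queues, paired per queue ID.
from collections import deque

QUEUE_DEPTH = 1024
NUM_IO_QUEUES = 128

class NvmeQueue:
    def __init__(self, depth=QUEUE_DEPTH):
        self.depth = depth
        self.entries = deque()

    def push(self, entry):
        if len(self.entries) >= self.depth:      # queue full: host must wait
            raise OverflowError("queue full")
        self.entries.append(entry)

    def pop(self):
        return self.entries.popleft()

class MultiQueueEngine:
    def __init__(self):
        self.admin_queue = NvmeQueue()
        self.submission_queues = [NvmeQueue() for _ in range(NUM_IO_QUEUES)]
        self.completion_queues = [NvmeQueue() for _ in range(NUM_IO_QUEUES)]

    def submit(self, qid, command):
        # Host latches the command into a submission queue...
        self.submission_queues[qid].push(command)

    def complete(self, qid):
        # ...the controller processes it and posts a completion entry.
        command = self.submission_queues[qid].pop()
        self.completion_queues[qid].push(("OK", command))
        return command
```

Having 128 independent queue pairs is what lets each host core submit IO without contending with the others on a single shared queue.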
Once a command completes, the controller posts a completion queue entry to the Completion Queue Engine and raises a notification interrupt (doorbell) to the host. Finally, the host processes the completion queue entry and releases its resources. Simply put, the frequent IOs issued by the host's many cores, traveling over the NVMe protocol, are handled rapidly and efficiently by the XC100's Multi-queue Engines.

1-4: Embedded XOR & Randomizer

One notable characteristic of NAND is that repeatedly storing identical data patterns into the flash degrades data integrity and accuracy. A randomizer scheme with encryption avoids this symptom: well-randomized data patterns stored in NAND flash reduce errors when the data is read back. In addition, the XC100 supports an XOR Calculator and an XOR Engine that provide flash-aware RAID functionality for extra data protection. The XOR Calculator computes the parity information for each flash RAID stripe, and the XOR Engine delivers high-performance flash RAID rebuild operations.

1-5: Strong BCH ECC

For error correction at the bit/byte level, the XC100 adopts a Bose-Chaudhuri-Hocquenghem (BCH) ECC scheme capable of correcting up to 100 bits within 4,320 bytes of data. With such capability, the XC100 easily fulfills (1) the 40 bits per 1,000 bytes ECC requirement of the Toshiba 15 nm MLC used in the XC100, and (2) the UBER <= 10^-16 requirement of the JEDEC enterprise SSD specification.

PART 2: ADVANCED NAND FLASH MANAGEMENT

2-1: Bad Block Management

NAND flash memory always contains some unhealthy cells, whether present from manufacturing or developed over time. Blocks containing such cells are called "bad blocks" and are no longer suitable for storing data.
To manage this, an SSD must monitor and record the health of every block, from the beginning of its life until its end. There are two types of bad blocks: Original Bad Blocks (OBB), which exist after the SSD manufacturing process, and Growth Bad Blocks (GBB), which are generated during SSD runtime operation. The XC100 builds in processes and functions to manage both. During SSD manufacturing, the XC100's burn-in process locates bad blocks by scanning every cell in the NAND. These, together with the blocks screened out during the NAND vendor's wafer and package processes, are all marked as OBB before the SSD ships, so customers never use them. Once the XC100 begins runtime operation in the field, it activates a real-time monitor that marks and records any block encountering (1) a block erase failure or (2) a page program failure; these blocks are categorized as GBB. Through this Bad Block Management, the XC100 marks and records all potential bad blocks to ensure the health of the whole SSD over its life span.

2-2: Read Disturb Policy

One of the most notable characteristics of NAND flash memory is the read disturb phenomenon. The electrons in cells adjacent to the cell being read are influenced by the read, which can eventually cause data loss in those adjacent cells. For example, when reading cell B, the NAND circuitry also applies about 5 V to the adjacent cells A and C; after cell B has been read 10,000 times or more, the data in cell A or cell C may no longer be readable. To prevent this phenomenon, the XC100 will: (1) monitor and record the read count of each block; (2) detect the error bits of the block being read; and (3) refresh the block via the Garbage Collection (GC) function based on the information from (1) and (2). With these operations, read disturb no longer threatens the XC100's data.
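The three-step read disturb policy above can be sketched as a small monitor. The threshold values here are hypothetical placeholders, not XC100 settings.

```python
# Sketch: count reads per block, check bit errors on each read, and flag
# the block for a GC refresh when either figure crosses its limit.
READ_COUNT_LIMIT = 10_000   # hypothetical per-block read budget
BIT_ERROR_LIMIT = 40        # hypothetical error-bit ceiling

class BlockMonitor:
    def __init__(self):
        self.read_counts = {}

    def on_read(self, block, bit_errors):
        """Return True when the block should be refreshed by GC."""
        # (1) monitor and record the read count of each block
        self.read_counts[block] = self.read_counts.get(block, 0) + 1
        # (2) check the error bits of the block just read
        # (3) refresh once either figure crosses its limit
        if self.read_counts[block] >= READ_COUNT_LIMIT or bit_errors >= BIT_ERROR_LIMIT:
            self.read_counts[block] = 0   # data would be moved to a fresh block
            return True
        return False
```

The same monitor shape (a counter plus an error check feeding the GC) also fits the retention policy described in the next section, with a timestamp in place of the read count.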
2-3: Data Retention Policy

Retention is another notable characteristic of NAND flash. Under certain conditions, e.g., high-temperature environments or long power-off periods, data inside the NAND can disappear after a period of time. The root cause is charge leakage from the floating gate after each page program, and the accumulated Program/Erase (P/E) cycles further shorten the retention time. Even when cells have reached their P/E cycle limit, i.e., end of life as usually specified in the NAND datasheet, the SSD's data retention must still fulfill the JEDEC requirement: for client-grade MLC, 1 year at 30 C; for enterprise-grade MLC, 3 months at 40 C. So how does an SSD ensure its retention meets the JEDEC requirement? The XC100 will: (1) monitor the retention period of each block; (2) detect the bit error rate of the block being read; and (3) refresh the block via the GC function based on the information from (1) and (2). Through these operations, the XC100 ensures its data retention meets the JEDEC specification.

2-4: Smart Read Retry Policy

NAND flash cell quality degrades under every kind of stress: P/E cycling, read/write disturbance, retention, and temperature. As cells degrade, their voltage distributions shift, which means the read threshold voltage (Vth) used to decide whether a cell holds a 0 or a 1 may require adjustment, and a read may need several retries to complete. Every read retry costs performance, so minimizing the retry impact is an important task for SSD designers. The XC100 supports a Smart Read Retry scheme to protect data integrity while keeping this cost low: (a) apply a fast, adjustable Vth setting to read back data even when it contains error bits; (b) reuse the previous Vth as a near-optimal starting value to reduce retry count and latency; (c) refresh a data block once its error bits exceed preset limits;
(d) fall back to the flash-level RAID function as the last resort for data recovery. With this scheme, read retries do not affect the XC100's overall performance.

PART 3: DATA INTEGRITY GUARANTEE

3-1: End-to-End Data Protection

Data integrity is extremely important to both service providers and service users. To protect data throughout the storage device, the XC100 supports End-to-End Data Protection to maintain data accuracy and integrity. This protection includes: (a) protections similar to the T10-DIF/DIX specifications; (b) XTS-AES-256 data encryption; (c) XOR data protection on DRAM, where the DRAM bus carries 64 bits of data plus 8 bits of ECC; (d) BCH ECC with 4,176 bytes of data and 200 bytes of parity; and (e) flash-based RAID protection. With End-to-End Data Protection, data integrity is guaranteed on every path inside the SSD.

3-2: Adaptive RAID Protection

End-to-End Data Protection covers bit/byte-level errors; for page/block-level protection, the XC100 adopts Adaptive RAID Protection. This is similar to RAID-5 at the device level, except that the XC100 operates it across all flash channels. The concept is to store parity information in one randomly selected page among every n+1 pages; because the parity page is distributed, protection equivalent to the RAID-5 scheme is achieved. Furthermore, the RAID stripe size is not fixed: the XC100 dynamically adjusts the stripe size when bad blocks appear within a stripe. If a stripe accumulates more than 8 bad blocks, the XC100 marks it as a bad stripe and activates the refresh function accordingly.

3-3: Thermal Throttle Protection

All electronic devices generate heat, and an SSD is no exception. Under heavy load, the temperature of a PCIe/NVMe SSD operating at full speed will certainly ramp up very rapidly.
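The flash-level RAID idea underlying both the Adaptive RAID scheme and the read-retry last resort can be sketched with a byte-wise XOR: the parity of a stripe lets any single lost page be rebuilt from the survivors. Page size and stripe width below are arbitrary illustrations.

```python
# Sketch: XOR parity over a stripe of pages, and single-page rebuild.
def xor_parity(pages):
    """Compute the parity page for a stripe (byte-wise XOR of all pages)."""
    parity = bytes(len(pages[0]))
    for page in pages:
        parity = bytes(a ^ b for a, b in zip(parity, page))
    return parity

def rebuild_page(surviving_pages, parity):
    """Recover the single missing page from the survivors plus the parity."""
    return xor_parity(surviving_pages + [parity])
```

Because XOR is its own inverse, XORing the surviving pages with the parity cancels every known page and leaves exactly the missing one.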
Thanks to system air-flow design, this seldom causes problems in a well-ventilated chassis. However, a good designer must hope for the best and prepare for the worst, so the XC100 adopts Thermal Throttle Protection to prevent any possible thermal damage to the device. Three temperature thresholds are preset in the XC100 design; when the embedded thermal sensor reading exceeds one of these thresholds, the XC100 throttles the data transfer rate by the corresponding amount to reduce heat generation. Once the internal temperature drops back below the threshold, the limit is lifted.

3-4: Power Loss Protection

In an enterprise system, data integrity is critically important even during an unexpected power shutdown; a system or storage device must guarantee it by all means. The XC100 implements Power Loss Protection (PLP) to avoid data loss when an ungraceful power shutdown occurs. With PLP, the XC100 can keep operating normally for a limited period without its original power source: (1) In normal mode, the XC100 runs on the normal power source while the PLP capacitors are kept fully charged as a backup power source. (2) In abnormal mode, the XC100 opens the power switch (SW), and the previously charged PLP capacitors take over as the backup source, keeping the XC100 operating normally for a short while. The backup power must last long enough for the SSD to flush all important data into NAND flash, so the PLP function must be carefully designed and optimized together with all the other SSD functions, such as the FTL, wear leveling (WL), and GC, to prevent data loss.

3-5: Metadata & Firmware Protection

Metadata mainly includes (I) FTL table information, (II) wear-leveling information, (III) per-block write/read/erase counts, (IV) bad and free block information, and (V) firmware information. In other words, beyond user data, the metadata holds a great deal of extremely important information.
To protect metadata and firmware, the XC100 adopts two schemes: (1) pseudo-SLC mode and (2) multi-copy backup. (1) Pseudo-SLC (pSLC) operates MLC blocks as SLC. SLC NAND has much better endurance (roughly 60,000 P/E cycles) and faster access times than MLC NAND (roughly 3,000 P/E cycles). By configuring some MLC blocks in pSLC mode, the XC100 extends the endurance of these blocks to about 30,000 P/E cycles, so the metadata and firmware stored there stay intact far longer and are also faster to access. (2) With multi-copy backup, metadata is distributed across different LUNs and different blocks, so the XC100 can still operate normally even if errors occur in some copies of the metadata. For firmware protection, the NVMe 1.1 protocol defines firmware slots for storing firmware images, and the XC100 adopts this multi-slot scheme: up to 3 versions of firmware images can be stored, each with another 3 backup copies, and all of these images are distributed across LUNs in the same way as the metadata. With such protection, the firmware remains intact even after an ungraceful power-off.

PART 4: INTELLIGENT FIRMWARE MANAGEMENT

4-1: High Performance FTL

During some operations, such as garbage collection, the SSD moves valid user data from a block about to be erased to another location without notifying the host. The Physical Block Address (PBA) of that valid data changes while its Logical Block Address (LBA) stays the same. Tracking this requires the Flash Translation Layer (FTL), which monitors and records the mapping between LBAs and PBAs. Moreover, because of the frequent IO commands from the host to the SSD, the FTL is updated rapidly and constantly.
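The LBA-to-PBA mapping just described can be sketched minimally as below. The class and method names are hypothetical; a real FTL also tracks page validity, wear counts, and the other metadata listed above.

```python
# Sketch: the host's LBA stays fixed while the physical location changes
# underneath (e.g. when garbage collection relocates valid data).
class FlashTranslationLayer:
    def __init__(self):
        self.l2p = {}           # logical page -> physical page

    def write(self, lba, pba):
        self.l2p[lba] = pba     # a host write lands at a new physical page

    def gc_move(self, lba, new_pba):
        # Garbage collection relocates valid data; only the PBA changes,
        # so the host never notices the move.
        self.l2p[lba] = new_pba

    def lookup(self, lba):
        return self.l2p[lba]
```

Keeping this table in fast memory (as the XC100 does with S/DRAM) is what makes the constant lookups and updates cheap.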
FTL performance therefore heavily influences overall SSD performance. The XC100 has designed and supports an optimized, high-speed FTL scheme with the following features: (1) high-speed direct mapping between LBA and PBA; (2) 4 KB-based mapping, matching the file size most widely used by operating systems; (3) the FTL held in S/DRAM for fast, frequent update operations; (4) co-optimization with WL and GC for better endurance and lower latency; (5) a periodically saved Snapshot algorithm that balances runtime performance against rebuild time. Through these intelligent designs and detailed verification, the XC100's data mapping function, the FTL, operates not only at high speed but also with consistency.

4-2: Global Wear-Leveling

NAND flash has several unique behaviors: programming is page-based while erasing is block-based; a block consists of many pages; a block must be erased before it can be programmed again; P/E cycles are limited; reads induce disturbance; and data retention is finite. Almost every operation applied to a NAND cell shortens its life span. To avoid uneven use of NAND cells, Wear Leveling (WL) must be adopted and carefully designed. WL calculates the P/E counts of all blocks and moves user data from block to block so that all blocks are used evenly, i.e., end up with similar P/E cycles. Needless to say, these actions involve the FTL and GC mentioned previously, so WL, GC, and the FTL must all be well designed and optimized together to avoid hurting SSD performance. Host data can be roughly separated into two categories: hot data, which is updated very frequently, and cold data, which may not be updated for a very long time. Correspondingly, the XC100 implements two types of WL: (1) Dynamic WL, applied mainly to hot data: the XC100 dynamically prioritizes the blocks with the lowest P/E counts to store hot data.
Via the global FTL, the hot data's original PBA is marked invalid, waiting for the Garbage Collection (GC) function to collect, erase, and release it. (2) Static WL, applied mainly to cold data: as previously mentioned, the XC100 monitors the P/E counts of all blocks, and once a cold-data block has the minimum P/E count, Static WL moves that cold data elsewhere and releases the block for further wear. The XC100's outstanding endurance ratings (3 and 7 DWPD) reflect this well-designed WL scheme and its optimization alongside GC and the FTL.

4-3: Efficient Garbage Collection

Many versions of a piece of data may be stored in NAND flash, but only one is up to date; the out-of-date versions are usually referred to as invalid data. Garbage Collection (GC) reclaims this invalid data and releases free space for further use. However, overly frequent GC increases overhead and P/E cycles, hurting both overall performance and endurance. And since GC always runs in the background, a poorly designed GC, together with the FTL and WL, would severely affect host command handling and device response. The XC100's GC operates as follows: (1) select a GC target block; (2) gather all valid/invalid page information for the GC block; (3) select free blocks as the GC destination; (4) copy the valid data to the destination block, leaving only invalid data in the GC target block; (5) erase the GC target block to free it. With a unique, smart selection algorithm for GC blocks, the XC100's GC scheme has been efficiently optimized with the FTL and WL to reach the highest performance.

4-4: Fast Power-On Rebuild

Compared with an HDD, an SSD is much faster not only in operation but also at boot, because NAND's native access speed is much higher than an HDD's. Even so, there are still topics a good SSD design must cover, e.g., the FTL rebuild speed during power-on.
At power-on (reboot), the SSD must first acquire and rebuild the complete FTL and load it into DRAM for subsequent operation. On a graceful shutdown, the system waits while the SSD flushes the complete FTL from DRAM back to NAND; with a complete FTL in NAND, the next power-on is very fast. After an abnormal power-off, however, the FTL in DRAM is usually lost immediately, and the SSD would need to scan every page of every block to rebuild the complete FTL, a "scan everything" process that takes far longer than the normal case. The XC100 adopts a Snapshot scheme with some special designs to avoid long power-on times. During normal operation: (1) the Snapshot function periodically saves and updates the FTL data back to NAND; (2) the Snapshot update frequency is optimized to avoid impacting performance; (3) Snapshot data is stored in the pSLC area for better endurance and faster access. At power-on: (4) information is retrieved from the latest Snapshot first; (5) any non-updated data is scanned to recover the remaining information; (6) the whole FTL is rebuilt from (4) and (5). With this Snapshot scheme, the XC100 ensures the power-on rebuild is fast after both normal and abnormal power-off.

4-5: TRIM Command Support

Unlike an HDD, which can overwrite in place, an SSD must erase a flash cell before programming data into it, and it relies on WL to determine which block holding the most invalid data should become the next data target. Furthermore, the host system usually does not reveal to the SSD which data (LBAs) are no longer valid. As a result, the more invalid data accumulates, the less free NAND space remains; GC must then be activated to release space, and once GC is active, performance starts decreasing gradually. The TRIM command removes this inconvenience: the host can issue TRIM to the SSD, indicating exactly which LBAs are no longer valid.
The SSD can then run GC in the background to collect and erase the invalid data and release more space, so performance is sustained at a steady level instead of continuously decreasing.

4-6: Intelligent Write Data Flow Control

The XC100 includes an intelligent scheme for each of read and write data flow management. For write flow management, the XC100 treats GC data and host data as two competing inputs and adaptively balances them, keeping performance consistent while maintaining a sufficient pool of free blocks.

4-7: Intelligent Read Sequence Control

For read flow management, the XC100 adopts a Re-scheduler function that rearranges command sequences so as to utilize as many channels simultaneously as possible. The advantages of this function are: (1) read commands are not jammed on individual flash channels; (2) read latency is much lower thanks to the Pending Queue mechanism of the Re-scheduler.

PART 5: DUAL PORT SUPPORT

The PCI Express bus can operate with 2, 4, 8, or 16 lanes; the more lanes it supports, the more data it can carry. So why would a 4-lane PCIe storage device deliberately split itself into two 2-lane ports? The answer is High Availability (HA). High Availability ensures a certain degree of operational continuity over a given measurement period and avoids any Single Point of Failure (SPOF). An HA system provides: (1) a guaranteed amount of uptime; (2) continued access to the system's critical functions; (3) redundancy. For example, suppose a single server with one XC100 is providing service to customers. If that server fails, its service must be shut down for repair; although the data itself is intact, customers still have to wait until the repair is finished. This is a SPOF.
By instead connecting two server systems to a single XC100, if one of the two paths fails, the other takes over the failed path's jobs against the same data store, the XC100, and continues working. Service users therefore experience no downtime while the system maintainer repairs the failed path in the meantime. Techman's design and validation teams are currently working with our server partners on evaluations of this dual-port feature. Techman SSD expects to introduce its dual-port series, the XC200, to the market in early June.
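Conceptually, the dual-port failover described above behaves like the following sketch. The port names, the health flags, and the preference order are all hypothetical; a real dual-port SSD exposes two independent PCIe links to two hosts.

```python
# Sketch: two host paths share one SSD's data; when the active path
# fails, IO is rerouted over the surviving path, avoiding a SPOF.
class DualPortSsd:
    def __init__(self):
        self.data = {}                        # the single shared data store
        self.ports_up = {"A": True, "B": True}

    def fail_port(self, port):
        self.ports_up[port] = False           # simulate a path failure

    def _pick_port(self):
        # Prefer port A; fail over to B when A is down.
        for port in ("A", "B"):
            if self.ports_up[port]:
                return port
        raise RuntimeError("no healthy port")

    def write(self, lba, value):
        port = self._pick_port()
        self.data[lba] = value
        return port                           # which path carried the IO

    def read(self, lba):
        return self._pick_port(), self.data[lba]
```

The key property is that both ports reach the same data store, so a path failure costs bandwidth but never availability.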