
RapidIO for Low Latency Servers and Wireless Base Station





RapidIO for Low Latency Servers and Wireless Base Station
Devashish Paul, Senior Product Manager, IDT
Chairman, RapidIO Trade Association Marketing Council
Oct 2013
©2013 Integrated Device Technology, Inc.

IDT's Presentation Agenda
● Server topologies with RapidIO
● Wireless network evolution
● RapidIO Gen2 building blocks in production
Connecting wireless today, converging with servers tomorrow.

RapidIO Overview
● Today: 6.25 Gbps/lane, 20 Gbps/port embedded RapidIO interfaces on processors, DSPs, FPGAs and ASICs
● 10 Gbps/lane in development (10xN); 25 Gbps/lane (25xN) next generation
● No NIC needed with the embedded fabric interface
● Hardware termination at the PHY layer: 3-layer protocol
● Lowest latency interconnect: ~100 ns
● Inherently scales to large systems with 1000s of nodes
● Over 6 million RapidIO switches shipped; over 30 million 10-20 Gbps ports shipped (more than 10 GbE)
● 100% 4G interconnect market share; 60% 3G and 100% China 3G market share
Diagram: FPGA, CPU and DSP endpoints connected through RapidIO switches.

Wireless Topologies: Moving to Data Center and HPC
● Computing topologies, 20-100 Gbps: micro server, blade server, supercomputers, storage
● Embedded topologies, up to 20 Gbps: wireless base station, video conferencing, imaging, mil/aero
Diagram: wireless, video, military and imaging embedded systems (FPGA, CPU, DSP on a RapidIO switched backplane) alongside compute and storage nodes (ARM and x86 CPUs behind PCIe-to-S-RIO bridges, IDT SSD controller, nVidia compute node, SSD and disk) joined by RapidIO switches.

2x the Energy Efficiency of the Top Green 500 HPC System
● Compute = 6.4 GFlops/W; switching = 2.2 Tbps/chassis
● "Green 500" #1: 3.2 GFlops/W (June 2013)
● Successful industry collaboration; open, interoperable standard

Uniform Interconnect: RapidIO Connected Data Centers
● Today: 80% north-south traffic in bulk servers
● Compute and analytics = east-west, "any node to any node" traffic
● Desire for a high-bandwidth, fat-pipe interconnect fabric inside the box (already done in wireless)
● Reduce the energy, power and latency associated with NICs and adaptors
● 100 ns one-hop switching

What Is/Was a Server
● Yesterday: workstation (processor) + LAN connection (Ethernet); scale with cabling and an Ethernet/IB switching box
● Today: multiple processors + NIC + Ethernet; scale across a backplane (blade); large scale-out with Ethernet/IB top-of-rack switches and cabling; inter-rack latency 100+ us; scale to tens of racks
● 20 years, no change in architecture: we took the LAN architecture and crunched it down
● More processing in smaller form factors, but not great
Diagram: memory + processor + Ethernet NIC blades connected through an Ethernet switch.

Dell: Pushing Ethernet Outside of the Rack
● Ethernet inside and outside the rack → Ethernet outside only
● Disparate fabrics → unified interconnect, no NICs
● Embedded interconnect inside the rack for processor-to-processor traffic

RapidIO vs. Ethernet for Processor Access
● Ethernet LAN interconnect (long latency path): x86 processor (35x35 mm) → PCI Express Gen2 → south bridge (27x27 mm) → PCI Express Gen2 @ 16 Gbps → 10 Gb Ethernet NIC (25x25 mm)
● RapidIO direct processor interconnect (short latency path): ARM or PPC processor with integrated 20 Gbps RapidIO

                                 Ethernet     RapidIO
Latency (ns)                     >1000        <100
Form factor / board density      10x          1x
Incremental discrete devices     2            0
Interconnect power (W)           ~10          >0.25

This is why Dell is pushing Ethernet out of the box: too many adaptors, too much latency, too much power.
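To put the comparison above in rough per-blade terms, the sketch below simply multiplies out the slide's own figures; the eight-processors-per-blade count is an arbitrary illustrative assumption, not a number from the deck.

```python
# Rough tally of the access-path comparison above, using the slide's own
# figures. "processors_per_blade" is an arbitrary illustrative value.

ETHERNET_PATH = {"latency_ns": 1000, "extra_devices": 2, "power_w": 10.0}   # ">1000 ns", "~10 W"
RAPIDIO_PATH  = {"latency_ns": 100,  "extra_devices": 0, "power_w": 0.25}   # "<100 ns", ">0.25 W"

def per_blade_delta(processors_per_blade=8):
    """Devices and watts removed per blade by using the integrated fabric."""
    return {
        "devices_removed": processors_per_blade
            * (ETHERNET_PATH["extra_devices"] - RAPIDIO_PATH["extra_devices"]),
        "power_saved_w": processors_per_blade
            * (ETHERNET_PATH["power_w"] - RAPIDIO_PATH["power_w"]),
        "latency_cut_ns": ETHERNET_PATH["latency_ns"] - RAPIDIO_PATH["latency_ns"],
    }

print(per_blade_delta())
# {'devices_removed': 16, 'power_saved_w': 78.0, 'latency_cut_ns': 900}
```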
3 Layers Terminated in Hardware vs. TCP Offload
● RapidIO's 3 layers are terminated in hardware, offloading protocol termination from the processor
● The end result: lower latency, better throughput, lower power, freed-up processor cycles, higher performance systems
● TCP offload can consume multiple cores in an ARM or Xeon class processor when terminating multiple TCP sessions
● RapidIO has zero protocol-offload penalty; Ethernet TCP offload can use multiple cores

Why RapidIO for Low Latency
Bandwidth and latency summary:

System requirement                        RapidIO
Switch per-port raw data rate             20-40 Gbps
Switch latency                            100 ns
End-to-end packet termination             Sub 1 us
Hardware failure recovery                 2 us
NIC latency (Tsi721 PCIe2 to S-RIO)       300 ns
Messaging performance                     Excellent

Scaling with RapidIO Switching
● Proven architecture from embedded systems
● Scalable to 64k nodes and 100s of racks
● Inter-rack latency < 5 us
● 20 Gbps bandwidth per link in production today
● No NIC: low-latency path with embedded fabric interface
Diagram: server blades with native RapidIO processors (ARM and PPC CPUs) connected through RapidIO switches to a low-power RapidIO backplane switch at 20-80 Gbps per port, giving low latency and scalability.

Scaling PCIe with RapidIO
● x86 processors lead the market in GFlops performance but lack easy scale-out for clusters
● Performance is key in high performance computing, server, imaging, wireless, aerospace and other embedded applications
● The same applications need the performance of RapidIO interconnect for:
  ● Peer-to-peer networks with scalability
  ● Lowest system power, with the protocol terminated in hardware
  ● Lowest end-to-end packet latency
● 6U compute node with 2x Intel i7 in production
● PCIe-to-RapidIO NIC attributes: 13x13 mm, 2 W, hardware terminated, 20 Gbaud per port, $49

Scaling PCIe with RapidIO: Atom-Based x86 Server
● Easily scale PCIe by using S-RIO switching and PCIe-to-S-RIO NIC devices
● Small form factor, low-power NIC: 13x13 mm, 2 W
● Total power for 8 nodes = 23 W (typ.)
● NIC latency 600 ns, switch latency 100 ns: superior to Ethernet and InfiniBand NIC-based scaling
Diagram: server blade with eight x86 processors, each attached through PCIe to a PCIe-to-S-RIO NIC, all meeting at a central RapidIO switch; low-latency, high-density x86 compute nodes.

48 Cards / 96 Processing Nodes (Two x86 per Card): Power, Cost, Latency

                                  RapidIO (x86)      10G Ethernet      PCIe
                                  (compute switch    (central + NIC)   (compute node
                                  + NIC + central)                     switch + HBA)
Switch-only power                 0.33 W/10G         1.25 W/10G        1.25 W/10G
Aggregate power, 96 nodes         0.384 kW           0.424 kW          0.834 kW
Interconnect bandwidth per card   160 Gbps           20 Gbps           84 Gbps
Interconnect Gbps per watt        20 Gbps/W          2.26 Gbps/W       4.84 Gbps/W
Cost per node (NIC/HBA + share
  of switching, 10k volume,
  public pricing)                 $39                $257              $122
$ per Gbps                        $0.65/Gbps         $25.62/Gbps       $2.88/Gbps
End-to-end latency, any to any
  (NICs/HBAs, local and
  central switches)               Sub 1 us           >10-30 us         >5-10 us

RapidIO vs. the alternatives: power <25%, interconnect 4-8x, cost <75%.
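As a quick sanity check, the snippet below re-derives the Gbps-per-watt row (plus a per-node power figure) from the raw per-card bandwidth and aggregate power numbers in the table; the cost rows are quoted as published and are not re-derived here.

```python
# Re-derive "Gbps per watt" for the 48-card / 96-node comparison from its
# raw inputs (per-card bandwidth, aggregate interconnect power), and add a
# per-node power figure. All inputs are the slide's published numbers.

CARDS, NODES = 48, 96

fabrics = {
    #                per-card Gbps, aggregate kW
    "RapidIO (x86)": (160, 0.384),
    "10G Ethernet":  ( 20, 0.424),
    "PCIe":          ( 84, 0.834),
}

for name, (gbps_per_card, agg_kw) in fabrics.items():
    total_gbps = gbps_per_card * CARDS
    watts = agg_kw * 1000
    print(f"{name:13s}  {total_gbps / watts:5.2f} Gbps/W   {watts / NODES:4.1f} W per node")

# RapidIO (x86)  20.00 Gbps/W    4.0 W per node
# 10G Ethernet    2.26 Gbps/W    4.4 W per node
# PCIe            4.83 Gbps/W    8.7 W per node
```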
Applications: Wireless and Server Convergence

Evolution of Bandwidth
● More processing load per base station: 2G (GSM) = x1, 3G (smartphone) = x40, 4G (tablet) = x400

Why RapidIO 10xN: Convergence of Wireless and Cloud
● More bandwidth, more users: Moore's Law does not scale with the network
● We need a faster interconnect: RapidIO embedded at 40G
Diagram: remote radio heads over CPRI fiber connections into a centralized BBU pool of S-RIO based servers (10 Gb S-RIO inside, 40 Gbps links) feeding mobile carrier core infrastructure across 2G, 3G and 4G.

Base Station with Caching Micro Server
Diagram: a Gen2 base station (CPRI into an S-RIO enabled FPGA, Freescale MSC8156 DSPs, and a processor with S-RIO plus Ethernet backhaul, joined by a 20 Gbps-per-port RapidIO switch) connected over 20 Gbps S-RIO front-panel ports to a micro server (RapidIO switch plus x86 CPUs behind PCIe2-to-S-RIO2 NICs).

Wireless Micro Server Attributes
● Seamless integration with the base station via the RapidIO front panel
● Processors can snoop packets up to layer 7 in look-aside mode and cache repeated content locally
● Eliminates redundant traffic from the backhaul
● Reduces overall system power consumption
● Eases offload of data-intensive traffic to WiFi co-located with 4G base stations, reducing the load on licensed spectrum
● Scalable solutions for micro server and base station: simply stack more pizza boxes with S-RIO interconnect, with the option of adding an S-RIO local switch box
● Supports green initiatives, with an overall reduced energy footprint in the data center with S-RIO
● Superior end-to-end packet latency, throughput and fault tolerance

RapidIO with x86 or ARM Server Architectures
● Supports both x86 and ARM based architectures:
  ● Higher-performance servers: x86 CPU with RapidIO
  ● Data movement: ARM CPU with power-efficient, low-cost direct RapidIO interconnect
● Best-in-class end-to-end latency; switch latency around 100 ns
● Supports secured virtualization
● Supports any kind of topology: star, mesh, dual-star, hypercube, torus, etc.
● Superior to PCIe-to-InfiniBand and PCIe-to-Ethernet NIC options, which are not usable in wireless
Diagram: base station connected over 20 Gbps S-RIO front-panel ports to a micro server with a RapidIO switch, an x86 CPU behind a PCIe2-to-S-RIO2 NIC, and ARM CPUs with direct RapidIO.
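To make the scaling and latency claims concrete, here is a rough back-of-envelope sketch using figures quoted in the deck (about 100 ns per switch hop, about 300 ns per Tsi721 NIC traversal, a 64k-node ceiling). The two-level tree layout and the single-uplink-per-leaf assumption are illustrative only, not an IDT reference design.

```python
# Back-of-envelope fabric sizing and latency, using figures quoted in the
# deck. The two-level tree and uplink count are illustrative assumptions.

SWITCH_HOP_NS = 100          # cut-through switch latency (slide figure)
NIC_NS = 300                 # Tsi721 PCIe2-to-S-RIO bridge latency (slide figure)
NODE_ID_CEILING = 64 * 1024  # "scalable to 64k nodes"

def two_level_endpoints(ports_per_switch=12, uplinks_per_leaf=1):
    """Endpoints reachable when leaf switches feed one central switch.
    12 ports matches a CPS-1848 configured as 12 x4 (20 Gbps) ports."""
    leaves = ports_per_switch // uplinks_per_leaf          # central fan-out
    endpoints_per_leaf = ports_per_switch - uplinks_per_leaf
    return min(leaves * endpoints_per_leaf, NODE_ID_CEILING)

def one_way_latency_ns(switch_hops, nic_traversals=2):
    """NIC in, a few switch hops, NIC out."""
    return nic_traversals * NIC_NS + switch_hops * SWITCH_HOP_NS

print("two-level tree endpoints:", two_level_endpoints())      # 132
print("3-hop path latency:", one_way_latency_ns(3), "ns")       # 900 ns, i.e. sub-1 us
```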
S-RIO 10xN: 4G Multi-Mode Base Station
● RapidIO connects WCDMA and 4G LTE base stations: every call, app download, email and web page
● 20G RapidIO is in production now
● LTE and data usage are driving the need for more interconnect bandwidth
● Moore's Law cannot keep up: systems need more processors and more inter-processor interconnect speed at low latency
● Backplane / inter-chassis scaling: RapidIO @ 40 Gbps
● Baseband subsystem: 20-40 Gbps
● ASICs for WCDMA and CDMA when running multi-mode
Diagram: CPRI into Freescale MSC8156 DSPs and an ASIC for legacy protocols, a RapidIO 10xN switch with 40 Gbps ports to the backplane/front panel and 5-10 Gbps ports to the devices, and a processor with S-RIO plus Ethernet backhaul.

Building Blocks with RapidIO Gen2

IDT RapidIO Gen2 Portfolio Highlights
● RapidIO Gen2 supports 1.25, 2.5, 3.125, 5 and 6.25 Gbaud
● 100 cm, 2-connector reach for backplane support
● Backward compatible with 1.3 switches and endpoints
● 240 Gbps: the highest-performance backplane switches in the embedded industry
● PCIe2-to-S-RIO2 protocol conversion
● Bridges in development to expand RapidIO ecosystem options
● RapidIO 2 switches and bridges in production in wireless, military, industrial and video applications
● RapidIO 2 endpoint IP available

CPS Gen 2 Overview
Devices:
● CPS-1848: 12x 20 Gbps, 18x 10 Gbps, 18x 5 Gbps ports; 29x29 FCBGA
● CPS-1432: 8x 20 Gbps, 14x 10 Gbps, 14x 5 Gbps ports; 25x25 FCBGA
● CPS-1616: 4x 20 Gbps, 8x 10 Gbps, 16x 5 Gbps ports; 21x21 FCBGA
Highlights:
● Designed to S-RIO v1.3 and 2.1
● Up to 48 lanes: 12x4, 18x2, 18x1
● Up to 240 Gbps full-duplex non-blocking bandwidth
● Supports all RapidIO speeds: 1.25, 2.5, 3.125, 5, 6.25 Gbaud
● Cut-through latency 100 ns
● 40% power reduction per 10 Gbps vs. S-RIO 1.3
● Several switch-fabric-related patents filed
Detailed features:
● High-performance SerDes: long-reach 100 cm, 2-connector channels with DFE support; transmit pre-emphasis and receive equalization; as low as 300 mW per 10 Gbps of data
● Dynamic ingress and egress buffer management
● 40 multicast groups per port
● Supports cut-through and store-and-forward
● Error management extension support; error log with a time-ordered sequence of events
● Packet mirror, trace and filter support
● Receiver- and transmitter-based flow control
● Per-port reset mode for robust hot-swap support
● Multicast event control symbol generation input pin
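The per-port and aggregate figures above follow from the lane rate and the 8b/10b line coding used by Serial RapidIO Gen2; a minimal check:

```python
# Quick check of the Gen2 bandwidth figures: lanes run at up to 6.25 Gbaud
# with 8b/10b encoding, so each lane carries 5 Gbps of payload data.

GEN2_GBAUD = 6.25
ENCODING_EFFICIENCY = 8 / 10          # 8b/10b line coding

def data_rate_gbps(lanes, gbaud=GEN2_GBAUD):
    return lanes * gbaud * ENCODING_EFFICIENCY

print("x1 port :", data_rate_gbps(1), "Gbps")     # 5.0
print("x4 port :", data_rate_gbps(4), "Gbps")     # 20.0  -> the "20 Gbps" ports
print("48 lanes:", data_rate_gbps(48), "Gbps")    # 240.0 -> CPS-1848 aggregate
```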
Tsi721: PCIe to Serial RapidIO Bridge
Diagram: x4/x2/x1 PCIe Gen2 into a PCIe Gen2 endpoint, through messaging, bridge-mapping and block DMA engines, out an S-RIO Gen2 endpoint on a x4/x2/x1 S-RIO Gen2 link; JTAG, I2C, GPIO, clock and reset support; 13x13 FCBGA.
Features:
● Gen 1 and Gen 2 support: PCIe v2.1 and S-RIO v2.1
● PCIe-to-S-RIO bridging; non-transparent, with transaction mapping
● 8 DMA and messaging engines
● Single port, x1/x2/x4 @ 1.25, 2.5, 3.125 or 5 Gbaud
● S-RIO 1.3 and 2.1 compliant; PCI Express 1.1 and 2.1 compliant (endpoint)
● Buffers up to 32 maximum-size S-RIO packets
● Full line-rate throughput for packets of 64 bytes and larger
● Low power: ~1.5-2 W typical; unused lanes powered down in x1 or x2 modes
● Lane swap and polarity inversion support
● Reach support: 60 cm over 2 connectors
● S-RIO and PCIe endpoint compatible clocking options: 100 MHz, 125 MHz, 156.25 MHz
● Forward bridge (requires a microprocessor on the PCIe side of the network)
● JTAG 1149.1 and 1149.6
● 13x13 mm FCBGA package; commercial and industrial variants
Connects PCIe processors to S-RIO networks for superior performance over 10 GbE and InfiniBand.

IDT RapidIO Roadmap
● Switch products: CPS-1848 (18x1, 18x2, 12x4), CPS-1432 (14x1, 14x2, 8x4), CPS-1616 (16x1, 8x2, 4x4) and SPS-1616 (16x1, 8x2, 4x4) in production; S-RIO 10xN switches in development
● Bridge products: Tsi721 PCIe2-to-S-RIO2 bridge in production; PCIe3 to S-RIO 10xN on the roadmap
● Ecosystem (eval boards, partner AMCs, specifications): S-RIO Gen2 IP; SRDP2 1848/1616 RapidIO; Tsi721 PCIe-to-S-RIO eval board; S-RIO 10xN IP at 40G per port; Linux SW; RapidIO Gen2 modeling; Commagility Xilinx V6 AMC with CPS-1848; TI Gen2 DSP EVM AMC with CPS-1848; S-RIO 10xN 10/20/40/80/160G specification; S-RIO 10xN Linux, Windows and VxWorks support
Legend: production, development, concept.

RapidIO 10xN in Wireless and C-RAN
● Today 100% of 4G OEMs use RapidIO for baseband interconnect, with over 6 million switches shipped
● 4G technologies are driving the need for inter-processor communication
● The wireless and server worlds are converging
● Today RapidIO leads the market with 20 Gbps embedded fabric interconnect
● Tier 1 customers worldwide are pushing IDT for 40 Gbps
● IDT is developing S-RIO 10xN switches, bridges and endpoint IP at 40 Gbps per port
● Peer-to-peer scalable interconnect for wireless, cloud, imaging, military, industrial and medical
● 10.3125 Gbaud per lane, 40 Gbps per port embedded interconnect
● 100 ns latency with 5x the effective bandwidth of 10 GigE for embedded systems
● <300 mW per 10 Gbps of data

Backup Slides
Devashish Paul, Senior Product Manager, IDT
Chairman, RapidIO Trade Association Marketing Council
Oct 2013

RapidIO vs. Ethernet Latency Path
● Ethernet LAN interconnect with NIC (long latency path): memory + processor → PCI Express (I/O interconnect) → Ethernet NIC → Ethernet switch. Ethernet: long latency, NIC required, high footprint, higher power, software intervention.
● RapidIO direct processor interconnect (short latency path): memory + processor → RapidIO switch. RapidIO: short latency, no NIC (embedded fabric), low power, native hardware termination.

Scale PCIe with RapidIO: Two-x86 Compute Node
● Compute node with PCIe-to-RapidIO NIC and switching: 13x13 mm NIC, 25x25 mm 32-lane switch
● 2 W per NIC, 4 W per switch; total interconnect power 8 W (typ.)
● $49/NIC, $80/switch; total interconnect cost per compute node $65
● Low energy, high density, 20 Gbps, infinite scaling
Diagram: two x86 CPUs, each attached through PCIe to a PCIe-to-S-RIO NIC and an S-RIO switch.

Compare to PCIe and Ethernet 2-Node Compute
● Micro server building block with PCIe only: x86 CPUs on a PCIe NTB switch with an HBA
● Micro server building block with Ethernet only: x86 CPUs with 10GbE NICs on an Ethernet switch
● Scale-out limitation with PCIe; bandwidth and latency limits with Ethernet
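Finally, a small check of the interconnect power and cost figures in these backup slides. The 8 W building-block figure is simply two 2 W NICs plus a 4 W switch; the $65-per-node figure works out if the $80 switch is amortized over five nodes, which is my assumption rather than anything stated on the slide.

```python
# Power and cost composition for the two-node RapidIO building block above.
# The five-node switch amortization is an assumption used to reconcile the
# quoted $65 per compute node; it is not stated on the slide.

NIC_W, SWITCH_W = 2.0, 4.0
NIC_COST, SWITCH_COST = 49, 80

print("2-node block power:", 2 * NIC_W + SWITCH_W, "W")               # 8.0 W (typ.)
print("cost per node, switch shared by 5:", NIC_COST + SWITCH_COST / 5)  # 65.0
```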