Transcript
RapidIO for Low Latency Servers and Wireless Base Station
Devashish Paul, Senior Product Manager, IDT; Chairman, RapidIO Trade Association Marketing Council
Oct 2013
©2013 Integrated Device Technology, Inc.
IDT’s Presentation Agenda
● Server Topologies with RapidIO
● Wireless Network Evolution
● RapidIO Gen2 Building Blocks in Production
Connecting Wireless Today, Converging with Servers Tomorrow
RapidIO Overview
[Diagram: RapidIO switches connecting CPU, DSP and FPGA endpoints]
• Today: 6.25 Gbps/lane, 20 Gbps/port embedded RapidIO interface on processors, DSPs, FPGAs and ASICs
• 10 Gbps/lane in development (10xN)
• 25 Gbps/lane (25xN) next generation
• No NIC needed with embedded fabric interface
• Hardware termination at PHY layer: 3-layer protocol
• Lowest latency interconnect: ~100 ns
• Inherently scales to large systems with 1000s of nodes
• Over 6 million RapidIO switches shipped
• Over 30 million 10-20 Gbps ports shipped, more than 10 GbE
• 100% 4G interconnect market share
• 60% 3G, 100% China 3G market share
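The 20 Gbps/port figure follows from the lane rate and line coding. A minimal arithmetic check, assuming the 8b/10b encoding used by the Serial RapidIO Gen2 PHY and a 4-lane (x4) port (the encoding detail is not stated on the slide):

```c
#include <stdio.h>

/* Per-port throughput check for a Serial RapidIO Gen2 x4 port.
 * Assumes 8b/10b line coding (80% efficiency); packet header
 * overhead is ignored. */
int main(void)
{
    const double lane_gbaud = 6.25;      /* Gen2 lane rate */
    const double coding_eff = 8.0 / 10;  /* 8b/10b encoding */
    const int    lanes      = 4;         /* x4 port */

    printf("x4 port raw data rate: %.1f Gbps\n",
           lane_gbaud * coding_eff * lanes);   /* prints 20.0 */
    return 0;
}
```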
Wireless Topologies Moving to Data Center and HPC

Embedded topologies, up to 20 Gbps: wireless base station, video conferencing, imaging, mil/aero
[Diagram: RapidIO switched backplane connecting CPU, DSP and FPGA cards for wireless, video, military and imaging systems]

Computing topologies, 20-100 Gbps: micro server, blade server, supercomputers, storage
[Diagram: compute nodes (ARM and x86 CPUs behind PCIe - S-RIO bridges) and a storage node (storage CPU, IDT SSD controller, nVidia, SSD disk) joined by RapidIO switches]
2x Energy Efficiency of Top Green 500 HPC
● Features: compute = 6.4 GFlops/W, switching = 2.2 Tbps/chassis
● "Green 500" #1: 3.2 GFlops/W (June 2013)
● Successful industry collaboration
● Open standard: interoperable
Uniform Interconnect: RapidIO Connected Data Centers
● Today: 80% north-south traffic in bulk servers
● Compute and analytics = east-west, "any node to any node" traffic
● Desire for a high-bandwidth, fat-pipe interconnect fabric inside the box (already done in wireless)
● Reduce the energy, power and latency associated with NICs and adapters
● 100 ns one-hop switching
What Is/Was a Server
● Yesterday: workstation (processor + memory) + LAN connection (Ethernet NIC); scale with cabling and an Ethernet/IB switching box
● Today: multiple processors + NIC + Ethernet; scale across a backplane (blade); large scale-out with Ethernet/IB top-of-rack switches and cabling; inter-rack latency 100+ us; scale: 10s of racks
● 20 years, no change in architecture
● We just took the LAN architecture and crunched it down
● More processing in smaller form factors, but not great
Dell: Pushing Ethernet Outside of the Rack
● From disparate fabrics, with Ethernet both inside and outside the rack
● To Ethernet outside the rack only: a unified interconnect with no NICs, and an embedded interconnect inside the rack for processor-to-processor traffic
RapidIO vs. Ethernet for Processor Access

Ethernet LAN interconnect (long latency path): x86 processor (35x35 mm) - PCI Express Gen2 - south bridge (27x27 mm) - PCI Express Gen2 @ 16 Gbps - Ethernet NIC (25x25 mm) - 10 Gb Ethernet. Form factor: 10x.

RapidIO direct processor interconnect (short latency path): ARM or PPC processor with integrated RapidIO - 20 Gbps RapidIO. Form factor: 1x.

                               Ethernet    RapidIO
Latency (ns)                   >1000       <100
Board density factor           10x         1x
Incremental discrete devices   2           0
Interconnect power (W)         ~10         >0.25

This is why Dell is pushing Ethernet out of the box: too many adapters, too much latency, too much power.
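To put the per-node numbers in rack-scale terms, here is a small illustrative calculation using only the figures in the table above; the 96-node count is borrowed from the 48-card comparison later in the deck and is an assumption for this example:

```c
#include <stdio.h>

/* Illustrative rack-level view of the per-node table figures.
 * The 96-node count is an assumption for this example. */
int main(void)
{
    const int    nodes           = 96;
    const double enet_node_watts = 10.0;   /* south bridge + NIC path */
    const double srio_node_watts = 0.25;   /* integrated RapidIO port */
    const double enet_latency_ns = 1000.0; /* ">1000" in the table    */
    const double srio_latency_ns = 100.0;  /* "<100" in the table     */

    printf("Interconnect power, Ethernet path: %.0f W\n", nodes * enet_node_watts);
    printf("Interconnect power, RapidIO path:  %.0f W\n", nodes * srio_node_watts);
    printf("Latency ratio (Ethernet/RapidIO):  ~%.0fx\n",
           enet_latency_ns / srio_latency_ns);
    return 0;
}
```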
3 Layers Terminated in Hardware vs. TCP Offload
● RapidIO's 3 layers are terminated in hardware, offloading protocol termination tasks from the processor
● The end result is:
  ● Reduction in latency
  ● Better throughput
  ● Lower power
  ● Freed-up processor cycles
  ● Higher-performance systems
● TCP offload can consume multiple cores in an ARM or Xeon class processor when terminating multiple TCP sessions
● RapidIO has zero protocol offload penalty; Ethernet TCP offload can use multiple cores
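To make the contrast concrete, here is a minimal sketch. The TCP side is the standard sockets API, where the host stack segments, checksums and acknowledges in software; the rio_send_msg() stub is a hypothetical placeholder for a vendor messaging-engine driver (not an API from this deck), where the endpoint hardware performs that termination itself.

```c
#include <stdio.h>
#include <string.h>
#include <sys/types.h>
#include <sys/socket.h>

/* Software-terminated path: the kernel TCP/IP stack does the
 * segmentation, checksums, retransmits and acks on the CPU
 * (or on extra offload cores). */
static int tcp_send(int sock, const void *buf, size_t len)
{
    return send(sock, buf, len, 0) == (ssize_t)len ? 0 : -1;
}

/* Hardware-terminated path: hypothetical stand-in for a RapidIO
 * messaging-engine driver call. The host only posts a descriptor;
 * the endpoint silicon terminates all three protocol layers. */
static int rio_send_msg(unsigned short destid, int mbox,
                        const void *buf, size_t len)
{
    printf("posted %zu bytes to destID 0x%04x, mailbox %d\n",
           len, destid, mbox);
    (void)buf;
    return 0;
}

int main(void)
{
    char payload[256];
    memset(payload, 0xAB, sizeof(payload));

    (void)tcp_send;                               /* sketch only */
    return rio_send_msg(0x0042, 0, payload, sizeof(payload));
}
```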
Why RapidIO for Low Latency: Bandwidth and Latency Summary

System Requirement                            RapidIO
Switch per-port performance (raw data rate)   20 Gbps - 40 Gbps
Switch latency                                100 ns
End-to-end packet termination                 Sub 1 us
Hardware failure recovery                     2 us
NIC latency (Tsi721 PCIe2 to S-RIO)           300 ns
Messaging performance                         Excellent
Scaling with RapidIO Switching
● Proven architecture from embedded systems
● Scalable to 64k nodes
● Scalable to 100s of racks
● Inter-rack latency < 5 us
● 20 Gbps bandwidth per link in production today
● No NIC; low latency path with embedded fabric interface
[Diagram: server blades with native RapidIO ARM and PPC processors, connected through on-blade RapidIO switches to a RapidIO backplane switch at 20-80 Gbps per port; callouts: low power, latency, scalability]
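The 64k-node figure comes from RapidIO's 16-bit device-ID space; how many switches a given node count takes depends on the topology. A minimal sizing sketch for a two-tier fat tree built from switches used as 12-port (x4) devices such as the CPS-1848; the half-down/half-up port split is an assumption for illustration, not something specified in the deck:

```c
#include <stdio.h>

/* Two-tier fat-tree sizing with p-port switches: each leaf uses p/2
 * ports down (to endpoints) and p/2 up (to spines), so the maximum
 * endpoint count is (p/2) * p. Illustrative assumption only. */
int main(void)
{
    const int p = 12;                       /* e.g. CPS-1848 in 12x4 mode */
    const int endpoints_per_leaf = p / 2;   /* 6 endpoints per leaf       */
    const int max_leaves         = p;       /* one spine port per leaf    */

    printf("2-tier fat tree of %d-port switches: up to %d endpoints\n",
           p, endpoints_per_leaf * max_leaves);
    printf("16-bit RapidIO device IDs address up to %d endpoints\n", 1 << 16);
    return 0;
}
```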
Scaling PCIe with RapidIO
● x86 processors lead the market for performance in terms of GFlops, but lack easy scale-out for clusters
● Performance is key in high performance computing, server, imaging, wireless, aerospace and other embedded applications
● The same applications need the performance of RapidIO interconnect for:
  ● Peer-to-peer networks with scalability
  ● Lowest system power, with the protocol terminated in hardware
  ● Lowest end-to-end packet latency
6U compute node with 2x Intel i7 in production
PCIe to RapidIO NIC attributes: 13x13 mm, 2 W, hard terminated, 20 Gbaud per port, $49
Scaling PCIe with RapidIO: Atom-based x86 Server
• Easily scale PCIe by using S-RIO switching and PCIe to S-RIO NIC devices
• Small form factor, low power NIC: 13x13 mm, 2 W
• Total power for 8 nodes = 23 W (typ.)
• NIC latency 600 ns, switch latency 100 ns; superior to Ethernet and InfiniBand NIC-based scaling
[Diagram: server blade with eight x86 CPUs, each attached over PCIe to a PCIe to S-RIO NIC, all converging on a central RapidIO switch]
Low latency, high density x86 compute nodes
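A rough cross-check of the 23 W figure, assuming eight 2 W NICs plus a central Gen2 switch at roughly 0.3 W per 10 Gbps of switched data (a figure quoted for the Gen2 switch family later in the deck); the switch-loading assumption is illustrative:

```c
#include <stdio.h>

/* Back-of-envelope check of "23 W for 8 nodes": eight 2 W NICs plus
 * a central Gen2 switch at ~0.3 W per 10 Gbps with all eight 20 Gbps
 * ports active. The switch-loading assumption is illustrative. */
int main(void)
{
    const int    nodes        = 8;
    const double nic_watts    = 2.0;
    const double port_gbps    = 20.0;
    const double sw_w_per_10g = 0.3;

    double nic_total = nodes * nic_watts;                        /* 16 W  */
    double sw_total  = nodes * port_gbps / 10.0 * sw_w_per_10g;  /* 4.8 W */

    printf("NICs: %.1f W + switch: %.1f W = %.1f W total\n",
           nic_total, sw_total, nic_total + sw_total);
    return 0;
}
```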
48 Cards / 96 Processing Nodes (Two x86 per Card): Power, Cost, Latency

                                      RapidIO (x86)        10G Ethernet     PCIe
                                      (compute switch +    (central + NIC)  (compute node
                                       NIC + central)                        switch + HBA)
Switch-only power                     0.33 W / 10G         1.25 W / 10G     1.25 W / 10G
Aggregate power, 96 nodes             0.384 kW             0.424 kW         0.834 kW
Interconnect bandwidth per card       160 Gbps             20 Gbps          84 Gbps
Interconnect Gbps per watt            20 Gbps/W            2.26 Gbps/W      4.84 Gbps/W
Cost per node (NIC/HBA + share of
switching) @ 10k volume (public
pricing)                              $39                  $257             $122
$ per Gbps                            $0.65/Gbps           $25.62/Gbps      $2.88/Gbps
Latency (end to end, any to any;
NICs, HBAs, local switch, central)    Sub 1 us             >10-30 us        >5-10 us

Summary: Power <25%, Interconnect 4-8x, Cost <75%
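The Gbps-per-watt row can be reproduced from the bandwidth and aggregate-power rows; a quick cross-check using only the table's own figures:

```c
#include <stdio.h>

/* Cross-check of "Interconnect Gbps per Watt":
 * (bandwidth per card x 48 cards) / aggregate power. */
int main(void)
{
    const char  *fabric[]    = { "RapidIO", "10G Ethernet", "PCIe" };
    const double gbps_card[] = { 160.0, 20.0, 84.0 };
    const double agg_kw[]    = { 0.384, 0.424, 0.834 };
    const int    cards       = 48;

    for (int i = 0; i < 3; i++)
        printf("%-13s %5.2f Gbps/W\n", fabric[i],
               gbps_card[i] * cards / (agg_kw[i] * 1000.0));
    return 0;   /* ~20.0, ~2.26, ~4.83 - matching the table */
}
```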
Applications: Wireless and Server Convergence
Evolution of Bandwidth: More Processing Load per Base Station
[Chart: relative processing load per base station grows from x1 for 2G (GSM) to x40 and x400 with 3G smartphones and 4G tablets]
Why RapidIO 10xN: Convergence of Wireless and Cloud
● More bandwidth, more users; Moore's Law does not scale with the network
● We need a faster interconnect: RapidIO embedded 40G
[Diagram: 2G/3G/4G remote radio heads connected over CPRI fiber to a centralized BBU pool of S-RIO based servers (40 Gbps S-RIO internally, 10 Gb links), feeding the mobile carrier core infrastructure]
Base Station with Caching Micro Server (Gen2)
[Diagram: base stations (CPRI into an S-RIO enabled FPGA, Freescale MSC8156 DSPs with S-RIO, and a processor around a RapidIO switch, with Ethernet to backhaul) linked over 20 Gbps per-port S-RIO front-panel connections to micro servers (x86 CPUs behind PCIe2 - S-RIO2 NICs on a RapidIO switch)]
Wireless Micro Server Attributes
● Seamless integration with the base station via the RapidIO front panel
● Processors can snoop packets up to layer 7 in look-aside mode and cache repeated content locally (a sketch of this follows below)
● Eliminates redundant traffic from the backhaul
● Reduces overall system power consumption
● Eases offload of data-intensive traffic to WiFi co-located with 4G base stations, reducing the load on licensed spectrum
● Scalable solutions for the base station: simply stack more pizza boxes with S-RIO interconnect; option for adding an S-RIO local switch box
● Supports green initiatives, with an overall reduced energy footprint in the data center with S-RIO
● Superior end-to-end packet latency, throughput and fault tolerance
[Diagram: micro server - x86 CPUs behind PCIe2 - S-RIO2 NICs on a RapidIO switch, 20 Gbps S-RIO per port to the front panel]
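A minimal sketch of the look-aside caching idea above: a direct-mapped cache keyed by a hash of the content URL. In a real micro server the lookup would be driven by packets snooped from the RapidIO fabric; everything here (the cache layout, the stubbed backhaul fetch, the example URL) is illustrative, not IDT code.

```c
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

#define CACHE_SLOTS 1024

struct slot {
    char url[128];
    char *body;                 /* cached payload, NULL if empty */
};

static struct slot cache[CACHE_SLOTS];

static unsigned hash(const char *s)
{
    unsigned h = 5381;
    while (*s) h = h * 33 + (unsigned char)*s++;
    return h % CACHE_SLOTS;
}

/* Return cached content, or fetch over the backhaul (stubbed) and cache it. */
static const char *lookup(const char *url)
{
    struct slot *sl = &cache[hash(url)];
    if (sl->body && strcmp(sl->url, url) == 0)
        return sl->body;                       /* hit: no backhaul trip */

    free(sl->body);                            /* miss: fetch and fill  */
    sl->body = strdup("<fetched over backhaul>");
    snprintf(sl->url, sizeof(sl->url), "%s", url);
    return sl->body;
}

int main(void)
{
    lookup("http://example.com/video/seg1");   /* miss -> backhaul      */
    printf("second request: %s (served from local cache)\n",
           lookup("http://example.com/video/seg1"));
    return 0;
}
```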
RapidIO with x86 or ARM Server Architectures
● Supports both x86 and ARM based architectures
  ● Higher performance servers: x86 CPU with RapidIO data movement
  ● ARM CPU with power efficient, low cost, direct RapidIO interconnect
● Best-in-class end-to-end latency; switch latency around 100 ns
● Supports secured virtualization
● Supports any kind of topology: star, mesh, dual-star, hypercube, torus, etc.
● Superior to PCIe-to-InfiniBand and PCIe-to-Ethernet NIC options, which are not usable in wireless
[Diagram: base station linked over 20 Gbps per-port S-RIO front-panel connections to a micro server mixing x86 CPUs behind PCIe2 - S-RIO2 NICs and ARM CPUs with native RapidIO on a RapidIO switch]
S-RIO 10xN: 4G Multi-Mode Base Station
• RapidIO connects WCDMA and 4G LTE base stations: every call, app download, email and web page
• 20G RapidIO is in production now
• LTE and data usage are driving the need for more interconnect bandwidth
• Moore's Law cannot keep up: systems need more processors
• Need more inter-processor interconnect speed at low latency
• Backplane / inter-chassis scaling: RapidIO @ 40 Gbps
• Baseband subsystem @ 20-40 Gbps
• ASICs for WCDMA and CDMA when doing multi-mode
[Diagram: CPRI into the baseband subsystem, Freescale MSC8156 DSPs at 5-10 Gbps S-RIO per port, an ASIC for legacy protocols and a processor around a RapidIO 10xN switch, 40 Gbps S-RIO per port to the backplane/front panel, Ethernet to backhaul]
Building Blocks with RapidIO Gen2
IDT RapidIO Gen2 Portfolio Highlights
• RapidIO Gen2 supports 1.25, 2.5, 3.125, 5, 6.25 Gbaud
• 100 cm, 2-connector reach for backplane support
• Backward compatible with 1.3 switches and endpoints
• 240 Gbps: highest-performance backplane switches in the embedded industry
• PCIe2 to S-RIO2 protocol conversion bridge in development to expand RapidIO ecosystem options
• RapidIO 2 endpoint IP available
RapidIO 2 switches and bridges in production in wireless, military, industrial and video
CPS Gen2 Overview

Device      Ports                                    Package
CPS-1848    12x 20 Gbps, 18x 10 Gbps, 18x 5 Gbps     29x29 FCBGA
CPS-1432    8x 20 Gbps, 14x 10 Gbps, 14x 5 Gbps      25x25 FCBGA
CPS-1616    4x 20 Gbps, 8x 10 Gbps, 16x 5 Gbps       21x21 FCBGA

• Designed to S-RIO v1.3 and 2.1
• Up to 48 lanes: 12x4, 18x2, 18x1
• Up to 240 Gbps full-duplex non-blocking bandwidth
• Supports all RapidIO speeds: 1.25, 2.5, 3.125, 5, 6.25 Gbaud
• Cut-through latency 100 ns
• 40% power reduction per 10 Gbps vs. S-RIO 1.3
• Several switch-fabric related patents filed

Detailed Features
● High-performance SerDes
● Long reach: 100 cm, 2 connectors, with DFE support
● Transmit pre-emphasis and receive equalization
● As low as 300 mW per 10 Gbps of data
● Dynamic ingress and egress buffer management
● 40 multicast groups per port
● Supports cut-through and store-and-forward
● Error management extension support
● Error log: sequence of events in time
● Packet mirror, trace and filter support
● Receiver- and transmitter-based flow control
● Per-port reset mode; robust support for hot swap
● Multicast event control symbol generation input pin
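As an illustration of how such a switch is typically brought up, here is a hedged sketch of programming one routing-table entry through RapidIO maintenance writes. The maintenance_write() helper is hypothetical (driver and vendor specific), and the 0x70/0x74 offsets are the RapidIO standard route-configuration CSRs as recalled from the spec, not values taken from this deck; verify against the device datasheet before relying on them.

```c
#include <stdint.h>
#include <stdio.h>

/* Hypothetical helper: issue a RapidIO maintenance write to a switch.
 * In practice this would go through a host endpoint's driver
 * (e.g. over a PCIe-to-S-RIO bridge); not a real IDT API. */
static int maintenance_write(uint16_t sw_destid, uint8_t hopcount,
                             uint32_t offset, uint32_t value)
{
    printf("MAINT WR dest=%u hop=%u off=0x%02x val=0x%08x\n",
           (unsigned)sw_destid, (unsigned)hopcount,
           (unsigned)offset, (unsigned)value);
    return 0;
}

/* Assumed standard route-configuration CSR offsets (RapidIO spec). */
#define RTE_CONF_DESTID_SEL_CSR 0x70
#define RTE_CONF_PORT_SEL_CSR   0x74

/* Route packets for 'destid' out of switch port 'port'. */
static int set_route(uint16_t sw, uint8_t hop, uint16_t destid, uint8_t port)
{
    if (maintenance_write(sw, hop, RTE_CONF_DESTID_SEL_CSR, destid))
        return -1;
    return maintenance_write(sw, hop, RTE_CONF_PORT_SEL_CSR, port);
}

int main(void)
{
    /* Example: on the first-hop switch, send destination ID 0x0042
     * out of port 5. Values are illustrative. */
    return set_route(0xFFFF, 0, 0x0042, 5);
}
```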
Tsi721: PCIe to Serial RapidIO Bridge
[Block diagram: x4/x2/x1 PCIe Gen2 endpoint - PCIe-S-RIO bridge mapping engine, messaging engine, block DMA engine - S-RIO Gen2 endpoint, x4/x2/x1 S-RIO Gen2; JTAG, I2C, GPIO, clock, reset; 13x13 FCBGA]

Features
● Gen1 and Gen2 support: PCIe v2.1, S-RIO v2.1
● PCIe to S-RIO bridging
● Non-transparent operation for transaction mapping
● 8 DMA and messaging engines
● Single port, x1/x2/x4 @ 1.25, 2.5, 3.125, 5 Gbaud
● S-RIO 1.3 and 2.1 compliant
● PCI Express 1.1 and 2.1 compliant (endpoint)
● Can buffer up to 32 maximum-size S-RIO packets
● Full line-rate throughput for packets of 64 bytes and larger
● Low power: ~1.5-2 W typical
● Powers down unused lanes when used in x1 or x2 mode
● Lane swap and polarity inversion support
● Reach support: 60 cm over 2 connectors
● S-RIO and PCIe endpoint compatible clocking options: 100 MHz, 125 MHz, 156.25 MHz
● Forward bridge (must have a microprocessor on the PCIe side of the network)
● JTAG 1149.1 and 1149.6
● 13x13 mm FCBGA package
● Commercial and industrial variants

Connect PCIe processors to S-RIO networks for superior performance over 10 GbE and InfiniBand
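A minimal sketch of how a host might push a buffer to a remote node through such a bridge. The rio_* calls are hypothetical stubs standing in for whatever driver interface is in use (for example the Linux RapidIO subsystem or a vendor SDK); they are not Tsi721 or IDT APIs from this deck.

```c
#include <stdint.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical driver wrappers: stubs that just log what a real
 * implementation would do. Not a real Tsi721/IDT API. */
static int rio_dma_write(uint16_t destid, uint64_t rio_addr,
                         const void *buf, size_t len)
{
    printf("DMA write %zu bytes to destID 0x%04x @ 0x%llx\n",
           len, destid, (unsigned long long)rio_addr);
    (void)buf;
    return 0;
}

static int rio_send_doorbell(uint16_t destid, uint16_t info)
{
    printf("Doorbell 0x%04x to destID 0x%04x\n", info, destid);
    return 0;
}

int main(void)
{
    uint8_t frame[4096];
    memset(frame, 0x5A, sizeof(frame));

    /* Push a buffer into a memory window on node 0x0010, then ring a
     * doorbell so the remote CPU knows data has landed. With a bridge
     * like the Tsi721 the block DMA engine moves the data; the host
     * CPU only posts the descriptor. */
    if (rio_dma_write(0x0010, 0x20000000ULL, frame, sizeof(frame)) == 0)
        rio_send_doorbell(0x0010, 0x0001);

    return 0;
}
```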
IDT RapidIO Roadmap

Switch products:
• CPS-1848 (18x1, 18x2, 12x4) - production
• CPS-1432 (14x1, 14x2, 8x4) - production
• CPS-1616 and SPS-1616 (16x1, 8x2, 4x4) - production
• S-RIO 10xN switches - in development

Bridge products:
• Tsi721 PCIe2 to S-RIO2 bridge - production
• PCIe3 to S-RIO 10xN bridge - in development

Endpoint IP:
• S-RIO Gen2 IP
• S-RIO 10xN IP, 40G per port

Ecosystem:
• Eval boards: SRDP2 1848/1616, Tsi721 PCIe to S-RIO eval board
• Partner AMCs: Commagility Xilinx V6 AMC with CPS-1848, TI Gen2 DSP EVM AMC with CPS-1848
• Software: RapidIO Linux SW; S-RIO 10xN Linux, Windows and VxWorks
• Specifications: S-RIO 10xN 10/20/40/80/160G specification; RapidIO Gen2 modeling

(Roadmap legend: production / development / concept)
RapidIO 10xN in Wireless and C-RAN
• Today 100% of the 4G OEMs use RapidIO for baseband interconnect, with over 6 million switches shipped
• 4G technologies are driving the need for inter-processor communication
• Wireless and server worlds are converging
• Today RapidIO leads the market with 20 Gbps embedded fabric interconnect
• Tier 1 customers worldwide are pushing IDT for 40 Gbps
• IDT is developing S-RIO 10xN switches, bridges and endpoint IP @ 40 Gbps per port
• Peer-to-peer scalable interconnect for wireless, cloud, imaging, military, industrial and medical
• 10.3125 Gbaud per lane, 40 Gbps per port embedded interconnect
• 100 ns latency with 5x the effective bandwidth of 10 GigE for embedded systems
• <300 mW per 10 Gbps of data
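The per-port figure follows from the lane rate and line coding; a quick check assuming the 64b/67b coding used by the RapidIO 10xN physical layer (an assumption from the Gen3 spec, not stated on the slide):

```c
#include <stdio.h>

/* Check of "10.3125 Gbaud per lane, 40 Gbps per port", assuming
 * 64b/67b line coding for the 10xN PHY. */
int main(void)
{
    const double lane_baud = 10.3125;        /* Gbaud per lane */
    const double coding    = 64.0 / 67.0;    /* 64b/67b        */
    const int    lanes     = 4;              /* x4 port        */

    printf("x4 port data rate: %.1f Gbps\n", lane_baud * coding * lanes);
    return 0;   /* ~39.4 Gbps, i.e. the nominal 40 Gbps port */
}
```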
Backup Slides
Devashish Paul, Senior Product Manager, IDT; Chairman, RapidIO Trade Association Marketing Council
Oct 2013
RapidIO vs. Ethernet Latency Path
[Diagram: processors with memory attached either through PCI Express (I/O interconnect) and Ethernet NICs to an Ethernet switch, or directly to a RapidIO switch]
Ethernet LAN interconnect with NIC (long latency path): NIC required, high footprint, higher power, software intervention
RapidIO direct processor interconnect (short latency path): no NIC, embedded fabric, low power, native hardware termination
Scale PCIe with RapidIO: Two-x86 Compute Node with PCIe to RapidIO NIC and Switching
• 13x13 mm NIC, 25x25 mm 32-lane switch
• 2 W per NIC, 4 W per switch
• Total interconnect power 8 W (typ.)
• $49/NIC, $80/switch
• Total interconnect cost per compute node: $65
[Diagram: two x86 CPUs, each attached over PCIe to a PCIe to S-RIO NIC, into an S-RIO switch]
Low energy, high density, 20 Gbps, infinite scaling
Compare to PCIe and Ethernet 2-Node Compute
[Diagram: micro server building block with PCIe only - two x86 CPUs joined by a PCIe NTB switch and HBA; micro server building block with Ethernet only - two x86 CPUs, each through a PCIe 10GbE NIC, into Ethernet switches]
Scale-out limitation with PCIe; bandwidth and latency limits with Ethernet