Transcript
Advanced Computer Networks 263-3501-00
Network I/O Virtualization
Patrick Stuedi, Ankit Singla, Desislava Dimitrova
Spring Semester 2016
© Oriana Riva, Department of Computer Science | ETH Zürich
Outline
Last week: Data center routing & addressing (PortLand, VL2, TRILL)
Today: Network I/O virtualization: paravirtualization, SR-IOV
Processor Clock Frequency Scaling Has Ended
Three decades of exponential clock-rate (and electrical power!) growth have ended, yet Moore's Law continues in transistor count. What do we do with all those transistors to keep performance increasing to meet demand?
Industry response: multicore, i.e., double the number of cores every 18 months instead of the clock frequency (and power).
Source: "The Landscape of Computer Architecture," John Shalf, NERSC/LBNL, presented at ISC07, Dresden, June 25, 2007
Virtualization and Hypervisors
[Diagram: three VMs (VM1, VM2, VM3), each running a guest application on a guest operating system, all on top of a hypervisor that runs on the hardware]
Data Transfer, Non-Virtualized
[Diagram: the application issues a syscall into the OS (TCP/IP stack and driver); the driver uses privileged instructions to control the NIC]
1) Application: syscall, e.g., socket.write()
2) OS driver: issue PCI commands
   ● Set up DMA operation
3) NIC:
   ● transmit data
   ● raise interrupt when done
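To make step 1 concrete, here is a minimal C sketch of the application side: a single write() syscall hands the data to the kernel, and everything below it (TCP/IP, driver, DMA, interrupt) is invisible to the application. The destination address and port are placeholders for the example.

/* Minimal sketch of step 1: the application hands data to the OS with a
   syscall; TCP/IP, the driver, DMA, and the interrupt are invisible to it.
   Address and port are placeholders. */
#include <arpa/inet.h>
#include <string.h>
#include <sys/socket.h>
#include <unistd.h>

int main(void)
{
    int fd = socket(AF_INET, SOCK_STREAM, 0);

    struct sockaddr_in dst = { .sin_family = AF_INET, .sin_port = htons(5000) };
    inet_pton(AF_INET, "192.0.2.1", &dst.sin_addr);   /* example address */
    connect(fd, (struct sockaddr *)&dst, sizeof(dst));

    const char *msg = "hello";
    write(fd, msg, strlen(msg));   /* the syscall: traps into the kernel,
                                      which runs TCP/IP and the NIC driver */
    close(fd);
    return 0;
}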
Virtualization and Hypervisors
[Diagram: three VMs, each with a guest application on a guest operating system, on top of the hypervisor and hardware]
How does hypervisor network access work?
Option 1: Full Device Emulation
Guest OS unaware that it is being virtualized
Hypervisor emulates the device at the lowest level
Privileged instructions from the guest driver trap into the hypervisor
Advantage: no changes to the guest OS required
Disadvantages: inefficient, complex
[Diagram: application and guest OS run unmodified; privileged instructions trap into the device emulation in the hypervisor, which accesses the hardware]
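To illustrate the trap-and-emulate path, here is a purely hypothetical C sketch of a hypervisor-side handler; none of the names (handle_io_exit, emulated_nic, the register offsets) come from a real hypervisor, they only show the idea that every privileged register access by the unmodified guest driver is emulated in software.

/* Purely illustrative sketch of full device emulation: each privileged I/O
   access by the unmodified guest driver causes a VM exit, and the hypervisor
   emulates the register access in software. All names are hypothetical. */
#include <stdint.h>

struct emulated_nic {          /* software model of the NIC's registers */
    uint32_t tx_ring_addr;
    uint32_t tx_doorbell;
};

/* Called by the hypervisor after a VM exit caused by a guest I/O write. */
void handle_io_exit(struct emulated_nic *nic, uint32_t reg, uint32_t value)
{
    switch (reg) {
    case 0x00:                 /* guest wrote the TX ring base address */
        nic->tx_ring_addr = value;
        break;
    case 0x04:                 /* guest rang the TX doorbell */
        nic->tx_doorbell = value;
        /* ...the hypervisor now reads the guest's descriptors, sends the
           packet on a real NIC, and later injects a virtual interrupt... */
        break;
    }
}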
Option 2: Paravirtualization
Guest OS aware that it is being virtualized
Runs special paravirtual device drivers
Hypervisor cooperates with the guest OS through paravirtual interfaces
Advantages: better performance, simple
Disadvantage: requires changes to the guest OS
[Diagram: the guest OS's paravirtual driver talks through paravirtual interfaces to the device emulation in the hypervisor, which accesses the hardware]
Paravirtualization with VirtIO
VirtIO: I/O virtualization framework for Linux
Framework for developing paravirtual drivers
Split driver model: front-end and back-end drivers
APIs for front-end and back-end to communicate
[Diagram: front-end drivers in the guest OS talk over virtio to back-end drivers and device emulation in the hypervisor (e.g., the KVM or lguest hypervisor), which runs on the hardware]
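In the Linux kernel, a front-end such as virtio-net publishes buffers into a virtqueue (e.g., with virtqueue_add_outbuf() and virtqueue_kick()). As a self-contained, hedged sketch of the split-driver idea outside the kernel, the following uses a simplified shared ring with hypothetical names; it is not the real virtio ring layout.

/* Conceptual sketch of the split-driver model: front-end and back-end share
   a descriptor ring; the front-end publishes buffers and "kicks", the
   back-end consumes them. Single-producer/single-consumer; all names are
   hypothetical, not the actual virtio layout. */
#include <stdint.h>

#define RING_SIZE 256

struct desc { uint64_t addr; uint32_t len; };   /* guest-physical buffer */

struct ring {
    struct desc desc[RING_SIZE];
    volatile uint32_t avail_idx;   /* written by the front-end */
    volatile uint32_t used_idx;    /* written by the back-end  */
};

/* Front-end (guest driver): publish a buffer, then notify the back-end. */
static void frontend_send(struct ring *r, uint64_t buf_gpa, uint32_t len,
                          void (*kick)(void))
{
    r->desc[r->avail_idx % RING_SIZE] = (struct desc){ buf_gpa, len };
    r->avail_idx++;          /* a real driver adds memory barriers here */
    kick();                  /* trap to the hypervisor, e.g., a register write */
}

/* Back-end (hypervisor side): drain newly published buffers. */
static void backend_poll(struct ring *r, void (*emit)(uint64_t, uint32_t))
{
    while (r->used_idx != r->avail_idx) {
        struct desc d = r->desc[r->used_idx % RING_SIZE];
        emit(d.addr, d.len);         /* e.g., write the frame to a tap device */
        r->used_idx++;
    }
}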
Example: KVM Hypervisor
Based on Intel VT-x: an additional guest execution mode (guest user, guest kernel), alongside the host's user mode and kernel mode
Each VM runs inside a single Linux process: QEMU
Starting a new guest = starting a QEMU process
The QEMU process interacts with KVM through ioctl() on /dev/kvm to:
  Allocate memory for the guest
  Start the guest
  ...
I/O in the guest OS traps into KVM (VM Exit); KVM schedules the QEMU process to emulate the I/O operation
[Diagram: VM1 (guest application + guest OS) executes in guest mode; the QEMU process and other user applications run in user mode; the KVM module in the kernel handles VM Enter / VM Exit and exposes /dev/kvm]
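A hedged sketch of that /dev/kvm interaction, reduced to the ioctls named on the slide. No registers are set up and no guest code is loaded, so this is a skeleton that returns on the first VM exit rather than a bootable VM; error handling is omitted.

/* Minimal sketch of how a userspace VMM (like QEMU) talks to KVM. */
#include <fcntl.h>
#include <linux/kvm.h>
#include <stdio.h>
#include <sys/ioctl.h>
#include <sys/mman.h>

int main(void)
{
    int kvm = open("/dev/kvm", O_RDWR);               /* handle to the KVM module */
    printf("KVM API version: %d\n", ioctl(kvm, KVM_GET_API_VERSION, 0));

    int vm = ioctl(kvm, KVM_CREATE_VM, 0);            /* one VM = one file descriptor */

    /* Back 64 KiB of "guest-physical" memory with anonymous host memory. */
    void *mem = mmap(NULL, 0x10000, PROT_READ | PROT_WRITE,
                     MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
    struct kvm_userspace_memory_region region = {
        .slot = 0,
        .guest_phys_addr = 0,
        .memory_size = 0x10000,
        .userspace_addr = (unsigned long)mem,
    };
    ioctl(vm, KVM_SET_USER_MEMORY_REGION, &region);   /* "allocate memory for guest" */

    int vcpu = ioctl(vm, KVM_CREATE_VCPU, 0);         /* one vCPU */

    /* The shared kvm_run area is where KVM reports the reason for each VM exit. */
    int run_size = ioctl(kvm, KVM_GET_VCPU_MMAP_SIZE, 0);
    struct kvm_run *run = mmap(NULL, run_size, PROT_READ | PROT_WRITE,
                               MAP_SHARED, vcpu, 0);

    /* A real VMM would load guest code and set registers before this point.
       KVM_RUN enters guest mode and returns on the next VM exit. */
    ioctl(vcpu, KVM_RUN, 0);
    printf("exit reason: %u\n", run->exit_reason);
    return 0;
}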
VirtIO and KVM
[Diagram: the VirtIO-Net driver (front-end) in the guest OS and the VirtIO back-end in QEMU share VirtIO tx/rx memory; the KVM module mediates between them; the back-end reaches the real NIC through a tap device]
1) VirtIO-Net driver adds the packet to shared VirtIO memory
2) VirtIO-Net driver causes a trap into KVM
3) KVM schedules the QEMU VirtIO back-end
4) VirtIO back-end gets the packet from shared VirtIO memory and emulates the I/O (via a system call)
5) KVM resumes the guest
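Step 4's "emulates the I/O via a system call" typically means writing the frame into a tap device on the host. A minimal sketch; the interface name "tap0" and the dummy frame are assumptions, and error handling is minimal.

/* Minimal sketch of the back-end's host-side I/O: open a tap interface and
   write one Ethernet frame into it. */
#include <fcntl.h>
#include <linux/if.h>
#include <linux/if_tun.h>
#include <string.h>
#include <sys/ioctl.h>
#include <unistd.h>

int main(void)
{
    int fd = open("/dev/net/tun", O_RDWR);

    struct ifreq ifr;
    memset(&ifr, 0, sizeof(ifr));
    ifr.ifr_flags = IFF_TAP | IFF_NO_PI;           /* raw Ethernet frames */
    strncpy(ifr.ifr_name, "tap0", IFNAMSIZ - 1);   /* assumed interface name */
    if (ioctl(fd, TUNSETIFF, &ifr) < 0) {
        close(fd);
        return 1;
    }

    unsigned char frame[64] = { 0xff, 0xff, 0xff, 0xff, 0xff, 0xff };  /* dummy frame */
    write(fd, frame, sizeof(frame));   /* the "system call" of step 4 */

    close(fd);
    return 0;
}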
Vhost: Improved VirtIO Back-End
Vhost puts the VirtIO emulation code into the kernel (vhost-net), instead of performing system calls from user space (QEMU)
[Diagram: the guest's VirtIO-Net driver, QEMU, and the KVM module; the tx/rx rings are serviced by the vhost-net module in the hypervisor kernel, which forwards packets to the real NIC through a tap device]
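A hedged sketch of the setup a VMM performs to hand a queue to the in-kernel vhost-net backend. The guest memory table and ring setup (VHOST_SET_MEM_TABLE, VHOST_SET_VRING_*) are elided, and the tap file descriptor is assumed to exist already.

/* Hedged sketch: hand one virtio-net queue to the in-kernel vhost-net
   backend, so packets move between the ring and the tap device without
   exiting to QEMU. Error handling omitted. */
#include <fcntl.h>
#include <linux/vhost.h>
#include <sys/ioctl.h>

int attach_vhost_backend(int tap_fd)
{
    int vhost = open("/dev/vhost-net", O_RDWR);
    if (vhost < 0)
        return -1;

    /* Claim this vhost instance for the calling process. */
    ioctl(vhost, VHOST_SET_OWNER, NULL);

    /* A full VMM would describe guest memory and the virtqueue rings here
       (VHOST_SET_MEM_TABLE, VHOST_SET_VRING_NUM/ADDR/KICK/CALL). */

    /* Point queue index 0 at the tap device. */
    struct vhost_vring_file backend = { .index = 0, .fd = tap_fd };
    ioctl(vhost, VHOST_NET_SET_BACKEND, &backend);
    return vhost;
}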
Where are we?
Option 1: Full emulation: no changes to the guest required; complex; inefficient
Option 2: Paravirtualization: requires special guest drivers; enhanced performance. Not good enough! Still requires hypervisor involvement, e.g., interrupt relaying
Option 3: Passthrough: directly assign the NIC to the VM; no hypervisor involvement, hence best performance
Passthrough / Direct Assignment
[Diagram: three VMs (guest application + guest OS) on a hypervisor; guest OSes with native device drivers access dedicated Ethernet NICs directly through a safe hardware interface, bypassing the hypervisor]
Paravirtual vs Passthrough in KVM
[Diagram: VM1 uses the paravirtual VirtIO-Net driver, with tx/rx rings serviced by vhost-net and a tap device in front of a real NIC; VM2 runs a physical driver against a real NIC that is exclusively assigned to it]
Challenges with Passthrough / Direct Assignment
VM tied to specific NIC hardware
  Makes VM migration more difficult
VM driver issues DMA requests using VM addresses
  Incorrect: VM physical addresses are host virtual addresses (!)
  Security concern: the addresses may belong to another VM
  Potential solution: let the VM translate its physical addresses to real DMA addresses
    - Still a safety problem: exposes driver details to the hypervisor; bugs in the driver could result in incorrect translations
  Solution: use an IOMMU to translate/validate DMA requests from the device
Need a different NIC for each VM
  Solution: SR-IOV, emulate multiple NICs at the hardware level
Memory Address Terminology
Virtual address: an address in some virtual address space of a process running in the guest OS
Physical address: a hardware address as seen by the guest OS, i.e., a physical address in the virtual machine
Machine address: the real hardware address on the physical machine, as seen by the hypervisor
IOMMU
[Diagram: a PCIe function (e.g., a NIC) issues DMA requests that pass through the IOMMU and the memory controller on their way to main memory]
1) VMM programs the IOMMU with VM-physical to machine address translations
2) Guest OS programs the NIC with the VM-physical address of the DMA buffer
3) NIC issues a DMA request to VM physical memory
4) IOMMU checks the request and translates it to a machine (real) address for the transfer
5) Memory controller accesses memory
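On Linux, step 1 (the VMM programming the IOMMU) is typically done through the VFIO interface. A hedged sketch: the IOMMU group number "26", the sizes, and the chosen IOVA are assumptions, and error handling is omitted.

/* Hedged sketch of programming the IOMMU from a userspace VMM via VFIO:
   map one region of VM-physical address space (IOVA) to host memory so a
   passed-through device may DMA into it. */
#include <fcntl.h>
#include <linux/vfio.h>
#include <sys/ioctl.h>
#include <sys/mman.h>

int main(void)
{
    int container = open("/dev/vfio/vfio", O_RDWR);
    int group = open("/dev/vfio/26", O_RDWR);           /* assumed IOMMU group */

    ioctl(group, VFIO_GROUP_SET_CONTAINER, &container);
    ioctl(container, VFIO_SET_IOMMU, VFIO_TYPE1_IOMMU);  /* type-1 IOMMU backend */

    /* Host memory that will back 1 MiB of the guest's "physical" memory. */
    void *mem = mmap(NULL, 1 << 20, PROT_READ | PROT_WRITE,
                     MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);

    struct vfio_iommu_type1_dma_map map = {
        .argsz = sizeof(map),
        .flags = VFIO_DMA_MAP_FLAG_READ | VFIO_DMA_MAP_FLAG_WRITE,
        .vaddr = (unsigned long)mem,   /* host address backing the mapping */
        .iova  = 0x100000,             /* VM-physical address the NIC will use */
        .size  = 1 << 20,
    };
    ioctl(container, VFIO_IOMMU_MAP_DMA, &map);  /* IOMMU now translates and validates */
    return 0;
}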
SR-IOV: Single-Root I/O Virtualization
Key idea: dynamically create new "PCI devices"
  Physical Function (PF): original device, full functionality
  Virtual Function (VF): extra device, limited functionality
  VFs created/destroyed via PF registers
For networking: partitions a network card's resources; with direct assignment, this can implement passthrough
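On Linux hosts, VFs are typically requested through the standard sriov_numvfs sysfs attribute of the physical function. A minimal sketch; the interface name "eth0" is an assumption, and in practice this is often just a shell echo or done by a management daemon.

/* Minimal sketch: ask an SR-IOV capable NIC's physical function to spawn
   virtual functions via sysfs. Each VF appears as a new PCI device that can
   then be directly assigned to a VM (e.g., via VFIO). Writing 0 removes them. */
#include <stdio.h>

int main(void)
{
    FILE *f = fopen("/sys/class/net/eth0/device/sriov_numvfs", "w");  /* assumed NIC */
    if (!f) {
        perror("sriov_numvfs");
        return 1;
    }
    fprintf(f, "4\n");   /* request four virtual functions */
    fclose(f);
    return 0;
}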
SR-IOV in Action
[Diagram: three guests (guest application + guest OS), each with a physical driver bound directly to a PCI Virtual Function; the hypervisor's driver manages the Physical Function; DMA passes through the IOMMU; a Virtual Ethernet Bridge/Switch inside the SR-IOV NIC connects the Physical Function and the Virtual Functions]
Example: SolarFlare
Inter-VM Communication
[Diagram: three VMs (guest application + guest OS) on top of the hypervisor, hardware, and NIC]
How does inter-VM communication work?
Switch in Hypervisor
[Diagram: three guest OSes (each with a guest application) connect to a software bridge/switch inside the hypervisor, which forwards traffic to the NIC]
Switched Vhost in KVM
Advantage: low latency (1 software copy)
Disadvantage: uses host CPU cycles
[Diagram: three guests, each with a VirtIO-Net driver backed by its own vhost-net instance and tap device; the tap devices attach to a software bridge in the KVM host, which connects to the real NIC]
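The bridge in that picture is an ordinary Linux software bridge. A hedged C sketch of how it can be created and a tap device attached to it (tools like brctl or ip do exactly this on the hypervisor's behalf); the names "br0" and "tap0" are assumptions, and it must run with root privileges.

/* Hedged sketch: create a Linux bridge and attach one tap device to it. */
#include <linux/sockios.h>
#include <net/if.h>
#include <string.h>
#include <sys/ioctl.h>
#include <sys/socket.h>

int main(void)
{
    int s = socket(AF_LOCAL, SOCK_STREAM, 0);   /* any socket fd works for these ioctls */

    ioctl(s, SIOCBRADDBR, "br0");               /* create the software bridge */

    struct ifreq ifr;
    memset(&ifr, 0, sizeof(ifr));
    strncpy(ifr.ifr_name, "br0", IFNAMSIZ - 1);
    ifr.ifr_ifindex = if_nametoindex("tap0");   /* the guest's tap device */
    ioctl(s, SIOCBRADDIF, &ifr);                /* attach tap0 to br0 */

    return 0;
}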
Switch Externally...
...either in the NIC or in an external switch
External switch:
  Simplifies configuration: all switching controlled/configured by the network
  Latency = 2x DMA + 2 hops
Switch in the NIC:
  Latency = 2x DMA
[Diagram: three guest OSes on the hypervisor; traffic leaves through the NIC and is either hairpinned by the NIC itself or switched by an external switch]
Controversial
External switching in the NIC or switch:
  Extra latency
  Reduces CPU requirements
  Hardware vendors like it
  Better TCAMs on the switch
  Integration with network management policies
Software switching in the hypervisor:
  Lower latency
  Higher CPU consumption, but software switches have become more efficient over the last years
  CPU resources are generic and flexible
  Easy to upgrade
  Full OpenFlow support
Moral
Network interface cards were traditionally the "end point"; we now have at least two more hops:
  Virtual switch in the NIC
  Virtual switch in the hypervisor
The inside of a physical machine increasingly resembles a network
References
I/O Virtualization, Mendel Rosenblum, ACM Queue, January 2012
Kernel-based Virtual Machine Technology, Yasunori Goto, Fujitsu Scientific & Technical Journal, July 2011
VirtIO: Towards a De-Facto Standard For Virtual I/O Devices, Rusty Russell, ACM SIGOPS Operating Systems Review, July 2008