Transcript
New networking features & tools for Red Hat Enterprise Linux 7 beta
Eric Dubé, Networking Technology Product Manager, Red Hat
Rashid Khan, Manager, Software Engineering, Red Hat
Agenda • Network Management • Link Aggregation • Virtualization, Container, & Overlay Networking Technologies • Network Performance • Security • Precision Time Synchronization • Diagnostics • Partner Ecosystem & Summary • Questions?
Network Management
NetworkManager • Easy to use yet comprehensive network management suite designed to provide painless network configuration. • Eliminates the need to edit network configuration files by hand. • Flexible, unified interface with GUI, CLI, and TUI options for managing local, remote, or even headless systems. • Supports a broad array of common network interface types: • Ethernet, IPoIB, VLANs, Bridges, Bonds, Teams, WiFi, WiMAX, WWAN, Bluetooth, VPN, and ATM-based DSL.
Numerous improvements for RHEL 7 beta aimed at usability, interoperability and accessibility, including: • New command line user interface with command tab-completion (nmcli) • New curses-based, menu driven text user interface (nmtui) • Cooperates with existing interface configurations and non-destructively takes over an interface's existing configuration • Recognizes live reconfiguration for changes made outside of NM without requiring a restart • All interfaces now provide support for IP Address aliases
nmcli: examples

List active connections
# nmcli connection show
NAME               UUID                     TYPE             DEVICE
Local Lan          4d5c449a-a6c5-451c-8206  802-3-ethernet   eth1
MyWiFi             91451385-4eb8-4080-8b82  802-11-wireless  wlan0
Bond connection 1  720aab83-28dd-4598-9325  bond             bond0

Adding a connection
# nmcli connection add con-name "Local LAN" ifname eth1 type ethernet ip4 192.168.1.2/24 gw4 192.168.1.1

Show configuration details for a connection
# nmcli connection show "Local LAN"
connection.id:             Local LAN
connection.uuid:           bdd2eb8e-bc67-468e-97b5-e6e1dc8942f8
connection.interface-name: eno16777736
connection.type:           802-3-ethernet
connection.autoconnect:    yes
connection.timestamp:      0
connection.read-only:      no
…

Show available wifi networks and details
# nmcli dev wifi list
SSID      MODE   CHAN  RATE     SIGNAL  BARS  SECURITY
MyCafe    Infra  11    54 MB/s  39      ▂▄__  WPA2
NextDoor  Infra  1     54 MB/s  27      ▂___  WPA2

Modifying a connection to auto start
# nmcli connection mod eth1 connection.autoconnect yes
Please see the RHEL 7 beta Networking Guide for more examples!
nmtui: screenshots
Link Aggregation
Team Driver • Mechanism for bonding multiple network devices (ports) into a single logical interface at the data link layer (L2) that provides an increase in maximum bandwidth and link redundancy. • Alternative to the existing Linux Bonding driver that provides a number of advantages over traditional bonding while still providing equal or even slightly better performance. • Implemented mostly in user space with only the necessary data fast-paths in the kernel. • Moves most of the work and logic into a user space daemon making it: • more stable • easier to debug
• much simpler to extend
• Supports IEEE 802.3ad (IEEE 802.1AX) LACP + many proprietary standards. • Team configurations are based on the JSON format. • Managed from either NetworkManager or the traditional initscripts infrastructure.
[Diagram: servers with ports eth0 and eth1 aggregated into team0, uplinked to redundant network switches]
Team Driver: example configuration

# /etc/sysconfig/network-scripts/ifcfg-team0:
DEVICE="team0"
DEVICETYPE="Team"
ONBOOT="yes"
BOOTPROTO=none
NETMASK=255.255.255.0
IPADDR=192.168.23.11
TEAM_CONFIG='{"runner": {"name": "roundrobin"}}'
NM_CONTROLLED="no"

For each port device, create an ifcfg config similar to the following one:

# /etc/sysconfig/network-scripts/ifcfg-eth1:
DEVICE="eth1"
DEVICETYPE="TeamPort"
ONBOOT="yes"
TEAM_MASTER="team0"
NM_CONTROLLED="no"
Team Driver: NetworkManager
Virtualization, Container, & Overlay Networking Technologies
Multiqueue support for Virtio-net • Enables packet sending/receiving processing to scale with the number of available virtual CPUs in a guest. • Each guest virtual CPU can have its own separate transmit and receive queue and interrupts that can be used without influencing other virtual CPUs. • Provides better application scalability and improved network performance in many cases.
• To enable, add the multiqueue setting to the <interface> block of your Libvirt XML configuration file.
• Enabled from guest VM using Ethtool:
# ethtool -L eth0 combined 4
[Diagram: guest vCPUs (vcpu0, vcpu1) with per-CPU rx/tx queue pairs, mapped through vhost/qemu sockets (sock0, sock1) to a tap device on a bridge]
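The `<interface>` XML snippet from the slide did not survive transcription. A hedged sketch of what a multiqueue virtio-net definition typically looks like in libvirt (the network name and queue count here are illustrative, not from the slide):

```xml
<!-- Illustrative only: enable 4 virtio-net queue pairs for a guest NIC -->
<interface type='network'>
  <source network='default'/>
  <model type='virtio'/>
  <driver name='vhost' queues='4'/>
</interface>
```

The queue count is normally matched to the number of guest vCPUs, and the guest then activates the queues with the ethtool command shown above.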
Single Root I/O Virtualization (SR-IOV) • Allows a device, such as a network adapter, to separate access to its resources among various PCIe hardware functions: Physical Function (PF) and one or more Virtual Functions (VF) • Enables network traffic to bypass the software layer of the hypervisor and flow directly between the VF and the virtual machine. • Near line-rate performance without the need to dedicate a separate NIC to each individual virtual machine. • For RHEL 7 beta, the number of available SR-IOV Virtual Functions has been increased (up to 128) for capable network adapters and driver support has also been expanded to cover more devices. • Full Support Drivers • Broadcom bnx2x • Emulex be2net • Intel igb/igbvf, ixgbe/ixgbevf, i40e/i40evf
• Tech Preview Drivers • Chelsio cxgb4/cxgb4vf • Mellanox mlx4_en/mlx4_ib • Qlogic qlcnic
Network Namespaces • Lightweight container-based virtualization allows virtual network stacks to be associated with a process group. • Creates an isolated copy of the networking data structures such as the interface list, sockets, routing table, /proc/net directory, port numbers, and so on. • Managed through the iproute2 (ip netns) interface:

Show the list of current named network namespaces
# ip netns list

Create a network namespace and name it vpn
# ip netns add vpn

Bring up the loopback interface in the vpn network namespace
# ip netns exec vpn ip link set lo up

Report as network namespaces are added and deleted
# ip netns monitor
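As a concrete illustration of running services on overlapping addresses, a minimal sketch using a veth pair (requires root; the namespace and interface names here are hypothetical, not from the slide):

```shell
# Illustrative sketch (requires root): give a namespace its own interface and address
ip netns add blue
ip link add veth-blue type veth peer name veth-blue-host   # create a veth pair
ip link set veth-blue netns blue                           # move one end into the namespace
ip netns exec blue ip addr add 10.10.10.1/24 dev veth-blue
ip netns exec blue ip link set veth-blue up
ip netns exec blue ip link set lo up
# A second namespace ("red") set up the same way can independently bind 10.10.10.1:80
```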
• Use Cases:
• Isolated network space for application development.
• Overlapping IP ranges for multi-tenancy hosting.
• Running multiple applications on the same host with identical port number binding requirements.
[Diagram: a server hosting a Blue and a Red web service, each bound to 10.10.10.1:80 in its own namespace; veth pairs (veth_blue, veth_red) connect them through Blue/Red bridges to VLAN sub-interfaces eth0.10 and eth0.20 on a trunk port (VLANs 10, 20), reaching a Blue client and a Red client that both use 10.10.10.2]
Control Groups (cgroups) • Allows for resource allocation (such as CPU time, system memory, network bandwidth, disk I/O, or combinations of these resources) among user-defined groups of processes running on a system. • Cgroups provide: • Resource Limiting: Groups can be set to not exceed a set memory limit. • Prioritization: Some groups may get a larger share of CPU, network, or disk I/O throughput. • Accounting: Measure the resources certain groups use (e.g. for billing purposes). • Control: Freezing groups or checkpointing and restarting.
• Improvements for RHEL 7 beta include: • Per-Control Group TCP Buffer Limits • Memory pressure controls for TCP designed to limit buffer sizes (which hold packet data as it passes through a socket) preventing them from getting too large.
• Network Priority Control Group • Allows an administrator to dynamically set the priority of egress network traffic on a given interface generated by various applications.
Overlay Networking Technologies • Virtual Extensible LAN (VXLAN) • New support for the VXLAN encapsulation protocol for running an overlay network on top of existing infrastructure to support elastic compute architectures. • TCP/IP VXLAN offload and VXLAN GRO. • Hardware checksum and segmentation offloading support. • Measured ~38Gbps using a 40GbE NIC!
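A VXLAN overlay interface can be created with iproute2; a minimal sketch (requires root; the VNI, multicast group, addresses, and device names here are illustrative, not from the slide):

```shell
# Illustrative: VXLAN interface with VNI 42 over eth0, flooding via multicast group 239.1.1.1
ip link add vxlan0 type vxlan id 42 group 239.1.1.1 dev eth0 dstport 4789
ip addr add 10.20.30.1/24 dev vxlan0
ip link set vxlan0 up
```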
• Generic Routing Encapsulation (GRE) • Support for carrying GRE frames over IPv6 in addition to IPv4. • Hardware checksum offload support using GSO/GRO.
• Layer 2 Tunneling Protocol (L2TP) • Support for carrying L2TP frames over UDP on top of IPv6 in addition to IPv4. • Encapsulation support for frames directly over IPv6 (non-UDP based).
Open vSwitch • Multi-layer software switch intended to be used in place of the existing Linux software bridge, designed to forward traffic between virtual machines and physical or logical networks. • Supports application and tenant traffic isolation using overlay networking technologies (GRE, VXLAN) and 802.1Q VLAN tagging. • Highlights: • Multi-threaded user space switching daemon for increased scalability. • Support for wildcard flows in the kernel data path, which can significantly reduce the size of the flow tables, avoid unnecessary flow misses, and optimize flow setup rate.
[Diagram: multiple VMs attached to an Open vSwitch bridge]
• Supports GRE and VXLAN encapsulation including kernel based hardware offload. • SCTP support.
• Supported on Red Hat Enterprise Linux OpenStack Platform and Red Hat Enterprise Virtualization product offerings. • For testing and development purposes, the user-space packages for RHEL 7 beta can be obtained from Fedora's RDO OpenStack Icehouse repository.
• Security: VLAN isolation, encapsulation, traffic filtering
• Monitoring: Netflow, sFlow, SPAN, RSPAN
• QoS: Traffic queuing and traffic shaping
• Automated Control: OpenFlow, OVSDB management protocol
Network Performance
Next Generation Networking Hardware Support • 40G Ethernet (IEEE 802.3ba) • Provides support for 40G Ethernet link speeds enabling faster network communication for applications and systems. • Ethtool will report interface link speeds up to 40G data rates. • 40G Capable Network Drivers • Chelsio cxgb4; Emulex be2net; Intel i40e; Mellanox mlx4_en; Solarflare sfc
• WiGig 60 GHz Band (IEEE 802.11ad) • Allows devices to wirelessly communicate at multi-gigabit speeds (up to 7 Gbps.) • Nearly 50 times faster than the 802.11n specification! • 802.11ad Capable Wireless Network Drivers • Atheros WIL6210
TCP Performance and Latency Improvements • TCP Fast Open (both client and server-side) • Experimental TCP extension designed to reduce the overhead when establishing a TCP connection by eliminating one round-trip time (RTT) from certain kinds of TCP conversations. • Useful for accelerating HTTP connection handshaking resulting in speed improvements of between 4% and 41% in the page load times for busy web sites. • TCP Tail Loss Probe (TLP) Algorithm • Experimental algorithm improves the efficiency of how the TCP networking stack deals with lost packets at the end of a TCP transaction. • For short transactions, TLP can reduce transmission timeouts by as much as 15% and shorten HTTP response times by an average of 6%. • TCP Early Retransmit (ER) • Allows the transport to use fast retransmits to recover segment losses that would otherwise require a lengthy retransmission timeout. • Enables connections to recover from lost packets faster, decreasing overall latency. • TCP Proportional Rate Reduction (PRR) • Experimental algorithm designed to adapt transmission rates to the rates that can be processed by the recipient and by the routers along the way (especially after throttling the rate to prevent an imminent overload.) • Designed to return to the maximum transfer rate faster than the previously used method and potentially reduce HTTP response times by 3-10%.
TCP Bufferbloat Avoidance • Term used for problems such as high network latencies and disrupted connections caused by too much buffering during data transfers between networks that are not properly matched with respect to the speed of handling packets. • Several improvements were made in RHEL 7 beta to help avoid common Bufferbloat problems, including: • Dynamic Queue Limits and Byte Queue Limits • Allows the kernel to control how much data can accumulate in a send queue caused by excessive data buffering in networking hardware. • TCP Small Queues (TSQ) • Uses small buffers of no more than 128KB per network socket by default but doesn't affect data throughput. • CoDel and Fair Queue CoDel AQM Packet Schedulers • Adds support for the packet schedulers "CoDel" (Controlled-Delay Active Queue Management algorithm) and "Fair Queue CoDel AQM". • Active queue management algorithms specifically developed to overcome bufferbloat that work by setting limits on the delay network packets suffer due to passing through the buffer.
Low Latency Sockets using Busy Poll • Designed to reduce networking latency and jitter within the kernel by driving the receive from user context. • Allows an application to poll for new packets directly in the device driver enabling packets to quickly find their way into the network stack. • Requires a supported network driver: • Broadcom bnx2x; Emulex be2net; Intel ixgbe; Mellanox mlx4; Myricom myri10ge
• Only sockets with the SO_BUSY_POLL socket option set are busy polled:

# Controls how long to spin waiting for packets on the device queue for socket poll and select
sysctl: net.core.busy_poll = {# of µsec; 0=OFF [DEFAULT]}

# Controls how long to spin waiting for packets on the device queue for socket reads
sysctl: net.core.busy_read = {# of µsec; 0=OFF [DEFAULT]}
• Additional tuning should be done for best performance, such as: • Interrupt coalescing, disabling of GRO/LRO, binding application threads, etc.
Routing Improvements • Interface option to enable routing of 127.0.0.0/8 • Provides support for a new per-interface option that allows routing of the 127.0.0.0/8 address block on any interface, enabling the kernel to recognize on-box traffic flows and optimize accordingly. • Useful within single-machine configurations where processes (such as containerized applications) use TCP to communicate with each other. • Default localhost interface route must first be removed:

# sysctl -w net.ipv4.conf.eth0.route_localnet=1
# ip route del 127.0.0.0/8 dev lo table local
# ip addr add 127.1.0.1/16 dev eth0
# ip route flush cache
• IPv4 Routing Cache • Removes the old and outdated IPv4 route cache functionality from the kernel. • Results in decreased route cache lookup misses for high volume sites and reduced overhead for route lookups. • Also eliminates DoS attacks that targeted the route cache, providing predictable and consistent performance no matter what pattern of traffic is serviced.
XPS: Transmit Packet Steering • Mechanism for intelligently selecting which transmit queue to use when transmitting a packet on multiqueue capable devices. • Analogous to Receive Packet Steering (RPS): • RPS selects a CPU based on receive queue. • XPS selects a queue based on the CPU.
• Benefits: • Contention on the device queue lock is significantly reduced since fewer CPUs contend for the same queue. • Contention can be completely eliminated if each CPU has its own transmit queue. • Cache miss rate on transmit completion is reduced.
• Configuration:

/sys/class/net/eth[#]/queues/tx-[#]/xps_cpus
{bitmask of CPUs that may use transmit queue}
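For example, dedicating a transmit queue to CPUs 2 and 3 means writing the hex bitmask (1<<2)|(1<<3) = 0xc into xps_cpus; a sketch (the interface name is illustrative, and the final write requires root):

```shell
# Compute the CPU bitmask for CPUs 2 and 3: bits 2 and 3 set -> 0xc
mask=$(printf '%x' $(( (1 << 2) | (1 << 3) )))
echo "$mask"   # prints: c
# Then, as root, on a hypothetical eth0:
#   echo "$mask" > /sys/class/net/eth0/queues/tx-0/xps_cpus
```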
PF_PACKET Performance • Packet sockets are used to send or receive raw packets at the device driver level. • Allow users to implement protocol modules in user space on top of the physical layer. • For diagnosing network-related problems, it's often useful to be able to capture packets transmitted or received by a machine (Linux implements the PF_PACKET socket family for this purpose.) • Several improvements, including: • Fanout Mode • Packet fanout support enables socket clustering and load-balancing of multiple processes working on packet sockets, e.g. via different policies such as round-robin, rxhash, or roll-over.
• TPACKET_V3 Flexible Buffer Implementation • New zero-copy mechanism provides higher throughput than with TPACKET_V1/2 due to fewer translation lookaside buffer (TLB) misses.
• Hardware Time Stamping • Hardware time stamping has been improved and also added to the [TX,RX]_RING.
Remote Direct Memory Access (RDMA) • RDMA over Converged Ethernet (RoCE) • Provides low latency, high bandwidth network connectivity while reducing CPU overhead using 10/40Gb RoCE hardware-enabled network adapters. • Now included with RHEL 7 – no longer requires the HPN add-on option.
• iSCSI Extensions for RDMA (iSER) & SCSI RDMA Protocol (SRP) Target Drivers • Enables access to SCSI devices attached to another computer via RDMA providing higher throughput and lower latency than what is typically possible using TCP/IP. • New 'targetcli' administration tool provides easy configuration of target devices. • rsockets RDMA socket API is now part of the librdmacm package • Supports a socket-level API intended to match the behavior of corresponding socket calls; essentially, a simplified "sockets-like" interface to RDMA programming.
• New Driver Support • ocrdma: RoCE support for Emulex Oce14000 10/40Gb Ethernet Network Adapters (Tech Preview). • mlx5: InfiniBand support for Mellanox Single/Dual-Port Connect-IB 4X FDR Host Channel Adapters.
Security
Firewalld • New dynamic and protocol independent firewall service providing greater flexibility over traditional iptables. • Eliminates service disruptions during rule updates. • Supports different network trust zones for per-connection firewall settings. • Unified firewall management service for: • IPv4 (iptables), IPv6 (ip6tables), and Ethernet Bridges (ebtables)
• GUI (firewall-config) and CLI (firewall-cmd) based configuration utilities • Simple yet powerful XML-based configuration file format with nearly 50 built-in pre-defined settings for many common system services. • Configurable service options include: • Port ranges with protocol type • Netfilter helper modules • Destination address (range) for IPv4 and/or IPv6
Firewalld: example configurations
[XML configuration excerpts enabling the dns, tftp, https, and dhcpv6-client services]
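The XML markup from this slide was stripped in transcription; a hedged sketch of what a zone definition enabling those services typically looks like (the zone name, description, and file path are illustrative, not from the slide):

```xml
<?xml version="1.0" encoding="utf-8"?>
<!-- Illustrative zone definition, e.g. /etc/firewalld/zones/example.xml -->
<zone>
  <short>Example</short>
  <description>Hypothetical zone enabling the services named on the slide.</description>
  <service name="dns"/>
  <service name="tftp"/>
  <service name="https"/>
  <service name="dhcpv6-client"/>
</zone>
```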
nftables • Next-generation, unified replacement for the separate [ip,ip6,arp,eb]_tables frameworks within the kernel providing packet filtering and classification. • Introduces the concept of a simple, universal pseudo-virtual machine (inspired by BPF) to execute bytecode for inspecting a network packet and making decisions on how that packet should be handled. • User-space utility interprets the rule-set, compiles it to pseudo-byte code, and then transfers it to the kernel. • Main advantages over iptables: • Reduction of code duplication by removing protocol awareness from the decision engine • Improved error reporting • More efficient execution, storage, and incremental changes of filtering rules
• Kernel support is included in RHEL 7 beta; however, the user-space packages will be included in a future release once upstream development has had time to stabilize. • For testing and development purposes, nftables requires: • libmnl: Minimalistic Netlink library [included in RHEL 7 beta] • libnftnl: User-space library for low-level interaction with the nftables Netlink API • nftables: Command line utility to maintain the ruleset
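Once the user-space utility is available, rules are written in nft's own syntax rather than iptables options; a minimal hedged sketch (the table and chain names are illustrative):

```
# Illustrative nftables ruleset (nft syntax)
table inet filter {
    chain input {
        type filter hook input priority 0;
        ct state established,related accept
        tcp dport 22 accept
        counter drop
    }
}
```

A single `inet` table covers both IPv4 and IPv6, illustrating the "unified replacement" point above.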
DDoS (Distributed Denial of Service) Protection
• Netfilter: iptables target SYNPROXY • DDoS attacks are increasingly becoming commonplace as more and more products and services become dependent on delivering services over the Internet. • SYNPROXY module is designed to protect against common SYN-floods and ACK-floods, but can also be adjusted to protect against SYN-ACK floods. • Works by filtering out false SYN-ACK and ACK packets before they take the listen socket lock (which would otherwise prevent new incoming connections). • Significant step for fighting DDoS and protecting critical system services. • Example configuration (intended for a web server):

sysctl: net.netfilter.nf_conntrack_tcp_loose=0 [DEFAULT=1]
# iptables -t raw -A PREROUTING -i eth0 -p tcp --dport 80 --syn -j NOTRACK
# iptables -A INPUT -i eth0 -p tcp --dport 80 -m state --state UNTRACKED,INVALID \
  -j SYNPROXY --sack-perm --timestamp --mss 1480 --wscale 7 --ecn
Domain Name System Security Extension (DNSSEC)
• Allows clients to determine origin authentication of DNS data, authenticated denial of existence and data integrity. • Prevents man-in-the-middle attacks in which active eavesdropping or intercepted communication occurs between two systems. • Two new DNSSEC packages have been introduced for RHEL 7 beta: • Unbound – DNS resolver that provides caching and DNSSEC validation. • Controlled by the unbound systemd service
• dnssec-trigger – Handles reconfiguring the local unbound DNS server (e.g., in the case of hotspot detection.) • Controlled by the dnssec-trigger systemd service
IPv6 Network Address Translation (NAT) • Process of modifying IP address information in packet headers while in transit across a traffic routing device or node for the purpose of remapping one IP address space into another. • Commonly used in IPv4 to work around IPv4 address exhaustion.
• While NAT is generally considered unnecessary with IPv6 (due to its much larger address space), it can be used to hide topology details for internal networks. • Configured from netfilter6 and ip6tables: • Clients behind a router can be hidden by using IPv6 masquerading (hide/overlap NAT): # ip6tables -t nat -A POSTROUTING -o sixxs -s fec0::/64 -j MASQUERADE
• Dedicated public IPv6 address can be forwarded to an internal IPv6 address: # ip6tables -t nat -A PREROUTING -d 2001:db8:0:1:5054:ff:fe01:2345 -i sixxs -j DNAT \ --to-destination fec0::5054:ff:fe01:2345
• Dedicated specified port can be forwarded to an internal system: # ip6tables -t nat -A PREROUTING -i sixxs -p tcp --dport 8080 -j DNAT --to-destination [fec0::1234]:80
Precision Time Synchronization
Chrony Suite
• Different implementation of the NTP protocol than ntpd that is able to synchronize the system clock faster and with better accuracy than ntpd. • Not intended to be a replacement for ntpd for all use cases, however, the algorithm used to discipline the clock gives Chrony several advantages over ntpd, including: • Much faster synchronization requiring only minutes instead of hours to minimize the time and frequency error • Larger range for frequency correction (100000 ppm vs 500 ppm) allowing it to operate even on machines with broken or unstable clocks (useful for some virtual machines) • Better response to rapid changes in the clock frequency due to changes in the temperature of the crystal oscillator • After the initial synchronization the clock is never stepped so as not to upset applications needing time to be monotonic • Better stability with temporary asymmetric delays due to network congestion • Periodic polling of servers is not required, so systems with intermittent network connections can still quickly synchronize clocks
Chrony Suite: example

# chronyc tracking
Reference ID    : 46.249.47.127 (fw.ams.nl.alexs.co.nz)
Stratum         : 3
Ref time (UTC)  : Fri Dec 13 09:12:14 2013
System time     : 0.000245416 seconds slow of NTP time
Last offset     : -0.000308746 seconds
RMS offset      : 0.000653052 seconds
Frequency       : 18.964 ppm slow
Residual freq   : -0.004 ppm
Skew            : 0.039 ppm
Root delay      : 0.045544 seconds
Root dispersion : 0.012329 seconds
Update interval : 1039.7 seconds
Leap status     : Normal

# chronyc sources
210 Number of sources = 4
MS Name/IP address        Stratum Poll Reach LastRx Last sample
===============================================================================
^* fw.ams.nl.alexs.co.nz        2   10   377     53  -2813us[-3122us] +/-  50ms
^+ sip.dicode.nl                2   10   377    649  -3861us[-4161us] +/-  57ms
^+ thuis.bentware.nl            3   10   377    442  -1470us[-1773us] +/-  76ms
^+ mirror.muntinternet.net      2   10   377    239  -1592us[-1898us] +/-  50ms

# chronyc sourcestats
210 Number of sources = 4
Name/IP Address           NP  NR  Span  Frequency  Freq Skew  Offset  Std Dev
==============================================================================
fw.ams.nl.alexs.co.nz     19   9  309m     -0.046      0.120    -97us   689us
sip.dicode.nl             20   7  327m     -0.007      0.144   -246us   916us
thuis.bentware.nl         34  17  568m      0.015      0.042  -4754ns   695us
mirror.muntinternet.net   32  15  552m     -0.008      0.054   +345us   835us
Precision Time Protocol version 2 (PTPv2) IEEE 1588
• Based on IEEE 1588-2008 standard, method for precisely synchronizing distributed clocks over an Ethernet network. • Capable of achieving clock accuracy in the sub-microsecond range when used in conjunction with PTP-enabled hardware devices. • Robust protocol implementation provided by 'LinuxPTP' package (using modern Linux kernel APIs). • When used in combination with ntpd or Chrony, it can be used to accurately synchronize time from the host to Virtual Machines. • For RHEL 7 beta, new network driver support for both hardware and software time stamping capabilities: • Hardware time stamping (also requires support in the physical network adapter): • Broadcom tg3; Intel e1000e, igb, ixgbe; Mellanox mlx4_en; Solarflare sfc
• Software time stamping: • Broadcom tg3, bnx2x; Intel e1000e, igb, ixgbe
• Tech Preview: • Hardware: Intel i40e, pch_ptp • Software: Cadence macb; Intel e1000, i40e; Realtek r8169; SMSC smsc9420; dnet; usbnet
LinuxPTP: example configuration

# ethtool -T eth1
Time stamping parameters for eth1:
Capabilities:
  hardware-transmit     (SOF_TIMESTAMPING_TX_HARDWARE)
  software-transmit     (SOF_TIMESTAMPING_TX_SOFTWARE)
  hardware-receive      (SOF_TIMESTAMPING_RX_HARDWARE)
  software-receive      (SOF_TIMESTAMPING_RX_SOFTWARE)
  software-system-clock (SOF_TIMESTAMPING_SOFTWARE)
  hardware-raw-clock    (SOF_TIMESTAMPING_RAW_HARDWARE)
PTP Hardware Clock: 0
Hardware Transmit Timestamp Modes:
  off (HWTSTAMP_TX_OFF)
  on  (HWTSTAMP_TX_ON)
Hardware Receive Filter Modes:
  none (HWTSTAMP_FILTER_NONE)
  all  (HWTSTAMP_FILTER_ALL)

# ptp4l -i eth1 -m
selected eth1 as PTP clock
port 1: INITIALIZING to LISTENING on INITIALIZE
port 0: INITIALIZING to LISTENING on INITIALIZE
port 1: new foreign master 00a069.fffe.0b552d-1
selected best master clock 00a069.fffe.0b552d
port 1: LISTENING to UNCALIBRATED on RS_SLAVE
master offset  -23947 s0 freq      +0 path delay  11350
master offset  -28867 s0 freq      +0 path delay  11236
master offset  -32801 s0 freq      +0 path delay  10841
master offset  -37203 s1 freq      +0 path delay  10583
master offset   -7275 s2 freq  -30575 path delay  10583
port 1: UNCALIBRATED to SLAVE on MASTER_CLOCK_SELECTED
master offset   -4552 s2 freq  -30035 path delay  10385

# phc2sys -s eth1 -w
Diagnostics
IPTraf-ng • Curses-based, console network monitoring and statistics utility. • Capable of gathering a variety of measurements, such as: • TCP packet and byte counts, interface statistics and activity indicators, TCP/UDP traffic breakdowns, and LAN station packet and byte counts.
Netsniff-ng • High-performance, networking toolkit utilizing zero-copy mechanisms eliminating the need for the kernel to copy packets from kernel space to user space and vice versa during packet reception and transmission. • Toolkit is comprised of the following utilities: • astraceroute, an autonomous system (AS) and GeoIP trace route utility • bpfc, a Berkeley Packet Filter compiler, Linux BPF JIT disassembler • ifpps, a top-like kernel networking statistics tool • netsniff-ng, a fast zero-copy analyzer, pcap capturing and replaying tool • trafgen, a multithreaded low-level zero-copy network packet generator
• Fast and highly configurable:

# netsniff-ng --in eth0 --out dump.pcap -s -b 0 tcp or udp
Running! Hang up with ^C!
1826 packets incoming (3 unread on exit)
1829 packets passed filter
0 packets failed filter (out of space)
0.0000% packet droprate
26 sec, 901712 usec in total
Partner Ecosystem & Summary
Vibrant Networking Partner Ecosystem • Close engineering relationships with our networking partners result in better out-of-the-box performance and, overall, a higher-quality product through: • Cooperative development • Upstream collaboration • Joint testing of releases • Mutual customer support
• Significant partner code contributions account for ~10% of the lines of code in the RHEL 7 beta kernel.
Summary • Flexible network management • New link aggregation mechanism • Many virtualization, container, and overlay networking technologies updates • Major security enhancements • Highly accurate time synchronization • Numerous network performance optimizations and latency improvements • New diagnostic tools • Strong partner ecosystem
This only represents a subset of all the new and exciting enhancements found in RHEL 7 beta!
Questions?
Backup Slides
Network Management
NetworkManager • New command line user interface (nmcli) • Intended for use by administrators/end-users who prefer or may require command line access to setup, manage, or script network services on a system.
• New curses-based user interface (nmtui) • Replacement for system-config-network-tui (in RHEL 6) designed to make it easier to configure many common network settings.
• Supports common network Interface types • Ethernet, IPoIB, VLANs, Bridges, Bonds, Teams, WiFi, WiMAX, WWAN, Bluetooth, VPN, and ATM-based DSL. • Status and monitoring support for GRE, MACVTAP, TUN, TAP, and VETH interfaces.
• Cooperates with existing interface configurations • Restarting won't change any addressing, routing, or Layer-2 configurations for Ethernet, bridge, bond/team, and VLAN interfaces and will non-destructively take over the interface's existing config.
• Recognizes live reconfiguration • Changes to addresses and routes made outside of NM are immediately reflected and can be made permanent by asking NM to save that new configuration to disk.
NetworkManager • IP Address Aliases support • Support for interface aliases (multiple IP addresses on a single interface).
• New Server Defaults • NetworkManager-config-server RPM provides suitable defaults for servers. • Not creating default DHCP connections, ignoring the carrier state on interfaces with static IP configurations, suppressing changes to resolv.conf, etc.
• Explicit Configuration Reload • No longer watches for configuration file changes by default, and allows administrators to make it aware of external changes manually. • This behavior better aligns with expectations about configuration file changes made through editors or development tools.
• Future development • Planned support for managing IPSec, VXLAN, and DNSSEC Tunnels.
ModemManager • Service for controlling Wireless WAN devices and communicating with cellular data networks. • Provides a rich unified D-Bus API for: • Network status • Data connections • Short Message Service (SMS) communications • Location Services • Other cellular functions
• Device enablement has been significantly improved on RHEL 7 beta with support for multi-mode hardware, 4G LTE networks, and enhanced support for SMS communication and location services.
Link Aggregation
Team Driver: example configuration

$ ls /usr/share/doc/teamd-*/example_configs/
activebackup_arp_ping_1.conf  activebackup_multi_lw_1.conf   loadbalance_2.conf
activebackup_arp_ping_2.conf  activebackup_nsna_ping_1.conf  loadbalance_3.conf
activebackup_ethtool_1.conf   broadcast.conf                 random.conf
activebackup_ethtool_2.conf   lacp_1.conf                    roundrobin_2.conf
activebackup_ethtool_3.conf   loadbalance_1.conf             roundrobin.conf

# cat /usr/share/doc/teamd-0.1/example_configs/activebackup_ethtool_1.conf
{
  "device": "team0",
  "runner": {"name": "activebackup"},
  "link_watch": {"name": "ethtool"},
  "ports": {
    "eth1": {
      "prio": -10,
      "sticky": true
    },
    "eth2": {
      "prio": 100
    }
  }
}

# teamd -f /usr/share/doc/teamd-0.1/example_configs/activebackup_ethtool_1.conf -d
# ip link
4: eth1: mtu 1500 qdisc noop state DOWN mode DEFAULT qlen 1000
    link/ether 52:54:00:3d:c7:6d brd ff:ff:ff:ff:ff:ff
5: eth2: mtu 1500 qdisc noop state DOWN mode DEFAULT qlen 1000
    link/ether 52:54:00:73:15:c2 brd ff:ff:ff:ff:ff:ff
6: team0: mtu 1500 qdisc noop state DOWN mode DEFAULT
    link/ether ea:8e:85:d3:95:5d brd ff:ff:ff:ff:ff:ff
# ip addr add 192.168.23.2/24 dev team0
# ip link set team0 up
Virtualization, Container, & Overlay Networking Technologies
TCP Connection Repair • Designed for stopping a TCP connection and restarting it on another host (intended for process checkpointing and restarting.) • Container virtualization implementations can make use of this feature to relocate an entire network connection from one host to another transparently for the remote end. • Achieved by putting the socket in a "repair" mode allowing the gathering of necessary information for restoring the previous state into a new socket. • Accomplished with the setsockopt() system call using the new TCP_REPAIR option, which puts the socket in/out of the repair mode.
IP Virtual Server (IPVS) • Built on top of Netfilter, IPVS implements transport-layer load balancing inside the Linux kernel. • Runs on a host and acts as a load balancer in front of a cluster of real servers. • Works by directing requests for TCP/UDP-based services to the real servers, making the services of the real servers appear as a virtual service on a single IP address. • New for RHEL 7 beta: • Added support for Linux Containers (LXC), allowing transport-layer load balancing across network namespaces for process virtualization. • Support for fragmented IPv6 UDP messages with IPVS.
Control Groups (cgroups) • Per-Control Group TCP Buffer Limits • Hard limit can be set/shown from: /sys/fs/cgroup/memory/memory.kmem.tcp.limit_in_bytes
• Additional information can be found in cgroups/memory.txt within the 'kernel-doc' package.
• Network Priority Control Group
• Creating network priority groups:
# insmod /lib/modules//kernel/net/core/netprio_cgroup.ko
# mkdir /sys/fs/cgroup/net_prio
# mount -t cgroup -o net_prio none /sys/fs/cgroup/net_prio
• Each net_prio cgroup contains two files that are subsystem specific: • net_prio.prioidx: Contains a unique integer value that the kernel uses as an internal representation of this cgroup (read-only and simply informative.) • net_prio.ifpriomap: Contains a map of the priorities assigned to traffic originating from processes in this group and egressing the system on various interfaces. Entries are written as <interface priority> tuples:
# echo "eth0 5" > /sys/fs/cgroup/net_prio/test/net_prio.ifpriomap
• Additional information can be found in cgroups/net_prio.txt within the 'kernel-doc' package.
Open vSwitch: example VLAN-based configuration
Open vSwitch bridge between two Virtual Machines using VLAN tagging for traffic isolation:
• Create an OVS bridge:
# ovs-vsctl add-br br0
• Add eth0 to the bridge (by default, all OVS ports are VLAN trunks, so eth0 will pass all VLANs): # ovs-vsctl add-port br0 eth0
• Add VM1 as an “access port” on VLAN 1: # ovs-vsctl add-port br0 tap0 tag=1
• Add VM2 as an “access port” on VLAN 1:
# ovs-vsctl add-port br0 tap1 tag=1
(Diagram: VM1 and VM2 attach through their tap interfaces to OVS bridge br0.)
Security
nftables vs. iptables comparison
• With iptables, you need to write two rules, one for drop and one for logging:
# iptables -A FORWARD -p tcp --dport 22 -j LOG
# iptables -A FORWARD -p tcp --dport 22 -j DROP
• With nftables, you can combine both targets in a single rule:
# nft add rule filter forward tcp dport 22 log drop
• With iptables, in order to allow packets for different ports and allow different ICMPv6 types, you would need to do the following:
# ip6tables -A INPUT -p tcp -m multiport --dports 23,80,443 -j ACCEPT
# ip6tables -A INPUT -p icmpv6 --icmpv6-type neighbor-solicitation -j ACCEPT
# ip6tables -A INPUT -p icmpv6 --icmpv6-type echo-request -j ACCEPT
# ip6tables -A INPUT -p icmpv6 --icmpv6-type router-advertisement -j ACCEPT
# ip6tables -A INPUT -p icmpv6 --icmpv6-type neighbor-advertisement -j ACCEPT
• With nftables, sets can be used on any element in a rule:
# nft add rule ip6 filter input tcp dport {telnet, http, https} accept
# nft add rule ip6 filter input icmpv6 type { nd-neighbor-solicit, echo-request, \
    nd-router-advert, nd-neighbor-advert } accept
Network Performance
TCP Performance and Latency Improvements • TCP Fast Open sysctl: net.ipv4.tcp_fastopen={Bitmap Values: 0=Disabled [DEFAULT], 1=Enables Client-side, 2=Enables Server-side, 4=Send data in opening SYN regardless of cookie}
• TCP Tail Loss Probe (TLP) Algorithm and TCP Early Retransmit (ER) sysctl: net.ipv4.tcp_early_retrans={0=disables TLP and ER; 1=enables RFC 5827 ER; 2=delayed ER; 3=TLP and delayed ER [DEFAULT]; 4=TLP only}
• TCP SO_REUSEPORT Option • TCP and UDP sockets now support a SO_REUSEPORT option that allows multiple sockets to listen on the same port. • Enables multiple processes (such as a web server) or threads to open individual sockets listening on a single port. • Any connections that come in on this port will be evenly distributed across the sockets by the kernel.
int sfd = socket(domain, socktype, 0);
int optval = 1;
setsockopt(sfd, SOL_SOCKET, SO_REUSEPORT, &optval, sizeof(optval));
bind(sfd, (struct sockaddr *) &addr, addrlen);
TCP Bufferbloat Avoidance • Dynamic Queue Limits and Byte Queue Limits • For testing and development purposes, the DQL library is required:
void netdev_sent_queue(struct net_device *dev, unsigned int bytes);
void netdev_tx_sent_queue(struct netdev_queue *dev_queue, unsigned int bytes);
void netdev_completed_queue(struct net_device *dev, unsigned int pkts, unsigned int bytes);
void netdev_tx_completed_queue(struct netdev_queue *dev_queue, unsigned int pkts, unsigned int bytes);
• TCP Small Queues (TSQ) • Buffer size can be manually adjusted at runtime: sysctl: net.ipv4.tcp_limit_output_bytes=[131072]
• CoDel and Fair Queue CoDel AQM Packet Schedulers • Load the kernel module of the desired scheduler, then configure using the 'tc' (traffic control) command:
# insmod sch_fq_codel.ko
# tc qdisc add dev wlan0 root fq_codel
PF_PACKET Performance • Virtual Netlink Device for Packet Sockets • Allows existing tools (tcpdump, Wireshark, etc.) to monitor and debug netlink traffic exchanged between user and kernel space without modification, by opening PF_PACKET sockets on the virtual device provided by the nlmon driver. • Can be used to record pcap files for later analysis with no code changes needed on the analyzer side, except for adding a simple protocol dissector, for example.
Berkeley Packet Filter (BPF) Just-In-Time Compiler • Mechanism for fast filtering network packets on their way to an application. • Used by many common packet capture tools such as libpcap and tcpdump.
• Just-In-Time (JIT) compiler incorporated into the kernel to translate BPF code directly into the host system's assembly code. • The simple BPF machine makes the JIT translation relatively easy, allowing it to carry out some of the network packet filtering tasks set by sniffer tools. • Measurable savings of around 50 nanoseconds per packet!
• Standalone, minimal BPF JIT image disassembler helper available in 'netsniff-ng' package • Allows for debugging or verification of emitted BPF JIT images. • Useful for emitted opcode debugging, since minor bugs in the JIT compiler can be fatal.
• Disabled by default, but can be enabled at runtime: sysctl: net.core.bpf_jit_enable={0=Disabled [DEFAULT], 1=Enabled, 2=Debug Output}
Jump Label • The number of tracepoints in the kernel is growing, and each one adds a new test in which a value must be fetched from memory, adding to the pressure on the cache and thus hurting performance. • Designed to reduce function call overhead and optimize the "tracepoint disabled" case. • When a tracepoint is enabled, its call site is looked up in the jump label table and the special no-op instruction is replaced with the assembly equivalent of "goto label", enabling the tracepoint function. • Results in reduced run-time performance degradation when static tracepoints are disabled.
Full Dynticks Kernel Support (Full NOHZ) • CPUs can be interrupted between 100 and 1,000 times each second by the periodic timer interrupt. • For idle CPUs, it allows the periodic timer interrupt to be disabled while they sleep, avoiding the need to service useless interrupts (for energy-saving purposes). • For busy CPUs, certain CPUs can remain in user mode, enabling critical applications to make full use of CPU cycles while eliminating expensive context switching (which hurts application latency) caused by interruptions from kernel-related tasks. • Useful for users looking to gain every last bit of performance from their system for latency-sensitive applications. • Originally designed for real-time applications, but can also benefit HPC (High Performance Computing) workloads where only a single task is running. • Results in performance improvements of around 0.5-1.0% for typical systems.
Network Protocols
IEEE 802.1ad Stacked VLANs (QinQ)
• Specification allows multiple VLAN headers to be inserted into a single Ethernet frame, avoiding VLAN conflicts across network infrastructures.
• Enables customers to run their own VLANs inside a service provider's assigned VLAN.
• Configuration is performed using "ip link" (from iproute2):
# ip link add link eth0 eth0.1000 type vlan proto 802.1ad id 1000
# ip link add link eth0.1000 eth0.1000.1000 type vlan proto 802.1q id 1000
52:54:00:12:34:56 > 92:b1:54:28:e4:8c, ethertype 802.1Q (0x8100), length 106: vlan 1000, p 0, ethertype 802.1Q, vlan 1000, p 0, ethertype IPv4, (tos 0x0, ttl 64, id 0, offset 0, flags [DF], proto ICMP (1), length 84)
    20.1.0.2 > 20.1.0.1: ICMP echo request, id 3003, seq 8, length 64
92:b1:54:28:e4:8c > 52:54:00:12:34:56, ethertype 802.1Q-QinQ (0x88a8), length 106: vlan 1000, p 0, ethertype 802.1Q, vlan 1000, p 0, ethertype IPv4, (tos 0x0, ttl 64, id 47944, offset 0, flags [none], proto ICMP (1), length 84)
    20.1.0.1 > 20.1.0.2: ICMP echo reply, id 3003, seq 8, length 64
Stream Control Transmission Protocol (SCTP) • Transport layer protocol serving a similar role to common protocols such as Transmission Control Protocol (TCP) and User Datagram Protocol (UDP). • Provides some of the same service features of both: • Message-oriented like UDP • Reliable, in-sequence transport of messages with congestion control like TCP
• Multihoming support enables transparent fail-over between redundant network paths. • RHEL 7 beta improvements: • Support for changing cryptographic hash function in SCTP • Allows the cryptographic hash function to be changed from MD5 (default) to SHA1.
• Additional SCTP association statistics support
Diagnostics
Netsniff-ng: ifpps screenshot
New Packages & Libraries
GeoIP • Library and utilities for providing IP Address or hostname mapping to country/city/organization resolution. • Useful for identifying information about Internet visitors. • BIND and Netsniff-ng have been enhanced to take advantage of GeoIP ACL support, allowing restrictions to be placed based on a client's geographic location. • Includes basic IP to country lookup utility:
# geoipupdate
MD5 Digest of installed database is 52092bcfb13e2ca157b90519dc0d191f
Updating /usr/share/GeoIP/GeoLiteCountry.dat
Updated database
MD5 Digest of installed database is f5ce2f7a4a156c580ed529600e84c5ce
Updating /usr/share/GeoIP/GeoLiteCity.dat
Updated database
# geoiplookup 65.255.48.0
GeoIP Country Edition: TC, Turks and Caicos Islands
# geoiplookup 31.209.144.0
GeoIP Country Edition: IS, Iceland
libnl3 • Collection of libraries providing APIs to netlink protocol based Linux kernel interfaces. • Interfaces are split into several small libraries: • libnl: Core Library implementing the fundamentals • libnl-route: API to configuration interfaces of the NETLINK_ROUTE family • libnl-genl: API to generic netlink protocol • libnl-nf: API to netlink based netfilter configuration and monitoring interfaces
• libnl is used as the user-space component of Team Driver (libteam and teamd packages.) • Documentation available within the 'libnl3-doc' package.
Removed Packages & Discontinued Network Drivers
Removed Network Management Packages • Outlined in Section 4.2 of the RHEL 7.0 beta Release Notes: • Wireless-tools • Basic wireless device manipulation from the command line can be done with 'iw'.
• system-config-network • Network configuration can be done with nm-connection-editor, nmcli, or nmtui. • Note: nm-connection-editor is also present in Red Hat Enterprise Linux 6.
• system-config-firewall • Firewall rule management can be done with firewall-config (GUI) and firewall-cmd (CLI). • Note: system-config-firewall is still available as part of an alternative firewall solution for static-only environments along with iptables services.
Discontinued Network Drivers • Outlined in Section 4.4 of the RHEL 7.0 beta Release Notes (updated list provided below): • 3c574_cs, 3c589_cs, 3c59x, 8390, • acenic, amd8111e, at76c50x-usb, ath5k, axnet_cs, • b43, b43legacy, can-dev, cassini, cdc-phonet, cxgb, dl2k, • e100, ems_pci, ems_usb, fealnx, fmvj18x_cs, forcedeth, ixgb, kvaser_pci, • libertas, libertas_cs, libertas_tf, libertas_tf_usb, mac80211_hwsim, • natsemi, ne2k-pci, niu, nmclan_cs, ns83820, • p54pci, p54usb, pcnet32, pcnet_cs, pppol2tp, r6040, • s2io, sc92031, sis190, sis900, sja1000, sja1000_platform, smc91c92_cs, • starfire, sundance, sungem, sungem_phy, sunhme, • tehuti, tlan, typhoon, usb8xxx, vcan, • via-rhine, via-velocity, vxge, xirc2ps_cs, zd1211rw