These days, computer networking is more than just pushing packets around. Packets now undergo much more scrutiny and are often massaged in transit, with everything from encryption to video transcoding happening along the way. This work has typically been done by a host processor, which requires moving data into and out of the host. That isn't bad for a single hop, but when multiple tasks are applied to a data stream, the back-and-forth can generate a lot of overhead.
Putting some intelligence into the network interface card (NIC) allows some tasks to be performed locally without involving a host processor. Xilinx's U25 SmartNIC (Fig. 1) brings more compute capability to the card for handling these networking chores.
The U25 is based on Xilinx's own Zynq system-on-chip (SoC), which includes more than 520K LUTs of FPGA fabric and a quad-core Arm Cortex-A53 processor complex. The board adds 6 GB of DDR4 SDRAM and a pair of 10/25-Gigabit Ethernet (GbE) ports with SFP28 connectors. There's also a pair of x8 PCI Express Gen 3 ports.
The processor complex acts more as a traffic cop, leaving the heavy lifting to the FPGA fabric. This bump-in-the-wire acceleration can handle tasks such as encryption, video transcoding, and even machine-learning algorithms. The chip can also perform the deep packet inspection (DPI) and security processing commonly found in other intelligent NICs. It can even provide NVM Express (NVMe) support to a host.
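To make the bump-in-the-wire idea concrete, here is a toy software model of inline packet filtering. Each packet passes through a chain of stages (a destination-port filter, then a payload-signature check standing in for DPI), and only packets that survive every stage are passed along. This is a sketch under stated assumptions, not how the U25 or any Xilinx IP implements it: on the card, this logic would be built in FPGA fabric, and the `Packet` type, the stage functions, the port numbers, and the signatures are all hypothetical.

```python
from dataclasses import dataclass
from typing import Callable, Iterable, Iterator, List, Optional


@dataclass(frozen=True)
class Packet:
    # Hypothetical, heavily simplified packet: just ports and payload bytes.
    src_port: int
    dst_port: int
    payload: bytes


# A stage returns the packet (possibly transformed) or None to drop it.
Stage = Callable[[Packet], Optional[Packet]]


def make_port_filter(allowed_ports: Iterable[int]) -> Stage:
    """Drop packets whose destination port is not in the allowed set."""
    allowed = frozenset(allowed_ports)

    def stage(pkt: Packet) -> Optional[Packet]:
        return pkt if pkt.dst_port in allowed else None

    return stage


def make_dpi_stage(signatures: Iterable[bytes]) -> Stage:
    """Toy DPI: drop packets whose payload contains any (hypothetical) signature."""
    sigs = list(signatures)

    def stage(pkt: Packet) -> Optional[Packet]:
        if any(sig in pkt.payload for sig in sigs):
            return None
        return pkt

    return stage


def run_stages(pkt: Packet, stages: List[Stage]) -> Optional[Packet]:
    """Apply each stage in order, stopping as soon as one drops the packet."""
    for stage in stages:
        result = stage(pkt)
        if result is None:
            return None
        pkt = result
    return pkt


def bump_in_the_wire(packets: Iterable[Packet], stages: List[Stage]) -> Iterator[Packet]:
    """Yield only the packets that pass every stage, in arrival order."""
    for pkt in packets:
        result = run_stages(pkt, stages)
        if result is not None:
            yield result


if __name__ == "__main__":
    stages = [make_port_filter({443}), make_dpi_stage([b"MALWARE"])]
    traffic = [
        Packet(50000, 443, b"GET / HTTP/1.1"),
        Packet(50001, 22, b"SSH-2.0"),
        Packet(50002, 443, b"...MALWARE..."),
    ]
    for pkt in bump_in_the_wire(traffic, stages):
        print(pkt)
```

Modeling each function as a stage that either passes or drops a packet mirrors the streaming, pipelined style an FPGA design would use, where each packet flows through fixed processing steps without a round trip to the host.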
Xilinx provides support that makes it easy to use third-party or custom FPGA IP on the U25 SmartNIC. This has been the typical approach for using FPGAs in the enterprise and data center: the FPGA delivers the flexibility and performance, while developers provide the packaged functionality that network managers can program onto their cards.
Hardware acceleration helps a lot, but Xilinx's Onload technology takes things a bit further. It links the adapter directly to the application, bypassing the operating system (Fig. 2). According to Xilinx, it can improve TCP throughput by up to 400% and reduce latency by 80%, while cutting jitter to almost zero.
Onload software has binary-compatible, industry-standard interfaces, so it works with existing applications without any changes. Much of the performance improvement comes from limiting data movement and context switches, overhead that is common on a host running a conventional OS network stack but largely absent in an FPGA fabric.
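Because Onload preserves the standard sockets interface, an ordinary sockets program like the minimal TCP echo round trip below needs no source changes. As a sketch, assuming Onload is installed and the traffic runs over a supported adapter, the same unmodified script would simply be launched through Onload's `onload` wrapper (for example, `onload python3 echo.py` instead of `python3 echo.py`). The file name, function names, and loopback address are illustrative assumptions chosen so the example runs anywhere; a real deployment would communicate over the SmartNIC's Ethernet ports.

```python
import socket
import threading

# Hypothetical demo: uses only standard sockets calls, nothing Onload-specific.


def serve_one_echo(listener: socket.socket) -> None:
    """Accept one connection and echo bytes back until the client stops sending."""
    conn, _addr = listener.accept()
    with conn:
        while True:
            data = conn.recv(4096)
            if not data:  # client half-closed its sending side
                break
            conn.sendall(data)


def echo_roundtrip(message: bytes) -> bytes:
    """Send `message` to a local echo server over TCP and return what comes back."""
    # Port 0 asks the OS for any free port; create_server also starts listening.
    with socket.create_server(("127.0.0.1", 0)) as listener:
        port = listener.getsockname()[1]
        server = threading.Thread(target=serve_one_echo, args=(listener,), daemon=True)
        server.start()
        with socket.create_connection(("127.0.0.1", port)) as client:
            client.sendall(message)
            client.shutdown(socket.SHUT_WR)  # signal the end of the request
            chunks = []
            while True:
                chunk = client.recv(4096)
                if not chunk:  # server closed after echoing everything
                    break
                chunks.append(chunk)
        server.join()
    return b"".join(chunks)


if __name__ == "__main__":
    # Run as `python3 echo.py` or, unchanged, as `onload python3 echo.py`.
    print(echo_roundtrip(b"hello"))
```

The design point is that nothing in the program refers to the accelerator; the speedup, if any, comes from how the process is launched rather than from code changes.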
The PCIe card form factor is useful in most cases; however, Xilinx also has a solution for Open Compute Project (OCP)-based systems. The X2562 (Fig. 3) likewise has a pair of 10/25-GbE ports. This low-latency SmartNIC is compatible with Windows, Linux, and VMware, and it supports secure firmware upgrades as well as Onload.