Saturday, November 15, 2014

Performance Beyond 10GbE

Today's performance servers leverage 10 Gigabit Ethernet (10GbE) to fully utilize all the compute resources at their disposal. As Intel's latest release of the 18-core Haswell server chips hits the market, system architects are beginning to consider network fabrics above 10GbE. Four potential choices exist: 20GbE, 25GbE, 40GbE & 100GbE.

Before diving into each of these options we should lay some groundwork. Most performance I/O adapters these days are inserted into the motherboard in a third generation PCI Express (PCIe Gen3) slot that is 8 lanes wide. The theoretical performance of this slot is 64 gigabits/second (Gbps), but after encoding & overhead the effective data rate is more like 52Gbps. It should also be noted that on Intel systems each PCIe slot is wired to a specific CPU socket. So data coming from a PCIe slot that is "wired" to "Socket 0" but is destined for a core on the CPU in "Socket 1" will see a measurable degradation in performance, because those bits have to travel a much longer path to reach that distant core. Most applications will likely not care, but if performance is your speciality you should look into this. If you're really interested in achieving optimum performance you should evenly split your I/O across slots mapped to each CPU socket.
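On Linux, one way to see which socket a NIC sits closest to is to read the numa_node attribute the kernel exposes in sysfs for the PCIe device behind each interface. The sketch below is a minimal illustration, not a tuning tool. It assumes a Linux host, and it assumes that each NUMA node corresponds to one CPU socket, which is typical for dual-socket servers but not guaranteed. The function names and the sysfs_root parameter are made up for this example.

```python
from pathlib import Path
from typing import Dict, Iterable, List, Optional


def nic_numa_node(iface: str, sysfs_root: str = "/sys") -> Optional[int]:
    """Return the NUMA node of the PCIe device behind iface, or None if unknown."""
    node_file = Path(sysfs_root) / "class" / "net" / iface / "device" / "numa_node"
    try:
        node = int(node_file.read_text().strip())
    except (OSError, ValueError):
        # Virtual interfaces and non-Linux hosts have no such file.
        return None
    # The kernel reports -1 when it has no locality information.
    return node if node >= 0 else None


def group_by_socket(ifaces: Iterable[str], sysfs_root: str = "/sys") -> Dict[Optional[int], List[str]]:
    """Group interface names by the NUMA node their slot is attached to."""
    groups: Dict[Optional[int], List[str]] = {}
    for iface in ifaces:
        groups.setdefault(nic_numa_node(iface, sysfs_root), []).append(iface)
    return groups
```

If group_by_socket puts every performance NIC under the same node, the I/O is not split across sockets, and moving one adapter to a slot wired to the other CPU may be worth testing.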

Beyond 10GbE, the two currently approved standards are 40GbE and 100GbE. Many of the NIC companies are already shipping products that support 40GbE, and most of the performance switch vendors support both 40GbE & 100GbE connections. The reluctance of the NIC companies to go beyond 40GbE is bound to the common 8-lane PCIe Gen3 slots that most NICs are installed into. As mentioned above, these slots support roughly 52Gbps in each direction. So while a dual port 40G NIC can deliver up to 80Gbps on the wire, the card can only bring data into the motherboard at 52Gbps, which means roughly 35% of its line rate (28 of 80Gbps) is oversubscribed. This is why we're not going to see any 100GbE NICs in existing servers. For 100GbE, NIC companies will require a 16-lane PCIe Gen3 slot or a future 8-lane PCIe Gen4 slot, as both should sustain roughly 104Gbps. So you'll have to wait for Intel's next tock (major step forward) and the delivery of Skylake, the successor to Broadwell, for real 100GbE NIC systems to appear.
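The slot arithmetic here is easy to check. The sketch below is a back-of-the-envelope model only: it assumes effective throughput scales linearly from the roughly 52Gbps quoted above for an 8-lane Gen3 slot, and that Gen4 doubles the per-lane rate. The function names are illustrative.

```python
# Back-of-the-envelope PCIe slot budget versus NIC line rate.
# Assumption: usable bandwidth scales linearly from ~52 Gbps for an
# 8-lane Gen3 slot, and Gen4 doubles the per-lane rate.

EFFECTIVE_GBPS_PER_GEN3_LANE = 52.0 / 8  # 6.5 Gbps per lane


def slot_gbps(lanes: int, generation: int = 3) -> float:
    """Estimated usable bandwidth of a PCIe slot, per direction, in Gbps."""
    if generation not in (3, 4):
        raise ValueError("this sketch only models PCIe Gen3 and Gen4")
    per_lane = EFFECTIVE_GBPS_PER_GEN3_LANE * (2 if generation == 4 else 1)
    return lanes * per_lane


def shortfall(ports: int, port_gbps: float, slot: float) -> float:
    """Fraction of the NIC's aggregate line rate that the slot cannot carry."""
    demand = ports * port_gbps
    return max(0.0, demand - slot) / demand


if __name__ == "__main__":
    x8_gen3 = slot_gbps(8)
    print(f"x8 Gen3 slot: {x8_gen3:.0f} Gbps")
    print(f"dual-port 40GbE shortfall: {shortfall(2, 40, x8_gen3):.0%}")
    print(f"x16 Gen3 slot: {slot_gbps(16):.0f} Gbps")
    print(f"x8 Gen4 slot: {slot_gbps(8, generation=4):.0f} Gbps")
```

Under these assumptions a single 100GbE port fits inside either 104Gbps slot with a little headroom, while the dual port 40GbE card in an 8-lane Gen3 slot cannot deliver about a third of its line rate.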

So what about 20GbE, is this something to consider? Well, 20GbE is something HP cooked up working with QLogic and delivered as a product for their blade system. It never really gained any traction outside of that platform. Normally 20Gbps is simply achieved by bonding both ports on a dual port 10GbE adapter together. This can be done several ways, and is very commonplace. 20GbE will likely go no further as a hardware option.

Now 25GbE is a horse of a different color, and it is seeing some adoption, but mostly at the top of rack switch level. To understand why, note that 100GbE is actually four 25Gbps lanes, so fracturing it into 25GbE ports is somewhat logical. Arista Networks, Google, Microsoft, Broadcom & Mellanox are all working the switch side of this. In September of 2014 Broadcom announced their StrataXGS Tomahawk chip, which supports 128 ports of 25GbE, 64 ports of 50GbE or 32 ports of 100GbE. So these switches are really close, and we may even see them at SC14 this week. In October Emulex joined the 25GbE Consortium, so clearly there will soon be some NICs in this space, although at this time no vendors have announced 25GbE NICs.
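The port counts follow from simple lane arithmetic. As a rough sketch, assume the switch chip exposes 128 lanes of 25Gbps each (inferred from the 128 ports of 25GbE figure, not taken from a datasheet) and that a port is built from one, two or four lanes. The function name is made up for this example.

```python
LANE_GBPS = 25  # each serial lane runs at 25 Gbps


def port_options(total_lanes: int, lanes_per_port=(1, 2, 4)) -> dict:
    """Map port speed in Gbps to how many such ports the lanes can form."""
    return {n * LANE_GBPS: total_lanes // n for n in lanes_per_port}


if __name__ == "__main__":
    # Assumed lane count for illustration: 128 lanes of 25 Gbps.
    for speed, count in port_options(128).items():
        print(f"{count} ports of {speed}GbE")
```

Every configuration adds up to the same 3200Gbps of aggregate bandwidth, which is why a 100GbE port and four 25GbE ports are interchangeable from the chip's point of view.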
