Friday, December 4, 2015

99.99999% Available + 2.7us = 1 Awesome Computer

What do you get when you put together a pair of dual-socket servers running in hardware lock-step with a pair of leading-edge, ultra-low latency OS Bypass network adapters, all running Red Hat Enterprise Linux? One awesome 24-core system that boasts 99.99999% uptime, zero jitter, 2.7 microseconds of half-round-trip UDP latency, and 2.9 microseconds for TCP <drop the mic>.

How is this possible? First we’ll cover what Stratus Technologies has done with Lock-Step, and how it makes the ftServer dramatically different from all others. Then we’ll explain what jitter is, and why removing it is so critical for deterministic systems like financial trading. Finally we’ll cover these impressive Solarflare ultra-low latency numbers, and what they really mean.

We’ve all bought something with a credit card, flown through Chicago O’Hare, used public utilities, and possibly even called 9-1-1. What you may not know is that very often at the heart of each of these systems is a Stratus server. Stratus should adopt the old Timex slogan “It takes a licking and keeps on ticking,” because that’s what it means to provide 99.99999% uptime: you’re allowed about three seconds a year for unplanned outages. Three seconds is how long it takes me to say “99.99999% uptime.” How is this possible? Imagine running a three-legged race with a friend. Ideally, if you each compared your actions continuously with every step, you could run the race at the pace of the slower of the two of you. This is the key concept behind Lock-Step: comparing, then knowing what to do as one starts to stumble, to ensure the team continues moving forward no matter what happens. Stratus leverages the latest 12-core Intel Haswell E5-2670v3 server processors with support for up to 512GB of DDR4. If any hardware component in the server fails, the system as a whole continues moving forward and alerts an admin, who then replaces the failed component, and that subsystem is brought back online. I challenge you to find another computer in your life that has ever offered that level of availability over the typical 5-7 year life cycle that Stratus servers often see.
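To put the "three seconds a year" figure in context, here is a quick back-of-the-envelope sketch. It assumes a 365-day year and simply multiplies out the allowed downtime; the function name and the comparison tiers are mine, not anything from Stratus.

```python
# Allowed unplanned downtime per year for a given availability percentage.
# Assumption: a non-leap, 365-day year.
SECONDS_PER_YEAR = 365 * 24 * 60 * 60


def downtime_seconds_per_year(availability_percent: float) -> float:
    """Return the seconds per year a system may be down at this availability."""
    return SECONDS_PER_YEAR * (1 - availability_percent / 100)


for label, pct in [
    ("three nines", 99.9),
    ("five nines", 99.999),
    ("seven nines", 99.99999),
]:
    print(f"{label:12s} {pct}% -> {downtime_seconds_per_year(pct):.2f} s/year")
```

Seven nines works out to a little over three seconds, while the more commonly quoted five nines still allows a bit more than five minutes a year.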

So what is Jitter? When a computer core becomes distracted from doing its primary task to go off and do some routine housekeeping (operating system or hardware driven), the impact of that temporary distraction is known as Jitter. With normal computing tasks Jitter is hardly noticeable; it’s the computer equivalent of background noise. With certain VERY time-critical computing tasks though, like say financial trading, even one Jitter event could be devastating. Suppose your server's primary function is financial trading, and it receives a signal from market A that someone wants to buy IBM at $100, and on market B it sees a second signal that another entity wishes to sell IBM at $99. So the trading algorithm on your server buys the stock on B for $99, but the instant it has confirmation of your purchase, a thermal sensor in your server generates an interrupt. The CPU that is running your trading algorithm then goes off to service that interrupt, which results in it running some code to determine which fan to turn on. Eventually, say a millisecond or so later, control is returned to your trading algorithm, but by then the buyer on market A is gone, and the new price of IBM has fallen to $99. That’s the impact of Jitter: brief, often totally random moments in the trading day stolen to do basic housekeeping. These stolen moments can quickly add up for traders, and for exchanges they can be devastating. Imagine an order delayed by Jitter missing an opportunity! Stratus Technologies has crawled through their server architecture, and eliminated all potential sources of Jitter. Traders & exchanges using other platforms have to do all this by hand, and this is still as much art as it is science. That's one reason why over 1,400 different customers regularly depend on Solarflare.
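If you want to see jitter for yourself, a rough sketch like the one below will do it: spin in a tight loop, timestamp every pass, and compare the typical gap with the worst one. This is only an illustration and not how Stratus or Solarflare measure anything. The function name and sample count are my own, and Python's interpreter and memory allocator add noise of their own on top of whatever the OS and hardware contribute.

```python
import time


def measure_gaps(samples: int = 200_000) -> list[int]:
    """Spin in a tight loop and record the nanoseconds between iterations."""
    gaps = []
    prev = time.perf_counter_ns()
    for _ in range(samples):
        now = time.perf_counter_ns()
        gaps.append(now - prev)
        prev = now
    return gaps


gaps = sorted(measure_gaps())
median = gaps[len(gaps) // 2]
p999 = gaps[int(len(gaps) * 0.999)]
worst = gaps[-1]

# On an untuned machine the worst gap is typically far larger than the median:
# those outliers are the "stolen moments".
print(f"median gap: {median} ns")
print(f"99.9th pct: {p999} ns")
print(f"worst gap:  {worst} ns")
```

On an ordinary desktop or server the worst-case gap usually dwarfs the median, which is the whole point: it's the rare outlier, not the average, that hurts a trading system.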

Finally, there’s ultra-low latency networking over generic TCP/IP and UDP. In the diagram below, network latency is in blue. Market data arrives via UDP, and orders are placed through the more reliable TCP/IP protocol. Here is a quick anatomy of part of the trading process showing one UDP receive and one TCP send. There are other components, but this is a distilled example.


Initially the packet is received from the wire (the light blue block). As it passes through the physical interface, electrical networking signals are converted to layer-2 logical bits. From there the packet is passed to the on-chip layer-2 switch, which steers it to one of 2,048 virtualized NIC (vNIC) instances, also on the chip. The vNIC then uses DMA to transfer the packet into system memory; all of this takes 500 nanoseconds. The packet has now left the network adapter and is on its way to a communications stack somewhere in system memory, the dark blue box. Here is where Solarflare shines. In the top timeline the dark blue box represents their host kernel device driver and the Linux communications stack. Solarflare's kernel device driver is arguably one of the fastest in the industry, but most of this dark blue box is time spent working with the kernel. There are CPU task switches and several memory copies of the packet as it moves through the system, and thousands of CPU instructions are executed; all told this can be nearly 3,000 nanoseconds. In the bottom timeline the packet is DMA'd directly into user space, where Solarflare's very tight user-space stack sits. This is where the packet is quickly processed and handed off to the end-user application via the traditional sockets interface, all without additional data copies or CPU task switches, and completed in just under 1,000 nanoseconds: a savings of about 2,000 nanoseconds, or roughly 4,600 CPU clock cycles for this processor at this speed. All this, and we've just received a packet into our application, represented by the green blocks.
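The 4,600 figure lines up with 2,000 nanoseconds at a 2.3 GHz clock, which I'm assuming here is the base frequency of the E5-2670v3. A minimal sketch of that arithmetic, using the rounded stack timings quoted above (the names are mine):

```python
# Convert the receive-path savings into CPU clock cycles.
# Assumption: a 2.3 GHz clock (taken to be the E5-2670v3 base frequency).
CLOCK_GHZ = 2.3

# Rounded figures from the text: "nearly 3,000" and "just under 1,000" ns.
KERNEL_STACK_NS = 3000
BYPASS_STACK_NS = 1000


def ns_to_cycles(nanoseconds: float, clock_ghz: float = CLOCK_GHZ) -> float:
    """At N GHz a core ticks N times per nanosecond."""
    return nanoseconds * clock_ghz


saved_ns = KERNEL_STACK_NS - BYPASS_STACK_NS
print(f"saved on receive: {saved_ns} ns")
print(f"clock cycles:     {ns_to_cycles(saved_ns):.0f}")
```

With turbo boost the real clock can run higher, so treat this as a floor on the cycles given back to the application rather than an exact count.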

So in the two bars above, the first represents market data coming in via Solarflare's generic kernel device driver, then going through the normal Linux stack until the packet is handed off to the application. The response packet, in this case a trade via TCP, is sent back through the stack to the network adapter and eventually put on the wire; all told, just over 9,000 nanoseconds. With Stratus & Solarflare, the second bar shows the latency of the same transaction traveling through Solarflare's OS Bypass stack in both directions. The difference here is that the transaction hits the exchange over 4,000 nanoseconds sooner. This means you can trade at nearly twice the speed, a true competitive advantage. Now four millionths of a second isn't something humans can easily grasp, so let's jump to light speed: this is how long it takes a photon of light to cover roughly three quarters of a mile.
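For anyone who wants to check the math, here is a small sketch using the rounded numbers from the two bars ("just over 9,000" and "over 4,000" nanoseconds) and the speed of light in a vacuum. The variable names are mine, and the inputs are the rounded figures rather than measured values.

```python
# Compare the kernel path with the OS Bypass path, then express the time
# saved as the distance light travels in a vacuum.
SPEED_OF_LIGHT_M_PER_S = 299_792_458
METERS_PER_MILE = 1609.344

kernel_path_ns = 9000   # rounded: "just over 9,000 nanoseconds"
saved_ns = 4000         # rounded: "over 4,000 nanoseconds sooner"
bypass_path_ns = kernel_path_ns - saved_ns

speedup = kernel_path_ns / bypass_path_ns
distance_m = SPEED_OF_LIGHT_M_PER_S * saved_ns * 1e-9
distance_miles = distance_m / METERS_PER_MILE

print(f"bypass path:  {bypass_path_ns} ns")
print(f"speedup:      {speedup:.1f}x")
print(f"light covers: {distance_m:.0f} m ({distance_miles:.2f} miles)")
```

With these inputs the speedup comes to 1.8x, and light covers about 1.2 km in the time saved.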

So if you're looking to build a financial trading system with ultra-high availability, zero jitter & extreme network performance, you have only one choice: Stratus's new ftServer.
