The Blog of James: How can latency be reduced? (Poll)

Is it possible to reduce latency? The answer is probably “yes” for just about everyone reading this article. How? The latest poll at Low-Latency.com asks that very question.


Can latency be reduced through software only? Or is hardware necessary? Or perhaps a prudent combination of the two? Maybe it’s too complicated to address the question with such discrete answers. Let us know your opinion at Low-Latency.com.

There is definitely a trend - in some circles - to use hardware solutions. So there must be a reason! Meanwhile, not everyone has given up on improving software - and what does that say about the quality of software in our industry?

It used to be (in my lifetime) that computer programmers agonized for hours and days over how to make an algorithm or function as efficient as it could be - both in memory (bandwidth) and processing power. That concern has tended to take second - or even third - place to speed to market and reduction in operating costs.

Perhaps we have come full circle, where again we must save money, be agile, AND have the most efficient algorithms? How do we, as an industry, reconcile these competing demands given our legacy infrastructure and current staff? Is better hardware the only reasonable solution?

4 Responses to “The Blog of James: How can latency be reduced? (Poll)”

  1. Steven McCoy Says:

    The obvious retort to a hardware-only solution is the corporate desire to lock in the client to guarantee a revenue stream. Single-vendor proprietary solutions are what many companies are looking to get away from, hence the collaborative effort on open cross-platform messaging systems like AMQP.

    Aside from low latency, there is also a front on bandwidth for storage systems: 10 GigE versus Fibre Channel. The overhead the 10 GigE teams (Chelsio, BNT, Teak, etc.) have been seeing for a significant period is interrupt rates exceeding core processing speeds. Jumbo frames were introduced to lower interrupt rates; you can see this on SOHO NAS appliances jumping from 20 MB/s to 40 MB/s on very modest cores. RDMA and ToE seek to completely bypass O/S kernels and the kernel to user-space context switching. However, the benefit is only short term, and the disadvantages of reduced networking functionality could be very severe in the long term. Linus has stated his preference is against mainstreaming ToE networking:

    http://www.linux-foundation.org/en/Net:TOE
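
    The jumbo-frame point above can be put in rough numbers. The sketch below is only an illustration under stated assumptions: one interrupt per frame (no interrupt coalescing), a fully loaded 10 GigE link, and 38 bytes of per-frame Ethernet overhead. The function and constant names are made up for this example.

```python
# Rough upper bound on frames per second on a saturated link.
# Assumption: one interrupt per received frame (no coalescing).
# Per-frame overhead on the wire: 14-byte Ethernet header + 4-byte FCS
# + 8-byte preamble + 12-byte inter-frame gap = 38 bytes.
ETHERNET_OVERHEAD_BYTES = 38


def frames_per_second(line_rate_bps: float, mtu_bytes: int) -> float:
    """Return the maximum number of full-size frames per second."""
    wire_bits_per_frame = (mtu_bytes + ETHERNET_OVERHEAD_BYTES) * 8
    return line_rate_bps / wire_bits_per_frame


standard = frames_per_second(10e9, 1500)  # standard 1500-byte MTU
jumbo = frames_per_second(10e9, 9000)     # 9000-byte jumbo frames

print(f"standard MTU: {standard:,.0f} frames/s")
print(f"jumbo frames: {jumbo:,.0f} frames/s")
print(f"reduction factor: {standard / jumbo:.2f}x")
```

    Under these assumptions the per-frame interrupt load drops by a bit less than a factor of six, which is in the spirit of the NAS observation above, though real adapters also coalesce interrupts.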

    The future is no doubt a cloud combination of different approaches. Core speed increases allow processing of more interrupts; increased core count could mean dedicated, but generic, network cores instead of vendor proprietary modules. This could allow 10 GigE on more equipment but could push tomorrow’s 40 and 100 GigE to more customized hardware.

    Software tuning can only solve minor percentages of performance. Brand new algorithms and ideas might only appear from research into specific fields, like Google’s farm of divide-and-conquer processing, that require significantly larger deployments than typical financial institution data centres to run efficiently or make returns on research.

  2. Geno Valente Says:

    I think the current poll results are right on (about 50% saying BOTH - HW and SW tuning). From what we see, a combination of both hardware acceleration and software improvements is what many people are doing to lower latency and increase total message throughput. If someone can achieve what they need with just moving from dual to quad cores or a C-code rewrite, then that is usually the easiest. For those people that are really pushing the envelope, that is where we are seeing FPGA technology come into play.

    With the push of APIs (Torrenza and QuickAssist), the “single vendor” lock-in becomes a lot less of a problem and allows you to build your own low-latency system with CPU+FPGA pretty quickly. Your code can be retargeted to different platforms and accelerators pretty simply in the future, so the vendor “lock-in” fear is minimized.

    Still many customers want to just purchase a low-latency market feed solution, make money, and focus on the algo only. This is where experts like Activ and others come in as they know hardware and software design and give you a great solution out of the box.

    The third solution is to hire a hardware expert to help you build your own. Companies like SLE (www.siliconlogic.com) and STS (www.ststech.com) help our customers all the time to build exactly what they want, make HW/SW trade-offs, then in some cases help them learn how to take over hardware design and add more functionality on their own when they need it.

    Regardless of how you solve your problem, I think people are realizing that 8, 16, 32 cores isn’t going to solve all their problems anymore.

  3. Martin Sustrik Says:

    Although HW solutions may reduce latency, my feeling is that investing in niche HW solutions can be quite risky. The stock trading business is not that big a market, and when recession hits, providers of dedicated hardware will suffer the first and most painful blow. Thus the problem of vendor-lock-in transforms into the even more fearsome problem of dead-vendor-lock-in. History’s lesson is that generic solutions - whether hardware or software - are the ones that survive for a long time.

  4. Bob Van Valzah Says:

    Martin has a good point. Any company lashing FPGAs into a box to reduce latency is chasing a small market. They have to achieve profitability before they run out of seed money or go bust. But there are a few things you can depend on in the future without needing FPGAs.

    We will have fast socket-based interfaces to network hardware. That socket call may result in a context switch to a kernel or be trapped by a kernel-bypass library, but sockets are clearly here to stay and they’ll only get faster. The drive for faster sockets comes not just from the financial world, but also from many other sectors. For example, anybody with a 4- or 8-core box is well aware that GigE may not be fast enough to keep all those cores fed with work.
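
    As a small illustration of why the plain socket interface is a convenient thing to depend on, here is a sketch that measures round-trip time through ordinary UDP sockets on the loopback interface. It is an assumption-laden toy, not a benchmark: the function name, sample count, and payload size are invented for this example, and loopback numbers say nothing about a real network.

```python
import socket
import time


def udp_loopback_rtt(samples: int = 100, payload: bytes = b"x" * 64) -> float:
    """Return the median round-trip time in seconds over loopback UDP."""
    server = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
    client = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
    try:
        # Port 0 asks the OS to pick any free port.
        server.bind(("127.0.0.1", 0))
        client.bind(("127.0.0.1", 0))
        server.settimeout(1.0)
        client.settimeout(1.0)
        server_addr = server.getsockname()

        rtts = []
        for _ in range(samples):
            start = time.perf_counter()
            client.sendto(payload, server_addr)   # request
            data, addr = server.recvfrom(2048)    # "server" receives it
            server.sendto(data, addr)             # echo back
            client.recvfrom(2048)                 # client receives the echo
            rtts.append(time.perf_counter() - start)

        rtts.sort()
        return rtts[len(rtts) // 2]
    finally:
        server.close()
        client.close()


if __name__ == "__main__":
    print(f"median loopback RTT: {udp_loopback_rtt() * 1e6:.1f} microseconds")
```

    The application code only sees `sendto` and `recvfrom`; whether those calls go through the kernel or are intercepted by a bypass library is invisible at this level, which is the property being relied on.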

    We will have fast network equipment that can make wire-speed filtering decisions based on packet headers. For multicast, packets can also be copied at wire speed. Cisco doesn’t use FPGAs; they use ASICs, which are much less expensive and a much better tool for the job if you know that all filtering and copying decisions can be made on the packet header without having to inspect the payload. Again, there’s a huge demand for this outside of finance, so high-performance switches and routers will be available at reasonable prices even if one vendor goes under.
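
    To make "header-based" concrete, here is a software sketch of the kind of decision involved: it reads only the fixed IPv4 header, never the payload, and decides whether to forward a packet to a set of subscribed multicast groups. This is purely illustrative (real switches do this in silicon), and every name in it is hypothetical.

```python
import ipaddress
import struct


def build_ipv4_header(src: str, dst: str, payload_len: int = 0) -> bytes:
    """Build a minimal 20-byte IPv4 header for the demo (checksum left at 0)."""
    return struct.pack(
        "!BBHHHBBH4s4s",
        0x45,                 # version 4, header length 5 words
        0,                    # DSCP/ECN
        20 + payload_len,     # total length
        0, 0,                 # identification, flags/fragment offset
        1,                    # TTL
        17,                   # protocol: UDP
        0,                    # header checksum (not computed here)
        ipaddress.IPv4Address(src).packed,
        ipaddress.IPv4Address(dst).packed,
    )


def ipv4_destination(packet: bytes) -> ipaddress.IPv4Address:
    """Extract the destination address from bytes 16..19 of the IPv4 header."""
    if len(packet) < 20 or packet[0] >> 4 != 4:
        raise ValueError("not an IPv4 packet")
    return ipaddress.IPv4Address(packet[16:20])


def should_forward(packet: bytes, subscribed_groups: set) -> bool:
    """Forward only multicast packets addressed to a subscribed group."""
    dst = ipv4_destination(packet)
    return dst.is_multicast and dst in subscribed_groups


groups = {ipaddress.IPv4Address("239.1.1.1")}
print(should_forward(build_ipv4_header("10.0.0.1", "239.1.1.1"), groups))
print(should_forward(build_ipv4_header("10.0.0.1", "239.1.1.2"), groups))
```

    Because the decision touches a fixed offset in a fixed-size header, it is the sort of logic that maps naturally onto dedicated hardware.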

    These observations suggest that it’s best to build systems that only depend on fast sockets and fast header-based forwarding and copying decisions. That’s our strategy.
