Can Boost ASIO be used to build low-latency applications?
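Can Boost ASIO be used to build low-latency applications, such as HFT (high-frequency trading)?
Boost.ASIO uses the best platform-specific demultiplexing mechanism: IOCP, epoll, kqueue, poll_set, /dev/poll.
It can also be used with Ethernet adapters that support TOE (TCP/IP offload engine) and OpenOnload (kernel-bypass BSD sockets).
But can a low-latency application be built with Boost.ASIO + TOE + OpenOnload?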
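I evaluated Boost Asio for use in high-frequency trading a few years ago. As far as I know, the fundamentals are still the same today. Here are some of the reasons I decided not to use it:
- Asio relies on bind()-style callbacks. There is some overhead here.
- It is not obvious how to arrange for certain low-level operations to happen at the right time or in the right way.
- There is a lot of complex code in an area that needs to be optimized. Complex, general-purpose code is harder to optimize for a specific use case. Assuming you will never need to look under the hood would be a mistake.
- HFT applications have little to no need for portability. In particular, "automatic" selection of a multiplexing mechanism works against the goal, because each mechanism has to be tested and optimized separately, which creates more work rather than less.
- If you are going to use a third-party library, others such as libev, libevent, and libuv are more battle-tested and avoid some of these drawbacks.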
Related: C++ Socket Server - Unable to saturate CPU
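This is the advice from the Asio author, posted to the public SG-14 Google Group (which unfortunately is having problems, and they have moved to another mailing list system):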
I do work on ultra low latency financial markets systems. Like many
in the industry, I am unable to divulge project specifics. However, I
will attempt to answer your question.
In general:
- At the lowest latencies you will find hardware based solutions.
- Then: Vendor-specific kernel bypass APIs. For example where you encode and decode frames, or use a (partial) TCP/IP stack implementation that does not follow the BSD socket API model.
- And then: Vendor-supplied drop-in (i.e. LD_PRELOAD) kernel bypass libraries, which re-implement the BSD socket API in a way that is transparent to the application.
Asio works very well with drop-in kernel bypass libraries. Using
these, Asio-based applications can implement standard financial
markets protocols, handle multiple concurrent connections, and expect
median 1/2 round trip latencies of ~2 usec, low jitter and high
message rates.
My advice to those using Asio for low latency work can be summarised
as: "Spin, pin, and drop-in".
Spin: Don't sleep. Don't context switch. Use io_service::poll()
instead of io_service::run(). Prefer single-threaded scheduling.
Disable locking and thread support. Disable power management. Disable
C-states. Disable interrupt coalescing.
Pin: Assign CPU affinity. Assign interrupt affinity. Assign memory to
NUMA nodes. Consider the physical location of NICs. Isolate cores from
general OS use. Use a system with a single physical CPU.
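(Editorial sketch, not part of the quoted message.) For the "assign CPU affinity" step, a minimal Linux-only example might look like this. It uses the glibc wrappers `sched_setaffinity` and `sched_getaffinity`; the helper names are invented here. Interrupt affinity, NUMA placement and core isolation are system configuration and are not covered.

```cpp
#ifndef _GNU_SOURCE
#define _GNU_SOURCE  // needed for the CPU_* macros; g++ usually defines it already
#endif
#include <sched.h>
#include <cassert>

// Hypothetical helper: restrict the calling thread to a single CPU.
// Returns true on success.
bool pin_current_thread(int cpu) {
    assert(cpu >= 0 && cpu < CPU_SETSIZE);
    cpu_set_t set;
    CPU_ZERO(&set);
    CPU_SET(cpu, &set);
    // A pid of 0 means "the calling thread".
    return sched_setaffinity(0, sizeof(set), &set) == 0;
}

// Hypothetical helper: number of CPUs the calling thread may run on, or -1 on error.
int allowed_cpu_count() {
    cpu_set_t set;
    CPU_ZERO(&set);
    if (sched_getaffinity(0, sizeof(set), &set) != 0) return -1;
    return CPU_COUNT(&set);
}

// Hypothetical helper: lowest-numbered CPU the calling thread may run on, or -1.
int first_allowed_cpu() {
    cpu_set_t set;
    CPU_ZERO(&set);
    if (sched_getaffinity(0, sizeof(set), &set) != 0) return -1;
    for (int cpu = 0; cpu < CPU_SETSIZE; ++cpu) {
        if (CPU_ISSET(cpu, &set)) return cpu;
    }
    return -1;
}
```

A typical use would be to call `pin_current_thread()` at the start of the thread that runs the polling loop, picking a core that has been isolated from general OS scheduling.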
Drop-in: Choose NIC vendors based on performance and availability of
drop-in kernel bypass libraries. Use the kernel bypass library.
This advice is decoupled from the specific protocol implementation
being used. Thus, as a Beast user you could apply these techniques
right now, and if you did you would have an HTTP implementation with
~10 usec latency (N.B. number plucked from air, no actual benchmarking
performed). Of course, a specific protocol implementation should still
pay attention to things that may affect latency, such as encoding and
decoding efficiency, memory allocations, and so on.
As far as the low latency space is concerned, the main things missing
from Asio and the Networking TS are:
- Batching datagram syscalls (i.e. sendmmsg, recvmmsg).
- Certain socket options.
These are not included because they are (at present) OS-specific and
not part of POSIX. However, Asio and the Networking TS do provide an
escape hatch, in the form of the native_*() functions and the
"extensible" type requirements.
Cheers, Chris
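To make the "escape hatch" point concrete, here is a rough Linux-only sketch of batched datagram receive with `recvmmsg`. It works on a plain socket descriptor so that it does not depend on any library; with Asio, that descriptor would presumably come from the socket's `native_handle()`. The helper name `receive_batch` and the constants `kBatch` and `kMaxDatagram` are assumptions made for this example.

```cpp
#ifndef _GNU_SOURCE
#define _GNU_SOURCE  // recvmmsg and mmsghdr are Linux/glibc extensions
#endif
#include <sys/socket.h>
#include <sys/uio.h>
#include <array>
#include <cassert>
#include <cstddef>
#include <cstring>
#include <string>
#include <vector>

constexpr std::size_t kBatch = 8;           // max datagrams per syscall (arbitrary)
constexpr std::size_t kMaxDatagram = 2048;  // per-datagram buffer size (arbitrary)

// Hypothetical helper: pull up to kBatch queued datagrams from `fd` with a
// single recvmmsg call. Returns an empty vector if nothing is queued.
std::vector<std::string> receive_batch(int fd) {
    assert(fd >= 0);
    std::array<std::array<char, kMaxDatagram>, kBatch> buffers;
    std::array<iovec, kBatch> iov;
    std::array<mmsghdr, kBatch> msgs;
    std::memset(msgs.data(), 0, sizeof(msgs));

    for (std::size_t i = 0; i < kBatch; ++i) {
        iov[i].iov_base = buffers[i].data();
        iov[i].iov_len = buffers[i].size();
        msgs[i].msg_hdr.msg_iov = &iov[i];
        msgs[i].msg_hdr.msg_iovlen = 1;
    }

    // MSG_DONTWAIT: take whatever is already queued and never block,
    // which matches the polling style described in the quoted advice.
    int received = recvmmsg(fd, msgs.data(), kBatch, MSG_DONTWAIT, nullptr);

    std::vector<std::string> out;
    for (int i = 0; i < received; ++i) {
        // msg_len holds the number of bytes received for datagram i.
        out.emplace_back(buffers[i].data(), msgs[i].msg_len);
    }
    return out;
}
```

Copying each datagram into a `std::string` keeps the sketch short; a latency-sensitive implementation would more likely parse the messages in place to avoid the allocations.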