July 8th, 2024

C++ patterns for low-latency applications including high-frequency trading

Research paper explores C++ Design Patterns for low-latency applications, focusing on high-frequency trading. Introduces Low-Latency Programming Repository, optimizes trading strategy, and implements Disruptor pattern for performance gains. Aimed at enhancing latency-sensitive applications.

Read original articleLink Icon
C++ patterns for low-latency applications including high-frequency trading

The research paper titled "C++ Design Patterns for Low-latency Applications Including High-frequency Trading" by Paul Bilokon and Burak Gunduz addresses the optimization of latency-critical code, particularly focusing on high-frequency trading systems. The study introduces a Low-Latency Programming Repository, optimizes a market-neutral statistical arbitrage pairs trading strategy, and implements the Disruptor pattern in C++. The repository acts as a practical guide with statistical benchmarking, while the trading strategy enhancements resulted in improved speed and profitability. The Disruptor pattern demonstrated notable performance enhancements compared to traditional queuing methods, with evaluation metrics including speed, cache utilization, and statistical significance. Techniques like Cache Warming and Constexpr showed significant gains in reducing latency. Future directions involve expanding the repository, testing the optimized trading algorithm in live environments, and integrating the Disruptor pattern for comprehensive system benchmarking. The work targets academics and industry professionals aiming to enhance performance in latency-sensitive applications.

Related

Show HN: High-frequency trading and market-making backtesting tool with examples

Show HN: High-frequency trading and market-making backtesting tool with examples

The GitHub URL leads to the "HftBacktest" project, a Rust framework for high-frequency trading. It offers detailed simulation, order book reconstruction, latency considerations, multi-asset backtesting, and live trading bot deployment.

Optimizing the Roc parser/compiler with data-oriented design

Optimizing the Roc parser/compiler with data-oriented design

The blog post explores optimizing a parser/compiler with data-oriented design (DoD), comparing Array of Structs and Struct of Arrays for improved performance through memory efficiency and cache utilization. Restructuring data in the Roc compiler showcases enhanced efficiency and performance gains.

Four lines of code it was four lines of code

Four lines of code it was four lines of code

The programmer resolved a CPU utilization issue by removing unnecessary Unix domain socket code from a TCP and TLS service handler. This debugging process emphasized meticulous code review and system interaction understanding.

Properly Testing Concurrent Data Structures

Properly Testing Concurrent Data Structures

The article explores testing concurrent data structures using the Rust library loom. It demonstrates creating property tests with managed threads to simulate concurrent behavior, emphasizing synchronization challenges and design considerations.

Understanding Software Dynamics [book review]

Understanding Software Dynamics [book review]

The book "Understanding Software Dynamics" by Richard L. Sites, Addison-Wesley 2022, explores software performance analysis, emphasizing measurement techniques, KUTrace toolchain design, and practical performance analysis examples. Sites' approach benefits programmers, SREs, and system designers, offering insights for optimizing software services.

Link Icon 13 comments
By @nickelpro - 6 months
Fairly trivial base introduction to the subject.

In my experience teaching undergrads they mostly get this stuff already. Their CompArch class has taught them the basics of branch prediction, cache coherence, and instruction caches; the trivial elements of performance.

I'm somewhat surprised the piece doesn't deal at all with a classic performance killer, false sharing, although it seems mostly concerned with single-threaded latency. The total lack of "free" optimization tricks like fat LTO, PGO, or even the standardized hinting attributes ([[likely]], [[unlikely]]) for optimizing icache layout was also surprising.

Neither this piece, nor my undergraduates, deal with the more nitty-gritty elements of performance. These mostly get into the usage specifics of particular IO APIs, synchronization primitives, IPC mechanisms, and some of the more esoteric compiler builtins.

Besides all that, what the nascent low-latency programmer almost always lacks, and the hardest thing to instill in them, is a certain paranoia. A genuine fear, hate, and anger, towards unnecessary allocations, copies, and other performance killers. A creeping feeling that causes them to compulsively run the benchmarks through callgrind looking for calls into the object cache that miss and go to an allocator in the middle of the hot loop.

I think a formative moment for me was when I was writing a low-latency server and I realized that constructing a vector I/O operation ended up being overall slower than just copying the small objects I was dealing with into a contiguous buffer and performing a single write. There's no such thing as a free copy, and that includes fat pointers.

By @twic - 6 months
My emphasis:

> The output of this test is a test statistic (t-statistic) and an associated p-value. The t-statistic, also known as the score, is the result of the unit-root test on the residuals. A more negative t-statistic suggests that the residuals are more likely to be stationary. The p-value provides a measure of the probability that the null hypothesis of the test (no cointegration) is true. The results of your test yielded a p-value of approximately 0.0149 and a t-statistic of -3.7684.

I think they used an LLM to write this bit.

It's also a really weird example. They look at correlation of once-a-day close prices over five years, and then write code to calculate the spread with 65 microsecond latency. That doesn't actually make any sense as something to do. And you wouldn't be calculating statistics on the spread in your inner loop. And 65 microseconds is far too slow for an inner loop. I suppose the point is just to exercise some optimisation techniques - but this is a rather unrepresentative thing to optimise!

By @sneilan1 - 6 months
I've got an implementation of a stock exchange that uses the LMAX disruptor pattern in C++ https://github.com/sneilan/stock-exchange

And a basic implementation of the LMAX disruptor as a couple C++ files https://github.com/sneilan/lmax-disruptor-tutorial

I've been looking to rebuild this in rust however. I reached the point where I implemented my own websocket protocol, authentication system, SSL etc. Then I realized that memory management and dependencies are a lot easier in rust. Especially for a one man software project.

By @jeffreygoesto - 6 months
By @winternewt - 6 months
I made a C++ logging library [1] that has many similarities to the LMAX disruptor. It appears to have found some use among the HFT community.

The original intent was to enable highly detailed logging without performance degradation for "post-mortem" debugging in production environments. I had coworkers who would refuse to include logging of certain important information for troubleshooting, because they were scared that it would impact performance. This put an end to that argument.

[1] https://github.com/mattiasflodin/reckless

By @munificent - 6 months
> The noted efficiency in compile-time dispatch is due to decisions about function calls being made during the compilation phase. By bypassing the decision-making overhead present in runtime dispatch, programs can execute more swiftly, thus boosting performance.

The other benefit with compile-time dispatch is that when the compiler can statically determine which function is being called, it may be able to inline the called function's code directly at the callsite. That eliminates all of the function call overhead and may also enable further optimizations (dead code elimination, constant propagation, etc.).

By @globular-toast - 6 months
Is there any good reason for high-frequency trading to exist? People often complain about bitcoin wasting energy, but oddly this gets a free pass despite this being a definite net negative to society as far as I can tell.
By @astromaniak - 6 months
Just in case you are a pro developer, the whole thing is worth looking at:

https://github.com/CppCon/CppCon2017/tree/master/Presentatio...

and up

By @ykonstant - 6 months
I am curious: why does this field use/used C++ instead of C for the logic? What benefits does C++ have over C in the domain? I am proficient in C/assembly but completely ignorant of the practices in HFT so please go easy on the explanations!
By @ibeff - 6 months
The structure and tone of this text reeks of LLM.
By @poulpy123 - 6 months
the irony being that if something should not be high frequency, it is trading
By @apantel - 6 months
Anyone know of resources like this for Java?
By @gedanziger - 6 months
Very cool intro to the subject!