A high-performance, lock-free limit order book and matching engine in C++20. Designed for ultra-low latency with zero heap allocations on the hot path.
Orders sent: 500,000
Fills received: 250,000
~130 ns/order | ~7.7M orders/sec
| Component | Latency |
|---|---|
| Pool alloc | 8 cycles/op (44x faster than new) |
| Pool dealloc | 4 cycles/op (10x faster than delete) |
| Book add | 221 cycles/op |
| Book cancel | 151 cycles/op |
| E2E matching (p50) | 104,500 cycles |
| E2E matching (p99) | 371,678 cycles |
[Input Thread] ──SPSC──> [Matching Thread] ──SPSC──> [Output Thread]
core 0 core 1 core 2
- Single matching thread -- avoids contention on the book (same as real exchanges)
- Lock-free SPSC ring buffers between pipeline stages
- Each thread pinned to a dedicated CPU core via
pthread_setaffinity_np
- Fixed-point prices (
int64_tcents) -- no floating point on the hot path - Pool allocator with pre-allocated slab -- zero
malloc/freeduring matching alignas(64)Order struct -- exactly one cache line, prevents false sharing- Intrusive doubly-linked list per price level -- O(1) append and remove
std::mapwithstd::greaterfor bids -- O(1) best-bid viabegin()- Acquire/release memory ordering -- minimal synchronization overhead in SPSC queue
_mm_pausespin-wait -- reduces power and pipeline stalls on busy loopsmlockall-- prevents page faults on the hot path
include/
types.h Order, Fill, PriceLevel, Side, OrderType
spsc_queue.h Lock-free SPSC ring buffer (header-only template)
pool_allocator.h Fixed-size object pool (header-only template)
order_book.h Bid/ask book with intrusive linked lists
matching_engine.h Price-time priority matching (header-only template)
utils.h rdtsc, CPU pinning, mlock, huge pages
src/
main.cpp 3-thread pipeline demo
order_book.cpp OrderBook implementation
bench/
benchmark.cpp Per-component and end-to-end latency benchmarks
tests/
test_matching.cpp Correctness tests (8 test cases)
mkdir build && cd build
cmake -DCMAKE_BUILD_TYPE=Release ..
make -j$(nproc)./order_book # 3-thread pipeline, prints throughput stats
./benchmark # per-component latency + p50/p99/p999
./test_matching # correctness tests8 correctness tests covering:
- SPSC queue push/pop, full/empty boundary conditions
- Pool allocator alloc/dealloc, exhaustion, memory reuse
- Order book add/cancel, multi-level best bid/ask
- Full fill, partial fill, price priority, time priority, no-match scenarios
-O3 -march=native -flto -fno-exceptions -fno-rtti