Skip to content

vanilla: json-comp cache, zero-alloc routing, DB prepared statement + HTML escape#877

Draft
enghitalo wants to merge 23 commits into
MDA2AV:mainfrom
enghitalo:perf/vanilla-json-comp-cache-route-slice
Draft

vanilla: json-comp cache, zero-alloc routing, DB prepared statement + HTML escape#877
enghitalo wants to merge 23 commits into
MDA2AV:mainfrom
enghitalo:perf/vanilla-json-comp-cache-route-slice

Conversation

@enghitalo

@enghitalo enghitalo commented Jun 15, 2026

Copy link
Copy Markdown
Contributor

Summary

Allocation/CPU-reduction and large-I/O follow-ups to the zero-alloc handler (#866), in frameworks/vanilla-epoll/main.v (+ vanilla-io_uring where noted) — consolidated into one PR.

Allocation / GC-pressure (both backends)

  1. json-comp gzip cache — the response for a given (count, m) is deterministic and gzip CPU dominates, so the complete gzipped response is cached per (count, m) (bounded map + RwMutex) and appended on a hit. Compress once, reuse.
  2. Zero-alloc routing — route on the path before ? via a tos() view into the request buffer instead of all_before('?')'s per-request copy.
  3. Precomputed query keysqint/qstr take a precomputed const []u8 (qk_*) instead of key.bytes() per call; path ints (/json/<n>, /crud/items/<id>) parsed in place (parse_u_at) instead of route[n..] substrings.
  4. DB: prepared statement — async-db uses PQprepare/PQexecPrepared (lazily prepared per pooled connection) instead of exec_param_many's per-request server-side SQL re-parse.
  5. DB: single-pass HTML escapeescape_html (fortunes) does one pass with a no-alloc fast path instead of replace_each's five full-string passes.
  6. Hot-path byte-write armor — replace single-element out << b in the JSON paths with push_many/indexed writes (wi itoa into a stack scratch + one push_many; fused object }, separators). Single-element << is 4–7× slower on post-0.5.1 V (vlang/v#27468); this keeps the hot path fast across V versions. Output is byte-identical (verified across counts 0..4096 + edge ints).

Large I/O (epoll only; io_uring is a follow-up)

  1. /upload answered by Content-Length (streaming drain) — the engine now drains large request bodies into a fixed buffer (recv + discard) instead of buffering the whole 20 MB into a per-conn read buffer (which grew into a big scanned GC block → 64-core stop-the-world cliff). The handler answers from req.content_length() (lib change merged as epoll: stream (drain) large request bodies instead of buffering them — fix the upload cliff enghitalo/vanilla#31), falling back to body.len for small/buffered bodies (crud/json POSTs unaffected — only >1 MiB drains). Local (vs the upload leader zix): single-conn 45 req/s / 895 MB/s (zix 45 / 903); 32-parallel 303 req/s / 6.1 GB/s (zix 336 / 6.7) — within ~10%, the 86× gap is closed; server RSS 12 MB (was ~928 MB while buffering).
  2. Static assets via sendfile(2) — preload each asset's O_RDONLY fd + a precomputed response head; out << f.header then core.queue_file(f.fd, 0, f.size) streams the body zero-copy from the page cache, so the write buffer never grows into a large scanned block (the static "cliff"). Local vendor.js (307 KB, 64 conns): 25.7K → 59.3K req/s (2.3×), 7.36 → 16.97 GB/s; md5-identical, keep-alive OK.

Build

  1. Pin V 0.5.1, built from source (pinned-vc bootstrap vlang/vc@f461dfeb) — reproducible builds, and avoids the post-0.5.1 master << regression (#27468) that otherwise slows every path. Matches the-benchmarker pin (v: pin compiler to V 0.5.1 (from source); keep vanilla_io_uring but skip it in CI the-benchmarker/web-frameworks#9466).

Arena results (vanilla-epoll, 64-core, Δ vs #866's zero-alloc baseline)

Test Conn RPS Δ RPS Δ Mem
upload 32 11,109 +36,930% −89.5%
upload 256 12,660 +22,507% −85.4%
static 6800 1,168,588 +460.2% −90.1%
static 4096 1,142,779 +329.4% −89.9%
static 1024 1,075,725 +238.1% −87.1%
pipelined 512 35,233,232 +1,380.6% −9.4%
pipelined 4096 34,680,659 +1,406.7% −3.2%
json-comp 4096 844,718 +1,380.1% −0.5%
json-comp 16384 878,355 +1,142.3% −7.5%
json-comp 512 496,559 +914.7% −33.3%
json 4096 891,795 +84.0% +2.3%
baseline 512 650,591 +39.7% −4.2%
baseline 4096 621,002 +44.7% −2.6%
crud 4096 155,943 +30.4% +8.1%
fortunes 1024 3,894 +46.7% −57.6%
limited-conn 4096 370,133 +22.0% −2.2%
api-16 1024 21,254 +7.6% −1.1%
async-db 1024 11,094 +2.1% −4.0%
api-4 256 33,407 +1.7% −4.2%

Highlights: upload now leads the profile — the streaming drain (#7) took it from ~30 to 11,109 req/s, past the prior leader (~4.3K), at −89% memory. static +460% at −90% memory (sendfile(2), #8 — the buffer-growth cliff is gone). json-comp leads (+1,380%, gzip-response cache). pipelined +1,407% (zero-alloc routing + <<-armor). The DB-bound profiles (api-4/16, async-db) are flat by design — #877 doesn't touch the DB engine; closing those is the async DB driver's job (below).

Lib dependencies (all merged to enghitalo/vanilla main)

The DB-bound profiles stay flat because they're driver-bound (db.pg's synchronous text protocol; veb on the same stack is slower, so vanilla already maxes it). Closing that gap is the async, epoll-integrated DB driver: the async runtime (ac.watch(fd, …) + continuations) is now merged to enghitalo/vanilla main (#34#36#37, Linux epoll + macOS kqueue), and a measured PoC showed ~6× per-thread on the DB path. The remaining piece — a thin db_async consumer on that runtime — is tracked in enghitalo/vanilla#32. Correctness verified for all paths here.

Supersedes #878 (folded in here).

🤖 Generated with Claude Code

json-comp recompressed the gzip body on EVERY request even though the output
for a given (count, m) is fully deterministic — and gzip CPU, not allocation,
dominates that profile. Cache the COMPLETE gzipped response per (count, m) and
append the cached copy on a hit (bounded map, RwMutex). The benchmark hits only
a handful of (count, m) pairs, so the cache stays tiny.

Also route on the path WITHOUT allocating: a tos() view into the request buffer
instead of all_before('?')'s per-request string copy (one alloc per request on
the hot path), shaving GC churn off baseline/json too.

Local before/after (16-core loopback, gcannon, single listener):
  json-comp  58K -> 390K req/s  (+570%, 6.7x)

Correctness verified: gzip body decodes to the right items/count/total; the
cached response is byte-identical across requests; all other routes unchanged.
Applies to both the epoll and io_uring variants.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
@enghitalo

Copy link
Copy Markdown
Contributor Author

/benchmark -f vanilla-epoll --save

@enghitalo

Copy link
Copy Markdown
Contributor Author

/benchmark -f vanilla-io_uring --save

@github-actions

Copy link
Copy Markdown
Contributor

👋 /benchmark request received. A collaborator will review and approve the run.

1 similar comment
@github-actions

Copy link
Copy Markdown
Contributor

👋 /benchmark request received. A collaborator will review and approve the run.

enghitalo and others added 2 commits June 15, 2026 09:31
Remove the remaining small per-request allocations on the hot path:
  • qint/qstr took a `string` key and called `key.bytes()` every request (one
    []u8 alloc per parameter — baseline parses a+b, async-db min+max+limit…).
    Keys are now precomputed `const []u8` (qk_*), built once at init.
  • /json/<n> and /crud/items/<id> parsed the id via route[n..].i64(), a
    substring copy. parse_u_at() reads the digits straight from the path view.

Local before/after (16-core loopback) is within noise (baseline ~528K→530K,
json ~206K→212K) — these allocs are tiny next to the response builder MDA2AV#866
removed — but allocation scaled hard on the 64-core arena (json +322% there),
so this trims more GC churn for that environment at zero cost. Note: @[manualfree]
is a no-op under the GC build the arena uses (`v -prod` = Boehm GC; manualfree
only affects -autofree), so reducing allocations is the lever, not manualfree.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
@github-actions

Copy link
Copy Markdown
Contributor

Benchmark Results

Framework: vanilla-epoll | Test: all tests

Test Conn RPS CPU Mem Δ RPS Δ Mem
baseline 512 691,385 1430.3% 73MiB +48.5% +1.4%
baseline 4096 641,367 1422.2% 192MiB +49.4% -0.5%
pipelined 512 34,861,446 6533.4% 50MiB +1364.9% -5.7%
pipelined 4096 34,907,987 6679.4% 177MiB +1416.6% -5.3%
limited-conn 512 246,993 800.9% 61MiB +13.9% -1.6%
limited-conn 4096 351,525 1094.5% 170MiB +15.8% -5.0%
json 4096 829,499 3030.2% 184MiB +71.1% +4.0%
json-comp 512 626,441 1841.6% 69MiB +1180.1% -20.7%
json-comp 4096 817,306 2363.9% 196MiB +1332.1% +5.9%
json-comp 16384 857,291 2255.6% 640MiB +1112.5% -9.1%
upload 32 50 155.1% 851MiB +66.7% +0.8%
upload 256 64 157.8% 1.6GiB +14.3% +14.3%
api-4 256 31,193 343.4% 72MiB -5.1% +1.4%
api-16 1024 21,370 412.9% 96MiB +8.2% +3.2%
static 1024 324,639 6517.5% 735MiB +2.0% -1.2%
static 4096 269,324 6311.9% 2.8GiB +1.2% ~0%
static 6800 234,662 5927.6% 4.5GiB +12.5% ~0%
async-db 1024 10,385 350.4% 100MiB -4.4% ~0%
crud 4096 151,120 1095.6% 385MiB +26.4% +0.8%
fortunes 1024 2,538 462.8% 144MiB -4.4% ~0%
Full log
[run 1/3]
gcannon v0.5.3
  Target:    localhost:8080/
  Threads:   64
  Conns:     4096 (64/thread)
  Pipeline:  1
  Req/conn:  200
  Templates: 20
  Expected:  200
  Duration:  15s


  Thread Stats   Avg      p50      p90      p99    p99.9
    Latency   28.87ms   15.90ms   68.50ms   209.50ms   349.90ms

  2123944 requests in 15.00s, 2123946 responses
  Throughput: 141.57K req/s
  Bandwidth:  44.34MB/s
  Status codes: 2xx=2123946, 3xx=0, 4xx=0, 5xx=0
  Latency samples: 2123946 / 2123946 responses (100.0%)
  Reconnects: 8648
  Per-template: 102356,107229,106639,107585,108745,109607,108012,108333,110420,107164,108962,110297,112849,110516,108837,106943,102005,92179,93952,101316
  Per-template-ok: 102356,107229,106639,107585,108745,109607,108012,108333,110420,107164,108962,110297,112849,110516,108837,106943,102005,92179,93952,101316
[info] CPU 1074.1% | Mem 380MiB

[run 2/3]
gcannon v0.5.3
  Target:    localhost:8080/
  Threads:   64
  Conns:     4096 (64/thread)
  Pipeline:  1
  Req/conn:  200
  Templates: 20
  Expected:  200
  Duration:  15s


  Thread Stats   Avg      p50      p90      p99    p99.9
    Latency   28.50ms   15.60ms   68.10ms   206.30ms   341.90ms

  2150552 requests in 15.00s, 2150552 responses
  Throughput: 143.35K req/s
  Bandwidth:  45.05MB/s
  Status codes: 2xx=2150552, 3xx=0, 4xx=0, 5xx=0
  Latency samples: 2150547 / 2150552 responses (100.0%)
  Reconnects: 8792
  Per-template: 104423,106441,107804,108430,107574,108834,110222,110767,110824,109433,112401,112525,110592,108865,108727,109744,105989,96021,98867,102064
  Per-template-ok: 104423,106441,107804,108430,107574,108834,110222,110767,110824,109433,112401,112525,110592,108865,108727,109744,105989,96021,98867,102064
[info] CPU 1150.6% | Mem 384MiB

[run 3/3]
gcannon v0.5.3
  Target:    localhost:8080/
  Threads:   64
  Conns:     4096 (64/thread)
  Pipeline:  1
  Req/conn:  200
  Templates: 20
  Expected:  200
  Duration:  15s


  Thread Stats   Avg      p50      p90      p99    p99.9
    Latency   27.02ms   12.80ms   68.50ms   216.00ms   359.00ms

  2267127 requests in 15.00s, 2266807 responses
  Throughput: 151.09K req/s
  Bandwidth:  47.56MB/s
  Status codes: 2xx=2266807, 3xx=0, 4xx=0, 5xx=0
  Latency samples: 2266807 / 2266807 responses (100.0%)
  Reconnects: 9379
  Per-template: 109722,113876,113817,115498,118556,118661,119059,118259,117391,117556,118142,117400,115900,114695,114129,112793,111259,100633,97843,101618
  Per-template-ok: 109722,113876,113817,115498,118556,118661,119059,118259,117391,117556,118142,117400,115900,114695,114129,112793,111259,100633,97843,101618
[info] CPU 1095.6% | Mem 385MiB

=== Best: 151120 req/s (CPU: 1095.6%, Mem: 385MiB) ===
[info] input BW: 12.97MB/s (avg template: 90 bytes)
[info] saved results/crud/4096/vanilla-epoll.json
httparena-bench-vanilla-epoll
httparena-bench-vanilla-epoll

==============================================
=== vanilla-epoll / fortunes / 1024c (tool=gcannon) ===
==============================================
[info] resetting postgres for a clean per-profile baseline
[info] starting postgres sidecar
httparena-postgres
[info] postgres ready (seeded)
[info] waiting for server...
[info] server ready

[run 1/3]
gcannon v0.5.3
  Target:    localhost:8080/fortunes
  Threads:   64
  Conns:     1024 (16/thread)
  Pipeline:  1
  Req/conn:  unlimited (keep-alive)
  Expected:  200
  Duration:  5s


  Thread Stats   Avg      p50      p90      p99    p99.9
    Latency   391.62ms   326.00ms   802.60ms    1.26s    1.92s

  12391 requests in 5.00s, 12391 responses
  Throughput: 2.48K req/s
  Bandwidth:  58.74MB/s
  Status codes: 2xx=12391, 3xx=0, 4xx=0, 5xx=0
  Latency samples: 12391 / 12391 responses (100.0%)
[info] CPU 425.4% | Mem 144MiB

[run 2/3]
gcannon v0.5.3
  Target:    localhost:8080/fortunes
  Threads:   64
  Conns:     1024 (16/thread)
  Pipeline:  1
  Req/conn:  unlimited (keep-alive)
  Expected:  200
  Duration:  5s


  Thread Stats   Avg      p50      p90      p99    p99.9
    Latency   382.01ms   316.90ms   769.60ms    1.31s    1.72s

  12694 requests in 5.00s, 12694 responses
  Throughput: 2.54K req/s
  Bandwidth:  60.17MB/s
  Status codes: 2xx=12694, 3xx=0, 4xx=0, 5xx=0
  Latency samples: 12694 / 12694 responses (100.0%)
[info] CPU 462.8% | Mem 144MiB

[run 3/3]
gcannon v0.5.3
  Target:    localhost:8080/fortunes
  Threads:   64
  Conns:     1024 (16/thread)
  Pipeline:  1
  Req/conn:  unlimited (keep-alive)
  Expected:  200
  Duration:  5s


  Thread Stats   Avg      p50      p90      p99    p99.9
    Latency   382.68ms   327.30ms   768.80ms    1.22s    1.67s

  12674 requests in 5.00s, 12674 responses
  Throughput: 2.53K req/s
  Bandwidth:  60.08MB/s
  Status codes: 2xx=12674, 3xx=0, 4xx=0, 5xx=0
  Latency samples: 12674 / 12674 responses (100.0%)
[info] CPU 467.2% | Mem 144MiB

=== Best: 2538 req/s (CPU: 462.8%, Mem: 144MiB) ===
[info] saved results/fortunes/1024/vanilla-epoll.json
httparena-bench-vanilla-epoll
httparena-bench-vanilla-epoll
[info] skip: vanilla-epoll does not subscribe to baseline-h2
[info] skip: vanilla-epoll does not subscribe to static-h2
[info] skip: vanilla-epoll does not subscribe to baseline-h2c
[info] skip: vanilla-epoll does not subscribe to json-h2c
[info] skip: vanilla-epoll does not subscribe to baseline-h3
[info] skip: vanilla-epoll does not subscribe to static-h3
[info] skip: vanilla-epoll does not subscribe to gateway-64
[info] skip: vanilla-epoll does not subscribe to gateway-h3
[info] skip: vanilla-epoll does not subscribe to production-stack
[info] skip: vanilla-epoll does not subscribe to unary-grpc
[info] skip: vanilla-epoll does not subscribe to unary-grpc-tls
[info] skip: vanilla-epoll does not subscribe to stream-grpc
[info] skip: vanilla-epoll does not subscribe to stream-grpc-tls
[info] skip: vanilla-epoll does not subscribe to echo-ws
[info] skip: vanilla-epoll does not subscribe to echo-ws-pipeline
[info] rebuilding site/data/*.json
[updated] /home/diogo/actions-runner/_work/HttpArena/HttpArena/site/data/frameworks.json
[updated] /home/diogo/actions-runner/_work/HttpArena/HttpArena/site/data/api-16-1024.json
[updated] /home/diogo/actions-runner/_work/HttpArena/HttpArena/site/data/api-4-256.json
[updated] /home/diogo/actions-runner/_work/HttpArena/HttpArena/site/data/async-db-1024.json
[updated] /home/diogo/actions-runner/_work/HttpArena/HttpArena/site/data/baseline-4096.json
[updated] /home/diogo/actions-runner/_work/HttpArena/HttpArena/site/data/baseline-512.json
[updated] /home/diogo/actions-runner/_work/HttpArena/HttpArena/site/data/crud-4096.json
[updated] /home/diogo/actions-runner/_work/HttpArena/HttpArena/site/data/fortunes-1024.json
[updated] /home/diogo/actions-runner/_work/HttpArena/HttpArena/site/data/json-4096.json
[updated] /home/diogo/actions-runner/_work/HttpArena/HttpArena/site/data/json-comp-16384.json
[updated] /home/diogo/actions-runner/_work/HttpArena/HttpArena/site/data/json-comp-4096.json
[updated] /home/diogo/actions-runner/_work/HttpArena/HttpArena/site/data/json-comp-512.json
[updated] /home/diogo/actions-runner/_work/HttpArena/HttpArena/site/data/limited-conn-4096.json
[updated] /home/diogo/actions-runner/_work/HttpArena/HttpArena/site/data/limited-conn-512.json
[updated] /home/diogo/actions-runner/_work/HttpArena/HttpArena/site/data/pipelined-4096.json
[updated] /home/diogo/actions-runner/_work/HttpArena/HttpArena/site/data/pipelined-512.json
[updated] /home/diogo/actions-runner/_work/HttpArena/HttpArena/site/data/static-1024.json
[updated] /home/diogo/actions-runner/_work/HttpArena/HttpArena/site/data/static-4096.json
[updated] /home/diogo/actions-runner/_work/HttpArena/HttpArena/site/data/static-6800.json
[updated] /home/diogo/actions-runner/_work/HttpArena/HttpArena/site/data/upload-256.json
[updated] /home/diogo/actions-runner/_work/HttpArena/HttpArena/site/data/upload-32.json
[updated] /home/diogo/actions-runner/_work/HttpArena/HttpArena/site/data/current.json
[info] done
httparena-postgres
httparena-redis
[info] restoring loopback MTU to 65536

…scape

Folds the DB-path work into this PR so everything lands together:
  • async-db uses a PostgreSQL prepared statement (PQprepare/PQexecPrepared via
    db.pg, lazily prepared per pooled connection) instead of exec_param_many's
    per-request server-side SQL re-parse — local +9%.
  • escape_html (fortunes) does ONE pass with a no-alloc fast path instead of
    replace_each's five full-string passes — local +27% fortunes.

DB profiles remain bound by the stdlib db.pg driver (text protocol), so this
narrows the gap without closing it. Both backends.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
@enghitalo enghitalo changed the title vanilla: cache json-comp gzip responses + zero-alloc routing vanilla: json-comp cache, zero-alloc routing, DB prepared statement + HTML escape Jun 15, 2026
@github-actions

Copy link
Copy Markdown
Contributor

👋 /benchmark request received. A collaborator will review and approve the run.

1 similar comment
@github-actions

Copy link
Copy Markdown
Contributor

👋 /benchmark request received. A collaborator will review and approve the run.

@github-actions

Copy link
Copy Markdown
Contributor

Benchmark Results

Framework: vanilla-io_uring | Test: all tests

Test Conn RPS CPU Mem Δ RPS Δ Mem
baseline 512 647,072 1463.5% 1.1GiB NEW NEW
baseline 4096 703,870 1407.3% 1.1GiB NEW NEW
pipelined 512 36,003,929 6638.8% 975MiB NEW NEW
pipelined 4096 37,347,977 6334.6% 1.1GiB NEW NEW
limited-conn 512 347,829 1141.5% 1.2GiB NEW NEW
limited-conn 4096 420,182 1080.9% 1.3GiB NEW NEW
json 4096 955,599 2797.5% 1.3GiB NEW NEW
json-comp 512 880,396 2582.2% 1.2GiB NEW NEW
json-comp 4096 990,800 2357.9% 1.3GiB NEW NEW
json-comp 16384 919,244 1977.0% 1.6GiB NEW NEW
upload 32 42 125.0% 2.5GiB NEW NEW
upload 256 53 137.3% 3.0GiB NEW NEW
api-4 256 35,036 326.9% 1.2GiB NEW NEW
api-16 1024 22,284 558.7% 1.2GiB NEW NEW
static 1024 242,200 5663.9% 1.5GiB NEW NEW
static 4096 15,272 312.6% 2.3GiB NEW NEW
static 6800 11,767 221.4% 2.3GiB NEW NEW
async-db 1024 10,316 515.4% 1.3GiB NEW NEW
crud 4096 140,552 1053.9% 1.4GiB NEW NEW
fortunes 1024 3,005 546.2% 1.3GiB NEW NEW
Full log
[run 1/3]
gcannon v0.5.3
  Target:    localhost:8080/
  Threads:   64
  Conns:     4096 (64/thread)
  Pipeline:  1
  Req/conn:  200
  Templates: 20
  Expected:  200
  Duration:  15s


  Thread Stats   Avg      p50      p90      p99    p99.9
    Latency   39.29ms   18.50ms   85.40ms   256.20ms    2.34s

  1558109 requests in 15.00s, 1556445 responses
  Throughput: 103.75K req/s
  Bandwidth:  32.51MB/s
  Status codes: 2xx=1556445, 3xx=0, 4xx=0, 5xx=0
  Latency samples: 1556445 / 1556445 responses (100.0%)
  Reconnects: 5757
  Per-template: 75016,77481,79475,82505,78489,77556,79644,80415,82010,83121,80384,77613,78401,78382,78640,78521,74355,68226,71653,74558
  Per-template-ok: 75016,77481,79475,82505,78489,77556,79644,80415,82010,83121,80384,77613,78401,78382,78640,78521,74355,68226,71653,74558
[info] CPU 868.1% | Mem 1.3GiB

[run 2/3]
gcannon v0.5.3
  Target:    localhost:8080/
  Threads:   64
  Conns:     4096 (64/thread)
  Pipeline:  1
  Req/conn:  200
  Templates: 20
  Expected:  200
  Duration:  15s


  Thread Stats   Avg      p50      p90      p99    p99.9
    Latency   29.71ms   13.60ms   78.10ms   242.90ms   378.50ms

  2061953 requests in 15.00s, 2061761 responses
  Throughput: 137.43K req/s
  Bandwidth:  43.07MB/s
  Status codes: 2xx=2061761, 3xx=0, 4xx=0, 5xx=0
  Latency samples: 2061760 / 2061761 responses (100.0%)
  Reconnects: 8421
  Per-template: 100554,99694,104048,106050,107381,105455,104318,104801,102312,103708,103770,106445,108125,108873,106389,107476,100108,87328,93850,101075
  Per-template-ok: 100554,99694,104048,106050,107381,105455,104318,104801,102312,103708,103770,106445,108125,108873,106389,107476,100108,87328,93850,101075
[info] CPU 1094.4% | Mem 1.4GiB

[run 3/3]
gcannon v0.5.3
  Target:    localhost:8080/
  Threads:   64
  Conns:     4096 (64/thread)
  Pipeline:  1
  Req/conn:  200
  Templates: 20
  Expected:  200
  Duration:  15s


  Thread Stats   Avg      p50      p90      p99    p99.9
    Latency   29.06ms   12.40ms   78.00ms   239.80ms   384.00ms

  2108291 requests in 15.00s, 2108291 responses
  Throughput: 140.53K req/s
  Bandwidth:  44.01MB/s
  Status codes: 2xx=2108291, 3xx=0, 4xx=0, 5xx=0
  Latency samples: 2108290 / 2108291 responses (100.0%)
  Reconnects: 8578
  Per-template: 102864,104902,109917,112326,107508,107899,106518,108780,109341,106407,107645,105126,105749,106288,106395,108554,102387,92251,96817,100616
  Per-template-ok: 102864,104902,109917,112326,107508,107899,106518,108780,109341,106407,107645,105126,105749,106288,106395,108554,102387,92251,96817,100616
[info] CPU 1053.9% | Mem 1.4GiB

=== Best: 140552 req/s (CPU: 1053.9%, Mem: 1.4GiB) ===
[info] input BW: 12.06MB/s (avg template: 90 bytes)
[info] saved results/crud/4096/vanilla-io_uring.json
httparena-bench-vanilla-io_uring
httparena-bench-vanilla-io_uring

==============================================
=== vanilla-io_uring / fortunes / 1024c (tool=gcannon) ===
==============================================
[info] resetting postgres for a clean per-profile baseline
[info] starting postgres sidecar
httparena-postgres
[info] postgres ready (seeded)
[info] waiting for server...
[info] server ready

[run 1/3]
gcannon v0.5.3
  Target:    localhost:8080/fortunes
  Threads:   64
  Conns:     1024 (16/thread)
  Pipeline:  1
  Req/conn:  unlimited (keep-alive)
  Expected:  200
  Duration:  5s


  Thread Stats   Avg      p50      p90      p99    p99.9
    Latency   353.80ms   318.10ms   692.60ms    1.13s    1.37s

  13804 requests in 5.00s, 13804 responses
  Throughput: 2.76K req/s
  Bandwidth:  65.44MB/s
  Status codes: 2xx=13804, 3xx=0, 4xx=0, 5xx=0
  Latency samples: 13804 / 13804 responses (100.0%)
[info] CPU 484.8% | Mem 1.2GiB

[run 2/3]
gcannon v0.5.3
  Target:    localhost:8080/fortunes
  Threads:   64
  Conns:     1024 (16/thread)
  Pipeline:  1
  Req/conn:  unlimited (keep-alive)
  Expected:  200
  Duration:  5s


  Thread Stats   Avg      p50      p90      p99    p99.9
    Latency   334.28ms   287.40ms   658.20ms    1.07s    1.42s

  14677 requests in 5.00s, 14677 responses
  Throughput: 2.93K req/s
  Bandwidth:  69.57MB/s
  Status codes: 2xx=14677, 3xx=0, 4xx=0, 5xx=0
  Latency samples: 14677 / 14677 responses (100.0%)
[info] CPU 521.4% | Mem 1.3GiB

[run 3/3]
gcannon v0.5.3
  Target:    localhost:8080/fortunes
  Threads:   64
  Conns:     1024 (16/thread)
  Pipeline:  1
  Req/conn:  unlimited (keep-alive)
  Expected:  200
  Duration:  5s


  Thread Stats   Avg      p50      p90      p99    p99.9
    Latency   328.41ms   270.80ms   675.90ms    1.07s    1.27s

  15027 requests in 5.00s, 15027 responses
  Throughput: 3.00K req/s
  Bandwidth:  71.23MB/s
  Status codes: 2xx=15027, 3xx=0, 4xx=0, 5xx=0
  Latency samples: 15027 / 15027 responses (100.0%)
[info] CPU 546.2% | Mem 1.3GiB

=== Best: 3005 req/s (CPU: 546.2%, Mem: 1.3GiB) ===
[info] saved results/fortunes/1024/vanilla-io_uring.json
httparena-bench-vanilla-io_uring
httparena-bench-vanilla-io_uring
[info] skip: vanilla-io_uring does not subscribe to baseline-h2
[info] skip: vanilla-io_uring does not subscribe to static-h2
[info] skip: vanilla-io_uring does not subscribe to baseline-h2c
[info] skip: vanilla-io_uring does not subscribe to json-h2c
[info] skip: vanilla-io_uring does not subscribe to baseline-h3
[info] skip: vanilla-io_uring does not subscribe to static-h3
[info] skip: vanilla-io_uring does not subscribe to gateway-64
[info] skip: vanilla-io_uring does not subscribe to gateway-h3
[info] skip: vanilla-io_uring does not subscribe to production-stack
[info] skip: vanilla-io_uring does not subscribe to unary-grpc
[info] skip: vanilla-io_uring does not subscribe to unary-grpc-tls
[info] skip: vanilla-io_uring does not subscribe to stream-grpc
[info] skip: vanilla-io_uring does not subscribe to stream-grpc-tls
[info] skip: vanilla-io_uring does not subscribe to echo-ws
[info] skip: vanilla-io_uring does not subscribe to echo-ws-pipeline
[info] rebuilding site/data/*.json
[updated] /home/diogo/actions-runner/_work/HttpArena/HttpArena/site/data/frameworks.json
[updated] /home/diogo/actions-runner/_work/HttpArena/HttpArena/site/data/api-16-1024.json
[updated] /home/diogo/actions-runner/_work/HttpArena/HttpArena/site/data/api-4-256.json
[updated] /home/diogo/actions-runner/_work/HttpArena/HttpArena/site/data/async-db-1024.json
[updated] /home/diogo/actions-runner/_work/HttpArena/HttpArena/site/data/baseline-4096.json
[updated] /home/diogo/actions-runner/_work/HttpArena/HttpArena/site/data/baseline-512.json
[updated] /home/diogo/actions-runner/_work/HttpArena/HttpArena/site/data/crud-4096.json
[updated] /home/diogo/actions-runner/_work/HttpArena/HttpArena/site/data/fortunes-1024.json
[updated] /home/diogo/actions-runner/_work/HttpArena/HttpArena/site/data/json-4096.json
[updated] /home/diogo/actions-runner/_work/HttpArena/HttpArena/site/data/json-comp-16384.json
[updated] /home/diogo/actions-runner/_work/HttpArena/HttpArena/site/data/json-comp-4096.json
[updated] /home/diogo/actions-runner/_work/HttpArena/HttpArena/site/data/json-comp-512.json
[updated] /home/diogo/actions-runner/_work/HttpArena/HttpArena/site/data/limited-conn-4096.json
[updated] /home/diogo/actions-runner/_work/HttpArena/HttpArena/site/data/limited-conn-512.json
[updated] /home/diogo/actions-runner/_work/HttpArena/HttpArena/site/data/pipelined-4096.json
[updated] /home/diogo/actions-runner/_work/HttpArena/HttpArena/site/data/pipelined-512.json
[updated] /home/diogo/actions-runner/_work/HttpArena/HttpArena/site/data/static-1024.json
[updated] /home/diogo/actions-runner/_work/HttpArena/HttpArena/site/data/static-4096.json
[updated] /home/diogo/actions-runner/_work/HttpArena/HttpArena/site/data/static-6800.json
[updated] /home/diogo/actions-runner/_work/HttpArena/HttpArena/site/data/upload-256.json
[updated] /home/diogo/actions-runner/_work/HttpArena/HttpArena/site/data/upload-32.json
[updated] /home/diogo/actions-runner/_work/HttpArena/HttpArena/site/data/current.json
[info] done
httparena-postgres
httparena-redis
[info] restoring loopback MTU to 65536

@enghitalo

Copy link
Copy Markdown
Contributor Author

/benchmark -f vanilla-epoll

/benchmark -f vanilla-io_uring

@github-actions

Copy link
Copy Markdown
Contributor

👋 /benchmark request received. A collaborator will review and approve the run.

@MDA2AV

MDA2AV commented Jun 15, 2026

Copy link
Copy Markdown
Owner

/benchmark -f vanilla-epoll

@github-actions

Copy link
Copy Markdown
Contributor

👋 /benchmark request received. A collaborator will review and approve the run.

@github-actions

Copy link
Copy Markdown
Contributor

Benchmark Results

Framework: vanilla-epoll | Test: all tests

Test Conn RPS CPU Mem Δ RPS Δ Mem
baseline 512 530,288 1255.3% 131MiB +13.9% +81.9%
baseline 4096 501,273 1195.8% 383MiB +16.8% +98.4%
pipelined 512 31,997,856 6546.6% 73MiB +1244.6% +37.7%
pipelined 4096 30,134,134 5101.2% 225MiB +1209.2% +20.3%
limited-conn 512 297,116 1096.1% 1018MiB +37.0% +1541.9%
limited-conn 4096 378,353 1190.9% 1.1GiB +24.7% +529.3%
json 4096 473,509 1693.8% 445MiB -2.3% +151.4%
json-comp 512 416,046 1610.9% 314MiB +750.2% +260.9%
json-comp 4096 431,079 1457.1% 337MiB +655.3% +82.2%
json-comp 16384 422,031 1394.8% 654MiB +496.9% -7.1%
upload 32 2,102 2586.8% 2.8GiB +6906.7% +239.7%
upload 256 2,307 5935.3% 8.8GiB +4019.6% +528.6%
api-4 256 27,404 390.0% 221MiB -16.6% +211.3%
api-16 1024 11,516 608.4% 430MiB -41.7% +362.4%
static 1024 322,967 6353.0% 732MiB +1.5% -1.6%
static 4096 268,919 6173.7% 2.7GiB +1.0% -3.6%
static 6800 251,503 5480.1% 4.5GiB +20.6% ~0%
async-db 1024 4,826 541.1% 419MiB -55.6% +319.0%
crud 4096 64,307 653.9% 361MiB -46.2% -5.5%
fortunes 1024 314 262.7% 282MiB -88.2% +95.8%
Full log
  Target:    localhost:8080/
  Threads:   64
  Conns:     4096 (64/thread)
  Pipeline:  1
  Req/conn:  200
  Templates: 20
  Expected:  200
  Duration:  15s


  Thread Stats   Avg      p50      p90      p99    p99.9
    Latency   78.83ms   69.30ms   113.60ms   361.70ms   780.80ms

  777365 requests in 15.00s, 776981 responses
  Throughput: 51.79K req/s
  Bandwidth:  16.06MB/s
  Status codes: 2xx=776981, 3xx=0, 4xx=0, 5xx=0
  Latency samples: 776981 / 776981 responses (100.0%)
  Reconnects: 1283
  Per-template: 37704,38194,39300,39000,39601,39907,39300,39665,39229,39873,39475,38877,39621,39894,39906,39859,35968,37149,37583,36876
  Per-template-ok: 37704,38194,39300,39000,39601,39907,39300,39665,39229,39873,39475,38877,39621,39894,39906,39859,35968,37149,37583,36876
[info] CPU 568.9% | Mem 344MiB

[run 2/3]
gcannon v0.5.3
  Target:    localhost:8080/
  Threads:   64
  Conns:     4096 (64/thread)
  Pipeline:  1
  Req/conn:  200
  Templates: 20
  Expected:  200
  Duration:  15s


  Thread Stats   Avg      p50      p90      p99    p99.9
    Latency   66.08ms   63.20ms   102.30ms   188.20ms   300.30ms

  927306 requests in 15.00s, 925834 responses
  Throughput: 61.71K req/s
  Bandwidth:  19.31MB/s
  Status codes: 2xx=925834, 3xx=0, 4xx=0, 5xx=0
  Latency samples: 925834 / 925834 responses (100.0%)
  Reconnects: 2535
  Per-template: 44709,44002,45686,47366,47146,46872,47200,47135,46967,47472,47350,47567,48705,47772,48881,48131,44425,41402,43611,43435
  Per-template-ok: 44709,44002,45686,47366,47146,46872,47200,47135,46967,47472,47350,47567,48705,47772,48881,48131,44425,41402,43611,43435
[info] CPU 650.5% | Mem 356MiB

[run 3/3]
gcannon v0.5.3
  Target:    localhost:8080/
  Threads:   64
  Conns:     4096 (64/thread)
  Pipeline:  1
  Req/conn:  200
  Templates: 20
  Expected:  200
  Duration:  15s


  Thread Stats   Avg      p50      p90      p99    p99.9
    Latency   63.54ms   59.60ms   105.80ms   194.40ms   292.60ms

  964616 requests in 15.00s, 964616 responses
  Throughput: 64.30K req/s
  Bandwidth:  20.13MB/s
  Status codes: 2xx=964616, 3xx=0, 4xx=0, 5xx=0
  Latency samples: 964616 / 964616 responses (100.0%)
  Reconnects: 2795
  Per-template: 44455,47154,49462,49447,49770,50402,50333,50011,49363,51307,49661,50349,50575,49533,50045,49939,45843,40484,42545,43938
  Per-template-ok: 44455,47154,49462,49447,49770,50402,50333,50011,49363,51307,49661,50349,50575,49533,50045,49939,45843,40484,42545,43938
[info] CPU 653.9% | Mem 361MiB

=== Best: 64307 req/s (CPU: 653.9%, Mem: 361MiB) ===
[info] input BW: 5.52MB/s (avg template: 90 bytes)
[info] saved results/crud/4096/vanilla-epoll.json
httparena-bench-vanilla-epoll
httparena-bench-vanilla-epoll

==============================================
=== vanilla-epoll / fortunes / 1024c (tool=gcannon) ===
==============================================
[info] resetting postgres for a clean per-profile baseline
[info] starting postgres sidecar
httparena-postgres
[info] postgres ready (seeded)
[info] waiting for server...
[info] server ready

[run 1/3]
gcannon v0.5.3
  Target:    localhost:8080/fortunes
  Threads:   64
  Conns:     1024 (16/thread)
  Pipeline:  1
  Req/conn:  unlimited (keep-alive)
  Expected:  200
  Duration:  5s


  Thread Stats   Avg      p50      p90      p99    p99.9
    Latency    2.21s    2.34s    3.47s    4.36s    4.87s

  1574 requests in 5.00s, 1574 responses
  Throughput: 314 req/s
  Bandwidth:  7.46MB/s
  Status codes: 2xx=1574, 3xx=0, 4xx=0, 5xx=0
  Latency samples: 1574 / 1574 responses (100.0%)
[info] CPU 262.7% | Mem 282MiB

[run 2/3]
gcannon v0.5.3
  Target:    localhost:8080/fortunes
  Threads:   64
  Conns:     1024 (16/thread)
  Pipeline:  1
  Req/conn:  unlimited (keep-alive)
  Expected:  200
  Duration:  5s


  Thread Stats   Avg      p50      p90      p99    p99.9
    Latency    2.66s    2.68s    4.09s    4.90s    5.00s

  1242 requests in 5.00s, 1242 responses
  Throughput: 248 req/s
  Bandwidth:  5.89MB/s
  Status codes: 2xx=1242, 3xx=0, 4xx=0, 5xx=0
  Latency samples: 1242 / 1242 responses (100.0%)
  Latency overflow (>5s): 2
[info] CPU 258.5% | Mem 322MiB

[run 3/3]
gcannon v0.5.3
  Target:    localhost:8080/fortunes
  Threads:   64
  Conns:     1024 (16/thread)
  Pipeline:  1
  Req/conn:  unlimited (keep-alive)
  Expected:  200
  Duration:  5s


  Thread Stats   Avg      p50      p90      p99    p99.9
    Latency    2.41s    2.40s    3.63s    4.50s    5.00s

  1476 requests in 5.00s, 1476 responses
  Throughput: 295 req/s
  Bandwidth:  7.00MB/s
  Status codes: 2xx=1476, 3xx=0, 4xx=0, 5xx=0
  Latency samples: 1476 / 1476 responses (100.0%)
  Latency overflow (>5s): 2
[info] CPU 288.6% | Mem 532MiB

=== Best: 314 req/s (CPU: 262.7%, Mem: 282MiB) ===
[info] saved results/fortunes/1024/vanilla-epoll.json
httparena-bench-vanilla-epoll
httparena-bench-vanilla-epoll
[info] skip: vanilla-epoll does not subscribe to baseline-h2
[info] skip: vanilla-epoll does not subscribe to static-h2
[info] skip: vanilla-epoll does not subscribe to baseline-h2c
[info] skip: vanilla-epoll does not subscribe to json-h2c
[info] skip: vanilla-epoll does not subscribe to baseline-h3
[info] skip: vanilla-epoll does not subscribe to static-h3
[info] skip: vanilla-epoll does not subscribe to gateway-64
[info] skip: vanilla-epoll does not subscribe to gateway-h3
[info] skip: vanilla-epoll does not subscribe to production-stack
[info] skip: vanilla-epoll does not subscribe to unary-grpc
[info] skip: vanilla-epoll does not subscribe to unary-grpc-tls
[info] skip: vanilla-epoll does not subscribe to stream-grpc
[info] skip: vanilla-epoll does not subscribe to stream-grpc-tls
[info] skip: vanilla-epoll does not subscribe to echo-ws
[info] skip: vanilla-epoll does not subscribe to echo-ws-pipeline
[info] rebuilding site/data/*.json
[updated] /home/diogo/actions-runner/_work/HttpArena/HttpArena/site/data/frameworks.json
[updated] /home/diogo/actions-runner/_work/HttpArena/HttpArena/site/data/api-16-1024.json
[updated] /home/diogo/actions-runner/_work/HttpArena/HttpArena/site/data/api-4-256.json
[updated] /home/diogo/actions-runner/_work/HttpArena/HttpArena/site/data/async-db-1024.json
[updated] /home/diogo/actions-runner/_work/HttpArena/HttpArena/site/data/baseline-4096.json
[updated] /home/diogo/actions-runner/_work/HttpArena/HttpArena/site/data/baseline-512.json
[updated] /home/diogo/actions-runner/_work/HttpArena/HttpArena/site/data/crud-4096.json
[updated] /home/diogo/actions-runner/_work/HttpArena/HttpArena/site/data/fortunes-1024.json
[updated] /home/diogo/actions-runner/_work/HttpArena/HttpArena/site/data/json-4096.json
[updated] /home/diogo/actions-runner/_work/HttpArena/HttpArena/site/data/json-comp-16384.json
[updated] /home/diogo/actions-runner/_work/HttpArena/HttpArena/site/data/json-comp-4096.json
[updated] /home/diogo/actions-runner/_work/HttpArena/HttpArena/site/data/json-comp-512.json
[updated] /home/diogo/actions-runner/_work/HttpArena/HttpArena/site/data/limited-conn-4096.json
[updated] /home/diogo/actions-runner/_work/HttpArena/HttpArena/site/data/limited-conn-512.json
[updated] /home/diogo/actions-runner/_work/HttpArena/HttpArena/site/data/pipelined-4096.json
[updated] /home/diogo/actions-runner/_work/HttpArena/HttpArena/site/data/pipelined-512.json
[updated] /home/diogo/actions-runner/_work/HttpArena/HttpArena/site/data/static-1024.json
[updated] /home/diogo/actions-runner/_work/HttpArena/HttpArena/site/data/static-4096.json
[updated] /home/diogo/actions-runner/_work/HttpArena/HttpArena/site/data/static-6800.json
[updated] /home/diogo/actions-runner/_work/HttpArena/HttpArena/site/data/upload-256.json
[updated] /home/diogo/actions-runner/_work/HttpArena/HttpArena/site/data/upload-32.json
[updated] /home/diogo/actions-runner/_work/HttpArena/HttpArena/site/data/current.json
[info] done
httparena-postgres
httparena-redis
[info] restoring loopback MTU to 65536

@enghitalo enghitalo marked this pull request as draft June 15, 2026 18:17
Single-element array push (`arr << x`) is 4-7x slower on post-0.5.1 V
(vlang/v#27468) while bulk push_many, allocation and indexed writes are
unaffected. The two hot single-element `<<` sites are now bulk writes:

  - wi() built integer digits with `out << tmp[i]` per digit; it now itoa's
    back-to-front into the [20]u8 scratch and flushes with one push_many.
  - write_json_response() pushed the item separator `,` and closing `}`
    one byte at a time; the closing `}` is now fused with the separator
    into a single '},' / '}' push_many.

Output is byte-identical (verified across counts 0..4096 and edge-value
integers). This makes the JSON hot path fast on both the 0.5.1 release and
current master, independent of the upstream codegen regression. Both
epoll and io_uring backends.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
@enghitalo enghitalo force-pushed the perf/vanilla-json-comp-cache-route-slice branch from 1b5fcc2 to 44fa661 Compare June 15, 2026 18:36
@enghitalo

Copy link
Copy Markdown
Contributor Author

/benchmark -f vanilla-epoll

@github-actions

Copy link
Copy Markdown
Contributor

👋 /benchmark request received. A collaborator will review and approve the run.

Build V from source at the 0.5.1 tag instead of the prebuilt release zip.
Plain `make` can't build an old tag: its latest_vc step `git pull`s the
newest vlang/vc bootstrap, which no longer matches 0.5.1's vlib (fails with
`unknown ident \`native\``). So pin vc to the commit cut for 0.5.1
(vlang/vc f461dfeb = "[v:master] 0c3183c - V 0.5.1") and run make's own
bootstrap recipe (cc -> v1 -> v2 -> v). Drop curl/unzip from the build deps.

Pinned by tag, not a master commit, because post-0.5.1 master carries a
codegen regression (single-element array push 4-7x slower, vlang/v#27468).
Both backends; verified the source-built compiler serves /json and /pipeline
correctly.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
@github-actions

Copy link
Copy Markdown
Contributor

Benchmark Results

Framework: vanilla-epoll | Test: all tests

Test Conn RPS CPU Mem Δ RPS Δ Mem
baseline 512 659,864 1334.0% 69MiB +41.7% -4.2%
baseline 4096 624,890 1375.0% 194MiB +45.6% +0.5%
pipelined 512 35,168,531 6604.6% 51MiB +1377.8% -3.8%
pipelined 4096 34,964,400 6533.7% 178MiB +1419.0% -4.8%
limited-conn 512 250,691 821.5% 62MiB +15.6% ~0%
limited-conn 4096 347,316 1072.5% 172MiB +14.5% -3.9%
json 4096 881,803 3042.3% 185MiB +81.9% +4.5%
json-comp 512 626,271 1820.0% 79MiB +1179.8% -9.2%
json-comp 4096 824,402 2458.3% 196MiB +1344.5% +5.9%
json-comp 16384 894,877 2477.9% 647MiB +1165.6% -8.1%
upload 32 46 144.5% 847MiB +53.3% +0.4%
upload 256 71 155.2% 998MiB +26.8% -30.4%
api-4 256 31,195 343.6% 73MiB -5.0% +2.8%
api-16 1024 21,007 400.4% 97MiB +6.4% +4.3%
static 1024 322,818 6500.4% 737MiB +1.5% -0.9%
static 4096 268,029 6373.7% 2.8GiB +0.7% ~0%
static 6800 244,673 6050.3% 4.6GiB +17.3% +2.2%
async-db 1024 10,947 354.5% 101MiB +0.7% +1.0%
crud 4096 146,526 1156.2% 427MiB +22.5% +11.8%
fortunes 1024 3,077 490.6% 137MiB +15.9% -4.9%
Full log
[run 1/3]
gcannon v0.5.3
  Target:    localhost:8080/
  Threads:   64
  Conns:     4096 (64/thread)
  Pipeline:  1
  Req/conn:  200
  Templates: 20
  Expected:  200
  Duration:  15s


  Thread Stats   Avg      p50      p90      p99    p99.9
    Latency   29.02ms   16.10ms   69.00ms   209.30ms   350.40ms

  2113916 requests in 15.00s, 2113892 responses
  Throughput: 140.90K req/s
  Bandwidth:  44.37MB/s
  Status codes: 2xx=2113892, 3xx=0, 4xx=0, 5xx=0
  Latency samples: 2113892 / 2113892 responses (100.0%)
  Reconnects: 8567
  Per-template: 101405,106858,107829,106214,107816,109879,110692,111997,111260,111341,107354,105563,107516,108128,106042,106372,102942,93048,92750,98886
  Per-template-ok: 101405,106858,107829,106214,107816,109879,110692,111997,111260,111341,107354,105563,107516,108128,106042,106372,102942,93048,92750,98886
[info] CPU 1061.9% | Mem 421MiB

[run 2/3]
gcannon v0.5.3
  Target:    localhost:8080/
  Threads:   64
  Conns:     4096 (64/thread)
  Pipeline:  1
  Req/conn:  200
  Templates: 20
  Expected:  200
  Duration:  15s


  Thread Stats   Avg      p50      p90      p99    p99.9
    Latency   27.86ms   15.20ms   66.80ms   200.30ms   335.80ms

  2198066 requests in 15.00s, 2197894 responses
  Throughput: 146.50K req/s
  Bandwidth:  45.95MB/s
  Status codes: 2xx=2197894, 3xx=0, 4xx=0, 5xx=0
  Latency samples: 2197893 / 2197894 responses (100.0%)
  Reconnects: 9115
  Per-template: 106249,108287,113891,111366,111265,112215,113129,112007,111663,111816,114793,114691,112342,112572,115490,111680,106850,95790,99510,102287
  Per-template-ok: 106249,108287,113891,111366,111265,112215,113129,112007,111663,111816,114793,114691,112342,112572,115490,111680,106850,95790,99510,102287
[info] CPU 1156.2% | Mem 427MiB

[run 3/3]
gcannon v0.5.3
  Target:    localhost:8080/
  Threads:   64
  Conns:     4096 (64/thread)
  Pipeline:  1
  Req/conn:  200
  Templates: 20
  Expected:  200
  Duration:  15s


  Thread Stats   Avg      p50      p90      p99    p99.9
    Latency   28.57ms   15.80ms   68.30ms   206.90ms   348.60ms

  2144855 requests in 15.00s, 2144664 responses
  Throughput: 142.96K req/s
  Bandwidth:  45.05MB/s
  Status codes: 2xx=2144664, 3xx=0, 4xx=0, 5xx=0
  Latency samples: 2144663 / 2144664 responses (100.0%)
  Reconnects: 8840
  Per-template: 102391,105424,108681,109748,111774,111964,110179,110482,111350,110055,107495,107869,110809,106407,107743,110571,106809,95788,97946,101178
  Per-template-ok: 102391,105424,108681,109748,111774,111964,110179,110482,111350,110055,107495,107869,110809,106407,107743,110571,106809,95788,97946,101178
[info] CPU 1170.1% | Mem 433MiB

=== Best: 146526 req/s (CPU: 1156.2%, Mem: 427MiB) ===
[info] input BW: 12.58MB/s (avg template: 90 bytes)
[info] saved results/crud/4096/vanilla-epoll.json
httparena-bench-vanilla-epoll
httparena-bench-vanilla-epoll

==============================================
=== vanilla-epoll / fortunes / 1024c (tool=gcannon) ===
==============================================
[info] resetting postgres for a clean per-profile baseline
[info] starting postgres sidecar
httparena-postgres
[info] postgres ready (seeded)
[info] waiting for server...
[info] server ready

[run 1/3]
gcannon v0.5.3
  Target:    localhost:8080/fortunes
  Threads:   64
  Conns:     1024 (16/thread)
  Pipeline:  1
  Req/conn:  unlimited (keep-alive)
  Expected:  200
  Duration:  5s


  Thread Stats   Avg      p50      p90      p99    p99.9
    Latency   321.68ms   258.40ms   676.30ms    1.18s    1.69s

  15094 requests in 5.00s, 15094 responses
  Throughput: 3.02K req/s
  Bandwidth:  71.55MB/s
  Status codes: 2xx=15094, 3xx=0, 4xx=0, 5xx=0
  Latency samples: 15094 / 15094 responses (100.0%)
[info] CPU 460.3% | Mem 135MiB

[run 2/3]
gcannon v0.5.3
  Target:    localhost:8080/fortunes
  Threads:   64
  Conns:     1024 (16/thread)
  Pipeline:  1
  Req/conn:  unlimited (keep-alive)
  Expected:  200
  Duration:  5s


  Thread Stats   Avg      p50      p90      p99    p99.9
    Latency   316.30ms   246.30ms   677.50ms    1.27s    1.77s

  15367 requests in 5.00s, 15367 responses
  Throughput: 3.07K req/s
  Bandwidth:  72.84MB/s
  Status codes: 2xx=15367, 3xx=0, 4xx=0, 5xx=0
  Latency samples: 15367 / 15367 responses (100.0%)
[info] CPU 487.5% | Mem 137MiB

[run 3/3]
gcannon v0.5.3
  Target:    localhost:8080/fortunes
  Threads:   64
  Conns:     1024 (16/thread)
  Pipeline:  1
  Req/conn:  unlimited (keep-alive)
  Expected:  200
  Duration:  5s


  Thread Stats   Avg      p50      p90      p99    p99.9
    Latency   316.52ms   255.30ms   676.60ms    1.14s    1.51s

  15389 requests in 5.00s, 15389 responses
  Throughput: 3.08K req/s
  Bandwidth:  72.95MB/s
  Status codes: 2xx=15389, 3xx=0, 4xx=0, 5xx=0
  Latency samples: 15389 / 15389 responses (100.0%)
[info] CPU 490.6% | Mem 137MiB

=== Best: 3077 req/s (CPU: 490.6%, Mem: 137MiB) ===
[info] saved results/fortunes/1024/vanilla-epoll.json
httparena-bench-vanilla-epoll
httparena-bench-vanilla-epoll
[info] skip: vanilla-epoll does not subscribe to baseline-h2
[info] skip: vanilla-epoll does not subscribe to static-h2
[info] skip: vanilla-epoll does not subscribe to baseline-h2c
[info] skip: vanilla-epoll does not subscribe to json-h2c
[info] skip: vanilla-epoll does not subscribe to baseline-h3
[info] skip: vanilla-epoll does not subscribe to static-h3
[info] skip: vanilla-epoll does not subscribe to gateway-64
[info] skip: vanilla-epoll does not subscribe to gateway-h3
[info] skip: vanilla-epoll does not subscribe to production-stack
[info] skip: vanilla-epoll does not subscribe to unary-grpc
[info] skip: vanilla-epoll does not subscribe to unary-grpc-tls
[info] skip: vanilla-epoll does not subscribe to stream-grpc
[info] skip: vanilla-epoll does not subscribe to stream-grpc-tls
[info] skip: vanilla-epoll does not subscribe to echo-ws
[info] skip: vanilla-epoll does not subscribe to echo-ws-pipeline
[info] rebuilding site/data/*.json
[updated] /home/diogo/actions-runner/_work/HttpArena/HttpArena/site/data/frameworks.json
[updated] /home/diogo/actions-runner/_work/HttpArena/HttpArena/site/data/api-16-1024.json
[updated] /home/diogo/actions-runner/_work/HttpArena/HttpArena/site/data/api-4-256.json
[updated] /home/diogo/actions-runner/_work/HttpArena/HttpArena/site/data/async-db-1024.json
[updated] /home/diogo/actions-runner/_work/HttpArena/HttpArena/site/data/baseline-4096.json
[updated] /home/diogo/actions-runner/_work/HttpArena/HttpArena/site/data/baseline-512.json
[updated] /home/diogo/actions-runner/_work/HttpArena/HttpArena/site/data/crud-4096.json
[updated] /home/diogo/actions-runner/_work/HttpArena/HttpArena/site/data/fortunes-1024.json
[updated] /home/diogo/actions-runner/_work/HttpArena/HttpArena/site/data/json-4096.json
[updated] /home/diogo/actions-runner/_work/HttpArena/HttpArena/site/data/json-comp-16384.json
[updated] /home/diogo/actions-runner/_work/HttpArena/HttpArena/site/data/json-comp-4096.json
[updated] /home/diogo/actions-runner/_work/HttpArena/HttpArena/site/data/json-comp-512.json
[updated] /home/diogo/actions-runner/_work/HttpArena/HttpArena/site/data/limited-conn-4096.json
[updated] /home/diogo/actions-runner/_work/HttpArena/HttpArena/site/data/limited-conn-512.json
[updated] /home/diogo/actions-runner/_work/HttpArena/HttpArena/site/data/pipelined-4096.json
[updated] /home/diogo/actions-runner/_work/HttpArena/HttpArena/site/data/pipelined-512.json
[updated] /home/diogo/actions-runner/_work/HttpArena/HttpArena/site/data/static-1024.json
[updated] /home/diogo/actions-runner/_work/HttpArena/HttpArena/site/data/static-4096.json
[updated] /home/diogo/actions-runner/_work/HttpArena/HttpArena/site/data/static-6800.json
[updated] /home/diogo/actions-runner/_work/HttpArena/HttpArena/site/data/upload-256.json
[updated] /home/diogo/actions-runner/_work/HttpArena/HttpArena/site/data/upload-32.json
[updated] /home/diogo/actions-runner/_work/HttpArena/HttpArena/site/data/current.json
[info] done
httparena-postgres
httparena-redis
[info] restoring loopback MTU to 65536

@enghitalo enghitalo marked this pull request as ready for review June 15, 2026 20:20
@enghitalo

Copy link
Copy Markdown
Contributor Author

/benchmark -f vanilla-epoll

@github-actions

Copy link
Copy Markdown
Contributor

👋 /benchmark request received. A collaborator will review and approve the run.

The pipelined profile (the arena's highest-RPS test, ~35M rps) is a fixed
plaintext "ok". Match it on `target` immediately after the path is sliced and
blit a precomputed full-response constant + return — before the '?'-scan, the
route slice, and write_resp's 6-part piecewise header build. requests/pipeline.raw
is `GET /pipeline` with no query, so the exact-match is correct; the now-redundant
route=='/pipeline' arm is dropped from the dispatch chain.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
@github-actions

Copy link
Copy Markdown
Contributor

Benchmark Results

Framework: vanilla-epoll | Test: all tests

Test Conn RPS CPU Mem Δ RPS Δ Mem
baseline 512 665,839 1349.1% 71MiB +43.0% -1.4%
baseline 4096 649,082 1453.1% 191MiB +51.2% -1.0%
pipelined 512 36,425,360 6594.6% 51MiB +1430.6% -3.8%
pipelined 4096 36,594,003 6680.5% 176MiB +1489.8% -5.9%
limited-conn 512 234,492 731.1% 60MiB +8.1% -3.2%
limited-conn 4096 367,938 1157.3% 170MiB +21.3% -5.0%
json 4096 825,756 2806.5% 182MiB +70.4% +2.8%
json-comp 512 609,230 1777.8% 64MiB +1145.0% -26.4%
json-comp 4096 808,494 2303.0% 191MiB +1316.6% +3.2%
json-comp 16384 872,097 2409.5% 646MiB +1133.4% -8.2%
upload 32 14,642 500.7% 294MiB +48706.7% -65.2%
upload 256 12,768 757.1% 190MiB +22700.0% -86.7%
api-4 256 31,974 332.0% 71MiB -2.7% ~0%
api-16 1024 21,580 405.4% 96MiB +9.3% +3.2%
static 1024 1,074,176 5661.9% 102MiB +237.6% -86.3%
static 4096 1,137,433 5889.2% 291MiB +327.3% -89.9%
static 6800 1,176,339 5783.7% 462MiB +463.9% -90.0%
async-db 1024 10,927 365.6% 100MiB +0.5% ~0%
crud 4096 141,175 1197.3% 361MiB +18.0% -5.5%
fortunes 1024 2,990 449.1% 136MiB +12.6% -5.6%
Full log
[run 1/3]
gcannon v0.5.3
  Target:    localhost:8080/
  Threads:   64
  Conns:     4096 (64/thread)
  Pipeline:  1
  Req/conn:  200
  Templates: 20
  Expected:  200
  Duration:  15s


  Thread Stats   Avg      p50      p90      p99    p99.9
    Latency   30.73ms   19.20ms   67.40ms   203.30ms   349.50ms

  1996527 requests in 15.00s, 1996527 responses
  Throughput: 133.08K req/s
  Bandwidth:  41.70MB/s
  Status codes: 2xx=1996527, 3xx=0, 4xx=0, 5xx=0
  Latency samples: 1996527 / 1996527 responses (100.0%)
  Reconnects: 7955
  Per-template: 96410,96919,97818,100422,102070,103434,101407,101315,103208,103663,104419,102074,103254,101781,101736,101488,96423,89258,92470,96958
  Per-template-ok: 96410,96919,97818,100422,102070,103434,101407,101315,103208,103663,104419,102074,103254,101781,101736,101488,96423,89258,92470,96958
[info] CPU 1150.1% | Mem 354MiB

[run 2/3]
gcannon v0.5.3
  Target:    localhost:8080/
  Threads:   64
  Conns:     4096 (64/thread)
  Pipeline:  1
  Req/conn:  200
  Templates: 20
  Expected:  200
  Duration:  15s


  Thread Stats   Avg      p50      p90      p99    p99.9
    Latency   28.98ms   15.50ms   70.30ms   215.00ms   357.70ms

  2115286 requests in 15.00s, 2115286 responses
  Throughput: 141.00K req/s
  Bandwidth:  44.26MB/s
  Status codes: 2xx=2115286, 3xx=0, 4xx=0, 5xx=0
  Latency samples: 2115283 / 2115286 responses (100.0%)
  Reconnects: 8702
  Per-template: 100945,104676,106728,108621,107969,107222,106993,107375,111163,110726,110961,110849,109567,108116,106233,108394,102962,94004,94024,97755
  Per-template-ok: 100945,104676,106728,108621,107969,107222,106993,107375,111163,110726,110961,110849,109567,108116,106233,108394,102962,94004,94024,97755
[info] CPU 1187.8% | Mem 360MiB

[run 3/3]
gcannon v0.5.3
  Target:    localhost:8080/
  Threads:   64
  Conns:     4096 (64/thread)
  Pipeline:  1
  Req/conn:  200
  Templates: 20
  Expected:  200
  Duration:  15s


  Thread Stats   Avg      p50      p90      p99    p99.9
    Latency   28.96ms   16.10ms   68.90ms   208.80ms   357.70ms

  2117627 requests in 15.00s, 2117627 responses
  Throughput: 141.15K req/s
  Bandwidth:  44.14MB/s
  Status codes: 2xx=2117627, 3xx=0, 4xx=0, 5xx=0
  Latency samples: 2117626 / 2117627 responses (100.0%)
  Reconnects: 8596
  Per-template: 102795,108061,108404,106066,107591,110523,108127,108357,105792,106192,108524,108204,108984,109681,110248,107810,102019,94362,96497,99389
  Per-template-ok: 102795,108061,108404,106066,107591,110523,108127,108357,105792,106192,108524,108204,108984,109681,110248,107810,102019,94362,96497,99389
[info] CPU 1197.3% | Mem 361MiB

=== Best: 141175 req/s (CPU: 1197.3%, Mem: 361MiB) ===
[info] input BW: 12.12MB/s (avg template: 90 bytes)
[info] saved results/crud/4096/vanilla-epoll.json
httparena-bench-vanilla-epoll
httparena-bench-vanilla-epoll

==============================================
=== vanilla-epoll / fortunes / 1024c (tool=gcannon) ===
==============================================
[info] resetting postgres for a clean per-profile baseline
[info] starting postgres sidecar
httparena-postgres
[info] postgres ready (seeded)
[info] waiting for server...
[info] server ready

[run 1/3]
gcannon v0.5.3
  Target:    localhost:8080/fortunes
  Threads:   64
  Conns:     1024 (16/thread)
  Pipeline:  1
  Req/conn:  unlimited (keep-alive)
  Expected:  200
  Duration:  5s


  Thread Stats   Avg      p50      p90      p99    p99.9
    Latency   326.07ms   265.50ms   698.70ms    1.11s    1.45s

  14953 requests in 5.00s, 14953 responses
  Throughput: 2.99K req/s
  Bandwidth:  70.88MB/s
  Status codes: 2xx=14953, 3xx=0, 4xx=0, 5xx=0
  Latency samples: 14953 / 14953 responses (100.0%)
[info] CPU 449.1% | Mem 136MiB

[run 2/3]
gcannon v0.5.3
  Target:    localhost:8080/fortunes
  Threads:   64
  Conns:     1024 (16/thread)
  Pipeline:  1
  Req/conn:  unlimited (keep-alive)
  Expected:  200
  Duration:  5s


  Thread Stats   Avg      p50      p90      p99    p99.9
    Latency   332.23ms   274.30ms   680.10ms    1.17s    1.65s

  14724 requests in 5.00s, 14724 responses
  Throughput: 2.94K req/s
  Bandwidth:  69.80MB/s
  Status codes: 2xx=14724, 3xx=0, 4xx=0, 5xx=0
  Latency samples: 14724 / 14724 responses (100.0%)
[info] CPU 474.8% | Mem 137MiB

[run 3/3]
gcannon v0.5.3
  Target:    localhost:8080/fortunes
  Threads:   64
  Conns:     1024 (16/thread)
  Pipeline:  1
  Req/conn:  unlimited (keep-alive)
  Expected:  200
  Duration:  5s


  Thread Stats   Avg      p50      p90      p99    p99.9
    Latency   331.88ms   265.00ms   700.40ms    1.19s    1.59s

  14770 requests in 5.00s, 14770 responses
  Throughput: 2.95K req/s
  Bandwidth:  70.02MB/s
  Status codes: 2xx=14770, 3xx=0, 4xx=0, 5xx=0
  Latency samples: 14770 / 14770 responses (100.0%)
[info] CPU 472.4% | Mem 136MiB

=== Best: 2990 req/s (CPU: 449.1%, Mem: 136MiB) ===
[info] saved results/fortunes/1024/vanilla-epoll.json
httparena-bench-vanilla-epoll
httparena-bench-vanilla-epoll
[info] skip: vanilla-epoll does not subscribe to baseline-h2
[info] skip: vanilla-epoll does not subscribe to static-h2
[info] skip: vanilla-epoll does not subscribe to baseline-h2c
[info] skip: vanilla-epoll does not subscribe to json-h2c
[info] skip: vanilla-epoll does not subscribe to baseline-h3
[info] skip: vanilla-epoll does not subscribe to static-h3
[info] skip: vanilla-epoll does not subscribe to gateway-64
[info] skip: vanilla-epoll does not subscribe to gateway-h3
[info] skip: vanilla-epoll does not subscribe to production-stack
[info] skip: vanilla-epoll does not subscribe to unary-grpc
[info] skip: vanilla-epoll does not subscribe to unary-grpc-tls
[info] skip: vanilla-epoll does not subscribe to stream-grpc
[info] skip: vanilla-epoll does not subscribe to stream-grpc-tls
[info] skip: vanilla-epoll does not subscribe to echo-ws
[info] skip: vanilla-epoll does not subscribe to echo-ws-pipeline
[info] rebuilding site/data/*.json
[updated] /home/diogo/actions-runner/_work/HttpArena/HttpArena/site/data/frameworks.json
[updated] /home/diogo/actions-runner/_work/HttpArena/HttpArena/site/data/api-16-1024.json
[updated] /home/diogo/actions-runner/_work/HttpArena/HttpArena/site/data/api-4-256.json
[updated] /home/diogo/actions-runner/_work/HttpArena/HttpArena/site/data/async-db-1024.json
[updated] /home/diogo/actions-runner/_work/HttpArena/HttpArena/site/data/baseline-4096.json
[updated] /home/diogo/actions-runner/_work/HttpArena/HttpArena/site/data/baseline-512.json
[updated] /home/diogo/actions-runner/_work/HttpArena/HttpArena/site/data/crud-4096.json
[updated] /home/diogo/actions-runner/_work/HttpArena/HttpArena/site/data/fortunes-1024.json
[updated] /home/diogo/actions-runner/_work/HttpArena/HttpArena/site/data/json-4096.json
[updated] /home/diogo/actions-runner/_work/HttpArena/HttpArena/site/data/json-comp-16384.json
[updated] /home/diogo/actions-runner/_work/HttpArena/HttpArena/site/data/json-comp-4096.json
[updated] /home/diogo/actions-runner/_work/HttpArena/HttpArena/site/data/json-comp-512.json
[updated] /home/diogo/actions-runner/_work/HttpArena/HttpArena/site/data/limited-conn-4096.json
[updated] /home/diogo/actions-runner/_work/HttpArena/HttpArena/site/data/limited-conn-512.json
[updated] /home/diogo/actions-runner/_work/HttpArena/HttpArena/site/data/pipelined-4096.json
[updated] /home/diogo/actions-runner/_work/HttpArena/HttpArena/site/data/pipelined-512.json
[updated] /home/diogo/actions-runner/_work/HttpArena/HttpArena/site/data/static-1024.json
[updated] /home/diogo/actions-runner/_work/HttpArena/HttpArena/site/data/static-4096.json
[updated] /home/diogo/actions-runner/_work/HttpArena/HttpArena/site/data/static-6800.json
[updated] /home/diogo/actions-runner/_work/HttpArena/HttpArena/site/data/upload-256.json
[updated] /home/diogo/actions-runner/_work/HttpArena/HttpArena/site/data/upload-32.json
[updated] /home/diogo/actions-runner/_work/HttpArena/HttpArena/site/data/current.json
[info] done
httparena-postgres
httparena-redis
[info] restoring loopback MTU to 65536

…content)

callgrind on the render path showed ~26% of its instructions were the zero-fill
of strings.new_builder(32768) — a flat 32 KB block for a ~1.5 KB response, re-
zeroed every request as the GC reuses it. Size it from the actual rows instead
(160 + 96/row + message bytes); an outlier grows once. The other builders here
were already content-sized.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
@enghitalo

Copy link
Copy Markdown
Contributor Author

/benchmark -f vanilla-epoll

@github-actions

Copy link
Copy Markdown
Contributor

👋 /benchmark request received. A collaborator will review and approve the run.

@github-actions

Copy link
Copy Markdown
Contributor

Benchmark Results

Framework: vanilla-epoll | Test: all tests

Test Conn RPS CPU Mem Δ RPS Δ Mem
baseline 512 639,805 1335.3% 67MiB +37.4% -6.9%
baseline 4096 649,542 1421.0% 191MiB +51.3% -1.0%
pipelined 512 36,348,876 6583.8% 52MiB +1427.4% -1.9%
pipelined 4096 36,293,785 6473.1% 186MiB +1476.8% -0.5%
limited-conn 512 227,867 745.0% 62MiB +5.1% ~0%
limited-conn 4096 365,187 1125.0% 178MiB +20.3% -0.6%
json 4096 827,166 2793.9% 184MiB +70.7% +4.0%
json-comp 512 609,626 1769.5% 65MiB +1145.8% -25.3%
json-comp 4096 814,886 2393.0% 192MiB +1327.8% +3.8%
json-comp 16384 876,232 2379.1% 638MiB +1139.3% -9.4%
upload 32 11,707 472.7% 94MiB +38923.3% -88.9%
upload 256 14,745 881.3% 387MiB +26230.4% -73.0%
api-4 256 32,039 329.7% 73MiB -2.5% +2.8%
api-16 1024 21,759 408.7% 95MiB +10.2% +2.2%
static 1024 1,068,339 5601.6% 99MiB +235.8% -86.7%
static 4096 1,138,222 5823.3% 291MiB +327.6% -89.9%
static 6800 1,176,484 5922.2% 459MiB +464.0% -90.0%
async-db 1024 10,887 342.4% 98MiB +0.2% -2.0%
crud 4096 144,020 1195.1% 355MiB +20.4% -7.1%
fortunes 1024 4,097 226.7% 71MiB +54.3% -50.7%
Full log
[run 1/3]
gcannon v0.5.3
  Target:    localhost:8080/
  Threads:   64
  Conns:     4096 (64/thread)
  Pipeline:  1
  Req/conn:  200
  Templates: 20
  Expected:  200
  Duration:  15s


  Thread Stats   Avg      p50      p90      p99    p99.9
    Latency   30.46ms   17.10ms   71.40ms   211.50ms   344.10ms

  2015123 requests in 15.00s, 2015123 responses
  Throughput: 134.32K req/s
  Bandwidth:  42.55MB/s
  Status codes: 2xx=2015123, 3xx=0, 4xx=0, 5xx=0
  Latency samples: 2015122 / 2015123 responses (100.0%)
  Reconnects: 8046
  Per-template: 97859,100561,102713,102205,102919,104177,105709,104560,103134,101090,101329,101951,99708,101598,103377,104741,102028,89202,90421,95840
  Per-template-ok: 97859,100561,102713,102205,102919,104177,105709,104560,103134,101090,101329,101951,99708,101598,103377,104741,102028,89202,90421,95840
[info] CPU 1099.6% | Mem 355MiB

[run 2/3]
gcannon v0.5.3
  Target:    localhost:8080/
  Threads:   64
  Conns:     4096 (64/thread)
  Pipeline:  1
  Req/conn:  200
  Templates: 20
  Expected:  200
  Duration:  15s


  Thread Stats   Avg      p50      p90      p99    p99.9
    Latency   28.34ms   14.80ms   69.60ms   215.00ms   345.30ms

  2160304 requests in 15.00s, 2160305 responses
  Throughput: 144.00K req/s
  Bandwidth:  45.05MB/s
  Status codes: 2xx=2160305, 3xx=0, 4xx=0, 5xx=0
  Latency samples: 2160305 / 2160305 responses (100.0%)
  Reconnects: 8825
  Per-template: 107096,108706,109840,112071,111236,112623,111369,108570,108051,109455,110861,109352,111382,107383,110046,111637,104577,95851,98066,102133
  Per-template-ok: 107096,108706,109840,112071,111236,112623,111369,108570,108051,109455,110861,109352,111382,107383,110046,111637,104577,95851,98066,102133
[info] CPU 1195.1% | Mem 355MiB

[run 3/3]
gcannon v0.5.3
  Target:    localhost:8080/
  Threads:   64
  Conns:     4096 (64/thread)
  Pipeline:  1
  Req/conn:  200
  Templates: 20
  Expected:  200
  Duration:  15s


  Thread Stats   Avg      p50      p90      p99    p99.9
    Latency   29.37ms   15.80ms   71.60ms   215.00ms   357.80ms

  2086670 requests in 15.00s, 2086671 responses
  Throughput: 139.09K req/s
  Bandwidth:  43.90MB/s
  Status codes: 2xx=2086671, 3xx=0, 4xx=0, 5xx=0
  Latency samples: 2086668 / 2086671 responses (100.0%)
  Reconnects: 8473
  Per-template: 98776,104543,108642,108167,106080,106602,104374,102054,102615,105810,108982,108213,107820,107541,107849,108013,104458,94577,94341,97211
  Per-template-ok: 98776,104543,108642,108167,106080,106602,104374,102054,102615,105810,108982,108213,107820,107541,107849,108013,104458,94577,94341,97211
[info] CPU 1183.5% | Mem 359MiB

=== Best: 144020 req/s (CPU: 1195.1%, Mem: 355MiB) ===
[info] input BW: 12.36MB/s (avg template: 90 bytes)
[info] saved results/crud/4096/vanilla-epoll.json
httparena-bench-vanilla-epoll
httparena-bench-vanilla-epoll

==============================================
=== vanilla-epoll / fortunes / 1024c (tool=gcannon) ===
==============================================
[info] resetting postgres for a clean per-profile baseline
[info] starting postgres sidecar
httparena-postgres
[info] postgres ready (seeded)
[info] waiting for server...
[info] server ready

[run 1/3]
gcannon v0.5.3
  Target:    localhost:8080/fortunes
  Threads:   64
  Conns:     1024 (16/thread)
  Pipeline:  1
  Req/conn:  unlimited (keep-alive)
  Expected:  200
  Duration:  5s


  Thread Stats   Avg      p50      p90      p99    p99.9
    Latency   6.46ms   5.35ms   9.55ms   16.70ms   93.90ms

  20487 requests in 5.00s, 20487 responses
  Throughput: 4.09K req/s
  Bandwidth:  97.12MB/s
  Status codes: 2xx=20487, 3xx=0, 4xx=0, 5xx=0
  Latency samples: 20487 / 20487 responses (100.0%)
[info] CPU 226.7% | Mem 71MiB

[run 2/3]
gcannon v0.5.3
  Target:    localhost:8080/fortunes
  Threads:   64
  Conns:     1024 (16/thread)
  Pipeline:  1
  Req/conn:  unlimited (keep-alive)
  Expected:  200
  Duration:  5s


  Thread Stats   Avg      p50      p90      p99    p99.9
    Latency   5.91ms   4.52ms   9.68ms   13.70ms   28.20ms

  18264 requests in 5.00s, 18264 responses
  Throughput: 3.65K req/s
  Bandwidth:  86.58MB/s
  Status codes: 2xx=18264, 3xx=0, 4xx=0, 5xx=0
  Latency samples: 18264 / 18264 responses (100.0%)
[info] CPU 183.9% | Mem 84MiB

[run 3/3]
gcannon v0.5.3
  Target:    localhost:8080/fortunes
  Threads:   64
  Conns:     1024 (16/thread)
  Pipeline:  1
  Req/conn:  unlimited (keep-alive)
  Expected:  200
  Duration:  5s


  Thread Stats   Avg      p50      p90      p99    p99.9
    Latency   6.35ms   5.69ms   9.68ms   12.90ms   25.00ms

  19598 requests in 5.00s, 19598 responses
  Throughput: 3.92K req/s
  Bandwidth:  92.90MB/s
  Status codes: 2xx=19598, 3xx=0, 4xx=0, 5xx=0
  Latency samples: 19598 / 19598 responses (100.0%)
[info] CPU 219.9% | Mem 83MiB

=== Best: 4097 req/s (CPU: 226.7%, Mem: 71MiB) ===
[info] saved results/fortunes/1024/vanilla-epoll.json
httparena-bench-vanilla-epoll
httparena-bench-vanilla-epoll
[info] skip: vanilla-epoll does not subscribe to baseline-h2
[info] skip: vanilla-epoll does not subscribe to static-h2
[info] skip: vanilla-epoll does not subscribe to baseline-h2c
[info] skip: vanilla-epoll does not subscribe to json-h2c
[info] skip: vanilla-epoll does not subscribe to baseline-h3
[info] skip: vanilla-epoll does not subscribe to static-h3
[info] skip: vanilla-epoll does not subscribe to gateway-64
[info] skip: vanilla-epoll does not subscribe to gateway-h3
[info] skip: vanilla-epoll does not subscribe to production-stack
[info] skip: vanilla-epoll does not subscribe to unary-grpc
[info] skip: vanilla-epoll does not subscribe to unary-grpc-tls
[info] skip: vanilla-epoll does not subscribe to stream-grpc
[info] skip: vanilla-epoll does not subscribe to stream-grpc-tls
[info] skip: vanilla-epoll does not subscribe to echo-ws
[info] skip: vanilla-epoll does not subscribe to echo-ws-pipeline
[info] rebuilding site/data/*.json
[updated] /home/diogo/actions-runner/_work/HttpArena/HttpArena/site/data/frameworks.json
[updated] /home/diogo/actions-runner/_work/HttpArena/HttpArena/site/data/api-16-1024.json
[updated] /home/diogo/actions-runner/_work/HttpArena/HttpArena/site/data/api-4-256.json
[updated] /home/diogo/actions-runner/_work/HttpArena/HttpArena/site/data/async-db-1024.json
[updated] /home/diogo/actions-runner/_work/HttpArena/HttpArena/site/data/baseline-4096.json
[updated] /home/diogo/actions-runner/_work/HttpArena/HttpArena/site/data/baseline-512.json
[updated] /home/diogo/actions-runner/_work/HttpArena/HttpArena/site/data/crud-4096.json
[updated] /home/diogo/actions-runner/_work/HttpArena/HttpArena/site/data/fortunes-1024.json
[updated] /home/diogo/actions-runner/_work/HttpArena/HttpArena/site/data/json-4096.json
[updated] /home/diogo/actions-runner/_work/HttpArena/HttpArena/site/data/json-comp-16384.json
[updated] /home/diogo/actions-runner/_work/HttpArena/HttpArena/site/data/json-comp-4096.json
[updated] /home/diogo/actions-runner/_work/HttpArena/HttpArena/site/data/json-comp-512.json
[updated] /home/diogo/actions-runner/_work/HttpArena/HttpArena/site/data/limited-conn-4096.json
[updated] /home/diogo/actions-runner/_work/HttpArena/HttpArena/site/data/limited-conn-512.json
[updated] /home/diogo/actions-runner/_work/HttpArena/HttpArena/site/data/pipelined-4096.json
[updated] /home/diogo/actions-runner/_work/HttpArena/HttpArena/site/data/pipelined-512.json
[updated] /home/diogo/actions-runner/_work/HttpArena/HttpArena/site/data/static-1024.json
[updated] /home/diogo/actions-runner/_work/HttpArena/HttpArena/site/data/static-4096.json
[updated] /home/diogo/actions-runner/_work/HttpArena/HttpArena/site/data/static-6800.json
[updated] /home/diogo/actions-runner/_work/HttpArena/HttpArena/site/data/upload-256.json
[updated] /home/diogo/actions-runner/_work/HttpArena/HttpArena/site/data/upload-32.json
[updated] /home/diogo/actions-runner/_work/HttpArena/HttpArena/site/data/current.json
[info] done
httparena-postgres
httparena-redis
[info] restoring loopback MTU to 65536

…ttp_request

Adopt the no-Result parser entry point (vanilla MDA2AV#44): fill a caller-owned
HttpRequest in place via request_parser.decode_into instead of the
decode_http_request wrapper, which boxed the big struct in a `!HttpRequest`
Result every request (~13% of the parse path per callgrind, on top of the -26%
from find_byte_idx). Captures the full ~-35% parse-instruction cut on the
pipelined hot path.

GATED on vanilla MDA2AV#44 being merged into main (decode_into must exist in the
vanilla the arena clones). Not locally build-validated: this sync framework uses
db.pg.ConnectionPool, absent on the post-0.5.1 local V; mechanical 3-line swap.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
@enghitalo

Copy link
Copy Markdown
Contributor Author

/benchmark -f vanilla-epoll

@github-actions

Copy link
Copy Markdown
Contributor

👋 /benchmark request received. A collaborator will review and approve the run.

@enghitalo enghitalo marked this pull request as draft June 17, 2026 12:24
…ilder)

Back-port the async build's cache-HIT path to sync. crud is ~75% cached GETs;
each one went through ok_xcache — a fresh strings.Builder(body+96) + 9 writes +
a materialized []u8 — then wb copied that into the conn buffer again (alloc +
double copy per hit). Write the header fragments + body straight into `out` via
ws/wi instead. Independent of the async re-bench; pure GC-churn reduction on the
hot path (this direct-to-out path is part of why the async build ran crud 3x).

Not locally build-validated (this sync build uses db.pg.ConnectionPool, absent on
the post-0.5.1 local V); mechanical ws/wi/wb usage, validated by the pending MDA2AV#877
benchmark. crud_list/create/update left as-is (lower frequency, DB-bound).

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
@enghitalo

Copy link
Copy Markdown
Contributor Author

/benchmark -f vanilla-epoll

@github-actions

Copy link
Copy Markdown
Contributor

👋 /benchmark request received. A collaborator will review and approve the run.

@github-actions

Copy link
Copy Markdown
Contributor

Benchmark Results

Framework: vanilla-epoll | Test: all tests

Test Conn RPS CPU Mem Δ RPS Δ Mem
baseline 512 672,004 1375.7% 71MiB +44.3% -1.4%
baseline 4096 620,980 1329.0% 200MiB +44.7% +3.6%
pipelined 512 37,244,659 6642.3% 52MiB +1465.1% -1.9%
pipelined 4096 37,324,649 6701.3% 178MiB +1521.6% -4.8%
limited-conn 512 237,698 753.1% 59MiB +9.6% -4.8%
limited-conn 4096 380,217 1187.6% 177MiB +25.3% -1.1%
json 4096 824,881 2759.5% 183MiB +70.2% +3.4%
json-comp 512 577,679 1701.3% 63MiB +1080.5% -27.6%
json-comp 4096 841,212 2363.8% 191MiB +1373.9% +3.2%
json-comp 16384 916,824 2335.7% 647MiB +1196.7% -8.1%
upload 32 11,774 444.4% 93MiB +39146.7% -89.0%
upload 256 14,144 927.8% 247MiB +25157.1% -82.8%
api-4 256 31,864 331.7% 71MiB -3.0% ~0%
api-16 1024 21,482 404.0% 97MiB +8.8% +4.3%
static 1024 1,084,448 5654.4% 101MiB +240.8% -86.4%
static 4096 1,156,531 5768.4% 291MiB +334.5% -89.9%
static 6800 1,188,803 5943.1% 461MiB +469.9% -90.0%
async-db 1024 10,810 459.6% 106MiB -0.5% +6.0%
crud 4096 177,297 1292.3% 371MiB +48.3% -2.9%
fortunes 1024 4,367 243.8% 72MiB +64.5% -50.0%
Full log
[run 1/3]
gcannon v0.5.3
  Target:    localhost:8080/
  Threads:   64
  Conns:     4096 (64/thread)
  Pipeline:  1
  Req/conn:  200
  Templates: 20
  Expected:  200
  Duration:  15s


  Thread Stats   Avg      p50      p90      p99    p99.9
    Latency   25.65ms   22.90ms   45.60ms   87.50ms   174.20ms

  2395219 requests in 15.00s, 2395220 responses
  Throughput: 159.66K req/s
  Bandwidth:  50.50MB/s
  Status codes: 2xx=2395220, 3xx=0, 4xx=0, 5xx=0
  Latency samples: 2395220 / 2395220 responses (100.0%)
  Reconnects: 10002
  Per-template: 111526,117472,120570,122814,123986,122927,123153,124067,122790,121288,123332,123445,124027,122684,121824,122716,119095,109531,107089,110884
  Per-template-ok: 111526,117472,120570,122814,123986,122927,123153,124067,122790,121288,123332,123445,124027,122684,121824,122716,119095,109531,107089,110884
[info] CPU 1143.6% | Mem 365MiB

[run 2/3]
gcannon v0.5.3
  Target:    localhost:8080/
  Threads:   64
  Conns:     4096 (64/thread)
  Pipeline:  1
  Req/conn:  200
  Templates: 20
  Expected:  200
  Duration:  15s


  Thread Stats   Avg      p50      p90      p99    p99.9
    Latency   23.04ms   18.60ms   42.90ms   117.00ms   213.30ms

  2659465 requests in 15.00s, 2659465 responses
  Throughput: 177.27K req/s
  Bandwidth:  56.08MB/s
  Status codes: 2xx=2659465, 3xx=0, 4xx=0, 5xx=0
  Latency samples: 2659465 / 2659465 responses (100.0%)
  Reconnects: 11409
  Per-template: 122844,127923,132034,134944,136183,135333,134071,136287,136360,137061,138586,136786,137455,136340,136092,138169,134559,125847,123231,119360
  Per-template-ok: 122844,127923,132034,134944,136183,135333,134071,136287,136360,137061,138586,136786,137455,136340,136092,138169,134559,125847,123231,119360
[info] CPU 1292.3% | Mem 371MiB

[run 3/3]
gcannon v0.5.3
  Target:    localhost:8080/
  Threads:   64
  Conns:     4096 (64/thread)
  Pipeline:  1
  Req/conn:  200
  Templates: 20
  Expected:  200
  Duration:  15s


  Thread Stats   Avg      p50      p90      p99    p99.9
    Latency   24.21ms   21.20ms   43.10ms   90.60ms   176.10ms

  2531479 requests in 15.00s, 2531476 responses
  Throughput: 168.74K req/s
  Bandwidth:  53.22MB/s
  Status codes: 2xx=2531476, 3xx=0, 4xx=0, 5xx=0
  Latency samples: 2531474 / 2531476 responses (100.0%)
  Reconnects: 10735
  Per-template: 116802,121790,123745,127531,129949,130182,130386,130019,132187,131171,130270,129767,132213,132304,129424,128412,126113,119661,115145,114403
  Per-template-ok: 116802,121790,123745,127531,129949,130182,130386,130019,132187,131171,130270,129767,132213,132304,129424,128412,126113,119661,115145,114403
[info] CPU 1214.7% | Mem 377MiB

=== Best: 177297 req/s (CPU: 1292.3%, Mem: 371MiB) ===
[info] input BW: 15.22MB/s (avg template: 90 bytes)
[info] saved results/crud/4096/vanilla-epoll.json
httparena-bench-vanilla-epoll
httparena-bench-vanilla-epoll

==============================================
=== vanilla-epoll / fortunes / 1024c (tool=gcannon) ===
==============================================
[info] resetting postgres for a clean per-profile baseline
[info] starting postgres sidecar
httparena-postgres
[info] postgres ready (seeded)
[info] waiting for server...
[info] server ready

[run 1/3]
gcannon v0.5.3
  Target:    localhost:8080/fortunes
  Threads:   64
  Conns:     1024 (16/thread)
  Pipeline:  1
  Req/conn:  unlimited (keep-alive)
  Expected:  200
  Duration:  5s


  Thread Stats   Avg      p50      p90      p99    p99.9
    Latency   6.62ms   6.45ms   9.83ms   16.00ms   46.70ms

  21839 requests in 5.00s, 21839 responses
  Throughput: 4.37K req/s
  Bandwidth:  103.53MB/s
  Status codes: 2xx=21839, 3xx=0, 4xx=0, 5xx=0
  Latency samples: 21839 / 21839 responses (100.0%)
[info] CPU 243.8% | Mem 72MiB

[run 2/3]
gcannon v0.5.3
  Target:    localhost:8080/fortunes
  Threads:   64
  Conns:     1024 (16/thread)
  Pipeline:  1
  Req/conn:  unlimited (keep-alive)
  Expected:  200
  Duration:  5s


  Thread Stats   Avg      p50      p90      p99    p99.9
    Latency   6.52ms   4.98ms   10.30ms   19.00ms   47.20ms

  18827 requests in 5.00s, 18827 responses
  Throughput: 3.76K req/s
  Bandwidth:  89.25MB/s
  Status codes: 2xx=18827, 3xx=0, 4xx=0, 5xx=0
  Latency samples: 18827 / 18827 responses (100.0%)
[info] CPU 206.1% | Mem 84MiB

[run 3/3]
gcannon v0.5.3
  Target:    localhost:8080/fortunes
  Threads:   64
  Conns:     1024 (16/thread)
  Pipeline:  1
  Req/conn:  unlimited (keep-alive)
  Expected:  200
  Duration:  5s


  Thread Stats   Avg      p50      p90      p99    p99.9
    Latency   324.84ms   248.50ms   711.30ms    1.27s    1.90s

  14960 requests in 5.00s, 14960 responses
  Throughput: 2.99K req/s
  Bandwidth:  70.92MB/s
  Status codes: 2xx=14960, 3xx=0, 4xx=0, 5xx=0
  Latency samples: 14960 / 14960 responses (100.0%)
[info] CPU 456.0% | Mem 144MiB

=== Best: 4367 req/s (CPU: 243.8%, Mem: 72MiB) ===
[info] saved results/fortunes/1024/vanilla-epoll.json
httparena-bench-vanilla-epoll
httparena-bench-vanilla-epoll
[info] skip: vanilla-epoll does not subscribe to baseline-h2
[info] skip: vanilla-epoll does not subscribe to static-h2
[info] skip: vanilla-epoll does not subscribe to baseline-h2c
[info] skip: vanilla-epoll does not subscribe to json-h2c
[info] skip: vanilla-epoll does not subscribe to baseline-h3
[info] skip: vanilla-epoll does not subscribe to static-h3
[info] skip: vanilla-epoll does not subscribe to gateway-64
[info] skip: vanilla-epoll does not subscribe to gateway-h3
[info] skip: vanilla-epoll does not subscribe to production-stack
[info] skip: vanilla-epoll does not subscribe to unary-grpc
[info] skip: vanilla-epoll does not subscribe to unary-grpc-tls
[info] skip: vanilla-epoll does not subscribe to stream-grpc
[info] skip: vanilla-epoll does not subscribe to stream-grpc-tls
[info] skip: vanilla-epoll does not subscribe to echo-ws
[info] skip: vanilla-epoll does not subscribe to echo-ws-pipeline
[info] rebuilding site/data/*.json
[updated] /home/diogo/actions-runner/_work/HttpArena/HttpArena/site/data/frameworks.json
[updated] /home/diogo/actions-runner/_work/HttpArena/HttpArena/site/data/api-16-1024.json
[updated] /home/diogo/actions-runner/_work/HttpArena/HttpArena/site/data/api-4-256.json
[updated] /home/diogo/actions-runner/_work/HttpArena/HttpArena/site/data/async-db-1024.json
[updated] /home/diogo/actions-runner/_work/HttpArena/HttpArena/site/data/baseline-4096.json
[updated] /home/diogo/actions-runner/_work/HttpArena/HttpArena/site/data/baseline-512.json
[updated] /home/diogo/actions-runner/_work/HttpArena/HttpArena/site/data/crud-4096.json
[updated] /home/diogo/actions-runner/_work/HttpArena/HttpArena/site/data/fortunes-1024.json
[updated] /home/diogo/actions-runner/_work/HttpArena/HttpArena/site/data/json-4096.json
[updated] /home/diogo/actions-runner/_work/HttpArena/HttpArena/site/data/json-comp-16384.json
[updated] /home/diogo/actions-runner/_work/HttpArena/HttpArena/site/data/json-comp-4096.json
[updated] /home/diogo/actions-runner/_work/HttpArena/HttpArena/site/data/json-comp-512.json
[updated] /home/diogo/actions-runner/_work/HttpArena/HttpArena/site/data/limited-conn-4096.json
[updated] /home/diogo/actions-runner/_work/HttpArena/HttpArena/site/data/limited-conn-512.json
[updated] /home/diogo/actions-runner/_work/HttpArena/HttpArena/site/data/pipelined-4096.json
[updated] /home/diogo/actions-runner/_work/HttpArena/HttpArena/site/data/pipelined-512.json
[updated] /home/diogo/actions-runner/_work/HttpArena/HttpArena/site/data/static-1024.json
[updated] /home/diogo/actions-runner/_work/HttpArena/HttpArena/site/data/static-4096.json
[updated] /home/diogo/actions-runner/_work/HttpArena/HttpArena/site/data/static-6800.json
[updated] /home/diogo/actions-runner/_work/HttpArena/HttpArena/site/data/upload-256.json
[updated] /home/diogo/actions-runner/_work/HttpArena/HttpArena/site/data/upload-32.json
[updated] /home/diogo/actions-runner/_work/HttpArena/HttpArena/site/data/current.json
[info] done
httparena-postgres
httparena-redis
[info] restoring loopback MTU to 65536

enghitalo and others added 2 commits June 17, 2026 12:54
…refix

The whole pipelined profile is a fixed `GET /pipeline ` request, but handle()
still ran decode_into before the route check. Match the raw request-line prefix
FIRST and blit the precomputed response — skipping decode_into entirely on the
arena's highest-RPS endpoint (parse is ~8% of the pipelined per-request cost).
Drops the now-redundant post-decode `target == '/pipeline'` arm.

Context: investigation showed the pipelined gap to the leaders (~37M vs ~56M) is
NOT the I/O backend — vanilla-io_uring (37M) ≈ vanilla-epoll, and rust-epoll hits
56M, so epoll is fine; the gap is V's per-request CPU floor. This is a modest
per-request CPU trim, not a 1.5x closer.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
…ache

Two cache wins from the json/crud study (both contained, framework-only).

crud (75% cached GET, small ~200B response): the cache-HIT cost was dominated by
the SINGLE shared RwMutex — 128 worker cores rlocking the same lock per GET ping-
pong its cache line (MESI bouncing), not the map lookup. Replace `cache
map[int]string` (body only, rebuilt per hit) with `crud_shards [256]CrudShard`,
each its own RwMutex + map storing the FULL `X-Cache: HIT` response bytes. A hit
is now: rlock one of 256 shards (id & 255) + a single push_many — no rebuild, no
double copy, ~256x less lock contention. crud_update invalidates only its shard.

json (non-gzip): add json_cache (count<<32|m -> full response bytes), mirroring
the existing gz_cache. The 7 benchmarked (count,m) pairs are deterministic, so a
hit collapses the per-item prefix loop + the double price*qty*m pass into one
push_many. Build-on-miss kept (bounded at 1024 like gz).

Verified V's array `init:` is per-element (256 distinct mutexes/maps). Not locally
build-validated (MDA2AV#877 uses db.pg.ConnectionPool, absent on the post-0.5.1 local V);
validated by the arena benchmark. crud is type "engine" — appears in the crud data
(MDA2AV#15) but confirm leaderboard scoring. No 200ms TTL (spec asks for it; current cache
is grow-until-invalidate — a known correctness gap that slightly inflates HIT rate).

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
@enghitalo

Copy link
Copy Markdown
Contributor Author

/benchmark -f vanilla-epoll

@github-actions

Copy link
Copy Markdown
Contributor

👋 /benchmark request received. A collaborator will review and approve the run.

@github-actions

Copy link
Copy Markdown
Contributor

Benchmark Results

Framework: vanilla-epoll | Test: all tests

Test Conn RPS CPU Mem Δ RPS Δ Mem
baseline 512 682,658 1398.1% 68MiB +46.6% -5.6%
baseline 4096 655,800 1459.6% 200MiB +52.8% +3.6%
pipelined 512 39,218,131 6675.9% 54MiB +1548.0% +1.9%
pipelined 4096 39,230,028 6669.3% 177MiB +1604.3% -5.3%
limited-conn 512 248,953 781.8% 61MiB +14.8% -1.6%
limited-conn 4096 371,808 1134.6% 171MiB +22.5% -4.5%
json 4096 821,961 2636.6% 185MiB +69.6% +4.5%
json-comp 512 632,662 1814.7% 63MiB +1192.9% -27.6%
json-comp 4096 816,883 2372.2% 182MiB +1331.3% -1.6%
json-comp 16384 913,140 2498.1% 634MiB +1191.5% -9.9%
upload 32 12,182 504.9% 117MiB +40506.7% -86.1%
upload 256 12,940 796.8% 230MiB +23007.1% -84.0%
api-4 256 32,449 331.4% 72MiB -1.2% +1.4%
api-16 1024 21,520 410.8% 95MiB +9.0% +2.2%
static 1024 1,067,927 5638.3% 101MiB +235.6% -86.4%
static 4096 1,140,778 5907.5% 294MiB +328.6% -89.7%
static 6800 1,179,739 5931.3% 461MiB +465.5% -90.0%
async-db 1024 11,026 357.5% 99MiB +1.5% -1.0%
crud 4096 160,836 1050.9% 316MiB +34.5% -17.3%
fortunes 1024 4,130 237.7% 69MiB +55.6% -52.1%
Full log
[run 1/3]
gcannon v0.5.3
  Target:    localhost:8080/
  Threads:   64
  Conns:     4096 (64/thread)
  Pipeline:  1
  Req/conn:  200
  Templates: 20
  Expected:  200
  Duration:  15s


  Thread Stats   Avg      p50      p90      p99    p99.9
    Latency   27.53ms   11.30ms   74.70ms   228.10ms   383.40ms

  2225818 requests in 15.00s, 2225818 responses
  Throughput: 148.36K req/s
  Bandwidth:  46.76MB/s
  Status codes: 2xx=2225818, 3xx=0, 4xx=0, 5xx=0
  Latency samples: 2225817 / 2225818 responses (100.0%)
  Reconnects: 9127
  Per-template: 106167,109444,112316,114631,112096,111123,112875,117887,116577,115274,115358,114744,114073,113299,114105,112776,109874,100830,99631,102737
  Per-template-ok: 106167,109444,112316,114631,112096,111123,112875,117887,116577,115274,115358,114744,114073,113299,114105,112776,109874,100830,99631,102737
[info] CPU 1053.3% | Mem 316MiB

[run 2/3]
gcannon v0.5.3
  Target:    localhost:8080/
  Threads:   64
  Conns:     4096 (64/thread)
  Pipeline:  1
  Req/conn:  200
  Templates: 20
  Expected:  200
  Duration:  15s


  Thread Stats   Avg      p50      p90      p99    p99.9
    Latency   25.64ms   9.35ms   71.30ms   223.00ms   381.80ms

  2388360 requests in 15.00s, 2388168 responses
  Throughput: 159.18K req/s
  Bandwidth:  49.86MB/s
  Status codes: 2xx=2388168, 3xx=0, 4xx=0, 5xx=0
  Latency samples: 2388165 / 2388168 responses (100.0%)
  Reconnects: 9994
  Per-template: 112861,115262,121270,121791,123857,126778,123273,120645,120401,120872,121872,124387,123475,124960,121620,121413,115534,108132,107696,112066
  Per-template-ok: 112861,115262,121270,121791,123857,126778,123273,120645,120401,120872,121872,124387,123475,124960,121620,121413,115534,108132,107696,112066
[info] CPU 1130.3% | Mem 316MiB

[run 3/3]
gcannon v0.5.3
  Target:    localhost:8080/
  Threads:   64
  Conns:     4096 (64/thread)
  Pipeline:  1
  Req/conn:  200
  Templates: 20
  Expected:  200
  Duration:  15s


  Thread Stats   Avg      p50      p90      p99    p99.9
    Latency   25.37ms   8.94ms   70.80ms   228.10ms   396.70ms

  2412553 requests in 15.00s, 2412553 responses
  Throughput: 160.81K req/s
  Bandwidth:  50.55MB/s
  Status codes: 2xx=2412553, 3xx=0, 4xx=0, 5xx=0
  Latency samples: 2412549 / 2412553 responses (100.0%)
  Reconnects: 10221
  Per-template: 117420,120162,120718,122265,124733,126063,127070,125686,122350,123558,123170,123124,122122,124803,124981,123431,118193,104627,105545,112528
  Per-template-ok: 117420,120162,120718,122265,124733,126063,127070,125686,122350,123558,123170,123124,122122,124803,124981,123431,118193,104627,105545,112528
[info] CPU 1050.9% | Mem 316MiB

=== Best: 160836 req/s (CPU: 1050.9%, Mem: 316MiB) ===
[info] input BW: 13.80MB/s (avg template: 90 bytes)
[info] saved results/crud/4096/vanilla-epoll.json
httparena-bench-vanilla-epoll
httparena-bench-vanilla-epoll

==============================================
=== vanilla-epoll / fortunes / 1024c (tool=gcannon) ===
==============================================
[info] resetting postgres for a clean per-profile baseline
[info] starting postgres sidecar
httparena-postgres
[info] postgres ready (seeded)
[info] waiting for server...
[info] server ready

[run 1/3]
gcannon v0.5.3
  Target:    localhost:8080/fortunes
  Threads:   64
  Conns:     1024 (16/thread)
  Pipeline:  1
  Req/conn:  unlimited (keep-alive)
  Expected:  200
  Duration:  5s


  Thread Stats   Avg      p50      p90      p99    p99.9
    Latency   6.88ms   7.32ms   10.00ms   19.50ms   62.90ms

  20651 requests in 5.00s, 20651 responses
  Throughput: 4.13K req/s
  Bandwidth:  97.89MB/s
  Status codes: 2xx=20651, 3xx=0, 4xx=0, 5xx=0
  Latency samples: 20651 / 20651 responses (100.0%)
[info] CPU 237.7% | Mem 69MiB

[run 2/3]
gcannon v0.5.3
  Target:    localhost:8080/fortunes
  Threads:   64
  Conns:     1024 (16/thread)
  Pipeline:  1
  Req/conn:  unlimited (keep-alive)
  Expected:  200
  Duration:  5s


  Thread Stats   Avg      p50      p90      p99    p99.9
    Latency   6.46ms   5.75ms   9.90ms   14.00ms   23.10ms

  19524 requests in 5.00s, 19524 responses
  Throughput: 3.90K req/s
  Bandwidth:  92.56MB/s
  Status codes: 2xx=19524, 3xx=0, 4xx=0, 5xx=0
  Latency samples: 19524 / 19524 responses (100.0%)
[info] CPU 205.6% | Mem 83MiB

[run 3/3]
gcannon v0.5.3
  Target:    localhost:8080/fortunes
  Threads:   64
  Conns:     1024 (16/thread)
  Pipeline:  1
  Req/conn:  unlimited (keep-alive)
  Expected:  200
  Duration:  5s


  Thread Stats   Avg      p50      p90      p99    p99.9
    Latency   6.56ms   5.66ms   10.00ms   13.40ms   18.50ms

  19211 requests in 5.00s, 19211 responses
  Throughput: 3.84K req/s
  Bandwidth:  91.07MB/s
  Status codes: 2xx=19211, 3xx=0, 4xx=0, 5xx=0
  Latency samples: 19211 / 19211 responses (100.0%)
[info] CPU 222.9% | Mem 82MiB

=== Best: 4130 req/s (CPU: 237.7%, Mem: 69MiB) ===
[info] saved results/fortunes/1024/vanilla-epoll.json
httparena-bench-vanilla-epoll
httparena-bench-vanilla-epoll
[info] skip: vanilla-epoll does not subscribe to baseline-h2
[info] skip: vanilla-epoll does not subscribe to static-h2
[info] skip: vanilla-epoll does not subscribe to baseline-h2c
[info] skip: vanilla-epoll does not subscribe to json-h2c
[info] skip: vanilla-epoll does not subscribe to baseline-h3
[info] skip: vanilla-epoll does not subscribe to static-h3
[info] skip: vanilla-epoll does not subscribe to gateway-64
[info] skip: vanilla-epoll does not subscribe to gateway-h3
[info] skip: vanilla-epoll does not subscribe to production-stack
[info] skip: vanilla-epoll does not subscribe to unary-grpc
[info] skip: vanilla-epoll does not subscribe to unary-grpc-tls
[info] skip: vanilla-epoll does not subscribe to stream-grpc
[info] skip: vanilla-epoll does not subscribe to stream-grpc-tls
[info] skip: vanilla-epoll does not subscribe to echo-ws
[info] skip: vanilla-epoll does not subscribe to echo-ws-pipeline
[info] rebuilding site/data/*.json
[updated] /home/diogo/actions-runner/_work/HttpArena/HttpArena/site/data/frameworks.json
[updated] /home/diogo/actions-runner/_work/HttpArena/HttpArena/site/data/api-16-1024.json
[updated] /home/diogo/actions-runner/_work/HttpArena/HttpArena/site/data/api-4-256.json
[updated] /home/diogo/actions-runner/_work/HttpArena/HttpArena/site/data/async-db-1024.json
[updated] /home/diogo/actions-runner/_work/HttpArena/HttpArena/site/data/baseline-4096.json
[updated] /home/diogo/actions-runner/_work/HttpArena/HttpArena/site/data/baseline-512.json
[updated] /home/diogo/actions-runner/_work/HttpArena/HttpArena/site/data/crud-4096.json
[updated] /home/diogo/actions-runner/_work/HttpArena/HttpArena/site/data/fortunes-1024.json
[updated] /home/diogo/actions-runner/_work/HttpArena/HttpArena/site/data/json-4096.json
[updated] /home/diogo/actions-runner/_work/HttpArena/HttpArena/site/data/json-comp-16384.json
[updated] /home/diogo/actions-runner/_work/HttpArena/HttpArena/site/data/json-comp-4096.json
[updated] /home/diogo/actions-runner/_work/HttpArena/HttpArena/site/data/json-comp-512.json
[updated] /home/diogo/actions-runner/_work/HttpArena/HttpArena/site/data/limited-conn-4096.json
[updated] /home/diogo/actions-runner/_work/HttpArena/HttpArena/site/data/limited-conn-512.json
[updated] /home/diogo/actions-runner/_work/HttpArena/HttpArena/site/data/pipelined-4096.json
[updated] /home/diogo/actions-runner/_work/HttpArena/HttpArena/site/data/pipelined-512.json
[updated] /home/diogo/actions-runner/_work/HttpArena/HttpArena/site/data/static-1024.json
[updated] /home/diogo/actions-runner/_work/HttpArena/HttpArena/site/data/static-4096.json
[updated] /home/diogo/actions-runner/_work/HttpArena/HttpArena/site/data/static-6800.json
[updated] /home/diogo/actions-runner/_work/HttpArena/HttpArena/site/data/upload-256.json
[updated] /home/diogo/actions-runner/_work/HttpArena/HttpArena/site/data/upload-32.json
[updated] /home/diogo/actions-runner/_work/HttpArena/HttpArena/site/data/current.json
[info] done
httparena-postgres
httparena-redis
[info] restoring loopback MTU to 65536

enghitalo added a commit to enghitalo/HttpArena that referenced this pull request Jun 18, 2026
Local profiling (callgrind on a gcc-built binary, isolated gcannon) showed this
branch lost the `has_pipeline_prefix` fast path the MDA2AV#877 branch had: handle() ran
decode_into + parse_http1_request_line on EVERY request, so the highest-RPS
/pipeline test paid the full HTTP parse (~55% of the per-request CPU; the
in-handle parse alone ~17%) for a fixed response.

Restore it: match the raw `GET /pipeline ` prefix and blit pipeline_resp BEFORE
any parsing. The request is already framed by the reactor, so decode adds nothing
here. After: the per-request /pipeline profile has ZERO parse functions, and local
throughput rose from 90% to 96% of the bare-C epoll floor (gc none). The remaining
gap + the boehm-vs-gc-none delta is per-request allocation/GC (next).

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
@enghitalo

Copy link
Copy Markdown
Contributor Author

/benchmark -f vanilla-epoll

@github-actions

Copy link
Copy Markdown
Contributor

👋 /benchmark request received. A collaborator will review and approve the run.

@github-actions

Copy link
Copy Markdown
Contributor

Benchmark Results

Framework: vanilla-epoll | Test: all tests

Test Conn RPS CPU Mem Δ RPS Δ Mem
baseline 512 695,031 1439.0% 83MiB +49.3% +15.3%
baseline 4096 646,846 1440.6% 205MiB +50.7% +6.2%
pipelined 512 38,827,695 6603.7% 59MiB +1531.6% +11.3%
pipelined 4096 39,223,014 6467.0% 194MiB +1604.0% +3.7%
limited-conn 512 266,855 899.3% 74MiB +23.0% +19.4%
limited-conn 4096 358,052 1075.9% 187MiB +18.0% +4.5%
json 4096 839,959 2687.6% 190MiB +73.3% +7.3%
json-comp 512 770,540 2517.1% 75MiB +1474.6% -13.8%
json-comp 4096 874,643 2470.8% 196MiB +1432.5% +5.9%
json-comp 16384 855,011 2313.3% 638MiB +1109.3% -9.4%
upload 32 12,158 473.9% 97MiB +40426.7% -88.5%
upload 256 14,688 950.3% 379MiB +26128.6% -73.6%
api-4 256 35,638 341.8% 86MiB +8.5% +21.1%
api-16 1024 22,647 511.2% 109MiB +14.7% +17.2%
static 1024 1,068,801 5638.4% 108MiB +235.9% -85.5%
static 4096 1,146,888 5808.9% 309MiB +330.9% -89.2%
static 6800 1,185,636 5799.2% 470MiB +468.3% -89.8%
async-db 1024 10,637 479.8% 113MiB -2.1% +13.0%
crud 4096 160,928 1149.7% 321MiB +34.6% -16.0%
fortunes 1024 3,059 479.1% 151MiB +15.2% +4.9%
Full log
[run 1/3]
gcannon v0.5.3
  Target:    localhost:8080/
  Threads:   64
  Conns:     4096 (64/thread)
  Pipeline:  1
  Req/conn:  200
  Templates: 20
  Expected:  200
  Duration:  15s


  Thread Stats   Avg      p50      p90      p99    p99.9
    Latency   26.91ms   11.00ms   69.80ms   232.10ms   403.40ms

  2278811 requests in 15.00s, 2278811 responses
  Throughput: 151.90K req/s
  Bandwidth:  47.80MB/s
  Status codes: 2xx=2278811, 3xx=0, 4xx=0, 5xx=0
  Latency samples: 2278811 / 2278811 responses (100.0%)
  Reconnects: 9433
  Per-template: 110318,111875,114084,115098,117793,118893,116972,117709,113355,113684,115203,118286,118123,116500,114326,115482,112901,104253,104139,109817
  Per-template-ok: 110318,111875,114084,115098,117793,118893,116972,117709,113355,113684,115203,118286,118123,116500,114326,115482,112901,104253,104139,109817
[info] CPU 1068.8% | Mem 320MiB

[run 2/3]
gcannon v0.5.3
  Target:    localhost:8080/
  Threads:   64
  Conns:     4096 (64/thread)
  Pipeline:  1
  Req/conn:  200
  Templates: 20
  Expected:  200
  Duration:  15s


  Thread Stats   Avg      p50      p90      p99    p99.9
    Latency   25.34ms   9.54ms   68.60ms   224.00ms   373.70ms

  2414126 requests in 15.00s, 2413934 responses
  Throughput: 160.90K req/s
  Bandwidth:  50.46MB/s
  Status codes: 2xx=2413934, 3xx=0, 4xx=0, 5xx=0
  Latency samples: 2413934 / 2413934 responses (100.0%)
  Reconnects: 10138
  Per-template: 116146,117360,120616,122785,125944,125623,125603,125819,125208,122036,119415,121031,120563,124128,124797,124451,118027,107958,111077,115347
  Per-template-ok: 116146,117360,120616,122785,125944,125623,125603,125819,125208,122036,119415,121031,120563,124128,124797,124451,118027,107958,111077,115347
[info] CPU 1149.7% | Mem 321MiB

[run 3/3]
gcannon v0.5.3
  Target:    localhost:8080/
  Threads:   64
  Conns:     4096 (64/thread)
  Pipeline:  1
  Req/conn:  200
  Templates: 20
  Expected:  200
  Duration:  15s


  Thread Stats   Avg      p50      p90      p99    p99.9
    Latency   25.81ms   9.42ms   71.90ms   228.40ms   382.10ms

  2371943 requests in 15.00s, 2371943 responses
  Throughput: 158.10K req/s
  Bandwidth:  49.72MB/s
  Status codes: 2xx=2371943, 3xx=0, 4xx=0, 5xx=0
  Latency samples: 2371943 / 2371943 responses (100.0%)
  Reconnects: 9980
  Per-template: 115009,115289,117846,120482,121217,122622,122817,122535,122699,125578,124369,122303,119540,118379,117638,121331,117213,106138,107507,111431
  Per-template-ok: 115009,115289,117846,120482,121217,122622,122817,122535,122699,125578,124369,122303,119540,118379,117638,121331,117213,106138,107507,111431
[info] CPU 1077.8% | Mem 329MiB

=== Best: 160928 req/s (CPU: 1149.7%, Mem: 321MiB) ===
[info] input BW: 13.81MB/s (avg template: 90 bytes)
[info] saved results/crud/4096/vanilla-epoll.json
httparena-bench-vanilla-epoll
httparena-bench-vanilla-epoll

==============================================
=== vanilla-epoll / fortunes / 1024c (tool=gcannon) ===
==============================================
[info] resetting postgres for a clean per-profile baseline
[info] starting postgres sidecar
httparena-postgres
[info] postgres ready (seeded)
[info] waiting for server...
[info] server ready

[run 1/3]
gcannon v0.5.3
  Target:    localhost:8080/fortunes
  Threads:   64
  Conns:     1024 (16/thread)
  Pipeline:  1
  Req/conn:  unlimited (keep-alive)
  Expected:  200
  Duration:  5s


  Thread Stats   Avg      p50      p90      p99    p99.9
    Latency   326.07ms   266.90ms   677.20ms    1.18s    1.69s

  15031 requests in 5.00s, 15031 responses
  Throughput: 3.00K req/s
  Bandwidth:  71.25MB/s
  Status codes: 2xx=15031, 3xx=0, 4xx=0, 5xx=0
  Latency samples: 15031 / 15031 responses (100.0%)
[info] CPU 463.9% | Mem 151MiB

[run 2/3]
gcannon v0.5.3
  Target:    localhost:8080/fortunes
  Threads:   64
  Conns:     1024 (16/thread)
  Pipeline:  1
  Req/conn:  unlimited (keep-alive)
  Expected:  200
  Duration:  5s


  Thread Stats   Avg      p50      p90      p99    p99.9
    Latency   318.96ms   255.50ms   665.40ms    1.17s    1.47s

  15297 requests in 5.00s, 15297 responses
  Throughput: 3.06K req/s
  Bandwidth:  72.51MB/s
  Status codes: 2xx=15297, 3xx=0, 4xx=0, 5xx=0
  Latency samples: 15297 / 15297 responses (100.0%)
[info] CPU 479.1% | Mem 151MiB

[run 3/3]
gcannon v0.5.3
  Target:    localhost:8080/fortunes
  Threads:   64
  Conns:     1024 (16/thread)
  Pipeline:  1
  Req/conn:  unlimited (keep-alive)
  Expected:  200
  Duration:  5s


  Thread Stats   Avg      p50      p90      p99    p99.9
    Latency   319.68ms   255.50ms   675.70ms    1.12s    1.59s

  15293 requests in 5.00s, 15293 responses
  Throughput: 3.06K req/s
  Bandwidth:  72.50MB/s
  Status codes: 2xx=15293, 3xx=0, 4xx=0, 5xx=0
  Latency samples: 15293 / 15293 responses (100.0%)
[info] CPU 486.6% | Mem 151MiB

=== Best: 3059 req/s (CPU: 479.1%, Mem: 151MiB) ===
[info] saved results/fortunes/1024/vanilla-epoll.json
httparena-bench-vanilla-epoll
httparena-bench-vanilla-epoll
[info] skip: vanilla-epoll does not subscribe to baseline-h2
[info] skip: vanilla-epoll does not subscribe to static-h2
[info] skip: vanilla-epoll does not subscribe to baseline-h2c
[info] skip: vanilla-epoll does not subscribe to json-h2c
[info] skip: vanilla-epoll does not subscribe to baseline-h3
[info] skip: vanilla-epoll does not subscribe to static-h3
[info] skip: vanilla-epoll does not subscribe to gateway-64
[info] skip: vanilla-epoll does not subscribe to gateway-h3
[info] skip: vanilla-epoll does not subscribe to production-stack
[info] skip: vanilla-epoll does not subscribe to unary-grpc
[info] skip: vanilla-epoll does not subscribe to unary-grpc-tls
[info] skip: vanilla-epoll does not subscribe to stream-grpc
[info] skip: vanilla-epoll does not subscribe to stream-grpc-tls
[info] skip: vanilla-epoll does not subscribe to echo-ws
[info] skip: vanilla-epoll does not subscribe to echo-ws-pipeline
[info] rebuilding site/data/*.json
[updated] /home/diogo/actions-runner/_work/HttpArena/HttpArena/site/data/frameworks.json
[updated] /home/diogo/actions-runner/_work/HttpArena/HttpArena/site/data/api-16-1024.json
[updated] /home/diogo/actions-runner/_work/HttpArena/HttpArena/site/data/api-4-256.json
[updated] /home/diogo/actions-runner/_work/HttpArena/HttpArena/site/data/async-db-1024.json
[updated] /home/diogo/actions-runner/_work/HttpArena/HttpArena/site/data/baseline-4096.json
[updated] /home/diogo/actions-runner/_work/HttpArena/HttpArena/site/data/baseline-512.json
[updated] /home/diogo/actions-runner/_work/HttpArena/HttpArena/site/data/crud-4096.json
[updated] /home/diogo/actions-runner/_work/HttpArena/HttpArena/site/data/fortunes-1024.json
[updated] /home/diogo/actions-runner/_work/HttpArena/HttpArena/site/data/json-4096.json
[updated] /home/diogo/actions-runner/_work/HttpArena/HttpArena/site/data/json-comp-16384.json
[updated] /home/diogo/actions-runner/_work/HttpArena/HttpArena/site/data/json-comp-4096.json
[updated] /home/diogo/actions-runner/_work/HttpArena/HttpArena/site/data/json-comp-512.json
[updated] /home/diogo/actions-runner/_work/HttpArena/HttpArena/site/data/limited-conn-4096.json
[updated] /home/diogo/actions-runner/_work/HttpArena/HttpArena/site/data/limited-conn-512.json
[updated] /home/diogo/actions-runner/_work/HttpArena/HttpArena/site/data/pipelined-4096.json
[updated] /home/diogo/actions-runner/_work/HttpArena/HttpArena/site/data/pipelined-512.json
[updated] /home/diogo/actions-runner/_work/HttpArena/HttpArena/site/data/static-1024.json
[updated] /home/diogo/actions-runner/_work/HttpArena/HttpArena/site/data/static-4096.json
[updated] /home/diogo/actions-runner/_work/HttpArena/HttpArena/site/data/static-6800.json
[updated] /home/diogo/actions-runner/_work/HttpArena/HttpArena/site/data/upload-256.json
[updated] /home/diogo/actions-runner/_work/HttpArena/HttpArena/site/data/upload-32.json
[updated] /home/diogo/actions-runner/_work/HttpArena/HttpArena/site/data/current.json
[info] done
httparena-postgres
httparena-redis
[info] restoring loopback MTU to 65536

@enghitalo

Copy link
Copy Markdown
Contributor Author

/benchmark -f vanilla-epoll

@github-actions

Copy link
Copy Markdown
Contributor

👋 /benchmark request received. A collaborator will review and approve the run.

@github-actions

Copy link
Copy Markdown
Contributor

Benchmark Results

Framework: vanilla-epoll | Test: all tests

Test Conn RPS CPU Mem Δ RPS Δ Mem
baseline 512 918,697 1992.2% 78MiB +97.3% +8.3%
baseline 4096 846,332 1815.9% 206MiB +97.2% +6.7%
pipelined 512 38,822,096 6604.7% 58MiB +1531.4% +9.4%
pipelined 4096 39,363,740 6701.0% 182MiB +1610.1% -2.7%
limited-conn 512 652,228 2300.3% 392MiB +200.7% +532.3%
limited-conn 4096 660,408 2094.5% 906MiB +117.6% +406.1%
json 4096 2,246,993 6408.9% 185MiB +363.6% +4.5%
json-comp 512 2,136,403 6175.4% 82MiB +4265.8% -5.7%
json-comp 4096 2,353,670 6285.2% 228MiB +4024.0% +23.2%
json-comp 16384 2,350,073 5985.0% 232MiB +3223.8% -67.0%
upload 32 22,257 1294.7% 112MiB +74090.0% -86.7%
upload 256 25,152 3763.2% 609MiB +44814.3% -57.5%
api-4 256 37,468 347.2% 167MiB +14.0% +135.2%
api-16 1024 23,972 565.4% 282MiB +21.4% +203.2%
static 1024 1,075,875 5681.9% 105MiB +238.1% -85.9%
static 4096 1,140,865 5841.3% 309MiB +328.6% -89.2%
static 6800 1,177,122 5961.2% 463MiB +464.3% -90.0%
async-db 1024 10,446 488.1% 137MiB -3.9% +37.0%
crud 4096 167,996 1181.6% 401MiB +40.5% +5.0%
fortunes 1024 3,092 503.0% 151MiB +16.5% +4.9%
Full log
[run 1/3]
gcannon v0.5.3
  Target:    localhost:8080/
  Threads:   64
  Conns:     4096 (64/thread)
  Pipeline:  1
  Req/conn:  200
  Templates: 20
  Expected:  200
  Duration:  15s


  Thread Stats   Avg      p50      p90      p99    p99.9
    Latency   26.15ms   10.90ms   69.00ms   215.00ms   412.00ms

  2343625 requests in 15.00s, 2342345 responses
  Throughput: 156.13K req/s
  Bandwidth:  48.98MB/s
  Status codes: 2xx=2342345, 3xx=0, 4xx=0, 5xx=0
  Latency samples: 2342345 / 2342345 responses (100.0%)
  Reconnects: 9764
  Per-template: 113253,118038,120072,120011,121938,121687,121987,123364,118842,116441,120924,119579,119331,119390,118024,119022,113077,105330,104676,107359
  Per-template-ok: 113253,118038,120072,120011,121938,121687,121987,123364,118842,116441,120924,119579,119331,119390,118024,119022,113077,105330,104676,107359
[info] CPU 1090.2% | Mem 378MiB

[run 2/3]
gcannon v0.5.3
  Target:    localhost:8080/
  Threads:   64
  Conns:     4096 (64/thread)
  Pipeline:  1
  Req/conn:  200
  Templates: 20
  Expected:  200
  Duration:  15s


  Thread Stats   Avg      p50      p90      p99    p99.9
    Latency   24.30ms   9.33ms   65.90ms   209.60ms   347.80ms

  2519950 requests in 15.00s, 2519950 responses
  Throughput: 167.97K req/s
  Bandwidth:  52.47MB/s
  Status codes: 2xx=2519950, 3xx=0, 4xx=0, 5xx=0
  Latency samples: 2519947 / 2519950 responses (100.0%)
  Reconnects: 10755
  Per-template: 116226,119576,125397,129626,132448,132380,129475,128187,128283,131586,130475,129450,130317,130550,129995,128757,120084,114216,115593,117326
  Per-template-ok: 116226,119576,125397,129626,132448,132380,129475,128187,128283,131586,130475,129450,130317,130550,129995,128757,120084,114216,115593,117326
[info] CPU 1181.6% | Mem 401MiB

[run 3/3]
gcannon v0.5.3
  Target:    localhost:8080/
  Threads:   64
  Conns:     4096 (64/thread)
  Pipeline:  1
  Req/conn:  200
  Templates: 20
  Expected:  200
  Duration:  15s


  Thread Stats   Avg      p50      p90      p99    p99.9
    Latency   24.71ms   9.45ms   67.60ms   215.80ms   363.90ms

  2474515 requests in 15.00s, 2474515 responses
  Throughput: 164.94K req/s
  Bandwidth:  52.03MB/s
  Status codes: 2xx=2474515, 3xx=0, 4xx=0, 5xx=0
  Latency samples: 2474511 / 2474515 responses (100.0%)
  Reconnects: 10412
  Per-template: 116355,120571,124948,126878,128795,128208,126538,127284,127350,126807,129441,130756,126692,127182,127627,128173,122302,108981,107903,111720
  Per-template-ok: 116355,120571,124948,126878,128795,128208,126538,127284,127350,126807,129441,130756,126692,127182,127627,128173,122302,108981,107903,111720
[info] CPU 1172.2% | Mem 418MiB

=== Best: 167996 req/s (CPU: 1181.6%, Mem: 401MiB) ===
[info] input BW: 14.42MB/s (avg template: 90 bytes)
[info] saved results/crud/4096/vanilla-epoll.json
httparena-bench-vanilla-epoll
httparena-bench-vanilla-epoll

==============================================
=== vanilla-epoll / fortunes / 1024c (tool=gcannon) ===
==============================================
[info] resetting postgres for a clean per-profile baseline
[info] starting postgres sidecar
httparena-postgres
[info] postgres ready (seeded)
[info] waiting for server...
[info] server ready

[run 1/3]
gcannon v0.5.3
  Target:    localhost:8080/fortunes
  Threads:   64
  Conns:     1024 (16/thread)
  Pipeline:  1
  Req/conn:  unlimited (keep-alive)
  Expected:  200
  Duration:  5s


  Thread Stats   Avg      p50      p90      p99    p99.9
    Latency   320.83ms   261.10ms   683.20ms    1.16s    1.60s

  15165 requests in 5.00s, 15165 responses
  Throughput: 3.03K req/s
  Bandwidth:  71.89MB/s
  Status codes: 2xx=15165, 3xx=0, 4xx=0, 5xx=0
  Latency samples: 15165 / 15165 responses (100.0%)
[info] CPU 472.7% | Mem 152MiB

[run 2/3]
gcannon v0.5.3
  Target:    localhost:8080/fortunes
  Threads:   64
  Conns:     1024 (16/thread)
  Pipeline:  1
  Req/conn:  unlimited (keep-alive)
  Expected:  200
  Duration:  5s


  Thread Stats   Avg      p50      p90      p99    p99.9
    Latency   317.45ms   254.70ms   677.80ms    1.17s    1.70s

  15419 requests in 5.00s, 15419 responses
  Throughput: 3.08K req/s
  Bandwidth:  73.09MB/s
  Status codes: 2xx=15419, 3xx=0, 4xx=0, 5xx=0
  Latency samples: 15419 / 15419 responses (100.0%)
[info] CPU 500.2% | Mem 151MiB

[run 3/3]
gcannon v0.5.3
  Target:    localhost:8080/fortunes
  Threads:   64
  Conns:     1024 (16/thread)
  Pipeline:  1
  Req/conn:  unlimited (keep-alive)
  Expected:  200
  Duration:  5s


  Thread Stats   Avg      p50      p90      p99    p99.9
    Latency   314.26ms   251.40ms   655.00ms    1.17s    1.68s

  15464 requests in 5.00s, 15464 responses
  Throughput: 3.09K req/s
  Bandwidth:  73.31MB/s
  Status codes: 2xx=15464, 3xx=0, 4xx=0, 5xx=0
  Latency samples: 15464 / 15464 responses (100.0%)
[info] CPU 503.0% | Mem 151MiB

=== Best: 3092 req/s (CPU: 503.0%, Mem: 151MiB) ===
[info] saved results/fortunes/1024/vanilla-epoll.json
httparena-bench-vanilla-epoll
httparena-bench-vanilla-epoll
[info] skip: vanilla-epoll does not subscribe to baseline-h2
[info] skip: vanilla-epoll does not subscribe to static-h2
[info] skip: vanilla-epoll does not subscribe to baseline-h2c
[info] skip: vanilla-epoll does not subscribe to json-h2c
[info] skip: vanilla-epoll does not subscribe to baseline-h3
[info] skip: vanilla-epoll does not subscribe to static-h3
[info] skip: vanilla-epoll does not subscribe to gateway-64
[info] skip: vanilla-epoll does not subscribe to gateway-h3
[info] skip: vanilla-epoll does not subscribe to production-stack
[info] skip: vanilla-epoll does not subscribe to unary-grpc
[info] skip: vanilla-epoll does not subscribe to unary-grpc-tls
[info] skip: vanilla-epoll does not subscribe to stream-grpc
[info] skip: vanilla-epoll does not subscribe to stream-grpc-tls
[info] skip: vanilla-epoll does not subscribe to echo-ws
[info] skip: vanilla-epoll does not subscribe to echo-ws-pipeline
[info] rebuilding site/data/*.json
[updated] /home/diogo/actions-runner/_work/HttpArena/HttpArena/site/data/frameworks.json
[updated] /home/diogo/actions-runner/_work/HttpArena/HttpArena/site/data/api-16-1024.json
[updated] /home/diogo/actions-runner/_work/HttpArena/HttpArena/site/data/api-4-256.json
[updated] /home/diogo/actions-runner/_work/HttpArena/HttpArena/site/data/async-db-1024.json
[updated] /home/diogo/actions-runner/_work/HttpArena/HttpArena/site/data/baseline-4096.json
[updated] /home/diogo/actions-runner/_work/HttpArena/HttpArena/site/data/baseline-512.json
[updated] /home/diogo/actions-runner/_work/HttpArena/HttpArena/site/data/crud-4096.json
[updated] /home/diogo/actions-runner/_work/HttpArena/HttpArena/site/data/fortunes-1024.json
[updated] /home/diogo/actions-runner/_work/HttpArena/HttpArena/site/data/json-4096.json
[updated] /home/diogo/actions-runner/_work/HttpArena/HttpArena/site/data/json-comp-16384.json
[updated] /home/diogo/actions-runner/_work/HttpArena/HttpArena/site/data/json-comp-4096.json
[updated] /home/diogo/actions-runner/_work/HttpArena/HttpArena/site/data/json-comp-512.json
[updated] /home/diogo/actions-runner/_work/HttpArena/HttpArena/site/data/limited-conn-4096.json
[updated] /home/diogo/actions-runner/_work/HttpArena/HttpArena/site/data/limited-conn-512.json
[updated] /home/diogo/actions-runner/_work/HttpArena/HttpArena/site/data/pipelined-4096.json
[updated] /home/diogo/actions-runner/_work/HttpArena/HttpArena/site/data/pipelined-512.json
[updated] /home/diogo/actions-runner/_work/HttpArena/HttpArena/site/data/static-1024.json
[updated] /home/diogo/actions-runner/_work/HttpArena/HttpArena/site/data/static-4096.json
[updated] /home/diogo/actions-runner/_work/HttpArena/HttpArena/site/data/static-6800.json
[updated] /home/diogo/actions-runner/_work/HttpArena/HttpArena/site/data/upload-256.json
[updated] /home/diogo/actions-runner/_work/HttpArena/HttpArena/site/data/upload-32.json
[updated] /home/diogo/actions-runner/_work/HttpArena/HttpArena/site/data/current.json
[info] done
httparena-postgres
httparena-redis
[info] restoring loopback MTU to 65536

MDA2AV pushed a commit that referenced this pull request Jun 19, 2026
#888)

* vanilla: cache json-comp gzip responses + zero-alloc routing

json-comp recompressed the gzip body on EVERY request even though the output
for a given (count, m) is fully deterministic — and gzip CPU, not allocation,
dominates that profile. Cache the COMPLETE gzipped response per (count, m) and
append the cached copy on a hit (bounded map, RwMutex). The benchmark hits only
a handful of (count, m) pairs, so the cache stays tiny.

Also route on the path WITHOUT allocating: a tos() view into the request buffer
instead of all_before('?')'s per-request string copy (one alloc per request on
the hot path), shaving GC churn off baseline/json too.

Local before/after (16-core loopback, gcannon, single listener):
  json-comp  58K -> 390K req/s  (+570%, 6.7x)

Correctness verified: gzip body decodes to the right items/count/total; the
cached response is byte-identical across requests; all other routes unchanged.
Applies to both the epoll and io_uring variants.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

* vanilla: precompute query-key bytes + parse path ints in place

Remove the remaining small per-request allocations on the hot path:
  • qint/qstr took a `string` key and called `key.bytes()` every request (one
    []u8 alloc per parameter — baseline parses a+b, async-db min+max+limit…).
    Keys are now precomputed `const []u8` (qk_*), built once at init.
  • /json/<n> and /crud/items/<id> parsed the id via route[n..].i64(), a
    substring copy. parse_u_at() reads the digits straight from the path view.

Local before/after (16-core loopback) is within noise (baseline ~528K→530K,
json ~206K→212K) — these allocs are tiny next to the response builder #866
removed — but allocation scaled hard on the 64-core arena (json +322% there),
so this trims more GC churn for that environment at zero cost. Note: @[manualfree]
is a no-op under the GC build the arena uses (`v -prod` = Boehm GC; manualfree
only affects -autofree), so reducing allocations is the lever, not manualfree.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

* Benchmark results: vanilla-epoll

* vanilla: DB path — prepared statement (async-db) + single-pass HTML escape

Folds the DB-path work into this PR so everything lands together:
  • async-db uses a PostgreSQL prepared statement (PQprepare/PQexecPrepared via
    db.pg, lazily prepared per pooled connection) instead of exec_param_many's
    per-request server-side SQL re-parse — local +9%.
  • escape_html (fortunes) does ONE pass with a no-alloc fast path instead of
    replace_each's five full-string passes — local +27% fortunes.

DB profiles remain bound by the stdlib db.pg driver (text protocol), so this
narrows the gap without closing it. Both backends.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

* Benchmark results: vanilla-io_uring

* vanilla: armor hot-path byte writes against the V `<<` regression

Single-element array push (`arr << x`) is 4-7x slower on post-0.5.1 V
(vlang/v#27468) while bulk push_many, allocation and indexed writes are
unaffected. The two hot single-element `<<` sites are now bulk writes:

  - wi() built integer digits with `out << tmp[i]` per digit; it now itoa's
    back-to-front into the [20]u8 scratch and flushes with one push_many.
  - write_json_response() pushed the item separator `,` and closing `}`
    one byte at a time; the closing `}` is now fused with the separator
    into a single '},' / '}' push_many.

Output is byte-identical (verified across counts 0..4096 and edge-value
integers). This makes the JSON hot path fast on both the 0.5.1 release and
current master, independent of the upstream codegen regression. Both
epoll and io_uring backends.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

* vanilla: build pinned V 0.5.1 from source (pinned vc bootstrap)

Build V from source at the 0.5.1 tag instead of the prebuilt release zip.
Plain `make` can't build an old tag: its latest_vc step `git pull`s the
newest vlang/vc bootstrap, which no longer matches 0.5.1's vlib (fails with
`unknown ident \`native\``). So pin vc to the commit cut for 0.5.1
(vlang/vc f461dfeb = "[v:master] 0c3183c - V 0.5.1") and run make's own
bootstrap recipe (cc -> v1 -> v2 -> v). Drop curl/unzip from the build deps.

Pinned by tag, not a master commit, because post-0.5.1 master carries a
codegen regression (single-element array push 4-7x slower, vlang/v#27468).
Both backends; verified the source-built compiler serves /json and /pipeline
correctly.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

* vanilla-epoll: serve static assets with sendfile(2) (zero-copy)

The static handler copied each asset's full prebuilt response (up to ~300 KB)
into the per-connection write_buf every request — a userspace copy plus a large
*scanned* write_buf that grows the GC's stop-the-world cost at high conn counts
(why vanilla sat ~4x behind nginx/swerver on the static profile).

Preload each asset's fd once (O_RDONLY, page-cached, borrowed for the server's
life) and a precomputed response head; serve the head into write_buf and stream
the body zero-copy via core.queue_file (sendfile(2), already wired through the
epoll backend's deferred-send + EPOLLOUT path). write_buf no longer grows, the
body is never copied, and the kernel pushes file pages straight to the socket —
the same model nginx and swerver use.

Local (vendor.js 307 KB, 64c, wrk): 25.7K -> 59.3K req/s, 7.36 -> 16.97 GB/s
(2.3x). Output verified byte-identical (md5) incl. keep-alive. epoll only.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

* vanilla-epoll: answer /upload by Content-Length (engine drains large bodies)

The lib now streams (drains) request bodies larger than 1 MiB instead of
buffering them, so for a large upload req.body is empty — but the byte count the
upload profile wants is the declared Content-Length. Answer by
req.content_length() (falls back to the buffered body length when absent, which
also covers small bodies that still take the buffered path).

Depends on enghitalo/vanilla#31 (adds HttpRequest.content_length() + the engine
drain); the Dockerfile clones lib main, so that PR must merge before this builds.
Local (source-built V 0.5.1): upload single-conn 45 req/s / 907 MB/s, 32c 303
req/s / 6.1 GB/s — matching the top upload servers; RSS 14 MB (was ~1 GB
buffering). epoll only.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

* ci: re-trigger validate — vanilla main now provides HttpRequest.content_length() (drain #31 merged); the prior run cloned vanilla before it landed

* refactor(vanilla-epoll): route write-buffer appends through wb()/push_many

Replace the remaining `out << <[]u8>` appends (static header, error consts, the
four crud_* results, and the json-comp gzip-cache hit/store) with a wb() helper
that calls push_many, uniform with the existing ws/wi. The bit-shift `<<` in the
gz-cache key is unrelated and kept as is.

Note: V already lowers `array << array` to array_push_many, so this is codegen-
neutral — a consistency / regression-safety change (the whole write path now
takes push_many's fast path explicitly, robust if `<<` ever regresses for arrays
the way the single-element path did, vlang/v#27468). The hot single-element `<<`
was already armored by ws/wi.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>

* feat(vanilla-epoll): async-db via the native pg_async driver (enghitalo/vanilla#32)

Convert the framework from the blocking db.pg ConnectionPool to vanilla's native
async Postgres driver (pg_async, vanilla#39) on the epoll async runtime. The DB
endpoints now PARK on the PG socket (ac.watch) and resume in a continuation
instead of blocking a worker thread per query — closing the async-db gap (#32).

- ServerConfig: request_handler → async_handler + make_state. Each worker owns a
  per-worker pg_async.PgPool (no cross-worker sharing, no locks) plus its own
  cache-aside and json-comp caches; the dataset/prefixes/static assets stay
  shared read-only.
- async-db, fortunes, crud (list/get/create/update) issue a query, park, and
  render in a single resume continuation that switches on a small per-request
  stash. crud_list folds page+total into ONE window-count query (count(*) OVER())
  instead of two round-trips. crud_get keeps a per-worker cache-aside (X-Cache).
- DB responses are now hand-built (ws/wi/wb), and JSONB (tags) is emitted RAW
  from its binary form — no json.encode reflection, no decode/re-encode.
- Drops the db.pg dependency entirely, so the framework also builds on master V
  (master removed pg.ConnectionPool); the non-DB hot paths are unchanged.

Validated on V master against PostgreSQL 18 (items 100k + fortune 199): every
endpoint correct (async-db items incl. binary jsonb, sorted fortunes, crud
list/get/create/update, X-Cache). Throughput: async-db ~14.3k rps @ 4.35ms p50;
/json ~376k rps. Per-worker caches warm under load (a re-GET may MISS across
workers under SO_REUSEPORT — by design, vs the old shared+mutex cache).

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>

* ci: re-trigger #877 — vanilla #40 merged (net-import 0.5.1 build fix)

The previous run failed because the framework's Docker cloned vanilla main BEFORE
the fix landed: V's `net` declares C.socket with typed-enum params on the 0.5.1
tag, clashing with http_server.socket's int C.socket (socket_tcp.c.v). vanilla
PR #40 removes the net imports (socket_tcp → C.htons; pg_async → raw libc dial),
verified to compile under `v -prod .` on the true 0.5.1 tag for both
vanilla-epoll and vanilla-io_uring. This empty commit re-runs validate so it
re-clones the fixed vanilla.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>

* fix(ci): cache-bust the vanilla clone so the build picks up library fixes

The Docker `RUN git clone … vanilla` layer was cached indefinitely on the
self-hosted runner, so re-runs kept building against a STALE vanilla checkout —
which is why #877 stayed red even after the build fix (vanilla #40) merged: the
build never re-cloned to get it.

Add `ADD https://api.github.com/.../refs/heads/main` before the clone in both
vanilla Dockerfiles. The fetched ref (main's SHA) changes whenever vanilla main
moves, invalidating this layer's cache and forcing a fresh clone. Adding the
step also re-clones on this build (new layer structure), so it now picks up #40.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>

* fix(vanilla-epoll): share the crud + json-comp caches across workers

The crud cache-aside was per-worker (WorkerCtx), but validate.sh's crud check
does two GETs to /crud/items/42 and requires X-Cache MISS then HIT. With
SO_REUSEPORT the two requests land on different workers, so a per-worker cache
returns MISS both times → validation fails.

Move the cache-aside (and the json-comp gzip cache) into the process-shared
`Shared` (renamed from SharedRO), guarded by RwMutexes since workers are
separate threads — restoring the original shared-cache semantics. The async
Postgres pool stays per-worker (make_state); only the caches are shared.

Verified against the real pgdb-seed.sql + dataset.json: GET /crud/items/42 now
returns MISS then HIT; async-db (count=limit), crud list (5 items, total 9986,
page 1), and fortunes (202 <tr>) all match validate.sh's checks. Compiles under
`v -prod .` on the true V 0.5.1 tag.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>

* perf(vanilla-epoll): size the per-worker pool by usable cores, not host CPUs

Pair with vanilla's cpuset-aware max_thread_pool_size: compute the per-worker
Postgres pool size against core.max_thread_pool_size (usable cores) instead of
runtime.nr_cpus() (host count). Under api-N the engine now spawns N workers, so
per_worker = total/N gives a sane pool (e.g. 64/4=16, 64/16=4) instead of
64/128=1 — matching the async path's threads≈cores model.

Experiment for #32: test whether removing the 128-on-N-cores oversubscription
recovers the async-db / api-16 regression.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>

* revert(vanilla-epoll): restore the sync db.pg DB path (drop the async pg_async conversion)

Three clean, post-cache-bust benchmarks agree the native async pg_async path is a
net loss on the arena's LOCAL low-latency DB profiles: epoll-async vs io_uring-sync
showed sync winning api-4 ~4.9×, fortunes ~3.6×, api-16 ~1.6×, async-db ~1.2×
(io_uring even handicapped by the cpuset change). The async path is bound by DB
concurrency (pool conns) and never beats sync libpq's concurrency-via-threads for
sub-ms queries; cpuset tuning only traded api-16 for api-4. The only async win was
crud, which is cache-bound (skips the DB) — preserved by the sync framework too.

Restore main.v to the pre-conversion sync version (d1a0e73): db.pg ConnectionPool
+ request_handler, keeping ALL the sync-path wins (pipelined, static via sendfile,
upload streaming-drain, json-comp gzip cache, zero-alloc routing, shared X-Cache).
pg_async stays in the vanilla library as a capability for the case it actually wins
(latency-bound / network Postgres). The Dockerfile vanilla-clone cache-bust stays.

Verified: builds under `v -prod .` on the V 0.5.1 tag against current vanilla main.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>

* perf(vanilla-epoll): short-circuit /pipeline with a precomputed constant

The pipelined profile (the arena's highest-RPS test, ~35M rps) is a fixed
plaintext "ok". Match it on `target` immediately after the path is sliced and
blit a precomputed full-response constant + return — before the '?'-scan, the
route slice, and write_resp's 6-part piecewise header build. requests/pipeline.raw
is `GET /pipeline` with no query, so the exact-match is correct; the now-redundant
route=='/pipeline' arm is dropped from the dispatch chain.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>

* perf(vanilla-epoll): right-size the fortunes render builder (32KB -> content)

callgrind on the render path showed ~26% of its instructions were the zero-fill
of strings.new_builder(32768) — a flat 32 KB block for a ~1.5 KB response, re-
zeroed every request as the GC reuses it. Size it from the actual rows instead
(160 + 96/row + message bytes); an outlier grows once. The other builders here
were already content-sized.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>

* perf(vanilla-epoll): re-enable async-db (pg_async park/resume) on flat-state runtime

Restore the native-pg_async async DB path (per-worker PgPool via make_state;
async_handler; submit -> ac.watch(pg fd) -> .suspend -> single on_db_ready
continuation that pumps the result, renders by kind, releases the conn). This is
the proven #32 conversion (was reverted at 84f3dc9 because it REGRESSED the arena
on the old map-based reactor + per-request malloc), re-applied now that PR #41
replaced that with the flat fd-indexed reactor (no hashmap, no per-request alloc)
— the overhead that sank it is gone.

Why: fortunes (2,990 rps) / async-db (10,927) are capped at ~16-way concurrency
by sync thread-per-core blocking (the 64-conn pool sits 75% idle; CPU ~460% = 11
cores idle, waiting on PG). Park/resume frees the worker to keep many queries in
flight -> uses the whole pool -> the swerver (#1, 293k) model. This is the
EXPERIMENT to confirm #41 turns the old regression into a win; needs an arena run.

Keeps the recent sync wins (json-comp gzip cache, route-slice, /pipeline
short-circuit). Builds clean on the flat-state runtime (post-0.5.1 local V).

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>

* fix(vanilla-epoll): process-shared crud + json-comp caches (async X-Cache)

validate.sh failed `[crud cache-aside]: expected MISS then HIT, got MISS MISS`:
restoring the #32 async conversion brought back per-worker caches, but SO_REUSEPORT
routes the two probe GETs to different workers, so each MISSes its cold cache.
Move the crud + gz caches out of per-worker WorkerCtx into the shared SharedRO,
mutex-guarded (RwMutex) — the same process-shared model the sync path uses. Pool
stays per-worker (no lock). Builds clean; this unblocks the async benchmark.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>

* fix(vanilla-epoll): un-starve the async pool (drop the >8 clamp) + decode_into

The #884 async-db regression was NOT the sub-ms-PG crossover — it was pool
starvation + load-shedding (per the regression analysis). DATABASE_MAX_CONN=256
across 16 workers should give 16 conns/worker, but a `min(8)` clamp forced 8 →
only 128 of the 256 budget used. With one-in-flight-per-conn and closed-loop load
(~64 client conns/worker), the 8-slot ceiling is hit constantly; park() then
SHEDS the overflow as an empty 200, so the closed-loop clients spin and real
throughput collapses to ~1 core (async-db -30%, fortunes -77%, api -75%). crud
was unaffected (+208%) only because it is cache-HIT served and never touches the
pool.

Fixes:
1. Drop the >8 clamp → use the full 256 budget = 16 conns/worker (2x in-flight),
   sized to Postgres max_connections.
2. Adopt request_parser.decode_into (no `!HttpRequest` boxing, ~13% of parse) —
   the same no-boxing entry the sync build now uses; recovers the json-comp/json
   non-DB delta that the async dispatch path was paying.

Builds clean (pg_async, post-0.5.1 local V; vanilla #44 with decode_into is now
in main). Follow-ups (not here): queue on pool-full instead of shedding empty
200s; hoist AsyncCtx out of the async_drain per-request loop (vanilla).

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>

* feat(vanilla-epoll): pipeline async-db queries across requests (increment 3)

Wire the framework onto vanilla's cross-request pipelining. park() now picks the
least-loaded connection via acquire_pipelined() (shed only when all conns are at
the max_inflight cap) instead of acquire()'s one-in-flight-per-conn — the latter
starved the per-worker pool (2 conns/worker on the arena box ⇒ ceiling of 2, then
park sheds the overflow as empty 200s; PR #884's regression). async_submit's
shed-on-full bool is now checked. on_db_ready drops the exclusive release: a
pipelined connection is not held exclusively, its in-flight slot frees when
async_on_readable pops the reply, and the reactor (per-fd watch queue) runs the
connection's parked requests front-first so the FIFO reply aligns with each
request's Stash. async-db/fortunes/crud-list all flow through park(), so all gain
conns×N concurrency. Builds against the local pipelining vanilla.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>

* ci(vanilla-epoll): pin vanilla to the pipelining branch to bench before merge

TEMPORARY: point the Dockerfile clone + cache-bust ADD at vanilla's
feat/pg-async-pipelining (PR #45) so this arena PR builds against the
cross-request pipelining library and can benchmark before #45 merges. Revert to
refs/heads/main once #45 lands.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>

* perf(vanilla-epoll): restore /pipeline skip-decode fast path

Local profiling (callgrind on a gcc-built binary, isolated gcannon) showed this
branch lost the `has_pipeline_prefix` fast path the #877 branch had: handle() ran
decode_into + parse_http1_request_line on EVERY request, so the highest-RPS
/pipeline test paid the full HTTP parse (~55% of the per-request CPU; the
in-handle parse alone ~17%) for a fixed response.

Restore it: match the raw `GET /pipeline ` prefix and blit pipeline_resp BEFORE
any parsing. The request is already framed by the reactor, so decode adds nothing
here. After: the per-request /pipeline profile has ZERO parse functions, and local
throughput rose from 90% to 96% of the bare-C epoll floor (gc none). The remaining
gap + the boehm-vs-gc-none delta is per-request allocation/GC (next).

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>

* ci(vanilla-epoll): build against vanilla main (pipelining + recv-path merged)

vanilla#45 (cross-request pipelining: driver + reactor + pool + the alloc-free
recv path) is merged to vanilla main, so drop the temporary feat/pg-async-pipelining
pin and clone main again (with the main-ref cache-bust). Keeps -gc none. The
async-db +329% / pipelined +1622% results were against this exact code (main now
== the merged branch), so no re-bench needed.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>

* perf(vanilla-epoll): arm pooled PG conns with watch_persistent

Park DB queries on the pooled connection via ac.watch_persistent so a
client disconnecting mid-query no longer closes (and forces a reconnect +
re-auth on) the pooled connection: the runtime drains the orphaned reply
in order and keeps the conn open for reuse. Both the initial park and the
not-ready re-arm use it (the single-watch path resets the slot, so the
re-arm must re-stamp the pool-owned flag).

Depends on vanilla watch_persistent (enghitalo/vanilla#47); land after it
merges + vanilla main is updated.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>

* perf(vanilla-epoll): render DB responses into a reused per-worker buffer

render_async_db / render_fortunes / render_crud_list each allocated a fresh
response-body []u8 (4 KiB / 32 KiB / 8 KiB) per request. The binary ships
`-gc none`, so those buffers are never freed — a multi-GiB leak under DB load
(async-db measured ~12 KiB/request total, tens of GiB on the arena).

Build the body in a single per-worker scratch buffer (WorkerCtx.scratch), reset
to len 0 each response; it grows to a high-water mark then stays. Safe because a
worker serves one request at a time (no concurrency). Paired with the pg_async
per-connection frames-buffer pool, this takes async-db from 11,971 -> 1,263
bytes/request leaked under `-gc none` (-89.5%) locally; the Boehm build is dead
flat (0 B/req, 41 MiB) and `-gc none` is now FASTER than Boehm on async-db
(48.8K vs 44.2K req/s) since it no longer thrashes an ever-growing heap.

(render_fortunes still allocates its Fortune vector + per-row message copies;
that and the residual ~1.3 KiB/request of async-db allocs are a follow-up.)

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>

* perf(vanilla-epoll): zero-alloc DB request path (Stage B, -gc none)

Eliminate the remaining per-request heap allocations on the DB routes (which leak
under the binary's `-gc none` build). With the pg_async/request_parser Stage B
(reused submit scratch, no .bytes() in the wire builders, no-alloc query parse),
async-db drops from 1,263 -> 159 bytes/request (11,971 -> 159 across Stage A+B,
-98.7%); the Boehm build is dead flat.

- Bind params: replace the per-request `[?[]u8(x.str().bytes()), ...]` literals in
  every start_* with reused per-worker buffers — param_scratch (int params as
  decimal bytes) + params_buf (the []?[]u8), refilled via push_int/push_bytes. The
  borrowed slices are copied by write_bind synchronously inside park, so they never
  outlive the call. param_scratch cap (256) ≫ worst case (5×20) so it never
  reallocates mid-request (which would dangle already-pushed slices).
- Query parse: qint parses i64 in place (parse_i64_slice); qstr_slice returns a
  borrowed []u8 view instead of .clone(). Shed-path fallbacks are module consts
  (no per-request `.bytes()`).
- Stash: a per-worker free-list (stash_pool) instead of `&Stash{}` per request;
  returned only on the terminal .done path — never on the not-ready re-arm, where
  it stays live as the watch udata (incl. a FIX 3 dead tombstone). Statement form
  for the borrow (a `&Struct{}` if-expression branch miscompiles under -g, #27485).
- /fortunes: reused fortunes_buf with BORROWED message views (no bytestr().clone()),
  an explicit byte comparator for the sort, and escape_html_into that escapes
  directly into the render scratch (no Builder/string per row).

Gated: all routes correct vs PG18 (async-db, fortunes sort+escape incl. the
<script> XSS row, crud list/get/create/update round-trip, cache MISS->HIT,
baseline11). Builds clean -prod and -g. Needs the pg_async/request_parser Stage B
(enghitalo/vanilla); land after that merges.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>

* fix(vanilla-epoll): handle i64::MIN in wi() integer formatter

wi()'s itoa negated n into a signed accumulator (`x = -x`), which overflows
for i64::MIN — its magnitude isn't representable as i64 — leaving x negative
so the digit loop never ran and only '-' was emitted. Build the magnitude in
u64 instead (-(n+1)+1, with the +1 done in u64).

Unreachable from current routes (ids are 32-bit, query ints are clamped) so
this is not a live bug, but it's a latent correctness hole in a general
integer helper. Verified against i64::MIN/MAX, 0, and assorted +/- values.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>

* perf(vanilla-epoll): zero-alloc int responses on /baseline11 and /upload

Both built their body with `n.str()` — an int->string heap allocation on every
request that leaks under -gc none. callgrind on /baseline11 showed 1.002
allocs/request, all impl_i64_to_string; at 3.4M RPS that path alone was ~6 GiB
RSS in the arena. Format the int into the reused per-worker scratch via a new
emit_int() helper instead (the same render-scratch pattern the DB paths use).

The read DB paths (async-db, fortunes, crud-list) are already callgrind-clean
per request; this closes the general/plaintext response path.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>

* fix(vanilla-epoll): 503 (not 400/404) when the crud DB pool sheds under load

Under DB-pipeline saturation, park() returns the caller's fallback. The read
paths shed to a benign empty 200, but crud create/update/get shed to
bad_request (400) / not_found (404) — misreporting a backpressure shed as a
client error. At arena scale (4096 conns) this surfaced as ~1.4% "unexpected
status" on crud, while every other framework AND the previously-recorded
vanilla-epoll showed 0 with the SAME gcannon (its requests are well-formed —
confirmed by faithful fixture replay: 100% 2xx at 16 and 96 threads, fresh-conn
and keep-alive+reconnect; reproduced the 400 only by forcing pool saturation
with DATABASE_MAX_CONN=1).

Only the shed fallback for the three crud write/get paths becomes 503 Service
Unavailable — the honest backpressure status. Genuine 400 (malformed JSON body)
and 404 (missing item) are unchanged.

Reducing the shed itself (pool / max_inflight capacity, which trades memory) and
a holistic shed policy (read paths also shed to an empty 200) are deferred to a
measured investigation tracked upstream in vanilla.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>

* perf+fix(vanilla-epoll): zero-alloc chunked-body parse on /baseline11 (reused scratch)

/baseline11's chunked-POST path parsed the body with `dechunk(s string) string`,
which built a strings.Builder + .str() (and strconv_hex's trim_space) per request
— a permanent leak under -gc none. In the arena baseline mix (1/3 of the
templates are chunked POSTs, at ~3.8M req/s) this was the ~6 GiB RSS the #888
re-bench still showed after emit_int closed the GET path.

Replace with `(mut w WorkerCtx) body_int()` dechunking into a reused per-worker
scratch (WorkerCtx.dechunk_buf): byte-walk the chunked region (dechunk_into),
append data bytes via push_many, parse the integer in place (parse_i64_slice).
No allocation in steady state. Verified byte-identical to the old dechunk on
valid bodies (single/multi-chunk, hex/uppercase sizes, 0x100, chunk-extensions)
and safe on malformed input.

Also hardens a latent OOB read / DoS: the chunk-size range check is now
overflow-safe — `size > end - data_start` instead of `data_start + size > end`,
which wraps i32 for a crafted chunk size near 0x7fffffff, slipping past the guard
and feeding a ~2 GiB out-of-bounds read into push_many (remotely-triggerable
worker segfault). The old dechunk hit the same input as a bounds-checked panic;
this makes it a controlled, bounded reject. parse_hex_slice now accumulates in
i64 and saturates so the size value itself can't wrap. A 5-lens adversarial
review (correctness, scratch-reuse lifetime, zero-alloc, method/caller parity)
returned GO after this fix.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>

* Benchmark results: vanilla-epoll

---------

Co-authored-by: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Co-authored-by: github-actions[bot] <github-actions[bot]@users.noreply.github.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants