vanilla-epoll: re-enable async-db (pg_async park/resume) on the flat-state runtime by enghitalo · Pull Request #884 · MDA2AV/HttpArena

enghitalo · 2026-06-17T11:13:57Z

Experiment — benchmark vs #877 (the sync baseline).

What

Switches the vanilla-epoll DB path from the synchronous db.pg connection pool back to the native pg_async park-and-resume model: per-worker PgPool via make_state, an async_handler that submits the query and ac.watch(pg_fd) → .suspend, and a single on_db_ready continuation that pumps the result, renders by kind, and releases the connection. Non-DB routes (/pipeline, /baseline11, /json, /static, /upload) return .done immediately.

Why

On the sync path, fortunes (~3.0k rps) and async-db (~10.9k rps) are capped at ~16-way concurrency by thread-per-core blocking — each worker thread blocks on acquire → exec → release, so the 64-connection pool sits ~75% idle and the server runs at ~460% CPU (≈11 cores idle, parked in libpq recv waiting on Postgres). Park-and-resume frees the worker to keep many queries in flight → uses the whole pool → the model the current async-db leader (swerver, ~370k) uses.

Why now (it was reverted before)

This is the #32 conversion that was reverted (regressed the arena) because the old map-based reactor + a per-request malloc per watch cost more than the sync path won at sub-ms local-PG latency. enghitalo/vanilla#41 replaced that with a flat fd-indexed reactor (no hashmap, no per-request allocation) — the exact overhead that sank it. This PR is the experiment to confirm #41 turns the regression into a win.

Notes for review

Stacked on vanilla: json-comp cache, zero-alloc routing, DB prepared statement + HTML escape #877: this branch is vanilla: json-comp cache, zero-alloc routing, DB prepared statement + HTML escape #877 (json-comp gzip cache, route-slice, /pipeline short-circuit) plus the async DB re-conversion — the delta vs vanilla: json-comp cache, zero-alloc routing, DB prepared statement + HTML escape #877 is the DB path going async. Build verified clean against the flat-state runtime.
The async framework code itself is the previously-validated feat: /validate and /benchmark slash commands for PRs #32 conversion (it passed validate.sh before); only the runtime underneath changed.

What to watch in the benchmark

Should rise: fortunes, async-db, crud (target 4–12× as concurrency lifts from ~16 to pool size).
Must NOT regress: baseline, json, json-comp (Add spring framework #2), pipelined, static (now ~1.2× from Metadata improvements to add more info #1) — these are non-DB and return .done immediately, so they should match vanilla: json-comp cache, zero-alloc routing, DB prepared statement + HTML escape #877.

/benchmark -f vanilla-epoll

🤖 Generated with Claude Code

json-comp recompressed the gzip body on EVERY request even though the output for a given (count, m) is fully deterministic — and gzip CPU, not allocation, dominates that profile. Cache the COMPLETE gzipped response per (count, m) and append the cached copy on a hit (bounded map, RwMutex). The benchmark hits only a handful of (count, m) pairs, so the cache stays tiny. Also route on the path WITHOUT allocating: a tos() view into the request buffer instead of all_before('?')'s per-request string copy (one alloc per request on the hot path), shaving GC churn off baseline/json too. Local before/after (16-core loopback, gcannon, single listener): json-comp 58K -> 390K req/s (+570%, 6.7x) Correctness verified: gzip body decodes to the right items/count/total; the cached response is byte-identical across requests; all other routes unchanged. Applies to both the epoll and io_uring variants. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

Remove the remaining small per-request allocations on the hot path: • qint/qstr took a `string` key and called `key.bytes()` every request (one []u8 alloc per parameter — baseline parses a+b, async-db min+max+limit…). Keys are now precomputed `const []u8` (qk_*), built once at init. • /json/<n> and /crud/items/<id> parsed the id via route[n..].i64(), a substring copy. parse_u_at() reads the digits straight from the path view. Local before/after (16-core loopback) is within noise (baseline ~528K→530K, json ~206K→212K) — these allocs are tiny next to the response builder MDA2AV#866 removed — but allocation scaled hard on the 64-core arena (json +322% there), so this trims more GC churn for that environment at zero cost. Note: @[manualfree] is a no-op under the GC build the arena uses (`v -prod` = Boehm GC; manualfree only affects -autofree), so reducing allocations is the lever, not manualfree. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

…scape Folds the DB-path work into this PR so everything lands together: • async-db uses a PostgreSQL prepared statement (PQprepare/PQexecPrepared via db.pg, lazily prepared per pooled connection) instead of exec_param_many's per-request server-side SQL re-parse — local +9%. • escape_html (fortunes) does ONE pass with a no-alloc fast path instead of replace_each's five full-string passes — local +27% fortunes. DB profiles remain bound by the stdlib db.pg driver (text protocol), so this narrows the gap without closing it. Both backends. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

Single-element array push (`arr << x`) is 4-7x slower on post-0.5.1 V (vlang/v#27468) while bulk push_many, allocation and indexed writes are unaffected. The two hot single-element `<<` sites are now bulk writes: - wi() built integer digits with `out << tmp[i]` per digit; it now itoa's back-to-front into the [20]u8 scratch and flushes with one push_many. - write_json_response() pushed the item separator `,` and closing `}` one byte at a time; the closing `}` is now fused with the separator into a single '},' / '}' push_many. Output is byte-identical (verified across counts 0..4096 and edge-value integers). This makes the JSON hot path fast on both the 0.5.1 release and current master, independent of the upstream codegen regression. Both epoll and io_uring backends. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

Build V from source at the 0.5.1 tag instead of the prebuilt release zip. Plain `make` can't build an old tag: its latest_vc step `git pull`s the newest vlang/vc bootstrap, which no longer matches 0.5.1's vlib (fails with `unknown ident \`native\``). So pin vc to the commit cut for 0.5.1 (vlang/vc f461dfeb = "[v:master] 0c3183c - V 0.5.1") and run make's own bootstrap recipe (cc -> v1 -> v2 -> v). Drop curl/unzip from the build deps. Pinned by tag, not a master commit, because post-0.5.1 master carries a codegen regression (single-element array push 4-7x slower, vlang/v#27468). Both backends; verified the source-built compiler serves /json and /pipeline correctly. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

The static handler copied each asset's full prebuilt response (up to ~300 KB) into the per-connection write_buf every request — a userspace copy plus a large *scanned* write_buf that grows the GC's stop-the-world cost at high conn counts (why vanilla sat ~4x behind nginx/swerver on the static profile). Preload each asset's fd once (O_RDONLY, page-cached, borrowed for the server's life) and a precomputed response head; serve the head into write_buf and stream the body zero-copy via core.queue_file (sendfile(2), already wired through the epoll backend's deferred-send + EPOLLOUT path). write_buf no longer grows, the body is never copied, and the kernel pushes file pages straight to the socket — the same model nginx and swerver use. Local (vendor.js 307 KB, 64c, wrk): 25.7K -> 59.3K req/s, 7.36 -> 16.97 GB/s (2.3x). Output verified byte-identical (md5) incl. keep-alive. epoll only. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

…bodies) The lib now streams (drains) request bodies larger than 1 MiB instead of buffering them, so for a large upload req.body is empty — but the byte count the upload profile wants is the declared Content-Length. Answer by req.content_length() (falls back to the buffered body length when absent, which also covers small bodies that still take the buffered path). Depends on enghitalo/vanilla#31 (adds HttpRequest.content_length() + the engine drain); the Dockerfile clones lib main, so that PR must merge before this builds. Local (source-built V 0.5.1): upload single-conn 45 req/s / 907 MB/s, 32c 303 req/s / 6.1 GB/s — matching the top upload servers; RSS 14 MB (was ~1 GB buffering). epoll only. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

…nt_length() (drain MDA2AV#31 merged); the prior run cloned vanilla before it landed

…_many Replace the remaining `out << <[]u8>` appends (static header, error consts, the four crud_* results, and the json-comp gzip-cache hit/store) with a wb() helper that calls push_many, uniform with the existing ws/wi. The bit-shift `<<` in the gz-cache key is unrelated and kept as is. Note: V already lowers `array << array` to array_push_many, so this is codegen- neutral — a consistency / regression-safety change (the whole write path now takes push_many's fast path explicitly, robust if `<<` ever regresses for arrays the way the single-element path did, vlang/v#27468). The hot single-element `<<` was already armored by ws/wi. Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>

…lo/vanilla#32) Convert the framework from the blocking db.pg ConnectionPool to vanilla's native async Postgres driver (pg_async, vanilla#39) on the epoll async runtime. The DB endpoints now PARK on the PG socket (ac.watch) and resume in a continuation instead of blocking a worker thread per query — closing the async-db gap (MDA2AV#32). - ServerConfig: request_handler → async_handler + make_state. Each worker owns a per-worker pg_async.PgPool (no cross-worker sharing, no locks) plus its own cache-aside and json-comp caches; the dataset/prefixes/static assets stay shared read-only. - async-db, fortunes, crud (list/get/create/update) issue a query, park, and render in a single resume continuation that switches on a small per-request stash. crud_list folds page+total into ONE window-count query (count(*) OVER()) instead of two round-trips. crud_get keeps a per-worker cache-aside (X-Cache). - DB responses are now hand-built (ws/wi/wb), and JSONB (tags) is emitted RAW from its binary form — no json.encode reflection, no decode/re-encode. - Drops the db.pg dependency entirely, so the framework also builds on master V (master removed pg.ConnectionPool); the non-DB hot paths are unchanged. Validated on V master against PostgreSQL 18 (items 100k + fortune 199): every endpoint correct (async-db items incl. binary jsonb, sorted fortunes, crud list/get/create/update, X-Cache). Throughput: async-db ~14.3k rps @ 4.35ms p50; /json ~376k rps. Per-worker caches warm under load (a re-GET may MISS across workers under SO_REUSEPORT — by design, vs the old shared+mutex cache). Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>

…1 build fix) The previous run failed because the framework's Docker cloned vanilla main BEFORE the fix landed: V's `net` declares C.socket with typed-enum params on the 0.5.1 tag, clashing with http_server.socket's int C.socket (socket_tcp.c.v). vanilla PR MDA2AV#40 removes the net imports (socket_tcp → C.htons; pg_async → raw libc dial), verified to compile under `v -prod .` on the true 0.5.1 tag for both vanilla-epoll and vanilla-io_uring. This empty commit re-runs validate so it re-clones the fixed vanilla. Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>

…ixes The Docker `RUN git clone … vanilla` layer was cached indefinitely on the self-hosted runner, so re-runs kept building against a STALE vanilla checkout — which is why MDA2AV#877 stayed red even after the build fix (vanilla MDA2AV#40) merged: the build never re-cloned to get it. Add `ADD https://api.github.com/.../refs/heads/main` before the clone in both vanilla Dockerfiles. The fetched ref (main's SHA) changes whenever vanilla main moves, invalidating this layer's cache and forcing a fresh clone. Adding the step also re-clones on this build (new layer structure), so it now picks up MDA2AV#40. Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>

The crud cache-aside was per-worker (WorkerCtx), but validate.sh's crud check does two GETs to /crud/items/42 and requires X-Cache MISS then HIT. With SO_REUSEPORT the two requests land on different workers, so a per-worker cache returns MISS both times → validation fails. Move the cache-aside (and the json-comp gzip cache) into the process-shared `Shared` (renamed from SharedRO), guarded by RwMutexes since workers are separate threads — restoring the original shared-cache semantics. The async Postgres pool stays per-worker (make_state); only the caches are shared. Verified against the real pgdb-seed.sql + dataset.json: GET /crud/items/42 now returns MISS then HIT; async-db (count=limit), crud list (5 items, total 9986, page 1), and fortunes (202 <tr>) all match validate.sh's checks. Compiles under `v -prod .` on the true V 0.5.1 tag. Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>

…st CPUs Pair with vanilla's cpuset-aware max_thread_pool_size: compute the per-worker Postgres pool size against core.max_thread_pool_size (usable cores) instead of runtime.nr_cpus() (host count). Under api-N the engine now spawns N workers, so per_worker = total/N gives a sane pool (e.g. 64/4=16, 64/16=4) instead of 64/128=1 — matching the async path's threads≈cores model. Experiment for MDA2AV#32: test whether removing the 128-on-N-cores oversubscription recovers the async-db / api-16 regression. Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>

… pg_async conversion) Three clean, post-cache-bust benchmarks agree the native async pg_async path is a net loss on the arena's LOCAL low-latency DB profiles: epoll-async vs io_uring-sync showed sync winning api-4 ~4.9×, fortunes ~3.6×, api-16 ~1.6×, async-db ~1.2× (io_uring even handicapped by the cpuset change). The async path is bound by DB concurrency (pool conns) and never beats sync libpq's concurrency-via-threads for sub-ms queries; cpuset tuning only traded api-16 for api-4. The only async win was crud, which is cache-bound (skips the DB) — preserved by the sync framework too. Restore main.v to the pre-conversion sync version (d1a0e73): db.pg ConnectionPool + request_handler, keeping ALL the sync-path wins (pipelined, static via sendfile, upload streaming-drain, json-comp gzip cache, zero-alloc routing, shared X-Cache). pg_async stays in the vanilla library as a capability for the case it actually wins (latency-bound / network Postgres). The Dockerfile vanilla-clone cache-bust stays. Verified: builds under `v -prod .` on the V 0.5.1 tag against current vanilla main. Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>

The pipelined profile (the arena's highest-RPS test, ~35M rps) is a fixed plaintext "ok". Match it on `target` immediately after the path is sliced and blit a precomputed full-response constant + return — before the '?'-scan, the route slice, and write_resp's 6-part piecewise header build. requests/pipeline.raw is `GET /pipeline` with no query, so the exact-match is correct; the now-redundant route=='/pipeline' arm is dropped from the dispatch chain. Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>

…content) callgrind on the render path showed ~26% of its instructions were the zero-fill of strings.new_builder(32768) — a flat 32 KB block for a ~1.5 KB response, re- zeroed every request as the GC reuses it. Size it from the actual rows instead (160 + 96/row + message bytes); an outlier grows once. The other builders here were already content-sized. Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>

…t-state runtime Restore the native-pg_async async DB path (per-worker PgPool via make_state; async_handler; submit -> ac.watch(pg fd) -> .suspend -> single on_db_ready continuation that pumps the result, renders by kind, releases the conn). This is the proven MDA2AV#32 conversion (was reverted at 84f3dc9 because it REGRESSED the arena on the old map-based reactor + per-request malloc), re-applied now that PR MDA2AV#41 replaced that with the flat fd-indexed reactor (no hashmap, no per-request alloc) — the overhead that sank it is gone. Why: fortunes (2,990 rps) / async-db (10,927) are capped at ~16-way concurrency by sync thread-per-core blocking (the 64-conn pool sits 75% idle; CPU ~460% = 11 cores idle, waiting on PG). Park/resume frees the worker to keep many queries in flight -> uses the whole pool -> the swerver (#1, 293k) model. This is the EXPERIMENT to confirm MDA2AV#41 turns the old regression into a win; needs an arena run. Keeps the recent sync wins (json-comp gzip cache, route-slice, /pipeline short-circuit). Builds clean on the flat-state runtime (post-0.5.1 local V). Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>

enghitalo · 2026-06-17T11:14:17Z

/benchmark -f vanilla-epoll

github-actions · 2026-06-17T11:14:31Z

👋 /benchmark request received. A collaborator will review and approve the run.

…ache) validate.sh failed `[crud cache-aside]: expected MISS then HIT, got MISS MISS`: restoring the MDA2AV#32 async conversion brought back per-worker caches, but SO_REUSEPORT routes the two probe GETs to different workers, so each MISSes its cold cache. Move the crud + gz caches out of per-worker WorkerCtx into the shared SharedRO, mutex-guarded (RwMutex) — the same process-shared model the sync path uses. Pool stays per-worker (no lock). Builds clean; this unblocks the async benchmark. Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>

enghitalo · 2026-06-17T11:37:21Z

/benchmark -f vanilla-epoll

github-actions · 2026-06-17T11:37:36Z

👋 /benchmark request received. A collaborator will review and approve the run.

github-actions · 2026-06-17T11:52:54Z

Benchmark Results

Framework: vanilla-epoll | Test: all tests

Test	Conn	RPS	CPU	Mem	Δ RPS	Δ Mem
baseline	512	654,110	1363.5%	62MiB	+40.5%	-13.9%
baseline	4096	613,239	1374.1%	192MiB	+42.9%	-0.5%
pipelined	512	36,511,753	6684.5%	52MiB	+1434.3%	-1.9%
pipelined	4096	36,429,398	6681.1%	176MiB	+1482.7%	-5.9%
limited-conn	512	240,408	760.7%	60MiB	+10.8%	-3.2%
limited-conn	4096	376,667	1173.6%	178MiB	+24.1%	-0.6%
json	4096	878,433	3095.9%	183MiB	+81.2%	+3.4%
json-comp	512	541,058	1555.4%	61MiB	+1005.7%	-29.9%
json-comp	4096	593,362	1730.0%	178MiB	+939.7%	-3.8%
json-comp	16384	654,031	1719.6%	635MiB	+825.0%	-9.8%
upload	32	11,950	463.1%	116MiB	+39733.3%	-86.3%
upload	256	13,112	880.0%	238MiB	+23314.3%	-83.4%
api-4	256	8,132	101.6%	55MiB	-75.2%	-22.5%
api-16	1024	10,717	135.7%	84MiB	-45.7%	-9.7%
static	1024	1,066,646	5700.9%	99MiB	+235.2%	-86.7%
static	4096	1,139,529	5911.8%	293MiB	+328.1%	-89.8%
static	6800	1,164,958	5816.5%	462MiB	+458.4%	-90.0%
async-db	1024	7,570	143.7%	82MiB	-30.3%	-18.0%
crud	4096	443,697	1536.5%	416MiB	+271.0%	+8.9%
fortunes	1024	938	163.5%	80MiB	-64.7%	-44.4%

Full log

  Req/conn:  200
  Templates: 20
  Expected:  200
  Duration:  15s


  Thread Stats   Avg      p50      p90      p99    p99.9
    Latency   8.49ms   1.67ms   19.50ms   78.50ms   409.60ms

  5818540 requests in 15.00s, 5818349 responses
  Throughput: 387.82K req/s
  Bandwidth:  97.00MB/s
  Status codes: 2xx=4956711, 3xx=0, 4xx=861638, 5xx=0
  Latency samples: 5818349 / 5818349 responses (100.0%)
  Reconnects: 28118
  Per-template: 162620,200670,236321,271385,305053,334052,360435,382709,396569,404983,408556,412324,415703,417007,418763,418477,27378,41557,81685,122102
  Per-template-ok: 84399,162561,197143,230365,263751,292985,319157,341135,355712,363725,367305,371073,373756,375465,377513,377261,27378,5983,21301,48743

  WARNING: 861638/5818349 responses (14.8%) had unexpected status (expected 2xx)
[info] CPU 1192.7% | Mem 334MiB

[run 2/3]
gcannon v0.5.3
  Target:    localhost:8080/
  Threads:   64
  Conns:     4096 (64/thread)
  Pipeline:  1
  Req/conn:  200
  Templates: 20
  Expected:  200
  Duration:  15s


  Thread Stats   Avg      p50      p90      p99    p99.9
    Latency   6.08ms   1.39ms   12.60ms   85.00ms   238.40ms

  7256781 requests in 15.00s, 7256891 responses
  Throughput: 483.70K req/s
  Bandwidth:  128.79MB/s
  Status codes: 2xx=6655456, 3xx=0, 4xx=601427, 5xx=0
  Latency samples: 7256776 / 7256891 responses (100.0%)
  Reconnects: 35770
  Per-template: 162329,202556,242530,281889,321523,360547,399007,435079,467918,502792,534210,563937,588540,611868,630573,648741,56184,42291,82064,122198
  Per-template-ok: 94060,181427,219189,256905,295446,334450,373018,408610,441707,475814,507349,537133,561801,584941,603651,621903,56184,13613,30848,57301

  WARNING: 601435/7256891 responses (8.3%) had unexpected status (expected 2xx)
[info] CPU 1536.5% | Mem 416MiB

[run 3/3]
gcannon v0.5.3
  Target:    localhost:8080/
  Threads:   64
  Conns:     4096 (64/thread)
  Pipeline:  1
  Req/conn:  200
  Templates: 20
  Expected:  200
  Duration:  15s


  Thread Stats   Avg      p50      p90      p99    p99.9
    Latency   6.14ms   1.36ms   12.00ms   92.20ms   244.50ms

  7145000 requests in 15.00s, 7145002 responses
  Throughput: 476.24K req/s
  Bandwidth:  127.14MB/s
  Status codes: 2xx=6534585, 3xx=0, 4xx=610417, 5xx=0
  Latency samples: 7145001 / 7145002 responses (100.0%)
  Reconnects: 35132
  Per-template: 163524,203569,244291,284626,323810,361613,398326,431021,462520,493296,520037,546319,571362,594533,608855,624202,62881,44095,83118,123003
  Per-template-ok: 94451,182051,220472,259147,297168,334973,371500,404223,435183,466316,493383,519448,544273,567229,581604,596837,62881,16121,31282,56042

  WARNING: 610417/7145002 responses (8.5%) had unexpected status (expected 2xx)
[info] CPU 1506.5% | Mem 515MiB

=== Best: 443697 req/s (CPU: 1536.5%, Mem: 416MiB) ===
[info] input BW: 38.08MB/s (avg template: 90 bytes)
[info] saved results/crud/4096/vanilla-epoll.json
httparena-bench-vanilla-epoll
httparena-bench-vanilla-epoll

==============================================
=== vanilla-epoll / fortunes / 1024c (tool=gcannon) ===
==============================================
[info] resetting postgres for a clean per-profile baseline
[info] starting postgres sidecar
httparena-postgres
[info] postgres ready (seeded)
[info] waiting for server...
[info] server ready

[run 1/3]
gcannon v0.5.3
  Target:    localhost:8080/fortunes
  Threads:   64
  Conns:     1024 (16/thread)
  Pipeline:  1
  Req/conn:  unlimited (keep-alive)
  Expected:  200
  Duration:  5s


  Thread Stats   Avg      p50      p90      p99    p99.9
    Latency   2.67ms   3.30ms   3.44ms   4.25ms   40.70ms

  3906 requests in 5.00s, 3906 responses
  Throughput: 780 req/s
  Bandwidth:  18.52MB/s
  Status codes: 2xx=3906, 3xx=0, 4xx=0, 5xx=0
  Latency samples: 3906 / 3906 responses (100.0%)
[info] CPU 162.1% | Mem 61MiB

[run 2/3]
gcannon v0.5.3
  Target:    localhost:8080/fortunes
  Threads:   64
  Conns:     1024 (16/thread)
  Pipeline:  1
  Req/conn:  unlimited (keep-alive)
  Expected:  200
  Duration:  5s


  Thread Stats   Avg      p50      p90      p99    p99.9
    Latency   8.02ms    881us   4.17ms   7.75ms    4.97s

  4285 requests in 5.00s, 4285 responses
  Throughput: 856 req/s
  Bandwidth:  20.32MB/s
  Status codes: 2xx=4285, 3xx=0, 4xx=0, 5xx=0
  Latency samples: 4285 / 4285 responses (100.0%)
[info] CPU 154.8% | Mem 75MiB

[run 3/3]
gcannon v0.5.3
  Target:    localhost:8080/fortunes
  Threads:   64
  Conns:     1024 (16/thread)
  Pipeline:  1
  Req/conn:  unlimited (keep-alive)
  Expected:  200
  Duration:  5s


  Thread Stats   Avg      p50      p90      p99    p99.9
    Latency   14.11ms    974us   4.99ms   6.85ms    3.81s

  4694 requests in 5.00s, 4694 responses
  Throughput: 938 req/s
  Bandwidth:  22.26MB/s
  Status codes: 2xx=4694, 3xx=0, 4xx=0, 5xx=0
  Latency samples: 4694 / 4694 responses (100.0%)
[info] CPU 163.5% | Mem 80MiB

=== Best: 938 req/s (CPU: 163.5%, Mem: 80MiB) ===
[info] saved results/fortunes/1024/vanilla-epoll.json
httparena-bench-vanilla-epoll
httparena-bench-vanilla-epoll
[info] skip: vanilla-epoll does not subscribe to baseline-h2
[info] skip: vanilla-epoll does not subscribe to static-h2
[info] skip: vanilla-epoll does not subscribe to baseline-h2c
[info] skip: vanilla-epoll does not subscribe to json-h2c
[info] skip: vanilla-epoll does not subscribe to baseline-h3
[info] skip: vanilla-epoll does not subscribe to static-h3
[info] skip: vanilla-epoll does not subscribe to gateway-64
[info] skip: vanilla-epoll does not subscribe to gateway-h3
[info] skip: vanilla-epoll does not subscribe to production-stack
[info] skip: vanilla-epoll does not subscribe to unary-grpc
[info] skip: vanilla-epoll does not subscribe to unary-grpc-tls
[info] skip: vanilla-epoll does not subscribe to stream-grpc
[info] skip: vanilla-epoll does not subscribe to stream-grpc-tls
[info] skip: vanilla-epoll does not subscribe to echo-ws
[info] skip: vanilla-epoll does not subscribe to echo-ws-pipeline
[info] rebuilding site/data/*.json
[updated] /home/diogo/actions-runner/_work/HttpArena/HttpArena/site/data/frameworks.json
[updated] /home/diogo/actions-runner/_work/HttpArena/HttpArena/site/data/api-16-1024.json
[updated] /home/diogo/actions-runner/_work/HttpArena/HttpArena/site/data/api-4-256.json
[updated] /home/diogo/actions-runner/_work/HttpArena/HttpArena/site/data/async-db-1024.json
[updated] /home/diogo/actions-runner/_work/HttpArena/HttpArena/site/data/baseline-4096.json
[updated] /home/diogo/actions-runner/_work/HttpArena/HttpArena/site/data/baseline-512.json
[updated] /home/diogo/actions-runner/_work/HttpArena/HttpArena/site/data/crud-4096.json
[updated] /home/diogo/actions-runner/_work/HttpArena/HttpArena/site/data/fortunes-1024.json
[updated] /home/diogo/actions-runner/_work/HttpArena/HttpArena/site/data/json-4096.json
[updated] /home/diogo/actions-runner/_work/HttpArena/HttpArena/site/data/json-comp-16384.json
[updated] /home/diogo/actions-runner/_work/HttpArena/HttpArena/site/data/json-comp-4096.json
[updated] /home/diogo/actions-runner/_work/HttpArena/HttpArena/site/data/json-comp-512.json
[updated] /home/diogo/actions-runner/_work/HttpArena/HttpArena/site/data/limited-conn-4096.json
[updated] /home/diogo/actions-runner/_work/HttpArena/HttpArena/site/data/limited-conn-512.json
[updated] /home/diogo/actions-runner/_work/HttpArena/HttpArena/site/data/pipelined-4096.json
[updated] /home/diogo/actions-runner/_work/HttpArena/HttpArena/site/data/pipelined-512.json
[updated] /home/diogo/actions-runner/_work/HttpArena/HttpArena/site/data/static-1024.json
[updated] /home/diogo/actions-runner/_work/HttpArena/HttpArena/site/data/static-4096.json
[updated] /home/diogo/actions-runner/_work/HttpArena/HttpArena/site/data/static-6800.json
[updated] /home/diogo/actions-runner/_work/HttpArena/HttpArena/site/data/upload-256.json
[updated] /home/diogo/actions-runner/_work/HttpArena/HttpArena/site/data/upload-32.json
[updated] /home/diogo/actions-runner/_work/HttpArena/HttpArena/site/data/current.json
[info] done
httparena-postgres
httparena-redis
[info] restoring loopback MTU to 65536

…code_into The MDA2AV#884 async-db regression was NOT the sub-ms-PG crossover — it was pool starvation + load-shedding (per the regression analysis). DATABASE_MAX_CONN=256 across 16 workers should give 16 conns/worker, but a `min(8)` clamp forced 8 → only 128 of the 256 budget used. With one-in-flight-per-conn and closed-loop load (~64 client conns/worker), the 8-slot ceiling is hit constantly; park() then SHEDS the overflow as an empty 200, so the closed-loop clients spin and real throughput collapses to ~1 core (async-db -30%, fortunes -77%, api -75%). crud was unaffected (+208%) only because it is cache-HIT served and never touches the pool. Fixes: 1. Drop the >8 clamp → use the full 256 budget = 16 conns/worker (2x in-flight), sized to Postgres max_connections. 2. Adopt request_parser.decode_into (no `!HttpRequest` boxing, ~13% of parse) — the same no-boxing entry the sync build now uses; recovers the json-comp/json non-DB delta that the async dispatch path was paying. Builds clean (pg_async, post-0.5.1 local V; vanilla MDA2AV#44 with decode_into is now in main). Follow-ups (not here): queue on pool-full instead of shedding empty 200s; hoist AsyncCtx out of the async_drain per-request loop (vanilla). Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>

enghitalo · 2026-06-17T12:16:32Z

/benchmark -f vanilla-epoll

github-actions · 2026-06-17T12:16:42Z

👋 /benchmark request received. A collaborator will review and approve the run.

github-actions · 2026-06-17T12:32:12Z

Benchmark Results

Framework: vanilla-epoll | Test: all tests

Test	Conn	RPS	CPU	Mem	Δ RPS	Δ Mem
baseline	512	652,494	1368.3%	65MiB	+40.1%	-9.7%
baseline	4096	648,123	1438.2%	190MiB	+51.0%	-1.6%
pipelined	512	37,256,216	6674.5%	52MiB	+1465.6%	-1.9%
pipelined	4096	37,086,342	6486.7%	184MiB	+1511.2%	-1.6%
limited-conn	512	248,576	780.4%	61MiB	+14.6%	-1.6%
limited-conn	4096	368,806	1098.1%	177MiB	+21.5%	-1.1%
json	4096	788,164	2795.4%	182MiB	+62.6%	+2.8%
json-comp	512	496,609	1458.6%	60MiB	+914.8%	-31.0%
json-comp	4096	620,195	1793.6%	186MiB	+986.7%	+0.5%
json-comp	16384	655,782	1748.3%	640MiB	+827.5%	-9.1%
upload	32	9,932	393.3%	76MiB	+33006.7%	-91.0%
upload	256	12,536	836.2%	186MiB	+22285.7%	-87.0%
api-4	256	8,435	102.5%	56MiB	-74.3%	-21.1%
api-16	1024	10,728	138.2%	83MiB	-45.7%	-10.8%
static	1024	1,070,817	5635.4%	99MiB	+236.5%	-86.7%
static	4096	1,137,556	5889.8%	294MiB	+327.4%	-89.7%
static	6800	1,174,627	5844.3%	460MiB	+463.1%	-90.0%
async-db	1024	7,388	145.7%	81MiB	-32.0%	-19.0%
crud	4096	442,600	1533.4%	472MiB	+270.1%	+23.6%
fortunes	1024	1,026	177.6%	71MiB	-61.4%	-50.7%

Full log

  Req/conn:  200
  Templates: 20
  Expected:  200
  Duration:  15s


  Thread Stats   Avg      p50      p90      p99    p99.9
    Latency   7.61ms   1.65ms   18.00ms   77.20ms   319.00ms

  6314155 requests in 15.00s, 6313132 responses
  Throughput: 420.80K req/s
  Bandwidth:  106.53MB/s
  Status codes: 2xx=5424265, 3xx=0, 4xx=888867, 5xx=0
  Latency samples: 6313132 / 6313132 responses (100.0%)
  Reconnects: 30631
  Per-template: 162620,202292,241408,278430,314983,348803,377228,401562,422216,441109,453542,466913,477278,481408,479176,481982,35480,42219,81933,122550
  Per-template-ok: 83304,162342,199970,236271,272113,306654,334326,358392,378927,397933,410005,423592,434898,438831,435952,438956,35480,6920,21299,48100

  WARNING: 888867/6313132 responses (14.1%) had unexpected status (expected 2xx)
[info] CPU 1274.3% | Mem 374MiB

[run 2/3]
gcannon v0.5.3
  Target:    localhost:8080/
  Threads:   64
  Conns:     4096 (64/thread)
  Pipeline:  1
  Req/conn:  200
  Templates: 20
  Expected:  200
  Duration:  15s


  Thread Stats   Avg      p50      p90      p99    p99.9
    Latency   6.06ms   1.41ms   11.70ms   91.80ms   233.40ms

  7243514 requests in 15.00s, 7243514 responses
  Throughput: 482.80K req/s
  Bandwidth:  128.57MB/s
  Status codes: 2xx=6639004, 3xx=0, 4xx=604510, 5xx=0
  Latency samples: 7243509 / 7243514 responses (100.0%)
  Reconnects: 35673
  Per-template: 162608,202901,243301,283143,322705,361211,398580,434857,469745,503499,534490,560991,586649,608565,625091,641510,56661,42578,82307,122117
  Per-template-ok: 94182,181371,219727,258027,296752,334762,372314,408294,442755,476386,508049,534180,559396,581462,598239,614715,56661,14802,29430,57496

  WARNING: 604510/7243514 responses (8.3%) had unexpected status (expected 2xx)
[info] CPU 1533.4% | Mem 472MiB

[run 3/3]
gcannon v0.5.3
  Target:    localhost:8080/
  Threads:   64
  Conns:     4096 (64/thread)
  Pipeline:  1
  Req/conn:  200
  Templates: 20
  Expected:  200
  Duration:  15s


  Thread Stats   Avg      p50      p90      p99    p99.9
    Latency   6.19ms   1.43ms   11.90ms   90.40ms   236.30ms

  7099093 requests in 15.00s, 7099094 responses
  Throughput: 473.19K req/s
  Bandwidth:  126.34MB/s
  Status codes: 2xx=6486550, 3xx=0, 4xx=612544, 5xx=0
  Latency samples: 7099082 / 7099094 responses (100.0%)
  Reconnects: 34845
  Per-template: 162902,202465,242543,281729,321241,359629,396603,432125,463239,492837,520824,544687,567958,586327,597987,610242,63985,44633,83698,123428
  Per-template-ok: 95323,180508,218416,255992,294173,333035,369469,404633,435844,465203,493593,516980,540500,558761,570443,582664,63985,16784,32102,58132

  WARNING: 612544/7099094 responses (8.6%) had unexpected status (expected 2xx)
[info] CPU 1483.8% | Mem 569MiB

=== Best: 442600 req/s (CPU: 1533.4%, Mem: 472MiB) ===
[info] input BW: 37.99MB/s (avg template: 90 bytes)
[info] saved results/crud/4096/vanilla-epoll.json
httparena-bench-vanilla-epoll
httparena-bench-vanilla-epoll

==============================================
=== vanilla-epoll / fortunes / 1024c (tool=gcannon) ===
==============================================
[info] resetting postgres for a clean per-profile baseline
[info] starting postgres sidecar
httparena-postgres
[info] postgres ready (seeded)
[info] waiting for server...
[info] server ready

[run 1/3]
gcannon v0.5.3
  Target:    localhost:8080/fortunes
  Threads:   64
  Conns:     1024 (16/thread)
  Pipeline:  1
  Req/conn:  unlimited (keep-alive)
  Expected:  200
  Duration:  5s


  Thread Stats   Avg      p50      p90      p99    p99.9
    Latency   3.13ms   3.43ms   4.33ms   7.74ms   32.60ms

  4102 requests in 5.00s, 4102 responses
  Throughput: 820 req/s
  Bandwidth:  19.45MB/s
  Status codes: 2xx=4102, 3xx=0, 4xx=0, 5xx=0
  Latency samples: 4102 / 4102 responses (100.0%)
[info] CPU 169.1% | Mem 60MiB

[run 2/3]
gcannon v0.5.3
  Target:    localhost:8080/fortunes
  Threads:   64
  Conns:     1024 (16/thread)
  Pipeline:  1
  Req/conn:  unlimited (keep-alive)
  Expected:  200
  Duration:  5s


  Thread Stats   Avg      p50      p90      p99    p99.9
    Latency   2.67ms   3.84ms   4.01ms   5.36ms   8.40ms

  3937 requests in 5.00s, 3937 responses
  Throughput: 787 req/s
  Bandwidth:  18.66MB/s
  Status codes: 2xx=3937, 3xx=0, 4xx=0, 5xx=0
  Latency samples: 3937 / 3937 responses (100.0%)
[info] CPU 153.2% | Mem 73MiB

[run 3/3]
gcannon v0.5.3
  Target:    localhost:8080/fortunes
  Threads:   64
  Conns:     1024 (16/thread)
  Pipeline:  1
  Req/conn:  unlimited (keep-alive)
  Expected:  200
  Duration:  5s


  Thread Stats   Avg      p50      p90      p99    p99.9
    Latency   1.95ms    769us   3.94ms   4.43ms   5.62ms

  5131 requests in 5.00s, 5131 responses
  Throughput: 1.02K req/s
  Bandwidth:  24.32MB/s
  Status codes: 2xx=5131, 3xx=0, 4xx=0, 5xx=0
  Latency samples: 5131 / 5131 responses (100.0%)
[info] CPU 177.6% | Mem 71MiB

=== Best: 1026 req/s (CPU: 177.6%, Mem: 71MiB) ===
[info] saved results/fortunes/1024/vanilla-epoll.json
httparena-bench-vanilla-epoll
httparena-bench-vanilla-epoll
[info] skip: vanilla-epoll does not subscribe to baseline-h2
[info] skip: vanilla-epoll does not subscribe to static-h2
[info] skip: vanilla-epoll does not subscribe to baseline-h2c
[info] skip: vanilla-epoll does not subscribe to json-h2c
[info] skip: vanilla-epoll does not subscribe to baseline-h3
[info] skip: vanilla-epoll does not subscribe to static-h3
[info] skip: vanilla-epoll does not subscribe to gateway-64
[info] skip: vanilla-epoll does not subscribe to gateway-h3
[info] skip: vanilla-epoll does not subscribe to production-stack
[info] skip: vanilla-epoll does not subscribe to unary-grpc
[info] skip: vanilla-epoll does not subscribe to unary-grpc-tls
[info] skip: vanilla-epoll does not subscribe to stream-grpc
[info] skip: vanilla-epoll does not subscribe to stream-grpc-tls
[info] skip: vanilla-epoll does not subscribe to echo-ws
[info] skip: vanilla-epoll does not subscribe to echo-ws-pipeline
[info] rebuilding site/data/*.json
[updated] /home/diogo/actions-runner/_work/HttpArena/HttpArena/site/data/frameworks.json
[updated] /home/diogo/actions-runner/_work/HttpArena/HttpArena/site/data/api-16-1024.json
[updated] /home/diogo/actions-runner/_work/HttpArena/HttpArena/site/data/api-4-256.json
[updated] /home/diogo/actions-runner/_work/HttpArena/HttpArena/site/data/async-db-1024.json
[updated] /home/diogo/actions-runner/_work/HttpArena/HttpArena/site/data/baseline-4096.json
[updated] /home/diogo/actions-runner/_work/HttpArena/HttpArena/site/data/baseline-512.json
[updated] /home/diogo/actions-runner/_work/HttpArena/HttpArena/site/data/crud-4096.json
[updated] /home/diogo/actions-runner/_work/HttpArena/HttpArena/site/data/fortunes-1024.json
[updated] /home/diogo/actions-runner/_work/HttpArena/HttpArena/site/data/json-4096.json
[updated] /home/diogo/actions-runner/_work/HttpArena/HttpArena/site/data/json-comp-16384.json
[updated] /home/diogo/actions-runner/_work/HttpArena/HttpArena/site/data/json-comp-4096.json
[updated] /home/diogo/actions-runner/_work/HttpArena/HttpArena/site/data/json-comp-512.json
[updated] /home/diogo/actions-runner/_work/HttpArena/HttpArena/site/data/limited-conn-4096.json
[updated] /home/diogo/actions-runner/_work/HttpArena/HttpArena/site/data/limited-conn-512.json
[updated] /home/diogo/actions-runner/_work/HttpArena/HttpArena/site/data/pipelined-4096.json
[updated] /home/diogo/actions-runner/_work/HttpArena/HttpArena/site/data/pipelined-512.json
[updated] /home/diogo/actions-runner/_work/HttpArena/HttpArena/site/data/static-1024.json
[updated] /home/diogo/actions-runner/_work/HttpArena/HttpArena/site/data/static-4096.json
[updated] /home/diogo/actions-runner/_work/HttpArena/HttpArena/site/data/static-6800.json
[updated] /home/diogo/actions-runner/_work/HttpArena/HttpArena/site/data/upload-256.json
[updated] /home/diogo/actions-runner/_work/HttpArena/HttpArena/site/data/upload-32.json
[updated] /home/diogo/actions-runner/_work/HttpArena/HttpArena/site/data/current.json
[info] done
httparena-postgres
httparena-redis
[info] restoring loopback MTU to 65536

github-actions · 2026-06-17T12:47:47Z

Benchmark Results

Framework: vanilla-epoll | Test: all tests

Test	Conn	RPS	CPU	Mem	Δ RPS	Δ Mem
baseline	512	670,537	1409.2%	65MiB	+44.0%	-9.7%
baseline	4096	623,592	1352.0%	190MiB	+45.3%	-1.6%
pipelined	512	37,090,396	6660.6%	54MiB	+1458.6%	+1.9%
pipelined	4096	36,746,598	6462.5%	177MiB	+1496.4%	-5.3%
limited-conn	512	247,183	797.2%	60MiB	+14.0%	-3.2%
limited-conn	4096	356,800	1111.5%	175MiB	+17.6%	-2.2%
json	4096	879,811	3037.6%	184MiB	+81.5%	+4.0%
json-comp	512	481,534	1356.0%	62MiB	+884.0%	-28.7%
json-comp	4096	599,980	1750.1%	182MiB	+951.3%	-1.6%
json-comp	16384	624,326	1722.0%	644MiB	+783.0%	-8.5%
upload	32	9,803	398.8%	77MiB	+32576.7%	-90.9%
upload	256	12,778	813.6%	186MiB	+22717.9%	-87.0%
api-4	256	8,569	98.3%	55MiB	-73.9%	-22.5%
api-16	1024	11,691	131.6%	83MiB	-40.8%	-10.8%
static	1024	1,074,536	5647.3%	101MiB	+237.7%	-86.4%
static	4096	1,140,166	5831.2%	295MiB	+328.4%	-89.7%
static	6800	1,174,193	5955.1%	466MiB	+462.9%	-89.9%
async-db	1024	7,505	147.2%	84MiB	-30.9%	-16.0%
crud	4096	433,954	1523.7%	464MiB	+262.9%	+21.5%
fortunes	1024	899	172.7%	71MiB	-66.1%	-50.7%

Full log

  Req/conn:  200
  Templates: 20
  Expected:  200
  Duration:  15s


  Thread Stats   Avg      p50      p90      p99    p99.9
    Latency   7.42ms   1.62ms   18.60ms   67.40ms   245.60ms

  6478318 requests in 15.00s, 6476206 responses
  Throughput: 431.67K req/s
  Bandwidth:  109.53MB/s
  Status codes: 2xx=5574248, 3xx=0, 4xx=901958, 5xx=0
  Latency samples: 6476206 / 6476206 responses (100.0%)
  Reconnects: 31495
  Per-template: 162593,203435,242485,280761,318362,350967,380998,409313,432227,453177,468180,483276,492539,499286,505929,509613,36749,41749,81967,122600
  Per-template-ok: 83090,162483,199752,237540,274993,307388,337887,365790,388557,409375,424184,439764,448625,455746,462302,466042,36749,6453,20580,46948

  WARNING: 901958/6476206 responses (13.9%) had unexpected status (expected 2xx)
[info] CPU 1326.2% | Mem 384MiB

[run 2/3]
gcannon v0.5.3
  Target:    localhost:8080/
  Threads:   64
  Conns:     4096 (64/thread)
  Pipeline:  1
  Req/conn:  200
  Templates: 20
  Expected:  200
  Duration:  15s


  Thread Stats   Avg      p50      p90      p99    p99.9
    Latency   6.26ms   1.50ms   12.60ms   87.60ms   236.30ms

  7115691 requests in 15.00s, 7115692 responses
  Throughput: 474.25K req/s
  Bandwidth:  125.84MB/s
  Status codes: 2xx=6509316, 3xx=0, 4xx=606376, 5xx=0
  Latency samples: 7115687 / 7115692 responses (100.0%)
  Reconnects: 34961
  Per-template: 162358,201989,241815,281815,321331,359703,396073,429781,463084,496099,525322,552137,572011,589915,603142,618808,53157,42970,82164,122013
  Per-template-ok: 94122,180490,218083,256513,295031,333224,369328,402614,436314,469343,498335,524912,544656,562727,575699,591660,53157,15410,30980,56713

  WARNING: 606376/7115692 responses (8.5%) had unexpected status (expected 2xx)
[info] CPU 1523.7% | Mem 464MiB

[run 3/3]
gcannon v0.5.3
  Target:    localhost:8080/
  Threads:   64
  Conns:     4096 (64/thread)
  Pipeline:  1
  Req/conn:  200
  Templates: 20
  Expected:  200
  Duration:  15s


  Thread Stats   Avg      p50      p90      p99    p99.9
    Latency   6.23ms   1.53ms   12.00ms   89.20ms   232.80ms

  7075610 requests in 15.00s, 7075610 responses
  Throughput: 471.63K req/s
  Bandwidth:  125.95MB/s
  Status codes: 2xx=6471072, 3xx=0, 4xx=604538, 5xx=0
  Latency samples: 7075601 / 7075610 responses (100.0%)
  Reconnects: 34733
  Per-template: 163346,202599,241988,282300,321334,357975,393378,429522,462709,492216,517431,542185,562900,582500,598964,609366,63338,45423,83241,122886
  Per-template-ok: 95143,181446,218314,257007,295312,332044,366980,402546,436002,465364,490317,515017,536000,555557,571853,582074,63338,17762,31291,57699

  WARNING: 604538/7075610 responses (8.5%) had unexpected status (expected 2xx)
[info] CPU 1517.9% | Mem 562MiB

=== Best: 433954 req/s (CPU: 1523.7%, Mem: 464MiB) ===
[info] input BW: 37.25MB/s (avg template: 90 bytes)
[info] saved results/crud/4096/vanilla-epoll.json
httparena-bench-vanilla-epoll
httparena-bench-vanilla-epoll

==============================================
=== vanilla-epoll / fortunes / 1024c (tool=gcannon) ===
==============================================
[info] resetting postgres for a clean per-profile baseline
[info] starting postgres sidecar
httparena-postgres
[info] postgres ready (seeded)
[info] waiting for server...
[info] server ready

[run 1/3]
gcannon v0.5.3
  Target:    localhost:8080/fortunes
  Threads:   64
  Conns:     1024 (16/thread)
  Pipeline:  1
  Req/conn:  unlimited (keep-alive)
  Expected:  200
  Duration:  5s


  Thread Stats   Avg      p50      p90      p99    p99.9
    Latency   3.65ms   4.06ms   4.26ms   8.02ms   11.20ms

  4400 requests in 5.00s, 4400 responses
  Throughput: 879 req/s
  Bandwidth:  20.86MB/s
  Status codes: 2xx=4400, 3xx=0, 4xx=0, 5xx=0
  Latency samples: 4400 / 4400 responses (100.0%)
[info] CPU 158.6% | Mem 60MiB

[run 2/3]
gcannon v0.5.3
  Target:    localhost:8080/fortunes
  Threads:   64
  Conns:     1024 (16/thread)
  Pipeline:  1
  Req/conn:  unlimited (keep-alive)
  Expected:  200
  Duration:  5s


  Thread Stats   Avg      p50      p90      p99    p99.9
    Latency   2.75ms   3.64ms   4.39ms   7.57ms   8.29ms

  4497 requests in 5.00s, 4497 responses
  Throughput: 899 req/s
  Bandwidth:  21.32MB/s
  Status codes: 2xx=4497, 3xx=0, 4xx=0, 5xx=0
  Latency samples: 4497 / 4497 responses (100.0%)
[info] CPU 172.7% | Mem 71MiB

[run 3/3]
gcannon v0.5.3
  Target:    localhost:8080/fortunes
  Threads:   64
  Conns:     1024 (16/thread)
  Pipeline:  1
  Req/conn:  unlimited (keep-alive)
  Expected:  200
  Duration:  5s


  Thread Stats   Avg      p50      p90      p99    p99.9
    Latency   12.65ms   3.83ms   4.12ms   8.30ms    3.26s

  3941 requests in 5.00s, 3941 responses
  Throughput: 787 req/s
  Bandwidth:  18.69MB/s
  Status codes: 2xx=3941, 3xx=0, 4xx=0, 5xx=0
  Latency samples: 3941 / 3941 responses (100.0%)
[info] CPU 158.7% | Mem 72MiB

=== Best: 899 req/s (CPU: 172.7%, Mem: 71MiB) ===
[info] saved results/fortunes/1024/vanilla-epoll.json
httparena-bench-vanilla-epoll
httparena-bench-vanilla-epoll
[info] skip: vanilla-epoll does not subscribe to baseline-h2
[info] skip: vanilla-epoll does not subscribe to static-h2
[info] skip: vanilla-epoll does not subscribe to baseline-h2c
[info] skip: vanilla-epoll does not subscribe to json-h2c
[info] skip: vanilla-epoll does not subscribe to baseline-h3
[info] skip: vanilla-epoll does not subscribe to static-h3
[info] skip: vanilla-epoll does not subscribe to gateway-64
[info] skip: vanilla-epoll does not subscribe to gateway-h3
[info] skip: vanilla-epoll does not subscribe to production-stack
[info] skip: vanilla-epoll does not subscribe to unary-grpc
[info] skip: vanilla-epoll does not subscribe to unary-grpc-tls
[info] skip: vanilla-epoll does not subscribe to stream-grpc
[info] skip: vanilla-epoll does not subscribe to stream-grpc-tls
[info] skip: vanilla-epoll does not subscribe to echo-ws
[info] skip: vanilla-epoll does not subscribe to echo-ws-pipeline
[info] rebuilding site/data/*.json
[updated] /home/diogo/actions-runner/_work/HttpArena/HttpArena/site/data/frameworks.json
[updated] /home/diogo/actions-runner/_work/HttpArena/HttpArena/site/data/api-16-1024.json
[updated] /home/diogo/actions-runner/_work/HttpArena/HttpArena/site/data/api-4-256.json
[updated] /home/diogo/actions-runner/_work/HttpArena/HttpArena/site/data/async-db-1024.json
[updated] /home/diogo/actions-runner/_work/HttpArena/HttpArena/site/data/baseline-4096.json
[updated] /home/diogo/actions-runner/_work/HttpArena/HttpArena/site/data/baseline-512.json
[updated] /home/diogo/actions-runner/_work/HttpArena/HttpArena/site/data/crud-4096.json
[updated] /home/diogo/actions-runner/_work/HttpArena/HttpArena/site/data/fortunes-1024.json
[updated] /home/diogo/actions-runner/_work/HttpArena/HttpArena/site/data/json-4096.json
[updated] /home/diogo/actions-runner/_work/HttpArena/HttpArena/site/data/json-comp-16384.json
[updated] /home/diogo/actions-runner/_work/HttpArena/HttpArena/site/data/json-comp-4096.json
[updated] /home/diogo/actions-runner/_work/HttpArena/HttpArena/site/data/json-comp-512.json
[updated] /home/diogo/actions-runner/_work/HttpArena/HttpArena/site/data/limited-conn-4096.json
[updated] /home/diogo/actions-runner/_work/HttpArena/HttpArena/site/data/limited-conn-512.json
[updated] /home/diogo/actions-runner/_work/HttpArena/HttpArena/site/data/pipelined-4096.json
[updated] /home/diogo/actions-runner/_work/HttpArena/HttpArena/site/data/pipelined-512.json
[updated] /home/diogo/actions-runner/_work/HttpArena/HttpArena/site/data/static-1024.json
[updated] /home/diogo/actions-runner/_work/HttpArena/HttpArena/site/data/static-4096.json
[updated] /home/diogo/actions-runner/_work/HttpArena/HttpArena/site/data/static-6800.json
[updated] /home/diogo/actions-runner/_work/HttpArena/HttpArena/site/data/upload-256.json
[updated] /home/diogo/actions-runner/_work/HttpArena/HttpArena/site/data/upload-32.json
[updated] /home/diogo/actions-runner/_work/HttpArena/HttpArena/site/data/current.json
[info] done
httparena-postgres
httparena-redis
[info] restoring loopback MTU to 65536

#888) * vanilla: cache json-comp gzip responses + zero-alloc routing json-comp recompressed the gzip body on EVERY request even though the output for a given (count, m) is fully deterministic — and gzip CPU, not allocation, dominates that profile. Cache the COMPLETE gzipped response per (count, m) and append the cached copy on a hit (bounded map, RwMutex). The benchmark hits only a handful of (count, m) pairs, so the cache stays tiny. Also route on the path WITHOUT allocating: a tos() view into the request buffer instead of all_before('?')'s per-request string copy (one alloc per request on the hot path), shaving GC churn off baseline/json too. Local before/after (16-core loopback, gcannon, single listener): json-comp 58K -> 390K req/s (+570%, 6.7x) Correctness verified: gzip body decodes to the right items/count/total; the cached response is byte-identical across requests; all other routes unchanged. Applies to both the epoll and io_uring variants. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com> * vanilla: precompute query-key bytes + parse path ints in place Remove the remaining small per-request allocations on the hot path: • qint/qstr took a `string` key and called `key.bytes()` every request (one []u8 alloc per parameter — baseline parses a+b, async-db min+max+limit…). Keys are now precomputed `const []u8` (qk_*), built once at init. • /json/<n> and /crud/items/<id> parsed the id via route[n..].i64(), a substring copy. parse_u_at() reads the digits straight from the path view. Local before/after (16-core loopback) is within noise (baseline ~528K→530K, json ~206K→212K) — these allocs are tiny next to the response builder #866 removed — but allocation scaled hard on the 64-core arena (json +322% there), so this trims more GC churn for that environment at zero cost. Note: @[manualfree] is a no-op under the GC build the arena uses (`v -prod` = Boehm GC; manualfree only affects -autofree), so reducing allocations is the lever, not manualfree. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com> * Benchmark results: vanilla-epoll * vanilla: DB path — prepared statement (async-db) + single-pass HTML escape Folds the DB-path work into this PR so everything lands together: • async-db uses a PostgreSQL prepared statement (PQprepare/PQexecPrepared via db.pg, lazily prepared per pooled connection) instead of exec_param_many's per-request server-side SQL re-parse — local +9%. • escape_html (fortunes) does ONE pass with a no-alloc fast path instead of replace_each's five full-string passes — local +27% fortunes. DB profiles remain bound by the stdlib db.pg driver (text protocol), so this narrows the gap without closing it. Both backends. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com> * Benchmark results: vanilla-io_uring * vanilla: armor hot-path byte writes against the V `<<` regression Single-element array push (`arr << x`) is 4-7x slower on post-0.5.1 V (vlang/v#27468) while bulk push_many, allocation and indexed writes are unaffected. The two hot single-element `<<` sites are now bulk writes: - wi() built integer digits with `out << tmp[i]` per digit; it now itoa's back-to-front into the [20]u8 scratch and flushes with one push_many. - write_json_response() pushed the item separator `,` and closing `}` one byte at a time; the closing `}` is now fused with the separator into a single '},' / '}' push_many. Output is byte-identical (verified across counts 0..4096 and edge-value integers). This makes the JSON hot path fast on both the 0.5.1 release and current master, independent of the upstream codegen regression. Both epoll and io_uring backends. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com> * vanilla: build pinned V 0.5.1 from source (pinned vc bootstrap) Build V from source at the 0.5.1 tag instead of the prebuilt release zip. Plain `make` can't build an old tag: its latest_vc step `git pull`s the newest vlang/vc bootstrap, which no longer matches 0.5.1's vlib (fails with `unknown ident \`native\``). So pin vc to the commit cut for 0.5.1 (vlang/vc f461dfeb = "[v:master] 0c3183c - V 0.5.1") and run make's own bootstrap recipe (cc -> v1 -> v2 -> v). Drop curl/unzip from the build deps. Pinned by tag, not a master commit, because post-0.5.1 master carries a codegen regression (single-element array push 4-7x slower, vlang/v#27468). Both backends; verified the source-built compiler serves /json and /pipeline correctly. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com> * vanilla-epoll: serve static assets with sendfile(2) (zero-copy) The static handler copied each asset's full prebuilt response (up to ~300 KB) into the per-connection write_buf every request — a userspace copy plus a large *scanned* write_buf that grows the GC's stop-the-world cost at high conn counts (why vanilla sat ~4x behind nginx/swerver on the static profile). Preload each asset's fd once (O_RDONLY, page-cached, borrowed for the server's life) and a precomputed response head; serve the head into write_buf and stream the body zero-copy via core.queue_file (sendfile(2), already wired through the epoll backend's deferred-send + EPOLLOUT path). write_buf no longer grows, the body is never copied, and the kernel pushes file pages straight to the socket — the same model nginx and swerver use. Local (vendor.js 307 KB, 64c, wrk): 25.7K -> 59.3K req/s, 7.36 -> 16.97 GB/s (2.3x). Output verified byte-identical (md5) incl. keep-alive. epoll only. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com> * vanilla-epoll: answer /upload by Content-Length (engine drains large bodies) The lib now streams (drains) request bodies larger than 1 MiB instead of buffering them, so for a large upload req.body is empty — but the byte count the upload profile wants is the declared Content-Length. Answer by req.content_length() (falls back to the buffered body length when absent, which also covers small bodies that still take the buffered path). Depends on enghitalo/vanilla#31 (adds HttpRequest.content_length() + the engine drain); the Dockerfile clones lib main, so that PR must merge before this builds. Local (source-built V 0.5.1): upload single-conn 45 req/s / 907 MB/s, 32c 303 req/s / 6.1 GB/s — matching the top upload servers; RSS 14 MB (was ~1 GB buffering). epoll only. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com> * ci: re-trigger validate — vanilla main now provides HttpRequest.content_length() (drain #31 merged); the prior run cloned vanilla before it landed * refactor(vanilla-epoll): route write-buffer appends through wb()/push_many Replace the remaining `out << <[]u8>` appends (static header, error consts, the four crud_* results, and the json-comp gzip-cache hit/store) with a wb() helper that calls push_many, uniform with the existing ws/wi. The bit-shift `<<` in the gz-cache key is unrelated and kept as is. Note: V already lowers `array << array` to array_push_many, so this is codegen- neutral — a consistency / regression-safety change (the whole write path now takes push_many's fast path explicitly, robust if `<<` ever regresses for arrays the way the single-element path did, vlang/v#27468). The hot single-element `<<` was already armored by ws/wi. Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com> * feat(vanilla-epoll): async-db via the native pg_async driver (enghitalo/vanilla#32) Convert the framework from the blocking db.pg ConnectionPool to vanilla's native async Postgres driver (pg_async, vanilla#39) on the epoll async runtime. The DB endpoints now PARK on the PG socket (ac.watch) and resume in a continuation instead of blocking a worker thread per query — closing the async-db gap (#32). - ServerConfig: request_handler → async_handler + make_state. Each worker owns a per-worker pg_async.PgPool (no cross-worker sharing, no locks) plus its own cache-aside and json-comp caches; the dataset/prefixes/static assets stay shared read-only. - async-db, fortunes, crud (list/get/create/update) issue a query, park, and render in a single resume continuation that switches on a small per-request stash. crud_list folds page+total into ONE window-count query (count(*) OVER()) instead of two round-trips. crud_get keeps a per-worker cache-aside (X-Cache). - DB responses are now hand-built (ws/wi/wb), and JSONB (tags) is emitted RAW from its binary form — no json.encode reflection, no decode/re-encode. - Drops the db.pg dependency entirely, so the framework also builds on master V (master removed pg.ConnectionPool); the non-DB hot paths are unchanged. Validated on V master against PostgreSQL 18 (items 100k + fortune 199): every endpoint correct (async-db items incl. binary jsonb, sorted fortunes, crud list/get/create/update, X-Cache). Throughput: async-db ~14.3k rps @ 4.35ms p50; /json ~376k rps. Per-worker caches warm under load (a re-GET may MISS across workers under SO_REUSEPORT — by design, vs the old shared+mutex cache). Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com> * ci: re-trigger #877 — vanilla #40 merged (net-import 0.5.1 build fix) The previous run failed because the framework's Docker cloned vanilla main BEFORE the fix landed: V's `net` declares C.socket with typed-enum params on the 0.5.1 tag, clashing with http_server.socket's int C.socket (socket_tcp.c.v). vanilla PR #40 removes the net imports (socket_tcp → C.htons; pg_async → raw libc dial), verified to compile under `v -prod .` on the true 0.5.1 tag for both vanilla-epoll and vanilla-io_uring. This empty commit re-runs validate so it re-clones the fixed vanilla. Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com> * fix(ci): cache-bust the vanilla clone so the build picks up library fixes The Docker `RUN git clone … vanilla` layer was cached indefinitely on the self-hosted runner, so re-runs kept building against a STALE vanilla checkout — which is why #877 stayed red even after the build fix (vanilla #40) merged: the build never re-cloned to get it. Add `ADD https://api.github.com/.../refs/heads/main` before the clone in both vanilla Dockerfiles. The fetched ref (main's SHA) changes whenever vanilla main moves, invalidating this layer's cache and forcing a fresh clone. Adding the step also re-clones on this build (new layer structure), so it now picks up #40. Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com> * fix(vanilla-epoll): share the crud + json-comp caches across workers The crud cache-aside was per-worker (WorkerCtx), but validate.sh's crud check does two GETs to /crud/items/42 and requires X-Cache MISS then HIT. With SO_REUSEPORT the two requests land on different workers, so a per-worker cache returns MISS both times → validation fails. Move the cache-aside (and the json-comp gzip cache) into the process-shared `Shared` (renamed from SharedRO), guarded by RwMutexes since workers are separate threads — restoring the original shared-cache semantics. The async Postgres pool stays per-worker (make_state); only the caches are shared. Verified against the real pgdb-seed.sql + dataset.json: GET /crud/items/42 now returns MISS then HIT; async-db (count=limit), crud list (5 items, total 9986, page 1), and fortunes (202 <tr>) all match validate.sh's checks. Compiles under `v -prod .` on the true V 0.5.1 tag. Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com> * perf(vanilla-epoll): size the per-worker pool by usable cores, not host CPUs Pair with vanilla's cpuset-aware max_thread_pool_size: compute the per-worker Postgres pool size against core.max_thread_pool_size (usable cores) instead of runtime.nr_cpus() (host count). Under api-N the engine now spawns N workers, so per_worker = total/N gives a sane pool (e.g. 64/4=16, 64/16=4) instead of 64/128=1 — matching the async path's threads≈cores model. Experiment for #32: test whether removing the 128-on-N-cores oversubscription recovers the async-db / api-16 regression. Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com> * revert(vanilla-epoll): restore the sync db.pg DB path (drop the async pg_async conversion) Three clean, post-cache-bust benchmarks agree the native async pg_async path is a net loss on the arena's LOCAL low-latency DB profiles: epoll-async vs io_uring-sync showed sync winning api-4 ~4.9×, fortunes ~3.6×, api-16 ~1.6×, async-db ~1.2× (io_uring even handicapped by the cpuset change). The async path is bound by DB concurrency (pool conns) and never beats sync libpq's concurrency-via-threads for sub-ms queries; cpuset tuning only traded api-16 for api-4. The only async win was crud, which is cache-bound (skips the DB) — preserved by the sync framework too. Restore main.v to the pre-conversion sync version (d1a0e73): db.pg ConnectionPool + request_handler, keeping ALL the sync-path wins (pipelined, static via sendfile, upload streaming-drain, json-comp gzip cache, zero-alloc routing, shared X-Cache). pg_async stays in the vanilla library as a capability for the case it actually wins (latency-bound / network Postgres). The Dockerfile vanilla-clone cache-bust stays. Verified: builds under `v -prod .` on the V 0.5.1 tag against current vanilla main. Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com> * perf(vanilla-epoll): short-circuit /pipeline with a precomputed constant The pipelined profile (the arena's highest-RPS test, ~35M rps) is a fixed plaintext "ok". Match it on `target` immediately after the path is sliced and blit a precomputed full-response constant + return — before the '?'-scan, the route slice, and write_resp's 6-part piecewise header build. requests/pipeline.raw is `GET /pipeline` with no query, so the exact-match is correct; the now-redundant route=='/pipeline' arm is dropped from the dispatch chain. Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com> * perf(vanilla-epoll): right-size the fortunes render builder (32KB -> content) callgrind on the render path showed ~26% of its instructions were the zero-fill of strings.new_builder(32768) — a flat 32 KB block for a ~1.5 KB response, re- zeroed every request as the GC reuses it. Size it from the actual rows instead (160 + 96/row + message bytes); an outlier grows once. The other builders here were already content-sized. Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com> * perf(vanilla-epoll): re-enable async-db (pg_async park/resume) on flat-state runtime Restore the native-pg_async async DB path (per-worker PgPool via make_state; async_handler; submit -> ac.watch(pg fd) -> .suspend -> single on_db_ready continuation that pumps the result, renders by kind, releases the conn). This is the proven #32 conversion (was reverted at 84f3dc9 because it REGRESSED the arena on the old map-based reactor + per-request malloc), re-applied now that PR #41 replaced that with the flat fd-indexed reactor (no hashmap, no per-request alloc) — the overhead that sank it is gone. Why: fortunes (2,990 rps) / async-db (10,927) are capped at ~16-way concurrency by sync thread-per-core blocking (the 64-conn pool sits 75% idle; CPU ~460% = 11 cores idle, waiting on PG). Park/resume frees the worker to keep many queries in flight -> uses the whole pool -> the swerver (#1, 293k) model. This is the EXPERIMENT to confirm #41 turns the old regression into a win; needs an arena run. Keeps the recent sync wins (json-comp gzip cache, route-slice, /pipeline short-circuit). Builds clean on the flat-state runtime (post-0.5.1 local V). Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com> * fix(vanilla-epoll): process-shared crud + json-comp caches (async X-Cache) validate.sh failed `[crud cache-aside]: expected MISS then HIT, got MISS MISS`: restoring the #32 async conversion brought back per-worker caches, but SO_REUSEPORT routes the two probe GETs to different workers, so each MISSes its cold cache. Move the crud + gz caches out of per-worker WorkerCtx into the shared SharedRO, mutex-guarded (RwMutex) — the same process-shared model the sync path uses. Pool stays per-worker (no lock). Builds clean; this unblocks the async benchmark. Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com> * fix(vanilla-epoll): un-starve the async pool (drop the >8 clamp) + decode_into The #884 async-db regression was NOT the sub-ms-PG crossover — it was pool starvation + load-shedding (per the regression analysis). DATABASE_MAX_CONN=256 across 16 workers should give 16 conns/worker, but a `min(8)` clamp forced 8 → only 128 of the 256 budget used. With one-in-flight-per-conn and closed-loop load (~64 client conns/worker), the 8-slot ceiling is hit constantly; park() then SHEDS the overflow as an empty 200, so the closed-loop clients spin and real throughput collapses to ~1 core (async-db -30%, fortunes -77%, api -75%). crud was unaffected (+208%) only because it is cache-HIT served and never touches the pool. Fixes: 1. Drop the >8 clamp → use the full 256 budget = 16 conns/worker (2x in-flight), sized to Postgres max_connections. 2. Adopt request_parser.decode_into (no `!HttpRequest` boxing, ~13% of parse) — the same no-boxing entry the sync build now uses; recovers the json-comp/json non-DB delta that the async dispatch path was paying. Builds clean (pg_async, post-0.5.1 local V; vanilla #44 with decode_into is now in main). Follow-ups (not here): queue on pool-full instead of shedding empty 200s; hoist AsyncCtx out of the async_drain per-request loop (vanilla). Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com> * feat(vanilla-epoll): pipeline async-db queries across requests (increment 3) Wire the framework onto vanilla's cross-request pipelining. park() now picks the least-loaded connection via acquire_pipelined() (shed only when all conns are at the max_inflight cap) instead of acquire()'s one-in-flight-per-conn — the latter starved the per-worker pool (2 conns/worker on the arena box ⇒ ceiling of 2, then park sheds the overflow as empty 200s; PR #884's regression). async_submit's shed-on-full bool is now checked. on_db_ready drops the exclusive release: a pipelined connection is not held exclusively, its in-flight slot frees when async_on_readable pops the reply, and the reactor (per-fd watch queue) runs the connection's parked requests front-first so the FIFO reply aligns with each request's Stash. async-db/fortunes/crud-list all flow through park(), so all gain conns×N concurrency. Builds against the local pipelining vanilla. Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com> * ci(vanilla-epoll): pin vanilla to the pipelining branch to bench before merge TEMPORARY: point the Dockerfile clone + cache-bust ADD at vanilla's feat/pg-async-pipelining (PR #45) so this arena PR builds against the cross-request pipelining library and can benchmark before #45 merges. Revert to refs/heads/main once #45 lands. Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com> * perf(vanilla-epoll): restore /pipeline skip-decode fast path Local profiling (callgrind on a gcc-built binary, isolated gcannon) showed this branch lost the `has_pipeline_prefix` fast path the #877 branch had: handle() ran decode_into + parse_http1_request_line on EVERY request, so the highest-RPS /pipeline test paid the full HTTP parse (~55% of the per-request CPU; the in-handle parse alone ~17%) for a fixed response. Restore it: match the raw `GET /pipeline ` prefix and blit pipeline_resp BEFORE any parsing. The request is already framed by the reactor, so decode adds nothing here. After: the per-request /pipeline profile has ZERO parse functions, and local throughput rose from 90% to 96% of the bare-C epoll floor (gc none). The remaining gap + the boehm-vs-gc-none delta is per-request allocation/GC (next). Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com> * ci(vanilla-epoll): build against vanilla main (pipelining + recv-path merged) vanilla#45 (cross-request pipelining: driver + reactor + pool + the alloc-free recv path) is merged to vanilla main, so drop the temporary feat/pg-async-pipelining pin and clone main again (with the main-ref cache-bust). Keeps -gc none. The async-db +329% / pipelined +1622% results were against this exact code (main now == the merged branch), so no re-bench needed. Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com> * perf(vanilla-epoll): arm pooled PG conns with watch_persistent Park DB queries on the pooled connection via ac.watch_persistent so a client disconnecting mid-query no longer closes (and forces a reconnect + re-auth on) the pooled connection: the runtime drains the orphaned reply in order and keeps the conn open for reuse. Both the initial park and the not-ready re-arm use it (the single-watch path resets the slot, so the re-arm must re-stamp the pool-owned flag). Depends on vanilla watch_persistent (enghitalo/vanilla#47); land after it merges + vanilla main is updated. Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com> * perf(vanilla-epoll): render DB responses into a reused per-worker buffer render_async_db / render_fortunes / render_crud_list each allocated a fresh response-body []u8 (4 KiB / 32 KiB / 8 KiB) per request. The binary ships `-gc none`, so those buffers are never freed — a multi-GiB leak under DB load (async-db measured ~12 KiB/request total, tens of GiB on the arena). Build the body in a single per-worker scratch buffer (WorkerCtx.scratch), reset to len 0 each response; it grows to a high-water mark then stays. Safe because a worker serves one request at a time (no concurrency). Paired with the pg_async per-connection frames-buffer pool, this takes async-db from 11,971 -> 1,263 bytes/request leaked under `-gc none` (-89.5%) locally; the Boehm build is dead flat (0 B/req, 41 MiB) and `-gc none` is now FASTER than Boehm on async-db (48.8K vs 44.2K req/s) since it no longer thrashes an ever-growing heap. (render_fortunes still allocates its Fortune vector + per-row message copies; that and the residual ~1.3 KiB/request of async-db allocs are a follow-up.) Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com> * perf(vanilla-epoll): zero-alloc DB request path (Stage B, -gc none) Eliminate the remaining per-request heap allocations on the DB routes (which leak under the binary's `-gc none` build). With the pg_async/request_parser Stage B (reused submit scratch, no .bytes() in the wire builders, no-alloc query parse), async-db drops from 1,263 -> 159 bytes/request (11,971 -> 159 across Stage A+B, -98.7%); the Boehm build is dead flat. - Bind params: replace the per-request `[?[]u8(x.str().bytes()), ...]` literals in every start_* with reused per-worker buffers — param_scratch (int params as decimal bytes) + params_buf (the []?[]u8), refilled via push_int/push_bytes. The borrowed slices are copied by write_bind synchronously inside park, so they never outlive the call. param_scratch cap (256) ≫ worst case (5×20) so it never reallocates mid-request (which would dangle already-pushed slices). - Query parse: qint parses i64 in place (parse_i64_slice); qstr_slice returns a borrowed []u8 view instead of .clone(). Shed-path fallbacks are module consts (no per-request `.bytes()`). - Stash: a per-worker free-list (stash_pool) instead of `&Stash{}` per request; returned only on the terminal .done path — never on the not-ready re-arm, where it stays live as the watch udata (incl. a FIX 3 dead tombstone). Statement form for the borrow (a `&Struct{}` if-expression branch miscompiles under -g, #27485). - /fortunes: reused fortunes_buf with BORROWED message views (no bytestr().clone()), an explicit byte comparator for the sort, and escape_html_into that escapes directly into the render scratch (no Builder/string per row). Gated: all routes correct vs PG18 (async-db, fortunes sort+escape incl. the <script> XSS row, crud list/get/create/update round-trip, cache MISS->HIT, baseline11). Builds clean -prod and -g. Needs the pg_async/request_parser Stage B (enghitalo/vanilla); land after that merges. Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com> * fix(vanilla-epoll): handle i64::MIN in wi() integer formatter wi()'s itoa negated n into a signed accumulator (`x = -x`), which overflows for i64::MIN — its magnitude isn't representable as i64 — leaving x negative so the digit loop never ran and only '-' was emitted. Build the magnitude in u64 instead (-(n+1)+1, with the +1 done in u64). Unreachable from current routes (ids are 32-bit, query ints are clamped) so this is not a live bug, but it's a latent correctness hole in a general integer helper. Verified against i64::MIN/MAX, 0, and assorted +/- values. Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com> * perf(vanilla-epoll): zero-alloc int responses on /baseline11 and /upload Both built their body with `n.str()` — an int->string heap allocation on every request that leaks under -gc none. callgrind on /baseline11 showed 1.002 allocs/request, all impl_i64_to_string; at 3.4M RPS that path alone was ~6 GiB RSS in the arena. Format the int into the reused per-worker scratch via a new emit_int() helper instead (the same render-scratch pattern the DB paths use). The read DB paths (async-db, fortunes, crud-list) are already callgrind-clean per request; this closes the general/plaintext response path. Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com> * fix(vanilla-epoll): 503 (not 400/404) when the crud DB pool sheds under load Under DB-pipeline saturation, park() returns the caller's fallback. The read paths shed to a benign empty 200, but crud create/update/get shed to bad_request (400) / not_found (404) — misreporting a backpressure shed as a client error. At arena scale (4096 conns) this surfaced as ~1.4% "unexpected status" on crud, while every other framework AND the previously-recorded vanilla-epoll showed 0 with the SAME gcannon (its requests are well-formed — confirmed by faithful fixture replay: 100% 2xx at 16 and 96 threads, fresh-conn and keep-alive+reconnect; reproduced the 400 only by forcing pool saturation with DATABASE_MAX_CONN=1). Only the shed fallback for the three crud write/get paths becomes 503 Service Unavailable — the honest backpressure status. Genuine 400 (malformed JSON body) and 404 (missing item) are unchanged. Reducing the shed itself (pool / max_inflight capacity, which trades memory) and a holistic shed policy (read paths also shed to an empty 200) are deferred to a measured investigation tracked upstream in vanilla. Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com> * perf+fix(vanilla-epoll): zero-alloc chunked-body parse on /baseline11 (reused scratch) /baseline11's chunked-POST path parsed the body with `dechunk(s string) string`, which built a strings.Builder + .str() (and strconv_hex's trim_space) per request — a permanent leak under -gc none. In the arena baseline mix (1/3 of the templates are chunked POSTs, at ~3.8M req/s) this was the ~6 GiB RSS the #888 re-bench still showed after emit_int closed the GET path. Replace with `(mut w WorkerCtx) body_int()` dechunking into a reused per-worker scratch (WorkerCtx.dechunk_buf): byte-walk the chunked region (dechunk_into), append data bytes via push_many, parse the integer in place (parse_i64_slice). No allocation in steady state. Verified byte-identical to the old dechunk on valid bodies (single/multi-chunk, hex/uppercase sizes, 0x100, chunk-extensions) and safe on malformed input. Also hardens a latent OOB read / DoS: the chunk-size range check is now overflow-safe — `size > end - data_start` instead of `data_start + size > end`, which wraps i32 for a crafted chunk size near 0x7fffffff, slipping past the guard and feeding a ~2 GiB out-of-bounds read into push_many (remotely-triggerable worker segfault). The old dechunk hit the same input as a bounds-checked panic; this makes it a controlled, bounded reject. parse_hex_slice now accumulates in i64 and saturates so the size value itself can't wrap. A 5-lens adversarial review (correctness, scratch-reuse lifetime, zero-alloc, method/caller parity) returned GO after this fix. Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com> * Benchmark results: vanilla-epoll --------- Co-authored-by: Claude Opus 4.8 (1M context) <noreply@anthropic.com> Co-authored-by: github-actions[bot] <github-actions[bot]@users.noreply.github.com>

enghitalo and others added 20 commits June 15, 2026 09:20

Benchmark results: vanilla-epoll

c7dd2ee

Benchmark results: vanilla-io_uring

abf558e

ci: re-trigger validate — vanilla main now provides HttpRequest.conte…

705c0ca

…nt_length() (drain MDA2AV#31 merged); the prior run cloned vanilla before it landed

enghitalo requested review from Kaliumhexacyanoferrat and MDA2AV as code owners June 17, 2026 11:13

enghitalo marked this pull request as draft June 17, 2026 12:24

This was referenced Jun 17, 2026

perf(vanilla-epoll): cross-request pipelining for async-db enghitalo/HttpArena#1

Closed

vanilla-epoll: cross-request pipelining for async-db (stacked on #884) #888

Merged

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

vanilla-epoll: re-enable async-db (pg_async park/resume) on the flat-state runtime#884

vanilla-epoll: re-enable async-db (pg_async park/resume) on the flat-state runtime#884
enghitalo wants to merge 22 commits into
MDA2AV:mainfrom
enghitalo:perf/vanilla-async-db-flatstate

enghitalo commented Jun 17, 2026

Uh oh!

enghitalo commented Jun 17, 2026

Uh oh!

github-actions Bot commented Jun 17, 2026

Uh oh!

enghitalo commented Jun 17, 2026

Uh oh!

github-actions Bot commented Jun 17, 2026

Uh oh!

github-actions Bot commented Jun 17, 2026

Uh oh!

enghitalo commented Jun 17, 2026

Uh oh!

github-actions Bot commented Jun 17, 2026

Uh oh!

github-actions Bot commented Jun 17, 2026

Uh oh!

github-actions Bot commented Jun 17, 2026

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

1 participant

Conversation

enghitalo commented Jun 17, 2026

What

Why

Why now (it was reverted before)

Notes for review

What to watch in the benchmark

Uh oh!

enghitalo commented Jun 17, 2026

Uh oh!

github-actions Bot commented Jun 17, 2026

Uh oh!

enghitalo commented Jun 17, 2026

Uh oh!

github-actions Bot commented Jun 17, 2026

Uh oh!

github-actions Bot commented Jun 17, 2026

Benchmark Results

Uh oh!

enghitalo commented Jun 17, 2026

Uh oh!

github-actions Bot commented Jun 17, 2026

Uh oh!

github-actions Bot commented Jun 17, 2026

Benchmark Results

Uh oh!

github-actions Bot commented Jun 17, 2026

Benchmark Results

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

1 participant