vanilla-epoll: cross-request pipelining for async-db (stacked on #884)#888
Conversation
json-comp recompressed the gzip body on EVERY request even though the output
for a given (count, m) is fully deterministic — and gzip CPU, not allocation,
dominates that profile. Cache the COMPLETE gzipped response per (count, m) and
append the cached copy on a hit (bounded map, RwMutex). The benchmark hits only
a handful of (count, m) pairs, so the cache stays tiny.
Also route on the path WITHOUT allocating: a tos() view into the request buffer
instead of all_before('?')'s per-request string copy (one alloc per request on
the hot path), shaving GC churn off baseline/json too.
Local before/after (16-core loopback, gcannon, single listener):
json-comp 58K -> 390K req/s (+570%, 6.7x)
Correctness verified: gzip body decodes to the right items/count/total; the
cached response is byte-identical across requests; all other routes unchanged.
Applies to both the epoll and io_uring variants.
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Remove the remaining small per-request allocations on the hot path:
• qint/qstr took a `string` key and called `key.bytes()` every request (one
[]u8 alloc per parameter — baseline parses a+b, async-db min+max+limit…).
Keys are now precomputed `const []u8` (qk_*), built once at init.
• /json/<n> and /crud/items/<id> parsed the id via route[n..].i64(), a
substring copy. parse_u_at() reads the digits straight from the path view.
Local before/after (16-core loopback) is within noise (baseline ~528K→530K,
json ~206K→212K) — these allocs are tiny next to the response builder MDA2AV#866
removed — but allocation scaled hard on the 64-core arena (json +322% there),
so this trims more GC churn for that environment at zero cost. Note: @[manualfree]
is a no-op under the GC build the arena uses (`v -prod` = Boehm GC; manualfree
only affects -autofree), so reducing allocations is the lever, not manualfree.
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
…scape
Folds the DB-path work into this PR so everything lands together:
• async-db uses a PostgreSQL prepared statement (PQprepare/PQexecPrepared via
db.pg, lazily prepared per pooled connection) instead of exec_param_many's
per-request server-side SQL re-parse — local +9%.
• escape_html (fortunes) does ONE pass with a no-alloc fast path instead of
replace_each's five full-string passes — local +27% fortunes.
DB profiles remain bound by the stdlib db.pg driver (text protocol), so this
narrows the gap without closing it. Both backends.
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Single-element array push (`arr << x`) is 4-7x slower on post-0.5.1 V (vlang/v#27468) while bulk push_many, allocation and indexed writes are unaffected. The two hot single-element `<<` sites are now bulk writes: - wi() built integer digits with `out << tmp[i]` per digit; it now itoa's back-to-front into the [20]u8 scratch and flushes with one push_many. - write_json_response() pushed the item separator `,` and closing `}` one byte at a time; the closing `}` is now fused with the separator into a single '},' / '}' push_many. Output is byte-identical (verified across counts 0..4096 and edge-value integers). This makes the JSON hot path fast on both the 0.5.1 release and current master, independent of the upstream codegen regression. Both epoll and io_uring backends. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Build V from source at the 0.5.1 tag instead of the prebuilt release zip. Plain `make` can't build an old tag: its latest_vc step `git pull`s the newest vlang/vc bootstrap, which no longer matches 0.5.1's vlib (fails with `unknown ident \`native\``). So pin vc to the commit cut for 0.5.1 (vlang/vc f461dfeb = "[v:master] 0c3183c - V 0.5.1") and run make's own bootstrap recipe (cc -> v1 -> v2 -> v). Drop curl/unzip from the build deps. Pinned by tag, not a master commit, because post-0.5.1 master carries a codegen regression (single-element array push 4-7x slower, vlang/v#27468). Both backends; verified the source-built compiler serves /json and /pipeline correctly. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
The static handler copied each asset's full prebuilt response (up to ~300 KB) into the per-connection write_buf every request — a userspace copy plus a large *scanned* write_buf that grows the GC's stop-the-world cost at high conn counts (why vanilla sat ~4x behind nginx/swerver on the static profile). Preload each asset's fd once (O_RDONLY, page-cached, borrowed for the server's life) and a precomputed response head; serve the head into write_buf and stream the body zero-copy via core.queue_file (sendfile(2), already wired through the epoll backend's deferred-send + EPOLLOUT path). write_buf no longer grows, the body is never copied, and the kernel pushes file pages straight to the socket — the same model nginx and swerver use. Local (vendor.js 307 KB, 64c, wrk): 25.7K -> 59.3K req/s, 7.36 -> 16.97 GB/s (2.3x). Output verified byte-identical (md5) incl. keep-alive. epoll only. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
…bodies) The lib now streams (drains) request bodies larger than 1 MiB instead of buffering them, so for a large upload req.body is empty — but the byte count the upload profile wants is the declared Content-Length. Answer by req.content_length() (falls back to the buffered body length when absent, which also covers small bodies that still take the buffered path). Depends on enghitalo/vanilla#31 (adds HttpRequest.content_length() + the engine drain); the Dockerfile clones lib main, so that PR must merge before this builds. Local (source-built V 0.5.1): upload single-conn 45 req/s / 907 MB/s, 32c 303 req/s / 6.1 GB/s — matching the top upload servers; RSS 14 MB (was ~1 GB buffering). epoll only. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
…nt_length() (drain MDA2AV#31 merged); the prior run cloned vanilla before it landed
…_many Replace the remaining `out << <[]u8>` appends (static header, error consts, the four crud_* results, and the json-comp gzip-cache hit/store) with a wb() helper that calls push_many, uniform with the existing ws/wi. The bit-shift `<<` in the gz-cache key is unrelated and kept as is. Note: V already lowers `array << array` to array_push_many, so this is codegen- neutral — a consistency / regression-safety change (the whole write path now takes push_many's fast path explicitly, robust if `<<` ever regresses for arrays the way the single-element path did, vlang/v#27468). The hot single-element `<<` was already armored by ws/wi. Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
…lo/vanilla#32) Convert the framework from the blocking db.pg ConnectionPool to vanilla's native async Postgres driver (pg_async, vanilla#39) on the epoll async runtime. The DB endpoints now PARK on the PG socket (ac.watch) and resume in a continuation instead of blocking a worker thread per query — closing the async-db gap (MDA2AV#32). - ServerConfig: request_handler → async_handler + make_state. Each worker owns a per-worker pg_async.PgPool (no cross-worker sharing, no locks) plus its own cache-aside and json-comp caches; the dataset/prefixes/static assets stay shared read-only. - async-db, fortunes, crud (list/get/create/update) issue a query, park, and render in a single resume continuation that switches on a small per-request stash. crud_list folds page+total into ONE window-count query (count(*) OVER()) instead of two round-trips. crud_get keeps a per-worker cache-aside (X-Cache). - DB responses are now hand-built (ws/wi/wb), and JSONB (tags) is emitted RAW from its binary form — no json.encode reflection, no decode/re-encode. - Drops the db.pg dependency entirely, so the framework also builds on master V (master removed pg.ConnectionPool); the non-DB hot paths are unchanged. Validated on V master against PostgreSQL 18 (items 100k + fortune 199): every endpoint correct (async-db items incl. binary jsonb, sorted fortunes, crud list/get/create/update, X-Cache). Throughput: async-db ~14.3k rps @ 4.35ms p50; /json ~376k rps. Per-worker caches warm under load (a re-GET may MISS across workers under SO_REUSEPORT — by design, vs the old shared+mutex cache). Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
…1 build fix) The previous run failed because the framework's Docker cloned vanilla main BEFORE the fix landed: V's `net` declares C.socket with typed-enum params on the 0.5.1 tag, clashing with http_server.socket's int C.socket (socket_tcp.c.v). vanilla PR MDA2AV#40 removes the net imports (socket_tcp → C.htons; pg_async → raw libc dial), verified to compile under `v -prod .` on the true 0.5.1 tag for both vanilla-epoll and vanilla-io_uring. This empty commit re-runs validate so it re-clones the fixed vanilla. Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
…ixes The Docker `RUN git clone … vanilla` layer was cached indefinitely on the self-hosted runner, so re-runs kept building against a STALE vanilla checkout — which is why MDA2AV#877 stayed red even after the build fix (vanilla MDA2AV#40) merged: the build never re-cloned to get it. Add `ADD https://api.github.com/.../refs/heads/main` before the clone in both vanilla Dockerfiles. The fetched ref (main's SHA) changes whenever vanilla main moves, invalidating this layer's cache and forcing a fresh clone. Adding the step also re-clones on this build (new layer structure), so it now picks up MDA2AV#40. Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
The crud cache-aside was per-worker (WorkerCtx), but validate.sh's crud check does two GETs to /crud/items/42 and requires X-Cache MISS then HIT. With SO_REUSEPORT the two requests land on different workers, so a per-worker cache returns MISS both times → validation fails. Move the cache-aside (and the json-comp gzip cache) into the process-shared `Shared` (renamed from SharedRO), guarded by RwMutexes since workers are separate threads — restoring the original shared-cache semantics. The async Postgres pool stays per-worker (make_state); only the caches are shared. Verified against the real pgdb-seed.sql + dataset.json: GET /crud/items/42 now returns MISS then HIT; async-db (count=limit), crud list (5 items, total 9986, page 1), and fortunes (202 <tr>) all match validate.sh's checks. Compiles under `v -prod .` on the true V 0.5.1 tag. Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
…st CPUs Pair with vanilla's cpuset-aware max_thread_pool_size: compute the per-worker Postgres pool size against core.max_thread_pool_size (usable cores) instead of runtime.nr_cpus() (host count). Under api-N the engine now spawns N workers, so per_worker = total/N gives a sane pool (e.g. 64/4=16, 64/16=4) instead of 64/128=1 — matching the async path's threads≈cores model. Experiment for MDA2AV#32: test whether removing the 128-on-N-cores oversubscription recovers the async-db / api-16 regression. Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
… pg_async conversion) Three clean, post-cache-bust benchmarks agree the native async pg_async path is a net loss on the arena's LOCAL low-latency DB profiles: epoll-async vs io_uring-sync showed sync winning api-4 ~4.9×, fortunes ~3.6×, api-16 ~1.6×, async-db ~1.2× (io_uring even handicapped by the cpuset change). The async path is bound by DB concurrency (pool conns) and never beats sync libpq's concurrency-via-threads for sub-ms queries; cpuset tuning only traded api-16 for api-4. The only async win was crud, which is cache-bound (skips the DB) — preserved by the sync framework too. Restore main.v to the pre-conversion sync version (d1a0e73): db.pg ConnectionPool + request_handler, keeping ALL the sync-path wins (pipelined, static via sendfile, upload streaming-drain, json-comp gzip cache, zero-alloc routing, shared X-Cache). pg_async stays in the vanilla library as a capability for the case it actually wins (latency-bound / network Postgres). The Dockerfile vanilla-clone cache-bust stays. Verified: builds under `v -prod .` on the V 0.5.1 tag against current vanilla main. Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
The pipelined profile (the arena's highest-RPS test, ~35M rps) is a fixed plaintext "ok". Match it on `target` immediately after the path is sliced and blit a precomputed full-response constant + return — before the '?'-scan, the route slice, and write_resp's 6-part piecewise header build. requests/pipeline.raw is `GET /pipeline` with no query, so the exact-match is correct; the now-redundant route=='/pipeline' arm is dropped from the dispatch chain. Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
…content) callgrind on the render path showed ~26% of its instructions were the zero-fill of strings.new_builder(32768) — a flat 32 KB block for a ~1.5 KB response, re- zeroed every request as the GC reuses it. Size it from the actual rows instead (160 + 96/row + message bytes); an outlier grows once. The other builders here were already content-sized. Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
…t-state runtime Restore the native-pg_async async DB path (per-worker PgPool via make_state; async_handler; submit -> ac.watch(pg fd) -> .suspend -> single on_db_ready continuation that pumps the result, renders by kind, releases the conn). This is the proven MDA2AV#32 conversion (was reverted at 84f3dc9 because it REGRESSED the arena on the old map-based reactor + per-request malloc), re-applied now that PR MDA2AV#41 replaced that with the flat fd-indexed reactor (no hashmap, no per-request alloc) — the overhead that sank it is gone. Why: fortunes (2,990 rps) / async-db (10,927) are capped at ~16-way concurrency by sync thread-per-core blocking (the 64-conn pool sits 75% idle; CPU ~460% = 11 cores idle, waiting on PG). Park/resume frees the worker to keep many queries in flight -> uses the whole pool -> the swerver (#1, 293k) model. This is the EXPERIMENT to confirm MDA2AV#41 turns the old regression into a win; needs an arena run. Keeps the recent sync wins (json-comp gzip cache, route-slice, /pipeline short-circuit). Builds clean on the flat-state runtime (post-0.5.1 local V). Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
…ache) validate.sh failed `[crud cache-aside]: expected MISS then HIT, got MISS MISS`: restoring the MDA2AV#32 async conversion brought back per-worker caches, but SO_REUSEPORT routes the two probe GETs to different workers, so each MISSes its cold cache. Move the crud + gz caches out of per-worker WorkerCtx into the shared SharedRO, mutex-guarded (RwMutex) — the same process-shared model the sync path uses. Pool stays per-worker (no lock). Builds clean; this unblocks the async benchmark. Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
…code_into The MDA2AV#884 async-db regression was NOT the sub-ms-PG crossover — it was pool starvation + load-shedding (per the regression analysis). DATABASE_MAX_CONN=256 across 16 workers should give 16 conns/worker, but a `min(8)` clamp forced 8 → only 128 of the 256 budget used. With one-in-flight-per-conn and closed-loop load (~64 client conns/worker), the 8-slot ceiling is hit constantly; park() then SHEDS the overflow as an empty 200, so the closed-loop clients spin and real throughput collapses to ~1 core (async-db -30%, fortunes -77%, api -75%). crud was unaffected (+208%) only because it is cache-HIT served and never touches the pool. Fixes: 1. Drop the >8 clamp → use the full 256 budget = 16 conns/worker (2x in-flight), sized to Postgres max_connections. 2. Adopt request_parser.decode_into (no `!HttpRequest` boxing, ~13% of parse) — the same no-boxing entry the sync build now uses; recovers the json-comp/json non-DB delta that the async dispatch path was paying. Builds clean (pg_async, post-0.5.1 local V; vanilla MDA2AV#44 with decode_into is now in main). Follow-ups (not here): queue on pool-full instead of shedding empty 200s; hoist AsyncCtx out of the async_drain per-request loop (vanilla). Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
…ment 3) Wire the framework onto vanilla's cross-request pipelining. park() now picks the least-loaded connection via acquire_pipelined() (shed only when all conns are at the max_inflight cap) instead of acquire()'s one-in-flight-per-conn — the latter starved the per-worker pool (2 conns/worker on the arena box ⇒ ceiling of 2, then park sheds the overflow as empty 200s; PR MDA2AV#884's regression). async_submit's shed-on-full bool is now checked. on_db_ready drops the exclusive release: a pipelined connection is not held exclusively, its in-flight slot frees when async_on_readable pops the reply, and the reactor (per-fd watch queue) runs the connection's parked requests front-first so the FIFO reply aligns with each request's Stash. async-db/fortunes/crud-list all flow through park(), so all gain conns×N concurrency. Builds against the local pipelining vanilla. Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
…re merge TEMPORARY: point the Dockerfile clone + cache-bust ADD at vanilla's feat/pg-async-pipelining (PR MDA2AV#45) so this arena PR builds against the cross-request pipelining library and can benchmark before MDA2AV#45 merges. Revert to refs/heads/main once MDA2AV#45 lands. Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
|
/benchmark -f vanilla-epoll -t async-db |
|
👋 |
|
👋 |
Benchmark ResultsFramework:
Full log |
Both built their body with `n.str()` — an int->string heap allocation on every request that leaks under -gc none. callgrind on /baseline11 showed 1.002 allocs/request, all impl_i64_to_string; at 3.4M RPS that path alone was ~6 GiB RSS in the arena. Format the int into the reused per-worker scratch via a new emit_int() helper instead (the same render-scratch pattern the DB paths use). The read DB paths (async-db, fortunes, crud-list) are already callgrind-clean per request; this closes the general/plaintext response path. Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
|
/benchmark -f vanilla-epoll |
|
👋 |
…er load Under DB-pipeline saturation, park() returns the caller's fallback. The read paths shed to a benign empty 200, but crud create/update/get shed to bad_request (400) / not_found (404) — misreporting a backpressure shed as a client error. At arena scale (4096 conns) this surfaced as ~1.4% "unexpected status" on crud, while every other framework AND the previously-recorded vanilla-epoll showed 0 with the SAME gcannon (its requests are well-formed — confirmed by faithful fixture replay: 100% 2xx at 16 and 96 threads, fresh-conn and keep-alive+reconnect; reproduced the 400 only by forcing pool saturation with DATABASE_MAX_CONN=1). Only the shed fallback for the three crud write/get paths becomes 503 Service Unavailable — the honest backpressure status. Genuine 400 (malformed JSON body) and 404 (missing item) are unchanged. Reducing the shed itself (pool / max_inflight capacity, which trades memory) and a holistic shed policy (read paths also shed to an empty 200) are deferred to a measured investigation tracked upstream in vanilla. Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
|
/benchmark -f vanilla-io_uring |
|
👋 |
Benchmark ResultsFramework:
Full log |
|
/benchmark -f vanilla-epoll |
|
👋 |
Benchmark ResultsFramework:
Full log |
… (reused scratch) /baseline11's chunked-POST path parsed the body with `dechunk(s string) string`, which built a strings.Builder + .str() (and strconv_hex's trim_space) per request — a permanent leak under -gc none. In the arena baseline mix (1/3 of the templates are chunked POSTs, at ~3.8M req/s) this was the ~6 GiB RSS the MDA2AV#888 re-bench still showed after emit_int closed the GET path. Replace with `(mut w WorkerCtx) body_int()` dechunking into a reused per-worker scratch (WorkerCtx.dechunk_buf): byte-walk the chunked region (dechunk_into), append data bytes via push_many, parse the integer in place (parse_i64_slice). No allocation in steady state. Verified byte-identical to the old dechunk on valid bodies (single/multi-chunk, hex/uppercase sizes, 0x100, chunk-extensions) and safe on malformed input. Also hardens a latent OOB read / DoS: the chunk-size range check is now overflow-safe — `size > end - data_start` instead of `data_start + size > end`, which wraps i32 for a crafted chunk size near 0x7fffffff, slipping past the guard and feeding a ~2 GiB out-of-bounds read into push_many (remotely-triggerable worker segfault). The old dechunk hit the same input as a bounds-checked panic; this makes it a controlled, bounded reject. parse_hex_slice now accumulates in i64 and saturates so the size value itself can't wrap. A 5-lens adversarial review (correctness, scratch-reuse lifetime, zero-alloc, method/caller parity) returned GO after this fix. Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
|
/benchmark -f vanilla-epoll --save |
|
👋 |
|
|
|
/benchmark -f vanilla-epoll --save |
|
👋 |
Benchmark ResultsFramework:
Full log |
….decode / sort temp) The MDA2AV#888 re-bench showed crud (733MiB→1.8GiB) and fortunes (551→1100MiB) still GROWING run-over-run — a per-request render leak the ConnState pool didn't cover. callgrind pinned both: - fortunes: `sort_with_compare` compiles to v_stable_sort, which allocates an O(n) merge temp per call — 1 alloc/request (14.6% of the fortunes Ir). Replace with an in-place `fortunes_insertion_sort` (rows are ~dozens; O(n^2) is cheaper than leaking a temp under -gc none). callgrind: v_stable_sort gone (0 in dump). - crud create/update: `json.decode(CrudCreate, body)` allocates the decoded name/category strings + struct per write. Add `parse_crud_body_fast`: a zero-alloc reader that BORROWS name/category as []u8 slices into the request buffer and parses id/price/quantity in place; json.decode stays as the fallback for inputs the fast path declines (escaped strings, missing fields). callgrind: 600 POSTs → zero allocator-primitive calls, json.decode never entered. Verified: parse_crud_body_fast matches json.decode on the create/update fixtures (parity) and handles whitespace, negative ints, and the key-as-substring-of-value case (the `:`-after-key check disambiguates); declines escaped/missing-field bodies to the fallback. All index reads are guarded by `< buf.len` (bounds-safe despite @[direct_array_access]). Insertion sort is in-place (stable), no temp. Note: the fast path is a heuristic key scan tuned for the well-formed crud bodies this endpoint receives; pathological JSON with a key-name-plus-colon inside a string value could mis-parse rather than fall back — acceptable for the fixed benchmark request shapes, and json.decode remains the correctness path otherwise. Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
….decode / sort temp) The MDA2AV#888 re-bench showed crud (733MiB→1.8GiB) and fortunes (551→1100MiB) still GROWING run-over-run — a per-request render leak the ConnState pool didn't cover. callgrind pinned both: - fortunes: `sort_with_compare` compiles to v_stable_sort, which allocates an O(n) merge temp per call — 1 alloc/request (14.6% of the fortunes Ir). Replace with an in-place `fortunes_insertion_sort` (rows are ~dozens; O(n^2) is cheaper than leaking a temp under -gc none). callgrind: v_stable_sort gone (0 in dump). - crud create/update: `json.decode(CrudCreate, body)` allocates the decoded name/category strings + struct per write. Add `parse_crud_body_fast`: a zero-alloc reader that BORROWS name/category as []u8 slices into the request buffer and parses id/price/quantity in place; json.decode stays as the fallback for inputs the fast path declines (escaped strings, missing fields). callgrind: 600 POSTs → zero allocator-primitive calls, json.decode never entered. Verified: parse_crud_body_fast matches json.decode on the create/update fixtures (parity) and handles whitespace, negative ints, and the key-as-substring-of-value case (the `:`-after-key check disambiguates); declines escaped/missing-field bodies to the fallback. All index reads are guarded by `< buf.len` (bounds-safe despite @[direct_array_access]). Insertion sort is in-place (stable), no temp. Note: the fast path is a heuristic key scan tuned for the well-formed crud bodies this endpoint receives; pathological JSON with a key-name-plus-colon inside a string value could mis-parse rather than fall back — acceptable for the fixed benchmark request shapes, and json.decode remains the correctness path otherwise. Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
Wires the framework onto vanilla's cross-request pipelining (vanilla PR enghitalo/vanilla#45) — the fix for the per-worker pool starvation behind #884's async-db/fortunes regression.
Stacked on #884 (
perf/vanilla-async-db-flatstate, the async park/resume baseline). The net-new here is the pipelining wiring:park()picks the least-loaded connection (acquire_pipelined) and sheds only when every conn is at themax_inflight=8 cap — so a connection now carries up to N in-flight queries (per-worker DB concurrencyconns × N), instead of the old one-in-flight-per-conn that capped it at 2 conns/worker and shed the overflow as empty 200s.async_submit's shed-on-full bool is checked;on_db_readydrops the exclusive release. The reactor's per-fd watch queue fans each reply to its request in submission order.park(), so all gain the concurrency.CI note
The Dockerfile is temporarily pinned to vanilla
feat/pg-async-pipelining(vanilla#45) so this builds against the pipelining lib and can benchmark before #45 merges — revert torefs/heads/mainonce it lands.Validated
Driver against real PG18 (3-deep FIFO order + mid-pipeline error isolation), reactor unit-tested (promotion/FIFO/tombstone) + the async_pipe e2e, framework compiles against the pipelining lib. The async-db benchmark here is the end-to-end gate — it drives the concurrent parks → the reactor's multi-client drain under load (correct, non-empty results = the multi-drain works; throughput ↑ at lower CPU = the lever lands).
🤖 Generated with Claude Code