feat(crdb): experimental in-band query cancellation #3176

Draft: ecordell wants to merge 10 commits into authzed:main from ecordell:crdb-inband-cancel

@ecordell

Contributor

When a request context is canceled mid-query on the CRDB datastore, pgx's default behavior is to destroy the connection. Under load this drains the write pool and triggers the metastable death spiral described in #2576. A previous attempt to fix this using pgx's built-in CancelRequestContextWatcherHandler (pgwire cancel protocol) was reverted in #2434 because CRDB applies pgwire cancels asynchronously; a late-arriving cancel could kill the next query on the connection.

This PR implements in-band cancellation using CockroachDB's CANCEL QUERIES statement instead. A new pool.Canceler owns a small dedicated connection pool and a registry mapping each pooled write connection to its CRDB session_id (captured via SHOW session_id at connect). A custom ctxwatch.Handler fires on context cancellation and issues CANCEL QUERIES IF EXISTS (SELECT query_id FROM [SHOW CLUSTER STATEMENTS] WHERE session_id = '...') on a sibling connection, then blocks until that statement completes before releasing the original connection back to the pool. This sequencing eliminates the wrong-query race by construction: no cancellation can be in flight when the next query starts on the same session.

The feature is gated behind --datastore-experimental-crdb-query-cancellation (default off). When enabled, set --write-conn-acquisition-timeout=0; with connections no longer being destroyed on cancel, pool exhaustion under load is far less likely and the acquisition-timeout backpressure is no longer needed.

@github-actions github-actions Bot added the area/cli (Affects the command line), area/datastore (Affects the storage system), and area/tooling (Affects the dev or user toolchain, e.g. tests, ci, build tools) labels Jun 12, 2026
@codecov

codecov Bot commented Jun 12, 2026

Codecov Report

❌ Patch coverage is 85.26316% with 28 lines in your changes missing coverage. Please review.

| Files with missing lines | Patch % | Lines |
|---|---|---|
| internal/datastore/crdb/pool/canceler.go | 74.69% | 14 Missing and 6 partials ⚠️ |
| internal/datastore/crdb/crdb.go | 52.95% | 6 Missing and 2 partials ⚠️ |
