feat(crdb): experimental in-band query cancellation by ecordell · Pull Request #3176 · authzed/spicedb

ecordell · 2026-06-12T01:16:28Z

When a request context is canceled mid-query on the CRDB datastore, pgx's default behavior is to destroy the connection. Under load this drains the write pool and triggers the metastable death spiral described in #2576. A previous attempt to fix this using pgx's built-in CancelRequestContextWatcherHandler (pgwire cancel protocol) was reverted in #2434 because CRDB applies pgwire cancels asynchronously; a late-arriving cancel could kill the next query on the connection.

This PR implements in-band cancellation using CockroachDB's CANCEL QUERIES statement instead. A new pool.Canceler owns a small dedicated connection pool and a registry mapping each pooled write connection to its CRDB session_id (captured via SHOW session_id at connect). A custom ctxwatch.Handler fires on context cancellation and issues CANCEL QUERIES IF EXISTS (SELECT query_id FROM [SHOW CLUSTER STATEMENTS] WHERE session_id = '...') on a sibling connection, then blocks until that statement completes before releasing the original connection back to the pool. This sequencing eliminates the wrong-query race by construction: no cancellation can be in flight when the next query starts on the same session.

The feature is gated behind --datastore-experimental-crdb-query-cancellation (default off). When enabled, set --write-conn-acquisition-timeout=0; with connections no longer being destroyed on cancel, pool exhaustion under load is far less likely and the acquisition-timeout backpressure is no longer needed.

codecov · 2026-06-12T01:19:49Z

Codecov Report

❌ Patch coverage is 85.26316% with 28 lines in your changes missing coverage. Please review.

Files with missing lines	Patch %	Lines
internal/datastore/crdb/pool/canceler.go	74.69%	14 Missing and 6 partials ⚠️
internal/datastore/crdb/crdb.go	52.95%	6 Missing and 2 partials ⚠️

ecordell added 8 commits June 11, 2026 20:55

feat(crdb): add tripwire metric for cancellations hitting the wrong query

9a4a09b

fix(crdb): sever context for transaction rollbacks

28e1f03

feat(crdb): context watcher handler that cancels queries in-band

5a3c7be

feat(crdb): Canceler for in-band CANCEL QUERIES via sibling connections

ad4bdf7

feat(crdb): add WithQueryCancellation option

c0fdcd5

feat(crdb): wire in-band query cancellation into the datastore

899104b

test(crdb): integration tests for in-band query cancellation

fc825ec

feat(crdb): flag for experimental in-band query cancellation

1161588

github-actions Bot added labels on Jun 12, 2026: area/cli (affects the command line), area/datastore (affects the storage system), area/tooling (affects the dev or user toolchain, e.g. tests, ci, build tools)

ecordell added 2 commits June 11, 2026 21:20

chore: add changelog entry for crdb in-band query cancellation

e4e7db2

chore: lint and doc fixes

0d2ddf3

ecordell wants to merge 10 commits into authzed:main from ecordell:crdb-inband-cancel
