cancelchecker: fix data race in GoroutineCPUHandle registration #164603
Draft
ZhouXing19 wants to merge 1 commit into cockroachdb:master from
It looks like your PR touches production code but doesn't add or edit any test code. Did you consider adding tests to your PR?

🦉 Hoot! I am a Blathers, a bot for CockroachDB. My owner is dev-inf.
Fixes #164346
Fixes #164347
Fixes #164348
Fixes #164349
Fixes #164350
Fixes #164352
Fixes #164353
Fixes #164354
Fixes #164355
Fixes #164594
The row-based CancelChecker.Reset() eagerly called SQLCPUHandle.RegisterGoroutine(). Because Reset() runs on the main goroutine, the resulting GoroutineCPUHandle was keyed to the main goroutine's ID. When the CancelChecker was later used inside a ParallelUnorderedSynchronizer worker goroutine (via a Columnarizer wrapping a row-based processor), it called measureAndAdmit() on the main goroutine's handle. Meanwhile, the vectorized CancelChecker running on the main goroutine registered lazily and, because the goroutine ID was the same, obtained that same handle. Both goroutines then called measureAndAdmit() concurrently on the handle's unsynchronized fields (cpuAccounted, pauseDur, etc.), which the race detector reported as a data race.
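The failure mode can be sketched in isolation. This is a minimal, hypothetical model, not CockroachDB code: `registry`, `cpuHandle`, `eagerChecker`, `demoEager`, and `goroutineID` are invented for illustration and only loosely stand in for `SQLCPUHandle`, `GoroutineCPUHandle`, and the row-based CancelChecker. Goroutine IDs are parsed from `runtime.Stack` output, a debugging hack rather than a supported Go API. The sketch never mutates the shared handle, so it shows the sharing without actually racing.

```go
package main

import (
	"fmt"
	"runtime"
	"strconv"
	"strings"
	"sync"
)

// goroutineID parses the current goroutine's ID from runtime.Stack output,
// which starts with "goroutine <id> [". Debugging hack; Go has no public API for this.
func goroutineID() uint64 {
	var buf [64]byte
	n := runtime.Stack(buf[:], false)
	field := strings.Fields(strings.TrimPrefix(string(buf[:n]), "goroutine "))[0]
	id, err := strconv.ParseUint(field, 10, 64)
	if err != nil {
		panic(err)
	}
	return id
}

// cpuHandle is a hypothetical stand-in for GoroutineCPUHandle. Its accounting
// field is unsynchronized, so only the goroutine it is keyed to may touch it.
type cpuHandle struct {
	goroutineID  uint64
	cpuAccounted int64
}

// registry is a hypothetical stand-in for SQLCPUHandle: one handle per goroutine ID.
type registry struct {
	mu      sync.Mutex
	handles map[uint64]*cpuHandle
}

func newRegistry() *registry { return &registry{handles: map[uint64]*cpuHandle{}} }

// registerGoroutine returns the calling goroutine's handle, creating it on first use.
func (r *registry) registerGoroutine() *cpuHandle {
	id := goroutineID()
	r.mu.Lock()
	defer r.mu.Unlock()
	h, ok := r.handles[id]
	if !ok {
		h = &cpuHandle{goroutineID: id}
		r.handles[id] = h
	}
	return h
}

// eagerChecker (hypothetical) mimics the old behavior: Reset registers immediately,
// so the handle belongs to whichever goroutine calls Reset.
type eagerChecker struct{ handle *cpuHandle }

func (c *eagerChecker) Reset(r *registry) { c.handle = r.registerGoroutine() }
func (c *eagerChecker) Check() *cpuHandle { return c.handle }

// demoEager resets on the calling goroutine (the "main" goroutine in the PR's
// scenario) and then uses the checker from a separate worker goroutine.
func demoEager() (workerSharesCallerHandle, handleOwnedByWorker bool) {
	reg := newRegistry()
	var checker eagerChecker
	checker.Reset(reg) // registers the calling goroutine

	var workerHandle *cpuHandle
	var workerID uint64
	var wg sync.WaitGroup
	wg.Add(1)
	go func() { // plays the role of a ParallelUnorderedSynchronizer worker
		defer wg.Done()
		workerID = goroutineID()
		workerHandle = checker.Check()
	}()
	wg.Wait()

	// A lazily registering checker on the calling goroutine looks up the handle
	// for its own ID, which is the same handle the worker is already using.
	callerHandle := reg.registerGoroutine()
	return workerHandle == callerHandle, workerHandle.goroutineID == workerID
}

func main() {
	shares, owned := demoEager()
	fmt.Println("worker holds the calling goroutine's handle:", shares) // true
	fmt.Println("handle is keyed to the worker:", owned)               // false
}
```

In the real bug both goroutines then write to the shared handle's fields, which is what `go test -race` flags.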
The fix moves the RegisterGoroutine() call from Reset() to the first Check() call, matching the pattern the vectorized cancel checker (colexecutils.CancelChecker) has used since PR #163771. Each handle is therefore registered on, and keyed to, the goroutine that actually uses it.
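A minimal sketch of the lazy-registration pattern follows. It is hypothetical and self-contained: `registry`, `cpuHandle`, `lazyChecker`, `demoLazy`, and `goroutineID` are illustrative stand-ins, not the actual `SQLCPUHandle`, `GoroutineCPUHandle`, or `colexecutils.CancelChecker` API. It assumes, as a simplification, that after Reset() a checker is used by only one goroutine, so the nil check in Check() needs no locking.

```go
package main

import (
	"fmt"
	"runtime"
	"strconv"
	"strings"
	"sync"
)

// goroutineID parses the current goroutine's ID from runtime.Stack output.
// Debugging hack for illustration only; Go has no public API for this.
func goroutineID() uint64 {
	var buf [64]byte
	n := runtime.Stack(buf[:], false)
	field := strings.Fields(strings.TrimPrefix(string(buf[:n]), "goroutine "))[0]
	id, err := strconv.ParseUint(field, 10, 64)
	if err != nil {
		panic(err)
	}
	return id
}

// cpuHandle and registry are hypothetical stand-ins: one handle per goroutine ID.
type cpuHandle struct{ goroutineID uint64 }

type registry struct {
	mu      sync.Mutex
	handles map[uint64]*cpuHandle
}

func newRegistry() *registry { return &registry{handles: map[uint64]*cpuHandle{}} }

// registerGoroutine returns the calling goroutine's handle, creating it on first use.
func (r *registry) registerGoroutine() *cpuHandle {
	id := goroutineID()
	r.mu.Lock()
	defer r.mu.Unlock()
	h, ok := r.handles[id]
	if !ok {
		h = &cpuHandle{goroutineID: id}
		r.handles[id] = h
	}
	return h
}

// lazyChecker (hypothetical) mimics the fixed behavior: Reset only records the
// registry, and the first Check registers whichever goroutine is running it.
type lazyChecker struct {
	reg    *registry
	handle *cpuHandle // nil until the first Check
}

func (c *lazyChecker) Reset(r *registry) {
	c.reg = r
	c.handle = nil
}

func (c *lazyChecker) Check() *cpuHandle {
	if c.handle == nil {
		c.handle = c.reg.registerGoroutine()
	}
	return c.handle
}

// demoLazy resets on the calling goroutine and checks on a worker goroutine.
func demoLazy() (handleOwnedByWorker, distinctFromCaller bool) {
	reg := newRegistry()
	var checker lazyChecker
	checker.Reset(reg) // runs on the calling goroutine but registers nothing

	var workerHandle *cpuHandle
	var workerID uint64
	var wg sync.WaitGroup
	wg.Add(1)
	go func() {
		defer wg.Done()
		workerID = goroutineID()
		workerHandle = checker.Check() // first Check registers the worker itself
	}()
	wg.Wait()

	callerHandle := reg.registerGoroutine()
	return workerHandle.goroutineID == workerID, workerHandle != callerHandle
}

func main() {
	owned, distinct := demoLazy()
	fmt.Println("handle is keyed to the worker:", owned)            // true
	fmt.Println("worker and caller use different handles:", distinct) // true
}
```

The design choice is simply to defer the goroutine-dependent step until the code is known to be running on the goroutine that will own the handle; Reset() itself stays goroutine-agnostic.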
Release note: None
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>