fix: SIGSEGV due to out of bounds access after delete + checkpoint#612
Open
adsharma wants to merge 1 commit into
Open
fix: SIGSEGV due to out of bounds access after delete + checkpoint#612adsharma wants to merge 1 commit into
adsharma wants to merge 1 commit into
Conversation
### Root cause After relationship deletes followed by a checkpoint, an in-memory CSR index entry (csrIndex->indices[nodeOffset]) can retain a stale INVALID_ROW_IDX (UINT64_MAX) sentinel row. Passing that through getQuotientRemainder(row, CHUNKED_NODE_GROUP_CAPACITY) produces an out-of-range chunk-group index, which truncates to UINT32_MAX in getGroup(idx_t) — an out-of-bounds std::vector access guarded only by a debug DASSERT (a no-op in release), so callers dereference a null/OOB group. The sentinel survives because checkpointInMemAndOnDisk calls collectLeafRegionsAndCSRLength (which runs collectInMemRegionChangesAndUpdateHeaderLength on all leaf regions, writing INVALID_ROW_IDX via setInvalid(i)), but then takes an early-return path (line 538, regionsToCheckpoint.empty()) without calling finalizeCheckpoint — so the index is left with stale sentinels and the chunked groups are not cleared. ### Fix src/storage/table/csr_node_group.cpp — bounds-check chunkIdx against chunkedGroups.getNumGroups(lock) before every unguarded getGroup call site, treating stale rows as skip/deleted (consistent with the two already-guarded sites at lines 851 and 1060). The 6 fixed sites: 1. scanCommittedInMemSequential → return empty result 2. scanCommittedInMemRandom → skip row 3. update (COMMITTED_IN_MEMORY) → no-op 4. delete_ (COMMITTED_IN_MEMORY) → no-op (return false) 5. collectInMemRegionChangesAndUpdateHeaderLength → treat as deleted, set invalid 6. populateCSRLengthInMemOnly → treat as deleted, decrement length
7900c57 to
5cb87ff
Compare
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.This suggestion is invalid because no changes were made to the code.Suggestions cannot be applied while the pull request is closed.Suggestions cannot be applied while viewing a subset of changes.Only one suggestion per line can be applied in a batch.Add this suggestion to a batch that can be applied as a single commit.Applying suggestions on deleted lines is not supported.You must change the existing code in this line in order to create a valid suggestion.Outdated suggestions cannot be applied.This suggestion has been applied or marked resolved.Suggestions cannot be applied from pending reviews.Suggestions cannot be applied on multi-line comments.Suggestions cannot be applied while the pull request is queued to merge.Suggestion cannot be applied right now. Please check back later.
Fixes: #610
Root cause
After relationship deletes followed by a checkpoint, an in-memory
CSR index entry (csrIndex->indices[nodeOffset]) can retain a stale
INVALID_ROW_IDX (UINT64_MAX) sentinel row. Passing that through
getQuotientRemainder(row, CHUNKED_NODE_GROUP_CAPACITY) produces
an out-of-range chunk-group index, which truncates to UINT32_MAX
in getGroup(idx_t) — an out-of-bounds std::vector access guarded
only by a debug DASSERT (a no-op in release), so callers dereference
a null/OOB group.
The sentinel survives because checkpointInMemAndOnDisk calls
collectLeafRegionsAndCSRLength (which runs
collectInMemRegionChangesAndUpdateHeaderLength on all leaf regions,
writing INVALID_ROW_IDX via setInvalid(i)), but then takes an
early-return path (line 538, regionsToCheckpoint.empty()) without
calling finalizeCheckpoint — so the index is left with stale
sentinels and the chunked groups are not cleared.
Fix
src/storage/table/csr_node_group.cpp — bounds-check chunkIdx against
chunkedGroups.getNumGroups(lock) before every unguarded getGroup
call site, treating stale rows as skip/deleted (consistent with
the two already-guarded sites at lines 851 and 1060). The 6 fixed
sites: