Skip to content

fix: SIGSEGV due to out of bounds access after delete + checkpoint#612

Open
adsharma wants to merge 1 commit into
mainfrom
delete_checkpoint_fix
Open

fix: SIGSEGV due to out of bounds access after delete + checkpoint#612
adsharma wants to merge 1 commit into
mainfrom
delete_checkpoint_fix

Conversation

@adsharma

Copy link
Copy Markdown
Contributor

Fixes: #610

Root cause

After relationship deletes followed by a checkpoint, an in-memory
CSR index entry (csrIndex->indices[nodeOffset]) can retain a stale
INVALID_ROW_IDX (UINT64_MAX) sentinel row. Passing that through
getQuotientRemainder(row, CHUNKED_NODE_GROUP_CAPACITY) produces
an out-of-range chunk-group index, which truncates to UINT32_MAX
in getGroup(idx_t) — an out-of-bounds std::vector access guarded
only by a debug DASSERT (a no-op in release), so callers dereference
a null/OOB group.

The sentinel survives because checkpointInMemAndOnDisk calls
collectLeafRegionsAndCSRLength (which runs
collectInMemRegionChangesAndUpdateHeaderLength on all leaf regions,
writing INVALID_ROW_IDX via setInvalid(i)), but then takes an
early-return path (line 538, regionsToCheckpoint.empty()) without
calling finalizeCheckpoint — so the index is left with stale
sentinels and the chunked groups are not cleared.

Fix

src/storage/table/csr_node_group.cpp — bounds-check chunkIdx against
chunkedGroups.getNumGroups(lock) before every unguarded getGroup
call site, treating stale rows as skip/deleted (consistent with
the two already-guarded sites at lines 851 and 1060). The 6 fixed
sites:

  1. scanCommittedInMemSequential → return empty result
  2. scanCommittedInMemRandom → skip row
  3. update (COMMITTED_IN_MEMORY) → no-op
  4. delete_ (COMMITTED_IN_MEMORY) → no-op (return false)
  5. collectInMemRegionChangesAndUpdateHeaderLength → treat as deleted, set invalid
  6. populateCSRLengthInMemOnly → treat as deleted, decrement length

 ### Root cause

 After relationship deletes followed by a checkpoint, an in-memory
 CSR index entry (csrIndex->indices[nodeOffset]) can retain a stale
 INVALID_ROW_IDX (UINT64_MAX) sentinel row. Passing that through
 getQuotientRemainder(row, CHUNKED_NODE_GROUP_CAPACITY) produces
 an out-of-range chunk-group index, which truncates to UINT32_MAX
 in getGroup(idx_t) — an out-of-bounds std::vector access guarded
 only by a debug DASSERT (a no-op in release), so callers dereference
 a null/OOB group.

 The sentinel survives because checkpointInMemAndOnDisk calls
 collectLeafRegionsAndCSRLength (which runs
 collectInMemRegionChangesAndUpdateHeaderLength on all leaf regions,
 writing INVALID_ROW_IDX via setInvalid(i)), but then takes an
 early-return path (line 538, regionsToCheckpoint.empty()) without
 calling finalizeCheckpoint — so the index is left with stale
 sentinels and the chunked groups are not cleared.

 ### Fix

 src/storage/table/csr_node_group.cpp — bounds-check chunkIdx against
 chunkedGroups.getNumGroups(lock) before every unguarded getGroup
 call site, treating stale rows as skip/deleted (consistent with
 the two already-guarded sites at lines 851 and 1060). The 6 fixed
 sites:

 1. scanCommittedInMemSequential → return empty result
 2. scanCommittedInMemRandom → skip row
 3. update (COMMITTED_IN_MEMORY) → no-op
 4. delete_ (COMMITTED_IN_MEMORY) → no-op (return false)
 5. collectInMemRegionChangesAndUpdateHeaderLength → treat as deleted, set invalid
 6. populateCSRLengthInMemOnly → treat as deleted, decrement length
@adsharma adsharma force-pushed the delete_checkpoint_fix branch from 7900c57 to 5cb87ff Compare June 23, 2026 04:58
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant