fix(storage): guard CSR getGroup against stale/OOB group indices (SIGSEGV) by rysweet · Pull Request #610 · LadybugDB/ladybug

rysweet · 2026-06-22T15:57:46Z

Problem

A long-running process doing relationship deletes + checkpoints crashes with
SIGSEGV during the consolidation/checkpoint phase. The faulting access is
GroupCollection<ChunkedNodeGroup>::getGroup(groupIdx = 4294967295 / UINT32_MAX).

Root cause

After relationship delete + checkpoint churn, csrIndex->indices[nodeOffset] can
retain a stale row (an INVALID_ROW_IDX sentinel). getQuotientRemainder(INVALID_ROW_IDX)
yields an out-of-range chunk-group index. GroupCollection::getGroup only had a
debug-only DASSERT(groupIdx < groups.size()) (a no-op in release), so it performed an
out-of-bounds std::vector access and callers dereferenced the result → undefined
behavior / SIGSEGV.

The insertion-write loop in checkpointColumnInRegion already guards this case with
if (row == INVALID_ROW_IDX) continue;, but the two checkpoint-collect loops
(collectInMemRegionChangesAndUpdateHeaderLength and populateCSRLengthInMemOnly),
plus the scan / update / delete_ paths, do not — they dereference the getGroup
result directly.

Fix

GroupCollection::getGroup / getGroupNoLock now return nullptr for an out-of-range
index instead of performing an OOB access.
Every CSRNodeGroup deref site handles nullptr:
- scan (scanCommittedInMemSequential / scanCommittedInMemRandom): skip / empty result;
- update: no-op (target row gone); delete_: returns false;
- checkpoint insertion-write: skip;
- the two checkpoint-collect isDeleted loops: treat a stale/null group as deleted
  (purges the stale row via turnToNonSequential() + setInvalid(i)), consistent with
  the engine's existing handling of deleted rows.

This is version-independent (reproduced on 0.15.x–0.17.1) and present on current main.

Validation

Reproduced via a deterministic 120-iteration delete+checkpoint churn test (amplihack-memory
issue #100). Before: SIGSEGV mid-churn in collectInMemRegionChangesAndUpdateHeaderLength.
After (engine rebuilt from source with this patch): the workload completes cleanly and the
test passes.

Co-authored-by: Copilot 223556219+Copilot@users.noreply.github.com

…SEGV) Relationship delete + checkpoint churn can leave a stale row in csrIndex->indices[nodeOffset] that maps (via getQuotientRemainder of an INVALID_ROW_IDX sentinel) to an out-of-range chunk-group index. getGroup then performed an out-of-bounds std::vector access and callers dereferenced the result, crashing with SIGSEGV during the consolidation/ checkpoint phase (getGroup(groupIdx=UINT32_MAX)). GroupCollection::getGroup/getGroupNoLock now return nullptr for an out-of-range index instead of an OOB access, and every CSRNodeGroup deref site (scan, update, delete_, checkpoint insertion-write, and the two checkpoint-collect isDeleted loops) handles nullptr by skipping / treating the stale row as deleted. This matches the engine's existing INVALID_ROW_IDX skip in the insertion loop. Reproduced and fixed via a 120-iteration delete+checkpoint churn test in amplihack-memory (issue LadybugDB#100): the workload previously SIGSEGV'd mid-churn in collectInMemRegionChangesAndUpdateHeaderLength and now completes cleanly. Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>

rysweet · 2026-06-22T16:09:38Z

Withdrawing this PR — it was opened prematurely. Closing to validate the fix and CI in a private fork first; will re-open a clean, CI-green PR if appropriate. Apologies for the noise.

rysweet mentioned this pull request Jun 22, 2026

SEGV: CSR rel-table scan getGroup(UINT32_MAX) null-deref — persisted corruption, version-independent (0.15.3/0.15.4/0.17.1) rysweet/amplihack-memory-lib#100

Closed

rysweet closed this Jun 22, 2026

rysweet deleted the fix/csr-getgroup-null-deref-segv branch June 22, 2026 16:17

This was referenced Jun 22, 2026

SIGSEGV: getGroup(UINT32_MAX) — unguarded out-of-range CSR chunk-group index on rel delete + checkpoint #611

Open

fix: SIGSEGV due to out of bounds access after delete + checkpoint #612

Open

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

fix(storage): guard CSR getGroup against stale/OOB group indices (SIGSEGV)#610

fix(storage): guard CSR getGroup against stale/OOB group indices (SIGSEGV)#610
rysweet wants to merge 1 commit into
LadybugDB:mainfrom
rysweet:fix/csr-getgroup-null-deref-segv

rysweet commented Jun 22, 2026

Uh oh!

rysweet commented Jun 22, 2026

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

1 participant

Conversation

rysweet commented Jun 22, 2026

Problem

Root cause

Fix

Validation

Uh oh!

rysweet commented Jun 22, 2026

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

1 participant