fix(resolve): prevent crashes in _merge_entities during multi-cluster runs#1
Conversation
… runs During a single run_resolution pass the mentions list is read once up front. Later clusters can name a survivor_id an earlier cluster already merged away, producing two crashes in _merge_entities: an AttributeError when the survivor-row SELECT returns None, and a sqlite3.IntegrityError when re-pointing relations onto the survivor violates the unique index on (subject_id, predicate, object_id, evidence_chunk_id). Return early if the survivor row is gone; use UPDATE OR IGNORE on the relation re-point and the canonical-name update; then delete relations still pointing at dropped ids so nothing dangles. Adds regression tests for both the stale-survivor and UNIQUE-collision paths. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Greptile SummaryThis PR fixes two crash-level bugs in
Confidence Score: 4/5The crash fixes are correct and well-tested; the two new tests cover both failure paths described in the PR. The merge-count overcounting is cosmetic and does not affect graph correctness. The core changes — early-exit guard, OR IGNORE on relations and canonical name, orphan DELETE — correctly eliminate the crashes described in the PR and are backed by targeted unit tests. One unaddressed bug exists: run_resolution increments total_merges after every _merge_entities call regardless of whether the early-return path was taken, so the count stored in resolution_runs will overstate actual merges whenever a stale-survivor collision occurs. There is also a known data-quality trade-off where drop_ids entities are silently left unresolved when the early return fires, though subsequent runs will catch them. kgmd/resolve.py — specifically the total_merges increment at line 116 and the early-return behavior in _merge_entities.
|
| survivor_row = conn.execute( | ||
| "SELECT attributes FROM entities WHERE id = ?", (survivor_id,) | ||
| ).fetchone() | ||
| if survivor_row is None: | ||
| return |
There was a problem hiding this comment.
Drop-ids left unresolved on stale-survivor early return
When the early return fires, the drop_ids entities remain in the graph as unmerged duplicates — their entity rows, mentions, and relations are untouched. The PR description explicitly acknowledges this trade-off ("entity was already resolved into another survivor"), but in practice the dropped entities should have been folded into whatever entity ultimately absorbed survivor_id. Over multiple run_resolution calls the same drop candidates will keep surfacing as a cluster, so subsequent runs will likely catch them, but within a single pass the result is silent data-quality degradation rather than a true merge.
During a single
run_resolutionpass the mentions list is read once up front. Later clusters can name asurvivor_idan earlier cluster already merged away, producing two crashes in_merge_entities: (1) an AttributeError when the survivor-row SELECT returnsNone, and (2) asqlite3.IntegrityErrorwhen re-pointing relations onto the survivor violates the unique index on(subject_id, predicate, object_id, evidence_chunk_id)(the canonical-name UPDATE can likewise hitUNIQUE(canonical_name, entity_type)).Fix, all inside
_merge_entities: return early if the survivor row is gone (it was already resolved into another survivor); useUPDATE OR IGNOREon the relation re-point and the canonical-name update so a pre-existing duplicate doesn't abort the transaction; then delete relations still pointing at dropped ids so nothing dangles.Verified on a real corpus: 12,895 mentions resolved into 6,520 entities with 2,360 merges; before the fix the run crashed at the first stale-id collision.