Incremental indexing leaves per-(class, property) stats stale after retractions

## Summary

The per-`(class, property)` flake counts in the index stats
(`IndexStats.graphs[g].classes[*].properties[*].datatypes`) do not reflect
current state after an **incremental** refresh. A commit that only
*retracts* a property flake (with no new assertion of that property on the
subject) computes the correct decrement but never applies it, so the count stays
too high. The property-level counts (`graphs[g].properties[*].count`) decrement
correctly; the class-scoped per-property counts do not.

## Root cause

- The stats hook records per-subject datatype deltas with **signed** values, so
  retractions decrement there: `id_hook.rs` (`subject_prop_dts ... += delta`).
- But the per-subject property **presence** set is **assert-only**: `id_hook.rs`
  (the branch gated on `rec.op`).
- The incremental class-stat merge in `build/incremental.rs` only applies a
  class's `class_prop_dts` deltas when that class appears in `class_properties`,
  which is derived from the assert-only presence set. For a retraction-only
  change the class is never revisited, so its decrement delta is dropped.
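
The failure mode can be reproduced in a small model. This is only a sketch: the type aliases and `merge_presence_gated` are hypothetical stand-ins, not the actual code in `id_hook.rs` or `build/incremental.rs`, and lang/ref deltas are left out.

```rust
use std::collections::{HashMap, HashSet};

// Hypothetical stand-ins; the real indexer types differ.
type ClassId = u32;
type PropId = u32;
type DatatypeId = u32;

/// Per-class counts or signed deltas: class -> (property, datatype) -> value.
type ClassCounts = HashMap<ClassId, HashMap<(PropId, DatatypeId), i64>>;

/// Models the behaviour described above: only classes in the set derived
/// from the assert-only presence set are visited, so deltas belonging to
/// any other class are silently dropped.
fn merge_presence_gated(
    stats: &mut ClassCounts,
    class_properties: &HashSet<ClassId>,
    class_prop_dts: &ClassCounts,
) {
    for class in class_properties {
        if let Some(deltas) = class_prop_dts.get(class) {
            let entry = stats.entry(*class).or_default();
            for (key, delta) in deltas {
                *entry.entry(*key).or_insert(0) += delta;
            }
        }
    }
}

/// Builds a retraction-only commit and returns the count after the merge.
fn stale_count_after_retraction() -> i64 {
    let (class, prop, dt) = (1, 10, 100);
    let mut stats = ClassCounts::new();
    stats.entry(class).or_default().insert((prop, dt), 3);

    // The commit retracts one flake: the signed delta (-1) is computed...
    let mut deltas = ClassCounts::new();
    deltas.entry(class).or_default().insert((prop, dt), -1);
    // ...but nothing was asserted, so the presence-derived set is empty.
    let class_properties: HashSet<ClassId> = HashSet::new();

    merge_presence_gated(&mut stats, &class_properties, &deltas);
    stats[&class][&(prop, dt)]
}
```

In this model `stale_count_after_retraction()` returns the prior count of 3 rather than 2, because the class is never visited.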

## Where it shows up

A fast path answers `COUNT(*)` of `?s rdf:type ?o1 . ?s P ?o2` directly from these
stats as `Σ_C Σ_dt classStat[C][P].count`. Each `P`-flake on a subject with `k`
types is counted `k` times, once per class, which matches the join's per-subject
product (types × `P`-flakes) summed over subjects, so no scan is needed. With
stale counts this **over-counts** on any ledger that has had an indexed
retraction touching `P` on a typed subject.
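
A minimal sketch of that fold, checked against the join's product-sum. The names (`ClassStats`, `fold_count`, `join_count`, `example_stats`) and the flat map layout are assumptions made for illustration and do not mirror the real `IndexStats` structure.

```rust
use std::collections::HashMap;

// Hypothetical stand-ins for the real stats types.
type ClassId = u32;
type PropId = u32;
type DatatypeId = u32;
type ClassStats = HashMap<ClassId, HashMap<(PropId, DatatypeId), u64>>;

/// Fast-path fold: sum classStat[C][P].count over all classes C and datatypes dt.
fn fold_count(stats: &ClassStats, p: PropId) -> u64 {
    stats
        .values()
        .flat_map(|props| props.iter())
        .filter(|((prop, _dt), _)| *prop == p)
        .map(|(_, count)| *count)
        .sum()
}

/// Reference answer for the join: per subject, (#types) * (#P-flakes), summed.
fn join_count(subjects: &[(usize, usize)]) -> u64 {
    subjects
        .iter()
        .map(|(types, flakes)| (types * flakes) as u64)
        .sum()
}

/// Stats for two subjects: s1 typed A and B with two P-flakes,
/// s2 typed A with one P-flake. So classStat[A][P] = 3, classStat[B][P] = 2.
fn example_stats() -> ClassStats {
    let (class_a, class_b, p, dt) = (1, 2, 10, 100);
    let mut stats = ClassStats::new();
    stats.entry(class_a).or_default().insert((p, dt), 3);
    stats.entry(class_b).or_default().insert((p, dt), 2);
    stats
}
```

With accurate stats, `fold_count(&example_stats(), 10)` and `join_count(&[(2, 2), (1, 1)])` agree. If one of s1's flakes is retracted and the class stats are not decremented, the join drops to `join_count(&[(2, 1), (1, 1)])` while the fold still returns the old total, which is the over-count described above.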

## Current mitigation (stopgap, not ideal)

The fold is gated on `store.lex_sorted_string_ids()`, which is set only by bulk
import (`import.rs`) and cleared by any incremental refresh
(`incremental_root.rs`) — a reliable proxy for "class stats untouched by
incremental drift." So the fold fires only on pure bulk-import indexes (where the
full build produces exact counts) and defers to the always-correct merge
everywhere else. This enables the optimization for bulk-imported datasets but
disables it for any incrementally-updated ledger.

## Proposed fix

Make the incremental class-stat merge apply `class_prop_dts` (and lang/ref)
deltas for **every** class that has such a delta, not just classes with an
assertion in the current commit:

- Drive the per-class merge by the union of the delta-source maps
  (`class_properties ∪ class_prop_dts.keys ∪ class_prop_lang_deltas.keys ∪
  ref_edges.keys`).
- Apply the datatype deltas to the prior `ClassPropertyUsage.datatypes`
  regardless of presence, and drop entries that reach 0.
- Add indexer tests: an incremental refresh that retracts a property flake on a
  (base-)typed subject decrements the class-property count; full retraction
  removes it.
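
The first two bullets could look roughly like the sketch below. It is an illustration under assumptions rather than the proposed patch: `merge_union_driven` and the type aliases are hypothetical, only the datatype deltas are modelled (the lang and ref maps would join the union in the same way), and a count at or below zero is treated as "drop the entry".

```rust
use std::collections::{BTreeSet, HashMap, HashSet};

// Hypothetical stand-ins for the real indexer types.
type ClassId = u32;
type PropId = u32;
type DatatypeId = u32;
type Counts = HashMap<(PropId, DatatypeId), i64>;

/// Applies signed datatype deltas for every class that has one,
/// whether or not the class appears in the presence-derived set.
fn merge_union_driven(
    stats: &mut HashMap<ClassId, Counts>,
    class_properties: &HashSet<ClassId>,
    class_prop_dts: &HashMap<ClassId, Counts>,
) {
    // Union of the delta-source key sets (lang/ref maps omitted here).
    let classes: BTreeSet<ClassId> = class_properties
        .iter()
        .chain(class_prop_dts.keys())
        .copied()
        .collect();

    for class in classes {
        let deltas = match class_prop_dts.get(&class) {
            Some(deltas) => deltas,
            None => continue, // presence only, no datatype delta to apply
        };
        let counts = stats.entry(class).or_default();
        for (key, delta) in deltas {
            let updated = counts.get(key).copied().unwrap_or(0) + delta;
            if updated > 0 {
                counts.insert(*key, updated);
            } else {
                // Fully retracted: drop the entry instead of keeping a zero.
                counts.remove(key);
            }
        }
    }
}
```

Running this twice with a `-1` delta against a prior count of 2 and an empty presence set first leaves a count of 1 and then removes the entry, which is the behaviour the proposed indexer tests would assert.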

Once accurate, drop the `lex_sorted_string_ids` gate so the fold works for all
indexes.

## Related

The runtime novelty stats path (`runtime_stats.rs::assemble_fast_stats`) only
attributes property deltas when the subject's `rdf:type` is asserted in the same
batch. The fast path is gated to non-overlay so it isn't affected today, but it's
the same class of issue and worth addressing together.


Incremental indexing leaves per-(class, property) stats stale after retractions #1266
