Freshness: guard graph_accel co-advance on the direct epoch-pair path (ADR-207 step 4 follow-up) #465

@aaronsb

Context

ADR-207 step 4 (PR #464) declared graph_accel.generation as a sub-counter that co-advances with the universal tick (graph_epochs.event_id). That co-advance is guaranteed by construction for callers that route through AGEClient.record_mutation — it advances the tick (record_epoch + complete_epoch) and invalidates the accelerator (graph.invalidate()) from one place, and a unit test pins it (tests/unit/lib/test_record_mutation_coadvance.py).

The gap

The co-advance is not guaranteed for the other mutation path: long-running jobs that call the record_epoch / complete_epoch pair directly (so they can tag nodes with the event_id mid-run). On that path, advancing the tick and invalidating the accelerator are two separate calls the author must remember to pair. Nothing enforces it — no construction guarantee, no test, no lint.

Surfaced as an optional observation in the PR #464 review.

Current state — not a live bug

The only direct-pair caller today is api/app/workers/ingestion_worker.py, and it does co-advance correctly: complete_epoch(event_id, "completed") (line ~721) is followed by age_client.graph.invalidate() (line ~726). So the accelerator and the tick stay in sync for ingestion.
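The ordering described above (complete the epoch, then invalidate the accelerator) can be checked mechanically with a mock. This is a minimal sketch, not the project's test: `run_success_path` is a hypothetical stand-in for the worker's success path, the argument passed to `record_epoch` is assumed, and only the method names `record_epoch`, `complete_epoch`, and `graph.invalidate` come from this issue.

```python
from unittest.mock import MagicMock


def run_success_path(age_client):
    """Hypothetical stand-in for a direct-pair job's success path."""
    event_id = age_client.record_epoch("ingestion")  # argument is an assumption
    # ... long-running work that tags nodes with event_id ...
    age_client.complete_epoch(event_id, "completed")
    age_client.graph.invalidate()
    return event_id


def invalidates_after_complete(client_mock):
    """True if graph.invalidate() is called after the last complete_epoch()."""
    # mock_calls records child calls with dotted names, e.g. "graph.invalidate".
    names = [name for name, _args, _kwargs in client_mock.mock_calls]
    if "complete_epoch" not in names:
        return False
    last_complete = len(names) - 1 - names[::-1].index("complete_epoch")
    return "graph.invalidate" in names[last_complete + 1:]


client = MagicMock()
client.record_epoch.return_value = 42
run_success_path(client)
ordering_ok = invalidates_after_complete(client)
```

A real test would drive the actual worker code with a mocked client rather than a stand-in function; the ordering check itself would stay the same.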

The risk is future drift: a new long-running job (or a refactor of the existing one) that advances the tick but forgets graph.invalidate() would leave the in-memory grounding/polarity accelerator looking fresh after the graph has changed, silently serving stale grounding until something else invalidates it. This is precisely the class of silent-staleness defect ADR-207 set out to make impossible by construction.

Documented for now: PR #464 added a co-advance caveat to the record_epoch docstring and qualified the "by construction" claim in the SubCounter docstring (api/app/lib/freshness.py). Documentation is a speed bump, not a guard.

Options to consider (not yet decided)

  1. A test pinning that the existing direct-pair caller co-advances: assert that ingestion_worker's success path invalidates after complete_epoch. Cheap; pins today's behavior but doesn't protect a new path.
  2. A lint/static check flagging any record_epoch( use not paired with a graph.invalidate() (or record_mutation) in the same unit — protects new paths, but the heuristic is fuzzy (the pair spans a job's start and end).
  3. A helper that wraps the long-job lifecycle — e.g. a context manager with age_client.mutation_epoch(kind) as event_id: ... that records on enter, and on exit completes the epoch and invalidates the accelerator, so the long-job path also co-advances by construction. Strongest option; turns convention into structure, but a larger refactor touching the ingestion worker.
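Option 3 could look roughly like the sketch below. It is illustrative only: `FakeAgeClient` and `FakeGraph` are hypothetical stand-ins rather than the real AGEClient, the `"failed"` status and the choice to invalidate on failure are assumptions (partial writes may already have landed), and `mutation_epoch` is the name proposed in this issue, not an existing API.

```python
from contextlib import contextmanager


class FakeGraph:
    """Hypothetical accelerator stub; generation counts invalidations."""

    def __init__(self):
        self.generation = 0

    def invalidate(self):
        self.generation += 1


class FakeAgeClient:
    """Hypothetical stand-in for AGEClient, for illustration only."""

    def __init__(self):
        self.graph = FakeGraph()
        self.epochs = {}  # event_id -> status
        self._next_id = 1

    def record_epoch(self, kind):
        event_id = self._next_id
        self._next_id += 1
        self.epochs[event_id] = "running"
        return event_id

    def complete_epoch(self, event_id, status):
        self.epochs[event_id] = status

    @contextmanager
    def mutation_epoch(self, kind):
        """Record on enter; on exit, complete the epoch and invalidate."""
        event_id = self.record_epoch(kind)
        status = "failed"  # assumed status name for the error path
        try:
            yield event_id  # caller can tag nodes with event_id mid-run
            status = "completed"
        finally:
            # Both halves of the co-advance happen in one place, so a
            # caller cannot advance the tick without invalidating.
            self.complete_epoch(event_id, status)
            self.graph.invalidate()


client = FakeAgeClient()
with client.mutation_epoch("ingestion") as event_id:
    pass  # long-running work would go here
```

With this shape, the long-job author never calls complete_epoch or graph.invalidate() directly, which is what moves the pairing from convention to structure.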

Labels: architecture (Architectural decisions/changes), enhancement (New feature or request)