Expose programmatic kernel metrics counters (KernelStats) #179

Description

@dgenio

Summary

Add a lightweight, dependency-free counters object — invocations, grants, denials by
reason code, redaction events, budget downgrades/exhaustions, handle
stores/expansions/evictions, fallback activations — readable at any time from the
kernel.

Why this matters

Operators embedding the kernel currently have no cheap answer to "is the firewall
actually redacting anything?" or "how often are budgets downgrading responses?"
without exporting full traces. A counters snapshot enables health checks,
dashboards, and the contextweaver-style debugging pattern (its BuildStats is the
direct analogue in the sibling repo) — and it gives the OTel exporter (#125)
ready-made gauges, without making OTel a requirement.

Current evidence

  • No counters exist: kernel/__init__.py, firewall/transform.py, and policy.py log events but aggregate nothing.
  • otel.py (optional extra) exports spans, not metrics, and requires the extra installed.
  • Sibling-project precedent: contextweaver's BuildStats is the documented first debugging step in its agent docs — the pattern is proven in this ecosystem.

External context

"Counters first, exporters second" is the conventional layering: in-process counters
work everywhere and back any exporter.

Proposed implementation

  1. stats.py module with a KernelStats dataclass of integer counters plus
    snapshot() (returns an immutable copy) and reset().
  2. Increment at the natural choke points: grant_capability, perform_invoke,
    firewall transform (redactions/downgrades), HandleStore (stores/evictions),
    fallback loop.
  3. Thread-safety: simple lock or per-counter itertools.count-style discipline;
    document guarantees.
  4. Wire into otel.py as gauges where the extra is active (follow-up acceptable).
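A minimal sketch of what steps 1 and 3 could look like, assuming the simple-lock option. The counter names, the `incr`/`deny` helpers, the `StatsSnapshot` type, and the `POLICY_DENIED` reason code are hypothetical placeholders for illustration, not the kernel's actual API:

```python
import threading
from collections import Counter
from dataclasses import dataclass, field
from types import MappingProxyType
from typing import Mapping

# Hypothetical counter names; the real taxonomy is still to be decided.
_SCALAR_COUNTERS = ("invocations", "grants", "redactions")


@dataclass(frozen=True)
class StatsSnapshot:
    """Immutable point-in-time copy of the counters."""
    invocations: int
    grants: int
    redactions: int
    denials_by_reason: Mapping[str, int]


@dataclass
class KernelStats:
    invocations: int = 0
    grants: int = 0
    redactions: int = 0
    denials_by_reason: Counter = field(default_factory=Counter)
    _lock: threading.Lock = field(
        default_factory=threading.Lock, repr=False, compare=False
    )

    def incr(self, name: str, n: int = 1) -> None:
        """Add n to one scalar counter under the lock."""
        if name not in _SCALAR_COUNTERS:
            raise AttributeError(f"unknown counter: {name}")
        with self._lock:
            setattr(self, name, getattr(self, name) + n)

    def deny(self, reason_code: str) -> None:
        """Count one denial under its stable reason code."""
        with self._lock:
            self.denials_by_reason[reason_code] += 1

    def snapshot(self) -> StatsSnapshot:
        """Copy all counters while holding the lock, so the view is consistent."""
        with self._lock:
            return StatsSnapshot(
                invocations=self.invocations,
                grants=self.grants,
                redactions=self.redactions,
                denials_by_reason=MappingProxyType(dict(self.denials_by_reason)),
            )

    def reset(self) -> None:
        """Zero every counter."""
        with self._lock:
            for name in _SCALAR_COUNTERS:
                setattr(self, name, 0)
            self.denials_by_reason.clear()


stats = KernelStats()
stats.incr("invocations")
stats.deny("POLICY_DENIED")  # hypothetical reason code
snap = stats.snapshot()
```

Taking the same lock in `snapshot()` as in the increment paths means a snapshot never observes a half-applied update. Later increments do not change an existing snapshot, because it holds copies of the values.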

AI-agent execution notes

  • Inspect first: kernel/__init__.py, kernel/_invoke.py, firewall/transform.py, handles.py, otel.py.
  • Keep it dependency-free and cheap (counter increments only; no timing histograms in v1).
  • Edge cases: streaming counts once per stream; dry-run counted separately or not at all (decide and document).
  • Do not duplicate trace content — counters are aggregates, traces are records.

Acceptance criteria

  • A README-quickstart-style flow yields expected counter values (tested).
  • Denials are counted per stable reason code.
  • snapshot() is safe under concurrent invokes (stress-tested lightly).

Test plan

Unit tests asserting counters across grant/invoke/deny/expand scenarios; a small
asyncio concurrency test. Run make ci.
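One possible shape for the light concurrency test, shown as a sketch. `_LockedCounter` is a hypothetical stand-in for the proposed `KernelStats`, and the worker counts are arbitrary. `asyncio.to_thread` (Python 3.9+) is used so the increments actually run on separate threads rather than interleaving on one event loop:

```python
import asyncio
import threading


class _LockedCounter:
    """Hypothetical stand-in for KernelStats: one counter guarded by a lock."""

    def __init__(self) -> None:
        self._lock = threading.Lock()
        self._invocations = 0

    def incr(self) -> None:
        with self._lock:
            self._invocations += 1

    def snapshot(self) -> int:
        with self._lock:
            return self._invocations


def _worker(stats: _LockedCounter, n: int) -> None:
    # Interleave writes and reads so snapshot() runs while others increment.
    for _ in range(n):
        stats.incr()
        stats.snapshot()


async def run_stress(workers: int = 8, per_worker: int = 2_000) -> int:
    stats = _LockedCounter()
    await asyncio.gather(
        *(asyncio.to_thread(_worker, stats, per_worker) for _ in range(workers))
    )
    return stats.snapshot()


def test_counters_survive_concurrent_invokes() -> None:
    # No increment may be lost: the total must equal workers * per_worker.
    assert asyncio.run(run_stress()) == 8 * 2_000
```

A test like this checks only that no increments are lost. It does not prove the absence of races, which matches the "stress-tested lightly" wording in the acceptance criteria.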

Documentation plan

"Monitoring the kernel" docs subsection; CHANGELOG Added; cross-reference #125.

Migration and compatibility notes

Additive; not expected to require migration.

Risks and tradeoffs

Once published, counter names become mildly contractual, so start with a small set.
Lock contention is expected to be negligible at this granularity, since each
increment holds the lock only briefly.

Suggested labels

product, developer-experience, reliability
