feat: structured decision observability (reason codes, metrics, logging) by dgenio · Pull Request #191 · dgenio/agentfence

dgenio · 2026-06-20T14:15:54Z

Summary

Implements the "structured decision observability" group — five related issues that share one code area and implementation path — as a single coherent change. The unifying idea: make AgentFence's decisions machine-readable and operable, across the CLI (check) and both proxies, without compromising the local-first / no-telemetry posture.

What changed (by issue):

Typed reason-code taxonomy (Introduce a typed reason/decision-code taxonomy for audit reasons #136). Every decision now carries a stable policy.ReasonCode (e.g. path_denied, url_bare_ip, taint_escalated, approval_timeout) alongside the existing human-readable reason. Codes are set at every engine decision site, on taint escalation, and on approval resolution; recorded on the audit Event (schema_version → "4"); and grouped by audit summarize. Free-text reasons are unchanged (additive).
Decision metrics (Add per-tool and per-session decision metrics counters for the CLI path #169). New internal/metrics package: a thread-safe Counters (by decision, tool, reason code; taint escalations, approval outcomes, eval latency, errors). check --metrics prints a dependency-free summary to stderr on exit (never pollutes a --output json stdout stream).
Prometheus metrics endpoint ([Feature] OpenTelemetry metrics and structured decision export for the proxy #101). proxy/proxy-http expose the same counters at /metrics via opt-in --metrics-listen <addr>, plus evaluation latency and operational error rates. Mode-B / spec note: implemented as a dependency-free Prometheus text-exposition emitter rather than the OpenTelemetry SDK, to preserve the repo's "stdlib + yaml + uuid only" invariant. Off by default; local and operator-chosen.
Structured operational logging ([Feature] Structured operational logging (--log-format json) for the proxy #121, Add structured operational logging to check decisions for non-proxy pipelines #163). New internal/oplog package wraps log/slog; --log-format text|json on check, proxy, and proxy-http routes stderr diagnostics. text (default) is byte-stable / unchanged; json emits one structured record per line. The operational log stays distinct from the audit log and the decision/JSON-RPC output.
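As a rough illustration of the reason-code idea above, here is a minimal sketch of a typed string taxonomy carried next to the free-text reason. The four code values come from this description; the constant names, the `Result` struct, the `encode` helper, and the `deny`/`allow` decision strings are hypothetical stand-ins, not the actual `policy` package API.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// ReasonCode is a stable, machine-readable identifier for why a decision was made.
type ReasonCode string

// A few of the codes named in this PR. The constant names are hypothetical.
const (
	ReasonPathDenied      ReasonCode = "path_denied"
	ReasonURLBareIP       ReasonCode = "url_bare_ip"
	ReasonTaintEscalated  ReasonCode = "taint_escalated"
	ReasonApprovalTimeout ReasonCode = "approval_timeout"
)

// Result is a hypothetical stand-in for an evaluation result.
// The code is additive: omitempty drops it when no code was set.
type Result struct {
	Decision   string     `json:"decision"`
	Reason     string     `json:"reason"`
	ReasonCode ReasonCode `json:"reason_code,omitempty"`
}

// encode marshals a Result to compact JSON.
func encode(r Result) string {
	b, err := json.Marshal(r)
	if err != nil {
		return "error: " + err.Error()
	}
	return string(b)
}

func main() {
	// With a code: the machine-readable field sits next to the free-text reason.
	fmt.Println(encode(Result{Decision: "deny", Reason: "path is outside the allowed roots", ReasonCode: ReasonPathDenied}))
	// Without a code: the field is omitted entirely.
	fmt.Println(encode(Result{Decision: "allow", Reason: "matched rule"}))
}
```

Because the field is tagged `omitempty`, a record without a code serializes exactly as it would have before the field existed, which is what makes the change additive.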

Related issue

Refs #136, #169, #101, #121, #163

How verified

make ci (fmt-check + vet + go test -race + coverage): green, total coverage 80.9%.
- New packages: internal/metrics 89.1%, internal/oplog 71.7%.
New tests: engine reason-code table (every constraint family + default + rule-match + taint escalation), metrics counters/Prometheus/HTTP-handler/concurrency, oplog text+json+debug-gating, audit by_reason_code summarize, proxy metrics recording, and CLI --metrics / --log-format json (asserting stdout stays valid JSON and the default text path is unchanged).
Manual smoke: check --metrics summary on stderr; audit events emit schema_version:"4" + reason_code; proxy-http --metrics-listen serves /metrics in Prometheus format and emits JSON ops logs under --log-format json.
The audit-event schema drift-guard test (internal/audit/schema_test.go) is updated in lockstep with the new reason_code field and schema version.

Tradeoffs / risks

Audit schema bump to "4". reason_code is omitempty, so old readers ignore it and pre-taxonomy logs (no code) summarize fine. The maintainer's project guidance was to disregard strict back-compat; the field is nonetheless additive.
Options.Logger type change (io.Writer → *slog.Logger) on both proxies — an internal API; the one test that set it was updated. Operational stderr strings changed shape (now structured), but check's text contract is preserved.
The implementation of #101 ([Feature] OpenTelemetry metrics and structured decision export for the proxy) deviates from the issue's suggested OTel SDK (documented above) to keep the binary dependency-free; an OTLP exporter could be a focused follow-up if desired.
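To make the `--log-format text|json` switch and the move to `*slog.Logger` more concrete, the following is a minimal sketch of selecting a handler by format name. `newLogger` and `jsonRecord` are hypothetical helpers, and the text branch uses the standard library's `slog.NewTextHandler` as a stand-in, so its output will not match the project's own text format.

```go
package main

import (
	"bytes"
	"encoding/json"
	"fmt"
	"io"
	"log/slog"
	"os"
)

// newLogger returns a *slog.Logger that writes to w in the requested format.
// The stdlib text handler is only a stand-in for the project's text output.
func newLogger(w io.Writer, format string) (*slog.Logger, error) {
	switch format {
	case "text":
		return slog.New(slog.NewTextHandler(w, nil)), nil
	case "json":
		return slog.New(slog.NewJSONHandler(w, nil)), nil
	default:
		return nil, fmt.Errorf("unknown log format %q (want text or json)", format)
	}
}

// jsonRecord logs one record in JSON format and decodes it again,
// showing that each record is a single self-contained JSON object.
func jsonRecord(msg string, args ...any) (map[string]any, error) {
	var buf bytes.Buffer
	logger, err := newLogger(&buf, "json")
	if err != nil {
		return nil, err
	}
	logger.Info(msg, args...)
	var rec map[string]any
	if err := json.Unmarshal(buf.Bytes(), &rec); err != nil {
		return nil, err
	}
	return rec, nil
}

func main() {
	// Diagnostics go to stderr so stdout stays reserved for decision output.
	logger, err := newLogger(os.Stderr, "json")
	if err != nil {
		fmt.Fprintln(os.Stderr, err)
		os.Exit(2)
	}
	logger.Info("decision evaluated", "tool", "read_file", "reason_code", "path_denied")
}
```

Passing a `*slog.Logger` rather than an `io.Writer` lets callers attach key/value attributes per record, which is what the JSON format needs in order to emit one structured record per line.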

Scope notes (Mode B, one PR for all five issues as requested)

Touches >5 files / >200 lines by design — this is one feature area implemented end to end, per the explicit request. Two new packages (internal/metrics, internal/oplog) plus wiring; new docs/observability.md; README status table, docs/audit-event-schema.md, and CHANGELOG updated; stale README "Demo output" blocks refreshed (schema_version "4" + reason_code).
Adjacent/out of scope (sensible follow-ups): an OTel/OTLP exporter ([Feature] OpenTelemetry metrics and structured decision export for the proxy #101 extension); a CI doc-claim check tying the status table to real commands (Add a CI doc-claim check that ties README/doc status to actual commands #165).

Checklist

Tests added or updated
Documentation updated if needed
CI passes (make ci green locally; race + 80.9% coverage)
Issue number included

🤖 Generated with Claude Code

https://claude.ai/code/session_01Y5TznmPR3zr2ZvppT2pMyH


Implements the "structured decision observability" group as one coherent change across the engine, audit, proxies, and CLI:

- Typed reason codes (#136): every decision carries a stable, machine-readable policy.ReasonCode alongside the human-readable reason. Set at every engine decision site, taint escalation, and approval resolution; recorded on the audit Event (schema_version bumped to "4") and grouped by `audit summarize`. Free-text reasons are unchanged.
- Decision metrics (#169): new internal/metrics package with thread-safe Counters (by decision, tool, reason code; taint escalations, approval outcomes, eval latency, errors). `check --metrics` prints a summary to stderr on exit, dependency-free.
- Prometheus endpoint (#101): proxy/proxy-http expose the same counters at /metrics via opt-in `--metrics-listen <addr>`, rendered in Prometheus text-exposition format with no third-party dependency (kept off by default; local and operator-controlled, per the no-telemetry posture).
- Structured operational logging (#121, #163): new internal/oplog package wraps log/slog; `--log-format text|json` on check/proxy/proxy-http routes stderr diagnostics. text (default) is unchanged; json emits one record per line. The operational log stays distinct from the audit log and decision output.

Docs: new docs/observability.md; audit-event-schema, README status table, and CHANGELOG updated. README demo blocks refreshed (schema_version "4" + reason_code).

make ci green (race + 80.9% coverage).

Refs #136, #169, #101, #121, #163

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Claude-Session: https://claude.ai/code/session_01Y5TznmPR3zr2ZvppT2pMyH

Copilot

Pull request overview

This PR implements “structured decision observability” across check, proxy, and proxy-http by introducing stable decision reason codes, in-process decision/latency/error counters with optional Prometheus exposition, and structured operational logging to stderr—while keeping the project’s local-first/no-telemetry posture.

Changes:

Add policy.ReasonCode and propagate it through engine decisions, taint escalation, approval resolution, and audit events (audit schema bumped to "4"), plus audit summarization by reason code.
Introduce internal/metrics counters (CLI check --metrics stderr summary; proxy /metrics via --metrics-listen) and wire metrics recording into both proxies.
Introduce internal/oplog (slog wrapper) and wire --log-format text|json into CLI and proxies; update docs/README/CHANGELOG and add extensive tests.
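The in-process counters described above can be pictured with a small mutex-guarded sketch. Only the `Counters` name comes from the PR; the fields, the `Record`, `Decision`, and `Reason` methods, and the sample tool and code values are assumptions for illustration and may differ from `internal/metrics`.

```go
package main

import (
	"fmt"
	"sync"
)

// Counters is a hypothetical thread-safe tally of decisions.
type Counters struct {
	mu         sync.Mutex
	byDecision map[string]uint64
	byTool     map[string]uint64
	byReason   map[string]uint64
}

// NewCounters returns an empty, ready-to-use Counters.
func NewCounters() *Counters {
	return &Counters{
		byDecision: make(map[string]uint64),
		byTool:     make(map[string]uint64),
		byReason:   make(map[string]uint64),
	}
}

// Record counts one decision. An empty reason code is not counted.
func (c *Counters) Record(tool, decision, reasonCode string) {
	c.mu.Lock()
	defer c.mu.Unlock()
	c.byDecision[decision]++
	c.byTool[tool]++
	if reasonCode != "" {
		c.byReason[reasonCode]++
	}
}

// Decision returns how many times the given decision was recorded.
func (c *Counters) Decision(decision string) uint64 {
	c.mu.Lock()
	defer c.mu.Unlock()
	return c.byDecision[decision]
}

// Reason returns how many times the given reason code was recorded.
func (c *Counters) Reason(code string) uint64 {
	c.mu.Lock()
	defer c.mu.Unlock()
	return c.byReason[code]
}

func main() {
	c := NewCounters()
	var wg sync.WaitGroup
	// Record from many goroutines; the mutex keeps the maps consistent.
	for i := 0; i < 100; i++ {
		wg.Add(1)
		go func() {
			defer wg.Done()
			c.Record("read_file", "deny", "path_denied")
		}()
	}
	wg.Wait()
	fmt.Println(c.Decision("deny"), c.Reason("path_denied")) // prints "100 100"
}
```

A single mutex keeps the sketch simple; it is one possible design, and running it under `go test -race` style checks is what would catch unsynchronized map access.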

Reviewed changes

Copilot reviewed 25 out of 26 changed files in this pull request and generated 3 comments.


| File | Description |
| --- | --- |
| schema/agentfence-audit-event.schema.json | Bumps schema version examples to 4 and adds `reason_code` field to the JSON schema. |
| README.md | Updates status table and demo output examples to schema v4 + `reason_code`. |
| internal/proxy/proxy.go | Switches proxy diagnostics to `*slog.Logger`, adds metrics wiring (latency/errors/decision counts), and records approval reason codes. |
| internal/proxy/proxy_test.go | Updates test options to match new logger default/type. |
| internal/proxy/metrics_test.go | Adds proxy-side test verifying metrics recording from `processAgentLine`. |
| internal/policy/reasoncode.go | Introduces the reason-code taxonomy as a stable enum-like string type. |
| internal/policy/policy.go | Adds `ReasonCode` to `EvaluationResult` JSON output (`omitempty`). |
| internal/oplog/oplog.go | Adds operational logging package wrapping `log/slog` with text/json formats. |
| internal/oplog/oplog_test.go | Adds tests for format parsing, text rendering, JSON output validity, and debug gating. |
| internal/metrics/metrics.go | Adds thread-safe counters, snapshot formatting (text/Prometheus), and latency/error tracking. |
| internal/metrics/metrics_test.go | Adds tests for counting/snapshotting, determinism, Prometheus output, and concurrency. |
| internal/metrics/http.go | Adds HTTP handlers/mux for `/metrics` endpoint (Prometheus text exposition). |
| internal/metrics/http_test.go | Adds tests for handler method gating and mux routing. |
| internal/httpproxy/httpproxy.go | Switches HTTP proxy diagnostics to `*slog.Logger`, adds metrics wiring, and records approval reason codes. |
| internal/engine/reasoncode_test.go | Adds coverage ensuring engine decision paths set correct `ReasonCode` (including taint escalation). |
| internal/engine/engine.go | Propagates reason codes through constraint evaluation and taint escalation into results/events. |
| internal/audit/summarize.go | Adds `ByReasonCode` grouping and includes it in text summary output. |
| internal/audit/summarize_reasoncode_test.go | Adds test for summarize-by-reason-code behavior and text output section. |
| internal/audit/audit.go | Bumps audit schema version to 4; adds `ReasonCode` to `Event` and sets it in new events/error events. |
| internal/approval/approval.go | Adds reason-code mapping to approval outcomes (`Outcome.Code`). |
| docs/observability.md | New doc describing decision streams, operational logging, CLI metrics summary, and proxy `/metrics`. |
| docs/audit-event-schema.md | Updates schema version to 4 and documents the reason-code taxonomy. |
| cmd/agentfence/observability_test.go | Adds CLI tests for `--metrics` stderr summary and `--log-format json` stderr behavior without stdout pollution. |
| cmd/agentfence/main.go | Adds flags/wiring for `--log-format`, `--metrics`, `--metrics-listen`; starts/stops metrics server for proxies. |
| CHANGELOG.md | Documents the new observability feature group and the schema bump. |
| .gitignore | Ignores locally-built `./agentfence` binary. |


…rect docs

- WritePrometheus now emits agentfence_reason_codes_total{code=...} so the /metrics endpoint actually exposes the reason-code breakdown promised by the feature and docs; corrected the misleading agentfence_decisions_total HELP text ("by tool and decision") and the function doc comment.
- ServeMux doc comment fixed: it builds and returns a mux; it does not run a server (the caller owns the http.Server).
- docs/observability.md: narrowed the "text default is byte-stable" claim to check's stderr only; note the proxies' text diagnostics changed shape and recommend --log-format json for machine parsing.

Refs #101, #121, #163

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Claude-Session: https://claude.ai/code/session_01Y5TznmPR3zr2ZvppT2pMyH
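The two series named in this commit can be illustrated with a dependency-free sketch of the Prometheus text exposition format. The metric names and label sets (`tool`, `decision`, `code`) come from the commit message; `renderPrometheus`, `decisionKey`, the exact HELP wording, and the sample values are hypothetical, and the real WritePrometheus may be structured differently.

```go
package main

import (
	"fmt"
	"sort"
	"strings"
)

// decisionKey is a hypothetical label pair for agentfence_decisions_total.
type decisionKey struct{ Tool, Decision string }

// labelEscaper escapes backslash, double quote, and newline in label values,
// as the Prometheus text format requires.
var labelEscaper = strings.NewReplacer(`\`, `\\`, `"`, `\"`, "\n", `\n`)

// renderPrometheus renders both counter families in sorted, deterministic order.
func renderPrometheus(decisions map[decisionKey]uint64, reasons map[string]uint64) string {
	var b strings.Builder

	b.WriteString("# HELP agentfence_decisions_total Decisions by tool and decision.\n")
	b.WriteString("# TYPE agentfence_decisions_total counter\n")
	keys := make([]decisionKey, 0, len(decisions))
	for k := range decisions {
		keys = append(keys, k)
	}
	sort.Slice(keys, func(i, j int) bool {
		if keys[i].Tool != keys[j].Tool {
			return keys[i].Tool < keys[j].Tool
		}
		return keys[i].Decision < keys[j].Decision
	})
	for _, k := range keys {
		fmt.Fprintf(&b, "agentfence_decisions_total{tool=\"%s\",decision=\"%s\"} %d\n",
			labelEscaper.Replace(k.Tool), labelEscaper.Replace(k.Decision), decisions[k])
	}

	b.WriteString("# HELP agentfence_reason_codes_total Decisions by reason code.\n")
	b.WriteString("# TYPE agentfence_reason_codes_total counter\n")
	codes := make([]string, 0, len(reasons))
	for c := range reasons {
		codes = append(codes, c)
	}
	sort.Strings(codes)
	for _, c := range codes {
		fmt.Fprintf(&b, "agentfence_reason_codes_total{code=\"%s\"} %d\n",
			labelEscaper.Replace(c), reasons[c])
	}
	return b.String()
}

func main() {
	fmt.Print(renderPrometheus(
		map[decisionKey]uint64{{Tool: "read_file", Decision: "deny"}: 2, {Tool: "fetch", Decision: "allow"}: 5},
		map[string]uint64{"path_denied": 2},
	))
}
```

Keeping reason codes in their own series, rather than as a third label on the decisions counter, matches the split the commit describes and keeps the HELP text for each family accurate.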

The /metrics example claimed HELP "by decision and reason code" but the emitter labels agentfence_decisions_total by {tool,decision} and exports reason codes as a separate agentfence_reason_codes_total series. Correct the HELP text, add the reason_codes_total series to the example, and list it in the metric-families table so the docs match WritePrometheus output.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Claude-Session: https://claude.ai/code/session_01RLq9ynrFfGmCRJQDBGYsWz

…doc error kinds

Address audit findings on the observability surface (no behavior change to decisions or the audit chain):

- oplog.textHandler now serializes writes through a shared mutex (a pointer, shared across WithAttrs/WithGroup clones), matching slog's stdlib handlers. The stdio proxy logs from two relay goroutines, so without it records could interleave on stderr.
- serveMetrics binds the listener synchronously (net.Listen) so an unusable --metrics-listen address is reported at startup instead of silently in a goroutine; a bind failure disables the endpoint without taking down the proxy.
- docs/observability.md enumerates the actual agentfence_errors_total kinds (oversize, unparsed, batch, relay) emitted by the proxies.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Claude-Session: https://claude.ai/code/session_01RLq9ynrFfGmCRJQDBGYsWz
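The synchronous-bind change described for serveMetrics can be sketched as follows: bind with `net.Listen` on the calling goroutine, and start serving in the background only after the bind has succeeded. `startMetrics` is a hypothetical helper written for illustration, and the placeholder handler body is an assumption; the project's actual function may differ.

```go
package main

import (
	"fmt"
	"net"
	"net/http"
)

// startMetrics binds addr synchronously so an unusable address is returned
// to the caller as an error instead of failing silently inside a goroutine.
func startMetrics(addr string, h http.Handler) (*http.Server, net.Addr, error) {
	ln, err := net.Listen("tcp", addr)
	if err != nil {
		return nil, nil, fmt.Errorf("metrics listen %q: %w", addr, err)
	}
	srv := &http.Server{Handler: h}
	go func() {
		// Serve blocks until the server is closed; the error is ignored
		// here because this sketch has nowhere to report it.
		_ = srv.Serve(ln)
	}()
	return srv, ln.Addr(), nil
}

func main() {
	mux := http.NewServeMux()
	mux.HandleFunc("/metrics", func(w http.ResponseWriter, r *http.Request) {
		fmt.Fprintln(w, "# metrics would be written here")
	})

	// Port 0 asks the OS for any free port, which suits a local demo.
	srv, addr, err := startMetrics("127.0.0.1:0", mux)
	if err != nil {
		// A bind failure only disables the endpoint; the caller keeps running.
		fmt.Println("metrics endpoint disabled:", err)
		return
	}
	defer srv.Close()
	fmt.Println("metrics listening on", addr)
}
```

Splitting `Listen` from `Serve` is what moves the failure to startup: `http.ListenAndServe` inside a goroutine would report the same error only after the caller has already moved on.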

Copilot AI review requested due to automatic review settings June 20, 2026 14:15

Copilot started reviewing on behalf of dgenio June 20, 2026 14:16

Copilot AI reviewed Jun 20, 2026


Comment thread internal/metrics/metrics.go Outdated

Comment thread docs/observability.md

Comment thread internal/metrics/http.go Outdated

claude added 3 commits June 20, 2026 14:21

dgenio merged commit bccfdf2 into main Jun 21, 2026
2 checks passed

dgenio deleted the claude/issue-triage-grouping-wkmnhz branch June 21, 2026 07:34

dgenio merged 4 commits into main from claude/issue-triage-grouping-wkmnhz
