feat: structured decision observability (reason codes, metrics, logging) #191
Merged
Implements the "structured decision observability" group as one coherent change across the engine, audit, proxies, and CLI:

- Typed reason codes (#136): every decision carries a stable, machine-readable policy.ReasonCode alongside the human-readable reason. Set at every engine decision site, taint escalation, and approval resolution; recorded on the audit Event (schema_version bumped to "4") and grouped by `audit summarize`. Free-text reasons are unchanged.
- Decision metrics (#169): new internal/metrics package with thread-safe Counters (by decision, tool, reason code; taint escalations, approval outcomes, eval latency, errors). `check --metrics` prints a summary to stderr on exit, dependency-free.
- Prometheus endpoint (#101): proxy/proxy-http expose the same counters at /metrics via opt-in `--metrics-listen <addr>`, rendered in Prometheus text-exposition format with no third-party dependency (kept off by default; local and operator-controlled, per the no-telemetry posture).
- Structured operational logging (#121, #163): new internal/oplog package wraps log/slog; `--log-format text|json` on check/proxy/proxy-http routes stderr diagnostics. text (default) is unchanged; json emits one record per line. The operational log stays distinct from the audit log and decision output.

Docs: new docs/observability.md; audit-event-schema, README status table, and CHANGELOG updated. README demo blocks refreshed (schema_version "4" + reason_code).

make ci green (race + 80.9% coverage).

Refs #136, #169, #101, #121, #163
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Claude-Session: https://claude.ai/code/session_01Y5TznmPR3zr2ZvppT2pMyH
Pull request overview
This PR implements “structured decision observability” across check, proxy, and proxy-http by introducing stable decision reason codes, in-process decision/latency/error counters with optional Prometheus exposition, and structured operational logging to stderr—while keeping the project’s local-first/no-telemetry posture.
Changes:
- Add `policy.ReasonCode` and propagate it through engine decisions, taint escalation, approval resolution, and audit events (audit schema bumped to `"4"`), plus audit summarization by reason code.
- Introduce `internal/metrics` counters (CLI `check --metrics` stderr summary; proxy `/metrics` via `--metrics-listen`) and wire metrics recording into both proxies.
- Introduce `internal/oplog` (slog wrapper) and wire `--log-format text|json` into CLI and proxies; update docs/README/CHANGELOG and add extensive tests.
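
To make the first bullet concrete, here is a minimal sketch of how a stable reason code might sit next to the free-text reason. The four code values are the examples named in the PR description; the `result` struct, its `decision`/`reason` field names, and the `encode` helper are hypothetical and are not taken from the actual `policy` package.

```go
package main

import "encoding/json"

// ReasonCode is a stable, machine-readable identifier for why a decision
// was made. A string type keeps it readable in JSON and in logs.
type ReasonCode string

// Example codes named in the PR description; the full taxonomy is not shown here.
const (
	ReasonPathDenied      ReasonCode = "path_denied"
	ReasonURLBareIP       ReasonCode = "url_bare_ip"
	ReasonTaintEscalated  ReasonCode = "taint_escalated"
	ReasonApprovalTimeout ReasonCode = "approval_timeout"
)

// result is a hypothetical stand-in for an evaluation result. The
// omitempty tag means records without a code serialize exactly as before.
type result struct {
	Decision   string     `json:"decision"`
	Reason     string     `json:"reason"`
	ReasonCode ReasonCode `json:"reason_code,omitempty"`
}

// encode renders a result as a single JSON object.
func encode(r result) (string, error) {
	b, err := json.Marshal(r)
	if err != nil {
		return "", err
	}
	return string(b), nil
}
```

Calling `encode(result{Decision: "deny", Reason: "path is denied", ReasonCode: ReasonPathDenied})` yields an object that carries both the human-readable reason and `"reason_code":"path_denied"`, while a result with no code omits the field entirely, which is what keeps the change additive for older readers.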
Reviewed changes
Copilot reviewed 25 out of 26 changed files in this pull request and generated 3 comments.
| File | Description |
|---|---|
| schema/agentfence-audit-event.schema.json | Bumps schema version examples to 4 and adds reason_code field to the JSON schema. |
| README.md | Updates status table and demo output examples to schema v4 + reason_code. |
| internal/proxy/proxy.go | Switches proxy diagnostics to *slog.Logger, adds metrics wiring (latency/errors/decision counts), and records approval reason codes. |
| internal/proxy/proxy_test.go | Updates test options to match new logger default/type. |
| internal/proxy/metrics_test.go | Adds proxy-side test verifying metrics recording from processAgentLine. |
| internal/policy/reasoncode.go | Introduces the reason-code taxonomy as a stable enum-like string type. |
| internal/policy/policy.go | Adds ReasonCode to EvaluationResult JSON output (omitempty). |
| internal/oplog/oplog.go | Adds operational logging package wrapping log/slog with text/json formats. |
| internal/oplog/oplog_test.go | Adds tests for format parsing, text rendering, JSON output validity, and debug gating. |
| internal/metrics/metrics.go | Adds thread-safe counters, snapshot formatting (text/Prometheus), and latency/error tracking. |
| internal/metrics/metrics_test.go | Adds tests for counting/snapshotting, determinism, Prometheus output, and concurrency. |
| internal/metrics/http.go | Adds HTTP handlers/mux for /metrics endpoint (Prometheus text exposition). |
| internal/metrics/http_test.go | Adds tests for handler method gating and mux routing. |
| internal/httpproxy/httpproxy.go | Switches HTTP proxy diagnostics to *slog.Logger, adds metrics wiring, and records approval reason codes. |
| internal/engine/reasoncode_test.go | Adds coverage ensuring engine decision paths set correct ReasonCode (including taint escalation). |
| internal/engine/engine.go | Propagates reason codes through constraint evaluation and taint escalation into results/events. |
| internal/audit/summarize.go | Adds ByReasonCode grouping and includes it in text summary output. |
| internal/audit/summarize_reasoncode_test.go | Adds test for summarize-by-reason-code behavior and text output section. |
| internal/audit/audit.go | Bumps audit schema version to 4; adds ReasonCode to Event and sets it in new events/error events. |
| internal/approval/approval.go | Adds reason-code mapping to approval outcomes (Outcome.Code). |
| docs/observability.md | New doc describing decision streams, operational logging, CLI metrics summary, and proxy /metrics. |
| docs/audit-event-schema.md | Updates schema version to 4 and documents the reason-code taxonomy. |
| cmd/agentfence/observability_test.go | Adds CLI tests for --metrics stderr summary and --log-format json stderr behavior without stdout pollution. |
| cmd/agentfence/main.go | Adds flags/wiring for --log-format, --metrics, --metrics-listen; starts/stops metrics server for proxies. |
| CHANGELOG.md | Documents the new observability feature group and the schema bump. |
| .gitignore | Ignores locally-built ./agentfence binary. |
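
The `internal/oplog` rows describe a wrapper over `log/slog` that selects between text and JSON output. A rough sketch of that selection follows, assuming the standard library handlers stand in for the project's own text handler; `newLogger` and `renderOnce` are hypothetical names, not the package's API.

```go
package main

import (
	"bytes"
	"fmt"
	"io"
	"log/slog"
)

// newLogger returns a logger for the given --log-format value.
// Assumption: an empty value falls back to text, mirroring "text (default)".
func newLogger(w io.Writer, format string) (*slog.Logger, error) {
	switch format {
	case "", "text":
		return slog.New(slog.NewTextHandler(w, nil)), nil
	case "json":
		// The JSON handler writes one JSON object per record, one per line.
		return slog.New(slog.NewJSONHandler(w, nil)), nil
	default:
		return nil, fmt.Errorf("unknown log format %q (want text or json)", format)
	}
}

// renderOnce logs a single message and returns what was written, which
// makes the difference between the two formats easy to inspect.
func renderOnce(format, msg string) (string, error) {
	var buf bytes.Buffer
	logger, err := newLogger(&buf, format)
	if err != nil {
		return "", err
	}
	logger.Info(msg)
	return buf.String(), nil
}
```

In a real command the writer would be `os.Stderr`, so diagnostics never mix with decision output on stdout.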
…rect docs
- WritePrometheus now emits agentfence_reason_codes_total{code=...} so the
/metrics endpoint actually exposes the reason-code breakdown promised by the
feature and docs; corrected the misleading agentfence_decisions_total HELP
text ("by tool and decision") and the function doc comment.
- ServeMux doc comment fixed: it builds and returns a mux; it does not run a
server (the caller owns the http.Server).
- docs/observability.md: narrowed the "text default is byte-stable" claim to
check's stderr only; note the proxies' text diagnostics changed shape and
recommend --log-format json for machine parsing.
Refs #101, #121, #163
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Claude-Session: https://claude.ai/code/session_01Y5TznmPR3zr2ZvppT2pMyH
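
For illustration, a dependency-free emitter for the reason-code series could look roughly like the sketch below. Only the metric name `agentfence_reason_codes_total` and its `code` label come from the commit message above; the `counters` type, its methods, and the HELP wording are assumptions and do not reproduce the real `WritePrometheus`.

```go
package main

import (
	"fmt"
	"sort"
	"strings"
	"sync"
)

// counters is a hypothetical, mutex-guarded tally of decisions per reason code.
type counters struct {
	mu          sync.Mutex
	reasonCodes map[string]uint64
}

func newCounters() *counters {
	return &counters{reasonCodes: map[string]uint64{}}
}

// recordReasonCode increments the tally for one code; safe for concurrent use.
func (c *counters) recordReasonCode(code string) {
	c.mu.Lock()
	defer c.mu.Unlock()
	c.reasonCodes[code]++
}

// renderPrometheus returns the series in Prometheus text-exposition format.
// Codes are sorted so repeated scrapes of the same state are byte-identical.
func (c *counters) renderPrometheus() string {
	c.mu.Lock()
	defer c.mu.Unlock()

	codes := make([]string, 0, len(c.reasonCodes))
	for code := range c.reasonCodes {
		codes = append(codes, code)
	}
	sort.Strings(codes)

	var b strings.Builder
	b.WriteString("# HELP agentfence_reason_codes_total Decisions by reason code.\n")
	b.WriteString("# TYPE agentfence_reason_codes_total counter\n")
	for _, code := range codes {
		// %q is adequate for identifier-like codes; arbitrary label values
		// would need the format's own escaping rules.
		fmt.Fprintf(&b, "agentfence_reason_codes_total{code=%q} %d\n", code, c.reasonCodes[code])
	}
	return b.String()
}
```

Sorting before rendering is the design choice worth noting: Go map iteration order is randomized, so without it the output would differ between scrapes and be awkward to test.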
The /metrics example claimed HELP "by decision and reason code" but the
emitter labels agentfence_decisions_total by {tool,decision} and exports
reason codes as a separate agentfence_reason_codes_total series. Correct
the HELP text, add the reason_codes_total series to the example, and list
it in the metric-families table so the docs match WritePrometheus output.
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Claude-Session: https://claude.ai/code/session_01RLq9ynrFfGmCRJQDBGYsWz
…doc error kinds
Address audit findings on the observability surface (no behavior change to decisions or the audit chain):
- oplog.textHandler now serializes writes through a shared mutex (a pointer, shared across WithAttrs/WithGroup clones), matching slog's stdlib handlers. The stdio proxy logs from two relay goroutines, so without it records could interleave on stderr.
- serveMetrics binds the listener synchronously (net.Listen) so an unusable --metrics-listen address is reported at startup instead of silently in a goroutine; a bind failure disables the endpoint without taking down the proxy.
- docs/observability.md enumerates the actual agentfence_errors_total kinds (oversize, unparsed, batch, relay) emitted by the proxies.
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Claude-Session: https://claude.ai/code/session_01RLq9ynrFfGmCRJQDBGYsWz
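
The synchronous-bind fix described in this commit can be sketched as follows. This is an illustration of the pattern only: `startMetrics` is a hypothetical name and signature, not the project's `serveMetrics`.

```go
package main

import (
	"net"
	"net/http"
)

// startMetrics binds addr before returning, so an unusable address is
// reported to the caller at startup rather than lost inside a goroutine.
// Only the accept loop runs in the background.
func startMetrics(addr string, h http.Handler) (*http.Server, net.Addr, error) {
	ln, err := net.Listen("tcp", addr)
	if err != nil {
		// The caller can log this and carry on without the endpoint.
		return nil, nil, err
	}
	srv := &http.Server{Handler: h}
	go func() {
		// Serve returns http.ErrServerClosed once Close or Shutdown is called.
		_ = srv.Serve(ln)
	}()
	return srv, ln.Addr(), nil
}
```

A caller that gets an error back can disable the metrics endpoint and keep the proxy running, which matches the behavior the commit message describes. Returning the bound address is also convenient when listening on port 0 in tests.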
Summary
Implements the "structured decision observability" group (five related issues that share one code area and implementation path) as a single coherent change. The unifying idea: make AgentFence's decisions machine-readable and operable, across the CLI (`check`) and both proxies, without compromising the local-first / no-telemetry posture.

What changed (by issue):
- Typed reason codes (#136): every decision carries a stable, machine-readable `policy.ReasonCode` (e.g. `path_denied`, `url_bare_ip`, `taint_escalated`, `approval_timeout`) alongside the existing human-readable reason. Codes are set at every engine decision site, on taint escalation, and on approval resolution; recorded on the audit `Event` (`schema_version` → `"4"`); and grouped by `audit summarize`. Free-text reasons are unchanged (additive).
- Decision metrics (#169): new `internal/metrics` package with a thread-safe `Counters` (by decision, tool, reason code; taint escalations, approval outcomes, eval latency, errors). `check --metrics` prints a dependency-free summary to stderr on exit (never pollutes a `--output json` stdout stream).
- Prometheus endpoint (#101): `proxy`/`proxy-http` expose the same counters at `/metrics` via opt-in `--metrics-listen <addr>`, plus evaluation latency and operational error rates. Mode-B / spec note: implemented as a dependency-free Prometheus text-exposition emitter rather than the OpenTelemetry SDK, to preserve the repo's "stdlib + yaml + uuid only" invariant. Off by default; local and operator-chosen.
- Structured operational logging (#121, #163): new `internal/oplog` package wraps `log/slog`; `--log-format text|json` on `check`, `proxy`, and `proxy-http` routes stderr diagnostics. `text` (default) is byte-stable / unchanged; `json` emits one structured record per line. The operational log stays distinct from the audit log and the decision/JSON-RPC output.

Related issue
Refs #136, #169, #101, #121, #163

How verified
- `make ci` (fmt-check + vet + `go test -race` + coverage): green, total coverage 80.9%. `internal/metrics` 89.1%, `internal/oplog` 71.7%.
- Tests cover `by_reason_code` summarize, proxy metrics recording, and CLI `--metrics` / `--log-format json` (asserting stdout stays valid JSON and the default text path is unchanged).
- `check --metrics` summary on stderr; audit events emit `schema_version:"4"` + `reason_code`; `proxy-http --metrics-listen` serves `/metrics` in Prometheus format and emits JSON ops logs under `--log-format json`.
- The schema test (`internal/audit/schema_test.go`) is updated in lockstep with the new `reason_code` field and schema version.

Tradeoffs / risks
- Audit `schema_version` is bumped to `"4"`. `reason_code` is `omitempty`, so old readers ignore it and pre-taxonomy logs (no code) summarize fine. Per project guidance the maintainer asked to disregard strict back-compat; the field is nonetheless additive.
- `Options.Logger` type change (`io.Writer` → `*slog.Logger`) on both proxies is an internal API change; the one test that set it was updated. Operational stderr strings changed shape (now structured), but `check`'s text contract is preserved.

Scope notes (Mode B, one PR for all five issues as requested)
- New packages (`internal/metrics`, `internal/oplog`) plus wiring; new `docs/observability.md`; README status table, `docs/audit-event-schema.md`, and CHANGELOG updated; stale README "Demo output" blocks refreshed (`schema_version "4"` + `reason_code`).

Checklist
- `make ci` green locally (race + 80.9% coverage)

🤖 Generated with Claude Code
https://claude.ai/code/session_01Y5TznmPR3zr2ZvppT2pMyH