Conversation
- Update go.opentelemetry.io/otel/sdk and sdk/metric from v1.39.0 to v1.40.0 - Make `update-deps` fetch test deps, run `go mod tidy -v`, and verify the module
- Add admin audit archiver (JSONL) and tests; trigger archiving after retention/pruning - Wire retention + archive dir/interval options through worker/admin configs - Add Prometheus metrics endpoint (/admin/v1/metrics/prometheus) and text rendering - Bump toolchain/deps (Go 1.26.0, golangci-lint v2.9.0, grpc v1.79.1)
There was a problem hiding this comment.
Pull request overview
Adds admin-facing operational features (audit retention + archival, and Prometheus-format metrics) to support the admin UI/service, along with a toolchain/dependency bump.
Changes:
- Add admin audit log age retention with optional file-based archival of pruned/evicted audit events.
- Add Prometheus text rendering for admin observability metrics and expose new gateway endpoints for Prometheus metrics.
- Bump Go/tooling versions and update a few dependencies.
Reviewed changes
Copilot reviewed 21 out of 22 changed files in this pull request and generated 4 comments.
Show a summary per file
| File | Description |
|---|---|
| worker.go | Adds TaskManager fields/config wiring for audit retention + archival; starts archival loop with workers. |
| options.go | Adds TaskManager options for audit retention + archive dir/interval. |
| admin_audit.go | Implements age cutoff filtering and age-based pruning + archival hookup. |
| admin_audit_archive.go | Adds file-backed JSONL audit archiver and background flush loop. |
| admin_audit_archive_test.go | Adds a test validating archival of aged-out audit events. |
| admin_observability_prometheus.go | Adds Prometheus text exposition renderer for AdminObservability snapshots. |
| admin_gateway.go | Adds /admin/v1/metrics/prometheus endpoint and format negotiation for metrics output. |
| cmd/worker-service/main.go | Adds env/config plumbing for audit retention + archive settings. |
| cmd/worker-admin/main.go | Adds env/config plumbing for audit retention + archive settings. |
| admin-ui/README.md | Documents new env vars and metrics endpoints for the gateway. |
| PRD-admin-service.md | Updates PRD to reflect Prometheus metrics export and audit retention/archival support. |
| job_payload.go | Adds an explicit default branch in numeric parsing helper. |
| go.mod / go.sum | Updates Go version and bumps OTel SDK + gRPC dependencies. |
| Makefile | Updates tooling versions and tightens update-deps target. |
| Dockerfile / Dockerfile.worker | Updates Go version build arg. |
| .project-settings.env | Updates Go + golangci-lint versions. |
| .pre-commit/*-hook | Updates referenced Go / golangci-lint versions. |
| .golangci.yaml | Updates Go version used by golangci-lint. |
| cspell.json | Adds “Archiver” to dictionary. |
💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.
| flush() | ||
|
|
||
| return |
There was a problem hiding this comment.
On shutdown (ctx.Done), the archival loop flushes only the current in-memory batch and returns without draining archiver.queue. Any events still buffered in the channel at cancellation time will be dropped and never written. Consider draining the queue (non-blocking) into the batch before the final flush, or providing an explicit Close/Flush path invoked from TaskManager.StopNow/StopGraceful.
| flush() | |
| return | |
| // On shutdown, best-effort drain any events that are already queued | |
| for { | |
| select { | |
| case event := <-archiver.queue: | |
| batch = append(batch, event) | |
| if len(batch) >= adminAuditArchiveBatchSize { | |
| flush() | |
| } | |
| default: | |
| // No more events available without blocking; flush remaining batch | |
| flush() | |
| return | |
| } | |
| } |
| // WithAdminAuditArchiveDir enables file archival for aged-out audit events. | ||
| func WithAdminAuditArchiveDir(dir string) TaskManagerOption { | ||
| return func(cfg *taskManagerConfig) { | ||
| cfg.auditArchiveDir = dir | ||
| } |
There was a problem hiding this comment.
WithAdminAuditArchiveDir is documented as archiving only "aged-out" events, but recordAdminAuditEvent also archives events dropped due to auditEventLimit trimming (even if auditRetention is disabled). Either update the option comment to reflect both behaviors (age + count eviction) or change archival to only run for age-based pruning.
| firstKeep := 0 | ||
| for firstKeep < len(tm.auditEvents) && tm.auditEvents[firstKeep].At.Before(cutoff) { | ||
| firstKeep++ | ||
| } | ||
|
|
||
| if firstKeep == 0 { | ||
| return nil | ||
| } | ||
|
|
||
| archived := make([]AdminAuditEvent, firstKeep) | ||
| copy(archived, tm.auditEvents[:firstKeep]) | ||
|
|
||
| if firstKeep >= len(tm.auditEvents) { | ||
| tm.auditEvents = tm.auditEvents[:0] | ||
|
|
||
| return archived | ||
| } | ||
|
|
||
| tm.auditEvents = tm.auditEvents[firstKeep:] | ||
|
|
||
| return archived | ||
| } | ||
|
|
There was a problem hiding this comment.
pruneAuditEventsByAgeLocked assumes tm.auditEvents are ordered oldest->newest (it only advances firstKeep from the start). Since AdminRecordAuditEvent allows callers to provide a non-zero event.At, out-of-order timestamps can leave expired events in memory and prevent archival. Consider partitioning the slice by cutoff (scan all events and keep those >= cutoff) or inserting/sorting events by At before pruning.
| firstKeep := 0 | |
| for firstKeep < len(tm.auditEvents) && tm.auditEvents[firstKeep].At.Before(cutoff) { | |
| firstKeep++ | |
| } | |
| if firstKeep == 0 { | |
| return nil | |
| } | |
| archived := make([]AdminAuditEvent, firstKeep) | |
| copy(archived, tm.auditEvents[:firstKeep]) | |
| if firstKeep >= len(tm.auditEvents) { | |
| tm.auditEvents = tm.auditEvents[:0] | |
| return archived | |
| } | |
| tm.auditEvents = tm.auditEvents[firstKeep:] | |
| return archived | |
| } | |
| archived := make([]AdminAuditEvent, 0, len(tm.auditEvents)) | |
| kept := make([]AdminAuditEvent, 0, len(tm.auditEvents)) | |
| for _, event := range tm.auditEvents { | |
| if event.At.Before(cutoff) { | |
| archived = append(archived, event) | |
| } else { | |
| kept = append(kept, event) | |
| } | |
| } | |
| if len(archived) == 0 { | |
| // Nothing to archive; leave tm.auditEvents unchanged. | |
| return nil | |
| } | |
| // Replace in-memory events with the ones we are keeping. | |
| tm.auditEvents = kept | |
| return archived | |
| } |
| tm.auditEvents = append(tm.auditEvents, event) | ||
|
|
||
| archived := tm.pruneAuditEventsByAgeLocked(time.Now()) | ||
| if tm.auditEventLimit > 0 && len(tm.auditEvents) > tm.auditEventLimit { | ||
| tm.auditEvents = tm.auditEvents[len(tm.auditEvents)-tm.auditEventLimit:] | ||
| dropCount := len(tm.auditEvents) - tm.auditEventLimit | ||
| dropped := make([]AdminAuditEvent, dropCount) | ||
| copy(dropped, tm.auditEvents[:dropCount]) | ||
| archived = append(archived, dropped...) | ||
| tm.auditEvents = tm.auditEvents[dropCount:] | ||
| } | ||
|
|
||
| tm.auditEventsMu.Unlock() | ||
|
|
||
| tm.archiveAuditEvents(archived) | ||
| tm.persistAdminAuditEvent(ctx, event) |
There was a problem hiding this comment.
Age-based pruning/archival currently only happens when recording a new audit event (pruneAuditEventsByAgeLocked is called from recordAdminAuditEvent). If audit activity stops, events can age out but remain in memory indefinitely and never get archived (they're only filtered on reads). Consider adding a periodic pruning loop (similar to the task retention loop) or pruning+archiving during AdminAuditEvents reads when retention is enabled.
- Sanitize incoming request_id header + log fields; regenerate UUID when invalid - Make job runner tarball URL parsing/validation context-aware and enforce allowed hosts/schemes via sectvalidate - Add #nosec annotations for vetted metric/SSE/JSON writes and exec usage - Add sectvalidate to cspell dictionary
No description provided.