Metrics dashboard by victor-cuevas · Pull Request #72 · cloudblue/chaperone

victor-cuevas · 2026-03-19T16:51:00Z

No description provided.

Parse Prometheus text format from proxy /metrics endpoints using expfmt, store snapshots in per-instance ring buffers (~1h retention at 10s intervals), compute per-second rates from counter deltas (handling resets), compute p50/p95/p99 from histogram buckets via linear interpolation, and expose results via REST API (GET /api/metrics/{id}, GET /api/metrics/fleet). New admin/metrics package: parser, ring buffer, rate/percentile math, and Collector orchestrator. Poller extended to scrape /metrics on successful health probes. Fleet endpoint aggregates KPIs across instances with trend indicators vs 1h ago.

…l, and vendor charts - Add ECharts + vue-echarts with tree-shaken imports (line chart, grid, tooltip, legend, data zoom) - Add metrics formatting utilities (formatRps, formatLatency, formatErrorRate, formatCount, trendDirection) with 25 tests - Add metrics Pinia store for fleet and instance metrics with 6 tests - Add FleetKpiPanel with 4 KPI cards (RPS, error rate, connections, panics) and trend indicators - Add instance detail view at /instances/:id with breadcrumb, status header, and Overview/Traffic tabs - Overview tab: 5 KPI cards + P50/P95/P99 latency time-series chart - Traffic tab: interactive vendor table with checkboxes, color dots, totals row; 3 stacked charts (RPS, Latency, Error Rate) per selected vendor - Latency chart shows P50/P95/P99 for single vendor, P99-only for multiple (per design spec) - Auto-select top 3 vendors by RPS, prune stale selections when vendor list changes - Make instance cards and table rows clickable to navigate to detail view - Add P50/P95 fields to SeriesPoint in Go backend for overview latency chart - Escape HTML in ECharts tooltip formatters to prevent XSS via vendor IDs - Add tabpanel ARIA roles for accessibility

Add useAnimatedValue composable that transitions numeric values over 400ms using requestAnimationFrame with ease-out cubic easing. Applied to fleet and instance KPI cards so values animate smoothly instead of jumping on each 10s poll cycle.

qarlosh

Solid work overall — clean architecture, comprehensive tests, and good coverage of the LITE-33864 and LITE-33865 requirements. A few items below to address before merge.

qarlosh · 2026-03-27T12:01:42Z

admin/ui/src/components/TrafficTab.vue

+function buildBaseOption() {
+	return {
+		grid: { top: 16, right: 16, bottom: 24, left: 56 },
+		xAxis: {


Functional gap vs LITE-33865 spec: The task description calls for "three stacked time-aligned line charts (RPS, Latency, Error Rate) with shared time-range selector". I see DataZoomComponent is imported in chart-setup.js, but none of the chart options here include a dataZoom configuration. Users can't zoom into a time window across the three charts.

Please add the shared time-range selector, or we can track this as a fast follow-up if you prefer — but it is explicitly in the acceptance criteria.

qarlosh · 2026-03-27T12:01:51Z

admin/ui/src/utils/html.js

+	return s
+		.replace(/&/g, '&amp;')
+		.replace(/</g, '&lt;')
+		.replace(/>/g, '&gt;')


Minor — defense in depth: This escapes & < > " but not single quotes (' → '). Not exploitable in current usage since vendor IDs go into text content, not single-quote-delimited attributes. But it's cheap to add and prevents surprises if this helper is reused elsewhere.

qarlosh · 2026-03-27T12:01:58Z

admin/metrics/collector.go

+
+	oldRPS := counterRate(prev.totalRequests(), curr.totalRequests(), dt)
+	rpsTrend := im.RPS - oldRPS
+	im.RPSTrend = &rpsTrend


Code duplication: fillTrends and trendSnapshot (starting at line 315) share ~15 lines of identical logic — buffer length check, 50-minute gate, findNearest, prev/curr pair extraction. Consider extracting a shared helper like:

func historicalPair(buf *Ring) (prev, curr Snapshot, ok bool)

This would remove the duplication and make the trend calculation easier to maintain if the 1h window logic ever changes.

qarlosh · 2026-03-27T12:02:07Z

admin/metrics/collector.go

+	if !ok || buf.Len() == 0 {
+		return nil
+	}
+


Potential dead code: GetInstanceSummary is exported and tested, but no handler or poller actually calls it. The fleet endpoint uses GetFleetMetrics → summarizeForFleet internally (which re-implements the same summary logic), and the instance endpoint uses GetInstanceMetrics.

Is this intentional as a forward-looking public API? If so, consider having summarizeForFleet delegate to it to avoid duplicating the RPS/ErrorRate/P99 computation in two places. If not, it can be removed.

victor-cuevas added 3 commits March 19, 2026 17:46

victor-cuevas marked this pull request as ready for review March 25, 2026 16:29

victor-cuevas requested a review from qarlosh March 25, 2026 16:30

This comment was marked as duplicate.

Sign in to view

qarlosh requested changes Mar 27, 2026

View reviewed changes

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Metrics dashboard#72

Metrics dashboard#72
victor-cuevas wants to merge 3 commits intofeat/admin-portalfrom
feat/admin-metrics

victor-cuevas commented Mar 19, 2026

Uh oh!

This comment was marked as duplicate.

Uh oh!

qarlosh left a comment

Uh oh!

qarlosh Mar 27, 2026

Uh oh!

qarlosh Mar 27, 2026

Uh oh!

qarlosh Mar 27, 2026

Uh oh!

qarlosh Mar 27, 2026

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

2 participants

Conversation

victor-cuevas commented Mar 19, 2026

Uh oh!

This comment was marked as duplicate.

Uh oh!

qarlosh left a comment

Choose a reason for hiding this comment

Uh oh!

qarlosh Mar 27, 2026

Choose a reason for hiding this comment

Uh oh!

qarlosh Mar 27, 2026

Choose a reason for hiding this comment

Uh oh!

qarlosh Mar 27, 2026

Choose a reason for hiding this comment

Uh oh!

qarlosh Mar 27, 2026

Choose a reason for hiding this comment

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

2 participants