Skip to content

Metrics dashboard#72

Open
victor-cuevas wants to merge 3 commits intofeat/admin-portalfrom
feat/admin-metrics
Open

Metrics dashboard#72
victor-cuevas wants to merge 3 commits intofeat/admin-portalfrom
feat/admin-metrics

Conversation

@victor-cuevas
Copy link
Copy Markdown

No description provided.

Parse Prometheus text format from proxy /metrics endpoints using expfmt, store snapshots in per-instance ring buffers (~1h retention at 10s intervals), compute per-second rates from counter deltas (handling resets), compute p50/p95/p99 from histogram buckets via linear interpolation, and expose results via REST API (GET /api/metrics/{id}, GET /api/metrics/fleet).

New admin/metrics package: parser, ring buffer, rate/percentile math, and Collector orchestrator. Poller extended to scrape /metrics on successful health probes. Fleet endpoint aggregates KPIs across instances with trend indicators vs 1h ago.
…l, and vendor charts

- Add ECharts + vue-echarts with tree-shaken imports (line chart, grid, tooltip, legend, data zoom)
- Add metrics formatting utilities (formatRps, formatLatency, formatErrorRate, formatCount, trendDirection) with 25 tests
- Add metrics Pinia store for fleet and instance metrics with 6 tests
- Add FleetKpiPanel with 4 KPI cards (RPS, error rate, connections, panics) and trend indicators
- Add instance detail view at /instances/:id with breadcrumb, status header, and Overview/Traffic tabs
- Overview tab: 5 KPI cards + P50/P95/P99 latency time-series chart
- Traffic tab: interactive vendor table with checkboxes, color dots, totals row; 3 stacked charts (RPS, Latency, Error Rate) per selected vendor
- Latency chart shows P50/P95/P99 for single vendor, P99-only for multiple (per design spec)
- Auto-select top 3 vendors by RPS, prune stale selections when vendor list changes
- Make instance cards and table rows clickable to navigate to detail view
- Add P50/P95 fields to SeriesPoint in Go backend for overview latency chart
- Escape HTML in ECharts tooltip formatters to prevent XSS via vendor IDs
- Add tabpanel ARIA roles for accessibility
Add useAnimatedValue composable that transitions numeric values over
400ms using requestAnimationFrame with ease-out cubic easing. Applied
to fleet and instance KPI cards so values animate smoothly instead of
jumping on each 10s poll cycle.
@victor-cuevas victor-cuevas marked this pull request as ready for review March 25, 2026 16:29
@victor-cuevas victor-cuevas requested a review from qarlosh March 25, 2026 16:30
qarlosh

This comment was marked as duplicate.

Copy link
Copy Markdown
Collaborator

@qarlosh qarlosh left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Solid work overall — clean architecture, comprehensive tests, and good coverage of the LITE-33864 and LITE-33865 requirements. A few items below to address before merge.

function buildBaseOption() {
return {
grid: { top: 16, right: 16, bottom: 24, left: 56 },
xAxis: {
Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Functional gap vs LITE-33865 spec: The task description calls for "three stacked time-aligned line charts (RPS, Latency, Error Rate) with shared time-range selector". I see DataZoomComponent is imported in chart-setup.js, but none of the chart options here include a dataZoom configuration. Users can't zoom into a time window across the three charts.

Please add the shared time-range selector, or we can track this as a fast follow-up if you prefer — but it is explicitly in the acceptance criteria.

return s
.replace(/&/g, '&')
.replace(/</g, '&lt;')
.replace(/>/g, '&gt;')
Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Minor — defense in depth: This escapes & < > " but not single quotes ('&#39;). Not exploitable in current usage since vendor IDs go into text content, not single-quote-delimited attributes. But it's cheap to add and prevents surprises if this helper is reused elsewhere.


oldRPS := counterRate(prev.totalRequests(), curr.totalRequests(), dt)
rpsTrend := im.RPS - oldRPS
im.RPSTrend = &rpsTrend
Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Code duplication: fillTrends and trendSnapshot (starting at line 315) share ~15 lines of identical logic — buffer length check, 50-minute gate, findNearest, prev/curr pair extraction. Consider extracting a shared helper like:

func historicalPair(buf *Ring) (prev, curr Snapshot, ok bool)

This would remove the duplication and make the trend calculation easier to maintain if the 1h window logic ever changes.

if !ok || buf.Len() == 0 {
return nil
}

Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Potential dead code: GetInstanceSummary is exported and tested, but no handler or poller actually calls it. The fleet endpoint uses GetFleetMetricssummarizeForFleet internally (which re-implements the same summary logic), and the instance endpoint uses GetInstanceMetrics.

Is this intentional as a forward-looking public API? If so, consider having summarizeForFleet delegate to it to avoid duplicating the RPS/ErrorRate/P99 computation in two places. If not, it can be removed.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants