Conversation
Parse Prometheus text format from proxy /metrics endpoints using expfmt, store snapshots in per-instance ring buffers (~1h retention at 10s intervals), compute per-second rates from counter deltas (handling resets), compute p50/p95/p99 from histogram buckets via linear interpolation, and expose results via REST API (GET /api/metrics/{id}, GET /api/metrics/fleet).
New admin/metrics package: parser, ring buffer, rate/percentile math, and Collector orchestrator. Poller extended to scrape /metrics on successful health probes. Fleet endpoint aggregates KPIs across instances with trend indicators vs 1h ago.
…l, and vendor charts - Add ECharts + vue-echarts with tree-shaken imports (line chart, grid, tooltip, legend, data zoom) - Add metrics formatting utilities (formatRps, formatLatency, formatErrorRate, formatCount, trendDirection) with 25 tests - Add metrics Pinia store for fleet and instance metrics with 6 tests - Add FleetKpiPanel with 4 KPI cards (RPS, error rate, connections, panics) and trend indicators - Add instance detail view at /instances/:id with breadcrumb, status header, and Overview/Traffic tabs - Overview tab: 5 KPI cards + P50/P95/P99 latency time-series chart - Traffic tab: interactive vendor table with checkboxes, color dots, totals row; 3 stacked charts (RPS, Latency, Error Rate) per selected vendor - Latency chart shows P50/P95/P99 for single vendor, P99-only for multiple (per design spec) - Auto-select top 3 vendors by RPS, prune stale selections when vendor list changes - Make instance cards and table rows clickable to navigate to detail view - Add P50/P95 fields to SeriesPoint in Go backend for overview latency chart - Escape HTML in ECharts tooltip formatters to prevent XSS via vendor IDs - Add tabpanel ARIA roles for accessibility
Add useAnimatedValue composable that transitions numeric values over 400ms using requestAnimationFrame with ease-out cubic easing. Applied to fleet and instance KPI cards so values animate smoothly instead of jumping on each 10s poll cycle.
qarlosh
left a comment
There was a problem hiding this comment.
Solid work overall — clean architecture, comprehensive tests, and good coverage of the LITE-33864 and LITE-33865 requirements. A few items below to address before merge.
| function buildBaseOption() { | ||
| return { | ||
| grid: { top: 16, right: 16, bottom: 24, left: 56 }, | ||
| xAxis: { |
There was a problem hiding this comment.
Functional gap vs LITE-33865 spec: The task description calls for "three stacked time-aligned line charts (RPS, Latency, Error Rate) with shared time-range selector". I see DataZoomComponent is imported in chart-setup.js, but none of the chart options here include a dataZoom configuration. Users can't zoom into a time window across the three charts.
Please add the shared time-range selector, or we can track this as a fast follow-up if you prefer — but it is explicitly in the acceptance criteria.
| return s | ||
| .replace(/&/g, '&') | ||
| .replace(/</g, '<') | ||
| .replace(/>/g, '>') |
There was a problem hiding this comment.
Minor — defense in depth: This escapes & < > " but not single quotes (' → '). Not exploitable in current usage since vendor IDs go into text content, not single-quote-delimited attributes. But it's cheap to add and prevents surprises if this helper is reused elsewhere.
|
|
||
| oldRPS := counterRate(prev.totalRequests(), curr.totalRequests(), dt) | ||
| rpsTrend := im.RPS - oldRPS | ||
| im.RPSTrend = &rpsTrend |
There was a problem hiding this comment.
Code duplication: fillTrends and trendSnapshot (starting at line 315) share ~15 lines of identical logic — buffer length check, 50-minute gate, findNearest, prev/curr pair extraction. Consider extracting a shared helper like:
func historicalPair(buf *Ring) (prev, curr Snapshot, ok bool)This would remove the duplication and make the trend calculation easier to maintain if the 1h window logic ever changes.
| if !ok || buf.Len() == 0 { | ||
| return nil | ||
| } | ||
|
|
There was a problem hiding this comment.
Potential dead code: GetInstanceSummary is exported and tested, but no handler or poller actually calls it. The fleet endpoint uses GetFleetMetrics → summarizeForFleet internally (which re-implements the same summary logic), and the instance endpoint uses GetInstanceMetrics.
Is this intentional as a forward-looking public API? If so, consider having summarizeForFleet delegate to it to avoid duplicating the RPS/ErrorRate/P99 computation in two places. If not, it can be removed.
No description provided.