Feat: Observability Enhancements #19
Merged
Persist the producer span context in the event_outbox.traceparent column, re-activate it in the relay (so enqueue rejoins the request trace), and inject traceparent into the Redis stream message so the worker can continue the trace.
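As a rough illustration of what travels through the `event_outbox.traceparent` column and the Redis stream message, here is a minimal sketch of formatting and parsing a W3C `traceparent` value (`version-traceid-spanid-flags`, lowercase hex). The helper names and the `payload` field are hypothetical and not taken from this PR; the real code presumably relies on the OTel propagators rather than hand-rolled parsing.

```python
import re

# W3C Trace Context layout: 2-hex version, 32-hex trace id, 16-hex span id, 2-hex flags.
_TRACEPARENT_RE = re.compile(r"([0-9a-f]{2})-([0-9a-f]{32})-([0-9a-f]{16})-([0-9a-f]{2})")


def format_traceparent(trace_id: int, span_id: int, sampled: bool = True) -> str:
    """Render a version-00 traceparent string from integer ids."""
    flags = 0x01 if sampled else 0x00
    return f"00-{trace_id:032x}-{span_id:016x}-{flags:02x}"


def parse_traceparent(value):
    """Return the ids as a dict, or None if the value is missing or invalid."""
    if not value:
        return None
    m = _TRACEPARENT_RE.fullmatch(value.strip())
    if m is None:
        return None
    version, trace_hex, span_hex, flags_hex = m.groups()
    # Version ff and all-zero ids are invalid per the spec.
    if version == "ff" or int(trace_hex, 16) == 0 or int(span_hex, 16) == 0:
        return None
    return {
        "trace_id": trace_hex,
        "span_id": span_hex,
        "sampled": bool(int(flags_hex, 16) & 0x01),
    }


def build_stream_fields(payload: str, traceparent):
    """Hypothetical stream-message fields: attach traceparent only when present."""
    fields = {"payload": payload}
    if traceparent:
        fields["traceparent"] = traceparent
    return fields
```

A consumer that gets `None` back from `parse_traceparent` would simply start a fresh trace instead of continuing the producer's.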
Add an OTel tracer (worker/utils/tracing.py), extract the producer context in consume() and start worker.consume as a child-with-link, and span the pipeline stages (dispatch, download, dedup, image variants, ffmpeg). Also wire the previously-uncalled init_metrics and record job/asset/queue metrics so the worker SLIs have data.
Stamp trace_id/span_id onto API request logs (TracingMiddleware now runs before the logger) and worker logs, and broaden the Loki derived-field regex to link log lines to their Tempo trace across formats.
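To make "across formats" concrete, the sketch below shows one way a single pattern can pull a trace id out of both JSON and logfmt log lines. The pattern and the sample lines are assumptions for illustration, not the regex used in this PR; Loki derived fields are evaluated with Go's regexp engine, and this simple pattern avoids Python-only syntax.

```python
import re

# Hypothetical pattern: accepts trace_id / traceID / traceId keys, '=' or ':' separators,
# optional quotes, and captures a 32-hex-digit trace id.
TRACE_ID_RE = re.compile(r'(?:trace_id|traceID|traceId)"?\s*[=:]\s*"?([0-9a-f]{32})')


def extract_trace_id(line: str):
    """Return the first trace id found in a log line, or None."""
    m = TRACE_ID_RE.search(line)
    return m.group(1) if m else None


# Illustrative log lines in two formats (not real output from the service).
json_line = '{"level":"info","trace_id":"4bf92f3577b34da6a3ce929d0e0e4736","msg":"done"}'
logfmt_line = 'level=info trace_id=4bf92f3577b34da6a3ce929d0e0e4736 msg="done"'
```

Both sample lines yield the same captured id, which is what the derived field needs in order to build the link to Tempo.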
Read the chi route pattern after routing (http_route was always 'unknown'), cut the metric export interval to 15s, add finer histogram buckets for the queue-lag SLI, and export DB connection-pool gauges via db.Stats().
Add the loadtest overlay (CPU/mem pinning + full sampling), Prometheus SLO recording rules, remote-write receiver and exemplar storage, and pin Tempo to 2.6.1 (latest had an incompatible config schema).
Move the dashboard provider config into the dashboards provisioning dir (it was misplaced under datasources/, so no dashboards ever loaded) and repair the legacy metrics dashboard's datasource binding and stale metric names.
Add Grafana dashboards: API RED, worker/app saturation (USE), pipeline funnel, queue health, and a consolidated experiment overview combining k6 client load with server-side pipeline, worker, queue and DB metrics.
Add k6 load-test scripts: closed- and open-model scripts running the real presign→upload→complete client flow with per-iteration unique bytes (to defeat dedup), SLO-mapped thresholds, a host-run wrapper, and Prometheus remote-write of client metrics.
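The dedup-defeat idea can be sketched as follows: if the pipeline deduplicates by content hash, every iteration has to upload bytes that hash differently. This is an illustration in Python under that assumption; the load-test scripts themselves are k6 scripts, and how they actually vary the bytes is not shown here. `unique_payload` and `base` are hypothetical names.

```python
import hashlib
import os


def unique_payload(base: bytes, nonce_len: int = 16) -> bytes:
    """Append a random nonce so each upload has a distinct content hash."""
    return base + os.urandom(nonce_len)


def content_hash(data: bytes) -> str:
    """SHA-256 digest, standing in for whatever hash the dedup stage uses."""
    return hashlib.sha256(data).hexdigest()


base = b"fixed test asset bytes"  # placeholder for a real fixture file
first = unique_payload(base)
second = unique_payload(base)
```

Without the nonce, repeated uploads of the same fixture would hit the dedup path and the load test would mostly measure the cheap duplicate case rather than the full pipeline.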
First bottleneck-analysis writeup: single-threaded worker saturates at ~1.1 jobs/s while the API stays idle, motivating Track 1.
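To see why a ~1.1 jobs/s ceiling matters, a back-of-the-envelope sketch: once the offered rate exceeds the worker's service rate, the queue grows linearly with the difference. The 1.1 jobs/s figure comes from the writeup summary above; the offered rate and duration below are hypothetical inputs, not measurements.

```python
def backlog_after(offered_rate: float, service_rate: float, seconds: float) -> float:
    """Jobs left queued after `seconds`, assuming constant rates and an empty start."""
    return max(0.0, (offered_rate - service_rate) * seconds)


SERVICE_RATE = 1.1  # jobs/s, saturation point reported in the writeup
HYPOTHETICAL_OFFERED_RATE = 2.0  # jobs/s, assumed for illustration only

backlog = backlog_after(HYPOTHETICAL_OFFERED_RATE, SERVICE_RATE, 60)
```

Under these assumed inputs the backlog reaches about 54 jobs after one minute, and queue lag keeps climbing for as long as the load is sustained, regardless of how idle the API is.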
No description provided.