fix: Tempo search API returns incorrect startTimeUnixNano and overflows durationMs#153
fix: Tempo search API returns incorrect startTimeUnixNano and overflows durationMs#153szibis wants to merge 2 commits into
Conversation
…ws durationMs When a span row lacks the start_time_unix_nano field (e.g., trace index rows), the local variable defaults to 0. min(MaxInt64, 0) produces 0, corrupting the trace summary's start time. This causes durationMs to equal endTimeUnixNano/1e6 (~1.78 trillion), which overflows Grafana's uint32 durationMs field with: "json: cannot unmarshal number into Go value of type uint32". Fix: only update startTimeUnixNano/endTimeUnixNano when the field is actually present in the span row. Also quote startTimeUnixNano as a string in the JSON response to match the Tempo API specification (Grafana Tempo returns startTimeUnixNano as a quoted string, not a bare integer). Verified against Grafana Tempo's protobuf spec (tempo.proto) and HTTP API documentation examples.
Tests cover: - Single span with all fields - Span missing start_time_unix_nano (the bug trigger) - All spans missing start time (sentinel fallback to 0) - Multiple traces - Missing trace_id error - Root span identification - durationMs fits uint32 (Tempo API compat) - JSON response format (startTimeUnixNano quoted as string)
|
Hello. While checking the issue, do you have any context why the
I'm testing with OTel demo as input, and it seems working just fine. Is it missing in your source input? And what query argument you sent to the search API (e.g. could you provide a
What does the "trace index rows" mean here? The internal index in VictoriaTraces should be excluded by While this pull request helps avoid edge cases, I'm trying to understand how it happened in the first place: Is it due to incorrect input data, or is it because VictoriaTraces incorrectly transformed the data and start timestamp? |
|
After re-check I think we will merge this patch, as it adds boundary checks to improve query responsiveness. We just need a bit more context on how the issue was triggered, in case we overlooked anything in the ingestion flow. Thank you! |
I have reached these errors in some Docker Compose tests. I need to find where exactly this happened. I did not get a screenshot, but the error was coming from the main search page for Tempo datasource with VT behind, and I chose some service and push search, which appears with the mentioned error. Also, I may reach some edge case with some specific testing data. |
|
I see. It would be great if you can:
And you can find example on our playground: here. Again the main purpose is to pinpoint the cause: whether it happened during ingestion, or query time. I would like to leave it open for now and wait for more information. But the response should be safeguarded via #168. |
fix #153 safeguard duration calculation for the Tempo search API when start or end time is missing. Previously, the value could exceed the int32 range and cause display errors.
Summary
The Tempo
/api/searchendpoint returnsstartTimeUnixNano: 0for all traces, causingdurationMsto overflow Grafana'suint32field:json: cannot unmarshal number 1778794580917 into Go value of type uint32Root cause: When a span row lacks
start_time_unix_nano(e.g., trace index rows), the local variable defaults to 0.min(MaxInt64, 0)produces 0, makingdurationMs = endTimeUnixNano / 1e6(~1.78 trillion) — overflowsuint32 durationMsfrom Tempo's protobuf spec.Changes
tempo.go: Only update start/end times when field is present in span rowtempo.qtpl: QuotestartTimeUnixNanoas string per Tempo API spec (Grafana Tempo returns"1684778327699392724"not bare integer)tempo_test.go: 9 tests covering the fix, durationMs uint32 compat, and JSON formatBefore / After
startTimeUnixNano0(integer)"1778797098262567591"(string)durationMs1778797098296(overflow)32(correct)Verification
go build ./...passes,go test ./app/vtselect/...passes (9 new tests + 7 existing)Summary by cubic
Fixes Tempo
/api/searchreturningstartTimeUnixNano: 0and overflowingdurationMs, which broke Grafana parsing. Now start/end times are handled only when present andstartTimeUnixNanois emitted as a string for API compatibility.min(MaxInt64, 0)corrupting start time."startTimeUnixNano"as a quoted string per Tempo API; keepsdurationMswithinuint32and fixes Grafana unmarshalling.math.MaxInt64to0before response.uint32duration bounds, and JSON format.Written for commit 8f7e535. Summary will update on new commits.