fix: Tempo search API returns incorrect startTimeUnixNano and overflows durationMs#153

Draft
szibis wants to merge 2 commits into
VictoriaMetrics:masterfrom
szibis:fix/tempo-search-starttime-duration

@szibis szibis commented May 14, 2026

Summary

The Tempo /api/search endpoint returns startTimeUnixNano: 0 for all traces, causing durationMs to overflow Grafana's uint32 field:
json: cannot unmarshal number 1778794580917 into Go value of type uint32

Root cause: when a span row lacks start_time_unix_nano (e.g., trace index rows), the local variable defaults to 0. min(MaxInt64, 0) then yields 0, so durationMs = endTimeUnixNano / 1e6 (about 1.78 trillion), which exceeds the uint32 range (maximum 4294967295) that Tempo's protobuf spec defines for durationMs.
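The arithmetic behind the root cause can be reproduced in isolation. The following is a minimal sketch, not the code from tempo.go: the function name buggyDurationMs and the sample timestamps are hypothetical, and the missing field is modeled as a plain 0 in the input slice.

```go
package main

import (
	"fmt"
	"math"
)

// buggyDurationMs folds every row into the running minimum and maximum,
// including the zero value that stands in for a missing start field.
func buggyDurationMs(starts, ends []int64) int64 {
	start := int64(math.MaxInt64) // sentinel: no start seen yet
	end := int64(0)
	for i := range starts {
		if starts[i] < start {
			start = starts[i] // a missing field (0) always wins here
		}
		if ends[i] > end {
			end = ends[i]
		}
	}
	return (end - start) / 1_000_000
}

func main() {
	// Illustrative values: one real 32 ms span plus one row without timestamps.
	starts := []int64{1778797098262567591, 0}
	ends := []int64{1778797098294567591, 0}

	d := buggyDurationMs(starts, ends)
	fmt.Println("durationMs:", d, "exceeds uint32:", d > math.MaxUint32)
}
```

With only the complete row as input the same function returns 32, so a single row without a start time is enough to push the result past the uint32 limit.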

Changes

  • tempo.go: Only update start/end times when field is present in span row
  • tempo.qtpl: Quote startTimeUnixNano as string per Tempo API spec (Grafana Tempo returns "1684778327699392724" not bare integer)
  • tempo_test.go: 9 tests covering the fix, durationMs uint32 compat, and JSON format
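The first change can be sketched as a presence check in front of each min/max update. This is an illustration under assumptions rather than the patch itself: traceSummary, addRow, parseField, and the map-based row representation are hypothetical names chosen for the example.

```go
package main

import (
	"fmt"
	"math"
	"strconv"
)

// traceSummary is a hypothetical stand-in for the per-trace accumulator.
type traceSummary struct {
	startNano int64 // math.MaxInt64 means "no start seen yet"
	endNano   int64
}

func newTraceSummary() *traceSummary {
	return &traceSummary{startNano: math.MaxInt64}
}

// addRow updates each bound only when the row carries a parsable value for it.
func (s *traceSummary) addRow(row map[string]string) {
	if n, ok := parseField(row, "start_time_unix_nano"); ok && n < s.startNano {
		s.startNano = n
	}
	if n, ok := parseField(row, "end_time_unix_nano"); ok && n > s.endNano {
		s.endNano = n
	}
}

// start converts the sentinel to 0 when no row supplied a start time.
func (s *traceSummary) start() int64 {
	if s.startNano == math.MaxInt64 {
		return 0
	}
	return s.startNano
}

// parseField reports whether the named field is present, non-empty, and numeric.
func parseField(row map[string]string, name string) (int64, bool) {
	v, ok := row[name]
	if !ok || v == "" {
		return 0, false
	}
	n, err := strconv.ParseInt(v, 10, 64)
	return n, err == nil
}

func main() {
	s := newTraceSummary()
	s.addRow(map[string]string{
		"start_time_unix_nano": "1778797098262567591",
		"end_time_unix_nano":   "1778797098294567591",
	})
	s.addRow(map[string]string{"trace_id": "abc"}) // row without timestamps
	fmt.Println(s.start(), (s.endNano-s.start())/1_000_000)
}
```

A row without timestamps now leaves the summary untouched, and a trace whose rows all lack a start time reports 0 instead of the sentinel.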

Before / After

| Field | Before | After |
| --- | --- | --- |
| startTimeUnixNano | 0 (integer) | "1778797098262567591" (string) |
| durationMs | 1778797098296 (overflow) | 32 (correct) |

Verification

  • Verified against Tempo API docs and tempo.proto
  • go build ./... passes, go test ./app/vtselect/... passes (9 new tests + 7 existing)
  • Tested live with Grafana Tempo datasource + VT v0.8.2

Summary by cubic

Fixes Tempo /api/search returning startTimeUnixNano: 0 and overflowing durationMs, which broke Grafana parsing. Now start/end times are handled only when present and startTimeUnixNano is emitted as a string for API compatibility.

  • Bug Fixes
    • Update start/end only when the field exists; avoid min(MaxInt64, 0) corrupting start time.
    • Emit "startTimeUnixNano" as a quoted string per Tempo API; keeps durationMs within uint32 and fixes Grafana unmarshalling.
    • Normalize missing starts: convert sentinel math.MaxInt64 to 0 before response.
    • Added tests covering the fix, uint32 duration bounds, and JSON format.

Written for commit 8f7e535. Summary will update on new commits.

szibis added 2 commits May 15, 2026 00:24
…ws durationMs

When a span row lacks the start_time_unix_nano field (e.g., trace index
rows), the local variable defaults to 0. min(MaxInt64, 0) produces 0,
corrupting the trace summary's start time. This causes durationMs to
equal endTimeUnixNano/1e6 (~1.78 trillion), which overflows Grafana's
uint32 durationMs field with: "json: cannot unmarshal number into Go
value of type uint32".

Fix: only update startTimeUnixNano/endTimeUnixNano when the field is
actually present in the span row. Also quote startTimeUnixNano as a
string in the JSON response to match the Tempo API specification
(Grafana Tempo returns startTimeUnixNano as a quoted string, not a bare
integer).

Verified against Grafana Tempo's protobuf spec (tempo.proto) and HTTP
API documentation examples.
Tests cover:
- Single span with all fields
- Span missing start_time_unix_nano (the bug trigger)
- All spans missing start time (sentinel fallback to 0)
- Multiple traces
- Missing trace_id error
- Root span identification
- durationMs fits uint32 (Tempo API compat)
- JSON response format (startTimeUnixNano quoted as string)

@cubic-dev-ai cubic-dev-ai Bot left a comment

Contributor

No issues found across 4 files

@szibis szibis mentioned this pull request May 15, 2026
2 tasks
@jiekun jiekun self-assigned this May 19, 2026
jiekun commented May 25, 2026

Member

Hello. While checking the issue: do you have any context on why startTimeUnixNano is missing for a span?

> The Tempo /api/search endpoint returns startTimeUnixNano: 0 for all traces

I'm testing with the OTel demo as input, and it seems to work just fine. Is the field missing in your source input? And what query arguments did you send to the search API (e.g., could you provide a curl example)?

> When a span row lacks start_time_unix_nano (e.g., trace index rows), the local variable defaults to 0.

What does "trace index rows" mean here? The internal index in VictoriaTraces should be excluded by {trace_id_idx_stream=""} in the LogsQL.

While this pull request helps avoid edge cases, I'm trying to understand how it happened in the first place: is it due to incorrect input data, or did VictoriaTraces incorrectly transform the data and the start timestamp?

jiekun commented May 25, 2026

Member

After re-check I think we will merge this patch, as it adds boundary checks to improve query responsiveness. We just need a bit more context on how the issue was triggered, in case we overlooked anything in the ingestion flow. Thank you!

szibis commented May 26, 2026

Author

> After re-check I think we will merge this patch, as it adds boundary checks to improve query responsiveness. We just need a bit more context on how the issue was triggered, in case we overlooked anything in the ingestion flow. Thank you!

I hit these errors in some Docker Compose tests and still need to find where exactly this happened. I did not get a screenshot, but the error came from the main search page of the Tempo datasource backed by VT: I chose a service and pressed search, which produced the error mentioned above. I may also have hit an edge case with some specific test data.

jiekun commented May 29, 2026

Member

I see. It would be great if you could:

  • find an example trace (by trace_id or some other identifier);
  • look up all spans of that trace in VMUI with the following LogsQL:
trace_id:="<your_trace_id>" | fields trace_id, span_id, start_time_unix_nano, end_time_unix_nano

You can find an example on our playground: here.

Again, the main purpose is to pinpoint the cause: whether it happened during ingestion or at query time.
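Once the rows for a trace have been exported, the check for missing start times can be scripted. The sketch below assumes the result is available as one JSON object per line with string values, and treats an absent or empty start_time_unix_nano as missing; the function name missingStart and the sample rows are hypothetical.

```go
package main

import (
	"bufio"
	"encoding/json"
	"fmt"
	"strings"
)

// missingStart returns the span_id of every row whose start_time_unix_nano
// is absent or empty. The input is assumed to hold one JSON object per line.
func missingStart(jsonLines string) ([]string, error) {
	var ids []string
	sc := bufio.NewScanner(strings.NewReader(jsonLines))
	for sc.Scan() {
		line := strings.TrimSpace(sc.Text())
		if line == "" {
			continue // skip blank lines between rows
		}
		var row map[string]string
		if err := json.Unmarshal([]byte(line), &row); err != nil {
			return nil, fmt.Errorf("parse row: %w", err)
		}
		if row["start_time_unix_nano"] == "" {
			ids = append(ids, row["span_id"])
		}
	}
	return ids, sc.Err()
}

func main() {
	// Hypothetical sample: the second row has no start time.
	rows := `{"trace_id":"t1","span_id":"a","start_time_unix_nano":"1778797098262567591","end_time_unix_nano":"1778797098294567591"}
{"trace_id":"t1","span_id":"b","end_time_unix_nano":"1778797098294567591"}`

	ids, err := missingStart(rows)
	fmt.Println(ids, err)
}
```

Rows flagged this way would point to ingestion; an empty result for a trace that still fails in the search API would point to query time.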

I would like to leave it open for now and wait for more information. But the response should be safeguarded via #168.

jiekun added a commit that referenced this pull request May 29, 2026
fix #153

safeguard duration calculation for the Tempo search API when start or end time is missing. Previously, the value could exceed the int32 range and cause display errors.
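A safeguard of this kind can be sketched as a small helper. This is not the code from the referenced commit: safeDurationMs is a hypothetical name, and returning 0 for a missing bound and clamping oversized values to the uint32 maximum are assumptions made for the example.

```go
package main

import (
	"fmt"
	"math"
)

// safeDurationMs returns the duration in milliseconds, or 0 when either bound
// is missing (non-positive) or the range is inverted. Values beyond the
// uint32 range are clamped to math.MaxUint32.
func safeDurationMs(startNano, endNano int64) uint32 {
	if startNano <= 0 || endNano <= 0 || endNano < startNano {
		return 0
	}
	ms := (endNano - startNano) / 1_000_000
	if ms > math.MaxUint32 {
		return math.MaxUint32
	}
	return uint32(ms)
}

func main() {
	fmt.Println(safeDurationMs(1778797098262567591, 1778797098294567591)) // complete span
	fmt.Println(safeDurationMs(0, 1778797098294567591))                   // missing start
}
```

Keeping the check next to the arithmetic means the response stays decodable by uint32-based clients even if a row without timestamps reaches the summary.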
@jiekun jiekun marked this pull request as draft May 29, 2026 16:12

2 participants