CCM-33842: Capture LiteLLM token usage in spans #6
Open
shreyas70 wants to merge 4 commits into
Copy response usage metadata onto the SDK-owned LiteLLM span before ending it so OTLP exports include input and output token attributes. Co-authored-by: Cursor <cursoragent@cursor.com>
Emit canonical provider, response metadata, and dotted cache/reasoning token attributes on LiteLLM spans. Co-authored-by: Cursor <cursoragent@cursor.com>
t-santoshsahu previously approved these changes on Jun 25, 2026
Rely on the SDK wrapper for LiteLLM response enrichment so exported spans do not retain the legacy gen_ai.system attribute. Co-authored-by: Cursor <cursoragent@cursor.com>
    import litellm  # pylint: disable=import-outside-toplevel

    otel_logger = _get_otel_logger()
    _register_otel_callback(otel_logger)
Collaborator
Can you test this? I guess this is required for the instrumentation.
Drop the stale local variable left after removing LiteLLM callback registration. Co-authored-by: Cursor <cursoragent@cursor.com>
Summary
- gen_ai.usage.input_tokens, gen_ai.usage.output_tokens, and gen_ai.usage.total_tokens.
- gen_ai.provider.name instead of gen_ai.system.
- gen_ai.response.model, gen_ai.response.id, and gen_ai.response.finish_reasons.
- gen_ai.usage.cache_read.input_tokens, gen_ai.usage.cache_creation.input_tokens, and gen_ai.usage.reasoning.output_tokens.
- gen_ai.system and duplicates response attributes after the wrapper now enriches the span directly.

Why
llm-model-service Bedrock/LiteLLM responses include token usage, but exported litellm_request spans showed input_tokens=0 and output_tokens=0 because LiteLLM response enrichment happened after the wrapper-owned span had already ended. Copying response metadata before span.end() makes non-streaming LiteLLM spans usable for cost attribution.

Out of scope
- deployment.environment.name changes.

Test plan
- .venv/bin/python -m pytest test/instrumentation/litellm/litellm_instrumentation_test.py

Result: 7 passed.
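
The ordering issue described under Why can be illustrated with a small, dependency-free sketch. It is not the code in this PR: `StubSpan`, `usage_attributes`, and `finish_span` are hypothetical names, and `StubSpan` only mimics the relevant behavior of an OpenTelemetry SDK span, which drops attribute writes made after `end()`. The sketch assumes the response exposes an OpenAI-style `usage` object with `prompt_tokens`, `completion_tokens`, and `total_tokens`.

```python
from types import SimpleNamespace
from typing import Any, Dict


def usage_attributes(response: Any) -> Dict[str, int]:
    """Map an OpenAI-style usage block to gen_ai.usage.* attributes.

    Hypothetical helper: fields that are missing are skipped rather than
    reported as 0, so absent usage is distinguishable from zero usage.
    """
    usage = getattr(response, "usage", None)
    if usage is None:
        return {}
    mapping = {
        "prompt_tokens": "gen_ai.usage.input_tokens",
        "completion_tokens": "gen_ai.usage.output_tokens",
        "total_tokens": "gen_ai.usage.total_tokens",
    }
    attrs: Dict[str, int] = {}
    for field, attr_name in mapping.items():
        value = getattr(usage, field, None)
        if value is not None:
            attrs[attr_name] = int(value)
    return attrs


class StubSpan:
    """Hypothetical stand-in for a span; ignores writes after end()."""

    def __init__(self) -> None:
        self.attributes: Dict[str, Any] = {}
        self.ended = False

    def set_attribute(self, key: str, value: Any) -> None:
        if self.ended:
            return  # too late: the span has already been handed to the exporter
        self.attributes[key] = value

    def end(self) -> None:
        self.ended = True


def finish_span(span: StubSpan, response: Any) -> None:
    """Copy usage onto the span first, then end it."""
    for key, value in usage_attributes(response).items():
        span.set_attribute(key, value)
    span.end()


# Example response shaped like a non-streaming completion result.
response = SimpleNamespace(
    usage=SimpleNamespace(prompt_tokens=12, completion_tokens=5, total_tokens=17)
)
span = StubSpan()
finish_span(span, response)
```

Enriching inside `finish_span` keeps the attribute copy and `end()` in one place, so a later callback cannot run after the span is closed and silently lose the token counts.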