CCM-33842: Capture LiteLLM token usage in spans #6

Open
shreyas70 wants to merge 4 commits into harness:main from shreyas70:CCM-33842-litellm-token-usage

shreyas70 (Contributor) commented Jun 25, 2026

Summary

  • Capture LiteLLM response usage on the SDK-owned span before it ends so UDP ingest receives gen_ai.usage.input_tokens, gen_ai.usage.output_tokens, and gen_ai.usage.total_tokens.
  • Normalize LiteLLM GenAI semantic convention fields by emitting gen_ai.provider.name instead of gen_ai.system.
  • Add LiteLLM response metadata before span end: gen_ai.response.model, gen_ai.response.id, and gen_ai.response.finish_reasons.
  • Emit dotted cache/reasoning token attributes when available: gen_ai.usage.cache_read.input_tokens, gen_ai.usage.cache_creation.input_tokens, and gen_ai.usage.reasoning.output_tokens.
  • Do not register LiteLLM's own OTEL callback from the SDK wrapper: it adds the legacy gen_ai.system attribute and duplicates response attributes now that the wrapper enriches the span directly.
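
As an illustration of the attribute mapping listed above, here is a minimal sketch, not the code from this PR. It assumes an OpenAI-style response shape (`usage.prompt_tokens`, `usage.completion_tokens`, `usage.total_tokens`, `prompt_tokens_details.cached_tokens`, `completion_tokens_details.reasoning_tokens`, `usage.cache_creation_input_tokens`, `choices[].finish_reason`); those field names and the helper names `_get` and `response_attributes` are assumptions made for the example. Only the `gen_ai.*` attribute names come from the summary.

```python
from types import SimpleNamespace
from typing import Any, Dict


def _get(obj: Any, name: str) -> Any:
    """Read a field from an object or a dict; return None if it is absent."""
    if obj is None:
        return None
    if isinstance(obj, dict):
        return obj.get(name)
    return getattr(obj, name, None)


def response_attributes(response: Any) -> Dict[str, Any]:
    """Map an OpenAI-style response (assumed shape) to gen_ai.* span attributes."""
    usage = _get(response, "usage")
    candidates = {
        "gen_ai.response.model": _get(response, "model"),
        "gen_ai.response.id": _get(response, "id"),
        "gen_ai.usage.input_tokens": _get(usage, "prompt_tokens"),
        "gen_ai.usage.output_tokens": _get(usage, "completion_tokens"),
        "gen_ai.usage.total_tokens": _get(usage, "total_tokens"),
        "gen_ai.usage.cache_read.input_tokens": _get(
            _get(usage, "prompt_tokens_details"), "cached_tokens"
        ),
        "gen_ai.usage.cache_creation.input_tokens": _get(
            usage, "cache_creation_input_tokens"
        ),
        "gen_ai.usage.reasoning.output_tokens": _get(
            _get(usage, "completion_tokens_details"), "reasoning_tokens"
        ),
    }
    # Emit only the attributes that are available; a real zero is kept.
    attrs = {key: value for key, value in candidates.items() if value is not None}

    choices = _get(response, "choices") or []
    reasons = [r for r in (_get(c, "finish_reason") for c in choices) if r]
    if reasons:
        attrs["gen_ai.response.finish_reasons"] = reasons
    return attrs


# Hypothetical response used only to exercise the helper.
example = SimpleNamespace(
    id="resp-123",
    model="example-model",
    choices=[SimpleNamespace(finish_reason="stop")],
    usage=SimpleNamespace(
        prompt_tokens=12,
        completion_tokens=5,
        total_tokens=17,
        prompt_tokens_details=SimpleNamespace(cached_tokens=4),
    ),
)
attrs = response_attributes(example)
```

In this sketch the cache and reasoning attributes are omitted when the response does not report them, which matches the "when available" wording above.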

Why

llm-model-service Bedrock/LiteLLM responses include token usage, but exported litellm_request spans showed input_tokens=0 and output_tokens=0 because LiteLLM response enrichment happened after the wrapper-owned span had already ended. Copying response metadata before span.end() makes non-streaming LiteLLM spans usable for cost attribution.
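
The ordering problem can be sketched with a simplified stand-in span. This is an illustration under stated assumptions, not the SDK's span class or the wrapper from this PR: `StandInSpan`, `call_model`, `enrich`, and the two `traced_call_*` functions are hypothetical names, and the stand-in simply drops writes made after `end()`. How the missing values surface as zeros in the exported span is not modeled here.

```python
from typing import Any, Dict


class StandInSpan:
    """Simplified stand-in for a tracing span (not the real SDK class)."""

    def __init__(self) -> None:
        self.attributes: Dict[str, Any] = {}
        self.ended = False

    def set_attribute(self, key: str, value: Any) -> None:
        # Assumption for this sketch: writes after end() are dropped.
        if not self.ended:
            self.attributes[key] = value

    def end(self) -> None:
        self.ended = True


def call_model() -> Dict[str, int]:
    """Hypothetical model call that returns usage counts."""
    return {"prompt_tokens": 12, "completion_tokens": 5}


def enrich(span: StandInSpan, usage: Dict[str, int]) -> None:
    """Copy usage counts onto the span."""
    span.set_attribute("gen_ai.usage.input_tokens", usage["prompt_tokens"])
    span.set_attribute("gen_ai.usage.output_tokens", usage["completion_tokens"])


def traced_call_late() -> StandInSpan:
    """Old ordering: the span ends first, so enrichment has no effect."""
    span = StandInSpan()
    usage = call_model()
    span.end()
    enrich(span, usage)
    return span


def traced_call_early() -> StandInSpan:
    """New ordering: enrich first, then end the span."""
    span = StandInSpan()
    usage = call_model()
    enrich(span, usage)
    span.end()
    return span


late = traced_call_late()
early = traced_call_early()
```

Here `late.attributes` stays empty while `early.attributes` carries both token counts, which is the difference this change relies on.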

Out of scope

  • Streaming usage accumulation.
  • Tenant/user allocation attributes.
  • Resource-level deployment.environment.name changes.

Test plan

  • .venv/bin/python -m pytest test/instrumentation/litellm/litellm_instrumentation_test.py

Result: 7 passed.

Copy response usage metadata onto the SDK-owned LiteLLM span before ending it so OTLP exports include input and output token attributes.

Co-authored-by: Cursor <cursoragent@cursor.com>
@CLAassistant

CLA assistant check
Thank you for your submission! We really appreciate it. Like many open source projects, we ask that you sign our Contributor License Agreement before we can accept your contribution.

Emit canonical provider, response metadata, and dotted cache/reasoning token attributes on LiteLLM spans.

Co-authored-by: Cursor <cursoragent@cursor.com>
t-santoshsahu previously approved these changes Jun 25, 2026
Rely on the SDK wrapper for LiteLLM response enrichment so exported spans do not retain the legacy gen_ai.system attribute.

Co-authored-by: Cursor <cursoragent@cursor.com>
import litellm # pylint: disable=import-outside-toplevel

otel_logger = _get_otel_logger()
_register_otel_callback(otel_logger)

Collaborator

Can you test this? I guess this is required for the instrumentation.

Drop the stale local variable left after removing LiteLLM callback registration.

Co-authored-by: Cursor <cursoragent@cursor.com>

3 participants