feat(cost): per-modality token breakdown in Python SDK (F3 Track C) #596
john-weiler wants to merge 6 commits into
Extends LlmMetrics with optional num_image_input_tokens / num_audio_input_tokens /
num_audio_output_tokens fields and threads them through the full ingestion path:
LlmMetrics model → logger.add_llm_span() → base_handler → span_params.
Extraction is implemented in two integrations:
LangChain (primary path):
- _extract_gemini_modality_breakdown() checks two surfaces:
1. message.usage_metadata.input_token_details / output_token_details
(langchain-google-genai >= 2.x with LangChain Core UsageMetadata)
2. message.response_metadata.prompt_tokens_details / candidates_tokens_details
(raw Gemini API ModalityTokenCount list forwarded by the adapter)
- Both sync and async handlers updated.
galileo-adk (native Gemini SDK path):
- _extract_usage_metadata() extended to walk prompt_tokens_details /
candidates_tokens_details (ModalityTokenCount list with .modality enum
or string) and bucket into image/audio counts.
- span_manager.end_llm() carries the 3 new fields through to end_node().
All other integrations (crewai, openai_agents, otel) unchanged — they go
through the OpenAI-compat path which drops modality breakdown (per RFC).
Tests: 4 new TestParseLlmResult cases + 6 new TestExtractUsageMetadata
ADK cases; all existing tests pass.
Part of F3 multimodal token/cost support.
Co-Authored-By: Claude Sonnet 4.6 (1M context) <noreply@anthropic.com>
Codecov Report
❌ Patch coverage is …
@@ Coverage Diff @@
## main #596 +/- ##
==========================================
+ Coverage 83.31% 83.99% +0.67%
==========================================
Files 125 135 +10
Lines 10659 11942 +1283
==========================================
+ Hits 8881 10031 +1150
- Misses 1778 1911 +133
# Per-modality breakdown: only present on native Gemini SDK responses.
prompt_details = getattr(usage, "prompt_tokens_details", None)
candidates_details = getattr(usage, "candidates_tokens_details", None)
if prompt_details or candidates_details:
    image_in = 0
    audio_in = 0
    audio_out = 0
    has_prompt = bool(prompt_details)
    has_candidates = bool(candidates_details)
    for entry in prompt_details or []:
        modality_attr = getattr(entry, "modality", None)
_extract_usage_metadata duplicates the Gemini modality breakdown in ::_extract_gemini_modality_breakdown. Should we extract a shared helper so both paths use the same logic?
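A shared helper along the lines suggested here might look like the sketch below. It is an illustration only, not code from this PR: the names `bucket_modality_tokens`, `_modality_name`, and `_token_count` are hypothetical, and the `token_count` attribute name is an assumption about the entry shape. It accepts entries whose `modality` is an enum or a string, and returns `None` when no detail list was reported so callers can tell "no breakdown" apart from "zero tokens".

```python
from enum import Enum
from types import SimpleNamespace
from typing import Any, Dict, Iterable, Optional


def _modality_name(entry: Any) -> str:
    """Return the upper-cased modality of an entry (enum, string, or dict value)."""
    raw = entry.get("modality") if isinstance(entry, dict) else getattr(entry, "modality", None)
    if isinstance(raw, Enum):
        raw = raw.name
    return str(raw or "").upper()


def _token_count(entry: Any) -> int:
    """Return the token count of an entry; 'token_count' is an assumed field name."""
    raw = entry.get("token_count") if isinstance(entry, dict) else getattr(entry, "token_count", None)
    return int(raw or 0)


def bucket_modality_tokens(details: Optional[Iterable[Any]]) -> Optional[Dict[str, int]]:
    """Sum IMAGE and AUDIO token counts; None means no breakdown was reported."""
    if not details:
        return None
    buckets = {"image": 0, "audio": 0}
    for entry in details:
        name = _modality_name(entry)
        if name == "IMAGE":
            buckets["image"] += _token_count(entry)
        elif name == "AUDIO":
            buckets["audio"] += _token_count(entry)
    return buckets


# Stand-in entries; real SDK objects are assumed to expose the same two attributes.
example = [
    SimpleNamespace(modality="TEXT", token_count=10),
    SimpleNamespace(modality="audio", token_count=8),
]
print(bucket_modality_tokens(example))
```

Both the ADK observer and the LangChain utility could call one function like this for the prompt list and once more for the candidates list, which would also settle the null-versus-zero contract in a single place.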
# Per-modality breakdown: None means "counted as text in the flat totals".
# Only populated for providers that return modality-level token counts (e.g. Gemini native).
num_image_input_tokens: None | Unset | int = UNSET
num_audio_input_tokens: None | Unset | int = UNSET
num_audio_output_tokens: None | Unset | int = UNSET
src/galileo/resources/ is regenerated from openapi.yaml, but LlmMetrics in the spec doesn't include the new modality fields, so the next regen will drop them and break src/galileo/logger/logger.py:1247-1255; should we move this into the OpenAPI source/generator input instead of patching the generated file?
Prompt for AI Agents
Before applying, verify this suggestion against the current code. In
src/galileo/resources/models/llm_metrics.py around lines 30-34 and in the additions to
`LlmMetrics.to_dict()` and `LlmMetrics.from_dict()` (new fields
`num_image_input_tokens`, `num_audio_input_tokens`, `num_audio_output_tokens`), don’t
keep these changes in the generated client boundary. Instead, update the OpenAPI source
schema that generates `LlmMetrics` (likely openapi.yaml) to add these three properties
(type int, nullable/optional using the same conventions as the other token fields) so
regeneration won’t delete them. After updating the OpenAPI schema, remove the manual
edits from the generated llm_metrics.py and regenerate via
scripts/auto-generate-api-client.sh, then run the failing logger tests
(src/galileo/logger/logger.py around 1247-1255) to ensure there are no unexpected-kwarg
errors.
We will regenerate once API changes are in ...
One clarification on the premise: regeneration won't actually break logger.py:1247-1255. That code constructs galileo_core.schemas.logging.span.LlmMetrics (a pydantic model with model_config extra='allow'), not the edited attrs class in src/galileo/resources/models/llm_metrics.py. I verified the modality kwargs are accepted and serialized via model_dump() purely through extra='allow' — so the write/ingest path does not depend on this generated-file edit at all. The edit only affects the read-side models (LlmSpan, extended_llm_span_record, partial_extended_llm_span_record) that call LlmMetrics.from_dict(). So the regen concern is real for the read path, but the generated-file change is functionally unnecessary for emitting the new fields.
/astra review
⚠️ This review was generated by an AI agent (Astra) and may contain mistakes. Please verify all suggestions independently.
Verdict: request_changes — The core per-modality token threading is sound, but the bundled OpenAI multimodal-message path introduces a tool-linkage regression and ships with effectively no test coverage.
General Comments
- 🟡 minor (design): The new multimodal path makes `convert_to_galileo_message` return a `LoggedMessage` whose `content` is a list of `TextContentBlock`/`DataContentBlock`. These messages are then placed into `LlmSpan.input`/`output`, whose declared type (`galileo_core` `Message.content`) is `str | list[ContentPart]`. When the span is serialized via `model_dump()`, pydantic emits a `PydanticSerializationUnexpectedValue` warning ("Expected `str`") for every multimodal message. I verified the block data does survive serialization, so this is not data loss, but it means a `UserWarning` is logged on every multimodal OpenAI call, which is noisy for a telemetry path that is meant to observe without interfering. Consider whether `galileo_core` needs a content-block-aware span type, or suppress/handle the warning at the boundary.
Follow-ups
Suggested follow-up work that could be tracked as Shortcut stories:
- `galileo-adk/src/galileo_adk/observer.py:503-557`: `_extract_usage_metadata` (ADK) and `_extract_gemini_modality_breakdown` (LangChain utils) implement the same Gemini per-modality walk with subtly different null-vs-zero semantics: ADK only sets `image_input_tokens`/`audio_input_tokens` when `prompt_tokens_details` is present and `audio_output_tokens` when `candidates_tokens_details` is present (otherwise the key is absent → None downstream), while the LangChain path always returns a 0-filled 3-tuple once any detail data exists. Consider extracting a shared helper so both paths agree on the null/zero contract (matches reviewer comment r3337251851).
- `src/galileo/openai/extractors.py:221-267`: `_openai_content_parts_to_blocks` re-implements type→content-block conversion that already exists in `galileo/utils/serialization._convert_langchain_content_block`. Consider reusing/sharing that helper to avoid drift (matches reviewer comment r3373853343).
fercor-cisco left a comment
Please check the Astra feedback and test failures.
…linkage, provider scope
- galileo-adk/trace_builder.py: add image_input_tokens/audio_input_tokens/audio_output_tokens to TraceBuilder.add_llm_span() so the ADK path doesn't raise TypeError when GalileoBaseHandler calls through with these new kwargs; thread them into LlmMetrics
- src/galileo/handlers/langchain/utils.py: fix Surface 1 has_detail_data guard to only activate when audio/image keys are actually present. Previously any non-empty input_token_details (e.g. Anthropic cache_read, OpenAI reasoning) incorrectly returned (0,0,0) instead of (None,None,None) for providers that have no modality breakdown
- src/galileo/openai/extractors.py: forward tool_calls and tool_call_id into LoggedMessage when content is a multimodal list; previously tool-call linkage was silently dropped for tool-role messages with array content
Co-Authored-By: Claude <noreply@anthropic.com>
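The Surface 1 guard fix described in this commit can be illustrated with a minimal sketch. This is not the code in `utils.py`: the function name `modality_counts_from_token_details` is hypothetical, and the `"image"` and `"audio"` key names are assumptions about what the adapter would emit. The point is only that unrelated detail keys such as `cache_read` or `reasoning` must not turn "no breakdown" into zeros.

```python
from typing import Any, Mapping, Optional, Tuple

# (image_input, audio_input, audio_output); None means "no modality breakdown".
ModalityCounts = Tuple[Optional[int], Optional[int], Optional[int]]

# Assumed key names for modality entries in the token-detail dicts.
_INPUT_MODALITY_KEYS = ("image", "audio")
_OUTPUT_MODALITY_KEYS = ("audio",)


def modality_counts_from_token_details(
    input_details: Optional[Mapping[str, Any]],
    output_details: Optional[Mapping[str, Any]],
) -> ModalityCounts:
    """Return modality counts only when a modality key is actually present."""
    input_details = input_details or {}
    output_details = output_details or {}
    has_detail_data = any(k in input_details for k in _INPUT_MODALITY_KEYS) or any(
        k in output_details for k in _OUTPUT_MODALITY_KEYS
    )
    if not has_detail_data:
        # Provider reported other details (cache, reasoning) but no modality split.
        return (None, None, None)
    return (
        int(input_details.get("image") or 0),
        int(input_details.get("audio") or 0),
        int(output_details.get("audio") or 0),
    )


print(modality_counts_from_token_details({"cache_read": 120}, {"reasoning": 40}))
print(modality_counts_from_token_details({"audio": 8, "cache_read": 0}, {}))
```

Under these assumptions the first call reports no breakdown at all, while the second reports audio input tokens with the other two fields as explicit zeros.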
Mirrors the existing image_input_tokens/audio_* fields: threads image_output_tokens from LlmMetrics through LlmMetrics.to_dict/from_dict, GalileoLogger.add_llm_span/log_single_llm_call, GalileoBaseHandler.end_node, both LangChain handlers, _extract_gemini_modality_breakdown (Surface 1/2/3 output paths + candidates_tokens_details IMAGE loop), LLMEndResult, SpanManager.end_llm, TraceBuilder.add_llm_span, and GalileoObserver._extract_usage_metadata / on_llm_end so the ADK and LangChain paths can report image generation output tokens.
Co-Authored-By: Claude <noreply@anthropic.com>
…modal tool linkage
- test_langchain.py: 3 new TestParseLlmResult cases: image_output_tokens from surface 1 output_token_details, image_output_tokens from surface 2 candidates_tokens_details, and non-Gemini provider with only cache_read in input_token_details returns None not zero for all modality fields
- test_observer.py: image_output_tokens from candidates_tokens_details IMAGE entry
- test_openai_extractors.py: full coverage of _openai_content_parts_to_blocks (text, image data URI, image plain URL, input_audio, unrecognised type, mixed) and convert_to_galileo_message tool linkage (tool_call_id and tool_calls both preserved when content is a list of parts)
Co-Authored-By: Claude <noreply@anthropic.com>
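The part-to-block conversion these tests exercise can be pictured with the rough sketch below. It does not reproduce `_openai_content_parts_to_blocks`: `TextBlock`, `DataBlock`, and `parts_to_blocks` are hypothetical stand-ins for the real block types, and the fallback for unrecognised part types (stringifying the part) is an assumption made only so the example is complete. The input shapes follow the OpenAI chat content-part format (`text`, `image_url`, `input_audio`).

```python
from dataclasses import dataclass
from typing import Any, Dict, List, Optional, Union


@dataclass
class TextBlock:  # hypothetical stand-in for a text content block
    text: str


@dataclass
class DataBlock:  # hypothetical stand-in for a binary/URL content block
    modality: str
    url: Optional[str] = None
    base64: Optional[str] = None
    mime_type: Optional[str] = None


def parts_to_blocks(parts: List[Dict[str, Any]]) -> List[Union[TextBlock, DataBlock]]:
    """Convert OpenAI-style content parts into simple content blocks."""
    blocks: List[Union[TextBlock, DataBlock]] = []
    for part in parts:
        kind = part.get("type")
        if kind == "text":
            blocks.append(TextBlock(text=part.get("text", "")))
        elif kind == "image_url":
            url = (part.get("image_url") or {}).get("url", "")
            if url.startswith("data:") and ";base64," in url:
                # Data URI: split "data:<mime>;base64,<payload>".
                header, payload = url.split(";base64,", 1)
                blocks.append(DataBlock("image", base64=payload, mime_type=header[len("data:"):]))
            else:
                blocks.append(DataBlock("image", url=url))
        elif kind == "input_audio":
            audio = part.get("input_audio") or {}
            fmt = audio.get("format")
            mime = f"audio/{fmt}" if fmt else None
            blocks.append(DataBlock("audio", base64=audio.get("data"), mime_type=mime))
        else:
            # Assumed fallback: keep unrecognised parts as text rather than drop them.
            blocks.append(TextBlock(text=str(part)))
    return blocks
```

Keeping unknown parts instead of discarding them is one possible design choice for a telemetry path, since it avoids silently losing input that the provider accepted.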
…rces/
Adds num_image_input_tokens, num_audio_input_tokens, num_audio_output_tokens, num_image_output_tokens to LlmMetrics in openapi.yaml so the generated src/galileo/resources/models/llm_metrics.py includes them natively. The previously hand-edited fields will now survive re-generation instead of being dropped on the next regen cycle.
Co-Authored-By: Claude <noreply@anthropic.com>
User description
Summary
- `LlmMetrics` with `num_image_input_tokens`, `num_audio_input_tokens`, `num_audio_output_tokens` (optional, forward-compat safe)
- `logger.add_llm_span()` → `base_handler` → `span_params`
- `_extract_gemini_modality_breakdown()` checks three surfaces (priority order):
  1. `message.usage_metadata.input_token_details` / `output_token_details` (langchain-google-genai ≥ 2.x with LangChain Core `UsageMetadata`)
  2. `message.response_metadata.prompt_tokens_details` / `candidates_tokens_details` (raw Gemini API list shape)
  3. `message.response_metadata["usage_metadata"]` nested dict: defensive fallback for providers that nest usage under `response_metadata` (added in response to review comment; covers forward-compat if upstream `langchain-google-genai` adds modality info there)
- `ChatGeneration` or raw `AIMessage` interchangeably
- `_extract_usage_metadata()` walks native SDK `prompt_tokens_details` / `candidates_tokens_details` (handles `.modality` as enum or string); `span_manager.end_llm()` carries 3 new fields through to `end_node()`

Known limitation: langchain-google-genai 4.2.4

Discovered during local E2E testing: `langchain-google-genai` 4.2.4 (latest) does not surface Gemini's per-modality breakdown on the `AIMessage`. The wrapper at chat_models.py:1263 hardcodes:

…

reading `prompt_token_count` and `cached_content_token_count` from the Gemini response but dropping `prompt_tokens_details` (which contains the modality breakdown). Verified end-to-end:

| | `prompt_tokens_details` returned to consumer |
| --- | --- |
| `google-genai` SDK directly | `[(TEXT, 10), (AUDIO, 8)]` ✓ |
| `langchain-google-genai` → `AIMessage.usage_metadata.input_token_details` | `{"cache_read": 0}` (modality stripped) |
| `langchain-google-genai` → `AIMessage.response_metadata` | `finish_reason`, `model_name`, `safety_ratings`, `model_provider` |

Until `langchain-google-genai` is patched upstream, the LangChain Gemini path will not produce per-modality cost data. The extractor in this PR is correct and will start producing data automatically once the upstream surfaces it. Per-modality cost continues to work for:
- `galileo-adk` (uses `google-genai` SDK directly; verified breakdown reaches the extractor)
- `logger.add_llm_span(audio_input_tokens=…, …)` calls

Path forward: file an upstream PR against `langchain-google-genai` to thread `prompt_tokens_details` through `input_token_details`. Estimated ~10 lines.

Test plan
- `TestParseLlmResult` cases (Surface 1, Surface 2, text-only, precedence)
- `TestParseLlmResult` cases (Surface 3 nested-usage, Surface 1 wins over Surface 3)
- `TestExtractUsageMetadata` ADK cases covering audio+image, text-only, empty candidates, enum-vs-string modality
- `google-genai` direct + `logger.add_llm_span(audio_input_tokens=…)`

Depends on runners `feat/f3-multimodal-cost-foundation` (orbit) + runners PR #2480.

🤖 Generated with Claude Code
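The precedence rule from the test plan (Surface 1 wins over Surface 3) can be sketched as a first-match resolver. Everything here is illustrative: messages are modelled as plain dicts rather than `AIMessage` objects, the `"image"`/`"audio"` detail keys are assumed, and the names `surface_top_level`, `surface_nested`, and `resolve_breakdown` are hypothetical.

```python
from typing import Any, Callable, Dict, Optional, Sequence, Tuple

Counts = Tuple[int, int, int]  # (image_in, audio_in, audio_out)
Surface = Callable[[Dict[str, Any]], Optional[Counts]]


def _from_token_details(usage: Optional[Dict[str, Any]]) -> Optional[Counts]:
    """Read modality counts from a usage dict; None if no modality key is present."""
    usage = usage or {}
    inp = usage.get("input_token_details") or {}
    out = usage.get("output_token_details") or {}
    if not ({"image", "audio"} & set(inp)) and "audio" not in out:
        return None
    return (inp.get("image", 0), inp.get("audio", 0), out.get("audio", 0))


def surface_top_level(message: Dict[str, Any]) -> Optional[Counts]:
    """Surface 1 stand-in: usage metadata directly on the message."""
    return _from_token_details(message.get("usage_metadata"))


def surface_nested(message: Dict[str, Any]) -> Optional[Counts]:
    """Surface 3 stand-in: usage metadata nested under response_metadata."""
    return _from_token_details((message.get("response_metadata") or {}).get("usage_metadata"))


def resolve_breakdown(message: Dict[str, Any], surfaces: Sequence[Surface]) -> Optional[Counts]:
    """Return the result of the first surface that reports a breakdown."""
    for surface in surfaces:
        counts = surface(message)
        if counts is not None:
            return counts
    return None


message = {
    "usage_metadata": {"input_token_details": {"audio": 8}},
    "response_metadata": {"usage_metadata": {"input_token_details": {"audio": 99}}},
}
print(resolve_breakdown(message, [surface_top_level, surface_nested]))
```

Ordering the surfaces in a list keeps the priority explicit, so adding a further surface later would be a one-line change rather than another nested conditional.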
Generated description
Below is a concise technical summary of the changes proposed in this PR:
Propagate per-modality token counts through `GalileoObserver`/`TraceBuilder`, the LangChain result parser, and span handling so `LlmMetrics` and logged traces receive Gemini-native image/audio breakdowns. Convert OpenAI multimodal content parts into `LoggedMessage` blocks to keep image/audio data and tool linkage intact in downstream ingestion.
Modified files (1)
`TextContentBlock`/`DataContentBlock` sequences and emit `LoggedMessage` instances so the ingestion layer shows the actual image/audio parts while preserving tool-call metadata; cover `tests/test_openai_extractors.py` to guard the new parsing behavior.
Modified files (2)
`GalileoObserver` through `TraceBuilder`, `SpanManager`, `GalileoLogger`, `GalileoBaseHandler`, and the LangChain callbacks so `LlmMetrics` and end-node spans reflect Gemini image/audio usage while the LangChain parser (`LLMEndResult`) sources those counts from the three surfaces of its Gemini messages and returns deterministic zeros when detail lists are present; cover `tests/test_observer.py` and `tests/test_langchain.py` to validate the new modality extraction logic.
Modified files (10)