feat(cost): per-modality token breakdown in Python SDK (F3 Track C) by john-weiler · Pull Request #596 · rungalileo/galileo-python

john-weiler · 2026-06-01T21:12:29Z

User description

Summary

- Extends LlmMetrics with num_image_input_tokens, num_audio_input_tokens, num_audio_output_tokens (optional, forward-compat safe)
- Threads through logger.add_llm_span() → base_handler → span_params
- LangChain handler: _extract_gemini_modality_breakdown() checks three surfaces (priority order):
  1. message.usage_metadata.input_token_details / output_token_details (langchain-google-genai ≥ 2.x with LangChain Core UsageMetadata)
  2. message.response_metadata.prompt_tokens_details / candidates_tokens_details (raw Gemini API list shape)
  3. message.response_metadata["usage_metadata"] nested dict — defensive fallback for providers that nest usage under response_metadata (added in response to review comment; covers forward-compat if upstream langchain-google-genai adds modality info there)
- Sync + async handlers both updated; both accept either a ChatGeneration or a raw AIMessage
- galileo-adk: _extract_usage_metadata() walks native SDK prompt_tokens_details/candidates_tokens_details (handles .modality as enum or string); span_manager.end_llm() carries 3 new fields through to end_node()
- crewai, openai_agents, otel: untouched — OpenAI-compat path drops breakdown (per RFC)
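The three-surface priority order described above can be sketched roughly as follows. This is an illustration only, not the code in this PR: the function names, the `"image"` / `"audio"` dict keys, and the `{"modality": ..., "token_count": ...}` entry shape for the list surface are assumptions made for the example.

```python
from types import SimpleNamespace
from typing import Optional, Tuple

# (image_input, audio_input, audio_output); None means "no breakdown reported".
Breakdown = Tuple[Optional[int], Optional[int], Optional[int]]
MODALITY_KEYS = {"image", "audio"}  # assumed key names, not confirmed by the PR


def from_detail_dicts(inp: Optional[dict], out: Optional[dict]) -> Optional[Breakdown]:
    """Surfaces 1 and 3: dicts keyed by modality name."""
    inp, out = inp or {}, out or {}
    # Keys such as cache_read alone do not count as modality data.
    if not (MODALITY_KEYS & (set(inp) | set(out))):
        return None
    return (inp.get("image", 0), inp.get("audio", 0), out.get("audio", 0))


def from_detail_lists(prompt: Optional[list], candidates: Optional[list]) -> Optional[Breakdown]:
    """Surface 2: lists of per-modality entries (assumed dict shape)."""
    if not prompt and not candidates:
        return None

    def total(entries: Optional[list], modality: str) -> int:
        return sum(
            e.get("token_count") or 0
            for e in entries or []
            if str(e.get("modality", "")).upper() == modality
        )

    return (total(prompt, "IMAGE"), total(prompt, "AUDIO"), total(candidates, "AUDIO"))


def extract_modality_breakdown(message) -> Breakdown:
    """Probe the surfaces in priority order; the first one with data wins."""
    usage = getattr(message, "usage_metadata", None) or {}
    meta = getattr(message, "response_metadata", None) or {}
    nested = meta.get("usage_metadata") or {}
    probes = (
        lambda: from_detail_dicts(usage.get("input_token_details"), usage.get("output_token_details")),
        lambda: from_detail_lists(meta.get("prompt_tokens_details"), meta.get("candidates_tokens_details")),
        lambda: from_detail_dicts(nested.get("input_token_details"), nested.get("output_token_details")),
    )
    for probe in probes:
        result = probe()
        if result is not None:
            return result
    return (None, None, None)
```

Evaluating the probes lazily keeps the precedence explicit: a later surface is only consulted when every earlier one reported nothing.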

Known limitation — langchain-google-genai 4.2.4

Discovered during local E2E testing: langchain-google-genai 4.2.4 (latest) does not surface Gemini's per-modality breakdown on the AIMessage. The wrapper at chat_models.py:1263 hardcodes:

input_token_details={"cache_read": cache_read_tokens}

…reading prompt_token_count and cached_content_token_count from the Gemini response but dropping prompt_tokens_details (which contains the modality breakdown). Verified end-to-end:

| Path | `prompt_tokens_details` returned to consumer |
| --- | --- |
| `google-genai` SDK directly | `[(TEXT, 10), (AUDIO, 8)]` ✓ |
| `langchain-google-genai` → `AIMessage.usage_metadata.input_token_details` | `{"cache_read": 0}` (modality stripped) |
| `langchain-google-genai` → `AIMessage.response_metadata` | only `finish_reason`, `model_name`, `safety_ratings`, `model_provider` |

Until langchain-google-genai is patched upstream, the LangChain Gemini path will not produce per-modality cost data. The extractor in this PR is correct and will start producing data automatically once the upstream surfaces it. Per-modality cost continues to work for:

- galileo-adk (uses google-genai SDK directly — verified breakdown reaches the extractor)
- Direct logger.add_llm_span(audio_input_tokens=…, …) calls
- Vertex AI judge spans (covered in runners PR #2480)

Path forward: file an upstream PR against langchain-google-genai to thread prompt_tokens_details through input_token_details. Estimated ~10 lines.

Test plan

- 4 existing TestParseLlmResult cases (Surface 1, Surface 2, text-only, precedence)
- 2 new TestParseLlmResult cases (Surface 3 nested-usage, Surface 1 wins over Surface 3)
- 6 new TestExtractUsageMetadata ADK cases covering audio+image, text-only, empty candidates, enum-vs-string modality
- All existing tests pass
- Local E2E validated end-to-end via google-genai direct + logger.add_llm_span(audio_input_tokens=…)

Depends on runners feat/f3-multimodal-cost-foundation (orbit) + runners PR #2480.

🤖 Generated with Claude Code

Generated description

Below is a concise technical summary of the changes proposed in this PR:
Propagate per-modality token counts through GalileoObserver/TraceBuilder, the LangChain result parser, and span handling so LlmMetrics and logged traces receive Gemini-native image/audio breakdowns. Convert OpenAI multimodal content parts into LoggedMessage blocks to keep image/audio data and tool linkage intact in downstream ingestion.

Topic Details

Other

Other files

Modified files (1)

galileo-adk/tests/test_observer.py

Latest Contributors(2)

| User | Commit | Date |
| --- | --- | --- |
| jaweiler@splunk.com | test: add coverage for... | June 23, 2026 |
| jweiler@galileo.ai | fixes | June 08, 2026 |

OpenAI multimodal

Convert OpenAI list-based multimodal content into TextContentBlock/DataContentBlock sequences and emit LoggedMessage instances, so the ingestion layer shows the actual image/audio parts while preserving tool-call metadata. tests/test_openai_extractors.py guards the new parsing behavior.
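As a rough illustration of the conversion described above, the sketch below maps OpenAI chat content parts (`text`, `image_url`, `input_audio`) onto simple block objects. `TextBlock` and `DataBlock` are hypothetical stand-ins for TextContentBlock/DataContentBlock, whose real fields are not shown in this PR page, and the handling of unrecognised part types is an assumption.

```python
from dataclasses import dataclass
from typing import List, Optional, Union


@dataclass
class TextBlock:  # hypothetical stand-in for TextContentBlock
    text: str


@dataclass
class DataBlock:  # hypothetical stand-in for DataContentBlock
    modality: str
    url: Optional[str] = None
    base64: Optional[str] = None
    mime_type: Optional[str] = None


def parts_to_blocks(parts: List[dict]) -> List[Union[TextBlock, DataBlock]]:
    """Convert OpenAI-style content parts into content blocks."""
    blocks: List[Union[TextBlock, DataBlock]] = []
    for part in parts:
        kind = part.get("type")
        if kind == "text":
            blocks.append(TextBlock(text=part.get("text", "")))
        elif kind == "image_url":
            url = (part.get("image_url") or {}).get("url", "")
            if url.startswith("data:") and ";base64," in url:
                # Data URI: split "data:<mime>;base64,<payload>".
                header, payload = url.split(";base64,", 1)
                blocks.append(DataBlock("image", base64=payload, mime_type=header[len("data:"):]))
            else:
                blocks.append(DataBlock("image", url=url))
        elif kind == "input_audio":
            audio = part.get("input_audio") or {}
            fmt = audio.get("format")
            blocks.append(
                DataBlock("audio", base64=audio.get("data"), mime_type=f"audio/{fmt}" if fmt else None)
            )
        else:
            # Assumption: keep a textual trace of unrecognised parts rather than dropping them.
            blocks.append(TextBlock(text=str(part)))
    return blocks
```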

Modified files (2)

src/galileo/openai/extractors.py
tests/test_openai_extractors.py

Latest Contributors(2)

| User | Commit | Date |
| --- | --- | --- |
| jaweiler@splunk.com | test: add coverage for... | June 23, 2026 |
| jweiler@galileo.ai | fixes | June 08, 2026 |

Modal metrics flow

Connect the per-modality token fields from GalileoObserver through TraceBuilder, SpanManager, GalileoLogger, GalileoBaseHandler, and the LangChain callbacks so that LlmMetrics and end-node spans reflect Gemini image/audio usage. The LangChain parser (LLMEndResult) sources those counts from the three surfaces of its Gemini messages and returns deterministic zeros when detail lists are present. tests/test_observer.py and tests/test_langchain.py validate the new modality extraction logic.

Modified files (10)

galileo-adk/src/galileo_adk/observer.py
galileo-adk/src/galileo_adk/span_manager.py
galileo-adk/src/galileo_adk/trace_builder.py
src/galileo/handlers/base_handler.py
src/galileo/handlers/langchain/async_handler.py
src/galileo/handlers/langchain/handler.py
src/galileo/handlers/langchain/utils.py
src/galileo/logger/logger.py
src/galileo/resources/models/llm_metrics.py
tests/test_langchain.py

Latest Contributors(2)

| User | Commit | Date |
| --- | --- | --- |
| jaweiler@splunk.com | feat: add image_output... | June 23, 2026 |
| jweiler@galileo.ai | feat(cost): per-modali... | June 01, 2026 |


Extends LlmMetrics with optional num_image_input_tokens / num_audio_input_tokens / num_audio_output_tokens fields and threads them through the full ingestion path: LlmMetrics model → logger.add_llm_span() → base_handler → span_params.

Extraction is implemented in two integrations:

LangChain (primary path):
- _extract_gemini_modality_breakdown() checks two surfaces:
  1. message.usage_metadata.input_token_details / output_token_details (langchain-google-genai >= 2.x with LangChain Core UsageMetadata)
  2. message.response_metadata.prompt_tokens_details / candidates_tokens_details (raw Gemini API ModalityTokenCount list forwarded by the adapter)
- Both sync and async handlers updated.

galileo-adk (native Gemini SDK path):
- _extract_usage_metadata() extended to walk prompt_tokens_details / candidates_tokens_details (ModalityTokenCount list with .modality enum or string) and bucket into image/audio counts.
- span_manager.end_llm() carries the 3 new fields through to end_node().

All other integrations (crewai, openai_agents, otel) unchanged — they go through the OpenAI-compat path which drops modality breakdown (per RFC).

Tests: 4 new TestParseLlmResult cases + 6 new TestExtractUsageMetadata ADK cases; all existing tests pass.

Part of F3 multimodal token/cost support.

Co-Authored-By: Claude Sonnet 4.6 (1M context) <noreply@anthropic.com>

codecov · 2026-06-01T21:14:55Z

Codecov Report

❌ Patch coverage is 92.54658% with 12 lines in your changes missing coverage. Please review.
✅ Project coverage is 83.99%. Comparing base (f9feed6) to head (697487c).
⚠️ Report is 1 commits behind head on main.

| Files with missing lines | Patch % | Lines |
| --- | --- | --- |
| src/galileo/handlers/langchain/utils.py | 93.97% | 5 Missing ⚠️ |
| src/galileo/openai/extractors.py | 87.17% | 5 Missing ⚠️ |
| galileo-adk/src/galileo_adk/observer.py | 94.87% | 2 Missing ⚠️ |

Additional details and impacted files

@@            Coverage Diff             @@
##             main     #596      +/-   ##
==========================================
+ Coverage   83.31%   83.99%   +0.67%     
==========================================
  Files         125      135      +10     
  Lines       10659    11942    +1283     
==========================================
+ Hits         8881    10031    +1150     
- Misses       1778     1911     +133

| Flag | Coverage Δ |
| --- | --- |
| galileo-adk | `88.98% <94.87%> (?)` |


baz-reviewer · 2026-06-01T21:20:28Z


+        # Per-modality breakdown — only present on native Gemini SDK responses.
+        prompt_details = getattr(usage, "prompt_tokens_details", None)
+        candidates_details = getattr(usage, "candidates_tokens_details", None)
+        if prompt_details or candidates_details:
+            image_in = 0
+            audio_in = 0
+            audio_out = 0
+            has_prompt = bool(prompt_details)
+            has_candidates = bool(candidates_details)
+            for entry in prompt_details or []:
+                modality_attr = getattr(entry, "modality", None)


_extract_usage_metadata duplicates the Gemini modality breakdown logic in _extract_gemini_modality_breakdown. Should we extract a shared helper so both paths use the same logic?
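One possible shape for such a shared helper is sketched below. It is only an illustration: `Modality` is a stand-in enum for the SDK's modality type, and `modality_name` / `sum_tokens_by_modality` are hypothetical names. Entries are assumed to expose `.modality` (enum or string) and `.token_count`, as the diff hunk above suggests.

```python
from enum import Enum
from types import SimpleNamespace
from typing import Dict, Iterable, Optional


class Modality(Enum):  # stand-in for the SDK's modality enum
    TEXT = "TEXT"
    IMAGE = "IMAGE"
    AUDIO = "AUDIO"


def modality_name(entry: object) -> str:
    """Normalise .modality to an upper-case name, whether it is an enum or a string."""
    raw = getattr(entry, "modality", None)
    if raw is None:
        return ""
    name = raw.name if isinstance(raw, Enum) else str(raw)
    # Tolerate dotted string forms such as "Modality.AUDIO".
    return name.rsplit(".", 1)[-1].upper()


def sum_tokens_by_modality(entries: Optional[Iterable[object]]) -> Dict[str, int]:
    """Bucket token counts by modality name; a missing list yields an empty dict."""
    totals: Dict[str, int] = {}
    for entry in entries or []:
        key = modality_name(entry)
        if key:
            totals[key] = totals.get(key, 0) + (getattr(entry, "token_count", None) or 0)
    return totals
```

Both extraction paths could then read `totals.get("IMAGE")` and `totals.get("AUDIO")` from the same helper and differ only in how they map an absent list to None or zero.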


baz-reviewer · 2026-06-01T21:20:28Z

+    # Per-modality breakdown — None means "counted as text in the flat totals".
+    # Only populated for providers that return modality-level token counts (e.g. Gemini native).
+    num_image_input_tokens: None | Unset | int = UNSET
+    num_audio_input_tokens: None | Unset | int = UNSET
+    num_audio_output_tokens: None | Unset | int = UNSET


src/galileo/resources/ is regenerated from openapi.yaml, but LlmMetrics in the spec doesn't include the new modality fields, so the next regen will drop them and break src/galileo/logger/logger.py:1247-1255; should we move this into the OpenAPI source/generator input instead of patching the generated file?


Prompt for AI Agents

Before applying, verify this suggestion against the current code. In src/galileo/resources/models/llm_metrics.py around lines 30-34 and in the additions to `LlmMetrics.to_dict()` and `LlmMetrics.from_dict()` (new fields `num_image_input_tokens`, `num_audio_input_tokens`, `num_audio_output_tokens`), don’t keep these changes in the generated client boundary. Instead, update the OpenAPI source schema that generates `LlmMetrics` (likely openapi.yaml) to add these three properties (type int, nullable/optional using the same conventions as the other token fields) so regeneration won’t delete them. After updating the OpenAPI schema, remove the manual edits from the generated llm_metrics.py and regenerate via scripts/auto-generate-api-client.sh, then run the failing logger tests (src/galileo/logger/logger.py around 1247-1255) to ensure there are no unexpected-kwarg errors.

We will regenerate once API changes are in ...

One clarification on the premise: regeneration won't actually break logger.py:1247-1255. That code constructs galileo_core.schemas.logging.span.LlmMetrics (a pydantic model with model_config extra='allow'), not the edited attrs class in src/galileo/resources/models/llm_metrics.py. I verified the modality kwargs are accepted and serialized via model_dump() purely through extra='allow' — so the write/ingest path does not depend on this generated-file edit at all. The edit only affects the read-side models (LlmSpan, extended_llm_span_record, partial_extended_llm_span_record) that call LlmMetrics.from_dict(). So the regen concern is real for the read path, but the generated-file change is functionally unnecessary for emitting the new fields.

fercor-cisco · 2026-06-09T01:03:32Z

/astra review

galileo-astra

⚠️ This review was generated by an AI agent (Astra) and may contain mistakes. Please verify all suggestions independently.

Verdict: request_changes — The core per-modality token threading is sound, but the bundled OpenAI multimodal-message path introduces a tool-linkage regression and ships with effectively no test coverage.

General Comments

🟡 minor (design): The new multimodal path makes convert_to_galileo_message return a LoggedMessage whose content is a list of TextContentBlock/DataContentBlock. These messages are then placed into LlmSpan.input/output, whose declared type (galileo_core Message.content) is str | list[ContentPart]. When the span is serialized via model_dump(), pydantic emits a PydanticSerializationUnexpectedValue warning ("Expected str") for every multimodal message. I verified the block data does survive serialization, so this is not data loss, but it means a UserWarning is logged on every multimodal OpenAI call — noisy for a telemetry path that is meant to observe without interfering. Consider whether galileo_core needs a content-block-aware span type, or suppress/handle the warning at the boundary.
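If the warning is handled at the boundary, one option is to scope a warnings filter around the serialization call. The sketch below uses only the standard library; `noisy_dump` is a hypothetical stand-in for the `model_dump()` call, and the message prefix it emits is an assumption about the warning text, so the filter pattern would need checking against the real output.

```python
import warnings
from typing import Any, Dict


def noisy_dump(payload: Dict[str, Any]) -> Dict[str, Any]:
    """Stand-in for a serializer that warns about an unexpected content type."""
    warnings.warn("Pydantic serializer warnings:\n  Expected `str` ...", UserWarning)
    return dict(payload)


def dump_quietly(payload: Dict[str, Any]) -> Dict[str, Any]:
    """Serialize while ignoring only the matching serializer warning."""
    with warnings.catch_warnings():
        # The filter is restored when the block exits, so other warnings stay visible.
        warnings.filterwarnings(
            "ignore",
            message="Pydantic serializer warnings",
            category=UserWarning,
        )
        return noisy_dump(payload)
```

Scoping the filter to a single call avoids hiding unrelated warnings, but it also hides genuine type mismatches on that call, which is a trade-off to weigh against a content-block-aware span type.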

Follow-ups

Suggested follow-up work that could be tracked as Shortcut stories:

galileo-adk/src/galileo_adk/observer.py:503-557: _extract_usage_metadata (ADK) and _extract_gemini_modality_breakdown (LangChain utils) implement the same Gemini per-modality walk with subtly different null-vs-zero semantics: ADK only sets image_input_tokens/audio_input_tokens when prompt_tokens_details is present and audio_output_tokens when candidates_tokens_details is present (otherwise the key is absent → None downstream), while the LangChain path always returns a 0-filled 3-tuple once any detail data exists. Consider extracting a shared helper so both paths agree on the null/zero contract (matches reviewer comment r3337251851).
src/galileo/openai/extractors.py:221-267: _openai_content_parts_to_blocks re-implements type→content-block conversion that already exists in galileo/utils/serialization._convert_langchain_content_block. Consider reusing/sharing that helper to avoid drift (matches reviewer comment r3373853343).
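The null-versus-zero difference described in the first follow-up can be made concrete with a small sketch. The two functions below model the contracts as the review describes them, not the actual code; the function names are hypothetical, and each argument is a per-modality totals dict or None when the corresponding detail list was absent.

```python
from typing import Dict, Optional, Tuple

Breakdown = Tuple[Optional[int], Optional[int], Optional[int]]  # image_in, audio_in, audio_out


def adk_style(prompt: Optional[Dict[str, int]], candidates: Optional[Dict[str, int]]) -> Breakdown:
    """Each field is set only when its own source list was present."""
    image_in = audio_in = audio_out = None
    if prompt is not None:
        image_in = prompt.get("IMAGE", 0)
        audio_in = prompt.get("AUDIO", 0)
    if candidates is not None:
        audio_out = candidates.get("AUDIO", 0)
    return (image_in, audio_in, audio_out)


def langchain_style(prompt: Optional[Dict[str, int]], candidates: Optional[Dict[str, int]]) -> Breakdown:
    """All fields are zero-filled as soon as any detail data exists."""
    if prompt is None and candidates is None:
        return (None, None, None)
    prompt, candidates = prompt or {}, candidates or {}
    return (prompt.get("IMAGE", 0), prompt.get("AUDIO", 0), candidates.get("AUDIO", 0))
```

For a response with prompt details but no candidates details, `adk_style({"AUDIO": 8}, None)` yields `(0, 8, None)` while `langchain_style({"AUDIO": 8}, None)` yields `(0, 8, 0)`, which is the divergence a shared helper would need to settle.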

fercor-cisco

Please check the Astra feedback and test failures.

john-weiler · 2026-06-23T14:29:51Z

https://app.shortcut.com/galileo/story/67258/wire-multimodal-tokens-into-galileo-python

…linkage, provider scope

- galileo-adk/trace_builder.py: add image_input_tokens/audio_input_tokens/audio_output_tokens to TraceBuilder.add_llm_span() so the ADK path doesn't raise TypeError when GalileoBaseHandler calls through with these new kwargs; thread them into LlmMetrics
- src/galileo/handlers/langchain/utils.py: fix Surface 1 has_detail_data guard to only activate when audio/image keys are actually present — previously any non-empty input_token_details (e.g. Anthropic cache_read, OpenAI reasoning) incorrectly returned (0,0,0) instead of (None,None,None) for providers that have no modality breakdown
- src/galileo/openai/extractors.py: forward tool_calls and tool_call_id into LoggedMessage when content is a multimodal list; previously tool-call linkage was silently dropped for tool-role messages with array content

Co-Authored-By: Claude <noreply@anthropic.com>
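The tool-linkage fix in the last item can be pictured with a minimal sketch. `LoggedMessageSketch` and `convert_message` are hypothetical stand-ins, not the SDK's LoggedMessage or convert_to_galileo_message; the point is only that the linkage fields are forwarded on both the list-content and string-content branches.

```python
from dataclasses import dataclass
from typing import Any, Dict, List, Optional


@dataclass
class LoggedMessageSketch:  # hypothetical stand-in for LoggedMessage
    role: str
    content: Any
    tool_call_id: Optional[str] = None
    tool_calls: Optional[List[Dict[str, Any]]] = None


def convert_message(message: Dict[str, Any]) -> LoggedMessageSketch:
    """Convert a chat message dict, keeping tool linkage on every branch."""
    role = message.get("role", "user")
    content = message.get("content")
    # Collect linkage once so no branch can forget to pass it along.
    linkage = {
        "tool_call_id": message.get("tool_call_id"),
        "tool_calls": message.get("tool_calls"),
    }
    if isinstance(content, list):
        # Multimodal branch: content stays a list of parts.
        return LoggedMessageSketch(role=role, content=[dict(p) for p in content], **linkage)
    return LoggedMessageSketch(role=role, content=content or "", **linkage)
```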

Mirrors the existing image_input_tokens/audio_* fields — threads image_output_tokens from LlmMetrics through LlmMetrics.to_dict/from_dict, GalileoLogger.add_llm_span/log_single_llm_call, GalileoBaseHandler.end_node, both LangChain handlers, _extract_gemini_modality_breakdown (Surface 1/2/3 output paths + candidates_tokens_details IMAGE loop), LLMEndResult, SpanManager.end_llm, TraceBuilder.add_llm_span, and GalileoObserver._extract_usage_metadata / on_llm_end so the ADK and LangChain paths can report image generation output tokens.

Co-Authored-By: Claude <noreply@anthropic.com>

…modal tool linkage

- test_langchain.py: 3 new TestParseLlmResult cases — image_output_tokens from surface 1 output_token_details, image_output_tokens from surface 2 candidates_tokens_details, and non-Gemini provider with only cache_read in input_token_details returns None not zero for all modality fields
- test_observer.py: image_output_tokens from candidates_tokens_details IMAGE entry
- test_openai_extractors.py: full coverage of _openai_content_parts_to_blocks (text, image data URI, image plain URL, input_audio, unrecognised type, mixed) and convert_to_galileo_message tool linkage (tool_call_id and tool_calls both preserved when content is a list of parts)

Co-Authored-By: Claude <noreply@anthropic.com>

…rces/

Adds num_image_input_tokens, num_audio_input_tokens, num_audio_output_tokens, num_image_output_tokens to LlmMetrics in openapi.yaml so the generated src/galileo/resources/models/llm_metrics.py includes them natively — the previously hand-edited fields will now survive re-generation instead of being dropped on the next regen cycle.

Co-Authored-By: Claude <noreply@anthropic.com>

baz-reviewer Bot reviewed Jun 1, 2026

john-weiler mentioned this pull request Jun 2, 2026

feat(cost): per-modality token breakdown in JS SDK (F3 Track D) rungalileo/galileo-js#619 (Open, 3 tasks)

fixes (bee7b89)

baz-reviewer Bot reviewed Jun 8, 2026

Comment thread src/galileo/openai/extractors.py

Comment thread src/galileo/openai/extractors.py (Outdated)

galileo-astra Bot reviewed Jun 9, 2026

Comment thread src/galileo/openai/extractors.py (Outdated)

Comment thread src/galileo/handlers/langchain/utils.py

Comment thread src/galileo/openai/extractors.py

fercor-cisco reviewed Jun 9, 2026

dmcwhorter reviewed Jun 15, 2026

Comment thread galileo-adk/src/galileo_adk/observer.py

john-weiler and others added 4 commits June 23, 2026 10:59
