[release-0.2][backport][api][plan][integrations] Record built-in chat token metrics outside the async call boundary (#712) #725

Merged
xintongsong merged 1 commit into apache:release-0.2 from xintongsong:backport-712-release-0.2 on Jun 2, 2026

@xintongsong
Contributor

Backport of #712 to release-0.2. The change keeps token-metric recording on the operator/mailbox (action) thread instead of the durable-execution pool thread, mirroring the Python side of the framework.
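The thread rule described above can be illustrated with a minimal, framework-free sketch. Everything in it is hypothetical: `OwnerThreadCounter` stands in for a metric that may only be touched from its owning thread, and the single-thread executor stands in for the durable-execution pool. The callable only returns usage data, and the caller records it after the result comes back.

```java
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

public class ThreadBoundarySketch {

    /** Hypothetical counter that may only be touched by the thread that created it. */
    public static final class OwnerThreadCounter {
        private final Thread owner = Thread.currentThread();
        private long count;

        public void inc(long n) {
            if (Thread.currentThread() != owner) {
                throw new IllegalStateException("counter accessed off its owner thread");
            }
            count += n;
        }

        public long get() {
            return count;
        }
    }

    /** Runs a fake chat call on a pool thread, then records usage on the calling thread. */
    public static long[] runAndRecord() {
        OwnerThreadCounter promptTokens = new OwnerThreadCounter();
        OwnerThreadCounter completionTokens = new OwnerThreadCounter();
        ExecutorService pool = Executors.newSingleThreadExecutor();
        try {
            // The callable only returns data; it never touches the counters.
            Future<long[]> future = pool.submit(() -> new long[] {12L, 34L});
            long[] usage = future.get();
            // Back on the owner thread: safe to record.
            promptTokens.inc(usage[0]);
            completionTokens.inc(usage[1]);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        } catch (ExecutionException e) {
            throw new IllegalStateException(e);
        } finally {
            pool.shutdown();
        }
        return new long[] {promptTokens.get(), completionTokens.get()};
    }

    public static void main(String[] args) {
        long[] totals = runAndRecord();
        System.out.println("prompt=" + totals[0] + " completion=" + totals[1]);
    }
}
```

Had the callable called `inc(...)` itself, the owner check would throw, which is the kind of cross-thread access this backport moves away from.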

Scope on release-0.2

The original PR touched 7 connections; release-0.2 only ships 4 of them. The Bedrock, AzureOpenAI and OpenAIResponses connections are not present on this branch and are intentionally excluded from the backport. Covered connections:

  • Ollama
  • Anthropic
  • AzureAI
  • OpenAI (single connection on release-0.2; on main it has since been split into Completions / Responses / AzureOpenAI)

What changed

  • BaseChatModelSetup gains a public recordTokenMetrics(String, long, long) method, the record site that matches the Python implementation. The setup's bound metric group is the action metric group, so the emitted metric path and counter names are unchanged.
  • BaseChatModelConnection.recordTokenMetrics(...) and the now-dead connection.setMetricGroup(...) forwarding in BaseChatModelSetup are removed.
  • Each of the 4 connections' chat() now stashes model_name / promptTokens / completionTokens into the response ChatMessage.extraArgs instead of recording inside chat().
  • ChatModelAction records after the durable call returns (before structured-output reassignment, which would drop the keys) via a new static recordChatTokenMetrics(...) helper.
  • RunnerContext metric-group getter javadoc now documents that the returned group must only be accessed from the operator/mailbox thread, not inside a durable callable.
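The stash-then-record flow in the list above might look roughly like the following sketch. It is not the actual framework code: `Message` and `Recorder` are simplified, hypothetical stand-ins for ChatMessage and the setup's record site, and the token values are made up. Only the three extraArgs key names are taken from the PR description.

```java
import java.util.HashMap;
import java.util.Map;

public class StashThenRecordSketch {

    /** Hypothetical, simplified stand-in for the framework's ChatMessage. */
    public static final class Message {
        public final String content;
        public final Map<String, Object> extraArgs = new HashMap<>();

        public Message(String content) {
            this.content = content;
        }
    }

    /** Hypothetical stand-in for the setup-side record site. */
    public static final class Recorder {
        public long promptTotal;
        public long completionTotal;

        public void recordTokenMetrics(String modelName, long prompt, long completion) {
            promptTotal += prompt;
            completionTotal += completion;
        }
    }

    /** Connection side: attach usage to the response instead of recording it. */
    public static Message chat(String prompt) {
        Message response = new Message("{\"answer\": 42}");
        response.extraArgs.put("model_name", "demo-model");
        response.extraArgs.put("promptTokens", 7L);
        response.extraArgs.put("completionTokens", 3L);
        return response;
    }

    /** Action side: record first, then swap in the structured-output message. */
    public static Message handle(Recorder recorder, String prompt) {
        // In the real flow this call would run inside the durable call.
        Message response = chat(prompt);
        Map<String, Object> extra = response.extraArgs;
        recorder.recordTokenMetrics(
                (String) extra.get("model_name"),
                ((Number) extra.get("promptTokens")).longValue(),
                ((Number) extra.get("completionTokens")).longValue());
        // Reassignment builds a fresh message, so the usage keys are gone after this.
        return new Message("parsed:" + response.content);
    }

    public static void main(String[] args) {
        Recorder recorder = new Recorder();
        Message out = handle(recorder, "hello");
        System.out.println(recorder.promptTotal + " " + recorder.completionTotal);
        System.out.println(out.extraArgs.isEmpty());
    }
}
```

The ordering in `handle` is the point: recording after the reassignment would find an empty extraArgs map.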

Recording is gated identically to Python: non-empty model name and both token counts greater than zero; values are read as Number and converted with longValue() to tolerate Integer/Long across durable recovery.
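A minimal sketch of that gate, assuming the three values arrive in a plain `Map<String, Object>`. The helper name `extractTokenCounts`, its `null` return for "skip", and the treatment of a non-String model name as a skip are assumptions made for illustration; the real recordChatTokenMetrics(...) helper may be shaped differently.

```java
import java.util.HashMap;
import java.util.Map;

public class TokenMetricGateSketch {

    /**
     * Returns {promptTokens, completionTokens} when the gate passes,
     * or null when recording should be skipped.
     */
    public static long[] extractTokenCounts(Map<String, Object> extraArgs) {
        Object model = extraArgs.get("model_name");
        Object prompt = extraArgs.get("promptTokens");
        Object completion = extraArgs.get("completionTokens");

        // Skip on a missing or empty model name.
        if (!(model instanceof String) || ((String) model).isEmpty()) {
            return null;
        }
        // Skip on a missing or non-numeric token value.
        if (!(prompt instanceof Number) || !(completion instanceof Number)) {
            return null;
        }
        // Number#longValue() accepts both Integer and Long without a ClassCastException.
        long p = ((Number) prompt).longValue();
        long c = ((Number) completion).longValue();
        // Both counts must be strictly positive.
        if (p <= 0 || c <= 0) {
            return null;
        }
        return new long[] {p, c};
    }

    public static void main(String[] args) {
        Map<String, Object> extra = new HashMap<>();
        extra.put("model_name", "demo-model");
        extra.put("promptTokens", Integer.valueOf(5)); // Integer, e.g. after recovery
        extra.put("completionTokens", Long.valueOf(9));
        long[] counts = extractTokenCounts(extra);
        System.out.println(counts[0] + " " + counts[1]);
    }
}
```

Reading the values as `Number` rather than casting to `Long` matters because a direct `(Long)` cast on an `Integer` value fails at runtime.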

This also fixes the same latent gap as on main: Python-backed chat models invoked from the Java action previously recorded no token metrics (the path bypasses the Java connection's recording); they are now captured once via the setup.

Tests

  • Relocated the base token-metrics test to the setup (BaseChatModelSetupTokenMetricsTest), mirroring the rename on main.
  • New ChatModelActionTest covers recordChatTokenMetrics: records when all keys are present and positive; Integer-typed token values still recorded via Number#longValue(); skips when a key is missing or non-numeric; skips when a token is 0 or the model name is empty (Python parity).
  • ./tools/build.sh -j and module-level mvn test on api / plan / 4 covered connections all pass locally.

API

Adds public BaseChatModelSetup.recordTokenMetrics(String, long, long) and removes the previously protected BaseChatModelConnection.recordTokenMetrics(...). No change to public configuration, event, or resource APIs. Emitted metric names and paths are unchanged.

Documentation

  • [ ] doc-needed
  • [x] doc-not-needed
  • [ ] doc-included

…the async call boundary (apache#712)

Backport of apache#712 to release-0.2 with scope narrowed to the chat-model
connections that exist on this branch. The Bedrock, AzureOpenAI and
OpenAIResponses connection variants are not present on release-0.2 and
are intentionally excluded.

Move token-metric recording from the durable async callable (where it
crossed the operator/mailbox thread boundary) to the action thread:

- BaseChatModelSetup gains public recordTokenMetrics(String, long, long).
- BaseChatModelConnection.recordTokenMetrics(...) and the
  connection.setMetricGroup(...) forwarding in BaseChatModelSetup are
  removed.
- Each connection's chat() stashes model_name / promptTokens /
  completionTokens into ChatMessage.extraArgs (Ollama, Anthropic, AzureAI,
  OpenAI on release-0.2).
- ChatModelAction records via the setup after durableExecute(Async)
  returns, before structured-output reassignment.
- RunnerContext.getAgentMetricGroup/getActionMetricGroup javadoc notes
  that the returned group must only be accessed from the operator
  thread, not inside a durable callable.

Emitted metric paths and counter names are unchanged. Records are
gated identically to Python: non-empty model name and both token
counts greater than zero; Integer/Long token values are accepted via
Number#longValue().

Tests:
- BaseChatModelConnectionTokenMetricsTest renamed and rewritten to
  BaseChatModelSetupTokenMetricsTest (target moved from connection to
  setup).
- New ChatModelActionTest covers recordChatTokenMetrics: records when
  all keys present and positive; Integer-typed values still recorded;
  skips on missing key, non-numeric value, zero token, or empty model
  name.
@github-actions (bot) added the doc-not-needed, fixVersion/0.2.2, and priority/major labels on Jun 1, 2026
Collaborator

@weiqingy left a comment

LGTM

@xintongsong xintongsong merged commit 6b3e3e7 into apache:release-0.2 Jun 2, 2026
22 checks passed
