[api][plan][integrations] Record built-in chat token metrics outside the async call boundary#712

Merged: xintongsong merged 2 commits into apache:main from weiqingy:706-impl on May 31, 2026

@weiqingy
Collaborator

Linked issue: #706

Purpose of change

The Java built-in chat action recorded token metrics from inside the durable async callable: each chat-model connection called BaseChatModelConnection.recordTokenMetrics(...) within its chat() method, which runs on a durable-execution pool thread. The runtime metric group is meant to be used from the operator/mailbox (action) thread, so touching it from the callable crosses that boundary.

This change brings Java to the same execution boundary as Python, mirroring chat_model_action.py:

  • Each of the 7 Java connections (Ollama, Anthropic, Bedrock, AzureAI, OpenAI Completions, OpenAI Responses, AzureOpenAI) now stashes model_name / promptTokens / completionTokens into the response ChatMessage.extraArgs instead of recording inside chat().
  • ChatModelAction records after the durable call returns (before structured-output reassignment, which would drop the keys) via a new BaseChatModelSetup.recordTokenMetrics(...) — the Python-parity record site (chat_model._record_token_metrics). The setup's bound metric group is the action metric group, so the emitted metric path and counter names are unchanged.
  • The old BaseChatModelConnection.recordTokenMetrics(...) and its now-dead connection.setMetricGroup(...) forwarding are removed.
  • The RunnerContext metric-group getter Javadoc now documents that the returned group must only be accessed from the operator/mailbox thread, not inside a durable callable.
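
The flow in the list above can be sketched roughly as follows. This is a self-contained illustration, not the project's code: `Response`, `Counters`, `fakeChat`, and `callAndRecord` are hypothetical stand-ins, and a plain `ExecutorService` plays the role of the durable-execution pool. Only the three extraArgs key names are taken from the description.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

public class StashThenRecordSketch {

    /** Hypothetical stand-in for a chat response that carries extra arguments. */
    static final class Response {
        final String content;
        final Map<String, Object> extraArgs = new HashMap<>();

        Response(String content) {
            this.content = content;
        }
    }

    /** Hypothetical stand-in for counters that only the calling thread may touch. */
    static final class Counters {
        String modelName;
        long promptTokens;
        long completionTokens;
        String recordedOnThread;
    }

    /** Runs on the pool thread: it only stashes usage and never touches Counters. */
    static Response fakeChat(String prompt) {
        Response response = new Response("echo: " + prompt);
        response.extraArgs.put("model_name", "demo-model");
        response.extraArgs.put("promptTokens", 12);
        response.extraArgs.put("completionTokens", 34);
        return response;
    }

    /** Submits the call to a pool, then records on the calling thread once it returns. */
    static Counters callAndRecord(String prompt) {
        ExecutorService pool = Executors.newSingleThreadExecutor();
        try {
            Future<Response> future = pool.submit(() -> fakeChat(prompt));
            Response response = future.get();

            // Back on the calling thread: read the stashed values and record them here.
            Counters counters = new Counters();
            counters.modelName = (String) response.extraArgs.get("model_name");
            counters.promptTokens = ((Number) response.extraArgs.get("promptTokens")).longValue();
            counters.completionTokens =
                    ((Number) response.extraArgs.get("completionTokens")).longValue();
            counters.recordedOnThread = Thread.currentThread().getName();
            return counters;
        } catch (InterruptedException | ExecutionException e) {
            throw new IllegalStateException("chat call failed", e);
        } finally {
            pool.shutdown();
        }
    }

    public static void main(String[] args) {
        Counters c = callAndRecord("hi");
        System.out.println(c.modelName + ": " + c.promptTokens + " / " + c.completionTokens);
    }
}
```

The point of the split is that the callable returns plain data, so nothing owned by the calling thread is shared with the pool.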

Recording matches Python's guard exactly: the model name must be non-empty and both token counts greater than zero; values are read as Number and converted with longValue() to tolerate Integer/Long across the Pemja bridge and durable recovery.
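
A minimal sketch of the guard described above, assuming the three values arrive in a `Map<String, Object>` under the keys named in this description. The class and method names (`TokenMetricsGuardSketch`, `asLong`, `shouldRecord`) are hypothetical and are not the helper added by this PR.

```java
import java.util.HashMap;
import java.util.Map;

public class TokenMetricsGuardSketch {

    /** Returns the value as a Long when it is any Number (Integer, Long, ...), else null. */
    static Long asLong(Object value) {
        if (value instanceof Number) {
            return ((Number) value).longValue();
        }
        return null;
    }

    /** True only when the model name is a non-empty String and both counts are > 0. */
    static boolean shouldRecord(Map<String, Object> extraArgs) {
        Object name = extraArgs.get("model_name");
        Long prompt = asLong(extraArgs.get("promptTokens"));
        Long completion = asLong(extraArgs.get("completionTokens"));
        return name instanceof String
                && !((String) name).isEmpty()
                && prompt != null
                && prompt > 0
                && completion != null
                && completion > 0;
    }

    public static void main(String[] args) {
        Map<String, Object> extraArgs = new HashMap<>();
        extraArgs.put("model_name", "demo-model");
        extraArgs.put("promptTokens", 12);        // Integer
        extraArgs.put("completionTokens", 34L);   // Long
        System.out.println(shouldRecord(extraArgs));

        extraArgs.put("completionTokens", 0);     // zero count: skipped
        System.out.println(shouldRecord(extraArgs));
    }
}
```

Reading through `Number` rather than casting to `Long` is what lets an `Integer` value pass the guard without a `ClassCastException`.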

This also fixes a latent gap: Python-backed chat models invoked from the Java action previously recorded no token metrics (the path bypasses the Java connection's recording); they are now captured once.

Tests

  • Relocated the base token-metrics test to the setup (BaseChatModelSetupTokenMetricsTest), mirroring Python's test_token_metrics.py: records to the per-model sub-group counters, no-ops without a metric group, separate sub-groups per model, getResourceType() == CHAT_MODEL.
  • Extended ChatModelActionTest with cases for the new recordChatTokenMetrics helper: records once when all keys are present and positive; Integer-typed token values still recorded via Number.longValue(); skips when a key is missing or non-numeric; skips when a token is 0 or the model name is empty (Python parity).
  • ./tools/build.sh -j, ./tools/ut.sh -j, and ./tools/lint.sh -c all pass.
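
The behaviors listed for the relocated base test (per-model sub-group counters, a no-op without a metric group) can be illustrated with a toy model. Everything here is an assumption made for illustration: `MetricGroup` and `Setup` are hypothetical stand-ins, not the runtime's classes, and the counter names are placeholders since the description does not state the emitted names.

```java
import java.util.HashMap;
import java.util.Map;

public class PerModelCountersSketch {

    /** Hypothetical, minimal metric group: named sub-groups holding named counters. */
    static final class MetricGroup {
        private final Map<String, MetricGroup> subGroups = new HashMap<>();
        private final Map<String, Long> counters = new HashMap<>();

        MetricGroup subGroup(String name) {
            return subGroups.computeIfAbsent(name, k -> new MetricGroup());
        }

        void inc(String counter, long delta) {
            counters.merge(counter, delta, Long::sum);
        }

        long count(String counter) {
            return counters.getOrDefault(counter, 0L);
        }
    }

    /** Hypothetical setup that may or may not have a metric group bound. */
    static final class Setup {
        private MetricGroup metricGroup; // null until bound

        void setMetricGroup(MetricGroup metricGroup) {
            this.metricGroup = metricGroup;
        }

        void recordTokenMetrics(String modelName, long promptTokens, long completionTokens) {
            if (metricGroup == null) {
                return; // no metric group bound: nothing to record
            }
            MetricGroup modelGroup = metricGroup.subGroup(modelName);
            modelGroup.inc("promptTokens", promptTokens);         // placeholder counter name
            modelGroup.inc("completionTokens", completionTokens); // placeholder counter name
        }
    }

    public static void main(String[] args) {
        MetricGroup root = new MetricGroup();
        Setup setup = new Setup();
        setup.setMetricGroup(root);
        setup.recordTokenMetrics("model-a", 10, 20);
        setup.recordTokenMetrics("model-b", 1, 2);
        System.out.println(root.subGroup("model-a").count("promptTokens"));
    }
}
```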

API

Adds public BaseChatModelSetup.recordTokenMetrics(String, long, long) (the Python-parity record site) and removes the previously protected BaseChatModelConnection.recordTokenMetrics(...). No change to public configuration, event, or resource APIs. Emitted metric names and paths are unchanged.

Documentation

  • doc-needed
  • doc-not-needed
  • doc-included

weiqingy added 2 commits May 28, 2026 22:47
…the async call boundary

The Java built-in chat action recorded token metrics from inside the
durableExecuteAsync callable: each chat-model connection called
BaseChatModelConnection.recordTokenMetrics(...) within chat(), which runs on
a durable-execution pool thread. The runtime metric group is meant to be used
from the operator/mailbox (action) thread.

Mirror the Python path: connections now stash model_name/promptTokens/
completionTokens into the response ChatMessage.extraArgs, and ChatModelAction
records after the durable call returns via a new
BaseChatModelSetup.recordTokenMetrics(...). The setup's bound metric group is
the action metric group, so the emitted metric path and counter names are
unchanged. This also captures token metrics for Python-backed models invoked
from the Java action, which previously recorded none.

Recording matches Python's guard: model name non-empty and both token counts
greater than zero; values are read as Number and converted with longValue() to
tolerate Integer/Long across the Pemja bridge and durable recovery.

Clarify that the metric groups returned by getAgentMetricGroup() and
getActionMetricGroup() must only be accessed from the operator/mailbox (action)
thread, not from inside a durableExecute / durableExecuteAsync callable, which
runs on a separate thread pool.
@github-actions (bot) added the doc-not-needed, fixVersion/0.3.0, and priority/major labels on May 29, 2026
Contributor

@xintongsong left a comment

LGTM

@xintongsong xintongsong merged commit e98af52 into apache:main May 31, 2026
74 of 76 checks passed
@joeyutong
Contributor

@xintongsong Should this PR be backported to the release-0.2 branch?

@xintongsong
Contributor

Exactly. That's what I was wondering by asking for the affectVersion label in #706.
