[release-0.2][backport][api][plan][integrations] Record built-in chat token metrics outside the async call boundary (#712) by xintongsong · Pull Request #725 · apache/flink-agents

xintongsong · 2026-06-01T11:27:54Z

Backport of #712 to release-0.2. The change keeps token-metric recording on the operator/mailbox (action) thread instead of the durable-execution pool thread, mirroring the Python side of the framework.

Scope on release-0.2

The original PR touched 7 connections; release-0.2 only ships 4 of them. The Bedrock, AzureOpenAI and OpenAIResponses connections are not present on this branch and are intentionally excluded from the backport. Covered connections:

Ollama
Anthropic
AzureAI
OpenAI (single connection on release-0.2; on main it has since been split into Completions / Responses / AzureOpenAI)

What changed

BaseChatModelSetup gains a public recordTokenMetrics(String, long, long) method, the record site that matches the Python implementation. The setup's bound metric group is the action metric group, so the emitted metric path and counter names are unchanged.
BaseChatModelConnection.recordTokenMetrics(...) and the now-dead connection.setMetricGroup(...) forwarding in BaseChatModelSetup are removed.
Each of the 4 connections' chat() now stashes model_name / promptTokens / completionTokens into the response ChatMessage.extraArgs instead of recording inside chat().
ChatModelAction records after the durable call returns (before structured-output reassignment, which would drop the keys) via a new static recordChatTokenMetrics(...) helper.
RunnerContext metric-group getter javadoc now documents that the returned group must only be accessed from the operator/mailbox thread, not inside a durable callable.
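
The stash-then-record flow described above can be illustrated with a minimal, self-contained sketch. It is not the framework code: Response, Counters, chat and run are hypothetical stand-ins for ChatMessage, the action metric group, a connection's chat() and the ChatModelAction call site, and a plain ExecutorService stands in for the durable-execution pool. Only the extraArgs key names are taken from this PR.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

final class StashThenRecordSketch {

    /** Hypothetical stand-in for a chat response that carries extraArgs. */
    static final class Response {
        final String content;
        final Map<String, Object> extraArgs = new HashMap<>();

        Response(String content) {
            this.content = content;
        }
    }

    /** Hypothetical counters; not thread-safe, so only the calling thread may touch them. */
    static final class Counters {
        long promptTokens;
        long completionTokens;
        String recordedOnThread;
    }

    /** Runs on the pool thread: stashes token usage into the response, records nothing. */
    static Response chat(String prompt) {
        Response response = new Response("echo: " + prompt);
        response.extraArgs.put("model_name", "demo-model"); // made-up model name
        response.extraArgs.put("promptTokens", 12L);        // made-up token counts
        response.extraArgs.put("completionTokens", 34L);
        return response;
    }

    /** Runs on the calling thread: waits for the async call, then records. */
    static Counters run(String prompt) {
        Counters counters = new Counters();
        ExecutorService pool = Executors.newSingleThreadExecutor();
        try {
            Callable<Response> task = () -> chat(prompt);
            Future<Response> future = pool.submit(task);
            Response response = future.get();
            // Back on the calling thread: read the stashed values and update the counters here.
            counters.promptTokens += ((Number) response.extraArgs.get("promptTokens")).longValue();
            counters.completionTokens +=
                    ((Number) response.extraArgs.get("completionTokens")).longValue();
            counters.recordedOnThread = Thread.currentThread().getName();
        } catch (Exception e) {
            throw new IllegalStateException("async chat call failed", e);
        } finally {
            pool.shutdown();
        }
        return counters;
    }

    public static void main(String[] args) {
        Counters counters = run("hello");
        System.out.println(counters.promptTokens + " / " + counters.completionTokens
                + " recorded on " + counters.recordedOnThread);
    }
}
```

The point of the design is that the callable only returns data; every metric mutation happens on the thread that owns the metric group.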

Recording is gated identically to Python: non-empty model name and both token counts greater than zero; values are read as Number and converted with longValue() to tolerate Integer/Long across durable recovery.
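
A minimal sketch of this gate, assuming the three values arrive in a Map<String, Object> under the keys named above. The TokenRecorder interface and the boolean return value are illustrative assumptions; the actual signature of the recordChatTokenMetrics(...) helper is not shown in this PR description.

```java
import java.util.HashMap;
import java.util.Map;

final class TokenMetricsGate {

    /** Hypothetical stand-in for the setup's record site; not the real BaseChatModelSetup. */
    interface TokenRecorder {
        void recordTokenMetrics(String modelName, long promptTokens, long completionTokens);
    }

    /**
     * Records only when the model name is a non-empty String and both token
     * counts are Numbers greater than zero. Returns true if it recorded.
     */
    static boolean recordChatTokenMetrics(Map<String, Object> extraArgs, TokenRecorder recorder) {
        Object model = extraArgs.get("model_name");
        Object prompt = extraArgs.get("promptTokens");
        Object completion = extraArgs.get("completionTokens");

        // A missing key yields null, which fails the instanceof checks below.
        if (!(model instanceof String) || ((String) model).isEmpty()) {
            return false;
        }
        if (!(prompt instanceof Number) || !(completion instanceof Number)) {
            return false;
        }

        // Number#longValue() accepts Integer as well as Long.
        long promptTokens = ((Number) prompt).longValue();
        long completionTokens = ((Number) completion).longValue();
        if (promptTokens <= 0 || completionTokens <= 0) {
            return false;
        }

        recorder.recordTokenMetrics((String) model, promptTokens, completionTokens);
        return true;
    }

    public static void main(String[] args) {
        Map<String, Object> extraArgs = new HashMap<>();
        extraArgs.put("model_name", "demo-model");   // made-up model name
        extraArgs.put("promptTokens", 7);            // Integer on purpose
        extraArgs.put("completionTokens", 9L);       // Long
        boolean recorded = recordChatTokenMetrics(extraArgs,
                (model, p, c) -> System.out.println(model + ": " + p + " + " + c));
        System.out.println("recorded = " + recorded);
    }
}
```

Reading the values as Number rather than casting to Long keeps the gate tolerant of whichever boxed type comes back after durable recovery.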

This also fixes the same latent gap as on main: Python-backed chat models invoked from the Java action previously recorded no token metrics (the path bypasses the Java connection's recording); they are now captured once via the setup.

Tests

Relocated the base token-metrics test to the setup (BaseChatModelSetupTokenMetricsTest), mirroring the rename on main.
New ChatModelActionTest covers recordChatTokenMetrics: records when all keys are present and positive; Integer-typed token values still recorded via Number#longValue(); skips when a key is missing or non-numeric; skips when a token is 0 or the model name is empty (Python parity).
./tools/build.sh -j and module-level mvn test on api / plan / 4 covered connections all pass locally.

API

Adds public BaseChatModelSetup.recordTokenMetrics(String, long, long) and removes the previously protected BaseChatModelConnection.recordTokenMetrics(...). No change to public configuration, event, or resource APIs. Emitted metric names and paths are unchanged.

Documentation

doc-needed
doc-not-needed
doc-included

…the async call boundary (apache#712)

Backport of apache#712 to release-0.2 with scope narrowed to the chat-model connections that exist on this branch. The Bedrock, AzureOpenAI and OpenAIResponses connection variants are not present on release-0.2 and are intentionally excluded.

Move token-metric recording from the durable async callable (where it crossed the operator/mailbox thread boundary) to the action thread:

- BaseChatModelSetup gains public recordTokenMetrics(String, long, long).
- BaseChatModelConnection.recordTokenMetrics(...) and the connection.setMetricGroup(...) forwarding in BaseChatModelSetup are removed.
- Each connection's chat() stashes model_name / promptTokens / completionTokens into ChatMessage.extraArgs (Ollama, Anthropic, AzureAI, OpenAI on release-0.2).
- ChatModelAction records via the setup after durableExecute(Async) returns, before structured-output reassignment.
- RunnerContext.getAgentMetricGroup/getActionMetricGroup javadoc notes that the returned group must only be accessed from the operator thread, not inside a durable callable.

Emitted metric paths and counter names are unchanged. Records are gated identically to Python: non-empty model name and both token counts greater than zero; Integer/Long token values are accepted via Number#longValue().

Tests:

- BaseChatModelConnectionTokenMetricsTest renamed and rewritten to BaseChatModelSetupTokenMetricsTest (target moved from connection to setup).
- New ChatModelActionTest covers recordChatTokenMetrics: records when all keys present and positive; Integer-typed values still recorded; skips on missing key, non-numeric value, zero token, or empty model name.

weiqingy

LGTM

github-actions bot added the doc-not-needed, fixVersion/0.2.2 and priority/major labels Jun 1, 2026

weiqingy approved these changes Jun 1, 2026


xintongsong merged commit 6b3e3e7 into apache:release-0.2 Jun 2, 2026
22 checks passed

xintongsong mentioned this pull request Jun 2, 2026

[Bug] Java chat token metrics are recorded inside async call boundary #706

Closed

2 tasks

xintongsong merged 1 commit into apache:release-0.2 from xintongsong:backport-712-release-0.2
