[api][plan][integrations] Record built-in chat token metrics outside the async call boundary by weiqingy · Pull Request #712 · apache/flink-agents

weiqingy · 2026-05-29T06:12:28Z

Linked issue: #706

Purpose of change

The Java built-in chat action recorded token metrics from inside the durable async callable: each chat-model connection called BaseChatModelConnection.recordTokenMetrics(...) within its chat() method, which runs on a durable-execution pool thread. The runtime metric group is meant to be used from the operator/mailbox (action) thread, so touching it from the callable crosses that boundary.

This brings Java to the same execution boundary as Python, mirroring chat_model_action.py:

Each of the 7 Java connections (Ollama, Anthropic, Bedrock, AzureAI, OpenAI Completions/Responses/AzureOpenAI) now stashes model_name / promptTokens / completionTokens into the response ChatMessage.extraArgs instead of recording inside chat().
ChatModelAction records after the durable call returns (before structured-output reassignment, which would drop the keys) via a new BaseChatModelSetup.recordTokenMetrics(...) — the Python-parity record site (chat_model._record_token_metrics). The setup's bound metric group is the action metric group, so the emitted metric path and counter names are unchanged.
The old BaseChatModelConnection.recordTokenMetrics(...) and its now-dead connection.setMetricGroup(...) forwarding are removed.
The RunnerContext metric-group getter Javadoc now documents that the returned group must only be accessed from the operator/mailbox thread, not inside a durable callable.
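
The stash-then-record flow described in the list above can be sketched roughly as follows. This is a minimal illustration, not the flink-agents code: `Response`, `Metrics`, `chat`, and `callAndRecord` are hypothetical stand-ins, a plain `ExecutorService` stands in for the durable-execution pool, and the counter names are invented. Only the three extraArgs keys (model_name, promptTokens, completionTokens) come from the description.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

class StashThenRecordSketch {
    /** Hypothetical stand-in for a chat response that carries extra arguments. */
    static final class Response {
        final String text;
        final Map<String, Object> extraArgs = new HashMap<>();

        Response(String text) {
            this.text = text;
        }
    }

    /** Hypothetical stand-in for a metric group; remembers which thread touched it last. */
    static final class Metrics {
        final Map<String, Long> counters = new HashMap<>();
        String lastThread;

        void add(String name, long delta) {
            lastThread = Thread.currentThread().getName();
            counters.merge(name, delta, Long::sum);
        }
    }

    /** Runs on the pool thread: stashes usage into the response and never touches metrics. */
    static Response chat(String prompt) {
        Response r = new Response("echo: " + prompt);
        r.extraArgs.put("model_name", "demo-model"); // made-up values for the demo
        r.extraArgs.put("promptTokens", 12L);
        r.extraArgs.put("completionTokens", 34L);
        return r;
    }

    /** Runs on the calling thread: waits for the call to return, then records. */
    static Response callAndRecord(ExecutorService pool, Metrics metrics, String prompt) {
        Future<Response> future = pool.submit(() -> chat(prompt));
        Response r;
        try {
            r = future.get();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException("interrupted while waiting for chat", e);
        } catch (ExecutionException e) {
            throw new IllegalStateException("chat call failed", e.getCause());
        }
        String model = (String) r.extraArgs.get("model_name");
        metrics.add(model + ".promptTokens",
                ((Number) r.extraArgs.get("promptTokens")).longValue());
        metrics.add(model + ".completionTokens",
                ((Number) r.extraArgs.get("completionTokens")).longValue());
        return r;
    }

    public static void main(String[] args) {
        ExecutorService pool = Executors.newSingleThreadExecutor();
        try {
            Metrics metrics = new Metrics();
            callAndRecord(pool, metrics, "hi");
            System.out.println(metrics.counters + " recorded on " + metrics.lastThread);
        } finally {
            pool.shutdown();
        }
    }
}
```

The point of the shape is that the callable only returns data, so the metric object is touched solely by the thread that submitted the call.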

Recording matches Python's guard exactly: the model name must be non-empty and both token counts greater than zero; values are read as Number and converted with longValue() to tolerate Integer/Long across the Pemja bridge and durable recovery.
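
A minimal sketch of that guard, under stated assumptions: `recordIfValid` and the `Recorder` interface are hypothetical names used only for illustration (the PR's actual record site is BaseChatModelSetup.recordTokenMetrics(String, long, long)), and treating a non-String model name as "skip" is an assumption, not something the description states.

```java
import java.util.HashMap;
import java.util.Map;

class TokenMetricsGuardSketch {
    /** Hypothetical sink for validated values. */
    interface Recorder {
        void record(String modelName, long promptTokens, long completionTokens);
    }

    /** Returns true if the values passed the guard and were handed to the recorder. */
    static boolean recordIfValid(Map<String, Object> extraArgs, Recorder recorder) {
        Object model = extraArgs.get("model_name");
        Object prompt = extraArgs.get("promptTokens");
        Object completion = extraArgs.get("completionTokens");

        // The model name must be present and non-empty (String type is an assumption).
        if (!(model instanceof String) || ((String) model).isEmpty()) {
            return false;
        }
        // Missing or non-numeric token values are skipped rather than failing.
        if (!(prompt instanceof Number) || !(completion instanceof Number)) {
            return false;
        }
        // Number.longValue() accepts Integer and Long alike.
        long p = ((Number) prompt).longValue();
        long c = ((Number) completion).longValue();
        if (p <= 0 || c <= 0) {
            return false;
        }
        recorder.record((String) model, p, c);
        return true;
    }

    public static void main(String[] args) {
        Map<String, Object> extraArgs = new HashMap<>();
        extraArgs.put("model_name", "demo-model");
        extraArgs.put("promptTokens", 12);       // Integer
        extraArgs.put("completionTokens", 34L);  // Long
        boolean recorded = recordIfValid(extraArgs,
                (name, p, c) -> System.out.println(name + " " + p + " " + c));
        System.out.println("recorded=" + recorded);
    }
}
```

Reading through `Number` rather than casting to `Long` avoids a `ClassCastException` when a value arrives as an `Integer`, which is the case the description calls out.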

This also fixes a latent gap: Python-backed chat models invoked from the Java action previously recorded no token metrics (the path bypasses the Java connection's recording); they are now captured once.

Tests

Relocated the base token-metrics test to the setup (BaseChatModelSetupTokenMetricsTest), mirroring Python's test_token_metrics.py: records to the per-model sub-group counters, no-ops without a metric group, separate sub-groups per model, getResourceType() == CHAT_MODEL.
Extended ChatModelActionTest with cases for the new recordChatTokenMetrics helper: records once when all keys are present and positive; Integer-typed token values still recorded via Number.longValue(); skips when a key is missing or non-numeric; skips when a token is 0 or the model name is empty (Python parity).
./tools/build.sh -j, ./tools/ut.sh -j, and ./tools/lint.sh -c all pass.
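
The behaviours those setup tests check (per-model sub-group counters, a no-op when no metric group is bound, separate sub-groups per model) can be illustrated with a toy model. Everything here is hypothetical: `Group` and `Setup` are simplified stand-ins rather than the Flink or flink-agents metric types, and the counter names are placeholders, since the description does not list the real ones.

```java
import java.util.HashMap;
import java.util.Map;

class PerModelCountersSketch {
    /** Toy metric group: named sub-groups plus named long counters. */
    static final class Group {
        final Map<String, Group> subGroups = new HashMap<>();
        final Map<String, Long> counters = new HashMap<>();

        Group subGroup(String name) {
            return subGroups.computeIfAbsent(name, k -> new Group());
        }

        void inc(String counter, long n) {
            counters.merge(counter, n, Long::sum);
        }
    }

    /** Toy setup: records into a per-model sub-group of whatever group is bound. */
    static final class Setup {
        private Group metricGroup; // stays null until a group is bound

        void setMetricGroup(Group group) {
            this.metricGroup = group;
        }

        void recordTokenMetrics(String modelName, long promptTokens, long completionTokens) {
            if (metricGroup == null) {
                return; // no metric group bound: silently do nothing
            }
            Group perModel = metricGroup.subGroup(modelName);
            perModel.inc("promptTokens", promptTokens);         // placeholder counter name
            perModel.inc("completionTokens", completionTokens); // placeholder counter name
        }
    }

    public static void main(String[] args) {
        Setup setup = new Setup();
        setup.recordTokenMetrics("model-a", 1, 2); // no-op, nothing bound yet

        Group root = new Group();
        setup.setMetricGroup(root);
        setup.recordTokenMetrics("model-a", 10, 20);
        setup.recordTokenMetrics("model-b", 5, 7);
        System.out.println(root.subGroups.keySet());
    }
}
```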

API

Adds public BaseChatModelSetup.recordTokenMetrics(String, long, long) (the Python-parity record site) and removes the previously protected BaseChatModelConnection.recordTokenMetrics(...). No change to public configuration, event, or resource APIs. Emitted metric names and paths are unchanged.

Documentation

doc-needed
doc-not-needed
doc-included

…the async call boundary

The Java built-in chat action recorded token metrics from inside the durableExecuteAsync callable: each chat-model connection called BaseChatModelConnection.recordTokenMetrics(...) within chat(), which runs on a durable-execution pool thread. The runtime metric group is meant to be used from the operator/mailbox (action) thread.

Mirror the Python path: connections now stash model_name/promptTokens/completionTokens into the response ChatMessage.extraArgs, and ChatModelAction records after the durable call returns via a new BaseChatModelSetup.recordTokenMetrics(...). The setup's bound metric group is the action metric group, so the emitted metric path and counter names are unchanged.

This also captures token metrics for Python-backed models invoked from the Java action, which previously recorded none.

Recording matches Python's guard: model name non-empty and both token counts greater than zero; values are read as Number and converted with longValue() to tolerate Integer/Long across the Pemja bridge and durable recovery.

Clarify that the metric groups returned by getAgentMetricGroup() and getActionMetricGroup() must only be accessed from the operator/mailbox (action) thread, not from inside a durableExecute / durableExecuteAsync callable, which runs on a separate thread pool.
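
To make the thread-confinement rule concrete, here is a small sketch of what a violation looks like. It is purely illustrative and assumes nothing about the runtime: `ConfinedCounter` is a hypothetical class that remembers its creating thread and rejects access from any other, and a plain `ExecutorService` stands in for the pool that runs durable callables. The Javadoc change only documents the rule; nothing in the description says the runtime enforces it this way.

```java
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;

class ThreadConfinedCounterSketch {
    /** Hypothetical counter that may only be touched by the thread that created it. */
    static final class ConfinedCounter {
        private final Thread owner = Thread.currentThread();
        private long value;

        void inc(long n) {
            if (Thread.currentThread() != owner) {
                throw new IllegalStateException("counter accessed off its owner thread");
            }
            value += n;
        }

        long get() {
            return value;
        }
    }

    /** Tries to increment from a pool thread and reports whether that was rejected. */
    static boolean rejectedOnPoolThread(ConfinedCounter counter) {
        ExecutorService pool = Executors.newSingleThreadExecutor();
        try {
            pool.submit(() -> counter.inc(1)).get();
            return false;
        } catch (ExecutionException e) {
            return e.getCause() instanceof IllegalStateException;
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            return false;
        } finally {
            pool.shutdown();
        }
    }

    public static void main(String[] args) {
        ConfinedCounter counter = new ConfinedCounter();
        counter.inc(5); // allowed: same thread that created the counter
        System.out.println("rejected off-thread: " + rejectedOnPoolThread(counter));
        System.out.println("value: " + counter.get());
    }
}
```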

xintongsong

LGTM

joeyutong · 2026-06-01T09:05:05Z

@xintongsong Should this PR be backported to the release-0.2 branch?

xintongsong · 2026-06-01T10:04:08Z

Exactly. That's what I was wondering by asking for the affectVersion label in #706.

weiqingy added 2 commits May 28, 2026 22:47

github-actions bot added the following labels May 29, 2026: doc-not-needed (Your PR changes do not impact docs), fixVersion/0.3.0 (The feature or bug should be implemented/fixed in the 0.3.0 version.), priority/major (Default priority of the PR or issue.)

weiqingy mentioned this pull request May 29, 2026

[Bug] Java chat token metrics are recorded inside async call boundary #706

Open

2 tasks

xintongsong approved these changes May 31, 2026

xintongsong merged commit e98af52 into apache:main May 31, 2026
74 of 76 checks passed

xintongsong mentioned this pull request Jun 1, 2026

[release-0.2][backport][api][plan][integrations] Record built-in chat token metrics outside the async call boundary (#712) #725

Open

3 tasks

xintongsong merged 2 commits into apache:main from weiqingy:706-impl