[api][plan][runtime] Separate prompt arguments from message extra_args in BaseChatModelSetup.chat() by weiqingy · Pull Request #698 · apache/flink-agents

weiqingy · 2026-05-22T03:21:30Z

Linked issue: #220

Purpose of change

BaseChatModelSetup.chat() (Java + Python) previously filled prompt templates by flattening every input message's extra_args into a single map. This conflated chat-message metadata with prompt-template variables and forced callers to stuff template values into a generic metadata bag.

This PR adds an explicit arguments parameter on chat() and carries the same field on ChatRequestEvent. ChatModelAction extracts arguments from the event and forwards them to the setup on both round 1 and tool-response continuations, so multi-turn flows keep re-filling the template correctly.

ChatMessage.extra_args is unchanged and still carries externalId, STRUCTURED_OUTPUT, OpenAI refusal, Ollama reasoning, and other provider-specific metadata used by chat-model connections.
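
The difference between the two behaviours can be sketched in plain Python. This is only an illustration with stand-in types, not the flink-agents implementation: Message, fill_old, and fill_new are hypothetical names, and str.format_map is assumed here as a substitute for the real Prompt formatting.

```python
from dataclasses import dataclass, field
from typing import Any, Dict, List, Mapping, Optional


@dataclass
class Message:
    """Hypothetical stand-in for ChatMessage (not the real class)."""

    content: str
    extra_args: Dict[str, Any] = field(default_factory=dict)


def fill_old(template: str, messages: List[Message]) -> str:
    """Old behaviour (sketch): flatten every message's extra_args into one map."""
    merged: Dict[str, Any] = {}
    for msg in messages:
        # Provider metadata and template values end up in the same bag.
        merged.update(msg.extra_args)
    return template.format_map(merged)


def fill_new(template: str, arguments: Optional[Mapping[str, Any]] = None) -> str:
    """New behaviour (sketch): only the explicit arguments feed the template."""
    return template.format_map(dict(arguments or {}))


template = "Summarize the review for {product}."

# Old style: the template value rides along with metadata such as externalId.
msgs = [Message("great phone", extra_args={"externalId": "42", "product": "phone"})]
old_prompt = fill_old(template, msgs)

# New style: the template value is passed on its own.
new_prompt = fill_new(template, {"product": "phone"})
```

Both calls yield the same prompt text; what changes is where the value comes from, which is why extra_args can go back to carrying metadata only.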

Tests

New positive test (Java + Python): chat(messages, arguments, parameters) formats the Prompt template using values from arguments.
New negative test (Java + Python): a ChatMessage with extra_args set no longer feeds the prompt template — proves the cutover.
New multi-turn test (Java + Python): two consecutive chat() invocations with the same arguments re-fill the template each time.
New action-layer regression test (Java + Python): processChatRequestOrToolResponse with a ToolResponseEvent extracts the persisted arguments from the saved tool-request context and forwards them to chat_model.chat(...) on round 2, which locks the multi-turn contract.
Bridge test updated: PythonChatModelSetupTest.testChat asserts "arguments" flows into Pemja kwargs marshalled to Python.
Existing migrated tests in test_built_in_actions.py, built_in_action_async_execution_test.py, and e2e_tests_mcp/mcp_test.py preserve their outcome assertions post-cutover — proves behavior parity.
Full Java + Python test sweeps green (./tools/ut.sh -j; uv run pytest: 509 passed / 13 skipped).
./tools/lint.sh -c clean.
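
The shape of the negative and multi-turn tests might look roughly like the sketch below. It is not copied from the PR's test files: FakeChatModelSetup and _KeepMissing are hypothetical helpers, and leaving unknown placeholders untouched is an assumption made so the negative case has something observable to assert on.

```python
from typing import Any, Dict, List, Mapping, Optional


class _KeepMissing(dict):
    """Assumption for this sketch: unknown placeholders are left as-is."""

    def __missing__(self, key: str) -> str:
        return "{" + key + "}"


class FakeChatModelSetup:
    """Hypothetical stand-in that records each prompt it would send."""

    def __init__(self, template: str) -> None:
        self.template = template
        self.sent: List[str] = []

    def chat(
        self,
        messages: List[Dict[str, Any]],
        arguments: Optional[Mapping[str, Any]] = None,
        **kwargs: Any,
    ) -> str:
        # Only `arguments` is consulted; message extra_args are ignored.
        prompt = self.template.format_map(_KeepMissing(arguments or {}))
        self.sent.append(prompt)
        return prompt


def test_extra_args_do_not_fill_template() -> None:
    setup = FakeChatModelSetup("Hello {name}")
    messages = [{"content": "hi", "extra_args": {"name": "Ada"}}]
    # The placeholder stays unfilled because extra_args is no longer read.
    assert setup.chat(messages) == "Hello {name}"


def test_arguments_refill_each_turn() -> None:
    setup = FakeChatModelSetup("Hello {name}")
    for _ in range(2):
        setup.chat([], arguments={"name": "Ada"})
    # Both turns see the filled template.
    assert setup.sent == ["Hello Ada", "Hello Ada"]
```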

API

Yes — this is a breaking API change:

Java BaseChatModelSetup: the existing 2-arg chat(List<ChatMessage>, Map<String, Object> parameters) overload is removed (it would erase to the same signature as a hypothetical chat(messages, arguments)). The new primary form is chat(List<ChatMessage>, Map<String, Object> arguments, Map<String, Object> parameters). The 1-arg chat(List<ChatMessage>) convenience overload is retained.
Python BaseChatModelSetup.chat: arguments: Mapping[str, Any] | None = None is added between messages and **kwargs.
Java ChatRequestEvent: new 4-arg constructor (String model, List<ChatMessage> messages, @Nullable Map<String, Object> arguments, @Nullable Object outputSchema). Existing 2-arg and 3-arg legacy constructors continue to work (delegating with empty arguments).
Python ChatRequestEvent.__init__: arguments: Dict[str, Any] | None = None added between messages and output_schema.

Migration: callers that previously set template variables via ChatMessage.extra_args should move them to ChatRequestEvent.arguments. All in-repo callers (3 Java examples, 3 Python examples, 2 e2e tests, 1 runtime test) are migrated in this PR.
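
A before/after sketch of the migration, in Python. The two dataclasses below are hypothetical stand-ins that mirror only the fields named in this description (with arguments placed between messages and output_schema); they are not the real flink-agents classes, and the model name "review_model" is made up for illustration.

```python
from dataclasses import dataclass, field
from typing import Any, Dict, List, Optional


@dataclass
class ChatMessage:
    """Hypothetical stand-in mirroring only the fields used here."""

    content: str
    extra_args: Dict[str, Any] = field(default_factory=dict)


@dataclass
class ChatRequestEvent:
    """Hypothetical stand-in; field order follows the PR description."""

    model: str
    messages: List[ChatMessage]
    arguments: Optional[Dict[str, Any]] = None
    output_schema: Optional[Any] = None


# Before: the template variable travelled in the message's metadata bag.
before = ChatRequestEvent(
    model="review_model",
    messages=[ChatMessage("great phone", extra_args={"product": "phone"})],
)

# After: the template variable moves to the event's arguments field,
# and extra_args keeps provider or caller metadata only.
after = ChatRequestEvent(
    model="review_model",
    messages=[ChatMessage("great phone", extra_args={"externalId": "42"})],
    arguments={"product": "phone"},
)
```

Existing call sites that never relied on extra_args for template values keep the same shape, since arguments defaults to None in this sketch.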

Documentation

doc-needed
doc-not-needed
doc-included

…s in BaseChatModelSetup.chat()

`BaseChatModelSetup.chat()` previously filled prompt templates by flattening every input message's `extra_args` into a single map. This conflated chat metadata with template variables.

Introduce an explicit `arguments` parameter on `chat()` (Java + Python) and carry the same field on `ChatRequestEvent`, then thread it through `ChatModelAction` to the setup on both round 1 and tool-response continuations so multi-turn flows keep re-filling the template correctly.

`ChatMessage.extra_args` is unchanged and still carries `externalId`, `STRUCTURED_OUTPUT`, OpenAI `refusal`, Ollama `reasoning`, and other provider-specific metadata used by chat-model connections.

Closes apache#220

weiqingy · 2026-05-22T06:05:05Z

Looks like the 1 failing check is unrelated to this PR:

it-python [ubuntu-latest] [java-17] [python-3.12] [flink-2.1]: LLM-output nondeterminism in real-Ollama e2e test — test_react_agent_on_local_runner fails at _generate_structured_output (chat_model_action.py:236) because the local qwen3:1.7b returned a stray tool-call JSON instead of {"result": N}; all 3 retries produced different malformed responses. The same test passes on 3 sibling matrix slots in the same CI run (Python 3.11 + Flink 1.20, Python 3.11 + Flink 2.0, Python 3.12 + Flink 2.2), and this PR doesn't modify _generate_structured_output or the schema-validation path. Evidence: failing job log.

github-actions bot added the doc-not-needed, fixVersion/0.3.0, and priority/major labels on May 22, 2026

weiqingy mentioned this pull request on May 22, 2026, in:

[Feature] Separate messages and arguments for prompt of chat method in BaseChatModelSetup #220 (Open, 2 tasks)

[ci] Re-trigger CI (flaky it-python on python-3.12+flink-2.1)

9d7e438

weiqingy wants to merge 2 commits into apache:main from weiqingy:220-impl
