[Feature][Integration][Java] Add built-in support for Azure OpenAI Chat Model#695
[Feature][Integration][Java] Add built-in support for Azure OpenAI Chat Model#695alnzng wants to merge 8 commits into
Conversation
Adds AzureOpenAIChatModelConnection and AzureOpenAIChatModelSetup built on the openai-java SDK's Azure client, registers the new resource names, and wires the provider into the e2e ChatModel integration test. Extracts ChatMessage <-> OpenAI Chat Completions message conversion from OpenAICompletionsConnection into a shared OpenAIChatCompletionsUtils helper so both the OpenAI and Azure OpenAI connections reuse the same conversion logic.
|
@wenjin272 @xintongsong @weiqingy please help take a look this PR, thanks. |
wenjin272
left a comment
There was a problem hiding this comment.
Hi, @alnzng, Ty for contributing this integration. I left server comments.
In addition, I believe we need to update the documentation by removing references to Python-only implementations and adding Java parameters and examples.
| throw new IllegalArgumentException("azure_endpoint should not be null or empty."); | ||
| } | ||
|
|
||
| Integer timeoutSeconds = descriptor.getArgument("timeout"); |
There was a problem hiding this comment.
The OpenAiCompletionsConnection validates the boundaries for timeout and maxRetries, so we can align with that here.
| params.put("max_tokens", maxTokens); | ||
| } | ||
| if (additionalKwargs != null && !additionalKwargs.isEmpty()) { | ||
| params.putAll(additionalKwargs); |
There was a problem hiding this comment.
putAll(additionalKwargs) runs after the first-class fields, so a colliding key (temperature, model, max_tokens, ...) silently overwrites the validated value. For example, additional_kwargs {"temperature": 5.0} bypasses the constructor's 0.0–2.0 range check.
It appears that the Python side has the same issue.
There was a problem hiding this comment.
Updated and use a consistent logic from OpenAI / Anthropic model to handle additionalKwargs .
| if (modelOfAzureDeployment != null | ||
| && !modelOfAzureDeployment.isBlank() | ||
| && completion.usage().isPresent()) { | ||
| recordTokenMetrics( |
There was a problem hiding this comment.
If the user has not configured modelOfAzureDeployment, we will not record token metrics at all, which may result in a poor user experience. I am wondering whether we can use the model as a fallback.
The same issue exists on the Python side as well.
There was a problem hiding this comment.
Azure OpenAI APIs have different setup on the model, the model means a Azure deployment name, not a real LLM name. We have discussed this in the past: #478 (comment)
Fallback to model will confuse the user, I would prefer to keep current behavior. One thing we can improve is that we can call out the in the doc that modelOfAzureDeployment is required if the users want to see token metrics.
WDYT?
There was a problem hiding this comment.
Sorry, I forgot our previous discussion. I think it’s reasonable to clarify this matter in the documentation.
|
@wenjin272 PR is updated based on your comments. |
| (Map<String, Object>) mutableArgs.remove("additional_kwargs"); | ||
| if (additionalKwargs != null) { | ||
| for (Map.Entry<String, Object> entry : additionalKwargs.entrySet()) { | ||
| builder.putAdditionalBodyProperty( |
There was a problem hiding this comment.
The Setup-side fix (nesting additional_kwargs under its own key) closes the obvious case, but here in the connection we then forward those kwargs via builder.putAdditionalBodyProperty(...). The openai-java SDK serializes typed fields and additional body properties into the same JSON request body, so a caller can still pass e.g. additional_kwargs={"temperature": 5.0} alongside the typed temperature=0.3 and end up with both keys in the body (or one silently winning depending on SDK serialization order), bypassing the [0.0, 2.0] range check.
Should we detect collisions against the typed fields we already pop here (model, temperature, max_tokens, logprobs) and throw an IllegalArgumentException? That would keep the validation contract intact regardless of how the SDK merges the body.
The Python side has the same shape with **kwargs, **additional_kwargs (last one wins via dict expansion order) — should the fix apply on both sides?
| * {@code message.refusal()} is written as {@code extraArgs["refusal"]} when present, preserving | ||
| * prior Java behavior. | ||
| */ | ||
| public static ChatMessage convertFromOpenAIMessage( |
There was a problem hiding this comment.
The extraArgs parameter is dead today — both call sites (this one and AzureOpenAIChatModelConnection.chat) pass Map.of(). Also, the body copies caller args first (response.getExtraArgs().putAll(extraArgs)) and then writes refusal after, so a caller-supplied "refusal" key would be silently overwritten by message.refusal(). The precedence isn't obvious from the signature.
Should we drop the parameter until there's a real caller that needs it? That would keep the API surface minimal and avoid locking in surprising precedence. Or, if we want to keep it as a forward-looking hook, should we document the override behavior in the Javadoc?
Linked issue: #694
Purpose of change
Add Azure OpenAI as a first-class chat model provider for Java agents.
Tests
AzureOpenAIChatModelConnectionTestandAzureOpenAIChatModelSetupTest.API
AzureOpenAIChatModelConnection,AzureOpenAIChatModelSetup) and two new constants on ResourceName.ChatModel`.Documentation
doc-neededdoc-not-neededdoc-included