Python: Fix structured output parsing when text contents are not coalesced by t-anjan · Pull Request #4897 · microsoft/agent-framework

t-anjan · 2026-03-25T07:13:44Z

Summary

ChatResponse.value and AgentResponse.value use self.text to feed into model_validate_json() for structured output parsing. self.text delegates to Message.text, which joins multiple TextContent objects with " ".join().

This is correct for natural language responses, but it corrupts JSON when the text contents are not fully coalesced into a single content: spaces are injected into JSON keys and values, causing Pydantic validation failures.

The fix is scoped to the value property, which only executes when response_format is set (i.e., structured output). Message.text and its " ".join behavior are unchanged, so natural language responses are unaffected.

Real-world failures

Observed in production with the OpenAI-compatible chat client (OpenRouter → Gemini) using response_format for structured output. Failures were intermittent: retrying the same request succeeded.

Failure 1 — Space in JSON key:
LLM returned valid JSON with "action": "request_evidence", but after Message.text processing, Pydantic received "action ": "request_evidence" (trailing space in key). Error:
Field required [type=missing, input_value={'action ': 'request_evid...}]

Failure 2 — Space in JSON value:
LLM returned "readiness": "not_started", but Pydantic received "readiness": "not_started " (trailing space in value). Error:
Input should be 'not_started', ... [input_value='not_started ']

In both cases, the raw LLM response was valid JSON — the corruption was introduced by " ".join() in Message.text.
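Both failures can be reproduced with nothing but string joining. The sketch below is illustrative only: the chunk boundaries are hypothetical (the PR does not record where the real responses were split), and the standard-library json module stands in for Pydantic.

```python
import json

# Hypothetical chunk boundaries; the real split points were not recorded.
key_chunks = ['{"action', '": "request_evidence"}']
value_chunks = ['{"readiness": "not_started', '"}']


def join_with_spaces(chunks):
    """Mimics the " ".join() behavior described for Message.text."""
    return " ".join(chunks)


def join_directly(chunks):
    """Concatenates chunks with no separator."""
    return "".join(chunks)


# Space-joined text is still valid JSON, but the key or value has changed.
corrupted_key = json.loads(join_with_spaces(key_chunks))
corrupted_value = json.loads(join_with_spaces(value_chunks))

# Direct concatenation recovers the original payload.
clean_key = json.loads(join_directly(key_chunks))
clean_value = json.loads(join_directly(value_chunks))

print(corrupted_key)    # {'action ': 'request_evidence'}
print(corrupted_value)  # {'readiness': 'not_started '}
print(clean_key)        # {'action': 'request_evidence'}
print(clean_value)      # {'readiness': 'not_started'}
```

Because the corrupted text still parses as JSON, the error only surfaces at schema validation, which matches the Pydantic messages quoted above.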

Changes

ChatResponse.value: when parsing structured output via response_format, concatenate text contents directly instead of using self.text (which delegates to Message.text's " ".join)
AgentResponse.value: same fix
Message.text is not modified — natural language joining behavior is preserved
Added tests for both classes demonstrating the bug and fix
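The shape of the change can be sketched roughly as follows. This is not the framework's code: TextChunk, SketchMessage, SketchResponse, and the structured flag are hypothetical stand-ins, json.loads stands in for model_validate_json(), and returning None when no response format is set is an assumption made for the sketch.

```python
import json
from dataclasses import dataclass, field
from typing import Any, List, Optional


@dataclass
class TextChunk:
    """Hypothetical stand-in for a text content item."""
    text: str


@dataclass
class SketchMessage:
    """Hypothetical stand-in for Message."""
    contents: List[TextChunk] = field(default_factory=list)

    @property
    def text(self) -> str:
        # Natural-language join, left unchanged by the fix.
        return " ".join(c.text for c in self.contents)


@dataclass
class SketchResponse:
    """Hypothetical stand-in for ChatResponse / AgentResponse."""
    message: SketchMessage
    structured: bool = False  # stands in for "response_format is set"

    @property
    def text(self) -> str:
        return self.message.text

    @property
    def value(self) -> Optional[Any]:
        if not self.structured:
            return None  # assumption for this sketch
        # Concatenate directly so no separator lands inside the JSON.
        raw = "".join(c.text for c in self.message.contents)
        return json.loads(raw)  # the real code validates against a model


chunks = [TextChunk('{"action'), TextChunk('": "request_evidence"}')]
response = SketchResponse(SketchMessage(chunks), structured=True)
print(repr(response.text))  # '{"action ": "request_evidence"}'
print(response.value)       # {'action': 'request_evidence'}
```

Keeping the direct concatenation inside value leaves text untouched for callers that display natural language output.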

Test plan

New tests: test_chat_response_value_multi_chunk_json and test_agent_response_value_multi_chunk_json
Each test creates a JSON response split across 3 TextContent objects (simulating uncoalesced streaming chunks), then asserts:
1. message.text contains injected spaces — confirming " ".join behavior exists: '{"resp onse": "He llo"}'
2. .value still parses correctly despite that — confirming the fix works
Without the fix, response.value raises ValidationError: Field required [type=missing, input_value={'resp onse': 'He llo'}], the same class of error observed in production
Existing test_types.py tests continue to pass
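A self-contained approximation of the two assertions is shown below. It does not use the framework's types: Greeting and its from_json method are hypothetical stand-ins for a Pydantic model and model_validate_json(), and the exact chunk boundaries are inferred from the corrupted string quoted in the test plan.

```python
import json
from dataclasses import dataclass


@dataclass
class Greeting:
    """Hypothetical output model; stands in for a Pydantic model."""
    response: str

    @classmethod
    def from_json(cls, raw: str) -> "Greeting":
        data = json.loads(raw)
        if set(data) != {"response"}:
            # Rough analogue of Pydantic's "Field required" error.
            raise ValueError(f"Field required: input_value={data!r}")
        return cls(response=data["response"])


# Three chunks simulating uncoalesced streaming text contents.
CHUNKS = ['{"resp', 'onse": "He', 'llo"}']


def test_space_join_corrupts_json():
    joined = " ".join(CHUNKS)
    assert joined == '{"resp onse": "He llo"}'
    try:
        Greeting.from_json(joined)
    except ValueError:
        pass
    else:
        raise AssertionError("expected validation to fail")


def test_direct_concatenation_parses():
    assert Greeting.from_json("".join(CHUNKS)) == Greeting(response="Hello")


test_space_join_corrupts_json()
test_direct_concatenation_parses()
```

The first function pins down the bug (the space-joined text fails validation) and the second pins down the fix, mirroring the two numbered assertions in the test plan.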

…esced

When `Message.text` joins multiple `TextContent` objects, it uses `" ".join()` which is correct for natural language but corrupts JSON when used for structured output parsing. The `value` property on both `ChatResponse` and `AgentResponse` feeds `self.text` directly into `model_validate_json()`, causing Pydantic validation failures when text chunks happen to not be fully coalesced into a single content.

This fix makes the `value` property concatenate text contents directly (without spaces) instead of going through `Message.text`, preserving the integrity of structured JSON output.

## Real-world impact

This bug was observed in production with the OpenAI-compatible chat client (OpenRouter → Gemini) where streaming responses intermittently produced multiple text Content objects that survived coalescing. Two distinct failure modes were observed:

**Failure 1: space injected into JSON key.** The LLM returned valid JSON with `"action": "request_evidence"`, but `Message.text` produced `"action ": "request_evidence"` (trailing space in key). Pydantic rejected this with: `Field required [type=missing, input_value={'action ': ...}]`

**Failure 2: space injected into JSON value.** The LLM returned `"readiness": "not_started"`, but `Message.text` produced `"readiness": "not_started "` (trailing space in value). Pydantic rejected this with: `Input should be 'not_started', ... [input_value='not_started ']`

Both failures were intermittent (retrying the same request succeeded) and the raw LLM response was valid JSON; the corruption was introduced by the `" ".join()` in `Message.text`.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

t-anjan wants to merge 1 commit into microsoft:main from t-anjan:fix/structured-output-space-join
