
Fix/structured output bugs#49

Merged
kylexqian merged 9 commits into main from fix/structured-output-bugs
Apr 11, 2026
Conversation

@kylexqian
Collaborator

@kylexqian kylexqian commented Apr 3, 2026

Fix some of the bugs found from testing SDK changes w/ all providers

Mainly:

  1. OpenAI enforces the word "JSON" to appear somewhere in the messages when using json_object (why? I have no idea)
  2. Anthropic json_schema messages don't return a usage dict -- which is needed for calculating token usage

…usage metadata

Two bugs causing 500s for structured output requests:

1. OpenAI json_object 500 (400 from OpenAI upstream): OpenAI requires the
   word 'json' to appear somewhere in the messages when using
   response_format.type='json_object'. Inject a system message
   'Respond in JSON format.' when no message already contains the word,
   in both the streaming and non-streaming paths. The injection happens
   after request_bytes is computed so the TEE hash covers the original
   user request.

2. Anthropic non-streaming json_schema 500: _invoke_anthropic_structured
   was constructing AIMessage(content=...) from scratch, discarding the
   usage_metadata from the underlying Anthropic response. The response
   body therefore had no 'usage' dict, causing the x402 cost resolver to
   raise ValueError. Fix by passing include_raw=True to
   with_structured_output, extracting usage_metadata from the raw
   AIMessage, and copying it onto the synthesized AIMessage.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
@kylexqian kylexqian force-pushed the fix/structured-output-bugs branch from 89abdc3 to ac9f267 on April 3, 2026 09:21
kylexqian and others added 6 commits April 3, 2026 02:33
…usage metadata

Covers the two fixes from the previous commit:

1. TestJsonObjectKeywordInjection — verifies that a SystemMessage containing
   'json' is prepended to langchain_messages when response_format is
   json_object and no existing message already contains the word (both
   non-streaming and streaming paths). Also verifies the injection is
   case-insensitive and does not fire for json_schema mode.

2. TestAnthropicUsageMetadataPreservation — verifies that usage_metadata
   from the raw Anthropic AIMessage is copied onto the synthesized return
   value of _invoke_anthropic_structured, and that the resulting
   non-streaming response dict contains a correctly populated 'usage' field
   for the x402 cost calculator.

Also updates three existing test mocks that were returning plain dicts but
now need to match the include_raw=True format: {"raw": AIMessage, "parsed":
dict, "parsing_error": None}.
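The keyword-injection behavior those tests cover can be sketched as follows. This is illustrative, not the controller code: `SystemMessage` is a stand-in dataclass and `inject_json_keyword` is an assumed name.

```python
from dataclasses import dataclass


@dataclass
class SystemMessage:  # stand-in for langchain_core.messages.SystemMessage
    content: str


def inject_json_keyword(messages: list) -> list:
    """Prepend a system message containing 'json' unless one already has the word."""
    # Case-insensitive check across all existing message contents.
    if any("json" in str(getattr(m, "content", "") or "").lower() for m in messages):
        return messages  # nothing to inject
    return [SystemMessage(content="Respond in JSON format.")] + messages


# No message mentions json -> a system message is prepended.
injected = inject_json_keyword([SystemMessage(content="You are helpful.")])
# The check is case-insensitive -> no injection here.
untouched = inject_json_keyword([SystemMessage(content="Reply as JSON.")])
```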

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
…red output

_invoke_anthropic_structured is called synchronously before generate() runs,
so the streaming loop (chunks_iter=[]) never executes and final_usage stays
None. The final SSE chunk therefore has no 'usage' field, causing the x402
middleware to raise ValueError when trying to compute the session cost after
the response is sent — billing silently fails even though the client receives
a valid 200.

Fix: extract usage_metadata from the AIMessage returned by
_invoke_anthropic_structured (anthropic_structured_usage) and seed
final_usage from it at the top of the Anthropic branch inside generate().

Also adds a unit test that asserts the final SSE chunk contains prompt_tokens,
completion_tokens, and total_tokens when Anthropic structured output is used
in streaming mode.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
…t lost

Gemini returns cumulative usageMetadata on every SSE chunk; LangChain's
subtract_usage() converts these to deltas, meaning input_tokens only
appears non-zero in the *first* chunk carrying usage data and is 0 in
all subsequent ones.  The previous code replaced final_usage on every
chunk, so the last chunk's input_tokens=0 silently wiped the correct
prompt token count.

Fix: accumulate numeric delta fields across chunks instead of replacing.
Adds TestGeminiStreamingUsageAccumulation (3 tests) covering the
preservation of prompt tokens, the two-chunk delta pattern, and the
no-usage-chunks case.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Contributor

Copilot AI left a comment


Pull request overview

Fixes structured output edge cases across providers (OpenAI json_object keyword requirement; Anthropic structured output usage; Gemini streaming usage accumulation) and expands test coverage for these scenarios.

Changes:

  • Preserve Anthropic structured-output usage_metadata and ensure it is surfaced in final streaming/non-streaming responses.
  • Inject a minimal system instruction containing “json” when using response_format.type="json_object" and no message already includes the keyword.
  • Accumulate streaming usage deltas across chunks (notably for Gemini) and add/adjust tests accordingly.

Reviewed changes

Copilot reviewed 3 out of 3 changed files in this pull request and generated 3 comments.

File | Description
tee_gateway/controllers/chat_controller.py | Adds OpenAI json_object keyword injection, preserves Anthropic usage metadata, and accumulates streaming usage across chunks.
tests/test_structured_outputs.py | Updates Anthropic structured-output mocks to the include_raw shape; adds tests for usage propagation, keyword injection, and Gemini usage accumulation.
pyproject.toml | Expands Ruff exclude patterns (venvs, site-packages, etc.).


Comment thread tee_gateway/controllers/chat_controller.py Outdated
Comment thread tee_gateway/controllers/chat_controller.py
Comment thread tee_gateway/controllers/chat_controller.py Outdated

langchain_messages = convert_messages(chat_request.messages)

# OpenAI (and compatible providers) require the word "json" to appear
Contributor


i'm not sure we really need this. this class works and does not need to inject anything https://github.com/OpenGradient/memsync/blob/main/memsync/llms/openai.py#L99

Collaborator Author


This is only the case when we use json_object

E.g. we get this error message

openai.BadRequestError: Error code: 400 - {'error': {'message': "'messages' must contain the word 'json' in some form, to use 'response_format' of type 'json_object'.", 'type': 'invalid_request_error', 'param': 'messages', 'code': None}}

mem0ai/mem0#4248 -- seems like a known requirement.

usage_metadata is a plain dict, so getattr() always returned the default
0 instead of the actual token counts, causing Anthropic streaming structured
output to report zero usage in the final SSE chunk.
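The failure mode is easy to reproduce: dict keys are not attributes, so `getattr()` always falls through to its default. A minimal repro, with illustrative values:

```python
# usage_metadata is a plain dict, not an object with attributes.
usage_metadata = {"input_tokens": 12, "output_tokens": 7}

# Attribute lookup on a dict misses its keys -> default 0 every time.
wrong = getattr(usage_metadata, "input_tokens", 0)

# Key lookup returns the actual token count.
right = usage_metadata.get("input_tokens", 0)
```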

Also bumps all langchain packages to their latest 1.x releases:
langchain 1.2.15, langchain-core 1.2.26, langchain-openai 1.1.12,
langchain-anthropic 1.4.0, langchain-google-genai 4.2.1, langchain-xai 1.2.2.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
@socket-security

socket-security bot commented Apr 4, 2026

Review the following changes in direct dependencies. Learn more about Socket for GitHub.

Package: anthropic 0.84.0 ⏵ 0.89.0 (updated)
Scores: Supply Chain Security 96 (+1), Vulnerability 100, Quality 100, Maintenance 100, License 100


…ng errors

Two robustness fixes:

1. Replace inline `(getattr(m, "content", "") or "").lower()` with a
   dedicated `_messages_contain_json_word()` helper that handles both
   plain-string and list-of-parts (multimodal) message content.  The old
   one-liner called `.lower()` on a list when a message contained image
   parts, causing an AttributeError and a 500 on any json_object request
   that included multimodal input.

2. Check `parsing_error` in `_invoke_anthropic_structured` after calling
   `with_structured_output(include_raw=True)`.  Previously a schema
   mismatch silently serialised `None` as the string "None" and returned a
   signed 200; now it raises a ValueError that propagates to the outer
   exception handler and returns a 500 with a logged message.

Also consolidates the duplicate `_normalize_response_format` call in the
streaming json_object injection path to reuse `rf` already computed above.
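A hedged sketch of what a helper like `_messages_contain_json_word()` might look like, handling both plain-string and list-of-parts content. The multimodal part shapes (`{"type": "text", "text": ...}`) follow the common LangChain content-part convention and are an assumption here.

```python
def messages_contain_json_word(messages) -> bool:
    """Return True if any message's text content contains 'json' (any case)."""
    for m in messages:
        content = getattr(m, "content", "") or ""
        if isinstance(content, list):
            # Multimodal content: inspect only text parts. Image parts are
            # dicts with no usable string, which is what made the old
            # one-liner's .lower() call blow up with an AttributeError.
            text = " ".join(
                part.get("text", "") for part in content
                if isinstance(part, dict) and part.get("type") == "text"
            )
        else:
            text = content
        if "json" in str(text).lower():
            return True
    return False


from types import SimpleNamespace as Msg  # stand-in message objects

plain = messages_contain_json_word([Msg(content="Respond in JSON format.")])
multimodal = messages_contain_json_word(
    [Msg(content=[{"type": "image_url", "image_url": {"url": "https://x"}}])]
)
```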

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
@kylexqian kylexqian merged commit 3a437a9 into main Apr 11, 2026
8 checks passed
3 participants