Conversation
…usage metadata

Two bugs causing 500s for structured output requests:

1. OpenAI json_object 500 (400 from OpenAI upstream): OpenAI requires the
   word 'json' to appear somewhere in the messages when using
   response_format.type='json_object'. Inject a system message 'Respond in
   JSON format.' when no message already contains the word, in both the
   streaming and non-streaming paths. The injection happens after
   request_bytes is computed, so the TEE hash covers the original user
   request.

2. Anthropic non-streaming json_schema 500: _invoke_anthropic_structured
   was constructing AIMessage(content=...) from scratch, discarding the
   usage_metadata from the underlying Anthropic response. The response body
   therefore had no 'usage' dict, causing the x402 cost resolver to raise
   ValueError. Fix by passing include_raw=True to with_structured_output,
   extracting usage_metadata from the raw AIMessage, and copying it onto
   the synthesized AIMessage.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
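The keyword-injection part of fix 1 can be sketched roughly as below. All names here are hypothetical stand-ins (the real code lives in chat_controller.py and uses LangChain's SystemMessage); this version also only checks plain-string content, which is the behavior at this point in the PR — a later commit hardens it for multimodal messages.

```python
class SystemMessage:
    """Minimal stand-in for LangChain's SystemMessage (illustration only)."""
    def __init__(self, content):
        self.content = content

def inject_json_keyword(messages, response_format):
    """Prepend a system message containing 'json' when json_object mode is
    requested and no existing message already mentions the word
    (case-insensitive). Returns the message list unchanged otherwise."""
    if (response_format or {}).get("type") != "json_object":
        return messages
    if any("json" in (getattr(m, "content", "") or "").lower() for m in messages):
        return messages
    return [SystemMessage("Respond in JSON format.")] + messages
```

Running this after request_bytes is computed (as the commit notes) keeps the TEE hash bound to the client's original request rather than the mutated message list.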
Force-pushed from 89abdc3 to ac9f267.
…usage metadata
Covers the two fixes from the previous commit:
1. TestJsonObjectKeywordInjection — verifies that a SystemMessage containing
'json' is prepended to langchain_messages when response_format is
json_object and no existing message already contains the word (both
non-streaming and streaming paths). Also verifies the injection is
case-insensitive and does not fire for json_schema mode.
2. TestAnthropicUsageMetadataPreservation — verifies that usage_metadata
from the raw Anthropic AIMessage is copied onto the synthesized return
value of _invoke_anthropic_structured, and that the resulting
non-streaming response dict contains a correctly populated 'usage' field
for the x402 cost calculator.
Also updates three existing test mocks that were returning plain dicts but
now need to match the include_raw=True format: {"raw": AIMessage, "parsed":
dict, "parsing_error": None}.
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
…red output

_invoke_anthropic_structured is called synchronously before generate() runs,
so the streaming loop (chunks_iter=[]) never executes and final_usage stays
None. The final SSE chunk therefore has no 'usage' field, causing the x402
middleware to raise ValueError when trying to compute the session cost after
the response is sent -- billing silently fails even though the client
receives a valid 200.

Fix: extract usage_metadata from the AIMessage returned by
_invoke_anthropic_structured (anthropic_structured_usage) and seed
final_usage from it at the top of the Anthropic branch inside generate().

Also adds a unit test that asserts the final SSE chunk contains
prompt_tokens, completion_tokens, and total_tokens when Anthropic structured
output is used in streaming mode.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
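The seeding step described above can be sketched as follows. The function name and the exact key names are assumptions: LangChain's usage_metadata dict uses input_tokens/output_tokens/total_tokens, which here get mapped onto the OpenAI-style prompt_tokens/completion_tokens/total_tokens shape the SSE chunk carries.

```python
def seed_final_usage(anthropic_structured_usage):
    """Seed the streaming loop's final_usage from the usage_metadata of the
    synchronous structured-output call. Needed because chunks_iter is empty
    for Anthropic structured output, so the loop body that normally fills
    final_usage never runs. Hypothetical helper mirroring the fix above."""
    if not anthropic_structured_usage:
        return None
    return {
        "prompt_tokens": anthropic_structured_usage.get("input_tokens", 0),
        "completion_tokens": anthropic_structured_usage.get("output_tokens", 0),
        "total_tokens": anthropic_structured_usage.get("total_tokens", 0),
    }
```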
…t lost

Gemini returns cumulative usageMetadata on every SSE chunk; LangChain's
subtract_usage() converts these to deltas, meaning input_tokens only appears
non-zero in the *first* chunk carrying usage data and is 0 in all subsequent
ones. The previous code replaced final_usage on every chunk, so the last
chunk's input_tokens=0 silently wiped the correct prompt token count.

Fix: accumulate numeric delta fields across chunks instead of replacing.

Adds TestGeminiStreamingUsageAccumulation (3 tests) covering the
preservation of prompt tokens, the two-chunk delta pattern, and the
no-usage-chunks case.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
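A minimal sketch of the accumulate-instead-of-replace fix, with hypothetical names (the real code sits in the streaming loop of chat_controller.py). Numeric fields are summed so a later chunk's input_tokens=0 delta cannot wipe the prompt count; non-numeric fields just take the latest value.

```python
def accumulate_usage(final_usage, chunk_usage):
    """Fold one chunk's usage deltas into the running total.

    Replacing final_usage wholesale loses input_tokens on the Gemini delta
    pattern described above; summing numeric fields preserves it."""
    if chunk_usage is None:
        return final_usage
    if final_usage is None:
        return dict(chunk_usage)
    for key, value in chunk_usage.items():
        if isinstance(value, (int, float)):
            final_usage[key] = final_usage.get(key, 0) + value
        else:
            final_usage[key] = value
    return final_usage
```

The two-chunk delta pattern from the commit message: the first chunk carries the full prompt count, later chunks carry input_tokens=0 and only new output tokens.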
Pull request overview
Fixes structured output edge cases across providers (OpenAI json_object keyword requirement; Anthropic structured output usage; Gemini streaming usage accumulation) and expands test coverage for these scenarios.
Changes:
- Preserve Anthropic structured-output usage_metadata and ensure it is
  surfaced in final streaming/non-streaming responses.
- Inject a minimal system instruction containing "json" when using
  response_format.type="json_object" and no message already includes the
  keyword.
- Accumulate streaming usage deltas across chunks (notably for Gemini) and
  add/adjust tests accordingly.
Reviewed changes
Copilot reviewed 3 out of 3 changed files in this pull request and generated 3 comments.
| File | Description |
|---|---|
| tee_gateway/controllers/chat_controller.py | Adds OpenAI json_object keyword injection, preserves Anthropic usage metadata, and accumulates streaming usage across chunks. |
| tests/test_structured_outputs.py | Updates Anthropic structured-output mocks to include include_raw shape; adds tests for usage propagation, keyword injection, and Gemini usage accumulation. |
| pyproject.toml | Expands Ruff exclude patterns (venvs, site-packages, etc.). |
Review comment on the diff:

    langchain_messages = convert_messages(chat_request.messages)

    # OpenAI (and compatible providers) require the word "json" to appear
I'm not sure we really need this. This class works and does not need to inject anything: https://github.com/OpenGradient/memsync/blob/main/memsync/llms/openai.py#L99
This is only the case when we use json_object.
E.g. we get this error message:
openai.BadRequestError: Error code: 400 - {'error': {'message': "'messages' must contain the word 'json' in some form, to use 'response_format' of type 'json_object'.", 'type': 'invalid_request_error', 'param': 'messages', 'code': None}}
mem0ai/mem0#4248 -- seems like a known requirement.
usage_metadata is a plain dict, so getattr() always returned the default 0
instead of the actual token counts, causing Anthropic streaming structured
output to report zero usage in the final SSE chunk.

Also bumps all langchain packages to their latest 1.x releases: langchain
1.2.15, langchain-core 1.2.26, langchain-openai 1.1.12, langchain-anthropic
1.4.0, langchain-google-genai 4.2.1, langchain-xai 1.2.2.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
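The dict-vs-attribute mismatch behind this fix can be reproduced in isolation: getattr() looks up object attributes, and a plain dict does not expose its keys as attributes, so the default is always returned.

```python
# usage_metadata is a plain dict; getattr() never sees its keys.
usage_metadata = {"input_tokens": 12, "output_tokens": 4, "total_tokens": 16}

broken = getattr(usage_metadata, "input_tokens", 0)  # attribute lookup: always 0
fixed = usage_metadata.get("input_tokens", 0)        # dict lookup: 12
```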
…ng errors

Two robustness fixes:

1. Replace the inline `(getattr(m, "content", "") or "").lower()` check with
   a dedicated `_messages_contain_json_word()` helper that handles both
   plain-string and list-of-parts (multimodal) message content. The old
   one-liner called `.lower()` on a list when a message contained image
   parts, causing an AttributeError and a 500 on any json_object request
   that included multimodal input.

2. Check `parsing_error` in `_invoke_anthropic_structured` after calling
   `with_structured_output(include_raw=True)`. Previously a schema mismatch
   silently serialised `None` as the string "None" and returned a signed
   200; now it raises a ValueError that propagates to the outer exception
   handler and returns a 500 with a logged message.

Also consolidates the duplicate `_normalize_response_format` call in the
streaming json_object injection path to reuse `rf` already computed above.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
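The helper from fix 1 might look roughly like the sketch below. The exact part shapes are assumptions (LangChain multimodal content parts are typically dicts with a "text" or "image_url" entry, or raw strings); the real implementation is `_messages_contain_json_word()` in chat_controller.py.

```python
class Msg:
    """Minimal stand-in for a LangChain message (illustration only)."""
    def __init__(self, content):
        self.content = content

def messages_contain_json_word(messages):
    """True if any message content mentions 'json', case-insensitively.

    Handles plain-string content and list-of-parts multimodal content;
    the old one-liner crashed with AttributeError on the list case."""
    for m in messages:
        content = getattr(m, "content", "") or ""
        if isinstance(content, str):
            if "json" in content.lower():
                return True
        elif isinstance(content, list):
            for part in content:
                text = part.get("text", "") if isinstance(part, dict) else str(part)
                if "json" in text.lower():
                    return True
    return False
```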
Fix some of the bugs found from testing SDK changes w/ all providers.

Mainly:
- the "json" keyword requirement for json_object (why? I have no idea)
- the missing usage dict -- which is needed for calculating token usage