fix(json_adapter): strip OpenAPI x-* vendor extensions from LM schemas #27
isaacbmiller wants to merge 1 commit into
Greptile Summary
This PR fixes a real schema-leakage bug (stanfordnlp#9686) where Pydantic's …

Confidence Score: 4/5
Safe to merge for the two target paths; the pre-existing tool-call schema path gap (already flagged) is not introduced by this PR.

The fix is correct and well-tested. Score is capped at 4 due to the known P1 gap in the tool-call schema path (dspy/adapters/types/tool.py) that shares the same failure mode and remains unaddressed; it is already documented in a prior review comment.

dspy/adapters/types/tool.py: the model_json_schema() / TypeAdapter.json_schema() calls at lines 104–107 do not call strip_vendor_extensions.
Flowchart
%%{init: {'theme': 'neutral'}}%%
flowchart TD
A[Pydantic model with json_schema_extra x-* fields] --> B[model_json_schema / TypeAdapter.json_schema]
B --> C{Which path?}
C -->|response_format path| D["_get_structured_outputs_response_format\n(json_adapter.py)"]
C -->|prompt-text path| E["_get_json_schema → translate_field_type\n(utils.py)"]
D --> F[strip_vendor_extensions schema]
E --> F
F --> G[Recurse into $defs, properties, anyOf/oneOf/allOf, items, etc.]
G --> H[Remove x-* keys at each schema node]
G --> I[Leave examples/default/const/enum values untouched]
H --> J[Clean schema forwarded to LM]
I --> J
J --> K{Provider}
K -->|Strict-schema AWS Bedrock / OpenAI Structured Outputs| L[Accepted — no unknown peer keys]
K -->|Plain JSON mode| M[Accepted — x-* gone from prompt text]
Reviews (2): Last reviewed commit: "fix(json_adapter): strip OpenAPI x-* ven..."
Pydantic merges json_schema_extra entries into the generated schema as peer keys, not under a literal json_schema_extra key. The previous scrub in _get_structured_outputs_response_format was therefore a no-op on Pydantic 2.x and only walked top-level properties (missing nested user models in $defs).

Replace it with a recursive strip_vendor_extensions helper that removes any key starting with 'x-' (the standard OpenAPI/JSON Schema vendor-extension namespace) at every depth, and apply it both to the schema sent as response_format and to the schema embedded in the prompt via _get_json_schema. Standard JSON Schema fields supplied via json_schema_extra (examples, format, pattern, minLength, etc.) are preserved, so they continue to guide the LM.

Fixes the AWS Bedrock Converse 400 reported in stanfordnlp#9686.

Co-authored-by: factory-droid[bot] <138933559+factory-droid[bot]@users.noreply.github.com>
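To make the "no-op" claim concrete, here is a small illustration. It is not the code that was removed: `legacy_style_scrub` is a hypothetical stand-in reconstructed from the description above, `find_vendor_keys` is a helper invented for this example, and the schema is hand-written in the shape described (vendor keys as peers of `type`, nested models under `$defs`) rather than generated by Pydantic.

```python
import copy

# Hand-written schema (an assumption for illustration): the x-* keys sit
# next to "type", and there is no literal "json_schema_extra" key anywhere.
schema = {
    "type": "object",
    "properties": {
        "answer": {"type": "string", "x-comparison": "exact"},
        "detail": {"$ref": "#/$defs/Detail"},
    },
    "$defs": {
        "Detail": {
            "type": "object",
            "properties": {"note": {"type": "string", "x-comparison": "fuzzy"}},
        }
    },
}


def legacy_style_scrub(schema: dict) -> dict:
    """Hypothetical old-style scrub: drop a literal json_schema_extra key
    from top-level properties only, without recursing into $defs."""
    result = copy.deepcopy(schema)
    for prop in result.get("properties", {}).values():
        prop.pop("json_schema_extra", None)
    return result


def find_vendor_keys(node, path="$"):
    """Return the paths of all keys starting with 'x-' anywhere in node."""
    found = []
    if isinstance(node, dict):
        for key, value in node.items():
            child = f"{path}.{key}"
            if key.startswith("x-"):
                found.append(child)
            found.extend(find_vendor_keys(value, child))
    elif isinstance(node, list):
        for index, value in enumerate(node):
            found.extend(find_vendor_keys(value, f"{path}[{index}]"))
    return found


scrubbed = legacy_style_scrub(schema)
leaked = find_vendor_keys(scrubbed)
print(scrubbed == schema)  # the scrub changed nothing
print(leaked)              # both x-comparison keys are still present
```

Under these assumptions the scrub leaves the schema untouched, so both the top-level and the `$defs`-nested vendor keys would still reach the provider.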
6ff2b38 to 7f56a30
lol I was hoping to make this PR but okay ;)

lol feel free @shabie I haven't used the x- fields too much and wanted to do a bit of investigation. Feel free to take this code (I haven't reviewed it closely) or copy

Alright. Then I'll do that. Thank you! Yeah I am working on a concrete problem where this showed up. My PR is very similar. I'll look at yours too.
Summary
Fixes the leak reported in stanfordnlp/dspy#9686: when a user's Pydantic output model has a field with json_schema_extra={"x-...": ...}, the vendor-extension keys are merged into the JSON schema as peer keys (not under a literal json_schema_extra key) and forwarded to the LM in two places:

- the response_format produced by _get_structured_outputs_response_format (dspy/adapters/json_adapter.py)
- the prompt text built by _get_json_schema → translate_field_type (dspy/adapters/utils.py)

Strict-schema providers (AWS Bedrock Converse, OpenAI Structured Outputs) reject unknown peer keys ({"message":"For 'anyOf', 'x-comparison' is not supported"}); the adapter then falls back to plain JSON mode, and predictions return {}.

The previous scrub was a no-op on Pydantic 2.x (verified on 2.12.4 and 2.13.3; Pydantic does not emit a literal json_schema_extra key) and only inspected top-level properties, never recursing into $defs where nested user models live.

Change
- Adds a strip_vendor_extensions(schema) helper in dspy/adapters/utils.py that recursively removes any key starting with x- (the standardised OpenAPI / JSON Schema vendor-extension namespace) from every dict in the schema, including $defs, anyOf, oneOf, items, etc.
- Applies it on both paths (_get_structured_outputs_response_format and _get_json_schema).
- Standard JSON Schema fields supplied via json_schema_extra (examples, format, pattern, minLength, minimum, …) are preserved so they continue to guide the LM.

No public API change.
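For readers who want a feel for the approach, the following is a minimal sketch of what such a recursive helper could look like. It is not the code from this PR: the function body and the two keyword sets (`SCHEMA_MAP_KEYS`, `SUBSCHEMA_KEYS`) are assumptions chosen for illustration, and the real helper may cover a different set of keywords. The sketch only descends through keywords whose values are schemas, so data-valued keywords such as examples, default, const and enum pass through untouched.

```python
from typing import Any

# Assumed keyword sets, for illustration only.
# Keywords whose value maps arbitrary names to subschemas:
SCHEMA_MAP_KEYS = {"$defs", "definitions", "properties", "patternProperties"}
# Keywords whose value is a subschema or a list of subschemas:
SUBSCHEMA_KEYS = {
    "items", "prefixItems", "additionalProperties",
    "anyOf", "oneOf", "allOf", "not", "if", "then", "else",
}


def strip_vendor_extensions(schema: Any) -> Any:
    """Return a copy of the schema with 'x-' keys removed from schema nodes.

    New dicts are built for schema nodes; data values (examples, default,
    const, enum, ...) are carried over as-is and not inspected.
    """
    if isinstance(schema, list):
        return [strip_vendor_extensions(item) for item in schema]
    if not isinstance(schema, dict):
        return schema  # e.g. additionalProperties: false
    cleaned: dict[str, Any] = {}
    for key, value in schema.items():
        if isinstance(key, str) and key.startswith("x-"):
            continue  # vendor extension on a schema node: drop it
        if key in SCHEMA_MAP_KEYS and isinstance(value, dict):
            # Names here are property/definition names, not keywords,
            # so they are kept even if they happen to start with "x-".
            cleaned[key] = {
                name: strip_vendor_extensions(sub) for name, sub in value.items()
            }
        elif key in SUBSCHEMA_KEYS:
            cleaned[key] = strip_vendor_extensions(value)
        else:
            cleaned[key] = value
    return cleaned


schema = {
    "type": "object",
    "properties": {
        "score": {
            "anyOf": [{"type": "number", "x-comparison": "approx"}, {"type": "null"}],
            "examples": [{"x-note": "example payload, not schema"}],
            "x-comparison": "approx",
        },
        "inner": {"$ref": "#/$defs/Inner"},
    },
    "$defs": {
        "Inner": {
            "type": "object",
            "properties": {
                "name": {"type": "string", "pattern": "^[a-z]+$", "x-ui": "hidden"}
            },
        }
    },
}
cleaned = strip_vendor_extensions(schema)
```

In this sketch the input is left unmodified and a cleaned copy is returned, which keeps the helper safe to call on schemas that are cached or reused elsewhere; whether the actual helper copies or mutates is not stated in this description.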
Tests
Three regression tests added to tests/adapters/test_json_adapter.py:

- test_json_adapter_strips_vendor_extensions_from_response_format: reproduces the issue's offline repro and asserts x-comparison is gone while examples survives.
- test_translate_field_type_strips_vendor_extensions_from_prompt_schema: covers the prompt-text leak and asserts pattern is preserved.
- test_strip_vendor_extensions_helper: unit test for the helper covering anyOf and $defs.

tests/adapters/ runs green (167 passed).

Why x-* specifically?
x- is the well-defined OpenAPI / JSON Schema convention for vendor-only, non-portable extensions. Anything else a user puts in json_schema_extra (examples, pattern, format, etc.) is standard JSON Schema and benefits the LM, so it must continue to flow through.