fix(json_adapter): strip OpenAPI x-* vendor extensions from LM schemas #27

Closed
isaacbmiller wants to merge 1 commit into main from fix/json-adapter-vendor-extensions

Conversation

@isaacbmiller

Summary

Fixes the leak reported in stanfordnlp/dspy#9686: when a user's Pydantic output model has a field with json_schema_extra={"x-...": ...}, the vendor-extension keys are merged into the JSON schema as peer keys (not under a literal json_schema_extra key) and forwarded to the LM in two places:

  1. response_format produced by _get_structured_outputs_response_format (dspy/adapters/json_adapter.py)
  2. The schema embedded in the prompt text via _get_json_schema → translate_field_type (dspy/adapters/utils.py)

Strict-schema providers (AWS Bedrock Converse, OpenAI Structured Outputs) reject unknown peer keys (for example {"message":"For 'anyOf', 'x-comparison' is not supported"}). The adapter then falls back to plain JSON mode, and predictions return {}.

The previous scrub:

for prop in schema.get("properties", {}).values():
    prop.pop("json_schema_extra", None)

was a no-op on Pydantic 2.x (verified on 2.12.4 and 2.13.3 — Pydantic does not emit a literal json_schema_extra key) and only inspected top-level properties, never recursing into $defs where nested user models live.
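
To make both failure modes concrete, here is a minimal sketch that runs the old loop against a hand-written schema shaped like the one described above. The model names (Answer, Detail), field names, and x-* values are hypothetical and not taken from the issue; the dict only stands in for what Pydantic 2.x is reported to emit.

```python
import copy

# Hand-written stand-in for a Pydantic 2.x schema (an assumption, not real
# library output): json_schema_extra entries sit next to "type" as peer keys,
# and the nested model lives under "$defs".
schema = {
    "title": "Answer",
    "type": "object",
    "properties": {
        "score": {"type": "integer", "x-comparison": "exact", "examples": [3]},
        "detail": {"$ref": "#/$defs/Detail"},
    },
    "$defs": {
        "Detail": {
            "type": "object",
            "properties": {"note": {"type": "string", "x-weight": 2}},
        }
    },
}


def previous_scrub(original: dict) -> dict:
    """Apply the old scrub to a deep copy so the input stays intact."""
    scrubbed = copy.deepcopy(original)
    for prop in scrubbed.get("properties", {}).values():
        # There is no literal "json_schema_extra" key, so nothing is removed.
        prop.pop("json_schema_extra", None)
    return scrubbed


scrubbed = previous_scrub(schema)
unchanged = scrubbed == schema
print(unchanged)  # True: the top-level x-comparison and the nested x-weight both survive
```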

Change

  • New strip_vendor_extensions(schema) helper in dspy/adapters/utils.py that recursively removes any key starting with x- (the standardised OpenAPI / JSON Schema vendor-extension namespace) from every dict in the schema, including $defs, anyOf, oneOf, items, etc.
  • Applied at both leak sites (_get_structured_outputs_response_format and _get_json_schema).
  • Standard JSON Schema fields delivered via json_schema_extra (examples, format, pattern, minLength, minimum, …) are preserved so they continue to guide the LM.

No public API change.
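
For readers who want a picture of the helper, the following is one possible shape for it, not the code in this PR. It assumes the recursion is limited to JSON Schema keywords that hold sub-schemas, so literal user data under examples, default, const, or enum is left alone. The name strip_vendor_extensions_sketch, the keyword tuples, and the choice to return a new dict instead of mutating in place are all assumptions made for illustration.

```python
from typing import Any

# Keywords whose value is a single sub-schema (assumed list, not exhaustive).
_SINGLE = ("items", "additionalProperties", "not", "if", "then", "else", "contains")
# Keywords whose value is a list of sub-schemas.
_LIST = ("anyOf", "oneOf", "allOf", "prefixItems")
# Keywords whose value maps arbitrary names to sub-schemas.
_MAP = ("properties", "$defs", "definitions", "patternProperties")


def strip_vendor_extensions_sketch(schema: Any) -> Any:
    """Return a copy of `schema` with x-* keys removed from every schema node."""
    if not isinstance(schema, dict):
        return schema  # e.g. `additionalProperties: false`
    cleaned: dict = {}
    for key, value in schema.items():
        if isinstance(key, str) and key.startswith("x-"):
            continue  # vendor extension: drop it
        if key in _SINGLE:
            cleaned[key] = strip_vendor_extensions_sketch(value)
        elif key in _LIST and isinstance(value, list):
            cleaned[key] = [strip_vendor_extensions_sketch(v) for v in value]
        elif key in _MAP and isinstance(value, dict):
            # Names in these maps are user-chosen, so only the values are cleaned.
            cleaned[key] = {n: strip_vendor_extensions_sketch(s) for n, s in value.items()}
        else:
            cleaned[key] = value  # examples, pattern, default, enum, ... pass through
    return cleaned


schema = {
    "type": "object",
    "x-owner": "team-a",
    "properties": {
        "score": {
            "anyOf": [{"type": "integer", "x-comparison": "exact"}, {"type": "null"}],
            "examples": [{"x-raw": 1}],
        },
        "code": {"type": "string", "pattern": "^[A-Z]+$", "x-weight": 2},
    },
    "$defs": {"Detail": {"type": "object", "x-internal": True}},
}
cleaned = strip_vendor_extensions_sketch(schema)
```

In this sketch the x-raw key inside examples survives because examples is treated as data, not as a schema node, while x-comparison under anyOf and x-internal under $defs are removed.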

Tests

Three regression tests added to tests/adapters/test_json_adapter.py:

  • test_json_adapter_strips_vendor_extensions_from_response_format — reproduces the issue's offline repro and asserts x-comparison is gone while examples survives.
  • test_translate_field_type_strips_vendor_extensions_from_prompt_schema — covers the prompt-text leak and asserts pattern is preserved.
  • test_strip_vendor_extensions_helper — unit test for the helper covering anyOf and $defs.

tests/adapters/ runs green (167 passed).

Why x-* specifically?

x- is the well-defined OpenAPI / JSON Schema convention for vendor-only, non-portable extensions. Anything else a user puts in json_schema_extra (examples, pattern, format, etc.) is standard JSON Schema and benefits the LM, so it must continue to flow through.

@greptile-apps

greptile-apps Bot commented Apr 28, 2026

Greptile Summary

This PR fixes a real schema-leakage bug (stanfordnlp#9686) where Pydantic's json_schema_extra with x-* vendor-extension keys was forwarded verbatim to strict-schema providers (AWS Bedrock Converse, OpenAI Structured Outputs), causing provider rejections and silent {} predictions. The new strip_vendor_extensions helper in utils.py correctly scopes its recursion to known JSON Schema container keywords, preserving standard fields like examples and pattern while removing x-* keys. It is applied at both identified leak sites. Note: the analogous schema path in dspy/adapters/types/tool.py (model_json_schema() / TypeAdapter.json_schema() calls at lines 104–107) still does not call strip_vendor_extensions — that gap was already flagged in a prior review pass.

Confidence Score: 4/5

Safe to merge for the two target paths; the pre-existing tool-call schema path gap (already flagged) is not introduced by this PR.

The fix is correct and well-tested. Score is capped at 4 due to the known P1 gap in the tool-call schema path (dspy/adapters/types/tool.py) that shares the same failure mode and remains unaddressed — already documented in a prior review comment.

dspy/adapters/types/tool.py — the model_json_schema() / TypeAdapter.json_schema() calls at lines 104–107 do not call strip_vendor_extensions.

Important Files Changed

| Filename | Overview |
| --- | --- |
| dspy/adapters/utils.py | Adds strip_vendor_extensions with well-scoped recursion over JSON Schema container keywords; applied in _get_json_schema before move_type_to_front. |
| dspy/adapters/json_adapter.py | Replaces the previous no-op json_schema_extra pop with strip_vendor_extensions(schema) call on the full model schema in _get_structured_outputs_response_format. |
| tests/adapters/test_json_adapter.py | Three focused regression tests covering both leak sites and the helper directly, including the important assertion that user data inside examples is untouched. |

Flowchart

%%{init: {'theme': 'neutral'}}%%
flowchart TD
    A[Pydantic model with json_schema_extra x-* fields] --> B[model_json_schema / TypeAdapter.json_schema]
    B --> C{Which path?}
    C -->|response_format path| D["_get_structured_outputs_response_format\n(json_adapter.py)"]
    C -->|prompt-text path| E["_get_json_schema → translate_field_type\n(utils.py)"]
    D --> F[strip_vendor_extensions schema]
    E --> F
    F --> G[Recurse into $defs, properties, anyOf/oneOf/allOf, items, etc.]
    G --> H[Remove x-* keys at each schema node]
    G --> I[Leave examples/default/const/enum values untouched]
    H --> J[Clean schema forwarded to LM]
    I --> J
    J --> K{Provider}
    K -->|Strict-schema AWS Bedrock / OpenAI Structured Outputs| L[Accepted — no unknown peer keys]
    K -->|Plain JSON mode| M[Accepted — x-* gone from prompt text]

Reviews (2): Last reviewed commit: "fix(json_adapter): strip OpenAPI x-* ven..."

Comment thread dspy/adapters/utils.py Outdated
Pydantic merges json_schema_extra entries into the generated schema as peer
keys, not under a literal json_schema_extra key. The previous scrub in
_get_structured_outputs_response_format was therefore a no-op on Pydantic 2.x
and only walked top-level properties (missing nested user models in $defs).

Replace it with a recursive strip_vendor_extensions helper that removes any
key starting with 'x-' (the standard OpenAPI/JSON Schema vendor-extension
namespace) at every depth, and apply it both to the schema sent as
response_format and to the schema embedded in the prompt via _get_json_schema.

Standard JSON Schema fields supplied via json_schema_extra (examples, format,
pattern, minLength, etc.) are preserved, so they continue to guide the LM.

Fixes the AWS Bedrock Converse 400 reported in stanfordnlp#9686.

Co-authored-by: factory-droid[bot] <138933559+factory-droid[bot]@users.noreply.github.com>
@isaacbmiller isaacbmiller force-pushed the fix/json-adapter-vendor-extensions branch from 6ff2b38 to 7f56a30 Compare April 28, 2026 13:43
@shabie

shabie commented Apr 28, 2026

lol I was hoping to make this PR but okay ;)

@isaacbmiller
Author

lol feel free @shabie

I haven't used the x- fields much and wanted to do a bit of investigation. Feel free to take this code (I haven't reviewed it closely) or copy it.

@shabie

shabie commented Apr 28, 2026

Alright. Then I'll do that. Thank you!

Yeah I am working on a concrete problem where this showed up. My PR is very similar. I'll look at yours too.
