
Commit 42366ff

fix: align structured outputs API with vLLM's offline inference pattern

Previously, structured outputs support was incorrectly implemented by extracting the OpenAI-specific `extra_body.structured_outputs` key from the RunPod API's `sampling_params` dictionary. This mixed two different API patterns:

1. The OpenAI API uses `extra_body` as a catch-all for vLLM-specific parameters that aren't part of the OpenAI spec.
2. The RunPod API uses `sampling_params` as a direct pass-through to vLLM's `SamplingParams`, where all parameters sit at the same level.

The RunPod API should be a direct 1:1 mapping to vLLM's offline inference API, not an abstraction layer.

Changes:
- Remove the OpenAI `extra_body` extraction from the RunPod API path; the OpenAI route already handles structured outputs correctly through vLLM's OpenAI serving code.
- The RunPod API now uses only `sampling_params`, like every other parameter.
- Simplify to direct extraction from `sampling_params.structured_outputs`.
- Convert the dict to `StructuredOutputsParams(**config)` directly and set it on `SamplingParams` exactly as vLLM expects.
- Update the README to emphasize the direct mapping to vLLM's API.

The new implementation lets users pass structured outputs exactly as they would in vLLM's Python API:

```json
{
  "sampling_params": {
    "max_tokens": 128,
    "structured_outputs": {
      "json": {"type": "object", "properties": {...}},
      "regex": "[A-Z]+",
      "choice": ["Positive", "Negative"]
    }
  }
}
```

This maps directly to `SamplingParams(structured_outputs=StructuredOutputsParams(...))` and stays consistent with vLLM's documented offline inference patterns. Users familiar with vLLM can use our API without learning new concepts, and when vLLM adds new structured output features they work automatically, without code changes on our side, since we just pass the same parameters through.
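For comparison, a minimal sketch of what such a request looks like in vLLM's offline inference API; the model name and prompt are illustrative placeholders, not part of this commit:

```python
# Sketch of the equivalent vLLM offline-inference call.
# Model name and prompt are placeholders, not part of this commit.
from vllm import LLM, SamplingParams
from vllm.sampling_params import StructuredOutputsParams

llm = LLM(model="Qwen/Qwen2.5-1.5B-Instruct")  # any vLLM-supported model

params = SamplingParams(
    max_tokens=128,
    # Constrain generation to one of the listed strings
    structured_outputs=StructuredOutputsParams(choice=["Positive", "Negative"]),
)

outputs = llm.generate(["Classify the sentiment: 'I love this product!'"], params)
print(outputs[0].outputs[0].text)  # "Positive" or "Negative"
```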
1 parent c41f852 commit 42366ff

File tree

2 files changed: +42 −41 lines

README.md

Lines changed: 30 additions & 0 deletions

````diff
@@ -244,6 +244,7 @@ Additional parameters supported by vLLM:
 | `stop_token_ids` | Optional[List[int]] | list | List of tokens that stop the generation when they are generated. The returned output will contain the stop tokens unless the stop tokens are special tokens. |
 | `skip_special_tokens` | Optional[bool] | True | Whether to skip special tokens in the output. |
 | `spaces_between_special_tokens` | Optional[bool] | True | Whether to add spaces between special tokens in the output. Defaults to True. |
+| `structured_outputs` | Optional[dict] | None | Constrains generations to JSON schemas, regexes, grammar, etc. See [Structured Outputs](https://docs.vllm.ai/en/latest/features/structured_outputs/). |
 | `add_generation_prompt` | Optional[bool] | True | Read more [here](https://huggingface.co/docs/transformers/main/en/chat_templating#what-are-generation-prompts) |
 | `echo` | Optional[bool] | False | Echo back the prompt in addition to the completion |
 | `repetition_penalty` | Optional[float] | 1.0 | Float that penalizes new tokens based on whether they appear in the prompt and the generated text so far. Values > 1 encourage the model to use new tokens, while values < 1 encourage the model to repeat tokens. |
@@ -416,4 +417,33 @@ You may either use a `prompt` or a list of `messages` as input.
 }
 ```
 
+#### Structured Outputs
+
+The RunPod API mirrors vLLM's offline inference API directly. To enforce JSON schemas, regexes, grammar rules, or structural tags, provide a `structured_outputs` object directly inside `sampling_params`. The structure matches the `SamplingParams(structured_outputs=StructuredOutputsParams(...))` API from vLLM (`json`, `regex`, `choice`, `grammar`, `structural_tag`, etc.). Example enforcing a JSON schema:
+
+```json
+{
+  "input": {
+    "messages": [
+      {"role": "user", "content": "Return a JSON document with name and age"}
+    ],
+    "sampling_params": {
+      "max_tokens": 128,
+      "structured_outputs": {
+        "json": {
+          "type": "object",
+          "properties": {
+            "name": {"type": "string"},
+            "age": {"type": "integer"}
+          },
+          "required": ["name", "age"]
+        }
+      }
+    }
+  }
+}
+```
+
+For all supported structured output types and usage patterns, refer to the vLLM [Structured Outputs guide](https://docs.vllm.ai/en/v0.11.1.1/features/structured_outputs/).
+
 </details>
````
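As a usage sketch, the payload documented above can be sent to a deployed worker over RunPod's `runsync` HTTP endpoint; the endpoint ID and API key below are placeholders:

```python
# Sketch: send the documented payload to a RunPod serverless endpoint.
# ENDPOINT_ID and API_KEY are placeholders.
import requests

ENDPOINT_ID = "your-endpoint-id"
API_KEY = "your-runpod-api-key"

payload = {
    "input": {
        "messages": [
            {"role": "user", "content": "Return a JSON document with name and age"}
        ],
        "sampling_params": {
            "max_tokens": 128,
            "structured_outputs": {
                "json": {
                    "type": "object",
                    "properties": {
                        "name": {"type": "string"},
                        "age": {"type": "integer"},
                    },
                    "required": ["name", "age"],
                }
            },
        },
    }
}

resp = requests.post(
    f"https://api.runpod.ai/v2/{ENDPOINT_ID}/runsync",
    json=payload,
    headers={"Authorization": f"Bearer {API_KEY}"},
    timeout=120,
)
print(resp.json())
```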

src/utils.py

Lines changed: 12 additions & 41 deletions
Original file line numberDiff line numberDiff line change
```diff
@@ -4,14 +4,15 @@
 from functools import wraps
 from time import time
 from vllm.entrypoints.openai.protocol import RequestResponseMetadata
-from vllm.sampling_params import StructuredOutputsParams
 
 try:
     from vllm.utils import random_uuid
     from vllm.entrypoints.openai.protocol import ErrorResponse
     from vllm import SamplingParams
+    from vllm.sampling_params import StructuredOutputsParams
 except ImportError:
     logging.warning("Error importing vllm, skipping related imports. This is ONLY expected when baking model into docker image from a machine without GPUs")
+    StructuredOutputsParams = None
     pass
 
 logging.basicConfig(level=logging.INFO)
@@ -50,47 +51,17 @@ def __init__(self, job):
         self.max_batch_size = job.get("max_batch_size")
         self.apply_chat_template = job.get("apply_chat_template", False)
         self.use_openai_format = job.get("use_openai_format", False)
-        samp_param = job.get("sampling_params", {})
-
-        # Reject deprecated old API format (top-level guided_json parameter)
-        # worker-vllm v2.9.5+ updated to vLLM 0.11.0+, which uses
-        # OpenAI-compatible extra_body.structured_outputs format
-        if job.get("guided_json") is not None:
-            raise ValueError(
-                "The 'guided_json' parameter is deprecated in vLLM 0.11.0+. "
-                "Please use 'structured_outputs' instead. "
-                "See: https://docs.vllm.ai/en/v0.11.0/features/structured_outputs.html"
-            )
-
-        # Extract extra_body (for new structured_outputs API) from sampling_params
-        extra_body = samp_param.pop("extra_body", None)
-        if extra_body and "structured_outputs" in extra_body:
-            structured_outputs = extra_body["structured_outputs"]
-
-            # Create StructuredOutputsParams instance
-            if "json" in structured_outputs:
-                samp_param["structured_outputs"] = StructuredOutputsParams(
-                    json=structured_outputs["json"]
+        samp_param = job.get("sampling_params", {}) or {}
+
+        # Convert structured_outputs dict to StructuredOutputsParams if present
+        if "structured_outputs" in samp_param and StructuredOutputsParams is not None:
+            so_value = samp_param["structured_outputs"]
+            if isinstance(so_value, dict):
+                samp_param["structured_outputs"] = StructuredOutputsParams(**so_value)
+            elif not isinstance(so_value, StructuredOutputsParams):
+                raise TypeError(
+                    "structured_outputs must be a dict or StructuredOutputsParams instance"
                 )
-            elif "regex" in structured_outputs:
-                samp_param["structured_outputs"] = StructuredOutputsParams(
-                    regex=structured_outputs["regex"]
-                )
-            elif "choice" in structured_outputs:
-                samp_param["structured_outputs"] = StructuredOutputsParams(
-                    choice=structured_outputs["choice"]
-                )
-            elif "grammar" in structured_outputs:
-                samp_param["structured_outputs"] = StructuredOutputsParams(
-                    grammar=structured_outputs["grammar"]
-                )
-            elif "structural_tag" in structured_outputs:
-                samp_param["structured_outputs"] = StructuredOutputsParams(
-                    structural_tag=structured_outputs["structural_tag"]
-                )
-
-        # Store for potential use in OpenAI-compatible API
-        self.extra_body = extra_body
 
         if "max_tokens" not in samp_param:
             samp_param["max_tokens"] = 100
```
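A standalone sketch (not the worker's actual handler) of the pass-through behavior the new code implements: the `structured_outputs` dict from `sampling_params` expands directly into `StructuredOutputsParams` and is handed to `SamplingParams` unchanged:

```python
# Standalone sketch of the new conversion path; mirrors the diff above
# but is not the worker's actual handler code.
from vllm import SamplingParams
from vllm.sampling_params import StructuredOutputsParams

samp_param = {
    "max_tokens": 64,
    "structured_outputs": {"regex": "[A-Z][a-z]+"},  # any supported key works
}

so_value = samp_param.get("structured_outputs")
if isinstance(so_value, dict):
    # Keys map 1:1 onto StructuredOutputsParams fields (json, regex, choice, ...)
    samp_param["structured_outputs"] = StructuredOutputsParams(**so_value)

params = SamplingParams(**samp_param)
print(params.structured_outputs)
```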
