This repository was archived by the owner on Jul 4, 2025. It is now read-only.

Commit d5231eb

Merge pull request #1749 from janhq/chore/update-structured-output-documentation
Chore/update structured output documentation
2 parents 3bdf8fa + 208c3f5 commit d5231eb

File tree: 1 file changed (+102, -162 lines)

docs/docs/guides/structured-outputs.md

Lines changed: 102 additions & 162 deletions
@@ -5,17 +5,68 @@ title: Structured Outputs
Structured outputs, or response formats, are a feature designed to generate responses in a defined JSON schema, enabling more predictable and machine-readable outputs. This is essential for applications where data consistency and format adherence are crucial, such as automated data processing, structured data generation, and integrations with other systems.

In recent developments, systems like OpenAI's models have excelled at producing these structured outputs. However, while open-source models like Llama 3.1 and Mistral Nemo offer powerful capabilities, they currently struggle to produce reliably structured JSON outputs required for advanced use cases.

This guide explores the concept of structured outputs using these models, highlights the challenges faced in achieving consistent output formatting, and provides strategies for improving output accuracy, particularly when using models that don't inherently support this feature as robustly as GPT models.

By understanding these nuances, users can make informed decisions when choosing models for tasks requiring structured outputs, ensuring that the tools they select align with their project's formatting requirements and expected accuracy.
The Structured Outputs/Response Format feature in [OpenAI](https://platform.openai.com/docs/guides/structured-outputs) is fundamentally a prompt engineering challenge. While its goal is to use system prompts to generate JSON output matching a specific schema, popular open-source models like Llama 3.1 and Mistral Nemo struggle to consistently generate exact JSON output that matches the requirements. An easy way to guide the model to respond in JSON format is to simply pass a Pydantic model as the `response_format`:

```
from pydantic import BaseModel
from openai import OpenAI
import json

ENDPOINT = "http://localhost:39281/v1"
MODEL = "llama3.1:8b-gguf-q4-km"

client = OpenAI(
    base_url=ENDPOINT,
    api_key="not-needed"
)


class CalendarEvent(BaseModel):
    name: str
    date: str
    participants: list[str]


completion = client.beta.chat.completions.parse(
    model=MODEL,
    messages=[
        {"role": "system", "content": "Extract the event information."},
        {"role": "user", "content": "Alice and Bob are going to a science fair on Friday."},
    ],
    response_format=CalendarEvent,
    stop=["<|eot_id|>"]
)

event = completion.choices[0].message.parsed
print(json.dumps(event.dict(), indent=4))
```

The output of the model looks like this:

```
{
    "name": "science fair",
    "date": "Friday",
    "participants": [
        "Alice",
        "Bob"
    ]
}
```

With a more complex JSON schema, Llama 3.1 still struggles to produce the correct response:

```
from openai import OpenAI
from pydantic import BaseModel
import json

ENDPOINT = "http://localhost:39281/v1"
MODEL = "llama3.1:8b-gguf-q4-km"

client = OpenAI(
@@ -39,203 +90,92 @@ completion_payload = {
    ]
}


class Step(BaseModel):
    explanation: str
    output: str


class MathReasoning(BaseModel):
    steps: list[Step]
    final_answer: str


response = client.beta.chat.completions.parse(
    top_p=0.9,
    temperature=0.6,
    model=MODEL,
    messages=completion_payload["messages"],
    stop=["<|eot_id|>"],
    response_format=MathReasoning
)

math_reasoning = response.choices[0].message.parsed
print(json.dumps(math_reasoning.dict(), indent=4))
```

The output of the model looks like this:

```
{
    "steps": [
        {
            "explanation": "To isolate the variable x, we need to get rid of the constant term on the left-hand side. We can do this by subtracting 7 from both sides of the equation.",
            "output": "8x + 7 - 7 = -23 - 7"
        },
        {
            "explanation": "Simplifying the left-hand side, we get:",
            "output": "8x = -30"
        },
        {
            "explanation": "Now, to solve for x, we need to isolate it by dividing both sides of the equation by 8.",
            "output": "8x / 8 = -30 / 8"
        },
        {
            "explanation": "Simplifying the right-hand side, we get:",
            "output": "x = -3.75"
        }
    ],
    "final_answer": "There is no final answer yet, let's break it down step by step."
}
```

Even when the model generates output in the correct format, the information is not always accurate: here the `final_answer` should be `-3.75` rather than `There is no final answer yet, let's break it down step by step.`.
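
Because a response can satisfy the schema while still containing a wrong value, it may be worth validating the parsed object before using it. The snippet below is not part of the original guide; it is a minimal sketch that assumes the `math_reasoning` object parsed above and simply checks whether `final_answer` looks numeric before trusting it.

```
# Illustrative only: a lightweight sanity check on the parsed result.
# Assumes `math_reasoning` is the MathReasoning instance parsed above.
def is_numeric(value: str) -> bool:
    try:
        float(value)
        return True
    except ValueError:
        return False

if not is_numeric(math_reasoning.final_answer):
    # The model filled the schema but not the answer; inspect the last step
    # as a fallback, or re-prompt the model.
    last_output = math_reasoning.steps[-1].output if math_reasoning.steps else ""
    print(f"final_answer is not numeric; last step output was: {last_output}")
```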

Another use case for structured outputs is JSON mode: by passing `response_format={"type": "json_object"}`, the model is forced to generate JSON output.

```
json_format = {"song_name": "release date"}
completion = client.chat.completions.create(
    model=MODEL,
    messages=[
        {"role": "system", "content": f"You are a helpful assistant, you must respond with this format: '{json_format}'"},
        {"role": "user", "content": "List 10 songs for me"}
    ],
    response_format={"type": "json_object"},
    stop=["<|eot_id|>"]
)

print(json.dumps(json.loads(completion.choices[0].message.content), indent=4))
```

The output will look like this:

```
{
    "Happy": "2013",
    "Uptown Funk": "2014",
    "Shut Up and Dance": "2014",
    "Can't Stop the Feeling!": "2016",
    "We Found Love": "2011",
    "All About That Bass": "2014",
    "Radioactive": "2012",
    "SexyBack": "2006",
    "Crazy": "2007",
    "Viva la Vida": "2008"
}
```
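
Open-source models in JSON mode can still occasionally wrap the JSON in extra prose or emit malformed output, so it can help to parse the response defensively. The snippet below is only an illustrative sketch, not part of the original guide; it reuses the `completion` object from the example above, and the recovery logic should be adjusted to your needs.

```
# Illustrative only: defensively parse the JSON-mode response.
content = completion.choices[0].message.content

try:
    songs = json.loads(content)
except json.JSONDecodeError:
    # Some models wrap the JSON in prose or code fences; try to recover the
    # outermost {...} block before giving up.
    start, end = content.find("{"), content.rfind("}")
    songs = json.loads(content[start:end + 1]) if start != -1 and end != -1 else {}

print(json.dumps(songs, indent=4))
```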
## Limitations of Open-Source Models for Structured Outputs
While the concept of structured outputs is compelling, particularly for applications requiring machine-readable data, it's important to understand that not all models support this capability equally. Open-source models such as Llama 3.1 and Mistral Nemo face notable challenges in generating outputs that adhere strictly to defined JSON schemas. Here are the key limitations:
