---
title: Structured Outputs
---

# Structured Outputs

The Structured Outputs / Response Format feature in [OpenAI](https://platform.openai.com/docs/guides/structured-outputs) is fundamentally a prompt-engineering challenge. While its goal is to use system prompts to generate JSON output matching a specific schema, popular open-source models like Llama 3.1 and Mistral Nemo struggle to consistently generate JSON output that exactly matches the requirements. An easy way is to directly guide the model to respond in JSON format through the system message:

```
from openai import OpenAI

ENDPOINT = "http://localhost:39281/v1"
MODEL = "llama3.1:8b-gguf-q4-km"

client = OpenAI(
    base_url=ENDPOINT,
    api_key="not-needed"
)

# Describe the desired JSON structure directly in the system prompt.
format = {
    "steps": [
        {
            "explanation": "string",
            "output": "string"
        }
    ],
    "final_output": "string"
}

completion_payload = {
    "messages": [
        {"role": "system", "content": f"You are a helpful math tutor. Guide the user through the solution step by step. You have to respond in this JSON format: {format}\n"},
        {"role": "user", "content": "how can I solve 8x + 7 = -23"}
    ]
}

response = client.chat.completions.create(
    top_p=0.9,
    temperature=0.6,
    model=MODEL,
    messages=completion_payload["messages"]
)

print(response)
```

The model's output looks like this:

```
ChatCompletion(
    id='OZI0q8hghjYQY7NXlLId',
    choices=[
        Choice(
            finish_reason=None,
            index=0,
            logprobs=None,
            message=ChatCompletionMessage(
                content='''Here's how you can solve it:

{
    "steps": [
        {
            "explanation": "First, we need to isolate the variable x. To do this, subtract 7 from both sides of the equation.",
            "output": "8x + 7 - 7 = -23 - 7"
        },
        {
            "explanation": "This simplifies to 8x = -30",
            "output": "8x = -30"
        },
        {
            "explanation": "Next, divide both sides of the equation by 8 to solve for x.",
            "output": "(8x) / 8 = -30 / 8"
        },
        {
            "explanation": "This simplifies to x = -3.75",
            "output": "x = -3.75"
        }
    ],
    "final_output": "-3.75"
}''',
                refusal=None,
                role='assistant',
                audio=None,
                function_call=None,
                tool_calls=None
            )
        )
    ],
    created=1730645716,
    model='_',
    object='chat.completion',
    service_tier=None,
    system_fingerprint='_',
    usage=CompletionUsage(
        completion_tokens=190,
        prompt_tokens=78,
        total_tokens=268,
        completion_tokens_details=None,
        prompt_tokens_details=None
    )
)
```

From this output, you can easily parse the response content into valid JSON, since you guided the model's format in the system prompt.

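As a minimal sketch (assuming the `response` object from the example above), you can extract the JSON object from the surrounding prose and parse it with the standard library:

```
import json

content = response.choices[0].message.content

# The model may wrap the JSON in explanatory text, so slice out the
# outermost braces before parsing.
start, end = content.find("{"), content.rfind("}")
parsed = json.loads(content[start:end + 1])

print(parsed["final_output"])  # "-3.75"
```

In practice you may want to retry the request when no braces are found or `json.loads` raises a `JSONDecodeError`.
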
However, open-source models like `llama3.1` or `mistral-nemo` still struggle to mimic the newest OpenAI API response format. For example, consider this simple request created with the OpenAI library, following the [OpenAI chain-of-thought example](https://platform.openai.com/docs/guides/structured-outputs#chain-of-thought):

```
from typing import List

from openai import OpenAI
from pydantic import BaseModel

ENDPOINT = "http://localhost:39281/v1"
MODEL = "llama3.1:8b-gguf-q4-km"

client = OpenAI(
    base_url=ENDPOINT,
    api_key="not-needed"
)


class Step(BaseModel):
    explanation: str
    output: str


class MathReasoning(BaseModel):
    steps: List[Step]
    final_answer: str


completion_payload = {
    "messages": [
        {"role": "system", "content": "You are a helpful math tutor. Guide the user through the solution step by step.\n"},
        {"role": "user", "content": "how can I solve 8x + 7 = -23"}
    ]
}

# The beta `parse` helper converts the Pydantic model into a JSON schema
# and sends it as the `response_format` field of the request.
response = client.beta.chat.completions.parse(
    top_p=0.9,
    temperature=0.6,
    model=MODEL,
    messages=completion_payload["messages"],
    response_format=MathReasoning
)
```

The response format parsed by OpenAI before sending to the server is quite complex for the `MathReasoning` schema. Unlike GPT models, Llama 3.1 and Mistral Nemo cannot reliably generate responses that can be parsed as shown in the [OpenAI tutorial](https://platform.openai.com/docs/guides/structured-outputs/example-response). This may be due to these models not being trained on similar structured output tasks.

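To inspect what the client is about to send, you can dump the JSON schema that Pydantic derives from the model yourself. This is a quick sketch using Pydantic v2's `model_json_schema()`; the OpenAI client then wraps a schema like this inside the `response_format` field shown below:

```
import json

# Pydantic derives the JSON schema from the class definitions above;
# the client embeds it under "json_schema" in the request payload.
print(json.dumps(MathReasoning.model_json_schema(), indent=2))
```
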
```
"response_format" : {
    "json_schema" : {
        "name" : "MathReasoning",
        "schema" : {
            "$defs" : {
                "Step" : {
                    "additionalProperties" : false,
                    "properties" : {
                        "explanation" : {"title" : "Explanation", "type" : "string"},
                        "output" : {"title" : "Output", "type" : "string"}
                    },
                    "required" : ["explanation", "output"],
                    "title" : "Step",
                    "type" : "object"
                }
            },
            "additionalProperties" : false,
            "properties" : {
                "final_answer" : {"title" : "Final Answer", "type" : "string"},
                "steps" : {
                    "items" : {"$ref" : "#/$defs/Step"},
                    "title" : "Steps",
                    "type" : "array"
                }
            },
            "required" : ["steps", "final_answer"],
            "title" : "MathReasoning",
            "type" : "object"
        },
        "strict" : true
    },
    "type" : "json_schema"
}
```

The response returned by `mistral-nemo` and `llama3.1` for this request cannot be parsed the way the [original OpenAI tutorial](https://platform.openai.com/docs/guides/structured-outputs/example-response) shows. These models were likely not trained on this kind of data, so they fail to handle the case: instead of filling in the schema, the model echoes the schema back and answers in free-form Markdown.

```
Response: {
    "choices" : [
        {
            "finish_reason" : null,
            "index" : 0,
            "message" : {
                "content" : "Here's a step-by-step guide to solving the equation 8x + 7 = -23:\n\n```json\n{\n \"name\": \"MathReasoning\",\n \"schema\": {\n \"$defs\": {\n \"Step\": {\n \"additionalProperties\": false,\n \"properties\": {\n \"explanation\": {\"title\": \"Explanation\", \"type\": \"string\"},\n \"output\": {\"title\": \"Output\", \"type\": \"string\"}\n },\n \"required\": [\"explanation\", \"output\"],\n \"title\": \"Step\",\n \"type\": \"object\"\n }\n },\n \"additionalProperties\": false,\n \"properties\": {\n \"final_answer\": {\"title\": \"Final Answer\", \"type\": \"string\"},\n \"steps\": {\n \"items\": {\"$ref\": \"#/$defs/Step\"},\n \"title\": \"Steps\",\n \"type\": \"array\"\n }\n },\n \"required\": [\"steps\", \"final_answer\"],\n \"title\": \"MathReasoning\",\n \"type\": \"object\"\n },\n \"strict\": true\n}\n```\n\n1. **Subtract 7 from both sides** to isolate the term with x:\n\n - Explanation: To get rid of the +7 on the left side, we add -7 to both sides of the equation.\n - Output: `8x + 7 - 7 = -23 - 7`\n\n This simplifies to:\n ```\n 8x = -30\n ```\n\n2. **Divide both sides by 8** to solve for x:\n\n - Explanation: To get rid of the 8 on the left side, we multiply both sides of the equation by the reciprocal of 8, which is 1/8.\n - Output: `8x / 8 = -30 / 8`\n\n This simplifies to:\n ```\n x = -3.75\n ```\n\nSo, the final answer is:\n\n- Final Answer: `x = -3.75`",
                "role" : "assistant"
            }
        }
    ],
```

This feature currently works reliably only with GPT models, not with open-source models. Given these limitations, we suggest using the Response Format feature only as in the first example: guide the model toward the desired JSON format through the system prompt. Note also that this feature may still be in beta, since you have to create the chat completion with `client.beta.chat.completions.parse` instead of `client.chat.completions.create`.
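
Putting that recommendation together, here is a rough sketch (reusing the `client`, `MODEL`, and `MathReasoning` definitions from the examples above) that guides the model through the system prompt and validates the output locally with Pydantic v2 instead of relying on `response_format`:

```
import json

from pydantic import ValidationError

# Embed the schema in the system prompt, as in the first example.
schema = json.dumps(MathReasoning.model_json_schema())

response = client.chat.completions.create(
    model=MODEL,
    temperature=0.6,
    messages=[
        {"role": "system", "content": f"You are a helpful math tutor. Respond only with JSON matching this schema: {schema}"},
        {"role": "user", "content": "how can I solve 8x + 7 = -23"}
    ]
)

content = response.choices[0].message.content
# Slice out the outermost JSON object in case the model adds prose.
start, end = content.find("{"), content.rfind("}")

try:
    # Validate the extracted JSON against the Pydantic model locally.
    result = MathReasoning.model_validate_json(content[start:end + 1])
    print(result.final_answer)
except ValidationError:
    print("Model output did not match the schema; consider retrying.")
```

This keeps the request compatible with any model served through the endpoint, and validation failures surface locally where you can retry or fall back.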