---
sidebar_label: Limitations of Responses API
sidebar_position: 1
---

## Issues

This document outlines known limitations and inconsistencies between Llama Stack's Responses API and OpenAI's Responses API, based on OpenAI's API as of October 6, 2025 (OpenAI client version `openai==1.107`).
See the OpenAI [changelog](https://platform.openai.com/docs/changelog) for any functionality added since that date. Links to issues are included so readers can check status, post comments, and subscribe for updates on limitations of specific interest. We also welcome feedback on any use case you try that does not work, to help prioritize what remains to be implemented.
Please open new issues in the [meta-llama/llama-stack](https://github.com/meta-llama/llama-stack) GitHub repository with details of anything that does not work and does not already have an open issue.

### Web-search tool compatibility

**Status:** Partial Implementation

**Issue:** [#4442](https://github.com/llamastack/llama-stack/issues/4442)

Llama Stack offers an OpenAI-compatible web search tool. For a feature-complete implementation that matches what OpenAI's tool offers, the following features still need to be implemented (a request sketch follows the list):

- [ ] Domain filtering: Restrict searching to a whitelisted subset of domains
- [ ] User location: Refine search results based on geography by specifying an approximate user location using country, city, region, and/or timezone
- [ ] Live internet access: Control whether the web search tool fetches live content or uses only cached/indexed results in the Responses API
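
The sketch below shows the OpenAI-style request shape these features imply; the endpoint, API key, and model id are placeholders, and the `filters` and `user_location` fields are not yet honored by Llama Stack:

```python
from openai import OpenAI

# Placeholder endpoint, key, and model; adjust for your Llama Stack deployment.
client = OpenAI(base_url="http://localhost:8321/v1", api_key="none")

response = client.responses.create(
    model="meta-llama/Llama-3.3-70B-Instruct",
    input="What changed in the latest Llama Stack release?",
    tools=[{
        "type": "web_search",
        # Domain filtering: not yet implemented in Llama Stack.
        "filters": {"allowed_domains": ["github.com", "llamastack.github.io"]},
        # Approximate user location: not yet implemented in Llama Stack.
        "user_location": {
            "type": "approximate",
            "country": "US",
            "city": "Seattle",
            "timezone": "America/Los_Angeles",
        },
    }],
)
print(response.output_text)
```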

---

### Remaining Include Flags

**Status:** Partial Implementation

**Issue:** [#4440](https://github.com/llamastack/llama-stack/issues/4440)

OpenAI allows you to return an optional subset of additional, power-user data in the API response.
Llama Stack's API now accepts these flags, but most of them are not yet implemented. Not all of them
will make sense to implement in Llama Stack; the issue above tracks which are implemented and which are not (a usage sketch follows the checklist):

- [ ] `web_search_call.action.sources`
- [ ] `code_interpreter_call.outputs`
- [ ] `computer_call_output.output.image_url`
- [ ] `file_search_call.results`
- [ ] `message.input_image.image_url`
- [ ] `reasoning.encrypted_content`
- [x] `message.output_text.logprobs`
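
A minimal sketch of the one implemented flag, assuming a local Llama Stack server at `http://localhost:8321/v1` and a placeholder model id:

```python
from openai import OpenAI

# Placeholder endpoint, key, and model.
client = OpenAI(base_url="http://localhost:8321/v1", api_key="none")

response = client.responses.create(
    model="meta-llama/Llama-3.3-70B-Instruct",
    input="Say hello in one short sentence.",
    include=["message.output_text.logprobs"],  # the flag that is implemented today
)
print(response.output_text)
```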

---

### Reasoning Content

**Status:** Not Implemented

**Issue:** [#4404](https://github.com/llamastack/llama-stack/issues/4404)

The Responses API allows you to preserve reasoning context between turns via the `reasoning.encrypted_content` include value.
The field currently exists as a no-op and needs to be wired up to providers.
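
Once wired up, usage should mirror OpenAI's stateless pattern. A sketch with placeholder endpoint, key, and model; today the field comes back empty on Llama Stack:

```python
from openai import OpenAI

# Placeholder endpoint, key, and model.
client = OpenAI(base_url="http://localhost:8321/v1", api_key="none")

response = client.responses.create(
    model="meta-llama/Llama-3.3-70B-Instruct",
    input="Work out 17 * 24 step by step.",
    include=["reasoning.encrypted_content"],  # accepted but currently a no-op
    store=False,  # encrypted reasoning is designed for stateless, store=False use
)
```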

---

### Response Branching

See the Agents vs OpenAI Responses API document for a discussion of response branching.

---


### Safety Identification and Tracking

**Status:** Not Implemented

**Issue:** [#4381](https://github.com/llamastack/llama-stack/issues/4381)

OpenAI's platform allows users to track agentic users via a safety identifier passed with each response. When requests violate moderation or safety rules, account holders are alerted and automated actions can be taken. This capability is not currently available in Llama Stack.

---

### Service Tier

Responses has a field `service_tier` that can be used to prioritize access to inference resources.

---


### Max Output Tokens

**Status:** Not Implemented
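
As defined by OpenAI, `max_output_tokens` caps the number of generated tokens. A minimal sketch of the call shape, with placeholder endpoint, key, and model; the cap is not yet honored by Llama Stack:

```python
from openai import OpenAI

# Placeholder endpoint, key, and model.
client = OpenAI(base_url="http://localhost:8321/v1", api_key="none")

response = client.responses.create(
    model="meta-llama/Llama-3.3-70B-Instruct",
    input="Write a haiku about the sea.",
    max_output_tokens=64,  # cap on generated tokens; not yet honored by Llama Stack
)
```
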
---

### Incomplete Details

The return object from a call to Responses includes a field for indicating why a response is incomplete.

---


### Background

**Status:** Not Implemented
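
In OpenAI's API, `background=True` runs a response asynchronously so it can be polled by id. A sketch of that shape, with placeholder endpoint, key, and model; Llama Stack does not support it yet:

```python
from openai import OpenAI

# Placeholder endpoint, key, and model.
client = OpenAI(base_url="http://localhost:8321/v1", api_key="none")

response = client.responses.create(
    model="meta-llama/Llama-3.3-70B-Instruct",
    input="Draft a long design document.",
    background=True,  # not supported by Llama Stack
)
# With OpenAI, the response can then be polled by id until it completes.
print(client.responses.retrieve(response.id).status)
```
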
---

### Sampling

Sampling allows MCP tools to query the generative AI model. See the MCP specification for details.

**Open Questions:**
- Does the OpenAI Responses API support sampling?
- If not, is there a reasonable way to make that work within the API as is? Or would the API need to change?
- Does this work in Llama Stack?

---

### Prompt Caching

**Status:** Unknown
OpenAI provides a [prompt caching](https://platform.openai.com/docs/guides/prompt-caching) capability. Whether Llama Stack deployments benefit from equivalent provider-side caching has not been verified.

---

## Coming Soon

### Parallel Tool Calls

**Status:** In Progress

Align Llama Stack's Responses API parallel tool call behavior with OpenAI's and harden the implementation with tests.
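
A sketch of the OpenAI-side contract to align with, using a placeholder endpoint, key, and model, and a hypothetical `get_weather` function tool:

```python
from openai import OpenAI

# Placeholder endpoint, key, and model; `get_weather` is a hypothetical tool.
client = OpenAI(base_url="http://localhost:8321/v1", api_key="none")

response = client.responses.create(
    model="meta-llama/Llama-3.3-70B-Instruct",
    input="Check the weather in Paris and in Tokyo.",
    tools=[{
        "type": "function",
        "name": "get_weather",
        "description": "Get the weather for a city.",
        "parameters": {
            "type": "object",
            "properties": {"city": {"type": "string"}},
            "required": ["city"],
        },
    }],
    parallel_tool_calls=False,  # allow at most one tool call per model turn
)
```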

---

### Connectors

**Status:** In Progress

**Issue:** [#4061](https://github.com/llamastack/llama-stack/issues/4061)

Connectors are MCP servers maintained and managed by the Responses API provider. OpenAI has documented their connectors at [https://platform.openai.com/docs/guides/tools-connectors-mcp](https://platform.openai.com/docs/guides/tools-connectors-mcp). A request sketch follows the open questions below.

**Open Questions:**
- Should Llama Stack include built-in support for some, all, or none of OpenAI's connectors?
- Should there be a mechanism for administrators to add custom connectors via `config.yaml` or an API?
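
For reference, a sketch of OpenAI's request shape for a connector; the connector id and OAuth token are OpenAI concepts, shown only to illustrate what built-in support would need to accept, and the endpoint and model are placeholders:

```python
from openai import OpenAI

# Placeholder endpoint, key, and model.
client = OpenAI(base_url="http://localhost:8321/v1", api_key="none")

response = client.responses.create(
    model="meta-llama/Llama-3.3-70B-Instruct",
    input="Summarize my most recent design doc.",
    tools=[{
        "type": "mcp",
        "server_label": "google_drive",
        "connector_id": "connector_googledrive",  # provider-managed connector
        "authorization": "<oauth-access-token>",  # user's OAuth token
        "require_approval": "never",
    }],
)
```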

---

### Top Logprobs

**Status:** In Progress

**Issue:** [#3552](https://github.com/llamastack/llama-stack/issues/3552)

The `top_logprobs` parameter from OpenAI's Responses API extends the functionality obtained by including `message.output_text.logprobs` in the `include` parameter list (as discussed in the Include section above).
It enables users to also get logprobs for alternative tokens.
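
A sketch of how `top_logprobs` composes with the `include` flag, with placeholder endpoint, key, and model:

```python
from openai import OpenAI

# Placeholder endpoint, key, and model.
client = OpenAI(base_url="http://localhost:8321/v1", api_key="none")

response = client.responses.create(
    model="meta-llama/Llama-3.3-70B-Instruct",
    input="Pick one color.",
    include=["message.output_text.logprobs"],
    top_logprobs=3,  # also return the 3 most likely alternative tokens
)
```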

---

### Web Search API Arguments

**Status:** Merged [Planned 0.4.z]

**Issue:** [#4102](https://github.com/llamastack/llama-stack/issues/4102)

Llama Stack supports `web_search`, `web_search_preview`, `web_search_preview_2025_03_11`, and now `web_search_2025_08_26` for
the built-in web search tool.

---

### Server Side Telemetry

**Status:** Merged [Planned 0.4.z]

**Issue:** [#3806](https://github.com/llamastack/llama-stack/issues/3806)

Support OpenTelemetry as the preferred way to instrument Llama Stack.

**Remaining Issues:**
- Some data still needs to be converted to follow the OpenTelemetry GenAI semantic conventions

---

### Include

**Status:** Merged [Planned 0.4.z]

The `include` parameter allows you to provide a list of flags to include specific categories of additional information in the response payload.
The API now supports the same fields as the [OpenAI API](https://platform.openai.com/docs/api-reference/responses/create). Llama Stack no
longer raises an HTTP error when these fields are passed; however, most of them are not yet implemented. Along with this change,
`message.output_text.logprobs` was implemented fully, allowing you to get logprobs data from your inference provider.

---

### Max Tool Calls

**Status:** Merged [Planned 0.4.z]

**Issue:** [#3563](https://github.com/llamastack/llama-stack/issues/3563)

The Responses API can accept a `max_tool_calls` parameter that limits the number of tool calls allowed to be executed for a given response.
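
A minimal sketch, with placeholder endpoint, key, and model:

```python
from openai import OpenAI

# Placeholder endpoint, key, and model.
client = OpenAI(base_url="http://localhost:8321/v1", api_key="none")

response = client.responses.create(
    model="meta-llama/Llama-3.3-70B-Instruct",
    input="Research recent Llama Stack releases on the web.",
    tools=[{"type": "web_search"}],
    max_tool_calls=2,  # stop after at most two built-in tool calls
)
```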

---

### Metadata

**Status:** Merged [Planned 0.4.z]

**Issue:** [#3564](https://github.com/llamastack/llama-stack/issues/3564)

Metadata allows you to attach additional information to a response for your own reference and tracking.
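
A minimal sketch of attaching metadata, with placeholder endpoint, key, and model:

```python
from openai import OpenAI

# Placeholder endpoint, key, and model.
client = OpenAI(base_url="http://localhost:8321/v1", api_key="none")

response = client.responses.create(
    model="meta-llama/Llama-3.3-70B-Instruct",
    input="Hello!",
    metadata={"ticket": "ABC-123", "env": "staging"},  # free-form key/value pairs
)
print(response.metadata)
```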

---

### Tool Choice

**Status:** Merged [Planned 0.4.z]

**Issue:** [#3548](https://github.com/llamastack/llama-stack/issues/3548)

In OpenAI's API, the `tool_choice` parameter allows you to set restrictions or requirements for which tools should be used when generating a response.
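
A minimal sketch with placeholder endpoint, key, and model; `tool_choice` also accepts `"auto"`, `"required"`, or an object naming a specific tool:

```python
from openai import OpenAI

# Placeholder endpoint, key, and model.
client = OpenAI(base_url="http://localhost:8321/v1", api_key="none")

response = client.responses.create(
    model="meta-llama/Llama-3.3-70B-Instruct",
    input="What is the capital of France?",
    tools=[{"type": "web_search"}],
    tool_choice="none",  # forbid tool use; "required" would force it
)
```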

---

## Fixed

The following limitations have been addressed in recent releases:

The `require_approval` parameter for MCP tools in the Responses API now works correctly.

---
**Fixed in:** [#3003](https://github.com/llamastack/llama-stack/pull/3003) (Agent API), [#3602](https://github.com/llamastack/llama-stack/pull/3602) (Responses API)

MCP tools now correctly handle array-type arguments in both the Agent API and Responses API.

---

### Streaming

**Status:** ✅ Resolved

**Issue:** [#2364](https://github.com/llamastack/llama-stack/issues/2364)

Streaming functionality for the Responses API is feature complete and released.
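
A minimal streaming sketch, assuming a local Llama Stack server and a placeholder model id:

```python
from openai import OpenAI

# Placeholder endpoint, key, and model.
client = OpenAI(base_url="http://localhost:8321/v1", api_key="none")

stream = client.responses.create(
    model="meta-llama/Llama-3.3-70B-Instruct",
    input="Tell me a short story.",
    stream=True,
)
for event in stream:
    # Print text deltas as they arrive; other event types carry lifecycle info.
    if event.type == "response.output_text.delta":
        print(event.delta, end="", flush=True)
```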

---

### Prompt Templates

**Status:** ✅ Resolved

**Issue:** [#3321](https://github.com/llamastack/llama-stack/issues/3321)

OpenAI's platform supports [templated prompts using a structured language](https://platform.openai.com/docs/guides/text?api-mode=responses#reusable-prompts). These templates can be stored server-side for organizational sharing.

---

### Instructions

**Status:** ✅ Resolved

**Issue:** [#3566](https://github.com/llamastack/llama-stack/issues/3566)

The Responses API request and response objects now support the `instructions` field.
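
A minimal sketch, with placeholder endpoint, key, and model:

```python
from openai import OpenAI

# Placeholder endpoint, key, and model.
client = OpenAI(base_url="http://localhost:8321/v1", api_key="none")

response = client.responses.create(
    model="meta-llama/Llama-3.3-70B-Instruct",
    instructions="Answer in one short sentence.",
    input="What is Llama Stack?",
)
print(response.instructions)  # now echoed back on the response object
```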

---