Preserve Original Prompt Structure When Constructing CAPI Responses API Input Messages #318813

@zelinms

Description

In the CAPI Responses API, when constructing the input messages for the next request, we should preserve the original structure and ordering of the model output as much as possible. This would improve cache hit rates by keeping repeated prompt prefixes stable, and it would keep the prompt format closer to what the model saw during training, which should lead to more accurate behavior.
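
To illustrate the cache argument, here is a minimal sketch that assumes caching is prefix-based, so only the leading run of identical items can be reused. The `PrefixItem` shape and the `sharedPrefixLength` helper are hypothetical and are not taken from CAPI or the extension.

```typescript
// Hypothetical item shape for illustration only; not the real CAPI or extension types.
type PrefixItem = { kind: "commentary" | "analysis" | "final" | "tool_call"; text: string };

// Counts how many leading items two item sequences have in common.
function sharedPrefixLength(a: PrefixItem[], b: PrefixItem[]): number {
  const limit = Math.min(a.length, b.length);
  for (let i = 0; i < limit; i++) {
    const x = a[i];
    const y = b[i];
    if (!x || !y || x.kind !== y.kind || x.text !== y.text) return i;
  }
  return limit;
}

// What the model originally produced in one round (made-up sample content).
const original: PrefixItem[] = [
  { kind: "commentary", text: "Checking the file." },
  { kind: "analysis", text: "Need to read the config first." },
  { kind: "tool_call", text: "read_file(config.json)" },
];

// Replay that keeps the original order.
const preserved: PrefixItem[] = original.map((item) => ({ ...item }));

// Replay that swaps commentary and analysis, as described in this issue.
const reordered: PrefixItem[] = [
  { kind: "analysis", text: "Need to read the config first." },
  { kind: "commentary", text: "Checking the file." },
  { kind: "tool_call", text: "read_file(config.json)" },
];

console.log(sharedPrefixLength(original, preserved)); // 3
console.log(sharedPrefixLength(original, reordered)); // 0
```

Under that assumption, swapping the first two items leaves no shared prefix for this round, while the order-preserving replay shares all of it.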

Currently, we observe two post-processing steps in the vscode-copilot-chat extension:

  • Within the same turn, previous rounds can be transformed in a way that changes the original ordering of commentary and analysis. For example, the original order could be commentary-then-analysis, but the post-processing converts it to analysis-then-commentary.
    • This is the primary issue, and it affects model quality.
  • Historical turns drop the model’s analysis / reasoning content.
    • This is acceptable for now. We can discuss whether it should be kept.

Ideally, input construction should avoid such reshaping of prior assistant output. Historical and previous-round content should remain as close as possible to the original model response, including preserving reasoning/analysis metadata when available and maintaining the original relative order between commentary, analysis, final text, and tool calls.
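
A minimal sketch of order-preserving input construction, under stated assumptions: the `ModelOutputItem` and `Round` types, the `metadata` field, and the `buildAssistantInput` helper are all hypothetical names for illustration and do not reflect the actual extension code or the CAPI wire format. The point is only that prior output is replayed item by item, without grouping or sorting by kind.

```typescript
// Hypothetical types for illustration; the real extension and CAPI shapes may differ.
type ItemKind = "commentary" | "analysis" | "final" | "tool_call";

interface ModelOutputItem {
  kind: ItemKind;
  content: string;
  // Hypothetical slot for reasoning/analysis metadata returned with the item.
  metadata?: Record<string, string>;
}

interface Round {
  // Items in the exact order the model emitted them.
  output: ModelOutputItem[];
}

// Replays previous rounds as input items without regrouping or sorting by kind.
function buildAssistantInput(rounds: Round[]): ModelOutputItem[] {
  const input: ModelOutputItem[] = [];
  for (const round of rounds) {
    for (const item of round.output) {
      // Shallow copy: keeps every field, including metadata, and the original position.
      input.push({ ...item });
    }
  }
  return input;
}

// Made-up sample rounds for one turn.
const rounds: Round[] = [
  {
    output: [
      { kind: "commentary", content: "Looking at the failing test." },
      { kind: "analysis", content: "The fixture path is wrong.", metadata: { reasoningId: "example-1" } },
      { kind: "tool_call", content: "read_file(test/fixture.json)" },
    ],
  },
  { output: [{ kind: "final", content: "Fixed the fixture path." }] },
];

const replayed = buildAssistantInput(rounds);
console.log(replayed.map((item) => item.kind).join(","));
// commentary,analysis,tool_call,final
```

Any transformation that buckets items by kind (for example, all analysis first, then all commentary) would break the property this sketch is meant to show.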

Notes

  • Each turn represents a user request. Each turn can contain multiple rounds (model requests).
  • We do not hit this commentary-analysis reordering issue when using AOAI BYOK, since the AOAI BYOK Responses API path uses stateful calls while CAPI uses stateless calls.
