Both the vLLM and OpenAI documentation describe how to use vLLM's support for the Responses API. However, I already hit an error when trying to connect a client to my RunPod serverless endpoint, because the worker doesn't support /v1/responses. See the documentation quoted below, and the reproduction sketch after the second fragment.
Source: https://docs.vllm.ai/projects/recipes/en/latest/OpenAI/GPT-OSS.html
Fragment:
Usage
Once the vllm serve runs and INFO: Application startup complete has been displayed, you can send requests using HTTP request or OpenAI SDK to the following endpoints:
/v1/responses endpoint can perform tool use (browsing, python, mcp) in between chain-of-thought and deliver a final response. This endpoint leverages the openai-harmony library for input rendering and output parsing. Stateful operation and full streaming API are work in progress. Responses API is recommended by OpenAI as the way to interact with this model.
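For context, this is the usage the recipe above describes against a plain vllm serve instance. A minimal sketch; the port and model name here are my assumptions, not taken from the docs:

```python
# Sketch of the Responses API usage the vLLM recipe describes, pointed at a
# local `vllm serve` instance (assumed default port 8000, assumed model name).
from openai import OpenAI

client = OpenAI(base_url="http://localhost:8000/v1", api_key="EMPTY")

resp = client.responses.create(
    model="openai/gpt-oss-20b",  # placeholder model name
    input="Explain what the Responses API adds over Chat Completions.",
)
print(resp.output_text)
```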
Source: https://cookbook.openai.com/articles/gpt-oss/run-vllm
Fragment:
Create a model response
post https://api.openai.com/v1/responses
Creates a model response. Provide text or image inputs to generate text or JSON outputs. Have the model call your own custom code or use built-in tools like web search or file search to use your own data as input for the model's response.
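And here is roughly how I reproduce the problem against my RunPod serverless endpoint. The endpoint ID, API key, model name, and base URL pattern below are placeholders from my setup, not guaranteed specifics. The Chat Completions call goes through, but the Responses call errors out because the worker doesn't route /v1/responses:

```python
# Hypothetical reproduction against a RunPod serverless vLLM worker.
# <ENDPOINT_ID> and <RUNPOD_API_KEY> are placeholders.
from openai import OpenAI

client = OpenAI(
    base_url="https://api.runpod.ai/v2/<ENDPOINT_ID>/openai/v1",
    api_key="<RUNPOD_API_KEY>",
)

# Works: the worker handles Chat Completions.
chat = client.chat.completions.create(
    model="openai/gpt-oss-20b",  # placeholder model name
    messages=[{"role": "user", "content": "Hello"}],
)
print(chat.choices[0].message.content)

# Fails: the worker returns an error because /v1/responses is not supported.
resp = client.responses.create(model="openai/gpt-oss-20b", input="Hello")
print(resp.output_text)
```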