feat: Refactor Agents API to use FastAPI Router #4376
Conversation
This pull request has merge conflicts that must be resolved before it can be merged. @skamenan7 please rebase it. https://docs.github.com/en/pull-requests/collaborating-with-pull-requests/working-with-forks/syncing-a-fork
@leseb please take a look. Thanks.
@skamenan7 please resolve the conflicts.
@skamenan7 many failures in tests.
I cleaned this PR up into 4 commits so it’s easy to review end-to-end.
Move the Agents API surface to a router-based implementation and replace the legacy single-file agents module with a package layout. This includes wiring the Agents router into the router registry and updating imports/tests accordingly.
Handle FastAPI StreamingResponse results for streaming endpoints so the library client can iterate and forward SSE chunks correctly.
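In rough terms, the library client has to notice when a route handler returned a Starlette/FastAPI StreamingResponse and relay its body iterator instead of treating the result as a JSON payload. A minimal sketch of that idea (illustrative only, not the actual client code):

```python
# Illustrative sketch: detect a StreamingResponse and relay its SSE chunks.
from fastapi.responses import StreamingResponse


async def relay(result):
    """Forward SSE chunks when the route returned a stream; pass other results through."""
    if isinstance(result, StreamingResponse):
        async for chunk in result.body_iterator:  # SSE-framed "data: ..." chunks
            yield chunk
    else:
        yield result  # non-streaming endpoints return plain JSON-serializable objects
```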
Telemetry collection is disabled in integration tests (llamastack#4089), but docker-mode runs can still spin up OTEL exporters that attempt network connections and make CI flaky. Disable the OTEL SDK in the container to avoid spurious failures.
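For reference, the standard OpenTelemetry switch for this is the OTEL_SDK_DISABLED environment variable; exactly where it is set for the docker-mode run (compose file, entrypoint, test fixture) is an assumption in this sketch:

```python
# Sketch: neutralize the OTEL SDK before anything initializes an exporter.
# OTEL_SDK_DISABLED is the standard OpenTelemetry environment switch; where it
# is actually set for the container in CI is an assumption here.
import os

os.environ.setdefault("OTEL_SDK_DISABLED", "true")  # must be set before SDK init
```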
When /v1/responses moved to the Agents router, the request body schema appears as CreateResponseRequest.input and includes a duplicated OpenAIResponseMessage-Input variant inside nested unions. Stainless codegen treats this as separate generated types, causing Go name clashes and Python duplicate declarations. Apply a combined-spec-only OpenAPI transform to drop redundant sibling $ref entries and regenerate the Stainless artifacts.
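As a rough illustration of the kind of transform involved (not the PR's actual implementation), a post-processing pass that walks anyOf/oneOf unions in the combined spec and drops duplicate sibling $ref entries could look like this:

```python
# Generic sketch of de-duplicating sibling $ref entries inside anyOf/oneOf unions
# of an OpenAPI document. The real transform in this PR is scoped to the combined
# spec and targets the duplicated OpenAIResponseMessage-Input variant specifically.
def dedupe_union_refs(node):
    if isinstance(node, dict):
        for key in ("anyOf", "oneOf"):
            variants = node.get(key)
            if isinstance(variants, list):
                seen, kept = set(), []
                for variant in variants:
                    ref = variant.get("$ref") if isinstance(variant, dict) else None
                    if ref is not None and ref in seen:
                        continue  # drop the redundant sibling $ref
                    if ref is not None:
                        seen.add(ref)
                    kept.append(variant)
                node[key] = kept
        for value in node.values():
            dedupe_union_refs(value)
    elif isinstance(node, list):
        for item in node:
            dedupe_union_refs(item)
```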
…raint
- Add # allow-direct-logging comment since llama_stack_api cannot import from llama_stack
- Remove ge=1 from max_tool_calls to preserve the existing error message format expected by integration tests
Summary
This PR migrates the Agents API from the legacy @webmethod decorator pattern to the new FastAPI Router pattern, following the established Benchmarks API migration as a reference.

Changes
New Files
- src/llama_stack_api/agents/__init__.py - Module exports with docstring
- src/llama_stack_api/agents/api.py - Protocol definition
- src/llama_stack_api/agents/models.py - Pydantic request models
- src/llama_stack_api/agents/fastapi_routes.py - FastAPI router implementation

Modified Files
- src/llama_stack_api/__init__.py - Export request models from the main package
- src/llama_stack/providers/inline/agents/meta_reference/agents.py - Updated to accept request model inputs
- src/llama_stack/core/server/fastapi_router_registry.py - Register the agents router (see the registration sketch below)
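A rough sketch of what registering the agents router might look like; the function names below are stand-ins for illustration, not the actual contents of fastapi_router_registry.py:

```python
# Illustrative only: names here are assumptions, not the repository's real helpers.
from fastapi import APIRouter, FastAPI


def get_agents_router() -> APIRouter:
    """Stand-in for the factory exported by llama_stack_api.agents.fastapi_routes."""
    return APIRouter(prefix="/v1", tags=["agents"])


def register_routers(app: FastAPI) -> None:
    """Mount feature routers (benchmarks, agents, ...) onto the server app."""
    app.include_router(get_agents_router())  # exposes /v1/responses and related routes
```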
Key Implementation Details

Request Models: All 5 API operations now use Pydantic request models (see the sketch below):
- CreateResponseRequest
- RetrieveResponseRequest
- ListResponsesRequest
- ListResponseInputItemsRequest
- DeleteResponseRequest

OpenAPI Spec: Properly documents POST /v1/responses with both application/json and text/event-stream content types.
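A minimal sketch of one request model and its route, to show the shape of the change; the field names, defaults, and handler bodies are illustrative assumptions, not the PR's exact definitions:

```python
# Illustrative sketch only: fields, defaults, and handlers are assumptions.
from fastapi import APIRouter
from fastapi.responses import StreamingResponse
from pydantic import BaseModel


class CreateResponseRequest(BaseModel):
    model: str
    input: str | list | None = None   # free-form text or structured input items
    stream: bool = False
    tools: list[dict] | None = None


router = APIRouter(prefix="/v1", tags=["agents"])


async def _run(request: CreateResponseRequest) -> dict:
    # Stand-in for the provider implementation behind the route.
    return {"object": "response", "status": "completed", "output": []}


async def _sse(request: CreateResponseRequest):
    # Stand-in SSE generator; real events follow the Responses streaming schema.
    yield 'data: {"type": "response.completed"}\n\n'


@router.post("/responses")
async def create_response(request: CreateResponseRequest):
    # Streaming requests return text/event-stream; others return a JSON body.
    if request.stream:
        return StreamingResponse(_sse(request), media_type="text/event-stream")
    return await _run(request)
```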
Testing

Test case: Create Response (Non-Streaming)

Request:
```json
{
  "model": "ollama/llama3.2:3b-instruct-fp16",
  "input": "Tell me a one-liner joke.",
  "stream": false
}
```
Response (200 OK):
```json
{
  "id": "resp_ef4879f3-f12d-421a-84ac-d461c68e2767",
  "object": "response",
  "status": "completed",
  "output": [{
    "type": "message",
    "role": "assistant",
    "content": [{
      "type": "output_text",
      "text": "A man walked into a library and asked the librarian..."
    }]
  }],
  "usage": {"input_tokens": 33, "output_tokens": 54, "total_tokens": 87}
}
```

Test case: Create Response (Streaming)

Request:
```json
{
  "model": "ollama/llama3.2:3b-instruct-fp16",
  "input": "Count 1 to 3.",
  "stream": true
}
```
Response (SSE Stream):

Test case: Tool Calling

Request:
```json
{
  "model": "ollama/llama3.2:3b-instruct-fp16",
  "input": "What is the weather in San Francisco?",
  "tools": [{
    "type": "function",
    "name": "get_weather",
    "parameters": {"type": "object", "properties": {"location": {"type": "string"}}}
  }]
}
```
Response (200 OK):
```json
{
  "output": [{
    "type": "function_call",
    "name": "get_weather",
    "arguments": "{\"location\":\"San Francisco\"}"
  }]
}
```

Test case: Get Response

Request:

Response (200 OK):
```json
{
  "id": "resp_ef4879f3-f12d-421a-84ac-d461c68e2767",
  "object": "response",
  "status": "completed",
  "output": [{
    "type": "message",
    "role": "assistant",
    "content": [{"type": "output_text", "text": "A man walked into a library..."}]
  }]
}
```

Test case: List Responses

Request:

Response (200 OK):
```json
{
  "data": [
    {"id": "resp_1cd72689-49f8-465e-9e3c-ac06ee3b3386", "object": "response", ...},
    {"id": "resp_dfe88aaa-9550-48e3-a4f4-dc189c78dc7c", "object": "response", ...},
    {"id": "resp_ef4879f3-f12d-421a-84ac-d461c68e2767", "object": "response", ...}
  ],
  "has_more": false,
  "object": "list"
}
```

Test case: List Input Items

Request:

Response (200 OK):
```json
{
  "data": [{
    "type": "message",
    "role": "user",
    "content": [{"type": "input_text", "text": "Tell me a one-liner joke."}]
  }],
  "object": "list"
}
```

Test case: Delete Response

Request:

Response (200 OK):
```json
{
  "id": "resp_ef4879f3-f12d-421a-84ac-d461c68e2767",
  "object": "response",
  "deleted": true
}
```
Verify deletion (GET returns 400):
```json
{"detail": "Response with id resp_ef4879f3-f12d-421a-84ac-d461c68e2767 not found"}
```

Related Issue
Closes #4336