Thanks for landing this in 0.7.0. I tested it on a Jetson AGX Thor with nvidia/Nemotron-3-Nano-Omni-30B-A3B-Reasoning-NVFP4, and single-stream throughput was excellent (~1.8× the same model under vLLM); load times are also much faster after the first run.
Two observations from the quick tests I ran:
The server handles one request at a time. Sending two concurrently triggers an error, and the process needs a restart to recover.
Responses seem to return plain text only. The OpenAI-standard tool_calls field doesn't appear to be populated yet; I rely on it for agentic workflows.
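For context, here's a minimal sketch of the kind of request that surfaces both observations, against an OpenAI-compatible /v1/chat/completions endpoint. The base URL, port, and get_time tool are placeholders I'm using for illustration, not part of my actual setup:

```python
# Repro sketch for both observations against an OpenAI-compatible endpoint.
# BASE_URL and the get_time tool are placeholders.
import json
import urllib.request
from concurrent.futures import ThreadPoolExecutor

BASE_URL = "http://localhost:8000/v1"  # placeholder, adjust to your server
MODEL = "nvidia/Nemotron-3-Nano-Omni-30B-A3B-Reasoning-NVFP4"

# A trivial tool definition; with this present, the response message should
# carry a tool_calls entry rather than (or alongside) plain text content.
TOOLS = [{
    "type": "function",
    "function": {
        "name": "get_time",
        "description": "Return the current time",
        "parameters": {"type": "object", "properties": {}},
    },
}]

def chat(prompt):
    """Send one chat completion request and return the response message."""
    body = json.dumps({
        "model": MODEL,
        "messages": [{"role": "user", "content": prompt}],
        "tools": TOOLS,
    }).encode()
    req = urllib.request.Request(
        f"{BASE_URL}/chat/completions",
        data=body,
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(req) as resp:
        return json.load(resp)["choices"][0]["message"]

if __name__ == "__main__":
    # Observation 1: two concurrent requests -> one errors, and the server
    # stays wedged until restarted.
    with ThreadPoolExecutor(max_workers=2) as pool:
        msgs = list(pool.map(chat, ["What time is it?"] * 2))
    # Observation 2: tool_calls never seems to be populated, only content.
    for msg in msgs:
        print("tool_calls:", msg.get("tool_calls"))
```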
Let me know if I might be missing something on my end, or if these are known issues with fixes planned. I'm happy to share my full setup details if useful.