Voice Agents for Unity

Voice Agents for Unity is a package for building open-source AI voice agents that run fully locally with realtime latency.

It is built on top of:

Pipecat;
Local voice agents on MacOS with Pipecat;
WebRTC for Unity;
Open-source inference libraries such as vLLM; and,
Open-source models from the Hugging Face Hub.

You can use it to build intelligent non-player characters (NPCs), game interfaces, among many other applications.

It is currently a proof-of-concept and requires several improvements before it's ready for use in game development.

Instructions

Install package

First, install the package in your project along with the provided sample. Open the Package Manager and under the + dropdown select "Install package from git URL". Enter:

https://github.com/stefanwebb/unity-voice-agents.git

Confirm that the package is present in your Project window under Packages/com.stefanwebb.voiceagents and the sample under Samples/Example

Inference Server

Next, you need to launch a server for local LLM inference for use by the Agent. The inference server is called from Pipecat (architectural diagram coming soon!)

You can use any OpenAI Completions API compatible server, for example:

Choose the option that is most convenient for your platform and follow the instructions there to install and launch the server. It must be hosted at http://127.0.0.1:1234 since the configuration is currently hardcoded.

As I am on Mac, I'm using vLLM-MLX and my launch command is:

vllm-mlx serve mlx-community/Llama-3.2-3B-Instruct-4bit --port 1234

Pipecat Server

After that, you need to launch a Pipecat server, which is where the Agent "lives". Pipecat is a Python framework for building real-time voice and multimodal conversational agents.

Follow the instructions on the Pipecat website to install it (which requires that Python is installed first, of course).

This package provides an example Pipecat server that runs on Mac and the launch command, from the project folder, is:

uv run Packages/com.stefanwebb.voiceagents/Agent/agent.py

Sample Scene

With the inference and Pipecat servers running, open the test scene in the sample and run Play Mode. If everything is working correctly, your speech will be transcribed and displayed in the Game window, passed to the LLM, and its response displayed as well. The conversation history accumulates as you talk to the agent, so this is effectively a voice chatbot.

Limitations

To quickly develop a prototype, I have left the following limitations for future work:

In the provided Pipecat server, speech-to-text (STT) is "segmented" rather than streaming, which means a user's utterance isn't transcribed until the speaker has finished speaking. This gives the impression of a lower real-time latency.
In the provided server, there is no text-to-speech, interruption detection, or other voice agent components.
There is no signal to the user whether the connection to Pipecat is active or not so you have to read the debug console to know when it's ready for input.
A connection to Pipecat has to be re-established every time Play Mode is entered, which slows down development.
If disconnected from Pipecat in Play Mode, there is no way to re-connect without restarting Play Mode.
There is no way to pause the agent.
In the provided Pipecat server, the "agent" is just an LLM chatbot without tool calling, memory, planning, and so on.
The library momentarily hangs the main thread while connecting to Pipecat.
Parameters like the microphone index and server address are hardcoded.
Requires small modifications to Pipecat server to work on Windows and Linux.

Name		Name	Last commit message	Last commit date
Latest commit History 8 Commits
Agent		Agent
Documentation		Documentation
Editor		Editor
Runtime		Runtime
Samples		Samples
Tests		Tests
.gitignore		.gitignore
Agent.meta		Agent.meta
CHANGELOG.md		CHANGELOG.md
CHANGELOG.md.meta		CHANGELOG.md.meta
Documentation.meta		Documentation.meta
Editor.meta		Editor.meta
LICENSE.md		LICENSE.md
LICENSE.md.meta		LICENSE.md.meta
README.md		README.md
README.md.meta		README.md.meta
Runtime.meta		Runtime.meta
Samples.meta		Samples.meta
Tests.meta		Tests.meta
package.code-workspace		package.code-workspace
package.code-workspace.meta		package.code-workspace.meta
package.json		package.json
package.json.meta		package.json.meta

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Repository files navigation

Voice Agents for Unity

Instructions

Install package

Inference Server

Pipecat Server

Sample Scene

Limitations

About

Uh oh!

Releases

Packages

Uh oh!

Contributors

Uh oh!

Languages

Folders and files

Latest commit

History

Repository files navigation

Voice Agents for Unity

Instructions

Install package

Inference Server

Pipecat Server

Sample Scene

Limitations

About

Topics

Resources

License

Uh oh!

Stars

Watchers

Forks

Releases

Packages 0

Uh oh!

Contributors

Uh oh!

Languages

Packages