Uni-Xervo

Unified Rust runtime for embedding, reranking, generation, and raw ONNX execution across local and remote model providers.

uni-xervo gives you one runtime and one API surface for mixed model stacks, so application code stays stable while you swap providers, models, and execution modes.

Overview

Uni-Xervo is built around three core ideas:

  • Model aliases: your app requests models by stable names like embed/default or generate/llm.
  • Provider abstraction: local and remote providers implement the same task traits.
  • Runtime deduplication: equivalent model specs share one loaded instance.

Core tasks (each resolves to a typed runtime handle, sketched after this list):

  • embed for vector embeddings
  • rerank for relevance scoring
  • generate for text generation, vision, image generation, and speech synthesis
  • raw for task-agnostic ONNX tensor execution
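
Each alias resolves to a typed handle at runtime. A sketch using the accessors that appear later in this README (alias names are illustrative; the rerank accessor is not shown in this document):

let embedder = runtime.embedding("embed/default").await?;  // embed
let llm = runtime.generator("generate/llm").await?;        // generate
let runner = runtime.onnx_runner("raw/classifier").await?; // raw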

Why Uni-Xervo?

  • Keep product code provider-agnostic.
  • Mix local and remote models in one runtime.
  • Run multimodal generation pipelines: text, vision, diffusion (image generation), and speech.
  • Enforce config correctness with schema-backed option validation.
  • Control startup behavior with lazy, eager, or background warmup.
  • Add retries/timeouts per model alias instead of hard-coding behavior; warmup and retry are both sketched in the catalog fragment after this list.
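
In catalog JSON these appear as per-alias fields. A fragment (the "background" warmup string is assumed from the mode names above; the retry and timeout fields mirror the JSON example later in this README):

{
  "alias": "generate/llm",
  "task": "generate",
  "provider_id": "remote/openai",
  "model_id": "gpt-4o-mini",
  "warmup": "background",
  "timeout": 30,
  "retry": {
    "max_attempts": 3,
    "initial_backoff_ms": 200
  },
  "options": { "api_key_env": "OPENAI_API_KEY" }
}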

Provider Support

Provider ID           Tasks                                               Cargo Feature
local/candle          embed                                               provider-candle
local/fastembed       embed                                               provider-fastembed
local/onnx            raw                                                 provider-onnx
local/mistralrs       embed, generate (text, vision, diffusion, speech)   provider-mistralrs
remote/openai         embed, generate                                     provider-openai
remote/gemini         embed, generate                                     provider-gemini
remote/vertexai       embed, generate                                     provider-vertexai
remote/mistral        embed, generate                                     provider-mistral
remote/anthropic      generate                                            provider-anthropic
remote/voyageai       embed, rerank                                       provider-voyageai
remote/cohere         embed, rerank, generate                             provider-cohere
remote/azure-openai   embed, generate                                     provider-azure-openai

Installation

Use only the features you need.

[dependencies]
uni-xervo = { version = "0.5.0", default-features = false, features = ["provider-candle"] }
tokio = { version = "1", features = ["full"] }

Default feature set:

  • provider-candle

If you want local embeddings + OpenAI generation:

[dependencies]
uni-xervo = { version = "0.5.0", default-features = false, features = ["provider-candle", "provider-openai"] }
tokio = { version = "1", features = ["full"] }

If you want raw ONNX execution:

[dependencies]
uni-xervo = { version = "0.5.0", default-features = false, features = ["provider-onnx"] }
tokio = { version = "1", features = ["full"] }
ndarray = "0.17"

GPU acceleration flag:

  • gpu-cuda for CUDA-enabled builds, for example:
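
As a sketch, combining a provider feature with the GPU flag (the specific combination shown is illustrative):

[dependencies]
uni-xervo = { version = "0.5.0", default-features = false, features = ["provider-candle", "gpu-cuda"] }
tokio = { version = "1", features = ["full"] }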

Quick Start (Rust)

use uni_xervo::api::{ModelAliasSpec, ModelTask};
use uni_xervo::provider::candle::LocalCandleProvider;
use uni_xervo::runtime::ModelRuntime;

#[tokio::main]
async fn main() -> Result<(), Box<dyn std::error::Error>> {
    let spec = ModelAliasSpec {
        alias: "embed/local".to_string(),
        task: ModelTask::Embed,
        provider_id: "local/candle".to_string(),
        model_id: "sentence-transformers/all-MiniLM-L6-v2".to_string(),
        revision: None,
        warmup: Default::default(),
        required: true,
        timeout: None,
        load_timeout: None,
        retry: None,
        options: serde_json::Value::Null,
    };

    let runtime = ModelRuntime::builder()
        .register_provider(LocalCandleProvider::new())
        .catalog(vec![spec])
        .build()
        .await?;

    let embedder = runtime.embedding("embed/local").await?;
    let vectors = embedder.embed(vec!["hello world"]).await?;
    println!("vector dims = {}", vectors[0].len());

    Ok(())
}

JSON Config Example (generate/llm)

Model catalogs are JSON arrays of ModelAliasSpec.

model-catalog.json:

[
  {
    "alias": "embed/default",
    "task": "embed",
    "provider_id": "local/candle",
    "model_id": "sentence-transformers/all-MiniLM-L6-v2",
    "warmup": "lazy",
    "required": true,
    "options": null
  },
  {
    "alias": "generate/llm",
    "task": "generate",
    "provider_id": "remote/openai",
    "model_id": "gpt-4o-mini",
    "warmup": "lazy",
    "timeout": 30,
    "retry": {
      "max_attempts": 3,
      "initial_backoff_ms": 200
    },
    "options": {
      "api_key_env": "OPENAI_API_KEY"
    }
  }
]

Load JSON Config and Run Generation

use uni_xervo::provider::{LocalCandleProvider, RemoteOpenAIProvider};
use uni_xervo::runtime::ModelRuntime;
use uni_xervo::traits::{GenerationOptions, Message};

#[tokio::main]
async fn main() -> Result<(), Box<dyn std::error::Error>> {
    let runtime = ModelRuntime::builder()
        .register_provider(LocalCandleProvider::new())
        .register_provider(RemoteOpenAIProvider::new())
        .catalog_from_file("model-catalog.json")?
        .build()
        .await?;

    let llm = runtime.generator("generate/llm").await?;
    let result = llm
        .generate(
            &[
                Message::user("You are a concise assistant."),
                Message::assistant("Understood."),
                Message::user("Explain what embeddings are in one paragraph."),
            ],
            GenerationOptions {
                max_tokens: Some(200),
                temperature: Some(0.3),
                top_p: Some(0.9),
                ..Default::default()
            },
        )
        .await?;

    println!("{}", result.text);
    Ok(())
}

Configuration and Validation

  • Catalog schema: schemas/model-catalog.schema.json
  • Provider option schemas: schemas/provider-options/*.schema.json
  • Unknown keys or wrong value types fail fast during runtime build/register; a sketch of this behavior follows.
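
A minimal sketch of the fail-fast behavior, assuming only the API shown in the Quick Start and that build() surfaces the validation error:

use uni_xervo::api::{ModelAliasSpec, ModelTask};
use uni_xervo::provider::candle::LocalCandleProvider;
use uni_xervo::runtime::ModelRuntime;

#[tokio::main]
async fn main() {
    let spec = ModelAliasSpec {
        alias: "embed/bad".to_string(),
        task: ModelTask::Embed,
        provider_id: "local/candle".to_string(),
        model_id: "sentence-transformers/all-MiniLM-L6-v2".to_string(),
        revision: None,
        warmup: Default::default(),
        required: true,
        timeout: None,
        load_timeout: None,
        retry: None,
        // Deliberately misspelled option key; schema-backed validation
        // should reject it while the runtime is being built.
        options: serde_json::json!({ "max_batchsize": 32 }),
    };

    let result = ModelRuntime::builder()
        .register_provider(LocalCandleProvider::new())
        .catalog(vec![spec])
        .build()
        .await;

    assert!(result.is_err(), "unknown option keys fail fast at build time");
}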

Default remote credential env vars:

Provider ID           Default credential env var   Extra required options
remote/openai         OPENAI_API_KEY               None
remote/gemini         GEMINI_API_KEY               None
remote/vertexai       VERTEX_AI_TOKEN              project_id option or VERTEX_AI_PROJECT
remote/mistral        MISTRAL_API_KEY              None
remote/anthropic      ANTHROPIC_API_KEY            None
remote/voyageai       VOYAGE_API_KEY               None
remote/cohere         CO_API_KEY                   None
remote/azure-openai   AZURE_OPENAI_API_KEY         resource_name option
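
Extra required options go in the alias's options object. A sketch for remote/vertexai (the alias and model_id are illustrative):

{
  "alias": "generate/vertex",
  "task": "generate",
  "provider_id": "remote/vertexai",
  "model_id": "gemini-1.5-flash",
  "options": {
    "project_id": "my-gcp-project"
  }
}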

CLI Prefetch Utility

The repository includes a prefetch CLI target (src/bin/prefetch.rs) to pre-download local model artifacts:

cargo run --bin prefetch -- model-catalog.json --dry-run
cargo run --bin prefetch -- model-catalog.json

Remote providers are skipped by design because they do not cache local weights.

HF-backed local/onnx aliases are treated as local artifacts for prefetch purposes because Uni-Xervo snapshots the full repository into cache before loading the ONNX model.

ONNX Raw Runtime

local/onnx is a raw tensor runtime, not a high-level task adapter. Uni-Xervo handles model resolution, snapshot download, ONNX Runtime loading, signature introspection, batching, and timeout/retry wrappers. Your application still owns preprocessing and output interpretation.

Full developer documentation:

  • website ONNX section: website/docs/onnx/

Use:

  • provider_id: "local/onnx"
  • task: "raw"

model_id can be either:

  • a local path to a .onnx file, or
  • a Hugging Face repo ID

If a repo contains multiple .onnx files, set options.artifact.

Example catalog entry:

{
  "alias": "raw/classifier",
  "task": "raw",
  "provider_id": "local/onnx",
  "model_id": "smokxy/sequence_classification_onnx",
  "options": {
    "artifact": "model.onnx",
    "execution_providers": ["cpu"]
  }
}

Example runtime usage:

use ndarray::arr2;
use uni_xervo::traits::{TensorBatch, TensorValue};

// `runtime` is a ModelRuntime built with the "raw/classifier" alias above registered.
let runner = runtime.onnx_runner("raw/classifier").await?;

let mut batch = TensorBatch::new();
batch.insert(
    "input_ids".to_string(),
    TensorValue::I64(arr2(&[[101_i64, 1045, 2293, 3185, 102]]).into_dyn()),
);
batch.insert(
    "attention_mask".to_string(),
    TensorValue::I64(arr2(&[[1_i64, 1, 1, 1, 1]]).into_dyn()),
);

let outputs = runner.run(&batch).await?;
println!("{:?}", outputs.keys().collect::<Vec<_>>());

Key ONNX options (combined in the sketch after this list):

  • artifact
  • max_batch_size
  • execution_providers
  • graph_optimization_level
  • inter_op_num_threads
  • intra_op_num_threads
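
Combined, an options block might look like the following; the values here are illustrative, and the accepted values for each key are defined in schemas/provider-options/:

"options": {
  "artifact": "model.onnx",
  "max_batch_size": 32,
  "execution_providers": ["cpu"],
  "graph_optimization_level": "level3",
  "inter_op_num_threads": 1,
  "intra_op_num_threads": 4
}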

Measuring Consumer Binary Size

To measure how much uni-xervo adds to a real app, compare a tiny baseline binary against tiny consumer binaries that reference specific provider types:

./scripts/measure-size.sh

The script builds stripped release binaries for:

  • a baseline Tokio app with no uni-xervo
  • uni-xervo core with no providers
  • provider-candle
  • provider-onnx
  • provider-fastembed
  • provider-openai
  • provider-mistralrs

This gives a more useful incremental footprint than looking at debug binaries or the repository's prefetch CLI alone.

Development

# Build
./scripts/build.sh

# Format + check + test
./scripts/test.sh

# Ignored integration tests (real providers)
./scripts/test-integration.sh

Integration tests for real providers are gated by EXPENSIVE_TESTS=1 and relevant API credentials.
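
For example, a run against the OpenAI provider might look like this (which credentials you export depends on the providers under test):

EXPENSIVE_TESTS=1 OPENAI_API_KEY=sk-... ./scripts/test-integration.sh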

Docs

  • Contributing guide: CONTRIBUTING.md
  • Development guide: DEVELOPMENT.md
  • Community guidelines: COMMUNITY.md
  • Code of conduct: CODE_OF_CONDUCT.md
  • Support guide: SUPPORT.md
  • Security policy: SECURITY.md
  • User guide: docs/USER_GUIDE.md
  • Testing guide: TESTING.md
  • Website docs: website/

License

Apache-2.0 (LICENSE).
