Uni-Xervo

Unified Rust runtime for embedding, reranking, generation, and raw ONNX execution across local and remote model providers.

uni-xervo gives you one runtime and one API surface for mixed model stacks, so application code stays stable while you swap providers, models, and execution modes.

Overview

Uni-Xervo is built around three core ideas:

  • Model aliases: your app requests models by stable names like embed/default or generate/llm.
  • Provider abstraction: local and remote providers implement the same task traits.
  • Runtime deduplication: equivalent model specs share one loaded instance.

Core tasks (each resolves to a typed runtime handle, sketched after this list):

  • embed for vector embeddings
  • rerank for relevance scoring
  • generate for text generation, vision, image generation, and speech synthesis
  • raw for task-agnostic ONNX tensor execution
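
Each alias resolves to a typed handle at runtime. A sketch using the accessors that appear later in this README (alias names are illustrative; the rerank accessor is not shown in this document):

let embedder = runtime.embedding("embed/default").await?;  // embed
let llm = runtime.generator("generate/llm").await?;        // generate
let runner = runtime.onnx_runner("raw/classifier").await?; // raw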

Why Uni-Xervo?

  • Keep product code provider-agnostic.
  • Mix local and remote models in one runtime.
  • Run multimodal generation pipelines: text, vision, diffusion (image generation), and speech.
  • Enforce config correctness with schema-backed option validation.
  • Control startup behavior with lazy, eager, or background warmup.
  • Add retries/timeouts per model alias instead of hard-coding behavior; warmup and retry are both sketched in the catalog fragment after this list.
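
In catalog JSON these appear as per-alias fields. A fragment (the "background" warmup string is assumed from the mode names above; the retry and timeout fields mirror the JSON example later in this README):

{
  "alias": "generate/llm",
  "task": "generate",
  "provider_id": "remote/openai",
  "model_id": "gpt-4o-mini",
  "warmup": "background",
  "timeout": 30,
  "retry": {
    "max_attempts": 3,
    "initial_backoff_ms": 200
  },
  "options": { "api_key_env": "OPENAI_API_KEY" }
}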

Provider Support

Provider ID           Tasks                                               Cargo Feature
local/candle          embed                                               provider-candle
local/fastembed       embed                                               provider-fastembed
local/onnx            raw                                                 provider-onnx
local/mistralrs       embed, generate (text, vision, diffusion, speech)   provider-mistralrs
remote/openai         embed, generate                                     provider-openai
remote/gemini         embed, generate                                     provider-gemini
remote/vertexai       embed, generate                                     provider-vertexai
remote/mistral        embed, generate                                     provider-mistral
remote/anthropic      generate                                            provider-anthropic
remote/voyageai       embed, rerank                                       provider-voyageai
remote/cohere         embed, rerank, generate                             provider-cohere
remote/azure-openai   embed, generate                                     provider-azure-openai

Installation

Use only the features you need.

[dependencies]
uni-xervo = { version = "0.5.0", default-features = false, features = ["provider-candle"] }
tokio = { version = "1", features = ["full"] }

Default feature set:

  • provider-candle

If you want local embeddings + OpenAI generation:

[dependencies]
uni-xervo = { version = "0.5.0", default-features = false, features = ["provider-candle", "provider-openai"] }
tokio = { version = "1", features = ["full"] }

If you want raw ONNX execution:

[dependencies]
uni-xervo = { version = "0.5.0", default-features = false, features = ["provider-onnx"] }
tokio = { version = "1", features = ["full"] }
ndarray = "0.17"

GPU acceleration flag:

  • gpu-cuda for CUDA-enabled builds, for example:
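
As a sketch, combining a provider feature with the GPU flag (the specific combination shown is illustrative):

[dependencies]
uni-xervo = { version = "0.5.0", default-features = false, features = ["provider-candle", "gpu-cuda"] }
tokio = { version = "1", features = ["full"] }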

Quick Start (Rust)

use uni_xervo::api::{ModelAliasSpec, ModelTask};
use uni_xervo::provider::candle::LocalCandleProvider;
use uni_xervo::runtime::ModelRuntime;

#[tokio::main]
async fn main() -> Result<(), Box<dyn std::error::Error>> {
    let spec = ModelAliasSpec {
        alias: "embed/local".to_string(),
        task: ModelTask::Embed,
        provider_id: "local/candle".to_string(),
        model_id: "sentence-transformers/all-MiniLM-L6-v2".to_string(),
        revision: None,
        warmup: Default::default(),
        required: true,
        timeout: None,
        load_timeout: None,
        retry: None,
        options: serde_json::Value::Null,
    };

    let runtime = ModelRuntime::builder()
        .register_provider(LocalCandleProvider::new())
        .catalog(vec![spec])
        .build()
        .await?;

    let embedder = runtime.embedding("embed/local").await?;
    let vectors = embedder.embed(vec!["hello world"]).await?;
    println!("vector dims = {}", vectors[0].len());

    Ok(())
}

JSON Config Example (generate/llm)

Model catalogs are JSON arrays of ModelAliasSpec.

model-catalog.json:

[
  {
    "alias": "embed/default",
    "task": "embed",
    "provider_id": "local/candle",
    "model_id": "sentence-transformers/all-MiniLM-L6-v2",
    "warmup": "lazy",
    "required": true,
    "options": null
  },
  {
    "alias": "generate/llm",
    "task": "generate",
    "provider_id": "remote/openai",
    "model_id": "gpt-4o-mini",
    "warmup": "lazy",
    "timeout": 30,
    "retry": {
      "max_attempts": 3,
      "initial_backoff_ms": 200
    },
    "options": {
      "api_key_env": "OPENAI_API_KEY"
    }
  }
]

Load JSON Config and Run Generation

use uni_xervo::provider::{LocalCandleProvider, RemoteOpenAIProvider};
use uni_xervo::runtime::ModelRuntime;
use uni_xervo::traits::{GenerationOptions, Message};

#[tokio::main]
async fn main() -> Result<(), Box<dyn std::error::Error>> {
    let runtime = ModelRuntime::builder()
        .register_provider(LocalCandleProvider::new())
        .register_provider(RemoteOpenAIProvider::new())
        .catalog_from_file("model-catalog.json")?
        .build()
        .await?;

    let llm = runtime.generator("generate/llm").await?;
    let result = llm
        .generate(
            &[
                Message::user("You are a concise assistant."),
                Message::assistant("Understood."),
                Message::user("Explain what embeddings are in one paragraph."),
            ],
            GenerationOptions {
                max_tokens: Some(200),
                temperature: Some(0.3),
                top_p: Some(0.9),
                ..Default::default()
            },
        )
        .await?;

    println!("{}", result.text);
    Ok(())
}

Configuration and Validation

  • Catalog schema: schemas/model-catalog.schema.json
  • Provider option schemas: schemas/provider-options/*.schema.json
  • Unknown keys or wrong value types fail fast during runtime build/register; a sketch of this behavior follows.
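
A minimal sketch of the fail-fast behavior, assuming only the API shown in the Quick Start and that build() surfaces the validation error:

use uni_xervo::api::{ModelAliasSpec, ModelTask};
use uni_xervo::provider::candle::LocalCandleProvider;
use uni_xervo::runtime::ModelRuntime;

#[tokio::main]
async fn main() {
    let spec = ModelAliasSpec {
        alias: "embed/bad".to_string(),
        task: ModelTask::Embed,
        provider_id: "local/candle".to_string(),
        model_id: "sentence-transformers/all-MiniLM-L6-v2".to_string(),
        revision: None,
        warmup: Default::default(),
        required: true,
        timeout: None,
        load_timeout: None,
        retry: None,
        // Deliberately misspelled option key; schema-backed validation
        // should reject it while the runtime is being built.
        options: serde_json::json!({ "max_batchsize": 32 }),
    };

    let result = ModelRuntime::builder()
        .register_provider(LocalCandleProvider::new())
        .catalog(vec![spec])
        .build()
        .await;

    assert!(result.is_err(), "unknown option keys fail fast at build time");
}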

Default remote credential env vars:

Provider ID           Default credential env var   Extra required options
remote/openai         OPENAI_API_KEY               None
remote/gemini         GEMINI_API_KEY               None
remote/vertexai       VERTEX_AI_TOKEN              project_id option or VERTEX_AI_PROJECT
remote/mistral        MISTRAL_API_KEY              None
remote/anthropic      ANTHROPIC_API_KEY            None
remote/voyageai       VOYAGE_API_KEY               None
remote/cohere         CO_API_KEY                   None
remote/azure-openai   AZURE_OPENAI_API_KEY         resource_name option
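
Extra required options go in the alias's options object. A sketch for remote/vertexai (the alias and model_id are illustrative):

{
  "alias": "generate/vertex",
  "task": "generate",
  "provider_id": "remote/vertexai",
  "model_id": "gemini-1.5-flash",
  "options": {
    "project_id": "my-gcp-project"
  }
}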

CLI Prefetch Utility

The repository includes a prefetch CLI target (src/bin/prefetch.rs) to pre-download local model artifacts:

cargo run --bin prefetch -- model-catalog.json --dry-run
cargo run --bin prefetch -- model-catalog.json

Remote providers are skipped by design because they do not cache local weights.

HF-backed local/onnx aliases are treated as local artifacts for prefetch purposes because Uni-Xervo snapshots the full repository into cache before loading the ONNX model.

ONNX Raw Runtime

local/onnx is a raw tensor runtime, not a high-level task adapter. Uni-Xervo handles model resolution, snapshot download, ONNX Runtime loading, signature introspection, batching, and timeout/retry wrappers. Your application still owns preprocessing and output interpretation.

Full developer documentation:

  • website ONNX section: website/docs/onnx/

Use:

  • provider_id: "local/onnx"
  • task: "raw"

model_id can be either:

  • a local path to a .onnx file, or
  • a Hugging Face repo ID

If a repo contains multiple .onnx files, set options.artifact.

Example catalog entry:

{
  "alias": "raw/classifier",
  "task": "raw",
  "provider_id": "local/onnx",
  "model_id": "smokxy/sequence_classification_onnx",
  "options": {
    "artifact": "model.onnx",
    "execution_providers": ["cpu"]
  }
}

Example runtime usage:

use ndarray::arr2;
use uni_xervo::traits::{TensorBatch, TensorValue};

// `runtime` is a ModelRuntime built with the "raw/classifier" alias above registered.
let runner = runtime.onnx_runner("raw/classifier").await?;

let mut batch = TensorBatch::new();
batch.insert(
    "input_ids".to_string(),
    TensorValue::I64(arr2(&[[101_i64, 1045, 2293, 3185, 102]]).into_dyn()),
);
batch.insert(
    "attention_mask".to_string(),
    TensorValue::I64(arr2(&[[1_i64, 1, 1, 1, 1]]).into_dyn()),
);

let outputs = runner.run(&batch).await?;
println!("{:?}", outputs.keys().collect::<Vec<_>>());

Key ONNX options (combined in the sketch after this list):

  • artifact
  • max_batch_size
  • execution_providers
  • graph_optimization_level
  • inter_op_num_threads
  • intra_op_num_threads
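
Combined, an options block might look like the following; the values here are illustrative, and the accepted values for each key are defined in schemas/provider-options/:

"options": {
  "artifact": "model.onnx",
  "max_batch_size": 32,
  "execution_providers": ["cpu"],
  "graph_optimization_level": "level3",
  "inter_op_num_threads": 1,
  "intra_op_num_threads": 4
}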

Measuring Consumer Binary Size

To measure how much uni-xervo adds to a real app, compare a tiny baseline binary against tiny consumer binaries that reference specific provider types:

./scripts/measure-size.sh

The script builds stripped release binaries for:

  • a baseline Tokio app with no uni-xervo
  • uni-xervo core with no providers
  • provider-candle
  • provider-onnx
  • provider-fastembed
  • provider-openai
  • provider-mistralrs

This gives a more useful incremental footprint than looking at debug binaries or the repository's prefetch CLI alone.

Development

# Build
./scripts/build.sh

# Format + check + test
./scripts/test.sh

# Ignored integration tests (real providers)
./scripts/test-integration.sh

Integration tests for real providers are gated by EXPENSIVE_TESTS=1 and relevant API credentials.
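
For example, a run against the OpenAI provider might look like this (which credentials you export depends on the providers under test):

EXPENSIVE_TESTS=1 OPENAI_API_KEY=sk-... ./scripts/test-integration.sh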

Docs

  • Contributing guide: CONTRIBUTING.md
  • Development guide: DEVELOPMENT.md
  • Community guidelines: COMMUNITY.md
  • Code of conduct: CODE_OF_CONDUCT.md
  • Support guide: SUPPORT.md
  • Security policy: SECURITY.md
  • User guide: docs/USER_GUIDE.md
  • Testing guide: TESTING.md
  • Website docs: website/

License

Apache-2.0 (LICENSE).
