A local, inspectable multi-model fusion router for coding agents.
OpenFusion exposes one OpenAI-compatible endpoint and turns "pick a suitable model" into a transparent routing decision. openfusion/auto scores a candidate pool by task fit, benchmark percentile, price, throughput, latency, availability, fallback order, and session stickiness. Simple requests use one model, important requests add a verifier, and only complex or high-risk requests run the full fusion panel.
Use it when you want Codex, OpenCode, Aider, editor agents, or custom scripts to talk to a single local gateway instead of hard-coding one upstream model for every task.
- Local-first gateway: run it in front of OpenRouter or any OpenAI-compatible relay.
- Real SSE streaming: passthrough requests stream directly, and fusion streams incrementally once synthesis begins.
- Transparent routing: inspect Auto candidate scores, fallbacks, sticky session pins, selected roles, judge, and synthesizer.
- Coding-agent focused: debugging, review, architecture tradeoffs, tests, and docs.
- Dry-run friendly: try routing and orchestration without an API key.
- Small by design: zero runtime dependencies, plain Node.js, easy to fork.
LiteLLM routes LLM API calls. OpenFusion focuses on routing coding-agent work.
Run directly from GitHub with npx:
npx github:Lling0000/OpenFusion doctor
npx github:Lling0000/OpenFusion --dry-run "Review this patch for security risks and missing tests" --json
npx github:Lling0000/OpenFusion compare --dry-run
npx github:Lling0000/OpenFusion adapter codex
npx github:Lling0000/OpenFusion adapter aiderAfter the first npm release, the same flow becomes:
npx openfusion@latest doctor
npx openfusion@latest adapter codex
npx openfusion@latest adapter aiderDry-run mode does not send prompts upstream. It uses a mock client so you can inspect routing and orchestration locally.
What you should see from compare --dry-run:
# OpenFusion Single-vs-Fusion Comparison Receipt
Mode: `dry-run`
Baseline: `fast:openai/gpt-4.1-mini`
Overall: **PASS** (3/3)
| Case | Status | Baseline | Fusion Panel | Judge | Synthesizer |
| --- | --- | --- | --- | --- | --- |
| `coding-review` | PASS | `fast:openai/gpt-4.1-mini` | `verifier` + `coder` + `fast` | `verifier:google/gemini-2.5-pro` | `writer:openai/gpt-4.1` |
| `architecture-tradeoff` | PASS | `fast:openai/gpt-4.1-mini` | `coder` + `reasoner` + `fast` + `verifier` | `verifier:google/gemini-2.5-pro` | `writer:openai/gpt-4.1` |
| `docs-polish` | PASS | `fast:openai/gpt-4.1-mini` | `writer` + `fast` + `verifier` | `verifier:google/gemini-2.5-pro` | `writer:openai/gpt-4.1` |
That receipt proves the same prompts were exercised through one baseline role and the multi-stage fusion route. In dry-run mode it proves orchestration, not answer quality; run the same command without --dry-run after configuring a real relay to collect real provider evidence.
When you want explicit task-quality evidence instead of orchestration-only evidence, add --grade to compare. That runs a grader role over the baseline answer and fused answer for the same prompts and emits a separate Quality Comparison Receipt.
From a git checkout:
git clone https://github.com/Lling0000/OpenFusion.git
cd OpenFusion
node src/cli.js doctor
node src/cli.js --dry-run "Review this patch for security risks and missing tests" --json
node src/cli.js compare --dry-runStart a local OpenAI-compatible server:
npx github:Lling0000/OpenFusion serve --dry-run --port 8787Call it like a normal chat completions endpoint:
curl http://localhost:8787/v1/chat/completions \
-H "Content-Type: application/json" \
-d '{
"model": "openfusion/auto",
"messages": [
{
"role": "user",
"content": "Debug this flaky API test and propose a minimal patch"
}
]
}'Inspect routing without running the full pipeline:
curl http://localhost:8787/debug/route \
-H "Content-Type: application/json" \
-d '{
"model": "openfusion/auto",
"messages": [
{
"role": "user",
"content": "Review this patch for security and test risks"
}
]
}'One model is rarely best at everything. Coding, planning, verification, and writing reward different strengths. OpenRouter's Fusion router shows a productized version of this idea: multi-model deliberation behind a single model slug. OpenFusion makes a small open-source version you can run locally in front of your own API relay.
This is especially useful when your Codex or editor setup already talks to an API relay. Instead of hard-coding one model for every task, OpenFusion can expose one local OpenAI-compatible endpoint and decide which upstream model roles should collaborate for each question.
For Codex, the recommended entry point is openfusion/auto in the model selector. Codex does not currently expose a public plugin UI slot for pinning a third-party toggle to the bottom-right of the composer, so the closest native-feeling setup is to make openfusion/auto the model you pick when Auto is on. Auto scores a candidate pool and may choose one model, a primary model plus verifier, or the full fusion panel depending on task risk and complexity.
- OpenAI-compatible local endpoint:
POST /v1/chat/completions. - Model listing endpoint:
GET /v1/models. - Debug route endpoint:
POST /debug/route. - CLI commands:
init,models,route,doctor,compat,adapter,codex,eval,compare,receipt,serve, andchat. - Adapter guides:
adapter codexandadapter aiderprint local connection settings and verification commands. - Transparent Auto policy with benchmark/price/performance/availability scoring, fallback, and session stickiness.
- Explicit
route -> panel -> judge -> synthesizefusion pipeline when you requestopenfusion/fusionor Auto selects full fusion. - Works with OpenRouter or any OpenAI-compatible API relay.
- Dry-run mode for local testing without API keys.
- JSON trace showing selected model roles and orchestration metadata.
- Eval receipts for routing and orchestration checks:
openfusion eval --dry-run. - Configurable role-to-model mapping.
Installed or npx usage:
openfusion init
openfusion models
openfusion route "Review this API design for security and tests"
openfusion doctor
openfusion eval --dry-run
openfusion compare --dry-run
openfusion chat --dry-run "Compare two architectures for a Codex API relay" --json
openfusion serve --dry-run --port 8787
openfusion codex status
openfusion codex enable-autoFrom a git checkout, replace openfusion with node src/cli.js:
node src/cli.js init
node src/cli.js models
node src/cli.js route "Review this API design for security and tests"
node src/cli.js doctor
node src/cli.js eval --dry-run
node src/cli.js compare --dry-run
node src/cli.js chat --dry-run "Compare two architectures for a Codex API relay" --json
node src/cli.js serve --dry-run --port 8787
node src/cli.js codex status
node src/cli.js codex enable-autoroute previews the selected panel, judge, synthesizer, and upstream call budget without calling any model. doctor checks configuration, role mappings, judge/synthesizer settings, and the dry-run fusion pipeline. It also includes a compact Fusion Receipt Summary with the selected panel roles, judge, synthesizer, trace id, phase count, and latency. Pass --real after setting your upstream key to test a real relay.
export OPENROUTER_API_KEY="..."
openfusion doctor --realUse --probe-url to verify an OpenAI-compatible endpoint, including chat, basic streaming, and tool-call round-trip behavior:
openfusion doctor --probe-url http://127.0.0.1:8787/v1
openfusion doctor --probe-url http://127.0.0.1:8787/v1 --probe-timeout-ms 30000Generate a Markdown compatibility report:
openfusion doctor --probe-url http://127.0.0.1:8787/v1 --format markdownThe Markdown report is designed to be pasted into issues or provider reports. Use eval or receipt when you need fuller routing/orchestration evidence with content hashes and excerpts.
Compare multiple providers or relays:
openfusion compat \
--target "local|http://127.0.0.1:8787/v1|openfusion/fusion" \
--target "openrouter|https://openrouter.ai/api/v1|openrouter/fusion|OPENROUTER_API_KEY" \
--timeout-ms 30000If a relay is slow to start streaming or complete tool-call turns, raise --probe-timeout-ms or --timeout-ms instead of assuming the compatibility surface is broken.
Print a local adapter guide:
openfusion adapter codex
openfusion adapter aiderInspect or switch the Codex-facing Auto state:
openfusion codex status
openfusion codex enable-auto
openfusion codex enable-fusionGenerate a dry-run eval receipt for routing/orchestration evidence. The receipt includes a Routing Diversity section that shows whether different prompts selected different role/model panels:
openfusion eval --dry-run
openfusion eval --dry-run --json
openfusion receipt --dry-run "Review this Codex relay patch"Generate a single-model baseline vs fusion receipt for the built-in eval prompts:
openfusion compare --dry-run
openfusion compare --dry-run --json
openfusion compare --dry-run --baseline-role coder
openfusion compare --dry-run --grade
openfusion compare --dry-run --grade --grader-role verifierWithout --dry-run, compare calls your configured upstream relay once through the baseline role and once through the fusion pipeline for each case. Treat the result as reproducible orchestration evidence; use task-specific grading before claiming one answer is better.
Add --grade when you want a second pass where one grader role compares the baseline answer with the fused answer and returns a structured winner/score/rationale/risks receipt. This is still model judgment rather than ground truth, but it gives you explicit quality evidence on top of the orchestration receipt.
Create a config file:
openfusion initEdit openfusion.config.json if your relay uses a different base URL or environment variable:
{
"upstream": {
"baseURL": "https://your-relay.example.com/v1",
"apiKeyEnv": "YOUR_RELAY_API_KEY"
},
"roles": {
"coder": {
"model": "your-code-model",
"pricing": {
"inputUsdPer1M": 0.5,
"outputUsdPer1M": 1.5
}
}
},
"routing": {
"rules": [
{
"role": "verifier",
"keywords": ["incident", "postmortem"],
"score": 3,
"reason": "incident-review signal"
}
]
},
"auto": {
"scoring": {
"costQualityTradeoff": 7,
"preferences": {
"minAvailability": 0.4,
"preferredMinThroughput": { "p90": 40 },
"preferredMaxLatency": { "p90": 5 }
}
},
"stickiness": {
"enabled": true,
"implicit": true,
"ttlMs": 300000
},
"upstreamFallbacks": {
"enabled": true,
"maxModels": 2
},
"candidates": [
{
"id": "code-router",
"role": "coder",
"model": "openrouter/pareto-code",
"skills": { "coding": 0.9, "reasoning": 0.75 },
"benchmarks": { "coding": 86 },
"pricing": {
"inputUsdPer1M": 0.5,
"outputUsdPer1M": 1.5
},
"performance": {
"throughput": { "p50": 90, "p90": 50 },
"latency": { "p50": 1.2, "p90": 3.5 }
},
"availability": 0.85,
"upstream": {
"provider": {
"sort": { "by": "price", "partition": "none" },
"preferred_min_throughput": { "p90": 40 },
"allow_fallbacks": true
},
"plugins": [
{ "id": "pareto-router", "min_coding_score": 0.66 }
]
}
}
]
},
"fusion": {
"maxUpstreamCalls": 6,
"costEstimate": {
"inputTokensPerCall": 2000,
"outputTokensPerCall": 1000,
"maxUsd": 0.1
}
}
}Custom routing rules add to the built-in coding, reasoning, verification, and writing signals. Use keywords for simple matches or patterns for JavaScript regular expressions.
maxUpstreamCalls is a pre-flight safety guard. OpenFusion estimates selected panel roles + judge + synthesis and rejects the request before any upstream call if the route would exceed the limit.
Auto cost estimates come from auto.candidates[].pricing; explicit fusion cost estimates come from roles.*.pricing. Keep those numbers and performance metrics in sync with your relay; OpenFusion does not fetch live provider prices or live benchmark data.
If you override auto.candidates, provide the full candidate pool you want Auto to choose from. The generated default config includes candidates for fast, reasoner, coder, verifier, and writer.
Start OpenFusion:
YOUR_RELAY_API_KEY="..." openfusion serve --config openfusion.config.json --port 8787Point Codex or another OpenAI-compatible client at:
# ~/.codex/config.toml
model = "openfusion/auto"
model_provider = "openfusion"
[model_providers.openfusion]
name = "OpenFusion local"
base_url = "http://127.0.0.1:8787/v1"
env_key = "OPENFUSION_API_KEY"Set OPENFUSION_API_KEY to any local placeholder value. Keep your real relay key in the environment variable used by OpenFusion, such as YOUR_RELAY_API_KEY or OPENROUTER_API_KEY.
When Codex shows openfusion/auto in its model selector, that is the clearest sign that Auto is enabled. If Codex still shows a direct upstream model such as gpt-5.5, you are not routing through OpenFusion.
Generic OpenAI-compatible clients can use:
base_url = http://127.0.0.1:8787/v1
api_key = any-local-placeholder
model = openfusion/auto
OpenFusion will receive the local request, score the Auto candidate pool, choose a single-model, verified, or full-fusion strategy, call your upstream relay, and return a normal chat completion.
openfusion/fusion remains available as the explicit manual model label when you want to force the full panel -> judge -> synthesis route instead of letting Auto choose.
See docs/codex-relay.md for a more complete Codex/API relay setup guide, including doctor --probe-url.
See docs/providers for compatibility report templates and community provider matrix guidance. Provider reports are validated by npm run check.
See examples for copy-paste quickstart, tool passthrough, provider compatibility scenarios, and the real OpenRouter verify.sh Codex gateway smoke.
Run openfusion adapter codex to print the exact local base_url, placeholder API key, startup commands, and verification steps for a Codex-style client. From a git checkout, use node src/cli.js adapter codex.
OpenFusion returns a normal chat completion response plus an openfusion trace:
{
"model": "openfusion/auto",
"choices": [
{
"message": {
"role": "assistant",
"content": "Final synthesized answer..."
}
}
],
"openfusion": {
"mode": "auto",
"auto": {
"mode": "single-verify",
"profile": "balanced",
"selected": [
{
"role": "coder",
"model": "deepseek/deepseek-chat-v3-0324",
"score": 0.847,
"metrics": {
"benchmark_percentile": 86,
"price_usd_per_1m": 0.55,
"availability": 0.82
},
"upstream": {
"provider": {
"sort": { "by": "price", "partition": "none" }
}
}
},
{ "role": "verifier", "model": "google/gemini-2.5-pro", "score": 0.781 }
],
"session": {
"source": "explicit",
"id": "codex-workspace-123",
"hit": true,
"pins": { "roles": { "coder": "deepseek/deepseek-chat-v3-0324" } }
}
},
"route": {
"selectedRoles": ["coder", "verifier", "fast"],
"rationale": "Selected coder, verifier, fast because the prompt contains coding/debugging and verification signals."
},
"panel": [
{ "role": "coder", "model": "deepseek/deepseek-chat-v3-0324" },
{ "role": "verifier", "model": "google/gemini-2.5-pro" }
],
"judge": { "role": "verifier", "model": "google/gemini-2.5-pro" },
"synthesizer": { "role": "writer", "model": "openai/gpt-4.1" },
"trace": {
"phase_count": 3,
"phases": [
{
"phase": "primary",
"role": "coder",
"model": "deepseek/deepseek-chat-v3-0324",
"upstream": {
"models": ["openai/gpt-4.1-mini", "openai/gpt-4.1"],
"session_id": "codex-workspace-123"
},
"attempts": [
{ "model": "deepseek/deepseek-chat-v3-0324", "status": "success" }
]
}
]
}
}
}Strict OpenAI SDK clients can ignore the extra openfusion field. Debugging tools can use it to explain why Auto chose a specific strategy or why explicit fusion routed to a specific panel.
For openfusion/auto, inspect openfusion.auto and openfusion.trace.auto to see the selected strategy, candidate scores, and fallbacks. For openfusion/fusion, inspect the panel, judge, synthesizer, and phase trace.
OpenFusion is deliberately small:
src/router.jsscores the prompt for coding, reasoning, verification, and writing signals.src/auto.jsanalyzes task risk and complexity, then scores candidate models by skill fit, benchmark percentile, configured price, throughput, latency, availability, role fit, and sticky session pins.src/auto.jschoosesfast-single,smart-single,single-verify, orfusion-panel, builds ordered fallbacks, and records every selection in trace metadata.src/auto.jsandsrc/fusion.jsestimate upstream calls and enforcefusion.maxUpstreamCallsbefore any model call.src/fusion.jsruns the explicit full-fusion panel concurrently whenopenfusion/fusionis requested or Auto selectsfusion-panel.src/prompts.jscreates independent panel prompts, judge prompts, and final synthesis prompts when the selected strategy needs them.src/openaiClient.jscalls an OpenAI-compatible/chat/completionsupstream and preserves OpenRouter-stylemodels,provider,plugins, andsession_idfields.src/server.jsexposes a local/v1/chat/completionsendpoint.
OpenFusion implements a small OpenAI-compatible surface for local routing.
| Capability | Status | Notes |
|---|---|---|
POST /v1/chat/completions |
Supported | Non-streaming chat completions. |
GET /v1/models |
Supported | Lists virtual OpenFusion models and role models. |
POST /debug/route |
Supported | Shows selected roles, Auto strategy/candidate scores when openfusion/auto is requested, and estimated upstream call budget without running models. |
| Streaming responses | Basic support | Role/tool passthrough streams directly. Explicit fusion streams once synthesis starts. Auto single/verified responses currently return an SSE-compatible completion chunk after the selected strategy finishes. |
| Tool calls / function calling | Basic passthrough | Requests with tools, tool_choice, parallel_tool_calls, role: "tool", or assistant tool_calls bypass fusion and go to one upstream model so the tool-call protocol is preserved. |
| Embeddings, images, audio | Not supported | OpenFusion currently focuses on coding-agent chat workflows. |
OpenRouter documents Fusion as openrouter/fusion, a model slug that performs multi-model deliberation in a single API call. Their docs describe a panel, a judge, and a final answer. See:
- https://openrouter.ai/docs/guides/routing/routers/fusion-router
- https://openrouter.ai/docs/guides/features/server-tools/fusion
- https://openrouter.ai/docs/guides/features/plugins/fusion
- https://openrouter.ai/openrouter/fusion
OpenFusion does not claim to reproduce OpenRouter's private routing logic. It is a transparent local pattern you can inspect, test, and adapt.
The official OpenRouter docs currently describe Fusion as a router that exposes a model slug, can be invoked through server-tool style Fusion integrations, runs a panel plus judge flow, and returns a final answer. See docs/openrouter-fusion-notes.md for the local mapping and source links.
OpenFusion is an early, working prototype for local multi-model orchestration.
It is useful today for experimenting with role-based routing, Auto candidate scoring, fallback traces, sticky sessions, dry-run traces, eval receipts, graded comparison receipts, incremental synthesis streaming, basic upstream call budgets, and OpenAI-compatible relay integration. It is not yet a production gateway: full fan-out token multiplexing, live provider telemetry ingestion, provider-specific quirks, and fusion-aware tool orchestration are still on the roadmap. Basic tool-call passthrough already exists so coding-agent tool turns stay on one upstream model.
If you adopt it, keep tests, code review, and domain-specific validation in the loop. Fusion improves coverage of perspectives; it does not guarantee correctness.
- Token-by-token streaming from the local server.
- Better prompt classification with examples and custom rules.
- Optional live telemetry import for provider prices, latency, throughput, availability, and context windows.
- Real-provider eval receipts comparing single-model vs fusion answers with task-specific grading.
- Single-model vs fusion comparison receipts with
openfusion compare. - Adapter presets for Codex, OpenCode, Continue, Cline, Aider, and LiteLLM.
- Deeper compatibility doctor checks for provider quirks: tool calls, usage chunks, and headers.
- Fusion-aware tool orchestration after the single-model passthrough path is stable.
- Worktree fanout mode for trying multiple coding agents and letting tests judge the winner.
- Web trace viewer for panel answers and judge decisions.
OpenFusion is intentionally small, so focused contributions are welcome.
- Add more adapter presets for OpenCode, Continue, Cline, and LiteLLM.
- Add provider compatibility tests for OpenRouter and other OpenAI-compatible relays.
- Add sanitized provider compatibility reports under
docs/providers/. - Add token-by-token streaming support for
/v1/chat/completions. - Improve tool-call passthrough and add tool-call round-trip fixtures.
- Add real-provider eval receipts comparing single-model and fused answers.
- Add a trace viewer for panel answers, judge notes, and final synthesis.
See CONTRIBUTING.md for development commands and pull request guidelines. See CHANGELOG.md for release notes, compatibility changes, and migration notes.
- Dry-run mode sends no prompt data upstream.
- Real mode sends your prompt to each selected upstream model role.
- Explicit role models such as
openfusion/codersend normal chat requests to one configured upstream model. - Tool-call passthrough sends a tool turn to one upstream model only, using
fusion.toolRoleby default for virtual models. - Do not put secrets in prompts or config files.
- OpenFusion is an orchestration layer, not a guarantee that a fused answer is correct. Keep tests, review, and domain-specific validation in the loop.
npm test
npm run smoke
npm run doctor
npm run compare
npm run check:providers
npm run check:examples
npm run check:package
npm run checkSee docs/release-checklist.md for the local release checklist and manual smoke commands. See SECURITY.md for secret-handling and vulnerability reporting guidance.
MIT