
OpenSymbolicAI

Make AI a software engineering discipline.

On the TravelPlanner benchmark (ICML 2024), LangChain passes 77.8% of tasks and CrewAI 73.3%, while burning 3–6× more tokens than OpenSymbolicAI, costing 4–8× more per passing result, and losing track of instructions as context grows. GPT-4 alone scores 0.6%.

OpenSymbolicAI scores 97.9% on 1,000 tasks by splitting the LLM's job in two:

┌─────────────────────────────────────┐
│  Traditional Agent (ReAct)          │
│                                     │
│  User ─→ LLM ─→ Tool ─→ LLM ─→      │
│          Tool ─→ LLM ─→ Tool ─→     │
│          LLM ─→ ... (loop forever)  │
│                                     │
│  ⚠ Data in prompt = injection risk  │
│  ⚠ Context bloats every iteration   │
│  ⚠ LLM makes unplanned tool calls   │
└─────────────────────────────────────┘

┌─────────────────────────────────────┐
│  OpenSymbolicAI (Plan + Execute)    │
│                                     │
│  User ─→ LLM ─→ Plan                │
│                    ↓                │
│          Runtime executes plan      │
│          deterministically          │
│                                     │
│  ✓ Data never enters LLM context    │
│  ✓ Fewer tokens, fewer LLM calls    │
│  ✓ Every side effect is explicit    │
└─────────────────────────────────────┘

The LLM plans. The runtime executes. Data stays in application memory and never gets tokenized.
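The split can be sketched in plain Python (an illustration of the idea only, not the OpenSymbolicAI API — `execute`, `TOOLS`, and `fetch_price` are hypothetical names): the plan is ordinary data, and a small runtime executes it while tool outputs stay in application memory.

```python
# Illustrative sketch, NOT the OpenSymbolicAI API: a plan is plain data,
# and a runtime executes it without tool outputs ever entering an LLM prompt.

def fetch_price(symbol: str) -> float:
    # Stand-in for a real tool; its output stays in application memory.
    return {"AAPL": 190.0, "MSFT": 410.0}[symbol]

TOOLS = {"fetch_price": fetch_price}

def execute(plan: list[dict]) -> dict:
    """Run each step deterministically; results never get re-tokenized."""
    results: dict[str, object] = {}
    for step in plan:
        # Arguments may name the output of an earlier step.
        args = {k: results.get(v, v) for k, v in step["args"].items()}
        results[step["out"]] = TOOLS[step["tool"]](**args)
    return results

# A plan the LLM might emit for "What does AAPL cost?"
plan = [{"tool": "fetch_price", "args": {"symbol": "AAPL"}, "out": "price"}]
print(execute(plan)["price"])  # 190.0
```

Because the plan is data, every side effect is visible before anything runs — the property the diagram above is claiming.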

Three blueprints for different problem shapes:

| Blueprint | Pattern | Use when |
| --- | --- | --- |
| PlanExecute | Plan once, execute deterministically | Fixed sequence of steps (calculators, converters, simple QA) |
| DesignExecute | Plan with loops and conditionals | Dynamic-length data (shopping carts, batch processing) |
| GoalSeeking | Plan → execute → evaluate → repeat | Iterative problems (optimization, multi-hop research, deep research) |
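The GoalSeeking shape — plan, execute, evaluate, repeat — can be sketched generically. This is a toy illustration of the control flow, not the framework's API; `goal_seek` and its callbacks are hypothetical names, with the LLM planner stubbed out.

```python
# Illustrative sketch, NOT the OpenSymbolicAI API: the GoalSeeking loop is
# plan -> execute -> evaluate -> repeat, bounded by a round budget.

def goal_seek(plan_fn, execute_fn, evaluate_fn, max_rounds: int = 5):
    feedback = None
    for _ in range(max_rounds):
        plan = plan_fn(feedback)       # planner proposes a plan (LLM stubbed)
        result = execute_fn(plan)      # runtime executes deterministically
        ok, feedback = evaluate_fn(result)
        if ok:
            return result
    return result                      # best effort after budget is spent

# Toy example: each round's "plan" is the next guess toward a target.
target = 7
result = goal_seek(
    plan_fn=lambda fb: (fb or 0) + 1,        # next guess from feedback
    execute_fn=lambda guess: guess,          # nothing to run in the toy
    evaluate_fn=lambda r: (r == target, r),  # satisfied when target is hit
    max_rounds=10,
)
print(result)  # 7
```

The point of the bound: even the iterative blueprint terminates by construction, unlike an open-ended ReAct loop.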
```
pip install opensymbolicai-core
```

How It Works

Define primitives (what your agent can do) and decompositions (examples of how to use them). The LLM learns from your examples to plan new queries:

```python
from opensymbolicai import PlanExecute, primitive, decomposition

class Calculator(PlanExecute):

    @primitive(read_only=True)
    def add(self, a: float, b: float) -> float:
        return a + b

    @decomposition(
        intent="What is 2 + 3?",
        expanded_intent="Add the two numbers",
    )
    def _example(self) -> float:
        return self.add(a=2, b=3)
```

Every decomposition you add makes the agent better. This is the flywheel that prompt engineering doesn't have.

Why This Matters

| Problem | How OpenSymbolicAI solves it |
| --- | --- |
| Prompt injection | The Symbolic Firewall keeps data out of LLM context — there is nothing to inject into |
| Unpredictable behavior | Execution is deterministic and fully traced. Even iterative agents (GoalSeeking) produce inspectable plans at each step, with no runaway tool-calling |
| High costs | Fewer LLM calls to plan, then pure code execution. No re-tokenizing on every step |
| Can't test or debug | Full execution traces, typed outputs (Pydantic), version-controlled behavior |
| Model lock-in | Model-agnostic: swap providers without rewriting your agent |
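The testability claim is worth making concrete. A stdlib-only sketch of the idea — the real framework returns Pydantic-typed traces, whereas `Step` and `run_with_trace` here are hypothetical names: when every side effect is a record, agent behavior becomes an ordinary unit test.

```python
# Stdlib-only sketch of traced, typed execution; names are illustrative.
from dataclasses import dataclass

@dataclass(frozen=True)
class Step:
    tool: str
    args: dict
    output: object

def run_with_trace(plan, tools) -> list[Step]:
    """Execute a plan and record every side effect as a typed Step."""
    trace: list[Step] = []
    for tool, args in plan:
        trace.append(Step(tool, args, tools[tool](**args)))
    return trace

tools = {"add": lambda a, b: a + b}
trace = run_with_trace([("add", {"a": 2, "b": 3})], tools)

# Behavior is now a plain assertion, not a judgment call on LLM prose.
assert trace[0].tool == "add"
assert trace[0].output == 5
```

Version-controlling decompositions plus asserting on traces like this is what makes "agent behavior" diffable and CI-testable.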

Repositories

Runtimes

| Language | Repo | Description |
| --- | --- | --- |
| Python | core-py | Primitives, blueprints (PlanExecute, DesignExecute, GoalSeeking), multi-provider LLM abstraction |
| TypeScript | core-ts | TypeScript core SDK |
| Go | core-go | Go runtime with AST-based plan execution |
| C# / .NET | core-dotnet | .NET runtime |

Examples & Tools

| Repo | Description |
| --- | --- |
| examples-py | Python examples: RAG, multi-hop QA, deep research, unit converter, date calculator |
| examples-ts | TypeScript examples: RAG agent, date agent, unit converter |
| cli-py | Interactive TUI for discovering and running agents |
| claude-skills | Claude Code skills for scaffolding agents, adding primitives/decompositions/evaluators, and debugging traces |

Benchmarks

| Benchmark | Result | What it shows |
| --- | --- | --- |
| TravelPlanner | 97.9% on 1,000 tasks (GPT-4 alone: 0.6%) | GoalSeeking two-stage. 100% hard-constraint pass rate, 3.1× fewer tokens than LangChain. Blog post |
| MultiHopRAG | 82.9%, +7.9pp over previous best | GoalSeeking over 609 documents, 2,556 queries. Same result in Python, C# (83.8%), and Go (81.6%). Blog post |
| LegalBench | 93.1% across 162 legal reasoning tasks | GoalSeeking agent. 835 items, 0 errors, $1.88 total cost |
| FOLIO | 89.2%, outperforming GPT-4 CoT (78.1%) | PlanExecute + Z3 theorem prover. First-order logic reasoning |

Framework Comparison (TravelPlanner)

Same model (gpt-oss-120b), same tools, same evaluation — only the framework differs:

                Pass Rate        Tokens/Task       Cost/Passing Task    LLM Calls/Task
                ─────────        ───────────       ─────────────────    ──────────────
OpenSymbolicAI  ████████████ 100%  ██░░░░░░░  13,936   █░░░░░░░  $0.013    ██░░░░░░░  2.3
LangChain       █████████░░░ 77.8% █████░░░░  43,801   ████░░░░  $0.051    ████████░  13.5
CrewAI          ████████░░░░ 73.3% █████████  81,331   ████████  $0.100    █████████  39.6

7 models hit 100% pass rate — including Llama 3.3 70B at $0.006/task and 4.3s latency on Groq. The framework matters more than the model. See the full model landscape.

License

MIT
