A systems-design-focused AI orchestration framework that simulates an operating system for AI agents. Users submit tasks; a kernel orchestrates execution; specialized agents perform subtasks; memory is managed centrally; outputs are evaluated and refined.
This is not a chatbot — it is an explainable, traceable multi-agent execution platform with MCP-style tool routing.
- Demonstrate AI OS concepts: kernel, scheduling, processes, memory, IPC
- Provide modular, extensible agent architecture
- Enable full execution traces for observability
- Support self-reflective reasoning via critic + refinement loops
- Ship an MVP that runs locally with no external APIs
┌─────────────────────────────────────────────────────────────────┐
│                            User Task                            │
└────────────────────────────┬────────────────────────────────────┘
                             ▼
┌─────────────────────────────────────────────────────────────────┐
│                           MCP Kernel                            │
│    Context Manager │ Process Registry │ Tool Router │ Traces    │
└────────────────────────────┬────────────────────────────────────┘
                             ▼
        ┌────────────────────┼────────────────────┐
        ▼                    ▼                    ▼
┌───────────────┐   ┌─────────────────┐   ┌──────────────────┐
│   Scheduler   │   │ Process Manager │   │  Memory Manager  │
│     Agent     │   │      Agent      │   │      Agent       │
└───────┬───────┘   └────────┬────────┘   └────────┬─────────┘
        │                    │                     │
        └────────────────────┼─────────────────────┘
                             ▼
              ┌──────────────────────────────┐
              │        Worker Agents         │
              │ Research │ Analysis │ Summary│
              └──────────────┬───────────────┘
                             ▼
              ┌──────────────────────────────┐
              │         Critic Agent         │
              │      (quality scoring)       │
              └──────────────┬───────────────┘
                             ▼
              ┌──────────────────────────────┐
              │  Refinement Loop (optional)  │
              └──────────────┬───────────────┘
                             ▼
              ┌──────────────────────────────┐
              │       Validator Agent        │
              └──────────────┬───────────────┘
                             ▼
              ┌──────────────────────────────┐
              │    Final Validated Output    │
              └──────────────────────────────┘
| Stage | Component | Responsibility |
|---|---|---|
| 1 | Task Parser | Extract intent, entities, phases |
| 2 | MCP Kernel | Register task, route tools, log traces |
| 3 | Scheduler | Prioritize and order execution phases |
| 4 | Process Manager | Split into sub-processes with dependencies |
| 5 | Memory Manager | Load context, persist outputs |
| 6 | Workers | Execute research, analysis, summarization |
| 7 | Critic | Score completeness, reasoning, structure, evidence |
| 8 | Refinement | Re-run weak phases (max 2 rounds) |
| 9 | Validator | Final structure and consistency checks |
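
Stages 7 and 8 can be read as a bounded loop: the critic scores the draft, and weak output is re-run at most twice. The following is a minimal sketch of that control flow, not the project's implementation. The `refine` function, the `critic` and `rerun` callables, and the 0.8 threshold are all hypothetical; only the two-round cap comes from the table above.

```python
from typing import Callable

MAX_REFINEMENT_ROUNDS = 2  # from the pipeline table: "max 2 rounds"
SCORE_THRESHOLD = 0.8      # assumption: the real cutoff is not documented here


def refine(
    draft: str,
    critic: Callable[[str], float],
    rerun: Callable[[str], str],
) -> tuple[str, float, int]:
    """Re-run the draft until the critic is satisfied or rounds run out.

    Returns (final_draft, final_score, rounds_used).
    """
    score = critic(draft)
    rounds = 0
    while score < SCORE_THRESHOLD and rounds < MAX_REFINEMENT_ROUNDS:
        draft = rerun(draft)
        score = critic(draft)
        rounds += 1
    return draft, score, rounds


# Toy stand-ins: the critic rewards length, and each re-run appends detail.
def toy_critic(text: str) -> float:
    return min(1.0, len(text) / 40)


def toy_rerun(text: str) -> str:
    return text + " plus more evidence"


final_draft, final_score, rounds_used = refine("short draft", toy_critic, toy_rerun)
```

Capping the loop keeps the pipeline's runtime predictable even when the critic never reaches the threshold; in that case the last draft is passed on to validation as-is.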
The kernel acts as the AI operating system core:
- Context registration: per-task execution state
- Process registry: track sub-process lifecycle
- Execution router: MCP-style tool dispatch (`parse_task`, `search`, `summarize`)
- Shared memory: short-term (session) and long-term (knowledge)
- Event logging: `ExecutionTrace` for every kernel event
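
To make the router and event-logging responsibilities concrete, here is a minimal sketch of name-based tool dispatch that records a trace entry per event. `MiniKernel`, `register_tool`, `dispatch`, and the fields on `ExecutionTrace` are hypothetical names chosen for illustration; the real kernel's interfaces may look different.

```python
from dataclasses import dataclass, field
from typing import Any, Callable


@dataclass
class ExecutionTrace:
    """One kernel event. The field names here are assumptions."""
    event: str
    detail: dict[str, Any]


@dataclass
class MiniKernel:
    """Hypothetical stand-in for the kernel: a tool registry plus a trace log."""
    tools: dict[str, Callable[..., Any]] = field(default_factory=dict)
    traces: list[ExecutionTrace] = field(default_factory=list)

    def register_tool(self, name: str, fn: Callable[..., Any]) -> None:
        self.tools[name] = fn
        self.traces.append(ExecutionTrace("tool_registered", {"tool": name}))

    def dispatch(self, name: str, **kwargs: Any) -> Any:
        """Route a call to the named tool and record the event."""
        if name not in self.tools:
            self.traces.append(ExecutionTrace("tool_missing", {"tool": name}))
            raise KeyError(f"unknown tool: {name}")
        result = self.tools[name](**kwargs)
        self.traces.append(
            ExecutionTrace("tool_called", {"tool": name, "args": kwargs})
        )
        return result


kernel = MiniKernel()
# A trivial summarizer used only for this demo: keep the first 20 characters.
kernel.register_tool("summarize", lambda text: text[:20])
summary = kernel.dispatch(
    "summarize",
    text="Research NVIDIA earnings and summarize key AI market risks.",
)
```

Routing every call through one `dispatch` method is what makes the trace complete: no tool can run without leaving an entry behind.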
| Type | Module | Purpose |
|---|---|---|
| Short-term | `memory/short_term_memory.py` | Active context, worker outputs |
| Long-term | `memory/long_term_memory.py` | Completed tasks, reusable summaries |
API (via `MemoryManagerAgent`):
- `save_memory(key, value, scope=...)`
- `retrieve_memory(key, scope=...)`
- `clear_memory(scope=...)`
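
The three calls listed above could be backed by something as small as one dictionary per scope. The sketch below assumes the scope values are the strings `"short_term"` and `"long_term"` and that a missing key returns `None`; neither detail is confirmed by this README, and `MemoryManagerSketch` is a hypothetical class, not the project's `MemoryManagerAgent`.

```python
from typing import Any


class MemoryManagerSketch:
    """Hypothetical two-scope store exposing the three calls listed above."""

    # Assumption: the real scope names are not shown in this README.
    SCOPES = ("short_term", "long_term")

    def __init__(self) -> None:
        self._stores: dict[str, dict[str, Any]] = {
            scope: {} for scope in self.SCOPES
        }

    def _store_for(self, scope: str) -> dict[str, Any]:
        if scope not in self._stores:
            raise ValueError(f"unknown scope: {scope!r}")
        return self._stores[scope]

    def save_memory(self, key: str, value: Any, scope: str = "short_term") -> None:
        self._store_for(scope)[key] = value

    def retrieve_memory(self, key: str, scope: str = "short_term") -> Any:
        # Returns None when the key is absent (a choice made for this sketch).
        return self._store_for(scope).get(key)

    def clear_memory(self, scope: str = "short_term") -> None:
        self._store_for(scope).clear()


memory = MemoryManagerSketch()
memory.save_memory("research_output", "draft notes", scope="short_term")
memory.save_memory("task_summary", "reusable summary", scope="long_term")
memory.clear_memory(scope="short_term")  # drop session state, keep knowledge
```

Keeping the scopes in separate stores means a session can be cleared without touching the summaries that later tasks may reuse.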
agentic_ai_operating_system/
├── agents/ # Scheduler, workers, critic, validator, orchestrator
├── kernel/ # MCP kernel, context, router
├── memory/ # Short/long-term stores
├── tools/ # Task parser, search, summarizer
├── tasks/ # Sample user tasks
├── logs/ # Runtime logs
├── models.py # Pydantic domain models
├── utils.py # Logging and formatting
└── main.py # CLI entry point
Requires uv and Python 3.11+.
cd agentic_ai_operating_system
uv sync
uv run python main.py

Run a custom task:

uv run python main.py "Research NVIDIA earnings and summarize key AI market risks."

==================================================
AI OPERATING SYSTEM TRACE
==================================================
[TASK]
User Request: Research NVIDIA earnings and summarize key AI market risks.
[SCHEDULER]
Execution Order: ['research', 'analyze', 'summarize', 'critique', 'validate']
[PROCESS MANAGER]
Created Processes: [ Research NVIDIA, Analyze findings, Summarize results, ... ]
[MEMORY]
Loaded Context: { prior_knowledge_count, short_term_keys, ... }
[WORKER OUTPUT]
[ research_worker, analysis_worker, summary_worker outputs ... ]
[CRITIC]
Score: 0.82
Feedback: [ ... ]
[VALIDATOR]
Validation Passed
==================================================
FINAL OUTPUT
==================================================
{
"summary": "...",
"confidence": 0.92,
"execution_steps": [...],
"memory_used": [...],
"critic_score": 0.90
}
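
The `Execution Order` shown in the trace is consistent with each phase depending on the one before it. As an illustration only, an order like that can be derived from declared dependencies with the standard-library `graphlib` module. The dependency map below is an assumption, and the project's scheduler may compute its order differently.

```python
from graphlib import TopologicalSorter

# Each phase maps to the set of phases it depends on.
# Assumption: a simple linear chain matching the trace above.
dependencies = {
    "research": set(),
    "analyze": {"research"},
    "summarize": {"analyze"},
    "critique": {"summarize"},
    "validate": {"critique"},
}

# static_order() yields every phase after all of its dependencies.
execution_order = list(TopologicalSorter(dependencies).static_order())
```

With a richer dependency map (for example, two research phases feeding one analysis phase), the same call still produces a valid order, and `TopologicalSorter` raises `CycleError` if the phases depend on each other in a loop.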
- Modularity: agents and tools are independently replaceable
- Orchestration: a single orchestrator coordinates pipeline stages
- Explainability: structured traces at every step
- Traceability: kernel `ExecutionTrace` + file logs
- Extensibility: add workers, tools, or persistence without rewriting the kernel
- Real MCP server transport (stdio/HTTP) for tool routing
- LLM-backed workers with structured output schemas
- Persistent long-term memory (vector DB / SQLite)
- Parallel worker execution via async process pool
- Priority queues and preemption in scheduler
- Distributed agent runtime and health monitoring
- Web dashboard for live trace visualization
- Pluggable critic models and human-in-the-loop validation
MIT — use freely for learning and extension.