OpenOS

An operating environment for AI agents — API-first, cloud-native, and built for isolation, scheduling, and observability.

This repository contains the Agent OS design documents and implementation workstream. For deeper technical detail, see agent-os/architecture/ and agent-os/technical-design/.

Design philosophy

Agent OS is guided by principles that distinguish an agent-native platform from a traditional user-centric stack:

API-first — Capabilities are exposed through APIs (REST, gRPC, WebSocket); there is no dependency on a GUI for core operations.
Agent-centric — Scheduling, lifecycle, and resources are modeled around agents, not around interactive sessions.
Security by design — Multi-tenant boundaries, least privilege, and strong isolation (containers/sandboxes) are first-class concerns.
Cloud-native — Microservice-friendly boundaries, containerized workloads, and elastic scaling.
Observability — Metrics, traces, and structured events are required for operating agents at scale.

Product stance (from the product definition): agents run continuously, need strict isolation, and collaborate through APIs and messaging rather than GUIs. The system favors declarative configuration, event-driven integration, and safe defaults.

Governance stance (from architecture docs): capabilities are tracked under a single source of truth using three lenses — Target (long-term), IterationScope (committed for the current iteration), and Implemented (verified in the repo). Items only move to Implemented when acceptance criteria and evidence links are satisfied — not when documentation alone exists.

Data philosophy: a thin data layer (internal/database + internal/data/*) keeps repositories, transactions (Unit of Work), outbox, schema compatibility, and migrations explicit without introducing a heavyweight standalone “data microservice” in early phases.

Technical architecture (summary)

The high-level stack is organized in layers (see agent-os/architecture/system-architecture.md and agent-os/architecture/tech-architecture.md):

Layer	Role (illustrative)
Interface	API gateway, CLI, dashboard — REST/OpenAPI, gRPC, WebSocket
Orchestration	Workflow engine, service discovery, event bus
Agent	Agent runtime, lifecycle manager, registry
Platform	Resource scheduler, security enforcement, networking
Core services	Storage, messaging, monitoring

Interface contracts (v1 baseline) include a unified error model, idempotency (Idempotency-Key on create-style APIs), path versioning (e.g. /api/v1/...), explicit RBAC + tenant requirements per API, and event payload minimum fields (event_id, event_type, schema_version, occurred_at, trace_id, agent_id, tenant_id). Cross-protocol rules are summarized in agent-os/architecture/api-compatibility-matrix.md.

Orchestration is specified with a minimal control-plane state machine (Created → Scheduled → Starting → Ready / Failed, with recovery and compensation) and a consistency / outbox state machine (Persisted → OutboxWritten → Published → Acknowledged, with replay and dead-letter paths). The two must align: e.g. control flow should not reach Ready unless the consistency stream has at least reached a Published-class state where required.

Thin data layer (see agent-os/architecture/data-layer-blueprint.md): repositories hide SQL from upper layers; Unit of Work bundles business writes, outbox writes, and audit writes; OutboxPublisher handles delivery semantics only; SchemaRegistry owns schema_version and compatibility checks.

MVP / current implementation note: the long-term interface story includes Envoy and a rich gateway; the near-term path documented in-repo uses an in-process HTTP server in Go (internal/server) for health, metrics, and baseline routing, with room to evolve toward a sidecar or standalone gateway.

Messaging ADR: NATS-first messaging is recorded under agent-os/architecture/adr/.

Stack highlights (from architecture docs): Go for the core, Kubernetes-oriented scheduling and platform concepts, PostgreSQL / Redis / S3-compatible storage patterns, NATS/Kafka-compatible messaging, Prometheus-compatible metrics.

Validation, SLO targets & release posture

This project does not define a financial “backtesting” workflow. Instead, reliability and quality are framed as measurable SLIs/SLOs, CI data gates, and release gates so that behavior can be validated before and after rollout.

Key SLI / SLO targets (documented baselines)

Area	SLI	Target	Window
Agent start	Success rate	≥ 99.0%	7-day rolling
Agent start	P95 latency	≤ 5s	24h
Control-plane API	Error rate	≤ 1.0%	24h
Outbox delivery	Success rate	≥ 99.5%	24h
Event consumption	ACK latency P95	≤ 2s	24h

Error budget rules tie SLO breaches to release policy (freeze features, gate releases, or allow only fixes/rollbacks). Details: agent-os/architecture/slo-release-gate.md.

CI data gates (automation targets)

Gate-1–4 (migrations, schema compat, outbox/idempotency, idempotency key policy) are intended to block merges when violated; Gate-5–6 (observability fields, evidence for high-risk items) block release until satisfied. See agent-os/architecture/ci-data-gates.md.

Product-level success metrics (aspirational)

The product summary cites targets such as API P95 < 50ms, availability ≥ 99.9%, resource utilization ≥ 75%, and 1000+ concurrent agents — as north-star engineering goals, not a guarantee of current measurements. See agent-os/summary-overview.md.

Project status & what is delivered

Versioning

The implementation build is aligned with v0.1.0 (see VERSION in agent-os/implementation/Makefile).

What v0.1.0 represents

Kernel-oriented foundation: runtime interfaces, substantial containerd integration work, partial gVisor integration, and shared lifecycle / types / security modeling in the runtime area.
Framework & operations baseline: configuration system with tests, health checks with tests, HTTP server skeleton (internal/server) with health/metrics-style endpoints, scheduler interfaces and scaffolding (algorithms and full policy still evolving), basic monitoring hooks.
Architecture & governance artifacts: layered architecture, dual state machines, thin data layer blueprint, API compatibility matrix, ADRs (e.g. API gateway MVP path, NATS-first messaging), SLO/release and CI data gate specifications, and capability tracking (Target / IterationScope / Implemented).

MVP scope (planning)

The MVP timeline was adjusted to a 6-month horizon with phased delivery (technical validation → core components → integration and testing). See agent-os/mvp-adjustment-summary.md.

Honest gap summary (from implementation reviews)

End-to-end workflow engine, service discovery, full gateway business APIs, scheduler algorithms, policy enforcement, persistent data layer, and NATS/Kafka integration are not complete as of the documented reviews; the project is in early implementation with a strong specification and partial runtime/config/server code. Refer to agent-os/current-project-state.md and agent-os/implementation-status-analysis.md.

Documentation map

Topic	Location
System architecture	`agent-os/architecture/system-architecture.md`
Detailed technical architecture	`agent-os/architecture/tech-architecture.md`
Thin data layer	`agent-os/architecture/data-layer-blueprint.md`
Product vision	`agent-os/product-vision.md`
Summary overview	`agent-os/summary-overview.md`

License

GPL-3.0

Name		Name	Last commit message	Last commit date
Latest commit History 3 Commits
api		api
cmd/aos		cmd/aos
configs		configs
internal		internal
pkg/runtime		pkg/runtime
scripts		scripts
.gitignore		.gitignore
.golangci.yml		.golangci.yml
Dockerfile		Dockerfile
Makefile		Makefile
README.md		README.md
go.mod		go.mod

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Repository files navigation

OpenOS

Design philosophy

Technical architecture (summary)

Validation, SLO targets & release posture

Key SLI / SLO targets (documented baselines)

CI data gates (automation targets)

Product-level success metrics (aspirational)

Project status & what is delivered

Versioning

What v0.1.0 represents

MVP scope (planning)

Honest gap summary (from implementation reviews)

Documentation map

License

About

Uh oh!

Releases

Packages

Uh oh!

Contributors

Uh oh!

Languages

Folders and files

Latest commit

History

Repository files navigation

OpenOS

Design philosophy

Technical architecture (summary)

Validation, SLO targets & release posture

Key SLI / SLO targets (documented baselines)

CI data gates (automation targets)

Product-level success metrics (aspirational)

Project status & what is delivered

Versioning

What v0.1.0 represents

MVP scope (planning)

Honest gap summary (from implementation reviews)

Documentation map

License

About

Resources

Uh oh!

Stars

Watchers

Forks

Releases

Packages 0

Uh oh!

Contributors

Uh oh!

Languages

Packages