Underpass AI

We don't build models. We engineer the infrastructure that makes them useful.

We focus on surgical context, governed execution, and statistical tool selection. Our benchmark data shows that structured explanatory context narrows the accuracy gap between local and frontier models: on synthetic evaluation graphs, an 8B model given well-structured context scores comparably to frontier models on task identification, restart-point selection, and rationale preservation.

We use Claude, OpenAI, Qwen, or whatever model fits the job. We're not competing with model providers — we're building the engineering layer that helps extract more value from any model.

What we build

Context rehydration — A knowledge graph holds the full picture. When an event fires, the Rehydration Kernel traverses only what matters for that agent's role, renders token-counted sections, and delivers a bounded bundle. Typed explanatory relationships preserve why each node exists — rationale, motivation, method, and decision linkage — so agents can diagnose failures, resume interrupted work, and justify decisions from rehydrated context alone. Fewer tokens, better signal.
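The traversal described above can be sketched roughly as follows. This is a minimal illustration, not the Rehydration Kernel's actual implementation (the kernel itself is written in Rust): the `Node` and `Edge` shapes, the `rehydrate` function, the relation names used as filters, and the whitespace-based `count_tokens` stand-in are all assumptions made for the example.

```python
from collections import deque
from dataclasses import dataclass


@dataclass(frozen=True)
class Edge:
    target: str
    relation: str  # hypothetical relation labels, e.g. "rationale", "method"


@dataclass(frozen=True)
class Node:
    id: str
    text: str


def count_tokens(text: str) -> int:
    # Crude stand-in for a real tokenizer: count whitespace-separated words.
    return len(text.split())


def rehydrate(nodes, edges, start, allowed_relations, token_budget):
    """Breadth-first walk from `start`, following only the relation types this
    role cares about, and stopping short of the token budget.

    Returns (bundle, used): a list of (relation, text) sections and the
    number of tokens they consume.
    """
    bundle, used = [], 0
    seen = {start}
    queue = deque([(start, "root")])
    while queue:
        node_id, relation = queue.popleft()
        cost = count_tokens(nodes[node_id].text)
        if used + cost > token_budget:
            # Section does not fit: skip it and do not expand its neighbours.
            continue
        used += cost
        bundle.append((relation, nodes[node_id].text))
        for edge in edges.get(node_id, []):
            if edge.relation in allowed_relations and edge.target not in seen:
                seen.add(edge.target)
                queue.append((edge.target, edge.relation))
    return bundle, used
```

Because each section keeps its relation label, an agent reading the bundle can tell a rationale from a method without seeing the rest of the graph.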

Governed tool execution — Agents don't run loose. The Underpass Runtime provides isolated workspaces with 99 governed tools across 23 families, including filesystem, git, build, test, security, containers, Kubernetes, messaging, and data, all under policy enforcement. Every invocation is tracked and audited, and its telemetry feeds the learning loop. OpenAPI 3.1 and AsyncAPI 3.0 contracts define the API surface.
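As a rough illustration of what "governed" means here, the sketch below wraps every tool call in a policy check and an audit record. It is not the Underpass Runtime's API (the runtime is a Go service): `GovernedRuntime`, `Invocation`, the role-to-tool allowlist, and tool names such as `fs.read` are hypothetical and chosen only for the example.

```python
import time
from dataclasses import dataclass, field
from typing import Any, Callable


@dataclass
class Invocation:
    """One audit record; the field set is an assumption for illustration."""
    role: str
    tool: str
    allowed: bool
    ok: bool
    duration_ms: float


@dataclass
class GovernedRuntime:
    policy: dict[str, set[str]]            # role -> tool names it may call
    tools: dict[str, Callable[..., Any]]   # tool name -> implementation
    audit_log: list[Invocation] = field(default_factory=list)

    def invoke(self, role: str, tool: str, *args: Any) -> Any:
        # Policy check comes first; denied calls are still audited.
        if tool not in self.policy.get(role, set()):
            self.audit_log.append(Invocation(role, tool, False, False, 0.0))
            raise PermissionError(f"role {role!r} may not call {tool!r}")
        started = time.perf_counter()
        ok = False
        try:
            result = self.tools[tool](*args)
            ok = True
            return result
        finally:
            # Record success or failure along with the elapsed time.
            elapsed_ms = (time.perf_counter() - started) * 1000.0
            self.audit_log.append(Invocation(role, tool, True, ok, elapsed_ms))
```

The audit log built this way is the kind of record a selection policy could later consume as success and latency telemetry.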

Statistical tool selection — No LLM call is needed to pick the right tool. Thompson Sampling ranks tools per context by sampling from a posterior over each tool's empirical success rate, subject to hard SLO constraints on latency, error rate, and cost. The telemetry loop closes itself: tools that fail get ranked down, tools that work get ranked up.
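A minimal sketch of this idea, assuming a Beta-Bernoulli model per tool and SLO checks applied as a hard filter before sampling. The names `ToolStats`, `Slo`, `rank_tools`, and `record_outcome` are hypothetical, and the real pipeline may model contexts, priors, and constraints differently.

```python
import random
from dataclasses import dataclass


@dataclass
class ToolStats:
    """Outcome counts and observed SLO metrics for one tool in one context."""
    successes: int = 0
    failures: int = 0
    p95_latency_ms: float = 0.0
    cost_per_call: float = 0.0

    @property
    def error_rate(self) -> float:
        total = self.successes + self.failures
        return self.failures / total if total else 0.0


@dataclass
class Slo:
    max_latency_ms: float
    max_error_rate: float
    max_cost_per_call: float


def meets_slo(stats: ToolStats, slo: Slo) -> bool:
    # Hard constraints: a tool that violates any of them is never ranked.
    return (
        stats.p95_latency_ms <= slo.max_latency_ms
        and stats.error_rate <= slo.max_error_rate
        and stats.cost_per_call <= slo.max_cost_per_call
    )


def rank_tools(stats: dict[str, ToolStats], slo: Slo, rng: random.Random) -> list[str]:
    """Draw one sample per eligible tool from Beta(successes + 1, failures + 1)
    and return tool names ordered by that sample, best first."""
    samples = {
        name: rng.betavariate(s.successes + 1, s.failures + 1)
        for name, s in stats.items()
        if meets_slo(s, slo)
    }
    return sorted(samples, key=samples.get, reverse=True)


def record_outcome(stats: dict[str, ToolStats], name: str, ok: bool) -> None:
    # Closing the loop: each invocation result updates the tool's posterior.
    if ok:
        stats[name].successes += 1
    else:
        stats[name].failures += 1
```

Sampling rather than sorting by the raw success rate keeps some exploration alive: a tool with few observations has a wide posterior and will occasionally be ranked first, which gives it a chance to accumulate evidence.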

Event-driven agents

When something happens, the right agent fires. No polling. No central orchestrator.

NATS event → specific agent activates → kernel delivers surgical context →
  Thompson Sampling selects best tools → runtime executes them under policy →
    telemetry feeds back → policies improve → next event, better decisions

Each agent is a specialist: one for diagnostics, another for repairs, another for strategic decisions. Local models handle routine work on GPU. When a task requires stronger reasoning, the system can escalate to frontier APIs — with bounded context, not sprawling prompts.
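The routing described above might look something like the sketch below. It deliberately avoids any NATS client code and models only the decision: which specialist handles an event, and whether the call goes to a local or a frontier model. The subject names, the `needs_strong_reasoning` flag, and the 4096-token bound are assumptions for illustration; how the real system decides to escalate is not specified here.

```python
from dataclasses import dataclass, field


@dataclass
class Event:
    subject: str                      # hypothetical subject, e.g. "incident.detected"
    payload: dict = field(default_factory=dict)
    needs_strong_reasoning: bool = False  # assumed to be set upstream


# Hypothetical mapping from event subject to the specialist agent it activates.
AGENTS = {
    "incident.detected": "diagnostics",
    "diagnosis.ready": "repair",
    "repair.failed": "strategy",
}


def choose_model(event: Event, context_tokens: int, max_context_tokens: int = 4096) -> str:
    """Pick a model tier; refuse unbounded context instead of sending a sprawling prompt."""
    if context_tokens > max_context_tokens:
        raise ValueError("context bundle exceeds the bound; trim it before routing")
    return "frontier" if event.needs_strong_reasoning else "local"


def dispatch(event: Event, context_tokens: int) -> tuple[str, str]:
    """Return (agent, model tier) for an event, with no central orchestrator state."""
    agent = AGENTS.get(event.subject)
    if agent is None:
        raise LookupError(f"no agent subscribed to {event.subject!r}")
    return agent, choose_model(event, context_tokens)
```

The context bound applies on both paths, so escalation changes which model reasons over the bundle but not how large the bundle is allowed to be.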

Model-agnostic

The platform doesn't care what reasons. Claude, OpenAI, open-weight models via vLLM — swap the model, keep the infrastructure. The value is in context quality and execution governance, not in any specific model.

Repositories

| Layer | Repository | Language | What it does |
| --- | --- | --- | --- |
| Product | `swe-ai-fleet` | Python | Multi-agent SWE platform: planning, deliberation, and execution |
| Execution | `underpass-runtime` | Go | 99 governed tools, isolated workspaces, TLS across 5 transports, Thompson Sampling pipeline, Helm chart with mTLS |
| Context | `rehydration-kernel` | Rust | Graph-native context rehydration with explanatory relationships; 270 unit tests, 9 integration tests, LLM-as-judge benchmark (432 evaluations) |

Production-grade infrastructure

Both core services have been through comprehensive quality audits:

Security: Threat models, trust boundary diagrams, TLS 1.3 on all transports, policy engine with RBAC, audit logging with secret redaction, CodeQL + govulncheck + SonarCloud in CI
Operations: Helm charts with dev/production/mTLS value overrides, HPA, PDB, NetworkPolicy, PrometheusRule with 6 alerts, operational runbooks
Testing: 80% coverage gates, 14 E2E tests as K8s Jobs (smoke/core/full tiers), table-driven unit tests with hand-written fakes
API contracts: OpenAPI 3.1 (HTTP), AsyncAPI 3.0 (NATS events), contract validation in CI
Observability: Domain-layer quality metrics as value objects, Prometheus exposition, OTel tracing, structured logging
CI/CD: Automated dependency updates (Dependabot), release automation (tag-triggered builds + GHCR push), quality gate scripts

Demos

| Repository | What it shows |
| --- | --- |
| `underpass-demo` | Runtime + tool-learning in action (Thompson Sampling, cost benchmarks) |
| `rehydration-starship-demo` | Context rehydration + event-driven agents + model routing |

Architecture decisions

Both services maintain Architecture Decision Records (ADRs) documenting key design choices with explicit trade-offs and alternatives considered. Security models with threat analysis are published in each repository.

Open source

Apache 2.0. We build in the open because this kind of infrastructure should be shared, challenged, and improved by the community. Fork it, break it, make it better.

Status

The core services are deployed and validated through end-to-end tests on live Kubernetes clusters with full TLS. Benchmark results on synthetic graphs are documented in the kernel repo. We're actively hardening for production adoption. Early adopters and contributors welcome.

People

Created by Tirso Garcia · LinkedIn
