
docs(adr): oabctl Kubernetes backend #1181

Open
chaodu-agent wants to merge 2 commits into main from docs/adr-oabctl-k8s-backend


@chaodu-agent chaodu-agent commented Jun 23, 2026

Collaborator

What problem does this solve?

OpenAB has two parallel deployment paths that do not share tooling or schema: Helm charts for Kubernetes, and oabctl (+ S3 control plane) for ECS Fargate. We want one tool and one spec to deploy to both runtimes, and for oabctl to eventually replace the Helm chart as the recommended K8s path.

This ADR turns the ECS Control Plane ADR's multi-runtime intent (§4 + Phase 3) into a concrete design for an oabctl Kubernetes backend.

Discord Discussion URL: discussed in the maintainer Discord thread with @pahud.hsieh (discussion check is skipped for the maintainer bot account; no public URL minted).

At a Glance

                  ┌──────────────────────────┐
                  │   oab.dev/v1 OABService   │  one spec, platform-agnostic core
                  │   + platform.{ecs,k8s}    │
                  └─────────────┬─────────────┘
                                │  load + validate(target) + render config.toml
                        ┌───────┴────────┐
              --target ecs            --target k8s
                        ▼                ▼
              ┌──────────────┐   ┌──────────────────┐
              │EcsProvisioner│   │  K8sProvisioner  │
              │ (aws-sdk)    │   │  (kube-rs)       │
              └──────┬───────┘   └─────────┬────────┘
                     ▼                     ▼
            S3 artifact + ECS      Deployment + ConfigMap +
            TaskDef + Service      Secret/ExternalSecret + PVC + SA

Prior Art & Industry Research

Not applicable — this is a docs/ADR-only change (no runtime code). Cross-runtime prior art is captured inline in the ADR's "Alternatives Considered" section (Helm client-render-and-apply model, kubectl apply server-side-apply semantics, and the CRD+operator pattern). Deeper OpenClaw/Hermes research will accompany the implementation PR (Phase K0/K1), not this design doc.

Proposed Solution

Add a Kubernetes backend to oabctl consuming the same oab.dev/v1 OABService manifest as ECS:

  • Platform-agnostic core spec + optional platform.ecs / platform.k8s overlays, with target-aware validation (today's spec is ECS-coupled — validate() hard-requires subnets/securityGroups — so the schema refactor is Step 1, with a backward-compat shim).
  • A Provisioner trait (EcsProvisioner / K8sProvisioner) sharing manifest loading, validation, generation tracking, and render_config_toml().
  • Client-side render & apply first (like Helm / kubectl apply); in-cluster CRD + operator deferred to a later phase.
  • Helm lifecycle parity: apply (install+upgrade), delete (uninstall), plus new template / rollback / history / --set.

Phased plan K0–K4 with a risk register and Helm migration path is in the doc.

Why this approach?

Client-render-first is the honest 1:1 Helm replacement: Helm is also client-render-and-apply, so this requires zero in-cluster install, reuses existing rendering code, and unblocks Helm deprecation fastest. CRD+operator (GitOps/self-healing) is valuable but a much larger lift and is not required for parity, so it is deferred.

Alternatives Considered

Captured in the ADR (§11): keep Helm+oabctl split (two specs), shell out to helm/kubectl (loses typed validation), CRD-first (too large, blocks deprecation), static YAML for kubectl apply (no lifecycle — a downgrade from Helm).

Validation

Docs-only changes:

  • Links are valid (cross-refs to existing ADRs: ecs-control-plane, multi-platform-adapters, unified-binary)
  • Renders correctly in GitHub preview
  • CI PR Discussion URL Check passes

Specifies how oabctl gains a K8s backend using the same oab.dev/v1
OABService spec as ECS, and the path to replacing the Helm chart.

- Platform-agnostic core spec + platform.{ecs,k8s} overlays
- Provisioner trait abstraction (EcsProvisioner / K8sProvisioner)
- Client-side render-and-apply first; CRD+operator deferred
- Lifecycle parity with Helm (install/upgrade/uninstall/rollback/history)
- Phased plan K0-K4, risks, and migration from Helm

@chaodu-agent chaodu-agent requested a review from thepagent as a code owner June 23, 2026 21:04

The first draft was written against a stale feat/unified-binary-workspace
copy of manifest.rs (flat oab.dev/v1 spec). main is already oab.dev/v2 with
a Runtime enum (Ecs|Kubernetes), an existing KubernetesRuntime stub, and the
K8s path explicitly stubbed in apply.rs/create.rs as 'not yet implemented'.

Rewrite to reflect that the schema refactor is already done; the real scope
is implementing the stubbed Runtime::Kubernetes branch. Fold in review
findings: keep runtime enum (target inference, no --target), keep configFrom
+ HashMap secrets (ARN-prefix inference), generation handled internally per
provisioner, hardcode replicas=1, add PVC storage_size, grow Provisioner
trait (status/logs), K8s Secret Injection Contract + --set denylist +
secret-not-in-ConfigMap guard, ESO preflight, operator CI gate. Correct the
false 'Helm uses ExternalSecrets' premise (it uses native Secrets).

@chaodu-agent

Collaborator Author

LGTM ✅ — A correct, well-scoped Proposed ADR for the oabctl Kubernetes backend; every factual claim about main checks out and the security section is thorough.

What This PR Does

Adds a single docs-only ADR (docs/adr/oabctl-k8s-backend.md) that designs how oabctl will implement the currently-stubbed Runtime::Kubernetes path so that one tool + one oab.dev/v2 spec can deploy to both ECS Fargate and Kubernetes, with the goal of eventually replacing the Helm chart as the recommended K8s path.

How It Works

  • Client-side render & apply via kube-rs server-side apply, no in-cluster component in the first milestone (CRD + operator deferred to an optional later phase).
  • Keeps the existing Runtime enum (Ecs | Kubernetes) rather than flattening to platform.* overlays — the enum already enforces mutual exclusivity and lets the target be inferred from the manifest (no --target flag), which removes most ECS-regression risk.
  • A Provisioner trait unifies apply/delete/get (+ status/logs) across both backends, sharing manifest load, validation, and configFrom resolution.
  • Extends KubernetesRuntime additively (storage class/size, imagePullSecrets, secretBackend, optional Service) and maps the core spec (resources, configFrom, secrets) to Deployment + ConfigMap/initContainer + PVC + Secret/ExternalSecret + SA.
  • Phased plan K1–K4, Helm verb-parity table, risk register, alternatives, and open questions are all included.
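The trait-plus-enum shape described above can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the ADR's actual code: the `Outcome` type, the `provisioner_for` helper, and the method signatures are hypothetical, the runtime variants have their fields omitted, and the real backends would call aws-sdk and kube-rs instead of returning a record of what was requested.

```rust
/// Simplified stand-in for the manifest's runtime enum (variant fields omitted).
#[derive(Debug, Clone, PartialEq)]
pub enum Runtime {
    Ecs,
    Kubernetes,
}

/// Hypothetical result type: records which backend handled which action.
#[derive(Debug, PartialEq)]
pub struct Outcome {
    pub backend: &'static str,
    pub action: &'static str,
    pub service: String,
}

/// Assumed shape of the shared trait: apply/delete/get on every backend.
pub trait Provisioner {
    fn name(&self) -> &'static str;
    fn apply(&self, service: &str) -> Result<Outcome, String>;
    fn delete(&self, service: &str) -> Result<Outcome, String>;
    fn get(&self, service: &str) -> Result<Outcome, String>;
}

pub struct EcsProvisioner;
pub struct K8sProvisioner;

// Helper so both placeholder backends build their outcome the same way.
fn outcome(backend: &'static str, action: &'static str, service: &str) -> Result<Outcome, String> {
    Ok(Outcome { backend, action, service: service.to_string() })
}

impl Provisioner for EcsProvisioner {
    fn name(&self) -> &'static str { "ecs" }
    fn apply(&self, s: &str) -> Result<Outcome, String> { outcome("ecs", "apply", s) }
    fn delete(&self, s: &str) -> Result<Outcome, String> { outcome("ecs", "delete", s) }
    fn get(&self, s: &str) -> Result<Outcome, String> { outcome("ecs", "get", s) }
}

impl Provisioner for K8sProvisioner {
    fn name(&self) -> &'static str { "kubernetes" }
    fn apply(&self, s: &str) -> Result<Outcome, String> { outcome("kubernetes", "apply", s) }
    fn delete(&self, s: &str) -> Result<Outcome, String> { outcome("kubernetes", "delete", s) }
    fn get(&self, s: &str) -> Result<Outcome, String> { outcome("kubernetes", "get", s) }
}

/// The backend is chosen from the manifest's runtime, so no --target flag is needed.
pub fn provisioner_for(runtime: &Runtime) -> Box<dyn Provisioner> {
    match runtime {
        Runtime::Ecs => Box::new(EcsProvisioner),
        Runtime::Kubernetes => Box::new(K8sProvisioner),
    }
}
```

Calling `provisioner_for(&Runtime::Kubernetes).apply("demo")` in this sketch returns an `Outcome` tagged `kubernetes`; because the enum is matched exhaustively, adding a third runtime would fail to compile until a backend is wired in.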

Findings

| # | Severity | Finding | Location |
|---|----------|---------|----------|
| 1 | 🟢 | All claims about current main verified accurate | §0, §2 |
| 2 | 🟢 | Strong, specific security section (secrets never in ConfigMap, --set denylist, ESO preflight, RBAC, rotation) | §7 |
| 3 | 🟢 | Honest "Correction note" documenting and fixing a prior stale-schema draft | §0 |
| 4 | 🟢 | Sound design call to keep the Runtime enum + configFrom/secrets as-is, with rationale | §3, §13 |

Finding Details

🟢 F1: Claims verified against main

I cross-checked the ADR's factual assertions against operator/src/manifest.rs, operator/src/apply.rs, and operator/src/create.rs on main:

  • apiVersion: oab.dev/v2 — confirmed (validate() bails on anything else).
  • Runtime enum with Ecs(EcsRuntime) / Kubernetes(KubernetesRuntime) — confirmed.
  • KubernetesRuntime exists with node_selector / service_account / tolerations — confirmed.
  • secrets: HashMap<String, String> and config_from reference — confirmed.
  • The cited stubs (apply.rs:55-57, create.rs:56-57) bailing "Kubernetes runtime not yet implemented" — confirmed at those exact lines.
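For readers without the source open, the verified shapes can be pictured with a rough std-only sketch. It is an approximation under assumptions, not a copy of operator/src/manifest.rs: the real types presumably use serde and structured tolerations, `EcsRuntime` fields are omitted, the `String` type for `config_from` is a guess, and the `target()` helper and error text are hypothetical.

```rust
use std::collections::HashMap;

/// Fields omitted; the real ECS runtime carries its own settings.
#[derive(Debug, Default)]
pub struct EcsRuntime {}

/// The three fields the review confirms exist today (types simplified).
#[derive(Debug, Default)]
pub struct KubernetesRuntime {
    pub node_selector: HashMap<String, String>,
    pub service_account: Option<String>,
    pub tolerations: Vec<String>, // assumption: real tolerations are structured
}

/// Exactly one runtime per manifest; the enum makes mixing them unrepresentable.
#[derive(Debug)]
pub enum Runtime {
    Ecs(EcsRuntime),
    Kubernetes(KubernetesRuntime),
}

#[derive(Debug)]
pub struct Manifest {
    pub api_version: String,
    pub runtime: Runtime,
    pub secrets: HashMap<String, String>,
    pub config_from: String, // assumption: a plain reference string
}

impl Manifest {
    /// Rejects any apiVersion other than oab.dev/v2.
    pub fn validate(&self) -> Result<(), String> {
        if self.api_version != "oab.dev/v2" {
            return Err(format!("unsupported apiVersion: {}", self.api_version));
        }
        Ok(())
    }

    /// Hypothetical helper: the deployment target falls out of the runtime variant.
    pub fn target(&self) -> &'static str {
        match self.runtime {
            Runtime::Ecs(_) => "ecs",
            Runtime::Kubernetes(_) => "kubernetes",
        }
    }
}
```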

This level of accuracy is rare for a design doc and makes the proposed scope trustworthy.

🟢 F2: Security section is concrete, not hand-wavy

§7 lays out hard requirements: config/ConfigMap must never carry secret values (parity with the chart's guard), a --set denylist for secret-value paths + redacted last-applied manifest, an apply-time preflight for ESO CRDs (avoids silent ExternalSecret failure), explicit IRSA/Pod-Identity IAM needs, minimum kube RBAC verbs, and a rotation→rollout story. external is the default and native is correctly flagged as a discouraged plaintext relay.
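Two of these guards, the --set denylist and the secret-not-in-ConfigMap check, are simple enough to sketch. The following is an illustrative approximation only: the function names, the `secrets.` denylist prefix, and the error messages are assumptions made for the example and are not taken from the ADR.

```rust
/// Hypothetical denylist: path prefixes that --set must never target.
const SET_DENYLIST: &[&str] = &["secrets."];

/// Parses one `--set path=value` argument and rejects secret-value paths.
/// The error text deliberately omits the value so it cannot leak into logs.
pub fn parse_set(arg: &str) -> Result<(String, String), String> {
    let (path, value) = arg
        .split_once('=')
        .ok_or_else(|| "expected --set path=value".to_string())?;
    if path.is_empty() {
        return Err("empty path in --set".to_string());
    }
    if path == "secrets" || SET_DENYLIST.iter().any(|p| path.starts_with(p)) {
        return Err(format!("--set may not target secret path '{path}'"));
    }
    Ok((path.to_string(), value.to_string()))
}

/// Refuses to proceed if any known secret value appears in the rendered config
/// that is about to be written into a ConfigMap.
pub fn guard_config(rendered_config: &str, secret_values: &[&str]) -> Result<(), String> {
    for value in secret_values {
        if !value.is_empty() && rendered_config.contains(value) {
            return Err("rendered config contains a secret value; refusing to write ConfigMap".to_string());
        }
    }
    Ok(())
}
```

In this sketch `parse_set("secrets.API_TOKEN=abc")` is rejected while `parse_set("image=repo/app:1")` passes. A substring check like `guard_config` is a last line of defence only; it cannot catch secrets that were transformed or encoded before rendering.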

🟢 F3: Self-correcting "Correction note"

§0 openly documents that an earlier draft was written against a stale feat/unified-binary-workspace schema and corrects every divergence (including retracting a wrong "Helm already uses ExternalSecrets" claim in §2). This prevents the doc from misleading implementers.

🟢 F4: Enum-over-overlay and reference-over-inline are the right calls

Keeping the Runtime enum (vs flattening) and configFrom/secrets: HashMap (vs inline typed config) are both justified in §13 with the tradeoffs spelled out, and they avoid touching the already-implemented ECS path.

Non-blocking observations (not findings)

  • The Open Questions (§14: ESO hard-dependency, config-delivery default, gateway porting, CRD timing) are appropriately left open for a Proposed ADR; they are decisions to make at K1, not gaps.
  • Resource mapping examples are correct (ECS cpu: "512" = 0.5 vCPU → 500m; memory: "1024" MiB → 1Gi).
  • Citing exact source line numbers in an ADR is fragile across future refactors, but they are accurate today.
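The resource-mapping arithmetic in the second observation can be checked with a small sketch. It assumes ECS expresses CPU in units where 1024 equals one vCPU and memory in MiB; the function names are hypothetical and the ADR may round or format quantities differently.

```rust
/// Converts ECS CPU units (1024 = 1 vCPU) to a Kubernetes millicore quantity.
pub fn ecs_cpu_to_k8s(cpu_units: &str) -> Result<String, String> {
    let units: u64 = cpu_units
        .trim()
        .parse()
        .map_err(|e| format!("invalid cpu '{cpu_units}': {e}"))?;
    // Integer division truncates; the common ECS sizes (256, 512, 1024, ...) divide evenly.
    Ok(format!("{}m", units * 1000 / 1024))
}

/// Converts an ECS memory value in MiB to a Kubernetes quantity,
/// using Gi when the value is a whole number of GiB and Mi otherwise.
pub fn ecs_memory_to_k8s(mib: &str) -> Result<String, String> {
    let value: u64 = mib
        .trim()
        .parse()
        .map_err(|e| format!("invalid memory '{mib}': {e}"))?;
    if value > 0 && value % 1024 == 0 {
        Ok(format!("{}Gi", value / 1024))
    } else {
        Ok(format!("{value}Mi"))
    }
}
```

With these definitions, `ecs_cpu_to_k8s("512")` yields `500m` (512 × 1000 / 1024) and `ecs_memory_to_k8s("1024")` yields `1Gi`, matching the examples cited in the review.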

Baseline Check

  • PR opened: 2026-06-23, docs-only, 1 file, +404/-0.
  • Main already has: the oab.dev/v2 schema, Runtime enum, minimal KubernetesRuntime, and ECS-only apply/get/delete; the K8s path is stubbed.
  • Net-new value: a concrete, verified design for implementing that stub — turning the ECS Control Plane ADR's multi-runtime intent into an actionable K1–K4 plan. No existing ADR covers the K8s backend.
  • CI: check passing; this is a docs-only change so no operator CI runs (the ADR itself notes this in §11).
