32 changes: 0 additions & 32 deletions .github/workflows/tags.yaml

This file was deleted.

6 changes: 5 additions & 1 deletion .gitignore
@@ -3,7 +3,11 @@ error.log
.task
.state
capt/output/
capt/bin/
.vscode/
sushy.cert
sushy.key
htpasswd
htpasswd
.validation-success
capt/e2e/artifacts/*
out
1 change: 1 addition & 0 deletions capt/.env
@@ -0,0 +1 @@
TASK_X_ENV_PRECEDENCE=1
111 changes: 106 additions & 5 deletions capt/README.md
@@ -16,15 +16,22 @@ This playground has only been tested on Ubuntu 22.04 LTS. If you are using a vir

### Binaries

The following must be installed system-wide:

- [libvirtd](https://wiki.debian.org/KVM) (libvirt) >= 8.0.0
- [Docker](https://docs.docker.com/engine/install/) >= 24.0.7
- [Helm](https://helm.sh/docs/intro/install/) >= v3.13.1
- [KinD](https://kind.sigs.k8s.io/docs/user/quick-start/#installation) >= v0.20.0
- [clusterctl](https://cluster-api.sigs.k8s.io/user/quick-start#install-clusterctl) >= v1.6.0
- [kubectl](https://www.downloadkubernetes.com/) >= v1.28.2
- [virt-install](https://virt-manager.org/) >= 4.0.0
- [yq](https://github.com/mikefarah/yq/#install) >= v4.44.2
- [task](https://taskfile.dev/installation/) >= 3.37.2
- `curl`, `tar`, `ssh-keygen` (from `openssh-client`)

The following are downloaded automatically into `./bin/` by `task install-binaries` (invoked as part of `task create-playground`); pinned versions live near the top of [Taskfile.yaml](./Taskfile.yaml):

- `cue` — workload-manifest renderer
- `helm`
- `kind`
- `kubectl`
- `clusterctl`
- `yq`
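
These do not need to be preinstalled. To fetch or refresh them without creating the playground, run the install task directly (the `OS=` override below comes from the Taskfile's own comment):

```bash
# Fetch the pinned tools into ./bin without building the playground.
task install-binaries
# Cross-install for another OS, per the Taskfile comment:
task install-binaries OS=darwin
```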

### Packages

@@ -57,6 +64,29 @@ Delete the CAPT playground:
task delete-playground
```

### External Tinkerbell mode

When `externalTinkerbell: true` is set in [`config.yaml`](./config.yaml), the
playground spins up a **second** KinD cluster (named `<clusterName>-tinkerbell`)
on the same `kind` docker network and deploys the Tinkerbell stack there
instead of into the management cluster. Hardware, Machine (BMC), and Workflow
CRs all live in this second cluster. CAPT, running in the management cluster,
talks to it via the `external-tinkerbell-kubeconfig` Secret in the
`capt-system` namespace (created by
[scripts/create_external_kubeconfig_secret.sh](./scripts/create_external_kubeconfig_secret.sh)
and labeled for `clusterctl move`).
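
To flip the mode on, set the flag in `config.yaml`, for example with `yq` (an illustrative one-liner; editing the file by hand works just as well):

```bash
# Enable external Tinkerbell mode in config.yaml (in-place edit).
yq -i '.externalTinkerbell = true' config.yaml
```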

Two kubeconfigs are produced under `output/`:

- `kind.kubeconfig` — the management cluster (CAPI/CAPT components live here)
- `tinkerbell-kind.kubeconfig` — the Tinkerbell cluster (Hardware/BMC/Workflows live here)
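
Assuming the default `outputDir` of `output/`, you can point `kubectl` at each cluster like so (the `hardware` and `workflows` resource names resolve once the Tinkerbell CRDs are installed):

```bash
# Management cluster: CAPI/CAPT controllers.
KUBECONFIG=output/kind.kubeconfig kubectl get pods -n capt-system

# Tinkerbell cluster: Hardware, Machine (BMC), and Workflow CRs.
KUBECONFIG=output/tinkerbell-kind.kubeconfig kubectl get hardware,workflows -A
```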

Caveat: after `task pivot`, the workload cluster receives the secret via
`clusterctl move`, but it must still be able to reach the Tinkerbell KinD
container's API server at the IP embedded in the kubeconfig — that IP is only
reachable from containers on the host's `kind` docker network, so cross-host
pivots are not supported in this mode.
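
One way to check which containers (and IPs) sit on that network is `docker network inspect`; the format string below is illustrative:

```bash
# List containers attached to the kind network and their IPs.
docker network inspect kind \
  -f '{{range .Containers}}{{.Name}} {{.IPv4Address}}{{println}}{{end}}'
```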

## Next Steps

With the playground up and running and a workload cluster created, you can run through a few CAPI lifecycle operations.
@@ -81,6 +111,77 @@ To be written.

To be written.

## Running E2E Tests

The `e2e/run.sh` script orchestrates a matrix of provisioning combos
(topology × bootmode × mirror) defined in [`e2e/cue/matrix.cue`](e2e/cue/matrix.cue).
Each combo renders its own `config.yaml` from CUE, runs
`task create-playground`, executes Ginkgo specs against the resulting
clusters, then tears the playground down.

List available combos:

```bash
./e2e/run.sh --list
```

Run a single combo:

```bash
./e2e/run.sh single-nomirror-netboot
```

Run the whole matrix (combos that use the registry mirror require
`--mirror-host`):

```bash
./e2e/run.sh --mirror-host reg.example.com all
```

Useful flags:

- `--no-teardown` — keep the playground running after tests so you can
poke at it. **Resources persist; clean up with `./e2e/run.sh --cleanup`.**
- `--dry-run` — render configs and print what would run, but skip
`task` and `ginkgo` invocations.
- `--labels FILTER` — Ginkgo `--label-filter` (default `provisioning`).
- `--artifacts DIR` — where per-combo logs and JUnit reports are written
(default `e2e/artifacts/`).
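
For example, to run one combo, keep the playground up afterwards, and write artifacts to a custom directory (paths are illustrative):

```bash
./e2e/run.sh --no-teardown --artifacts /tmp/capt-e2e single-nomirror-netboot
# ...inspect the clusters, then tear everything down:
./e2e/run.sh --cleanup
```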

The Ginkgo suite under [`e2e/test/`](e2e/test/) can also be run directly
against an already-provisioned playground by exporting the kubeconfig
paths:

```bash
E2E_MGMT_KUBECONFIG="$(yq .kind.kubeconfig .state)" \
E2E_WORKLOAD_KUBECONFIG="$(yq .outputDir .state)/$(yq .clusterName .state).kubeconfig" \
E2E_NAMESPACE="$(yq .namespace .state)" \
ginkgo -v --label-filter=provisioning ./e2e/test/...
```

## How CUE renders the playground

`config.yaml` is the only file most users touch. Everything else
(`.state`, generated CAPI manifests, kind config, hardware/BMC YAML,
`hosts.toml` mirror drop-ins) is derived by CUE packages under
[`cue/`](cue/):

- [`cue/state`](cue/state/state.cue) — reads `config.yaml`, computes
derived names/IPs/MACs, writes `.state` (the source of truth for
every downstream renderer).
- [`cue/values`](cue/values/values.cue) — the `#Config` schema for
`.state`. Inner structs are closed so typos fail `cue vet`.
- [`cue/capi`](cue/capi/render.cue), [`cue/infra`](cue/infra/render.cue),
[`cue/clusterctl`](cue/clusterctl/clusterctl.cue), [`cue/kind`](cue/kind/kind.cue)
— render Kubernetes resources from `.state`.
- [`cue/mirror`](cue/mirror/schema.cue) — optional pull-through OCI
registry mirror. Disabled by default; the wiring sentinel in
[`cue/wiring`](cue/wiring/wiring.cue) ensures the feature can't be
half-removed by accident.

`task generate-state` runs `cue vet` before `cue export`, so schema
errors surface with line numbers before any other task runs.
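
To run the same check by hand (this is the exact `cue vet` invocation the Taskfile uses):

```bash
# Validate config.yaml against the cue/state schema (and, transitively,
# cue/values and cue/mirror); schema errors report line numbers.
cue vet ./cue/state yaml: config.yaml -l 'config:'
```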

## Known Issues

### DNS issue
123 changes: 88 additions & 35 deletions capt/Taskfile.yaml
@@ -1,22 +1,52 @@
version: "3"

# Apply bash-style strict mode to every `cmds:` block in this Taskfile.
# Replaces the per-task `set -euo pipefail` heredoc preamble.
# https://taskfile.dev/docs/reference/schema#set-options
set: [errexit, nounset, pipefail]

includes:
create: ./tasks/Taskfile-create.yaml
delete: ./tasks/Taskfile-delete.yaml
vbmc: ./tasks/Taskfile-vbmc.yaml
capi: ./tasks/Taskfile-capi.yaml
capi-pivot: ./tasks/Taskfile-capi-pivot.yaml
helm: ./tasks/Taskfile-helm.yaml
tools: ./tasks/Taskfile-tools.yaml

env:
# Prepend ./bin to PATH so all pinned tools (yq, helm, kind, kubectl,
# clusterctl, cue) resolve as bare command names — both in `cmds:` and
# in `vars: sh:` blocks. Requires the TASK_X_ENV_PRECEDENCE=1 experiment
# (set in .env) so the Task `env:` value wins over the OS PATH.
PATH: '{{.ROOT_DIR}}/bin:{{env "PATH"}}'

vars:
OUTPUT_DIR:
sh: echo $(yq eval '.outputDir' config.yaml)
CURR_DIR:
sh: pwd
sh: |
awk -F: '/^outputDir:/ {gsub(/^[ "]+|[ "]+$/, "", $2); print $2}' {{.ROOT_DIR}}/config.yaml
STATE_FILE: ".state"
STATE_FILE_FQ_PATH:
sh: echo {{joinPath .CURR_DIR .STATE_FILE}}
sh: echo {{joinPath .ROOT_DIR .STATE_FILE}}
CONFIG_FILE_FQ_PATH:
sh: echo {{joinPath .CURR_DIR "config.yaml"}}
sh: echo {{joinPath .ROOT_DIR "config.yaml"}}
# Pinned tool versions for binaries downloaded into ./bin from each upstream
# project's release artifacts. Bump these to upgrade tooling; the
# install-binaries task is idempotent on the version string.
CUE_VERSION: v0.16.1
HELM_VERSION: v4.1.4
KIND_VERSION: v0.31.0
# clusterctl version pins to the CAPI core release; must be compatible with
# the providers `clusterctl init` will install.
CLUSTERCTL_VERSION: v1.12.5
KUBECTL_VERSION: v1.35.4
YQ_VERSION: v4.53.2
# Host OS/arch used to construct release download URLs. Override on the
# command line if cross-installing (e.g. `task install-binaries OS=darwin`).
OS:
sh: uname -s | tr '[:upper:]' '[:lower:]'
ARCH:
sh: case "$(uname -m)" in x86_64) echo amd64;; aarch64|arm64) echo arm64;; *) uname -m;; esac

tasks:
create-playground:
@@ -25,7 +55,8 @@ tasks:
Create the CAPT playground. Use the config.yaml file to define things like cluster size and Kubernetes version.
cmds:
- task: system-deps-warnings
- task: validate-binaries
- task: tools:validate-binaries
- task: tools:install-binaries
- task: ensure-output-dir
- task: generate-state
- task: create:playground-ordered
@@ -36,29 +67,10 @@
summary: |
Delete the CAPT playground.
cmds:
- task: validate-binaries
- task: tools:validate-binaries
- task: tools:install-binaries
- task: delete:playground

validate-binaries:
silent: true
summary: |
Validate all required dependencies for the CAPT playground.
cmds:
- for:
[
"virsh",
"docker",
"helm",
"kind",
"kubectl",
"clusterctl",
"virt-install",
"yq",
]
cmd: command -v {{ .ITEM }} >/dev/null || echo "'{{ .ITEM }}' was not found in the \$PATH, please ensure it is installed."
# sudo apt install virtinst # for virt-install
# sudo apt install bridge-utils # for brctl

system-deps-warnings:
summary: |
Run CAPT playground system warnings.
@@ -79,17 +91,46 @@
- echo ;[ -d {{.OUTPUT_DIR}} ]
- echo ;[ -d {{.OUTPUT_DIR}}/xdg ]

generate-ssh-key:
summary: |
Generate an ed25519 SSH keypair in {{.OUTPUT_DIR}} for cloud-init
authorized_keys injection.
deps: [ensure-output-dir]
generates:
- "{{.OUTPUT_DIR}}/capt-ssh-key"
- "{{.OUTPUT_DIR}}/capt-ssh-key.pub"
cmds:
- ssh-keygen -t ed25519 -f {{.OUTPUT_DIR}}/capt-ssh-key -N "" -C capt-playground >/dev/null
status:
- test -f {{.OUTPUT_DIR}}/capt-ssh-key
- test -f {{.OUTPUT_DIR}}/capt-ssh-key.pub

generate-state:
summary: |
Populate the state file.
Populate the state file from config.yaml using the cue/state package.
Reads the SSH public key generated by generate-ssh-key.
deps: [generate-ssh-key]
sources:
- config.yaml
- cue/state/*.cue
- cue/values/*.cue
- cue/mirror/*.cue
- "{{.OUTPUT_DIR}}/capt-ssh-key.pub"
generates:
- .state
cmds:
- ./scripts/generate_state.sh config.yaml .state
status:
- if [[ -n "$(yq '.os.sshKey' {{.STATE_FILE_FQ_PATH}})" ]]; then true; else false; fi
# Validate config.yaml against the cue/state schema (and transitively
# cue/values + cue/mirror) before exporting. Surfaces schema errors
# before they appear as confusing yaml-export failures.
# Folded scalar (>-) is required because the bare `yaml:` token would
# otherwise be parsed as a YAML mapping key.
- >-
cue vet ./cue/state yaml: config.yaml -l 'config:'
- >-
cue export ./cue/state yaml: config.yaml -l 'config:'
-t cwd="$PWD"
-t sshPubKey="$(tr -d '\n' < {{.OUTPUT_DIR}}/capt-ssh-key.pub)"
-e out --out yaml > .state

next-steps:
silent: true
@@ -104,29 +145,41 @@
sh: yq eval '.clusterName' {{.STATE_FILE_FQ_PATH}}
KIND_KUBECONFIG:
sh: yq eval '.kind.kubeconfig' {{.STATE_FILE_FQ_PATH}}
EXTERNAL_TINKERBELL:
sh: yq eval '.externalTinkerbell // false' {{.STATE_FILE_FQ_PATH}}
TINK_KUBECONFIG:
sh: yq eval '.kind.tinkerbell.kubeconfig // ""' {{.STATE_FILE_FQ_PATH}}
cmds:
- |
echo
echo The workload cluster is now being created.
echo Once the cluster nodes are up and running, you will need to deploy a CNI for the cluster to be fully functional.
echo The management cluster kubeconfig is located at: {{.KIND_KUBECONFIG}}
{{- if eq .EXTERNAL_TINKERBELL "true" }}
echo The Tinkerbell cluster kubeconfig is located at: {{.TINK_KUBECONFIG}}
echo "NOTE: External Tinkerbell mode is enabled. Tinkerbell resources (Hardware, Workflows, BMC) are in the Tinkerbell cluster."
{{- end }}
echo The workload cluster kubeconfig is located at: {{.OUTPUT_DIR}}/{{.CLUSTER_NAME}}.kubeconfig
echo
echo 1. Watch and wait for the first control plane node to be provisioned successfully: STATE_SUCCESS
{{- if eq .EXTERNAL_TINKERBELL "true" }}
echo "KUBECONFIG={{.TINK_KUBECONFIG}} kubectl get workflows -n {{.NAMESPACE}} -o wide -w"
{{- else }}
echo "KUBECONFIG={{.KIND_KUBECONFIG}} kubectl get workflows -n {{.NAMESPACE}} -o wide -w"
{{- end }}
echo
echo
echo 2. Watch and wait for the Kubernetes API server to be ready and responding:
echo "until KUBECONFIG={{.CURR_DIR}}/{{.OUTPUT_DIR}}/{{.CLUSTER_NAME}}.kubeconfig kubectl get node; do echo 'Waiting for Kube API server to respond...'; sleep 5; done"
echo "until KUBECONFIG={{.ROOT_DIR}}/{{.OUTPUT_DIR}}/{{.CLUSTER_NAME}}.kubeconfig kubectl get node; do echo 'Waiting for Kube API server to respond...'; sleep 5; done"
echo
echo 3. Deploy a CNI
echo Cilium
echo "KUBECONFIG={{.CURR_DIR}}/{{.OUTPUT_DIR}}/{{.CLUSTER_NAME}}.kubeconfig cilium install"
echo "KUBECONFIG={{.ROOT_DIR}}/{{.OUTPUT_DIR}}/{{.CLUSTER_NAME}}.kubeconfig cilium install"
echo or KUBEROUTER
echo "KUBECONFIG={{.CURR_DIR}}/{{.OUTPUT_DIR}}/{{.CLUSTER_NAME}}.kubeconfig kubectl apply -f https://raw.githubusercontent.com/cloudnativelabs/kube-router/master/daemonset/kubeadm-kuberouter.yaml"
echo "KUBECONFIG={{.ROOT_DIR}}/{{.OUTPUT_DIR}}/{{.CLUSTER_NAME}}.kubeconfig kubectl apply -f https://raw.githubusercontent.com/cloudnativelabs/kube-router/master/daemonset/kubeadm-kuberouter.yaml"
echo
echo 4. Watch and wait for all nodes to join the cluster and be ready:
echo "KUBECONFIG={{.CURR_DIR}}/{{.OUTPUT_DIR}}/{{.CLUSTER_NAME}}.kubeconfig kubectl get nodes -w"
echo "KUBECONFIG={{.ROOT_DIR}}/{{.OUTPUT_DIR}}/{{.CLUSTER_NAME}}.kubeconfig kubectl get nodes -w"
- touch {{.OUTPUT_DIR}}/.next-steps-displayed
status:
- echo ;[ -f {{.OUTPUT_DIR}}/.next-steps-displayed ]