[codex] Store etcd snapshots in Longhorn PVC #9522

Draft: boxp wants to merge 3 commits into main from feature/etcd-snapshot-longhorn-pvc

boxp commented May 26, 2026

Summary

  • Store Kubernetes upgrade etcd snapshots in a Longhorn PVC exposed by the GitOps-managed etcd-snapshot-store Deployment.
  • Remove the GitHub Actions runner fetch/S3 upload path that was failing on large snapshot transfer.
  • Keep the host-side snapshot for the existing rollback task and prune old PVC snapshots by retention count.
  • Improve Plan Ansible CI failure diagnostics by printing captured stderr and JSON tail, and uploading plan outputs on failure.
  • Add the implementation plan under docs/project_docs/BOXP-2-etcd-snapshot-longhorn-pvc/plan.md.
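The retention-count pruning mentioned above could look roughly like the following. This is a minimal Python sketch for illustration only, not the Ansible tasks in this PR: the function name `prune_snapshots`, the `*.db` file pattern, and the choice of modification time as the ordering key are all assumptions.

```python
from pathlib import Path


def prune_snapshots(directory: Path, keep: int, pattern: str = "*.db") -> list[Path]:
    """Delete all but the `keep` newest snapshot files and return the removed paths.

    Hypothetical helper: `pattern` and the mtime-based ordering are assumptions,
    not values taken from the PR.
    """
    if keep < 1:
        # Refuse to prune everything; at least one snapshot should survive.
        raise ValueError("keep must be at least 1")

    # Newest first; the file name breaks ties between identical mtimes.
    snapshots = sorted(
        (p for p in directory.glob(pattern) if p.is_file()),
        key=lambda p: (p.stat().st_mtime, p.name),
        reverse=True,
    )

    removed = []
    for stale in snapshots[keep:]:
        stale.unlink()
        removed.append(stale)
    return removed
```

Calling `prune_snapshots(Path("/path/to/snapshots"), keep=3)` would keep the three most recently modified matching files and delete the rest. Ordering by a timestamp embedded in the file name would be an equally reasonable choice if the snapshot names carry one.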

Validation

  • git diff --check
  • ansible-playbook --syntax-check -i inventories/production/hosts.yml playbooks/upgrade-k8s.yml in Docker via ghcr.io/astral-sh/uv:python3.14-bookworm
  • ansible-lint playbooks/upgrade-k8s.yml roles/kubernetes_upgrade/tasks/etcd_snapshot.yml in Docker via ghcr.io/astral-sh/uv:python3.14-bookworm
  • molecule test for roles/kubernetes_upgrade in Docker with Docker CLI installed in the validation container
  • codex review --uncommitted

Notes

The old S3 Terraform resources are intentionally left managed for now so Terraform does not try to delete a potentially non-empty bucket during this migration.

This PR expects the GitOps-managed storage app from boxp/lolice#587 to be applied before running the Kubernetes upgrade pre-check.

boxp force-pushed the feature/etcd-snapshot-longhorn-pvc branch from 72d4404 to 7c75360 on May 26, 2026 12:24

github-actions Bot commented Jun 1, 2026

Ansible Plan Results

Mode: --check --diff (dry run)

⚠️ Changes detected


shanghai-1: control-plane

| Host | OK | Changed | Skipped | Failed | Unreachable |
|------|----|---------|---------|--------|-------------|
| shanghai-1 | 73 | 1 | 12 | 0 | 0 |

1 changed

Changed Tasks (1)
| # | Task | Module |
|---|------|--------|
| 1 | user_management : Update package cache | unknown |

shanghai-1: node-shanghai-1

| Host | OK | Changed | Skipped | Failed | Unreachable |
|------|----|---------|---------|--------|-------------|
| shanghai-1 | 9 | 0 | 0 | 0 | 0 |

No changes

shanghai-2: control-plane

| Host | OK | Changed | Skipped | Failed | Unreachable |
|------|----|---------|---------|--------|-------------|
| shanghai-2 | 73 | 1 | 12 | 0 | 0 |

1 changed

Changed Tasks (1)
| # | Task | Module |
|---|------|--------|
| 1 | user_management : Update package cache | unknown |

shanghai-2: node-shanghai-2

| Host | OK | Changed | Skipped | Failed | Unreachable |
|------|----|---------|---------|--------|-------------|
| shanghai-2 | 9 | 0 | 0 | 0 | 0 |

No changes

shanghai-3: control-plane

| Host | OK | Changed | Skipped | Failed | Unreachable |
|------|----|---------|---------|--------|-------------|
| shanghai-3 | 73 | 1 | 12 | 0 | 0 |

1 changed

Changed Tasks (1)
| # | Task | Module |
|---|------|--------|
| 1 | user_management : Update package cache | unknown |

shanghai-3: node-shanghai-3

| Host | OK | Changed | Skipped | Failed | Unreachable |
|------|----|---------|---------|--------|-------------|
| shanghai-3 | 9 | 0 | 0 | 0 | 0 |

No changes


Plan executed on all nodes in parallel.
