This document describes two complementary features for managing the lifecycle of operator-managed resources:
- Tombstoning - Clean deletion of obsolete resources during upgrades
- Root Exclusion - Prevention of resource creation from Day 0
During operator upgrades, resources can become obsolete (removed features, renamed resources, consolidated configurations). Simply removing YAML from the assets directory leaves orphaned objects in the cluster. These features provide explicit, safe mechanisms for lifecycle management.
Tombstoning enables explicit deletion of legacy resources with built-in safety checks. When a feature is removed or a resource is renamed, the old resource is moved to the tombstones directory to be deleted during the next reconciliation.
```
/assets
├── active/          # Current managed resources (Apply - Desired State)
│   ├── descheduler/
│   ├── hco/
│   ├── operators/
│   └── metadata.yaml
└── tombstones/      # Obsolete resources (Delete - Legacy)
    └── v1.1-cleanup/  # Optional organizational subfolders

/test
└── crds/            # CRDs for testing (envtest/Kind) - not deployed by autopilot
```
Tombstone files are minimal YAML manifests with a required safety label:
```yaml
apiVersion: v1
kind: ConfigMap
metadata:
  name: obsolete-tuning-config
  namespace: openshift-cnv
  labels:
    platform.kubevirt.io/managed-by: virt-platform-autopilot # REQUIRED
```

Required fields:
- `apiVersion`
- `kind`
- `metadata.name`
- `metadata.labels["platform.kubevirt.io/managed-by"] = "virt-platform-autopilot"`

Optional field:
- `metadata.namespace` (omit for cluster-scoped resources)
The tombstoning system includes multiple safety checks:
- Label verification: Resources are only deleted if they carry the exact label `platform.kubevirt.io/managed-by=virt-platform-autopilot`
- Load-time validation: Tombstone files are validated when loaded; a missing label causes startup failure
- Best-effort execution: If one tombstone fails to delete, the others are still processed
- Idempotency: Already-deleted resources are silently skipped (no error)
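The load-time check can be sketched as follows. This is an illustrative Go sketch, not the actual `pkg/engine/tombstone.go` code; the `Tombstone` type and `validateTombstone` function are hypothetical names:

```go
package main

import "fmt"

const (
	managedByLabel = "platform.kubevirt.io/managed-by"
	managedByValue = "virt-platform-autopilot"
)

// Tombstone is a hypothetical minimal representation of a tombstone manifest.
type Tombstone struct {
	APIVersion string
	Kind       string
	Name       string
	Labels     map[string]string
}

// validateTombstone enforces the required fields and safety label at load
// time; a failure here is what causes operator startup to fail.
func validateTombstone(t Tombstone) error {
	if t.APIVersion == "" || t.Kind == "" || t.Name == "" {
		return fmt.Errorf("tombstone %q: apiVersion, kind and metadata.name are required", t.Name)
	}
	if t.Labels[managedByLabel] != managedByValue {
		return fmt.Errorf("tombstone %s/%s: missing required label %s=%s",
			t.Kind, t.Name, managedByLabel, managedByValue)
	}
	return nil
}

func main() {
	ok := Tombstone{APIVersion: "v1", Kind: "ConfigMap", Name: "obsolete-tuning-config",
		Labels: map[string]string{managedByLabel: managedByValue}}
	bad := Tombstone{APIVersion: "v1", Kind: "ConfigMap", Name: "no-label"}
	fmt.Println(validateTombstone(ok))  // <nil>
	fmt.Println(validateTombstone(bad)) // error mentioning the missing label
}
```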
Creating a tombstone:
- Identify the obsolete resource (e.g., `assets/active/descheduler/old-config.yaml`)
- Move the file to the tombstones directory:
  ```shell
  git mv assets/active/descheduler/old-config.yaml assets/tombstones/v1.1-cleanup/
  ```
- Verify the file contains the required `platform.kubevirt.io/managed-by` label
- Commit and release
On operator upgrade:
- Operator loads tombstones from `assets/tombstones/`
- For each tombstone:
  - Check if the resource exists in the cluster
  - Verify it has the management label
  - Delete if the label matches (skip if the label is missing or incorrect)
  - Emit events and update metrics
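The per-tombstone decision above can be sketched in Go; `decideAction` is a hypothetical helper, not the operator's actual API:

```go
package main

import "fmt"

const (
	managedByLabel = "platform.kubevirt.io/managed-by"
	managedByValue = "virt-platform-autopilot"
)

// decideAction mirrors the per-tombstone flow: a live object is deleted
// only when it exists and carries the exact management label.
func decideAction(exists bool, liveLabels map[string]string) string {
	switch {
	case !exists:
		return "noop" // idempotent: already deleted, silently skipped
	case liveLabels[managedByLabel] != managedByValue:
		return "skip" // safety check: not managed by autopilot, do not delete
	default:
		return "delete"
	}
}

func main() {
	fmt.Println(decideAction(false, nil))                                             // noop
	fmt.Println(decideAction(true, map[string]string{managedByLabel: "someone-else"})) // skip
	fmt.Println(decideAction(true, map[string]string{managedByLabel: managedByValue})) // delete
}
```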
Removing a tombstone (after 2-3 releases):
- Confirm the resource is deleted from all supported clusters
- Remove the file from the tombstones directory:
  ```shell
  git rm assets/tombstones/v1.1-cleanup/old-config.yaml
  ```
- Commit
The RBAC generator automatically scans the `tombstones/` directory and adds the `delete` verb to the ClusterRole for any resource types found:
```shell
make generate-rbac
```

Generated ClusterRole example:

```yaml
- apiGroups: [""]
  resources: ["configmaps"]
  verbs: ["delete", "create", "get", "list", "patch", "update", "watch"] # delete added
```

Metrics:
```
kubevirt_autopilot_tombstone_status{kind, name, namespace}
# Values:
#  1 = Resource still exists (not yet deleted)
#  0 = Resource deleted successfully
# -1 = Deletion error
# -2 = Skipped (label mismatch - safety check triggered)
```
Events:
- `TombstoneDeleted` (Normal): Resource successfully deleted
- `TombstoneFailed` (Warning): Deletion failed (check logs, RBAC, finalizers)
- `TombstoneSkipped` (Warning): Label mismatch - resource not managed by autopilot
Alert:
```yaml
- alert: VirtPlatformTombstoneStuck
  expr: kubevirt_autopilot_tombstone_status < 0
  for: 30m
```

See runbook: `docs/runbooks/VirtPlatformTombstoneStuck.md`
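To illustrate how outcomes relate to the gauge and the alert, here is a hypothetical mapping (the helper name is invented; the values come from the metric documentation above). Only the negative values - error and skipped - satisfy the `kubevirt_autopilot_tombstone_status < 0` alert expression:

```go
package main

import "fmt"

// tombstoneStatusValue maps a per-tombstone outcome onto the documented
// gauge values. Hypothetical helper, for illustration only.
func tombstoneStatusValue(outcome string) int {
	switch outcome {
	case "exists":
		return 1 // resource still exists (not yet deleted)
	case "deleted":
		return 0 // resource deleted successfully
	case "error":
		return -1 // deletion error
	case "skipped":
		return -2 // label mismatch - safety check triggered
	}
	return -1
}

func main() {
	for _, o := range []string{"exists", "deleted", "error", "skipped"} {
		v := tombstoneStatusValue(o)
		// The alert fires for negative values only.
		fmt.Printf("%s -> %d (alerts: %v)\n", o, v, v < 0)
	}
}
```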
Root exclusion prevents specific resources from being created in the first place. This is useful for:
- Disabling features not relevant to the deployment
- Preventing resource creation in environments where they would fail
- Temporary workarounds for known issues
- Excluding groups of related resources using wildcards
Set the `platform.kubevirt.io/disabled-resources` annotation on the HyperConverged CR using YAML syntax:
```yaml
apiVersion: hco.kubevirt.io/v1
kind: HyperConverged
metadata:
  name: kubevirt-hyperconverged
  namespace: openshift-cnv
  annotations:
    platform.kubevirt.io/disabled-resources: |
      - kind: KubeDescheduler   # Cluster-scoped resource
        name: cluster
      - kind: ConfigMap
        namespace: openshift-cnv
        name: virt-tuning-*     # Wildcard for multiple configs
      - kind: Service
        namespace: prod-*       # Namespace wildcard
        name: metrics
      - kind: Secret            # Omit namespace = all namespaces
        name: credentials-*
```

YAML Structure:
- Array of exclusion rules
- Each rule requires:
  - `kind`: Resource kind (case-sensitive, e.g., "ConfigMap")
  - `name`: Resource name (supports wildcards with `*`)
- Optional field:
  - `namespace`: Target namespace (supports wildcards; omit to match all namespaces)
Wildcard Support:
- `*` matches any sequence of characters
- Examples: `virt-*`, `*-config`, `prod-*`
- Glob pattern semantics (uses `filepath.Match`)
Namespace Matching:
- Specify namespace for exact or wildcard namespace matching
- Omit namespace field to match resources in any namespace (including cluster-scoped)
- Empty namespace in rule = matches all namespaces
- The operator parses the annotation as YAML on each reconciliation
- Invalid YAML logs an error and continues without exclusions (fail-open)
- After rendering assets, the operator filters out excluded resources in memory using pattern matching
- Excluded resources are never applied (ServerSideApply is never called)
- Each skipped resource is logged for transparency
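A minimal sketch of that in-memory filtering, assuming a simplified `Rule` shape for parsed annotation entries (the actual implementation lives in `pkg/engine/exclusion.go` and may differ). An invalid glob pattern is treated as a non-match, so it is skipped gracefully:

```go
package main

import (
	"fmt"
	"path/filepath"
)

// Rule is a hypothetical form of one exclusion entry from the annotation.
type Rule struct {
	Kind, Namespace, Name string
}

// matchGlob treats an invalid glob pattern as a non-match.
func matchGlob(pattern, s string) bool {
	ok, err := filepath.Match(pattern, s)
	return err == nil && ok
}

// excluded reports whether a rendered resource should be filtered out
// before apply. Kind is compared exactly (case-sensitive); an empty rule
// namespace matches any namespace, including cluster-scoped resources.
func excluded(rules []Rule, kind, namespace, name string) bool {
	for _, r := range rules {
		if r.Kind != kind {
			continue
		}
		if r.Namespace != "" && !matchGlob(r.Namespace, namespace) {
			continue
		}
		if matchGlob(r.Name, name) {
			return true
		}
	}
	return false
}

func main() {
	rules := []Rule{
		{Kind: "KubeDescheduler", Name: "cluster"},                      // cluster-scoped
		{Kind: "ConfigMap", Namespace: "openshift-cnv", Name: "virt-*"}, // name wildcard
		{Kind: "Service", Namespace: "prod-*", Name: "metrics"},         // namespace wildcard
	}
	fmt.Println(excluded(rules, "KubeDescheduler", "", "cluster"))         // true
	fmt.Println(excluded(rules, "ConfigMap", "openshift-cnv", "virt-cpu")) // true
	fmt.Println(excluded(rules, "Service", "prod-eu", "metrics"))          // true
	fmt.Println(excluded(rules, "Service", "dev", "metrics"))              // false
}
```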
Example log:
```
Skipping resource due to Root Exclusion kind=ConfigMap namespace=openshift-cnv name=virt-handler annotation=platform.kubevirt.io/disabled-resources
```
Disable KubeDescheduler (cluster-scoped):

```yaml
annotations:
  platform.kubevirt.io/disabled-resources: |
    - kind: KubeDescheduler
      name: cluster
```

Disable swap on specific clusters:

```yaml
annotations:
  platform.kubevirt.io/disabled-resources: |
    - kind: MachineConfig
      name: 50-swap-enable
```

Disable all virt tuning configs in the openshift-cnv namespace:

```yaml
annotations:
  platform.kubevirt.io/disabled-resources: |
    - kind: ConfigMap
      namespace: openshift-cnv
      name: virt-*
```

Disable the metrics service in all prod namespaces:

```yaml
annotations:
  platform.kubevirt.io/disabled-resources: |
    - kind: Service
      namespace: prod-*
      name: metrics
```

Disable a specific secret across all namespaces:

```yaml
annotations:
  platform.kubevirt.io/disabled-resources: |
    - kind: Secret
      name: credentials-db
```

Multiple exclusions:

```yaml
annotations:
  platform.kubevirt.io/disabled-resources: |
    - kind: KubeDescheduler
      name: cluster
    - kind: ConfigMap
      namespace: openshift-cnv
      name: virt-*
    - kind: PersesDataSource
      namespace: openshift-cnv
      name: virt-metrics
```

| Feature | Root Exclusion | `mode: unmanaged` |
|---|---|---|
| Scope | Specific resources (Kind/Namespace/Name + wildcards) | Individual resource (annotation per object) |
| When | Day 0 (prevents creation) | Day 1+ (stops reconciliation) |
| RBAC | No impact (resource never created) | Full RBAC still required |
| Wildcards | Supported (name and namespace) | Not applicable |
| Namespace filtering | Supported | Not applicable |
| Use case | Disable features/patterns cluster-wide | Opt out of management per resource |
When to use Root Exclusion:
- Cluster-wide feature disablement
- Resources that should never be created
- Temporary workarounds before feature flag available
- Pattern-based exclusions (e.g., all virt-* configs)
- Namespace-specific exclusions
When to use `mode: unmanaged`:
- Per-resource customization
- Gradual migration to external management
- Temporary user overrides
- Case-sensitive kind: `ConfigMap` ≠ `configmap`
- Wildcard support: Use `*` in name or namespace fields
- Namespace filtering: Exclude resources in specific namespaces or namespace patterns
- Any-namespace matching: Omit the namespace field to match resources in all namespaces
- Error handling: Invalid YAML logs an error but continues (fail-open)
- Pattern validation: Invalid glob patterns are skipped gracefully
This is a breaking change from the previous comma-separated format ("Kind/Name, Kind/Name"). The old format is no longer supported. Update your HyperConverged annotations to use the new YAML syntax.
Before (old format - no longer supported):

```yaml
platform.kubevirt.io/disabled-resources: "KubeDescheduler/cluster, MachineConfig/50-swap-enable"
```

After (new format):

```yaml
platform.kubevirt.io/disabled-resources: |
  - kind: KubeDescheduler
    name: cluster
  - kind: MachineConfig
    name: 50-swap-enable
```

- Check if the resource exists:
  ```shell
  kubectl get <kind> <name> -n <namespace>
  ```
- Verify the label:
  ```shell
  kubectl get <kind> <name> -n <namespace> -o jsonpath='{.metadata.labels}'
  ```
  Should contain: `"platform.kubevirt.io/managed-by": "virt-platform-autopilot"`
- Check for finalizers:
  ```shell
  kubectl get <kind> <name> -n <namespace> -o jsonpath='{.metadata.finalizers}'
  ```
  If finalizers are present, they may block deletion. Check operator logs for the finalizer owner.
- Check RBAC:
  ```shell
  kubectl auth can-i delete <resource> --as system:serviceaccount:openshift-cnv:virt-platform-autopilot
  ```
- Check events:
  ```shell
  kubectl get events -n openshift-cnv --field-selector involvedObject.kind=HyperConverged
  ```
- Verify the annotation syntax:
  ```shell
  kubectl get hco kubevirt-hyperconverged -n openshift-cnv \
    -o jsonpath='{.metadata.annotations.platform\.kubevirt\.io/disabled-resources}'
  ```
- Check operator logs for the "Skipping resource due to Root Exclusion" message
- Verify the Kind/Name matches exactly (case-sensitive)
- Check whether the resource was created before the annotation was added:
  - Root exclusion only prevents creation; it doesn't delete existing resources
  - Use tombstoning to remove existing resources
- Lifecycle: Keep tombstones for 2-3 releases, then remove
- Organization: Use subdirectories like `v1.1-cleanup/` for clarity
- Testing: Test tombstone deletion in staging before a production release
- Monitoring: Set up alerts for stuck tombstones
- Documentation: Document why resources were tombstoned (commit message)
- Temporary: Use root exclusion as a temporary measure, not a permanent solution
- Documentation: Document why resources are excluded
- Alternatives: Consider if feature gates or component-level disable is better
- Migration path: Plan to remove exclusions when proper fix is available
- Prefer feature flags: Use metadata.yaml conditions when possible
- Gradual rollout: Test lifecycle changes in dev/staging first
- Monitor metrics: Watch tombstone_status and compliance_status metrics
- Clean up: Remove old tombstones and unused exclusions regularly
Before (v1.0):

```
assets/active/observability/metrics-config.yaml
```

After (v1.1):

- Create the new resource: `assets/active/observability/prometheus-config.yaml`
- Move the old resource to tombstones:
  ```shell
  git mv assets/active/observability/metrics-config.yaml \
    assets/tombstones/v1.1-cleanup/metrics-config.yaml
  ```
- Verify the label in the tombstone file:
  ```yaml
  labels:
    platform.kubevirt.io/managed-by: virt-platform-autopilot
  ```
- Release v1.1 - the operator will:
  - Create `prometheus-config`
  - Delete `metrics-config` (tombstone)
User wants to disable KubeDescheduler due to a known issue:

```shell
kubectl annotate hco kubevirt-hyperconverged -n openshift-cnv \
  platform.kubevirt.io/disabled-resources='- kind: KubeDescheduler
  name: cluster'
```

Or using kubectl patch:

```shell
kubectl patch hco kubevirt-hyperconverged -n openshift-cnv --type=merge -p '
metadata:
  annotations:
    platform.kubevirt.io/disabled-resources: |
      - kind: KubeDescheduler
        name: cluster
'
```

The operator will skip creating/updating KubeDescheduler.

To re-enable:

```shell
kubectl annotate hco kubevirt-hyperconverged -n openshift-cnv \
  platform.kubevirt.io/disabled-resources-
```

Removing MTV (Migration Toolkit) integration:
- Create the tombstone:
  ```yaml
  # assets/tombstones/v1.2-cleanup/mtv-operator.yaml
  apiVersion: operators.coreos.com/v1alpha1
  kind: Subscription
  metadata:
    name: mtv-operator
    namespace: openshift-mtv
    labels:
      platform.kubevirt.io/managed-by: virt-platform-autopilot
  ```
- Remove from active:
  ```shell
  git rm assets/active/operators/mtv.yaml.tpl
  ```
- Update metadata.yaml to remove the MTV asset entry
- Regenerate RBAC:
  ```shell
  make generate-rbac  # Adds delete verb for Subscription
  ```
- Release - the operator deletes the MTV subscription
- After 2 releases, clean up the tombstone:
  ```shell
  git rm assets/tombstones/v1.2-cleanup/mtv-operator.yaml
  ```
If you have an existing deployment and want to adopt lifecycle management:
- Audit current resources: Identify which resources are managed by autopilot
  ```shell
  kubectl get all,cm,secrets -A -l platform.kubevirt.io/managed-by=virt-platform-autopilot
  ```
- Ensure labels: All managed resources should have the management label
  - Newer versions auto-apply labels
  - Older resources may need manual labeling
- Plan tombstones: For any resources to be removed, create tombstones with proper labels
- Test in staging: Validate tombstone deletion in a non-production environment
- Monitor: Watch metrics and events during rollout
- Specification: `/claude_assets/reclaiming_leftovers.md`
- Runbook: `docs/runbooks/VirtPlatformTombstoneStuck.md`
- RBAC generation: `cmd/rbac-gen/main.go`
- Implementation: `pkg/engine/tombstone.go`, `pkg/engine/exclusion.go`