Skip to content

Implemented an automated cleanup of old and unused images#5

Merged
valeryia-hurynovich merged 15 commits intomasterfrom
cloud-profile-garbage-collection
Mar 20, 2026
Merged

Implemented an automated cleanup of old and unused images#5
valeryia-hurynovich merged 15 commits intomasterfrom
cloud-profile-garbage-collection

Conversation

@valeryia-hurynovich
Copy link
Collaborator

@valeryia-hurynovich valeryia-hurynovich commented Mar 5, 2026

Related to this ticket: https://github.wdf.sap.corp/cc/unified-kubernetes/issues/1063
Here is a logic of following points:

  1. Extend the ManagedCloudProfile CRD with a .spec.machineImagesUpdates.garbageCollection field;
  2. Implement time-based garbage collection;
  3. Add tests.

Summary by CodeRabbit

  • New Features

    • Added garbage-collection controls for machine images (enabled + maxAge) with automatic cleanup of old unreferenced versions.
    • Stores creation timestamps for machine images.
    • Introduced Shoot resource support via new CRD.
  • Tests

    • Expanded coverage for garbage-collection scenarios, credential/OCI handling, provider-config updates, and reconciliation flows.
  • Chores

    • Bumped Go toolchain to 1.25.8 across CI and build images; added lint timeouts.
    • Updated CRD metadata annotations.

@valeryia-hurynovich valeryia-hurynovich force-pushed the cloud-profile-garbage-collection branch 9 times, most recently from 6edd645 to 3aad482 Compare March 6, 2026 13:19
@valeryia-hurynovich valeryia-hurynovich force-pushed the cloud-profile-garbage-collection branch from 3aad482 to 4c020fa Compare March 6, 2026 13:25
@fwiesel
Copy link
Contributor

fwiesel commented Mar 9, 2026

I see the "old" part, but where do you filter out that currently referenced images are not being deleted?

@valeryia-hurynovich
Copy link
Collaborator Author

I see the "old" part, but where do you filter out that currently referenced images are not being deleted?

Indeed, I missed this part, will add this part of functionality

@defo89
Copy link
Contributor

defo89 commented Mar 10, 2026

Thanks for the PR! I think we need to revisit the referenced image protection logic.

This controller is responsible for creating CloudProfile CR and updating its .spec.machineImages and .spec.providerConfig.machineImages. So if we check this same resource before cleanup, we won't ever perform the GC, right?

Since we have access to Garden API where CloudProfile CR resides, we should be able to check Shoot resources for the respective machine image versions in the worker pools. ref https://github.com/gardener/gardener/blob/master/pkg/apis/core/v1beta1/types_shoot.go#L1774

Wdyt?

@valeryia-hurynovich valeryia-hurynovich force-pushed the cloud-profile-garbage-collection branch 2 times, most recently from 0c8585e to 0ce9d65 Compare March 10, 2026 11:26
@valeryia-hurynovich valeryia-hurynovich force-pushed the cloud-profile-garbage-collection branch 2 times, most recently from e6c127a to 9e9a1ee Compare March 11, 2026 15:43
@valeryia-hurynovich valeryia-hurynovich force-pushed the cloud-profile-garbage-collection branch from 9e9a1ee to 4f21398 Compare March 11, 2026 15:54
@coderabbitai
Copy link

coderabbitai bot commented Mar 13, 2026

Note

Reviews paused

It looks like this branch is under active development. To avoid overwhelming you with review comments due to an influx of new commits, CodeRabbit has automatically paused this review. You can configure this behavior by changing the reviews.auto_review.auto_pause_after_reviewed_commits setting.

Use the following commands to manage reviews:

  • @coderabbitai resume to resume automatic reviews.
  • @coderabbitai review to trigger a single review.

Use the checkboxes below for quick actions:

  • ▶️ Resume reviews
  • 🔍 Trigger review
📝 Walkthrough

Walkthrough

Bumps Go versions in CI and Docker; adds per-image GarbageCollection config and GC implementation with a pluggable OCIFactory; records image creation timestamps; adds Shoot CRD; updates deepcopy, CRD schemas, tests, and workflows.

Changes

Cohort / File(s) Summary
CI & Build
.github/workflows/checks.yaml, .github/workflows/ci.yaml, .github/workflows/codeql.yaml, Dockerfile
Bump Go from 1.25.5 → 1.25.8; add --timeout=15m arg to golangci-lint in checks workflow; update Docker builder image tag.
API & DeepCopy / CRD schema
api/v1alpha1/managedcloudprofile.go, api/v1alpha1/zz_generated.deepcopy.go, crd/cloudprofilesync.cobaltcore.dev_managedcloudprofiles.yaml
Add GarbageCollectionConfig type and optional garbageCollection field on MachineImageUpdate; generate deepcopy support; update CRD schema and controller-gen annotation to include garbageCollection (enabled, maxAge).
Source metadata
cloudprofilesync/source.go
Add CreatedAt time.Time to SourceImage; parse OCI manifest annotations (org.opencontainers.image.created / created) to populate creation timestamp.
Controller: GC logic & hooks
controllers/managedcloudprofile_controller.go
Add exported OCIFactory hook; implement garbage-collection workflow (initialize OCI source, list versions, compute referenced vs old by MaxAge, delete unreferenced old versions, patch status/conditions); add helpers deleteVersion and getReferencedVersions.
Controller tests
controllers/managedcloudprofile_controller_test.go, controllers/suite_test.go
Add fakeSource test double and many GC-focused tests (deletion, reference preservation, credential/registry errors, listing failures, providerConfig updates); add created annotation to OCI index in test setup.
New CRD
crd/core.gardener.cloud_shoots.yaml
Add new Shoot CRD (group core.gardener.cloud, version v1beta1, kinds/plural and schema including worker machine.image name/version).

Sequence Diagram(s)

sequenceDiagram
    participant Reconciler as ManagedCloudProfile<br/>Reconciler
    participant OCI as OCI Source
    participant K8s as Kubernetes API
    participant CP as CloudProfile Storage

    Reconciler->>OCI: OCIFactory(params, insecure) → initialize (if GC enabled)
    Reconciler->>OCI: List available image versions (include CreatedAt)
    OCI-->>Reconciler: Return versions list

    Reconciler->>K8s: Query referenced versions from CloudProfile & Shoots
    K8s-->>Reconciler: Return referenced versions

    Reconciler->>Reconciler: Compute deletable = versions NOT referenced AND older than MaxAge

    loop For each deletable version
        Reconciler->>CP: Remove version from CloudProfile and providerConfig
        CP-->>Reconciler: Confirm / report errors
    end

    alt All deletions successful
        Reconciler->>CP: Patch status/conditions (success)
    else Any error
        Reconciler->>CP: Patch status/conditions (failure reason)
    end
Loading

Estimated code review effort

🎯 4 (Complex) | ⏱️ ~60 minutes

Poem

🐰 Hoppity-hop, old tags retire,

Timestamps tick where images conspire,
OCIs pruned with a careful paw,
Shoots stay safe, no sudden flaw,
A rabbit nods: the registry's in choir.

🚥 Pre-merge checks | ✅ 3
✅ Passed checks (3 passed)
Check name Status Explanation
Description Check ✅ Passed Check skipped - CodeRabbit’s high-level summary is enabled.
Title check ✅ Passed The title 'Implemented an automated cleanup of old and unused images' accurately describes the main change: adding garbage collection functionality for machine images in the ManagedCloudProfile CRD, including implementation and tests.
Docstring Coverage ✅ Passed No functions found in the changed files to evaluate docstring coverage. Skipping docstring coverage check.

✏️ Tip: You can configure your own custom pre-merge checks in the settings.

✨ Finishing Touches
🧪 Generate unit tests (beta)
  • Create PR with unit tests
  • Commit unit tests in branch cloud-profile-garbage-collection
📝 Coding Plan
  • Generate coding plan for human review comments

Comment @coderabbitai help to get the list of available commands and usage tips.

Tip

Migrating from UI to YAML configuration.

Use the @coderabbitai configuration command in a PR comment to get a dump of all your UI settings in YAML format. You can then edit this YAML file and upload it to the root of your repository to configure CodeRabbit programmatically.

Copy link

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 3

🧹 Nitpick comments (2)
controllers/managedcloudprofile_controller_test.go (1)

311-394: Misleading test name: test verifies ProviderConfig, not Shoot worker pools.

The test name states "preserves old machine image versions referenced by Shoot worker pools," but the test actually verifies preservation based on ProviderConfig.MachineImages in the CloudProfile—no Shoot resource is created. This is different from the subsequent test at lines 396-487 which correctly creates a Shoot.

Consider renaming to clarify the actual behavior being tested:

📝 Suggested rename
-	It("preserves old machine image versions referenced by Shoot worker pools", func(ctx SpecContext) {
+	It("preserves old machine image versions referenced in ProviderConfig", func(ctx SpecContext) {
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@controllers/managedcloudprofile_controller_test.go` around lines 311 - 394,
The test's It description ("preserves old machine image versions referenced by
Shoot worker pools") is misleading because the test verifies preservation via
CloudProfile.Spec.ProviderConfig.MachineImages (and not any Shoot or worker
pools); update the It string to accurately describe the behavior (e.g.,
"preserves old machine image versions referenced by CloudProfile
ProviderConfig/MachineImages") so readers can find the test intent next to the
code paths exercised (look for the Ginkgo It block with the current description
and references to CloudProfile.Spec.ProviderConfig,
ProviderConfig.MachineImages, and ManagedCloudProfile/MachineImageUpdates).
controllers/managedcloudprofile_controller.go (1)

269-285: Consider filtering Shoots server-side for efficiency.

This lists all Shoots across all namespaces and filters client-side. For clusters with many Shoots, this could be inefficient during each reconciliation.

If a field index exists for spec.cloudProfile.name, consider using client.MatchingFields. Alternatively, since GC runs periodically (5-minute requeue), this may be acceptable overhead.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@controllers/managedcloudprofile_controller.go` around lines 269 - 285, The
current code lists all Shoots into shootList via r.List(ctx, shootList,
client.InNamespace(metav1.NamespaceAll)) and then filters client-side by
shoot.Spec.CloudProfile.Name (cloudProfileName), which is inefficient; change
the List call to perform server-side filtering using client.MatchingFields (or
client.MatchingField) for the field key used by the field index for
spec.cloudProfile.name so only matching Shoots are returned, and ensure a field
index exists for that key (or add one where indexes are registered) before using
it; keep the subsequent loop that checks worker.Machine.Image.Name (imageName)
and populates referenced unchanged.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@controllers/managedcloudprofile_controller.go`:
- Around line 171-180: The status patch message is a copy-paste from a different
error: in the block that handles failures from deleteVersion you must change the
condition Message (and align the returned error text) to reference the deletion
failure instead of "failed to list source versions for garbage collection";
update the fmt.Sprintf message passed to patchStatusAndCondition (for
CloudProfileAppliedConditionType on the ManagedCloudProfile/mcp) to something
like "failed to delete image versions: %s" and ensure the final returned
fmt.Errorf also uses the same deletion-oriented wording so messages from
patchStatusAndCondition and the returned error consistently reflect
deleteVersion failures.
- Around line 214-241: The code currently ignores errors from json.Unmarshal and
json.Marshal which can leave cp.Spec.MachineImages and
cp.Spec.ProviderConfig.Raw inconsistent; update the block around
json.Unmarshal(json.Unmarshal(...)) and json.Marshal(json.Marshal(cfg)) to
handle errors: if json.Unmarshal fails, log the error (including context:
imageName and cp.Name) and return the error from the reconciler (or otherwise
abort the update) so no partial changes occur; after building cfg and
marshaling, if json.Marshal fails, log and return the error instead of silently
skipping the update, and only assign cp.Spec.ProviderConfig.Raw = raw when
marshal succeeded; reference the symbols cp.Spec.ProviderConfig, json.Unmarshal,
json.Marshal, cfg, cfg.MachineImages, and cp.Spec.ProviderConfig.Raw when making
these changes.
- Around line 246-288: The function getReferencedVersions currently swallows
errors from r.Get and r.List which can lead to accidental deletion of in-use
images; change its signature to return (map[string]bool, error), propagate and
return any errors from r.Get and r.List (including json.Unmarshal failures)
instead of ignoring them, and update the caller that invokes
getReferencedVersions (the reconciler code that previously did
referencedVersions := r.getReferencedVersions(...)) to handle the error (e.g.,
log and skip GC on error). Ensure to update all call sites to check the returned
error and retain existing behavior only on nil error.

---

Nitpick comments:
In `@controllers/managedcloudprofile_controller_test.go`:
- Around line 311-394: The test's It description ("preserves old machine image
versions referenced by Shoot worker pools") is misleading because the test
verifies preservation via CloudProfile.Spec.ProviderConfig.MachineImages (and
not any Shoot or worker pools); update the It string to accurately describe the
behavior (e.g., "preserves old machine image versions referenced by CloudProfile
ProviderConfig/MachineImages") so readers can find the test intent next to the
code paths exercised (look for the Ginkgo It block with the current description
and references to CloudProfile.Spec.ProviderConfig,
ProviderConfig.MachineImages, and ManagedCloudProfile/MachineImageUpdates).

In `@controllers/managedcloudprofile_controller.go`:
- Around line 269-285: The current code lists all Shoots into shootList via
r.List(ctx, shootList, client.InNamespace(metav1.NamespaceAll)) and then filters
client-side by shoot.Spec.CloudProfile.Name (cloudProfileName), which is
inefficient; change the List call to perform server-side filtering using
client.MatchingFields (or client.MatchingField) for the field key used by the
field index for spec.cloudProfile.name so only matching Shoots are returned, and
ensure a field index exists for that key (or add one where indexes are
registered) before using it; keep the subsequent loop that checks
worker.Machine.Image.Name (imageName) and populates referenced unchanged.

ℹ️ Review info
⚙️ Run configuration

Configuration used: defaults

Review profile: CHILL

Plan: Pro

Run ID: 79d0a92a-6377-4ba1-866d-0c75104a070a

📥 Commits

Reviewing files that changed from the base of the PR and between 72e3dc9 and 6f5275a.

📒 Files selected for processing (12)
  • .github/workflows/checks.yaml
  • .github/workflows/ci.yaml
  • .github/workflows/codeql.yaml
  • Dockerfile
  • api/v1alpha1/managedcloudprofile.go
  • api/v1alpha1/zz_generated.deepcopy.go
  • cloudprofilesync/source.go
  • controllers/managedcloudprofile_controller.go
  • controllers/managedcloudprofile_controller_test.go
  • controllers/suite_test.go
  • crd/cloudprofilesync.cobaltcore.dev_managedcloudprofiles.yaml
  • crd/core.gardener.cloud_shoots.yaml

@valeryia-hurynovich valeryia-hurynovich force-pushed the cloud-profile-garbage-collection branch from 6f5275a to e5daea6 Compare March 13, 2026 13:29
Copy link

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 2

🧹 Nitpick comments (2)
crd/core.gardener.cloud_shoots.yaml (2)

68-70: Consider enabling the status subresource.

The status is defined but no subresource is enabled. If the controller needs to update Shoot status independently of spec changes (standard Kubernetes pattern), add a subresources section.

🔧 Suggested addition after schema block
           status:
             description: Status contains the current status of the Shoot.
             type: object
+    subresources:
+      status: {}
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@crd/core.gardener.cloud_shoots.yaml` around lines 68 - 70, The CRD defines
the status object for the Shoot but does not enable the status subresource; add
a top-level subresources block to the CRD enabling status so controllers can
update Shoot.status independently of spec (i.e., add a subresources: { status:
{} } entry adjacent to the schema definition). Ensure the block is added at the
same CRD root level where the schema and metadata/kind for Shoot are defined so
the Kubernetes API server exposes the status subresource for updates.

33-67: Consider adding required field validations for critical properties.

The spec schema lacks required constraints. Fields like cloudProfile.name and provider.workers[].machine.image.name/version are likely mandatory for GC logic to function correctly. Without validation, malformed Shoot resources could cause nil pointer issues or silent failures during garbage collection.

🔧 Suggested schema enhancement
           spec:
             description: Spec defines the desired state of the Shoot.
             type: object
+            required:
+              - cloudProfile
             properties:
               cloudProfile:
                 description: Reference to the CloudProfile used by the Shoot.
                 type: object
+                required:
+                  - name
                 properties:
                   name:
                     description: Name of the CloudProfile.
                     type: string
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@crd/core.gardener.cloud_shoots.yaml` around lines 33 - 67, Add JSON Schema
"required" constraints to the Shoot CRD spec to enforce mandatory fields;
specifically mark spec, spec.cloudProfile, spec.cloudProfile.name,
spec.provider, spec.provider.workers (the array), and within each worker item
require machine and machine.image and then require image.name and image.version
(use items.required on the workers array and required arrays inside the
machine/image object schemas). Ensure the required arrays are added alongside
the existing properties for spec, cloudProfile, provider, workers, machine, and
image so the API server rejects resources missing these critical fields (refer
to the symbols spec, cloudProfile.name, provider.workers[].machine.image.name
and provider.workers[].machine.image.version).
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@controllers/managedcloudprofile_controller_test.go`:
- Around line 311-394: The test "preserves old machine image versions referenced
by Shoot worker pools" (in managedcloudprofile_controller_test.go) does not
create a Shoot and thus only preserves 1.0.0 because it exists in
ProviderConfig; either rename the It description to reflect ProviderConfig-based
preservation (change the test name string) or add a Shoot resource that
references image version "1.0.0" (create a gardenerv1beta1.Shoot with a worker
pool using machineImage name "preserve-image" and version "1.0.0") before
creating the ManagedCloudProfile (mcp) so the reconciliation logic that inspects
Shoot worker pools (similar to the working test "preserves machine image
versions referenced by Shoot workers") will actually be exercised; update
cleanup to delete the Shoot as well.
- Around line 36-80: The AfterEach cleanup is missing deletion of created
Shoots, which can leak between tests; update the AfterEach (where k8sClient is
used) to list gardenv1beta1.ShootList, iterate over items and call
k8sClient.Delete(ctx, &shoot) for each Shoot, and add an Eventually block that
lists gardenv1beta1.ShootList and asserts its length is 0 to ensure asynchronous
deletion has completed.

---

Nitpick comments:
In `@crd/core.gardener.cloud_shoots.yaml`:
- Around line 68-70: The CRD defines the status object for the Shoot but does
not enable the status subresource; add a top-level subresources block to the CRD
enabling status so controllers can update Shoot.status independently of spec
(i.e., add a subresources: { status: {} } entry adjacent to the schema
definition). Ensure the block is added at the same CRD root level where the
schema and metadata/kind for Shoot are defined so the Kubernetes API server
exposes the status subresource for updates.
- Around line 33-67: Add JSON Schema "required" constraints to the Shoot CRD
spec to enforce mandatory fields; specifically mark spec, spec.cloudProfile,
spec.cloudProfile.name, spec.provider, spec.provider.workers (the array), and
within each worker item require machine and machine.image and then require
image.name and image.version (use items.required on the workers array and
required arrays inside the machine/image object schemas). Ensure the required
arrays are added alongside the existing properties for spec, cloudProfile,
provider, workers, machine, and image so the API server rejects resources
missing these critical fields (refer to the symbols spec, cloudProfile.name,
provider.workers[].machine.image.name and
provider.workers[].machine.image.version).

ℹ️ Review info
⚙️ Run configuration

Configuration used: defaults

Review profile: CHILL

Plan: Pro

Run ID: b2d20438-0c82-4c54-a759-0315b1fada47

📥 Commits

Reviewing files that changed from the base of the PR and between 6f5275a and e5daea6.

📒 Files selected for processing (3)
  • controllers/managedcloudprofile_controller_test.go
  • crd/cloudprofilesync.cobaltcore.dev_managedcloudprofiles.yaml
  • crd/core.gardener.cloud_shoots.yaml
🚧 Files skipped from review as they are similar to previous changes (1)
  • crd/cloudprofilesync.cobaltcore.dev_managedcloudprofiles.yaml

Copy link

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 1

🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@controllers/managedcloudprofile_controller.go`:
- Around line 267-283: getReferencedVersions currently treats every image listed
in cp.Spec.ProviderConfig (unmarshalled into providercfg.CloudProfileConfig) as
a referenced version, creating a circular reference with updateMachineImages and
preventing GC; remove or disable the ProviderConfig-based loop in
getReferencedVersions (the block starting with "if cp.Spec.ProviderConfig != nil
{ ... }") so only Shoot worker pool checks (the code that iterates Shoots and
worker pools) mark versions as referenced, ensuring newly-synced ProviderConfig
entries don't permanently block garbage collection.

ℹ️ Review info
⚙️ Run configuration

Configuration used: defaults

Review profile: CHILL

Plan: Pro

Run ID: 2b80b9ad-b29e-4648-a4ec-c0eb20d091a4

📥 Commits

Reviewing files that changed from the base of the PR and between e5daea6 and 8cb8455.

📒 Files selected for processing (1)
  • controllers/managedcloudprofile_controller.go

@valeryia-hurynovich valeryia-hurynovich force-pushed the cloud-profile-garbage-collection branch from 8cb8455 to 1e38ff4 Compare March 13, 2026 14:24
Copy link

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 2

♻️ Duplicate comments (1)
controllers/managedcloudprofile_controller_test.go (1)

36-80: ⚠️ Potential issue | 🟡 Minor

Missing Shoot cleanup in AfterEach causes potential test pollution.

Tests at lines 357, 446 create Shoot resources but the AfterEach block doesn't clean them up. If a test fails before reaching its manual cleanup, Shoots will persist and could affect subsequent tests.

,

Add Shoot cleanup to AfterEach
 		Eventually(func(g Gomega) int {
 			var updated corev1.SecretList
 			g.Expect(k8sClient.List(ctx, &updated)).To(Succeed())
 			count := 0
 			for _, sec := range updated.Items {
 				if sec.Namespace == metav1.NamespaceDefault && sec.Name == "oci" {
 					count++
 				}
 			}
 			return count
 		}).Should(Equal(0))
+
+		var shootList gardenerv1beta1.ShootList
+		Expect(k8sClient.List(ctx, &shootList)).To(Succeed())
+		for _, shoot := range shootList.Items {
+			Expect(k8sClient.Delete(ctx, &shoot)).To(Succeed())
+		}
+
+		Eventually(func(g Gomega) int {
+			var updated gardenerv1beta1.ShootList
+			g.Expect(k8sClient.List(ctx, &updated)).To(Succeed())
+			return len(updated.Items)
+		}).Should(Equal(0))
 	})
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@controllers/managedcloudprofile_controller_test.go` around lines 36 - 80, The
AfterEach cleanup is missing deletion of Shoot resources; update the AfterEach
block to list and delete all Shoot objects created by tests using the k8sClient
(e.g., declare a gardenv1beta1.ShootList named shootList, call
Expect(k8sClient.List(ctx, &shootList)).To(Succeed()), iterate and
Expect(k8sClient.Delete(ctx, &shoot)) for each item), and add an Eventually
assertion (similar to the ones for ManagedCloudProfile, CloudProfile and Secret)
that waits until len(updatedShootList.Items) == 0 to avoid test pollution;
reference the AfterEach function, k8sClient.List/Delete calls, and the new
shootList/updatedShootList variables when implementing.
🧹 Nitpick comments (1)
controllers/managedcloudprofile_controller.go (1)

165-168: Consider logging when CreatedAt is missing.

Versions with a zero CreatedAt are silently skipped and will never be garbage collected. While this is a safe default, it could lead to accumulated orphan versions without operators knowing why. A debug-level log would aid troubleshooting.

Suggested improvement
 		for _, v := range versions {
 			if v.CreatedAt.IsZero() {
+				log.V(1).Info("skipping version without creation timestamp during GC", "image", updates.ImageName, "version", v.Version)
 				continue
 			}
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@controllers/managedcloudprofile_controller.go` around lines 165 - 168, The
loop that skips versions when v.CreatedAt.IsZero() (the for _, v := range
versions loop) silently discards those entries; add a debug-level log inside
that branch to record which version was skipped (use identifiable fields on v
such as v.Name or v.ID) and why (e.g., "skipping version with zero CreatedAt"),
using the controller's logger (e.g., r.Log.Debug or the existing logger
variable) so operators can discover orphaned versions during troubleshooting.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@controllers/managedcloudprofile_controller_test.go`:
- Around line 590-634: Rename or adjust the test so its name and setup match
what it actually asserts: the current test overrides controllers.OCIFactory
causing updateMachineImages (controller.updateMachineImages) to fail, yielding
Condition Type controllers.CloudProfileAppliedConditionType with Reason
"ApplyFailed" and message "failed to retrieve images version"; either rename the
It(...) description to reflect OCI source/listing failures (e.g., "reports
failure when OCI source listing errors occur") or change the setup so
updateMachineImages does not use the fakeSource (remove/configure
MachineImageUpdates or OCIFactory) and instead inject a failure that triggers
the garbage collection path to assert Reason "GarbageCollectionFailed" for
GC-specific behavior.

In `@controllers/managedcloudprofile_controller.go`:
- Around line 107-110: The getCredential failure path in the GC flow
(r.getCredential called with updates.Source.OCI.Password) returns the error
without updating the ManagedCloudProfile status; mirror the other GC error paths
by setting a descriptive condition on the MCP status (e.g.,
"CredentialRetrievalFailed" with the error message) and patching the status
before returning. Locate the GC handler where r.getCredential is invoked, create
the same style of status condition used elsewhere in the file, call the
controller's status patch method (the same helper used for OCI factory/version
errors), include the error text in the condition, then return the error as
before.

---

Duplicate comments:
In `@controllers/managedcloudprofile_controller_test.go`:
- Around line 36-80: The AfterEach cleanup is missing deletion of Shoot
resources; update the AfterEach block to list and delete all Shoot objects
created by tests using the k8sClient (e.g., declare a gardenv1beta1.ShootList
named shootList, call Expect(k8sClient.List(ctx, &shootList)).To(Succeed()),
iterate and Expect(k8sClient.Delete(ctx, &shoot)) for each item), and add an
Eventually assertion (similar to the ones for ManagedCloudProfile, CloudProfile
and Secret) that waits until len(updatedShootList.Items) == 0 to avoid test
pollution; reference the AfterEach function, k8sClient.List/Delete calls, and
the new shootList/updatedShootList variables when implementing.

---

Nitpick comments:
In `@controllers/managedcloudprofile_controller.go`:
- Around line 165-168: The loop that skips versions when v.CreatedAt.IsZero()
(the for _, v := range versions loop) silently discards those entries; add a
debug-level log inside that branch to record which version was skipped (use
identifiable fields on v such as v.Name or v.ID) and why (e.g., "skipping
version with zero CreatedAt"), using the controller's logger (e.g., r.Log.Debug
or the existing logger variable) so operators can discover orphaned versions
during troubleshooting.

ℹ️ Review info
⚙️ Run configuration

Configuration used: defaults

Review profile: CHILL

Plan: Pro

Run ID: 47fde647-7fe4-423a-bb9c-2f37ff400dcf

📥 Commits

Reviewing files that changed from the base of the PR and between 8cb8455 and 1e38ff4.

📒 Files selected for processing (2)
  • controllers/managedcloudprofile_controller.go
  • controllers/managedcloudprofile_controller_test.go

@valeryia-hurynovich valeryia-hurynovich force-pushed the cloud-profile-garbage-collection branch from 1e38ff4 to 1c41e88 Compare March 13, 2026 14:49
Copy link

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 2

🧹 Nitpick comments (2)
controllers/managedcloudprofile_controller.go (1)

158-159: Avoid listing all Shoots once per image update.

getReferencedVersions does a full Shoot list each invocation; called inside the machine-image loop this scales as O(imageUpdates × shoots). Consider listing once per reconcile and reusing an index keyed by (cloudProfileName, imageName).

Also applies to: 272-275

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@controllers/managedcloudprofile_controller.go` around lines 158 - 159,
getReferencedVersions is called repeatedly inside the machine-image update loop
and triggers a full Shoot list each time, causing O(images × shoots) work;
instead, list Shoots once per reconcile and build an in-memory index keyed by
(cloudProfileName,imageName) that maps to referenced versions. Modify the
reconcile so you perform a single Shoot list at the start (or at the top of the
reconcile function), construct a map (e.g.,
referencedVersionsIndex[cloudProfileName+"|"+imageName] = []versions) by
iterating Shoots and extracting image references, and then replace direct calls
to getReferencedVersions(ctx, mcp.Name, updates.ImageName) (and the similar call
at lines ~272-275) with lookups into that index; keep getReferencedVersions
logic for on-demand use or refactor it to accept a precomputed Shoot list to
preserve behavior if needed.
controllers/managedcloudprofile_controller_test.go (1)

511-553: Clarify apply-vs-GC intent in these GC-labeled failure tests.

Both tests are named as GC failures, but setup can fail in the apply phase first. Consider either renaming to apply-path failure behavior or asserting explicit Reason to lock intent.

Also applies to: 555-593

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@controllers/managedcloudprofile_controller_test.go` around lines 511 - 553,
Test name and assertions are ambiguous about whether the failure is in apply vs
garbage-collection; either rename the It block (e.g., change "handles missing
credential for GC OCI source" to "handles missing credential during apply for GC
OCI source") or add an explicit assertion on the condition Reason to lock the
intent (locate the ManagedCloudProfile test case using the It block and assert
on mcp.Status.Conditions entries that the CloudProfileAppliedConditionType has a
specific Reason string emitted by the controller, e.g., add
Expect(...).To(HaveField("Reason",
Equal("<controller-reason-for-apply-failure>"))) so the test documents whether
it is an apply-path failure or a GC-path failure).
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@controllers/managedcloudprofile_controller_test.go`:
- Around line 49-53: Replace the current immediate deletions of Shoot objects
(using k8sClient.Delete and iterating over shootList.Items) with a deletion loop
followed by an assertion that cleanup completed: after calling k8sClient.Delete
for each shoot, use Eventually with a short polling interval to re-list into
gardenerv1beta1.ShootList via k8sClient.List(ctx, &shootList) and
Expect(len(shootList.Items)).To(BeZero()) (or To(Equal(0))); reference the
existing k8sClient, ctx, gardenerv1beta1.ShootList, Eventually and Expect
symbols when adding this check so the test waits until all Shoots are Gone and
prevents cross-test pollution.

In `@controllers/managedcloudprofile_controller.go`:
- Around line 172-183: Before computing cutoff, validate that
updates.GarbageCollection.MaxAge.Duration is non-negative; if it's negative,
avoid treating it as a valid retention period (either clamp it to zero or skip
garbage collection) so you don't compute a future cutoff and delete unreferenced
versions incorrectly. Update the block around cutoff :=
time.Now().Add(-updates.GarbageCollection.MaxAge.Duration) (referencing cutoff,
updates.GarbageCollection.MaxAge.Duration, versions, versionsToDelete,
referencedVersions, and v.CreatedAt) to check Duration >= 0, handle/log the
invalid value, and only build versionsToDelete when the MaxAge is valid.

---

Nitpick comments:
In `@controllers/managedcloudprofile_controller_test.go`:
- Around line 511-553: Test name and assertions are ambiguous about whether the
failure is in apply vs garbage-collection; either rename the It block (e.g.,
change "handles missing credential for GC OCI source" to "handles missing
credential during apply for GC OCI source") or add an explicit assertion on the
condition Reason to lock the intent (locate the ManagedCloudProfile test case
using the It block and assert on mcp.Status.Conditions entries that the
CloudProfileAppliedConditionType has a specific Reason string emitted by the
controller, e.g., add Expect(...).To(HaveField("Reason",
Equal("<controller-reason-for-apply-failure>"))) so the test documents whether
it is an apply-path failure or a GC-path failure).

In `@controllers/managedcloudprofile_controller.go`:
- Around line 158-159: getReferencedVersions is called repeatedly inside the
machine-image update loop and triggers a full Shoot list each time, causing
O(images × shoots) work; instead, list Shoots once per reconcile and build an
in-memory index keyed by (cloudProfileName,imageName) that maps to referenced
versions. Modify the reconcile so you perform a single Shoot list at the start
(or at the top of the reconcile function), construct a map (e.g.,
referencedVersionsIndex[cloudProfileName+"|"+imageName] = []versions) by
iterating Shoots and extracting image references, and then replace direct calls
to getReferencedVersions(ctx, mcp.Name, updates.ImageName) (and the similar call
at lines ~272-275) with lookups into that index; keep getReferencedVersions
logic for on-demand use or refactor it to accept a precomputed Shoot list to
preserve behavior if needed.

ℹ️ Review info
⚙️ Run configuration

Configuration used: defaults

Review profile: CHILL

Plan: Pro

Run ID: bf245499-da6b-4980-8252-408e60a300de

📥 Commits

Reviewing files that changed from the base of the PR and between 1e38ff4 and 1c41e88.

📒 Files selected for processing (2)
  • controllers/managedcloudprofile_controller.go
  • controllers/managedcloudprofile_controller_test.go

@valeryia-hurynovich valeryia-hurynovich force-pushed the cloud-profile-garbage-collection branch from 1c41e88 to 132d34e Compare March 16, 2026 15:03
errs := make([]error, 0)
for _, updates := range mcp.Spec.MachineImageUpdates {
// only call updater when an explicit source is provided
if updates.Source.OCI == nil {
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

previously on not having an updates.Source.OCI we were adding errors, now this will differ the behaviour, do we want it this way ?

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think in this case we don't need a error handling since it means that particular MachineImageUpdate doesn’t have an OCI source configured, so there’s nothing for the garbage-collector to do for it and we move on to the next update

Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I was referring not only to the garbage collection functionality, but I see thats not part of the revised code, so all good now 👍🏻

@videlov
Copy link

videlov commented Mar 17, 2026

overall looks very good, good job 👌🏻

Copy link

@videlov videlov left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

lgtm 👍🏻

errs := make([]error, 0)
for _, updates := range mcp.Spec.MachineImageUpdates {
// only call updater when an explicit source is provided
if updates.Source.OCI == nil {
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I was referring not only to the garbage collection functionality, but I see thats not part of the revised code, so all good now 👍🏻

Parallel: 1,
}, updates.Source.OCI.Insecure)
if err != nil {
return ctrl.Result{}, r.failWithStatusUpdate(ctx, mcp, fmt.Sprintf("failed to initialize OCI source for garbage collection: %s", err), fmt.Errorf("failed to initialize OCI source for garbage collection: %w", err))
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

looks very good, only thing that is bothering me is this message/error string duplication. if you figure out a way to not have that would be great. but is also fine if you want to keep it, your decision 👍🏻

Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

also small nitpick returning Reconcile's ctrl.Result{} should be a job for the main Reconcile func.


if len(versionsToDelete) > 0 {
if err := r.deleteVersion(ctx, mcp.Name, updates.ImageName, versionsToDelete); err != nil {
if apierrors.IsInvalid(err) {
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

ah yes, you are right. Was confused by the name of the function, though we are deleting some API, but we are really filtering out stuff and update/patch 👍🏻 perhaps name could be something more precise but don't have an exact example.

@github-actions
Copy link

Merging this branch changes the coverage (2 decrease, 1 increase)

Impacted Packages Coverage Δ 🤖
github.com/cobaltcore-dev/cloud-profile-sync/api/v1alpha1 47.24% (-0.79%) 👎
github.com/cobaltcore-dev/cloud-profile-sync/cloudprofilesync 81.98% (+2.17%) 👍
github.com/cobaltcore-dev/cloud-profile-sync/controllers 72.57% (-11.71%) 💀

Coverage by file

Changed files (no unit tests)

Changed File Coverage Δ Total Covered Missed 🤖
github.com/cobaltcore-dev/cloud-profile-sync/api/v1alpha1/managedcloudprofile.go 100.00% (ø) 1 1 0
github.com/cobaltcore-dev/cloud-profile-sync/api/v1alpha1/zz_generated.deepcopy.go 46.91% (-0.77%) 162 (+11) 76 (+4) 86 (+7) 👎
github.com/cobaltcore-dev/cloud-profile-sync/cloudprofilesync/imageupdater.go 86.11% (+5.56%) 36 31 (+2) 5 (-2) 👍
github.com/cobaltcore-dev/cloud-profile-sync/cloudprofilesync/source.go 74.51% (+1.78%) 51 (+7) 38 (+6) 13 (+1) 👍
github.com/cobaltcore-dev/cloud-profile-sync/controllers/managedcloudprofile_controller.go 72.57% (-11.71%) 175 (+105) 127 (+68) 48 (+37) 💀

Please note that the "Total", "Covered", and "Missed" counts above refer to code statements instead of lines of code. The value in brackets refers to the test coverage of that file in the old version of the code.

Changed unit test files

  • github.com/cobaltcore-dev/cloud-profile-sync/cloudprofilesync/provider_test.go
  • github.com/cobaltcore-dev/cloud-profile-sync/controllers/managedcloudprofile_controller_test.go
  • github.com/cobaltcore-dev/cloud-profile-sync/controllers/suite_test.go

@valeryia-hurynovich valeryia-hurynovich merged commit e2d28e4 into master Mar 20, 2026
7 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

4 participants