Skip to content

fix(bundler): pin OCI sync for argocd-helm mixed-component -pre/-post children#1039

Merged
mchmarny merged 3 commits into
mainfrom
fix/argocd-helm-pre-child-oci-1018
May 26, 2026
Merged

fix(bundler): pin OCI sync for argocd-helm mixed-component -pre/-post children#1039
mchmarny merged 3 commits into
mainfrom
fix/argocd-helm-pre-child-oci-1018

Conversation

@mchmarny
Copy link
Copy Markdown
Member

Summary

Closes #1018 by (a) regression-testing the post-#1035 contract on the mixed-component -pre/-post synthetic-split code path that was the actual failure mode in the bug report, and (b) correcting two stale documentation examples that still pointed operators at the pre-#1035 contract.

Motivation / Context

#1018 reported that --deployer argocd-helm bundles published to OCI fail to sync on the <component>-pre / <component>-post Applications (the bug example was gpu-operator-pre). The structural fix landed in #1035 — the path-based child template now appends /{{ .Chart.Name }} to the rendered source.repoURL so Argo CD's generic OCI source resolves to the actual artifact when the operator passes --set repoURL=<parent namespace>.

But two doc references were missed in #1035's sweep and still showed --set repoURL=oci://<your-registry>/<path>/aicr-bundle (with chart segment). An operator following those examples gets a double-suffixed reference (.../aicr-bundle/aicr-bundle:<tag>), which Argo cannot resolve — reproducing #1018's exact failure. Without a regression test specifically exercising the synthetic-split path, the symptom can come back.

Fixes: #1018
Related: #1034, #1035

Type of Change

  • Bug fix (non-breaking change that fixes an issue)
  • Documentation update

Component(s) Affected

  • Bundlers (pkg/bundler, pkg/component/*)
  • Docs/examples (docs/, examples/)

Implementation Notes

Code doc fixpkg/bundler/deployer/argocdhelm/argocdhelm.go. The writeParentApplicationTemplate docstring example now uses --set repoURL=oci://ghcr.io/myorg (parent namespace). It also references #1018 / #1034 so anyone reading the source while debugging an OCI-sync issue lands on the right context.

User doc fixdocs/user/cli-reference.md. The argocd-helm "Recommended deploy flow" example now matches the post-#1032/#1035 contract, with a short inline rationale ("the parent App appends .Chart.Name; including the chart name in --set repoURL double-suffixes both the parent and the path-based children"). Added a new troubleshooting entry under "Bundle Generation Fails" enumerating the three common OCI-sync failure modes (double-suffixed repoURL, Argo CD < v2.13, missing tag) with the diagnostic commands for each.

Regression testTestHelmTemplate_MixedComponentPreChildResolvesFromOCI in pkg/bundler/deployer/argocdhelm/argocdhelm_test.go. Reproduces the recipe shape from #1018 (gpu-operator ComponentRef + sibling ComponentPreManifests), runs live helm template --set repoURL=oci://example.test/myorg --set targetRevision=v9.9.9-issue-1018, and asserts:

  • Parent aicr-stack Application: source.repoURL == oci://example.test/myorg, source.chart == aicr-bundle.
  • Path-based gpu-operator-pre child: source.repoURL == oci://example.test/myorg/aicr-bundle, source.path == 001-gpu-operator-pre.
  • Sibling upstream-helm gpu-operator child: source.repoURL == https://helm.ngc.nvidia.com/nvidia (must stay as the upstream registry, not get templated from .Values.repoURL).

TestHelmTemplate_RendersWithSetRepoURL already covered the manifest-only path-based shape (nodewright-customizations); this new test covers the synthetic-split shape that ComponentPreManifests / ComponentPostManifests generate — the exact code path #1018 reported a failure on.

Testing

make qualify   # tests + lint + e2e (22 chainsaw cases) + vuln scan + license headers — all passed

New test runs in ~0.4s.

Risk Assessment

Rollout notes: No migration. Operators already on the post-#1035 contract continue to work. Operators following the pre-#1035 example in the docs will, after this PR, see the corrected form and won't double-suffix their --set repoURL.

Checklist

  • Tests pass locally (make test with -race)
  • Linter passes (make lint)
  • I did not skip/disable tests to make CI green
  • I added/updated tests for new functionality
  • I updated docs if user-facing behavior changed
  • Changes follow existing patterns in the codebase
  • Commits are cryptographically signed (git commit -S)

… children

Issue #1018 reported that `--deployer argocd-helm` bundles published to
OCI registries fail to sync on the synthetic `<component>-pre` /
`<component>-post` child Applications. The structural fix landed in
#1035 (path-based child template now appends `/{{ .Chart.Name }}` to the
rendered `source.repoURL` so Argo CD's generic OCI source resolves to
the actual artifact). However, two stale doc references still pointed
operators at the pre-#1035 contract — installing per those examples
re-produces the original failure with a double-suffixed reference.

Changes:

1. `pkg/bundler/deployer/argocdhelm/argocdhelm.go` —
   `writeParentApplicationTemplate` docstring example updated to pass
   `--set repoURL=<parent namespace>` (no chart name). The docstring is
   the canonical reference for the deployer contract; the bundle's
   emitted README, the parent App's `required` directive, and the
   path-based child's `required` directive all already use this form.

2. `docs/user/cli-reference.md` — the argocd-helm "Recommended deploy
   flow" example also still showed `--set repoURL=oci://<reg>/<path>/
   aicr-bundle`. Replaced with the parent-namespace form, added the
   contract rationale inline, and added a new troubleshooting entry
   covering the three common failure modes for OCI sync
   (double-suffixed repoURL, Argo CD < v2.13, missing tag).

3. `pkg/bundler/deployer/argocdhelm/argocdhelm_test.go` — new
   `TestHelmTemplate_MixedComponentPreChildResolvesFromOCI` reproduces
   the exact recipe shape from #1018 (Helm component + sibling
   ComponentPreManifests producing an NNN-<name>-pre folder), runs
   `helm template --set repoURL=oci://<parent-namespace>`, and asserts
   the rendered `gpu-operator-pre` Application's `source.repoURL`
   equals `<parent-namespace>/<chart-name>`. Pins the post-#1035
   contract for the mixed-component synthetic-split code path, which
   `TestHelmTemplate_RendersWithSetRepoURL` only covered for
   manifest-only components.

Fixes #1018
Related #1034, #1035
@github-actions
Copy link
Copy Markdown
Contributor

@mchmarny mchmarny self-assigned this May 26, 2026
@mchmarny mchmarny enabled auto-merge (squash) May 26, 2026 20:29
@github-actions
Copy link
Copy Markdown
Contributor

github-actions Bot commented May 26, 2026

Coverage Report ✅

Metric Value
Coverage 77.0%
Threshold 75%
Status Pass
Coverage Badge
![Coverage](https://img.shields.io/badge/coverage-77.0%25-green)

Coverage unchanged by this PR.

@coderabbitai
Copy link
Copy Markdown

coderabbitai Bot commented May 26, 2026

Review Change Stack

No actionable comments were generated in the recent review. 🎉

ℹ️ Recent review info
⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: ASSERTIVE

Plan: Enterprise

Run ID: 1099cbb3-7d4f-4726-a294-20e801cace4a

📥 Commits

Reviewing files that changed from the base of the PR and between e860e29 and e168edc.

📒 Files selected for processing (1)
  • docs/user/cli-reference.md

📝 Walkthrough

Walkthrough

This PR clarifies and validates the correct --set repoURL contract for the argocd-helm deployer. User documentation and inline examples now specify that repoURL must be set to the parent OCI namespace without the chart name; the parent Helm chart appends .Chart.Name to derive child Application source URLs. Documentation now lists Argo CD ≥ v2.13 as an OCI prerequisite and adds troubleshooting for Failed to load target state OCI not-found failures. A new regression test validates repoURL wiring between parent and path-based child Applications.

Estimated code review effort

🎯 3 (Moderate) | ⏱️ ~20 minutes

Possibly related issues

Possibly related PRs

  • NVIDIA/aicr#1032: Related changes to OCI chart-name derivation and parent spec templating that this PR’s test and expectations build upon.
  • NVIDIA/aicr#1035: Adds/updates argocd-helm Helm-render regression tests asserting the same repoURL + chart-name contract for path-based child Applications.
  • NVIDIA/aicr#1032: (duplicate in vector search results; included due to strong overlap with OCI chart-name behavior)

Suggested labels

bug

Suggested reviewers

  • lockwobr
🚥 Pre-merge checks | ✅ 4
✅ Passed checks (4 passed)
Check name Status Explanation
Title check ✅ Passed The title clearly summarizes the main change: fixing OCI sync for argocd-helm mixed-component children by pinning the contract with updated documentation and regression tests.
Description check ✅ Passed The description is comprehensive and directly related to the changeset, explaining the bug fix, documentation updates, regression test, and testing results.
Linked Issues check ✅ Passed Check skipped because no linked issues were found for this pull request.
Out of Scope Changes check ✅ Passed Check skipped because no linked issues were found for this pull request.

✏️ Tip: You can configure your own custom pre-merge checks in the settings.

✨ Finishing Touches
🧪 Generate unit tests (beta)
  • Create PR with unit tests
  • Commit unit tests in branch fix/argocd-helm-pre-child-oci-1018

Comment @coderabbitai help to get the list of available commands and usage tips.

Copy link
Copy Markdown

@coderabbitai coderabbitai Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 1

🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Inline comments:
In `@docs/user/cli-reference.md`:
- Around line 1609-1610: The sentence "If the cluster runs Argo CD < v2.13 *or*
the recipe is pure-Helm..." is misleading; change the wording so the version
clause is not presented as a standalone exception—make it clear that the
path-based children exception applies only to pure-Helm recipes
(manifest-only/mixed components require Argo CD >= v2.13). Update the line
containing that exact phrase and preserve the troubleshooting pointer about
`Failed to load target state` for `aicr-stack` and `<component>-pre`/`-post`
Applications, e.g., rephrase to say path-based children are not exercised for
pure-Helm recipes (otherwise Argo CD v2.13+ is required).
🪄 Autofix (Beta)

Fix all unresolved CodeRabbit comments on this PR:

  • Push a commit to this branch (recommended)
  • Create a new PR with the fixes

ℹ️ Review info
⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: ASSERTIVE

Plan: Enterprise

Run ID: ea8eef09-fb1a-42cb-ad18-d62d095553f8

📥 Commits

Reviewing files that changed from the base of the PR and between 91a4552 and 9598cb8.

📒 Files selected for processing (3)
  • docs/user/cli-reference.md
  • pkg/bundler/deployer/argocdhelm/argocdhelm.go
  • pkg/bundler/deployer/argocdhelm/argocdhelm_test.go

Comment thread docs/user/cli-reference.md Outdated
The previous "if Argo < v2.13 *or* pure-Helm" phrasing implied that an
older Argo install alone exempted the bundle from the v2.13 requirement,
which is backwards — path-based children are always emitted when the
recipe has manifest-only / mixed components; older Argo just fails to
resolve them.

Split into two sentences per CodeRabbit's suggestion: the pure-Helm
recipe is the actual exception that lets older Argo work; otherwise
v2.13+ is required.
@mchmarny mchmarny requested a review from yuanchen8911 May 26, 2026 21:21
Copy link
Copy Markdown
Contributor

@yuanchen8911 yuanchen8911 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM — clean fix for #1018. Regression test is well-targeted (synthetic-split path + parent + sibling upstream-helm child assertions), and the docs add useful Argo CD ≥ v2.13 prereq and troubleshooting context.

Non-blocking issue (minor):

tests/chainsaw/cli/bundle-dynamic/chainsaw-test.yaml#L206 still passes --set repoURL=oci://example/aicr-bundle, which is the old chart-qualified form. With the default chart name aicr-bundle, path-based children render as oci://example/aicr-bundle/aicr-bundle:v1. The test still passes because the assertion at L220 is a substring grep. Please change it to --set repoURL=oci://example so the test exercises the contract this PR documents.

Cross-review by Claude Code + Codex + CodeRabbit.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Projects

None yet

Development

Successfully merging this pull request may close these issues.

bug(bundler): argocd-helm generates gpu-operator-pre Application with path instead of chart — blocks OCI sync

2 participants