fix(kwok): make argocd OCI repoURL per-lane (follow-up to #1047)#1048
Conversation
PR NVIDIA#1047 stripped the per-recipe segment from OCI_IN_CLUSTER_REF for both deployer lanes, but the assignment lives inside the shared `argocd-oci|argocd-helm-oci)` case branch, so plain argocd-oci was incorrectly switched to the parent-only form too. Result: argocd-oci's root `nvidia-stack` app dials oci://…/aicr:<tag> while the artifact is pushed to oci://…/aicr/<recipe>:<tag>, so the OCI artifact lookup 404s on every PR. (Caught by Codex review after NVIDIA#1047 merged.) Make the assignment per-lane to restore the working argocd-oci contract while keeping the parent-only form for argocd-helm-oci: - argocd-oci -> oci://…/aicr/<recipe> (full path, unchanged from pre-NVIDIA#1047 behavior; root app references the artifact by full path) - argocd-helm-oci -> oci://…/aicr (parent-only; the parent App template appends .Chart.Name) The log_info line is split per-lane too so each prints the URL it will actually dereference.
|
No actionable comments were generated in the recent review. 🎉 ℹ️ Recent review info⚙️ Run configurationConfiguration used: Path: .coderabbit.yaml Review profile: ASSERTIVE Plan: Enterprise Run ID: 📒 Files selected for processing (1)
📝 WalkthroughWalkthroughThis PR updates Estimated code review effort🎯 2 (Simple) | ⏱️ ~8 minutes Possibly related PRs
Suggested labels
Suggested reviewers
🚥 Pre-merge checks | ✅ 4✅ Passed checks (4 passed)
✏️ Tip: You can configure your own custom pre-merge checks in the settings. ✨ Finishing Touches🧪 Generate unit tests (beta)
Comment |
mchmarny
left a comment
There was a problem hiding this comment.
Clean, targeted follow-up to #1047. The per-lane conditional restores the correct in-cluster repoURL form for argocd-oci (full aicr/<recipe> path needed by the nvidia-stack root app) while preserving #1047's parent-namespace-only form for argocd-helm-oci (parent App appends .Chart.Name itself).
Diff matches the existing per-lane conditional pattern already present in the same case branch (tag-format at L742, deployer-arg at L774). Verified that OCI_IN_CLUSTER_REF for the argocd-oci lane is only consumed via --repo to aicr bundle — there's no separate deploy-time consumer to keep in sync. The split log_info lines are accurate for both lanes.
CI: Lint, Security Scan, and CLI E2E green at review time; Tier 1 matrix (the real validation) still in progress — both argocd-oci and argocd-helm-oci jobs across oke*/lke should clear with this fix. LGTM.
PR NVIDIA#1047 / NVIDIA#1048 tried to fix the argocd-helm-oci KWOK failures by adjusting how the wrapper script passes `--set repoURL=…`, but the root cause is in the bundler's parent App template, not the wrapper. # Why the previous fix did not hold `parentAppTemplate` (pkg/bundler/deployer/argocdhelm/argocdhelm.go) rendered a Helm-OCI shape: source: repoURL: <Values.repoURL> # e.g., oci://reg/path chart: <.Chart.Name> # e.g., my-bundle targetRevision: <tag> The design assumed Argo CD would resolve this as `<repoURL>/<chart>:<tag>`. But Argo CD's `oci://` scheme triggers the **native OCI** code path, where `repoURL` is the FULL artifact reference and `source.chart` is silently ignored. CI confirms it: HEAD /v2/aicr/manifests/0.0.0-<sha>-argocd-helm-oci (artifact actually lives at /v2/aicr/<recipe>/manifests/...) The Argo CD user guide splits the two shapes explicitly: - Helm OCI charts (no `oci://`, `chart:` set): https://argo-cd.readthedocs.io/en/stable/user-guide/helm/ - Native OCI artifacts (`oci://` repoURL, `path:` set): https://argo-cd.readthedocs.io/en/stable/user-guide/oci/ # Fix: dispatch parent App shape on repoURL scheme Conditional template: - `oci://...` repoURL → native OCI shape: append `.Chart.Name` to repoURL, set `path: "."`, drop `chart:`. Mirrors the path-based child template at injectValuesIntoSingleSource. - HTTPS / HTTP repoURL → Helm chart repository shape: keep repoURL as-is, emit `chart: .Chart.Name`, drop `path:`. Preserves the pure-Helm install path the README still advertises (ChartMuseum, GitHub Pages, etc.). # Tests - TestHelmTemplate_RewrittenSources / TestHelmTemplate_PathBasedChildRepoURL updated for the native-OCI shape (parent.repoURL == namespace + "/" + chart, chart == "", path == "."). - TestHelmTemplate_RendersWithHelmRepoRepoURL added: verifies the HTTPS-mode rendering produces repoURL + chart with no path field. - TestGenerate_CustomChartName now asserts chart name in repoURL (was: source.chart). - TestHelmTemplate_AppNameOverride heuristic switched from `source.chart \!= ""` to `source.path == "."`. # Goldens testdata/{helm_and_manifest_only,mixed_component}/templates/aicr-stack.yaml regenerated. # Doc comments Stale `source.chart` / `repoURL/chart:tag` references in parentAppTemplate, Generator.ChartName, DefaultChartName, and the "Publication backends and the path field" block updated to describe the dual-mode contract and cite the Argo CD docs split. # KWOK wrapper script The PR NVIDIA#1047 comment block in kwok/scripts/validate-scheduling.sh referenced the now-wrong source.chart contract; updated to cite the new parentAppTemplate design. The conditional itself (per-lane assignment from NVIDIA#1048) is unchanged. Closes the argocd-helm-oci CI breakage on main introduced by PR NVIDIA#1047.
Summary
Follow-up to #1047 — restore the per-recipe segment for the
argocd-ocilane, which was incorrectly stripped together withargocd-helm-oci.Motivation / Context
PR #1047 fixed the
argocd-helm-ocirepoURL contract (PR #1032 / #1035) by removing the per-recipe segment fromOCI_IN_CLUSTER_REF. But the assignment lives inside the sharedargocd-oci|argocd-helm-oci)case branch (line 722), so the change applied to both lanes. Codex flagged this on the post-merge review:argocd-helm-oci(intended target): parent App template appends.Chart.Nameitself → parent-only repoURL is correct → currently green.argocd-oci(collateral damage): rootnvidia-stackapp references the artifact by full path → it now triesoci://…/aicr:<tag>, but the artifact lives atoci://…/aicr/<recipe>:<tag>→ 404.This PR makes the assignment per-lane:
The
log_infoline at 777 is split per-lane too so each prints the URL it will actually dereference.Fixes: N/A (no dedicated tracking issue)
Related: #1047, #1032, #1035
Type of Change
Component(s) Affected
kwok/scripts/validate-scheduling.sh)Implementation Notes
Matches the existing per-lane conditional pattern in the same branch:
[[ "$DEPLOYER" == "argocd-helm-oci" ]]→ adjust tag format[[ "$DEPLOYER" == "argocd-helm-oci" ]] && deployer_arg="argocd-helm"→ adjust deployer arg[[ "$DEPLOYER" == "argocd-helm-oci" ]]→ adjust in-cluster repoURL formThe explanatory comment block is replaced with a per-lane explanation citing both contracts.
The
flux-ocibranch at line ~844 is untouched (stillaicr/${recipe}) — Flux deploys via OCIRepository CR with the full path baked in, doesn't go through anyrepoURLtemplate indirection.Testing
Both pass. Full
make qualifywas skipped because this is an infra-only change to a CI shell script — Go tests, e2e, vulnerability scan, and yamllint cannot regress from a bash conditional. PerCLAUDE.local.md's doc-only/infra-only carve-out, scoped checks are the appropriate gate here.CI itself is the real validation: after this lands and PRs rebase onto main, both
argocd-ociandargocd-helm-ociTier-1 matrix jobs should be green.Risk Assessment
argocd-oci. Easy revert.Rollout notes: No runtime impact. After merge, every open PR's
argocd-ociTier-1 jobs should clear;argocd-helm-ocijobs continue passing (the PR #1047 fix is preserved).Checklist
bash -n+shellcheck— infra-only change, fullmake qualifynot applicable)git commit -S)