feat(rcv1p): unify cert bootstrap flow and add Windows CA refresh task #8096
rchincha wants to merge 70 commits into
Pull request overview
This PR aims to unify the custom-cloud CA certificate bootstrap path (removing the separate “operation-requests” init scripts) and adds a Windows scheduled task to periodically refresh custom-cloud CA certificates.
Changes:
- Windows: add a scheduled task to refresh custom-cloud CA certificates; update `Get-CACertificates` to support legacy vs “rcv1p” modes keyed off location.
- Linux: consolidate custom-cloud init to a single init script and update CSE command generation to set a cert-endpoint mode variable.
- Regenerate multiple custom data / generated command snapshots to reflect the new templates.
Reviewed changes
Copilot reviewed 74 out of 176 changed files in this pull request and generated 4 comments.
| File | Description |
|---|---|
| staging/cse/windows/kubernetesfunc.ps1 | Adds CA refresh scheduled task + updates CA retrieval logic and error behavior |
| parts/windows/kuberneteswindowssetup.ps1 | Wires Get-CACertificates -Location and registers refresh task for custom clouds |
| pkg/agent/variables.go | Always injects initAKSCustomCloud payload into cloud-init data |
| pkg/agent/const.go | Removes separate custom-cloud init script constants; keeps single init script |
| pkg/agent/baker.go | Simplifies GetTargetEnvironment; notes IsAKSCustomCloud as deprecated |
| parts/linux/cloud-init/artifacts/cse_cmd.sh | Updates CSE command to set cert endpoint mode + run custom-cloud init script |
| parts/linux/cloud-init/artifacts/init-aks-custom-cloud-operation-requests.sh | Deleted (custom-cloud init consolidation) |
| parts/linux/cloud-init/artifacts/init-aks-custom-cloud-operation-requests-mariner.sh | Deleted (custom-cloud init consolidation) |
| parts/linux/cloud-init/artifacts/init-aks-custom-cloud-mariner.sh | Deleted (custom-cloud init consolidation) |
| aks-node-controller/parser/templates/cse_cmd.sh.gtpl | Mirrors CSE command template updates for aks-node-controller parser |
| aks-node-controller/parser/testdata/Compatibility+EmptyConfig/generatedCSECommand | Regenerated snapshot for new CSE cmd template |
| aks-node-controller/parser/testdata/AzureLinuxv2+Kata+DisableUnattendedUpgrades=false/generatedCSECommand | Regenerated snapshot for new CSE cmd template |
| aks-node-controller/parser/testdata/AKSUbuntu2204+SSHStatusOn/generatedCSECommand | Regenerated snapshot for new CSE cmd template |
| aks-node-controller/parser/testdata/AKSUbuntu2204+EnablePubkeyAuth/generatedCSECommand | New snapshot for new template output |
| aks-node-controller/parser/testdata/AKSUbuntu2204+DisablePubkeyAuth/generatedCSECommand | New snapshot for new template output |
| aks-node-controller/parser/testdata/AKSUbuntu2204+DefaultPubkeyAuth/generatedCSECommand | New snapshot for new template output |
| aks-node-controller/parser/testdata/AKSUbuntu2204+CustomOSConfig/generatedCSECommand | Regenerated snapshot for new CSE cmd template |
| aks-node-controller/parser/testdata/AKSUbuntu2204+CustomCloud/generatedCSECommand | Regenerated snapshot for new CSE cmd template |
| aks-node-controller/parser/testdata/AKSUbuntu2204+Containerd+MIG/generatedCSECommand | Regenerated snapshot for new CSE cmd template |
| aks-node-controller/parser/testdata/AKSUbuntu2204+CloudProviderOverrides/generatedCSECommand | New snapshot for new template output |
| aks-node-controller/parser/testdata/AKSUbuntu2204+China/generatedCSECommand | Regenerated snapshot for new CSE cmd template |
| pkg/agent/testdata/MarinerV2+Kata/CustomData | Regenerated snapshot (custom data gzip payload changed) |
| pkg/agent/testdata/AzureLinuxV2+Kata/CustomData | Regenerated snapshot (custom data gzip payload changed) |
| pkg/agent/testdata/AzureLinuxV3+Kata/CustomData | Regenerated snapshot (custom data gzip payload changed) |
| pkg/agent/testdata/AKSUbuntu2204+China/CustomData | Regenerated snapshot (custom data gzip payload changed) |
| pkg/agent/testdata/AKSUbuntu2204+cgroupv2/CustomData | Regenerated snapshot (custom data gzip payload changed) |
| pkg/agent/testdata/AKSUbuntu2204+ootcredentialprovider/CustomData | Regenerated snapshot (custom data gzip payload changed) |
| pkg/agent/testdata/AKSUbuntu2204+SecurityProfile/CustomData | Regenerated snapshot (custom data gzip payload changed) |
| pkg/agent/testdata/AKSUbuntu2204+SSHStatusOn/CustomData | Regenerated snapshot (custom data gzip payload changed) |
| pkg/agent/testdata/AKSUbuntu2204+SSHStatusOff/CustomData | Regenerated snapshot (custom data gzip payload changed) |
| pkg/agent/testdata/AKSUbuntu2404+NetworkPolicy/CustomData | Regenerated snapshot (custom data gzip payload changed) |
| pkg/agent/testdata/AKSUbuntu2404+Teleport/CustomData | Regenerated snapshot (custom data gzip payload changed) |
| pkg/agent/testdata/CustomizedImage/CustomData | Regenerated snapshot (custom data gzip payload changed) |
| pkg/agent/testdata/CustomizedImageKata/CustomData | Regenerated snapshot (custom data gzip payload changed) |
| pkg/agent/testdata/CustomizedImageLinuxGuard/CustomData | Regenerated snapshot (custom data gzip payload changed) |
| pkg/agent/testdata/Flatcar/CustomData.inner | Regenerated snapshot (embedded gzip payload changed) |
| pkg/agent/testdata/ACL/CustomData.inner | Regenerated snapshot (embedded gzip payload changed) |
44ff9ee to a0a1307
Pull request overview
This PR aims to unify AKS custom-cloud CA certificate bootstrap behavior (legacy vs “rcv1p/operation-requests” style flows) and adds a Windows scheduled task to periodically refresh custom-cloud CA certificates.
Changes:
- Adds Windows CA refresh scheduled task registration and introduces location-based endpoint-mode selection (legacy vs rcv1p).
- Refactors Windows CA certificate retrieval to support both endpoint modes and opt-in gating for rcv1p.
- Simplifies Linux custom-cloud init script selection by consolidating onto `init-aks-custom-cloud.sh` and removing older variants; updates generated testdata accordingly.
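The mode selection and opt-in gating summarized above can be sketched as follows. This is a minimal illustration, not the PR's code: the function names, the location set, and the rule that legacy mode always fetches are assumptions, since the summary does not say which locations select which mode.

```go
package main

import (
	"fmt"
	"strings"
)

// certEndpointMode returns "rcv1p" when the location is in the caller-supplied
// set and "legacy" otherwise. The set is hypothetical: the review summary does
// not list which locations map to which mode.
func certEndpointMode(location string, rcv1pLocations map[string]bool) string {
	key := strings.ToLower(strings.TrimSpace(location))
	if rcv1pLocations[key] {
		return "rcv1p"
	}
	return "legacy"
}

// shouldFetchCerts applies opt-in gating. Assumption: legacy mode always
// fetches, while rcv1p fetches only when the node reports that it is opted in.
func shouldFetchCerts(mode string, optedIn bool) bool {
	if mode == "rcv1p" {
		return optedIn
	}
	return true
}

func main() {
	// Placeholder location names, not real regions.
	rcv1p := map[string]bool{"examplelocation": true}
	for _, loc := range []string{"ExampleLocation", "otherlocation"} {
		mode := certEndpointMode(loc, rcv1p)
		fmt.Printf("%s: mode=%s, fetch when not opted in=%v\n",
			loc, mode, shouldFetchCerts(mode, false))
	}
}
```

Keeping the selection and the gate as separate pure functions makes both easy to cover with table-driven tests.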
Reviewed changes
Copilot reviewed 93 out of 99 changed files in this pull request and generated 4 comments.
| File | Description |
|---|---|
| staging/cse/windows/kubernetesfunc.ps1 | Adds CA refresh scheduled task and endpoint-mode-aware Get-CACertificates implementation. |
| pkg/agent/variables.go | Simplifies how initAKSCustomCloud is added to Linux cloud-init variables. |
| pkg/agent/testdata/MarinerV2+Kata/CustomData | Updates expected CustomData snapshot (generated content changed). |
| pkg/agent/testdata/Flatcar/CustomData.inner | Updates expected Flatcar CustomData snapshot (generated content changed). |
| pkg/agent/testdata/CustomizedImageLinuxGuard/CustomData | Updates expected CustomData snapshot (generated content changed). |
| pkg/agent/testdata/CustomizedImageKata/CustomData | Updates expected CustomData snapshot (generated content changed). |
| pkg/agent/testdata/CustomizedImage/CustomData | Updates expected CustomData snapshot (generated content changed). |
| pkg/agent/testdata/AzureLinuxV3+Kata/CustomData | Updates expected CustomData snapshot (generated content changed). |
| pkg/agent/testdata/AzureLinuxV2+Kata/CustomData | Updates expected CustomData snapshot (generated content changed). |
| pkg/agent/testdata/AKSWindows23H2Gen2+NextGenNetworkingNoConfig/CustomData | Updates expected Windows CustomData snapshot (calls/refresh task additions). |
| pkg/agent/testdata/AKSWindows23H2Gen2+NextGenNetworkingDisabled/CustomData | Updates expected Windows CustomData snapshot (calls/refresh task additions). |
| pkg/agent/testdata/AKSWindows23H2Gen2+NextGenNetworking/CustomData | Updates expected Windows CustomData snapshot (calls/refresh task additions). |
| pkg/agent/testdata/AKSWindows2019+ootcredentialprovider/CustomData | Updates expected Windows CustomData snapshot (calls/refresh task additions). |
| pkg/agent/testdata/AKSWindows2019+SecurityProfile/CustomData | Updates expected Windows CustomData snapshot (calls/refresh task additions). |
| pkg/agent/testdata/AKSWindows2019+ManagedIdentity/CustomData | Updates expected Windows CustomData snapshot (calls/refresh task additions). |
| pkg/agent/testdata/AKSWindows2019+KubeletServingCertificateRotation/CustomData | Updates expected Windows CustomData snapshot (calls/refresh task additions). |
| pkg/agent/testdata/AKSWindows2019+KubeletClientTLSBootstrapping/CustomData | Updates expected Windows CustomData snapshot (calls/refresh task additions). |
| pkg/agent/testdata/AKSWindows2019+K8S119/CustomData | Updates expected Windows CustomData snapshot (calls/refresh task additions). |
| pkg/agent/testdata/AKSWindows2019+K8S119+FIPS/CustomData | Updates expected Windows CustomData snapshot (calls/refresh task additions). |
| pkg/agent/testdata/AKSWindows2019+K8S119+CSI/CustomData | Updates expected Windows CustomData snapshot (calls/refresh task additions). |
| pkg/agent/testdata/AKSWindows2019+K8S118/CustomData | Updates expected Windows CustomData snapshot (calls/refresh task additions). |
| pkg/agent/testdata/AKSWindows2019+K8S117/CustomData | Updates expected Windows CustomData snapshot (calls/refresh task additions). |
| pkg/agent/testdata/AKSWindows2019+K8S116/CustomData | Updates expected Windows CustomData snapshot (calls/refresh task additions). |
| pkg/agent/testdata/AKSWindows2019+EnablePrivateClusterHostsConfigAgent/CustomData | Updates expected Windows CustomData snapshot (calls/refresh task additions). |
| pkg/agent/testdata/AKSWindows2019+CustomVnet/CustomData | Updates expected Windows CustomData snapshot (calls/refresh task additions). |
| pkg/agent/testdata/AKSWindows2019+CustomCloud/CustomData | Updates expected Windows CustomData snapshot (new Get-CACertificates call form + refresh task). |
| pkg/agent/testdata/AKSWindows2019+CustomCloud+ootcredentialprovider/CustomData | Updates expected Windows CustomData snapshot (new Get-CACertificates call form + refresh task). |
| pkg/agent/testdata/AKSUbuntu2404+Teleport/CustomData | Updates expected Ubuntu CustomData snapshot (generated content changed). |
| pkg/agent/testdata/AKSUbuntu2404+NetworkPolicy/CustomData | Updates expected Ubuntu CustomData snapshot (generated content changed). |
| pkg/agent/testdata/AKSUbuntu2204+ootcredentialprovider/CustomData | Updates expected Ubuntu CustomData snapshot (generated content changed). |
| pkg/agent/testdata/AKSUbuntu2204+cgroupv2/CustomData | Updates expected Ubuntu CustomData snapshot (generated content changed). |
| pkg/agent/testdata/AKSUbuntu2204+SecurityProfile/CustomData | Updates expected Ubuntu CustomData snapshot (generated content changed). |
| pkg/agent/testdata/AKSUbuntu2204+SSHStatusOn/CustomData | Updates expected Ubuntu CustomData snapshot (generated content changed). |
| pkg/agent/testdata/AKSUbuntu2204+China/CustomData | Updates expected Ubuntu CustomData snapshot (generated content changed). |
| pkg/agent/testdata/ACL/CustomData.inner | Updates expected ACL CustomData snapshot (generated content changed). |
| pkg/agent/const.go | Consolidates custom-cloud init script constants to a single script. |
| parts/windows/kuberneteswindowssetup.ps1 | Updates Windows setup flow to call Get-CACertificates with location and registers CA refresh scheduled task. |
| parts/linux/cloud-init/artifacts/init-aks-custom-cloud-operation-requests.sh | Removes operation-requests-specific Linux init script (consolidation). |
| parts/linux/cloud-init/artifacts/init-aks-custom-cloud-operation-requests-mariner.sh | Removes Mariner/AzureLinux operation-requests init script (consolidation). |
| parts/linux/cloud-init/artifacts/init-aks-custom-cloud-mariner.sh | Removes Mariner/AzureLinux legacy init script variant (consolidation). |
| aks-node-controller/parser/templates/cse_cmd.sh.gtpl | Adds a LOCATION shell variable in the generated CSE command template. |
| aks-node-controller/parser/helper.go | Factors out a shared getCloudLocation helper and reuses it in getCloudTargetEnv. |
2b3c1d6 to e19a19b
e19a19b to d41856f
Pull request overview
This PR unifies the AKS custom cloud CA certificate bootstrap logic to a single flow and adds a Windows scheduled task to periodically refresh custom cloud CA certificates. It also updates Linux/customdata generation and test snapshots to reflect the new wiring.
Changes:
- Add Windows scheduled task registration for daily CA certificate refresh and introduce a location-based cert endpoint mode selector.
- Simplify Linux custom cloud init script selection by standardizing on `init-aks-custom-cloud.sh`, plus add wiring/tests for refresh-mode arguments.
- Update aks-node-controller template to export `LOCATION`, and regenerate CustomData snapshot test artifacts.
Reviewed changes
Copilot reviewed 95 out of 101 changed files in this pull request and generated 2 comments.
| File | Description |
|---|---|
| staging/cse/windows/kubernetesfunc.tests.ps1 | Adds Pester coverage for cert endpoint mode selection, scheduled task registration, and CA retrieval behavior. |
| staging/cse/windows/kubernetesfunc.ps1 | Implements unified Windows CA retrieval logic with legacy/rcv1p modes and registers a daily refresh scheduled task. |
| spec/parts/linux/cloud-init/artifacts/init_aks_custom_cloud_spec.sh | Adds ShellSpec assertions to validate refresh-mode argument parsing/wiring in the Linux init script. |
| pkg/agent/variables.go | Changes how initAKSCustomCloud is injected into Linux cloud-init data. |
| pkg/agent/const.go | Removes per-cloud custom init script constants and standardizes on init-aks-custom-cloud.sh. |
| parts/windows/kuberneteswindowssetup.ps1 | Wires CA retrieval call and registers the Windows CA refresh scheduled task during BasePrep. |
| parts/linux/cloud-init/artifacts/init-aks-custom-cloud-operation-requests.sh | Removed (operation-requests variant no longer used). |
| parts/linux/cloud-init/artifacts/init-aks-custom-cloud-operation-requests-mariner.sh | Removed (operation-requests Mariner/AzureLinux variant no longer used). |
| parts/linux/cloud-init/artifacts/init-aks-custom-cloud-mariner.sh | Removed (Mariner/AzureLinux legacy variant no longer used). |
| aks-node-controller/parser/templates/cse_cmd.sh.gtpl | Exports LOCATION into the CSE environment for downstream scripts. |
| aks-node-controller/parser/helper.go | Adds a helper to normalize location and reuses it in cloud target env detection. |
| pkg/agent/testdata/MarinerV2+Kata/CustomData | Regenerated CustomData snapshot due to init/custom cloud wiring changes. |
| pkg/agent/testdata/Flatcar/CustomData.inner | Regenerated CustomData snapshot due to init/custom cloud wiring changes. |
| pkg/agent/testdata/CustomizedImageLinuxGuard/CustomData | Regenerated CustomData snapshot due to init/custom cloud wiring changes. |
| pkg/agent/testdata/CustomizedImageKata/CustomData | Regenerated CustomData snapshot due to init/custom cloud wiring changes. |
| pkg/agent/testdata/CustomizedImage/CustomData | Regenerated CustomData snapshot due to init/custom cloud wiring changes. |
| pkg/agent/testdata/AzureLinuxV3+Kata/CustomData | Regenerated CustomData snapshot due to init/custom cloud wiring changes. |
| pkg/agent/testdata/AzureLinuxV2+Kata/CustomData | Regenerated CustomData snapshot due to init/custom cloud wiring changes. |
| pkg/agent/testdata/AKSWindows23H2Gen2+NextGenNetworkingNoConfig/CustomData | Regenerated CustomData snapshot due to Windows CA refresh task wiring. |
| pkg/agent/testdata/AKSWindows23H2Gen2+NextGenNetworkingDisabled/CustomData | Regenerated CustomData snapshot due to Windows CA refresh task wiring. |
| pkg/agent/testdata/AKSWindows23H2Gen2+NextGenNetworking/CustomData | Regenerated CustomData snapshot due to Windows CA refresh task wiring. |
| pkg/agent/testdata/AKSWindows2019+ootcredentialprovider/CustomData | Regenerated CustomData snapshot due to Windows CA refresh task wiring. |
| pkg/agent/testdata/AKSWindows2019+SecurityProfile/CustomData | Regenerated CustomData snapshot due to Windows CA refresh task wiring. |
| pkg/agent/testdata/AKSWindows2019+ManagedIdentity/CustomData | Regenerated CustomData snapshot due to Windows CA refresh task wiring. |
| pkg/agent/testdata/AKSWindows2019+KubeletServingCertificateRotation/CustomData | Regenerated CustomData snapshot due to Windows CA refresh task wiring. |
| pkg/agent/testdata/AKSWindows2019+KubeletClientTLSBootstrapping/CustomData | Regenerated CustomData snapshot due to Windows CA refresh task wiring. |
| pkg/agent/testdata/AKSWindows2019+K8S119/CustomData | Regenerated CustomData snapshot due to Windows CA refresh task wiring. |
| pkg/agent/testdata/AKSWindows2019+K8S119+FIPS/CustomData | Regenerated CustomData snapshot due to Windows CA refresh task wiring. |
| pkg/agent/testdata/AKSWindows2019+K8S119+CSI/CustomData | Regenerated CustomData snapshot due to Windows CA refresh task wiring. |
| pkg/agent/testdata/AKSWindows2019+K8S118/CustomData | Regenerated CustomData snapshot due to Windows CA refresh task wiring. |
| pkg/agent/testdata/AKSWindows2019+K8S117/CustomData | Regenerated CustomData snapshot due to Windows CA refresh task wiring. |
| pkg/agent/testdata/AKSWindows2019+K8S116/CustomData | Regenerated CustomData snapshot due to Windows CA refresh task wiring. |
| pkg/agent/testdata/AKSWindows2019+EnablePrivateClusterHostsConfigAgent/CustomData | Regenerated CustomData snapshot due to Windows CA refresh task wiring. |
| pkg/agent/testdata/AKSWindows2019+CustomVnet/CustomData | Regenerated CustomData snapshot due to Windows CA refresh task wiring. |
| pkg/agent/testdata/AKSWindows2019+CustomCloud/CustomData | Regenerated CustomData snapshot due to Windows CA refresh task wiring. |
| pkg/agent/testdata/AKSWindows2019+CustomCloud+ootcredentialprovider/CustomData | Regenerated CustomData snapshot due to Windows CA refresh task wiring. |
| pkg/agent/testdata/AKSUbuntu2404+Teleport/CustomData | Regenerated CustomData snapshot due to init/custom cloud wiring changes. |
| pkg/agent/testdata/AKSUbuntu2404+NetworkPolicy/CustomData | Regenerated CustomData snapshot due to init/custom cloud wiring changes. |
| pkg/agent/testdata/AKSUbuntu2204+ootcredentialprovider/CustomData | Regenerated CustomData snapshot due to init/custom cloud wiring changes. |
| pkg/agent/testdata/AKSUbuntu2204+cgroupv2/CustomData | Regenerated CustomData snapshot due to init/custom cloud wiring changes. |
| pkg/agent/testdata/AKSUbuntu2204+SecurityProfile/CustomData | Regenerated CustomData snapshot due to init/custom cloud wiring changes. |
| pkg/agent/testdata/AKSUbuntu2204+SSHStatusOn/CustomData | Regenerated CustomData snapshot due to init/custom cloud wiring changes. |
| pkg/agent/testdata/AKSUbuntu2204+China/CustomData | Regenerated CustomData snapshot due to init/custom cloud wiring changes. |
| pkg/agent/testdata/ACL/CustomData.inner | Regenerated CustomData snapshot due to init/custom cloud wiring changes. |
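The `helper.go` row in the table above mentions a helper that normalizes location. The review does not show its body, so the sketch below is only an assumption about what such normalization might look like (lower-casing and removing whitespace); `normalizeLocation` is a hypothetical name, not the PR's function.

```go
package main

import (
	"fmt"
	"strings"
)

// normalizeLocation lower-cases the location and drops all whitespace so that
// values such as "West US 2" and "westus2" compare equal.
// Assumption: this is an illustration, not the body of getCloudLocation.
func normalizeLocation(location string) string {
	return strings.ToLower(strings.Join(strings.Fields(location), ""))
}

func main() {
	fmt.Println(normalizeLocation(" West US 2 ")) // prints "westus2"
}
```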
d41856f to 18ba549
18ba549 to e94c465
Pull request overview
This PR updates AKS custom cloud certificate bootstrapping to use a single unified flow and adds a Windows scheduled task for periodic custom cloud CA refresh.
Changes:
- Added Windows CA refresh task registration plus new logic to select cert retrieval mode and opt-in gating.
- Simplified Linux custom cloud init script wiring by removing legacy “operation-requests” variants and normalizing location for refresh mode.
- Added/updated tests and refreshed golden testdata outputs to reflect new custom data content.
Reviewed changes
Copilot reviewed 95 out of 101 changed files in this pull request and generated 1 comment.
| File | Description |
|---|---|
| staging/cse/windows/kubernetesfunc.tests.ps1 | Adds Pester coverage for endpoint-mode selection, task registration behavior, and CA retrieval failure handling. |
| staging/cse/windows/kubernetesfunc.ps1 | Implements endpoint-mode derivation, opt-in gating, CA retrieval paths, and a Windows scheduled task for refresh. |
| spec/parts/linux/cloud-init/artifacts/init_aks_custom_cloud_spec.sh | Adds ShellSpec checks to ensure init script wiring for ca-refresh mode and LOCATION usage. |
| pkg/agent/variables.go | Simplifies init script selection and updates how custom cloud init script is injected into cloud-init data. |
| pkg/agent/const.go | Removes now-unused custom-cloud init script constants; keeps unified init script constant. |
| parts/windows/kuberneteswindowssetup.ps1 | Updates Windows setup to call Get-CACertificates with Location and conditionally register refresh task. |
| aks-node-controller/parser/templates/cse_cmd.sh.gtpl | Adds LOCATION variable for downstream scripts during custom cloud provisioning. |
| aks-node-controller/parser/helper.go | Adds getCloudLocation helper and reuses it for cloud target env detection. |
| parts/linux/cloud-init/artifacts/init-aks-custom-cloud-operation-requests.sh | Removes legacy operation-requests init script (superseded by unified script). |
| parts/linux/cloud-init/artifacts/init-aks-custom-cloud-operation-requests-mariner.sh | Removes legacy Mariner operation-requests init script (superseded by unified script). |
| parts/linux/cloud-init/artifacts/init-aks-custom-cloud-mariner.sh | Removes legacy Mariner init script variant (superseded by unified script). |
| pkg/agent/testdata/MarinerV2+Kata/CustomData | Updates golden customData to match unified custom cloud init content. |
| pkg/agent/testdata/Flatcar/CustomData.inner | Updates golden ignition/customData payload for unified custom cloud init content. |
| pkg/agent/testdata/CustomizedImageLinuxGuard/CustomData | Updates golden customData to match unified custom cloud init content. |
| pkg/agent/testdata/CustomizedImageKata/CustomData | Updates golden customData to match unified custom cloud init content. |
| pkg/agent/testdata/CustomizedImage/CustomData | Updates golden customData to match unified custom cloud init content. |
| pkg/agent/testdata/AzureLinuxV3+Kata/CustomData | Updates golden customData to match unified custom cloud init content. |
| pkg/agent/testdata/AzureLinuxV2+Kata/CustomData | Updates golden customData to match unified custom cloud init content. |
| pkg/agent/testdata/AKSWindows23H2Gen2+NextGenNetworkingNoConfig/CustomData | Updates golden Windows customData to pass Location to CA cert retrieval + refresh task gating. |
| pkg/agent/testdata/AKSWindows23H2Gen2+NextGenNetworkingDisabled/CustomData | Updates golden Windows customData to pass Location to CA cert retrieval + refresh task gating. |
| pkg/agent/testdata/AKSWindows23H2Gen2+NextGenNetworking/CustomData | Updates golden Windows customData to pass Location to CA cert retrieval + refresh task gating. |
| pkg/agent/testdata/AKSWindows2019+ootcredentialprovider/CustomData | Updates golden Windows customData to pass Location to CA cert retrieval + refresh task gating. |
| pkg/agent/testdata/AKSWindows2019+SecurityProfile/CustomData | Updates golden Windows customData to pass Location to CA cert retrieval + refresh task gating. |
| pkg/agent/testdata/AKSWindows2019+ManagedIdentity/CustomData | Updates golden Windows customData to pass Location to CA cert retrieval + refresh task gating. |
| pkg/agent/testdata/AKSWindows2019+KubeletServingCertificateRotation/CustomData | Updates golden Windows customData to pass Location to CA cert retrieval + refresh task gating. |
| pkg/agent/testdata/AKSWindows2019+KubeletClientTLSBootstrapping/CustomData | Updates golden Windows customData to pass Location to CA cert retrieval + refresh task gating. |
| pkg/agent/testdata/AKSWindows2019+K8S119/CustomData | Updates golden Windows customData to pass Location to CA cert retrieval + refresh task gating. |
| pkg/agent/testdata/AKSWindows2019+K8S119+FIPS/CustomData | Updates golden Windows customData to pass Location to CA cert retrieval + refresh task gating. |
| pkg/agent/testdata/AKSWindows2019+K8S119+CSI/CustomData | Updates golden Windows customData to pass Location to CA cert retrieval + refresh task gating. |
| pkg/agent/testdata/AKSWindows2019+K8S118/CustomData | Updates golden Windows customData to pass Location to CA cert retrieval + refresh task gating. |
| pkg/agent/testdata/AKSWindows2019+K8S117/CustomData | Updates golden Windows customData to pass Location to CA cert retrieval + refresh task gating. |
| pkg/agent/testdata/AKSWindows2019+K8S116/CustomData | Updates golden Windows customData to pass Location to CA cert retrieval + refresh task gating. |
| pkg/agent/testdata/AKSWindows2019+EnablePrivateClusterHostsConfigAgent/CustomData | Updates golden Windows customData to pass Location to CA cert retrieval + refresh task gating. |
| pkg/agent/testdata/AKSWindows2019+CustomVnet/CustomData | Updates golden Windows customData to pass Location to CA cert retrieval + refresh task gating. |
| pkg/agent/testdata/AKSWindows2019+CustomCloud/CustomData | Updates golden Windows customData to pass Location to CA cert retrieval + refresh task gating. |
| pkg/agent/testdata/AKSWindows2019+CustomCloud+ootcredentialprovider/CustomData | Updates golden Windows customData to pass Location to CA cert retrieval + refresh task gating. |
| pkg/agent/testdata/AKSUbuntu2404+Teleport/CustomData | Updates golden customData to match unified custom cloud init content. |
| pkg/agent/testdata/AKSUbuntu2404+NetworkPolicy/CustomData | Updates golden customData to match unified custom cloud init content. |
| pkg/agent/testdata/AKSUbuntu2204+ootcredentialprovider/CustomData | Updates golden customData to match unified custom cloud init content. |
| pkg/agent/testdata/AKSUbuntu2204+cgroupv2/CustomData | Updates golden customData to match unified custom cloud init content. |
| pkg/agent/testdata/AKSUbuntu2204+SecurityProfile/CustomData | Updates golden customData to match unified custom cloud init content. |
| pkg/agent/testdata/AKSUbuntu2204+SSHStatusOn/CustomData | Updates golden customData to match unified custom cloud init content. |
| pkg/agent/testdata/AKSUbuntu2204+SSHStatusOff/CustomData | Updates golden customData to match unified custom cloud init content. |
| pkg/agent/testdata/AKSUbuntu2204+China/CustomData | Updates golden customData to match unified custom cloud init content. |
| pkg/agent/testdata/ACL/CustomData.inner | Updates golden ignition/customData payload for unified custom cloud init content. |
Comments suppressed due to low confidence (7)
staging/cse/windows/kubernetesfunc.ps1:1
- `Get-CACertificates` used to fail fast via `Set-ExitCode` on retrieval/parse errors, but now returns `$false` (and logs warnings) for a wide range of failure cases. Because call sites in the generated setup scripts invoke `Get-CACertificates -Location $Location` without checking the return value, this can silently proceed without required CA material and lead to harder-to-diagnose TLS failures later in provisioning. Consider restoring fatal behavior for “expected-to-install” scenarios (e.g., legacy mode, or rcv1p when opted-in), or have callers check the return value and invoke `Set-ExitCode` when it’s `$false` in those modes.
pkg/agent/variables.go:1
- This change removes the previous `cs.IsAKSCustomCloud()` guard and injects the custom cloud init script into `cloudInitData` unconditionally. That can increase customData size for all clusters (risking platform limits) and may introduce unintended side effects if any downstream template writes/executes this script outside custom cloud. Recommend reinstating the custom cloud guard (and only setting `initAKSCustomCloud` when `IsAKSCustomCloud()` is true), while still using the unified `initAKSCustomCloudScript` for all custom clouds.
staging/cse/windows/kubernetesfunc.ps1:1
- `$resourceFileName` is used directly to build a path under `C:\ca`. If the upstream response ever contains path separators (e.g., `..\foo` or nested paths), this can write outside the intended directory. Prefer sanitizing to a basename (e.g., using `Split-Path -Leaf` or `[IO.Path]::GetFileName($resourceFileName)`) before `Join-Path`, and consider rejecting names containing directory traversal characters.
staging/cse/windows/kubernetesfunc.ps1:1
- The new rcv1p operation-requests flow is non-trivial (multiple requests, JSON shape assumptions, per-item content downloads, and `$downloadedAny` aggregation), but the added Pester tests only cover legacy mode and the “throws returns false” path. Add tests that (1) exercise the rcv1p path end-to-end with mocked `Retry-Command` returning operation requests and cert bodies, and (2) verify behavior when operation requests are empty/invalid (ensuring the function returns `$false` and logs expected warnings).
pkg/agent/variables.go:1
- The PR description still contains placeholder text (`Fixes #` with no linked issue and no explanation of “what/why”). Please update the PR description to summarize the behavior change (unified bootstrap + Windows refresh task) and link the relevant issue or remove the placeholder.
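One of the suppressed comments above recommends rejecting file names that contain path separators or traversal sequences before joining them under the certificate directory. The PR's code is PowerShell; the Go sketch below only illustrates the same check in the repository's main language. `safeCertPath` and `errUnsafeName` are hypothetical names, and rejecting `:` is an extra assumption aimed at Windows drive and stream syntax.

```go
package main

import (
	"errors"
	"fmt"
	"path/filepath"
	"strings"
)

// errUnsafeName is returned for any name that is not a plain file name.
var errUnsafeName = errors.New("unsafe certificate file name")

// safeCertPath joins name under baseDir only when name is a plain file name.
// It rejects empty names, "." and "..", and any name containing '/', '\' or
// ':' so that an upstream response cannot steer the write outside baseDir.
func safeCertPath(baseDir, name string) (string, error) {
	if name == "" || name == "." || name == ".." || strings.ContainsAny(name, `/\:`) {
		return "", errUnsafeName
	}
	return filepath.Join(baseDir, name), nil
}

func main() {
	for _, name := range []string{"root.crt", `..\evil.crt`, "nested/ca.crt"} {
		p, err := safeCertPath("ca", name)
		if err != nil {
			fmt.Printf("%q rejected: %v\n", name, err)
			continue
		}
		fmt.Printf("%q accepted: %s\n", name, p)
	}
}
```

Rejecting suspicious names outright, rather than silently trimming them to a basename, keeps an unexpected upstream response visible in the logs.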
e94c465 to f20d5b8
f20d5b8 to b53f240
…ator
Probes all wireserver cert endpoints (isOptedInForRootCerts, operationrequestsroot, operationrequestsintermediate, legacy cacertificates) during validation and dumps CSE log lines related to certificate operations. Uses execScriptOnVMForScenario with explicit t.Logf to ensure output is always visible in test logs, not swallowed by execScriptOnVMForScenarioValidateExitCode.
Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Signed-off-by: Ramkumar Chinchani <rchincha.dev@gmail.com>
The wireserver operationrequestsroot and operationrequestsintermediate endpoints return certificates under the 'OperationsInfo' field, but the Windows PowerShell code was looking for 'OperationRequests' which doesn't exist in the response. This caused the null check to skip the entire cert download loop, leaving C:\ca empty despite wireserver returning valid certificate data. The Linux implementation avoids this by using grep to extract ResouceFileName values directly from the raw JSON, bypassing the parent field name entirely.
Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Signed-off-by: Ramkumar Chinchani <rchincha.dev@gmail.com>
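The field-name mismatch described in the commit above is easy to reproduce with any decoder that ignores unknown keys. The Go sketch below is only an illustration: the sample payload is invented for the example (only the `OperationsInfo` and `ResouceFileName` names come from the commit message), and the type and function names are hypothetical.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// sample is an invented payload; the real wireserver response may carry more fields.
const sample = `{"OperationsInfo":[{"ResouceFileName":"root.crt"},{"ResouceFileName":"intermediate.crt"}]}`

// certEntry keeps the field spelling quoted in the commit message.
type certEntry struct {
	ResouceFileName string `json:"ResouceFileName"`
}

// fileNames decodes the parent field the endpoints actually return.
func fileNames(payload []byte) ([]string, error) {
	var resp struct {
		OperationsInfo []certEntry `json:"OperationsInfo"`
	}
	if err := json.Unmarshal(payload, &resp); err != nil {
		return nil, err
	}
	names := make([]string, 0, len(resp.OperationsInfo))
	for _, e := range resp.OperationsInfo {
		names = append(names, e.ResouceFileName)
	}
	return names, nil
}

// fileNamesWrongField looks for the parent field that is not in the response.
// Decoding succeeds, but the slice stays empty, so a download loop would be skipped.
func fileNamesWrongField(payload []byte) ([]string, error) {
	var resp struct {
		OperationRequests []certEntry `json:"OperationRequests"`
	}
	if err := json.Unmarshal(payload, &resp); err != nil {
		return nil, err
	}
	names := make([]string, 0, len(resp.OperationRequests))
	for _, e := range resp.OperationRequests {
		names = append(names, e.ResouceFileName)
	}
	return names, nil
}

func main() {
	right, _ := fileNames([]byte(sample))
	wrong, _ := fileNamesWrongField([]byte(sample))
	fmt.Println(len(right), len(wrong)) // prints "2 0"
}
```

Because the wrong field produces no decode error, a check such as "fail when zero file names are returned" is one way to surface this class of bug early.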
Wraps each azcopy copy call with error checking and logging to diagnose why Windows CSE log uploads consistently return BlobNotFound. Also captures RunCommand stdout/stderr (InstanceView) which was previously not logged, so we can see azcopy output and any MSI auth failures.
Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Signed-off-by: Ramkumar Chinchani <rchincha.dev@gmail.com>
Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com> Signed-off-by: Ramkumar Chinchani <rchincha.dev@gmail.com>
Adds a ValidateFileHasContent check for a nonsense string that will never exist in the CSE log. If this test PASSES, it proves the ExitMissingError handler in exec.go:130 is silently swallowing SSH exit codes and all Windows validators are no-ops. If this test FAILS (expected), validators are working correctly. Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com> Signed-off-by: Ramkumar Chinchani <rchincha.dev@gmail.com>
The canary test proved validators are functional and our branch CSE zip is correctly delivered to VMs. Wireserver returns IsOptedInForRootCerts=true and the CSE log contains the expected RCV1P log lines. Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com> Signed-off-by: Ramkumar Chinchani <rchincha.dev@gmail.com>
Cert installation must succeed for the selected mode (legacy or rcv1p). Previously, failures after exhausting retries were silently swallowed with a warning, leaving the node without certificates. Now failures exit 1, matching the Windows -FailOnError behavior. Retries with backoff in make_request_with_retry still handle transient wireserver issues (rate limiting, temporary unavailability). Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com> Signed-off-by: Ramkumar Chinchani <rchincha.dev@gmail.com>
Reverts the following temporary diagnostic commits that served their purpose during RCV1P cert mode debugging and are no longer needed:
- 807b5a4 (wireserver endpoint diagnostics in validator)
  Why: Added to debug cert download failures. The root cause was a JSON field name mismatch (OperationRequests vs OperationsInfo), now fixed. Diagnostic probing adds noise to validator output.
- 9f6a902 (azcopy error logging in Windows log collection)
  Why: Added to debug empty CSE log uploads (BlobNotFound). Root cause was ADO job timeout (90m) racing with go test timeout (90m), fixed on main by 54aa84a (reduced go test timeout to 80m).
- d083fbe (verbose test output with -v flag)
  Why: Added so t.Logf output would appear in pipeline logs for diagnostics. No longer needed; increases log noise for all tests.
- 45041cb (always collect Windows CSE logs)
  Why: Removed s.T.Failed() guard to collect logs on success too. Root cause of missing logs was the ADO/go-test timeout race, not the collection logic. Restored failure-only collection.
- fdc6962 + 1196773 (canary check, already net-zero)
  Why: Canary proved validators work correctly. Already removed by the follow-up commit; these two commits cancel each other.
- 0bc8f2e (poll wireserver IsOptedInForRootCerts retry loop)
  Why: Experimental polling for FC goal-state propagation. Tags are now set at VMSS creation time, making polling unnecessary. Already reverted by later commits during development.
Kept (not reverted):
- 76edb18: Azure CNI cluster for Windows RCV1P tests (real fix for NBC/cluster type mismatch causing IP exhaustion)
- a891055: Branch-built CSE zip override (required until RCV1P code ships in a published CSE package)
Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Signed-off-by: Ramkumar Chinchani <rchincha.dev@gmail.com>
Wireserver unreachable after retries is now fatal (return 2 + exit 1) instead of silently skipping cert installation. If the subscription is opted in for hardened root certs but we silently fall back to the distro's default trust store, we leave a security hole — the node would trust CAs the customer explicitly intended to replace. Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com> Signed-off-by: Ramkumar Chinchani <rchincha.dev@gmail.com>
…cription set
When RCV1P_SUBSCRIPTION_ID is set, Windows RCV1P positive tests set Scenario.AzureClient/SubscriptionID to the RCV1P subscription, but rcv1pWindowsCluster() always returned ClusterAzureNetwork (default subscription). This subscription mismatch would cause VMSS creation to 404 in the RCV1P subscription's node resource group.
Fix:
- Add ClusterRCV1PAzureNetwork in cache.go (Azure CNI cluster using RCV1PClusterInfra)
- Branch rcv1pWindowsCluster() on hasExplicitRCV1PSubscription(), matching the pattern used by rcv1pCluster() for Linux
- Fix Test_RCV1P_Windows_NotOptedIn to use ClusterRCV1PAzureNetwork instead of ClusterRCV1PKubenet (Windows needs Azure CNI)
Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Signed-off-by: Ramkumar Chinchani <rchincha.dev@gmail.com>
On custom clouds (AGC, Delos) where an older version of this script already installed a ca-refresh cron entry without the location argument, the idempotency grep would match the old entry and skip adding the new one. The old cron entry runs ca-refresh with an empty location, causing get_cert_endpoint_mode to default to rcv1p instead of legacy for ussec/usnat environments. Fix: always remove any existing ca-refresh entry for this script and re-add it with the explicit location argument, ensuring upgraded nodes get the correct endpoint mode on periodic refresh. Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com> Signed-off-by: Ramkumar Chinchani <rchincha.dev@gmail.com>
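The remove-then-re-add approach described in this commit can be sketched as a pure text filter over the crontab. This is only an illustration: the function name rewrite_ca_refresh_cron, the schedule "0 19 * * *", and the script path used in the usage note are assumptions, not taken from the PR.

```shell
#!/usr/bin/env bash
# Sketch only. rewrite_ca_refresh_cron is a hypothetical helper name.
#
# rewrite_ca_refresh_cron SCRIPT_PATH LOCATION
# Reads an existing crontab on stdin, drops every line that runs
# "SCRIPT_PATH ca-refresh" (with or without a location argument),
# and appends exactly one fresh entry that carries the location.
rewrite_ca_refresh_cron() {
    local script_path="$1" location="$2"

    # grep -v exits 1 when no lines survive the filter; that is not an error here.
    grep -vF "${script_path} ca-refresh" || true

    # The schedule below is an assumed placeholder, not the PR's real schedule.
    printf '0 19 * * * %s ca-refresh %s\n' "$script_path" "$location"
}
```

A caller could apply it with something like `crontab -l 2>/dev/null | rewrite_ca_refresh_cron /path/to/init-script.sh "$LOCATION" | crontab -` (path hypothetical). Because old entries are always filtered out first, running it repeatedly leaves a single entry, and a stale entry without the location argument cannot survive an upgrade.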
All wireserver Retry-Command calls in kubernetesfunc.ps1 increased from 5 to 10 retries, matching Linux make_request_with_retry which uses 10 retries with exponential backoff. Under rate-limiting or transient wireserver unavailability, 5 retries (50s) could exhaust before the endpoint recovers.
Added comments explaining:
- Retry count parity with Linux
- Security rationale: wireserver unreachable with -FailOnError is fatal because silently falling back to the OS default trust store would be a security hole if the customer intended hardened certs
Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Signed-off-by: Ramkumar Chinchani <rchincha.dev@gmail.com>
Signed-off-by: Ramkumar Chinchani <rchincha.dev@gmail.com>
Windows baseTemplateWindows() configures NBC with NetworkPlugin=azure and NetworkPluginMode=overlay. Using a kubenet cluster causes azure-vnet plugin IPAM failures on the node. Switch all Windows RCV1P tests to use ClusterRCV1PAzureNetwork which creates an Azure CNI overlay cluster in the RCV1P subscription. Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com> Signed-off-by: Ramkumar Chinchani <rchincha.dev@gmail.com>
The following commits are superseded by the permanent fix in c71b1eb which correctly assigns ClusterRCV1PAzureNetwork to Windows RCV1P tests and keeps ClusterRCV1PKubenet for Linux RCV1P tests:
- 286c711 REVERT ME: use dedicated kubenet cluster for RCV1P tests
- 4de7fe5 REVERT ME: use Azure CNI cluster for Windows RCV1P tests
Both are no-ops against the current state and can be safely squashed out during final interactive rebase before merge.
Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Signed-off-by: Ramkumar Chinchani <rchincha.dev@gmail.com>
Reverts:
- 5c2ed65 (canary check that guarantees test failure)
- 07d1c44 (5-minute wireserver polling loop - provisioning regression)
The canary ValidateFileHasContent for a nonexistent string causes guaranteed test failures. The wireserver polling adds up to 5 minutes of sleep to every Linux RCV1P node provisioning. Remaining diagnostic commits (wireserver endpoint probing, azcopy logging, verbose output) are kept for initial rollout observability.
Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Signed-off-by: Ramkumar Chinchani <rchincha.dev@gmail.com>
When RCV1P_SUBSCRIPTION_ID is not explicitly set, the skip logic now checks whether the E2E subscription (E2E_SUBSCRIPTION_ID) has the PlatformSettingsOverride feature flag registered. If it does, the RCV1P tests run automatically using the E2E subscription. This enables MSFT tenant pipelines (where the E2E subscription is already enrolled) to run RCV1P tests without a separate variable. Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com> Signed-off-by: Ramkumar Chinchani <rchincha.dev@gmail.com>
On subscriptions with PlatformSettingsOverride registered, the platform auto-injects the opt-in tag on ALL VMSSes, making the 'not opted in' negative test scenario impossible. Skip these tests when the RCV1P subscription was auto-detected from the E2E subscription. Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com> Signed-off-by: Ramkumar Chinchani <rchincha.dev@gmail.com>
Replace context.Background() with the caller's context so the VM instance view fetch respects test/scenario timeouts instead of potentially hanging indefinitely. Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com> Signed-off-by: Ramkumar Chinchani <rchincha.dev@gmail.com>
The windows-2025 image does not support TrustedLaunch, only the Gen2 variant does. This matches the pattern on main where Test_Windows2025 uses EmptyVMConfigMutator without TrustedLaunch. Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com> Signed-off-by: Ramkumar Chinchani <rchincha.dev@gmail.com>
…root_certs The function documented return code 2 for 'wireserver unreachable' and the caller correctly checked for it, but the implementation returned 1 (not opted in) on request failure. This silently skipped cert installation on wireserver outages — a security hole if the subscription is enrolled for hardened certs. Now returns 2 on failure so the caller treats it as fatal, matching the documented contract. Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com> Signed-off-by: Ramkumar Chinchani <rchincha.dev@gmail.com>
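The three-way return contract described in this commit (opted in, not opted in, wireserver unreachable) might look roughly like the sketch below. The function names classify_opt_in_response and handle_opt_in are hypothetical, and the grep pattern is used only to keep the sketch dependency-free; another commit in this PR states that the real script parses the response with jq.

```shell
#!/usr/bin/env bash
# Sketch only; both function names are hypothetical.
#
# classify_opt_in_response REQUEST_RC BODY
#   returns 0 = opted in, 1 = not opted in, 2 = wireserver unreachable
classify_opt_in_response() {
    local request_rc="$1" body="$2"

    # A failed request must not be confused with "not opted in".
    if [ "$request_rc" -ne 0 ]; then
        return 2
    fi

    # Match the JSON form {"IsOptedInForRootCerts":true}, tolerating whitespace.
    if printf '%s' "$body" | grep -Eq '"IsOptedInForRootCerts"[[:space:]]*:[[:space:]]*true'; then
        return 0
    fi
    return 1
}

# Caller side: only "not opted in" is allowed to skip installation.
handle_opt_in() {
    local rc=0
    classify_opt_in_response "$@" || rc=$?
    case "$rc" in
        0) echo "install" ;;
        1) echo "skip" ;;
        *) echo "fatal"; return 1 ;;
    esac
}
```

Keeping code 2 distinct from code 1 is what lets the caller fail closed on an outage instead of silently leaving the default trust store in place.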
When IsOptedInForRootCerts is true but no certificates are downloaded, Get-CACertificates only logged a warning and returned $False. Because the caller (BasePrep) doesn't check the return value, provisioning continued without the required CA set. Now throws when -FailOnError is set and no certs were downloaded, matching the fail-closed contract. Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com> Signed-off-by: Ramkumar Chinchani <rchincha.dev@gmail.com>
The published CSE package (aks-windows-cse-scripts-current.zip) does not contain the RCV1P code (Get-CACertificates -Location, -FailOnError, IsOptedInForRootCerts, Register-CACertificatesRefreshTask). Without this override, Windows RCV1P E2E tests pass vacuously using the old code path. This builds a CSE zip from staging/cse/windows/ at test time, uploads it to blob storage with a SAS URL, and overrides CseScriptsPackageURL so the VMs download the branch's CSE scripts. TODO(rcv1p): remove the branch CSE zip override and rcv1pWindowsCSEMutator once the RCV1P code ships in a published CSE package. Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com> Signed-off-by: Ramkumar Chinchani <rchincha.dev@gmail.com>
The wireserver returns JSON like {"IsOptedInForRootCerts":true} but
the script was using grep for IsOptedInForRootCerts=true (equals
sign), which never matches the JSON colon format. Use jq for proper
JSON parsing instead.
This fix was previously applied but accidentally dropped during a
rebase squash/reorder.
Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Signed-off-by: Ramkumar Chinchani <rchincha.dev@gmail.com>
Adapt to upstream signature change: BootstrapConfigMutator now takes (*Cluster, *NodeBootstrappingConfiguration) instead of just (*NodeBootstrappingConfiguration). Also thread infra parameter through setupPrivateDNSForAPIServer to match getClusterVNet signature. Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com> Signed-off-by: Ramkumar Chinchani <rchincha.dev@gmail.com>
Track the number of successfully saved certificates and return non-zero if all individual cert content fetches failed despite the operation endpoint returning filenames. This closes a gap where retrieve_rcv1p_certs could report success with zero certs actually downloaded. Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com> Signed-off-by: Ramkumar Chinchani <rchincha.dev@gmail.com>
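A minimal sketch of the saved-certificate counter this commit describes, assuming a caller-supplied fetch function. The name save_certs and its argument order are hypothetical; the PR's retrieve_rcv1p_certs may be structured differently.

```shell
#!/usr/bin/env bash
# Sketch only. save_certs is a hypothetical helper.
#
# save_certs FETCH_FN DEST_DIR FILENAME...
# Calls "FETCH_FN FILENAME" for each name, writes non-empty output to
# DEST_DIR/FILENAME, and returns non-zero when nothing was saved.
save_certs() {
    local fetch_fn="$1" dest_dir="$2"
    shift 2
    local saved=0 name tmp

    for name in "$@"; do
        tmp="${dest_dir}/${name}.tmp"
        # Count a cert only if the fetch succeeded AND produced content.
        if "$fetch_fn" "$name" > "$tmp" && [ -s "$tmp" ]; then
            mv "$tmp" "${dest_dir}/${name}"
            saved=$((saved + 1))
        else
            rm -f "$tmp"
            echo "warning: failed to fetch ${name}" >&2
        fi
    done

    echo "saved ${saved} of $# certificates" >&2
    # Zero saved certs is a failure, even if the filename list was non-empty.
    [ "$saved" -gt 0 ]
}
```

Partial success still returns zero in this sketch; whether a partial download should also be treated as an error is a policy choice the commit message does not spell out.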
…keys These functions relied on bash dynamic scoping to access the caller's local repodepot_endpoint variable. Pass it as an explicit parameter to follow the repo's shell script guidelines and avoid fragile implicit variable dependencies. Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com> Signed-off-by: Ramkumar Chinchani <rchincha.dev@gmail.com>
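The difference between relying on dynamic scoping and passing the endpoint explicitly can be illustrated as follows. The function names and the example URL are hypothetical; only the variable name repodepot_endpoint comes from the commit message.

```shell
#!/usr/bin/env bash
# Sketch only; add_key_implicit, add_key_explicit and init_repos are hypothetical names.

# Fragile: works only when some caller up the stack happens to have a
# variable named repodepot_endpoint in scope (bash dynamic scoping).
add_key_implicit() {
    echo "fetching key from ${repodepot_endpoint:-<unset>}/keys"
}

# Explicit: the endpoint is a declared input of the function.
add_key_explicit() {
    local repodepot_endpoint="$1"
    echo "fetching key from ${repodepot_endpoint}/keys"
}

init_repos() {
    local repodepot_endpoint="https://repodepot.example.invalid"
    add_key_implicit                        # sees the caller's local by accident
    add_key_explicit "$repodepot_endpoint"  # same result, visible dependency
}
```

Called from init_repos both variants print the same line, but add_key_implicit invoked anywhere else prints `<unset>/keys`, which is the kind of hidden dependency the commit removes.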
… validator Remove the always-on diagnostic block that probed wireserver endpoints and dumped CSE logs on every Windows RCV1P test run. This bloated test logs, added latency, and could leak wireserver response content into CI. Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com> Signed-off-by: Ramkumar Chinchani <rchincha.dev@gmail.com>
…ption createPrivateZone, waitForPrivateZone, createPrivateDNSLink, and the RecordSet call in setupPrivateDNSForAPIServer were hardcoded to config.Azure (the default E2E subscription). When running RCV1P tests in a separate subscription, the MC_ resource group only exists in the RCV1P subscription, causing ResourceGroupNotFound errors. Add an azure *config.AzureClient parameter to these functions so the caller can pass the correct subscription client. setupPrivateDNSForAPIServer now uses infra.Azure; addPrivateEndpointForACR continues using config.Azure. Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com> Signed-off-by: Ramkumar Chinchani <rchincha.dev@gmail.com>
…V1PSubscriptionID Signed-off-by: Ramkumar Chinchani <rchincha.dev@gmail.com>
What this PR does / why we need it
This PR implements RCV1P (Robust Certificate Validation for 1P) — the next-generation mechanism for distributing Azure root CA certificates to AKS nodes. Instead of hardcoding certificate bundles, RCV1P queries the Azure wireserver at provisioning time to download and install the latest root certificates into the OS trust store.
Reference: https://eng.ms/docs/products/onecert-certificates-key-vault-and-dsms/onecert-customer-guide/autorotationandecr/rcv1ptsg
Summary of Changes
1. Linux: Unified cert bootstrap flow (init-aks-custom-cloud.sh)
- Removed init-aks-custom-cloud-mariner.sh, init-aks-custom-cloud-operation-requests-mariner.sh, and init-aks-custom-cloud-operation-requests.sh. All cert logic now flows through a single init-aks-custom-cloud.sh that detects the distro (Ubuntu, Mariner, AzureLinux, Flatcar, ACL) at runtime.
- Added init-aks-custom-cloud-repos.sh to keep base customData size small for non-custom-cloud scenarios (critical for Flatcar/ACL which have tight size limits).
- Two cert endpoint modes: legacy (ussec/usnat regions) and rcv1p (all other regions), selected by cloud location at runtime.
2. Windows: CA cert refresh task and rcv1p support (kubernetesfunc.ps1)
- Get-CACertificates with -Location parameter: Determines cert endpoint mode from location, uses legacy endpoint for ussec/usnat, rcv1p for all others.
- Register-CACertificatesRefreshTask: Registers a daily scheduled task to refresh CA certificates, with backward compatibility for older VHDs that don't accept -Location.
- Should-InstallCACertificatesRefreshTask: Gates refresh task registration on wireserver opt-in status.
3. E2E tests (e2e/scenario_rcv1p_test.go, e2e/scenario_rcv1p_win_test.go)
- Validation covers C:\ca, Windows certificate store import, and scheduled task registration.
- Test_RCV1P_NotOptedIn verifies that omitting the VM opt-in tag correctly prevents cert installation.
- .pipelines/e2e-rcv1p.yaml runs daily at 3am PST with tag filter rcv1pcertmode=true (not yet enabled).
4. E2E infrastructure: multi-subscription and VM instance tagging
- Support for a separate subscription (RCV1P_SUBSCRIPTION_ID) with the Microsoft.Compute/PlatformSettingsOverride feature flag. Added SubscriptionID field to scenarios and GetAzure()/GetSubscriptionID() helpers.
- The opt-in tag (platformsettings.host_environment.service.platform_optedin_for_rootcerts=true) is set on the VMSS at creation time via a VMConfigMutator. VMSS-level tags inherit to VM instances automatically.
Design Decisions
1. Cert endpoint mode is determined by cloud location, not a flag
Decision: ussec*/usnat* → legacy mode, everything else → rcv1p mode. This is determined at runtime from the node's Azure location.
Why: Avoids requiring a new API contract field. The location-based approach lets us roll out rcv1p incrementally: ussec/usnat stay on the legacy endpoint that works today, while all other regions use the new rcv1p endpoint with opt-in gating.
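The location-to-mode mapping in this decision can be sketched in a few lines of shell. The function name get_cert_endpoint_mode appears in the commit messages of this PR, but the body below is an assumption written for illustration, including the lowercase normalization.

```shell
#!/usr/bin/env bash
# Sketch only: the body is an assumption, not the PR's implementation.
#
# get_cert_endpoint_mode LOCATION
# Prints "legacy" for ussec*/usnat* locations and "rcv1p" for everything
# else, including an empty location.
get_cert_endpoint_mode() {
    local location="${1:-}"

    # Normalize case so "USSecEast" and "usseceast" are treated alike (assumed).
    location="$(printf '%s' "$location" | tr '[:upper:]' '[:lower:]')"

    case "$location" in
        ussec*|usnat*) echo "legacy" ;;
        *)             echo "rcv1p" ;;
    esac
}
```

Note that an empty location falls through to rcv1p, which is why callers such as the periodic refresh entry need to pass the location explicitly in ussec/usnat environments.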
2. Two-layer access control for rcv1p
Decision: Both conditions must be met for cert installation:
- The subscription feature flag (Microsoft.Compute/PlatformSettingsOverride) enables the wireserver endpoint
- The VM tag (platformsettings.host_environment.service.platform_optedin_for_rootcerts=true) grants per-VM access
Why: Defense in depth. The subscription flag is a coarse gate; the VM tag provides per-node opt-in control. Without the tag, wireserver returns IsOptedInForRootCerts=false.
3. VM opt-in tag is set at VMSS creation time
Decision: The opt-in tag (platformsettings.host_environment.service.platform_optedin_for_rootcerts=true) is set on the VMSS at creation time and inherits to all VM instances automatically.
Why: VMSS-level tags propagate to VM instances, and wireserver reads the tag from the VM instance to determine opt-in status. In E2E tests, the positive tests set the tag via a VMConfigMutator at VMSS creation, while the negative test (Test_RCV1P_NotOptedIn) simply omits the tag to verify wireserver returns IsOptedInForRootCerts=false.
4. Get-CACertificates moved outside IsAKSCustomCloud guard (Windows)
Decision: Get-CACertificates -Location $Location -FailOnError now runs for all clouds, not just custom clouds.
Why: RCV1P applies to all clouds. The function itself handles the location-based mode selection internally and gracefully skips cert installation when wireserver returns IsOptedInForRootCerts=false (which is the case on public cloud without the feature flag).
5. Wireserver failures are fatal after retries
Decision: If wireserver cert endpoints fail after exhausting retries, provisioning fails (exit 1 on Linux, throw on Windows with -FailOnError).
Why: Cert installation is required for the selected mode. Silently continuing without certificates would leave the node in an inconsistent state. Retries with backoff handle transient wireserver issues (rate limiting, temporary unavailability).
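The retry-then-fail-closed behavior on the Linux side could look roughly like this. The helper name retry_with_backoff and its parameters are hypothetical; the PR's own helper is called make_request_with_retry, and its exact signature and delays are not shown here.

```shell
#!/usr/bin/env bash
# Sketch only. retry_with_backoff is a hypothetical stand-in for the
# PR's make_request_with_retry.
#
# retry_with_backoff MAX_ATTEMPTS BASE_DELAY_SECONDS COMMAND [ARGS...]
# Runs COMMAND until it succeeds or MAX_ATTEMPTS is reached, doubling
# the delay after each failure. Returns 1 when all attempts fail.
retry_with_backoff() {
    local max_attempts="$1" delay="$2"
    shift 2
    local attempt=1

    while :; do
        if "$@"; then
            return 0
        fi
        if [ "$attempt" -ge "$max_attempts" ]; then
            echo "giving up after ${attempt} attempts: $*" >&2
            return 1
        fi
        sleep "$delay"
        delay=$((delay * 2))      # exponential backoff
        attempt=$((attempt + 1))
    done
}
```

A provisioning script would then treat exhaustion as fatal, for example `retry_with_backoff 10 1 fetch_certs || exit 1`, where fetch_certs is a placeholder for the real download step.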
6. Backward compatibility for Windows VHD/CSE version skew
Decision: kuberneteswindowssetup.ps1 guards Register-CACertificatesRefreshTask with Get-Command checks before calling it.
Why: Windows VHD and CSE release independently. Newer CSE must not crash on older VHDs that don't have these functions. The guard falls back gracefully.
Testing Evidence
MSFT tenant (default E2E subscription)
Linux (Build 158446017):
- IsOptedInForRootCerts check works (skips on public cloud as expected)
Windows (Build 158446024):
- Get-CACertificates -Location correctly selects rcv1p mode
- Should-InstallCACertificatesRefreshTask returns $false on public cloud (correct)
TME tenant (RCV1P_SUBSCRIPTION_ID set in pipeline, with PlatformSettingsOverride feature flag)
Linux: Validated end-to-end. Wireserver returns IsOptedInForRootCerts=true, certificates downloaded and installed into OS trust store, refresh schedule registered. Passed across Ubuntu 2204, Ubuntu 2404, AzureLinux V3, Flatcar, ACL.
Windows (Build 161633049):
- IsOptedInForRootCerts=true, certificates downloaded to C:\ca, scheduled task aks-ca-certs-refresh-task registered
- A failure (windows-2022-containerd job) was a pre-existing Test_Windows2022_VHDCaching issue unrelated to RCV1P
- Windows used the wrong field name (OperationRequests instead of OperationsInfo) when parsing wireserver responses; this was the root cause of empty cert downloads. Fixed in commit b6cd4e4f68.
Files Changed (31 files, +1979 / -1218)
- init-aks-custom-cloud.sh, init-aks-custom-cloud-repos.sh (new), 3 removed
- kubernetesfunc.ps1, kuberneteswindowssetup.ps1
- kubernetesfunc.tests.ps1 (new)
- scenario_rcv1p_test.go (new), scenario_rcv1p_win_test.go (new)
- vmss.go, types.go, validators.go, cluster.go, config/e2e-rcv1p.yaml (new)
- baker.go, const.go, variables.go
PR File Breakdown: Functionality vs Tests
Functionality (1,859 lines, 51%)
- parts/linux/cloud-init/artifacts/init-aks-custom-cloud.sh
- parts/linux/cloud-init/artifacts/init-aks-custom-cloud-repos.sh
- parts/linux/cloud-init/artifacts/init-aks-custom-cloud-operation-requests.sh
- parts/linux/cloud-init/artifacts/init-aks-custom-cloud-operation-requests-mariner.sh
- staging/cse/windows/kubernetesfunc.ps1
- parts/linux/cloud-init/artifacts/init-aks-custom-cloud-mariner.sh
- pkg/agent/variables.go
- parts/windows/kuberneteswindowssetup.ps1
- pkg/agent/const.go
- parts/linux/cloud-init/nodecustomdata.yml
- aks-node-controller/parser/helper.go
- parts/linux/cloud-init/artifacts/cse_cmd.sh
- aks-node-controller/parser/templates/cse_cmd.sh.gtpl
- pkg/agent/baker.go
Tests / E2E Infra (1,795 lines, 49%)
- e2e/scenario_rcv1p_test.go
- staging/cse/windows/kubernetesfunc.tests.ps1
- e2e/scenario_rcv1p_win_test.go
- e2e/validators.go
- e2e/cluster.go
- e2e/vmss.go
- e2e/config/azure.go
- e2e/test_helpers.go
- e2e/types.go
- spec/parts/linux/cloud-init/artifacts/init_aks_custom_cloud_spec.sh
- e2e/cache.go
- e2e/aks_model.go
- e2e/config/config.go
- .pipelines/e2e-rcv1p.yaml
- e2e/kube.go
- .pipelines/scripts/e2e_run.sh
- .pipelines/templates/e2e-template.yaml
Summary