Skip to content

fix(windows): register k8s-restart-job in NodePrep to avoid PIS bootstrap race#8535

Open
r2k1 wants to merge 2 commits into
mainfrom
akhantimirov/fix-windows-pis-tls-bootstrap-race
Open

fix(windows): register k8s-restart-job in NodePrep to avoid PIS bootstrap race#8535
r2k1 wants to merge 2 commits into
mainfrom
akhantimirov/fix-windows-pis-tls-bootstrap-race

Conversation

@r2k1
Copy link
Copy Markdown
Contributor

@r2k1 r2k1 commented May 19, 2026

What this PR does / why we need it:

Move Register-NodeResetScriptTask from BasePrep to NodePrep (after the temporary kubeconfig removal) on Windows. This fixes a silent kubelet TLS bootstrap failure that hits Windows agentpools attached to a Prepared Image Specification (PIS) / pre-baked VHD path.

Root cause

Windows CSE runs in two phases for PIS / baked VHDs:

  1. BasePrep (bake time, PreProvisionOnly=true):

    • Write-KubeConfig writes a temporary c:\k\config with client-certificate-data: <clusterClientCertificate> embedded — the cluster-shared "nodeclient" cert produced by the AKS RP's certificates_goal_resolver.go (Subject: CN=nodeclient, O=system:nodes).
    • Register-NodeResetScriptTask registers k8s-restart-job with an -AtStartup trigger.
    • VHD is captured / staged for the agentpool.
  2. NodePrep (provision time):

    • Eventually removes c:\k\config (line ~579) if TLS bootstrap is enabled.
    • Calls Start-ScheduledTask k8s-restart-job.

Because k8s-restart-job is -AtStartup, on the FIRST boot from the baked VHD the task fires before NodePrep removes the temporary kubeconfig. windowsnodereset.ps1 calls Start-Service kubelet. Kubelet reads c:\k\config, extracts the embedded nodeclient certificate into c:\k\pki\kubelet-client-<ts>.pem, never performs the proper bootstrap-token CSR exchange. Subsequent API server calls fail because CN=nodeclient doesn't satisfy the NodeAuthorizer:

"Failed while requesting a signed certificate from the control plane"
err="cannot create certificate signing request:
     certificatesigningrequests.certificates.k8s.io is forbidden:
     User \"nodeclient\" cannot create resource \"certificatesigningrequests\"..."

The pool reports Succeeded (CSE / extension level), so the failure is silent — the node never registers with the cluster.

Reproduction (verified)

On the AKS New Zealand Sandbox subscription, cluster aks-pis-test in australiaeast, K8s 1.34.7, Windows Server 2022:

Pool PIS attached Cert subject K8s join
pislinux yes (Linux PIS, NodeProvisionTime) CN=system:node:aks-pislinux-... ✅ Ready
piswin yes (Windows PIS, NodeProvisionTime) CN=nodeclient ❌ Never registered
winctl no (same cluster, same SKU) CN=system:node:akswinctl000000 ✅ Ready in 30s

Timeline on the broken pool (from C:\AzureData\CustomDataSetupScript.log + C:\k\windowsnodereset.log + C:\k\kubelet.err.log):

01:43-01:44   BasePrep (PreProvisionOnly=true), Register-NodeResetScriptTask registered
01:44:04      base_prep.complete written, NodePrep skipped
~01:57:00     VM reboots from baked VHD
01:57:07      windowsnodereset.ps1 starts (k8s-restart-job AtStartup fires)
01:57:11      CSE invocation 2 begins
01:57:20      windowsnodereset.ps1 calls Start-Service kubelet
01:57:30      kubelet writes kubelet-client-<ts>.pem with the nodeclient cert
01:57:33      NodePrep removes c:\k\config (too late)
01:57:44      NodePrep completes

Fix

Move the task registration out of BasePrep and into NodePrep, after the temporary kubeconfig has been removed. The scheduled task does not exist on a baked VHD, so the first boot does not race with NodePrep and kubelet only starts via the explicit Start-ScheduledTask call at the end of NodePrep — at which point the temporary kubeconfig with the embedded cluster client cert is gone and kubelet correctly falls back to the bootstrap-token path in c:\k\bootstrap-config.

Risk / blast radius

  • Linux and non-PIS Windows pools are unaffected (single-phase CSE; the task was never registered before the cleanup ran in practice).
  • Subsequent boots after the node has been properly provisioned behave identically to today: the task fires -AtStartup, windowsnodereset.ps1 restarts kubelet/kubeproxy as before.

Which issue(s) this PR fixes:

Fixes #

Copy link
Copy Markdown
Contributor

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

This PR moves the registration of the k8s-restart-job scheduled task from the BasePrep phase to the NodePrep phase in the Windows CSE script. The task uses an -AtStartup trigger, and registering it during BasePrep (which runs at VHD bake time in the Prepared Image Specification / two-phase flow) caused it to fire on the first boot before NodePrep could remove the temporary kubeconfig embedding the shared nodeclient certificate. As a result, kubelet would silently skip the proper TLS bootstrap-token CSR exchange and never register with the API server. By registering the task only after the temporary kubeconfig is removed, the race is eliminated for PIS-baked Windows VHDs while preserving existing behavior for non-PIS pools (which run both phases in a single CSE invocation).

Changes:

  • Remove Register-NodeResetScriptTask from BasePrep and replace it with an explanatory comment.
  • Register Register-NodeResetScriptTask in NodePrep immediately after the temporary c:\k\config cleanup block.

…trap race

When Windows VHDs are baked or staged via Prepared Image Specification
(PIS), CSE runs in two phases: BasePrep at bake time and NodePrep at
node provisioning time. Register-NodeResetScriptTask was registered in
BasePrep with an -AtStartup trigger, so on the first boot from a baked
VHD it fired before NodePrep got a chance to remove the temporary
kubeconfig at `c:\k\config`. That kubeconfig embeds the cluster-shared
"nodeclient" client certificate. As a result, kubelet started up, read
the embedded certificate, persisted it under `c:\k\pki`, and skipped
the standard token-based TLS bootstrap. With `CN=nodeclient` (not
`CN=system:node:<hostname>`) the API server's NodeAuthorizer rejected
every kubelet request — "User \"nodeclient\" cannot create
certificatesigningrequests" — and the node never registered. The
agentpool ARM operation succeeded, so the failure was silent.

Repro / verification (NZ Sandbox sub, Windows2022, K8s 1.34.7):
  - Pool WITH PIS:    cert subject CN=nodeclient,O=system:nodes  → never joined
  - Pool WITHOUT PIS: cert subject CN=system:node:<host>         → Ready in ~30s

Move Register-NodeResetScriptTask out of BasePrep and into NodePrep,
after the temporary kubeconfig removal, so the scheduled task does not
exist on a baked VHD and only fires on subsequent boots after the node
has been properly provisioned.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
@r2k1 r2k1 force-pushed the akhantimirov/fix-windows-pis-tls-bootstrap-race branch from 69eac1e to 0a5b677 Compare May 19, 2026 02:15
# Register AFTER temp kubeconfig removal: the -AtStartup trigger would
# otherwise race PIS-baked VHD first boot and bring kubelet up with the
# embedded "nodeclient" cert instead of doing TLS bootstrap.
Register-NodeResetScriptTask
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can we have a test for this - that the registration happens in nodeprop not baseprep?

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I added a validator. But I don't really like it.

I don't like testing "X doesn't supposed to happen". They are infinite amount of things that doesn't suppose to happen ...

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Replaced it with a proper e2e. BootstrapTLS enabled vs disabled

Copilot AI review requested due to automatic review settings May 19, 2026 03:05
Copy link
Copy Markdown
Contributor

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

Copilot reviewed 3 out of 3 changed files in this pull request and generated no new comments.

Existing Test_Windows2022_VHDCaching runs with secure TLS bootstrap
enabled, which masks bugs in the legacy bootstrap-token path on Windows
PIS / two-stage CSE provisioning. This scenario forces kubelet onto the
legacy path so stage 2's WaitUntilNodeReady catches regressions there.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
@r2k1 r2k1 force-pushed the akhantimirov/fix-windows-pis-tls-bootstrap-race branch from 439330a to 0dd4090 Compare May 19, 2026 03:11
Comment thread e2e/scenario_win_test.go
// to use the legacy bootstrap-token path. Catches regressions in the two-stage
// CSE flow that only surface when no secure-tls-bootstrap client is around to
// overwrite the temporary kubeconfig.
func Test_Windows2022_VHDCaching_LegacyTLSBootstrap(t *testing.T) {
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

we already have the bootstrap token fallback scenarios which are designed to test a similar path - could we just add another one for the VHDCaching case instead?

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

VHDCaching: true

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

4 participants