Support Karpenter autoscaling in AKS private clusters #1041

Open
xinWeiWei24 wants to merge 12 commits into main from xinwei/jumpbox_karpenter

Conversation

@xinWeiWei24
Collaborator

When AKS private clusters are enabled, our agents cannot access the cluster due to network isolation. This PR introduces a new topology for running Karpenter in private clusters and updates the related engine to support autoscaling benchmarking in private environments.

xinWeiWei24 force-pushed the xinwei/jumpbox_karpenter branch from 82a3d1c to fea6a3d on February 3, 2026 01:29
xinWeiWei24 marked this pull request as ready for review on February 3, 2026 01:32
Copilot AI review requested due to automatic review settings on February 3, 2026 01:32
xinWeiWei24 marked this pull request as draft on February 3, 2026 01:32
Contributor

Copilot AI left a comment

Pull request overview

This PR adds support for running Karpenter-based autoscale benchmarks against AKS private clusters by introducing a jumpbox-based topology and wiring clusterloader2 to execute via that jumpbox.

Changes:

  • Add a new karpenter-private-cluster topology that discovers a jumpbox VM, configures kubeconfig on it, and applies Karpenter nodepool configuration remotely.
  • Introduce a reusable SSH command template and a jumpbox-specific clusterloader2 autoscale execution flow, including remote result collection and cleanup behavior split between public and private clusters.
  • Extend the nap-complex Azure Terraform inputs with a jumpbox public IP, subnet, NIC association, and VM definition, and adjust NSG rule validation to allow priority 100 to support the new SSH NSG rule.
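
The jumpbox execution flow in the second bullet (prepare a remote workspace, copy an archive over, run remotely, pull results back) can be sketched roughly as below. This is an illustration only, not the PR's actual template: the function names, the `REMOTE_WORKSPACE` path, the `BatchMode` option, and the example user, host, and file names are all assumptions. The sketch only builds the `ssh` and `scp` argument lists; it does not run them.

```python
import os
import shlex

# Hypothetical values for illustration; the PR's real paths are not shown here.
REMOTE_WORKSPACE = "/tmp/clusterloader2-run"
SSH_OPTS = ["-o", "BatchMode=yes"]  # fail instead of prompting for a password


def ssh_command(user, host, remote_cmd):
    """Build an ssh argv that runs remote_cmd on the jumpbox."""
    return ["ssh", *SSH_OPTS, f"{user}@{host}", remote_cmd]


def upload_command(user, host, local_path):
    """Build an scp argv that copies local_path into the remote workspace."""
    return ["scp", *SSH_OPTS, local_path, f"{user}@{host}:{REMOTE_WORKSPACE}/"]


def download_command(user, host, remote_path, local_dir):
    """Build an scp argv that recursively copies remote_path back to the agent."""
    return ["scp", *SSH_OPTS, "-r", f"{user}@{host}:{remote_path}", local_dir]


def jumpbox_plan(user, host, archive, run_cmd, remote_results, local_results):
    """Return the ordered argv lists: prepare, upload, unpack and run, fetch."""
    workspace = shlex.quote(REMOTE_WORKSPACE)
    archive_name = shlex.quote(os.path.basename(archive))
    return [
        ssh_command(user, host, f"mkdir -p {workspace}"),
        upload_command(user, host, archive),
        ssh_command(
            user, host,
            f"cd {workspace} && tar -xzf {archive_name} && {run_cmd}",
        ),
        download_command(user, host, remote_results, local_results),
    ]


if __name__ == "__main__":
    # Example user, address, and file names are placeholders.
    for argv in jumpbox_plan(
        "azureuser", "10.0.0.4", "/tmp/modules.tar.gz",
        "python3 main.py", f"{REMOTE_WORKSPACE}/results", "./results",
    ):
        print(shlex.join(argv))
```

Each list could be passed to `subprocess.run(argv, check=True)` in order, so that a failed step stops the flow before results are fetched.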

Reviewed changes

Copilot reviewed 9 out of 9 changed files in this pull request and generated 2 comments.

| File | Description |
| --- | --- |
| steps/topology/karpenter-private-cluster/validate-resources.yml | New topology validation that locates the jumpbox VM by tags, sets JUMPBOX_HOST, configures AKS kubeconfig on the jumpbox, and uploads/applies the Karpenter nodepool manifest remotely. |
| steps/topology/karpenter-private-cluster/execute-clusterloader2.yml | Routes clusterloader2 execution for this topology through the new jumpbox execution template instead of the local engine runner. |
| steps/topology/karpenter-private-cluster/collect-clusterloader2.yml | Wires result collection to the existing autoscale collect template and then invokes the autoscale cleanup template (which now handles both public and private clusters). |
| steps/ssh/run-command.yml | Introduces a generic SSH helper step that executes a provided command on a VM IP (currently the jumpbox), gated on JUMPBOX_HOST being set. |
| steps/engine/clusterloader2/autoscale/execute_jumpbox.yml | Implements the jumpbox-based autoscale pipeline: prepare remote workspace, tar/scp the Python modules and environment file, run override/execute on the jumpbox, and then pull results back to the agent. |
| steps/engine/clusterloader2/autoscale/cleanup.yml | Splits cleanup between a local kubectl path for public clusters and a jumpbox-based path for private clusters via the SSH template; note that the remote cleanup now only runs when the job has succeeded so far. |
| scenarios/perf-eval/nap-complex/terraform-inputs/azure.tfvars | Adds a public IP, subnet, NIC association, and VM config for a dedicated jumpbox with SSH NSG rule, and slightly reorders AKS CLI optional parameters. |
| modules/terraform/azure/network/network-security-rule/variables.tf | Loosens NSG rule priority validation from [120, 4096] to [100, 4096] so rules like the jumpbox SSH rule at priority 100 are accepted. |
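
The effect of the priority-range change in the last entry can be checked with a small sketch. The real validation lives in Terraform in `variables.tf`; the Python below is only an illustration of the bounds stated in the summary, and the names `OLD_RANGE`, `NEW_RANGE`, and `priority_allowed` are hypothetical.

```python
# Inclusive bounds taken from the review summary: [120, 4096] before, [100, 4096] after.
OLD_RANGE = (120, 4096)
NEW_RANGE = (100, 4096)

# Priority of the jumpbox SSH rule, as stated in the summary.
SSH_RULE_PRIORITY = 100


def priority_allowed(priority, bounds):
    """Return True when priority falls inside the inclusive bounds."""
    low, high = bounds
    return low <= priority <= high


if __name__ == "__main__":
    print("old range accepts SSH rule:", priority_allowed(SSH_RULE_PRIORITY, OLD_RANGE))
    print("new range accepts SSH rule:", priority_allowed(SSH_RULE_PRIORITY, NEW_RANGE))
```

Under the old bounds the SSH rule at priority 100 is rejected; under the new bounds it passes, while values outside 100 to 4096 are still refused.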

xinWeiWei24 marked this pull request as ready for review on February 6, 2026 06:53
xinWeiWei24 force-pushed the xinwei/jumpbox_karpenter branch from b4dfb83 to 334deab on February 6, 2026 07:09