Support Karpenter autoscaling in AKS private clusters #1041
Open
xinWeiWei24 wants to merge 12 commits into main
82a3d1c to fea6a3d
Pull request overview
This PR adds support for running Karpenter-based autoscale benchmarks against AKS private clusters by introducing a jumpbox-based topology and wiring clusterloader2 to execute via that jumpbox.
Changes:
- Add a new `karpenter-private-cluster` topology that discovers a jumpbox VM, configures kubeconfig on it, and applies Karpenter nodepool configuration remotely.
- Introduce a reusable SSH command template and a jumpbox-specific `clusterloader2` autoscale execution flow, including remote result collection and cleanup behavior split between public and private clusters.
- Extend the `nap-complex` Azure Terraform inputs with a jumpbox public IP, subnet, NIC association, and VM definition, and adjust NSG rule validation to allow priority 100 to support the new SSH NSG rule.
Reviewed changes
Copilot reviewed 9 out of 9 changed files in this pull request and generated 2 comments.
| File | Description |
|---|---|
| steps/topology/karpenter-private-cluster/validate-resources.yml | New topology validation that locates the jumpbox VM by tags, sets JUMPBOX_HOST, configures AKS kubeconfig on the jumpbox, and uploads/applies the Karpenter nodepool manifest remotely. |
| steps/topology/karpenter-private-cluster/execute-clusterloader2.yml | Routes clusterloader2 execution for this topology through the new jumpbox execution template instead of the local engine runner. |
| steps/topology/karpenter-private-cluster/collect-clusterloader2.yml | Wires result collection to the existing autoscale collect template and then invokes the autoscale cleanup template (which now handles both public and private clusters). |
| steps/ssh/run-command.yml | Introduces a generic SSH helper step that executes a provided command on a VM IP (currently the jumpbox), gated on JUMPBOX_HOST being set. |
| steps/engine/clusterloader2/autoscale/execute_jumpbox.yml | Implements the jumpbox-based autoscale pipeline: prepare remote workspace, tar/scp the Python modules and environment file, run override/execute on the jumpbox, and then pull results back to the agent. |
| steps/engine/clusterloader2/autoscale/cleanup.yml | Splits cleanup between a local kubectl path for public clusters and a jumpbox-based path for private clusters via the SSH template; note that the remote cleanup now only runs when the job has succeeded so far. |
| scenarios/perf-eval/nap-complex/terraform-inputs/azure.tfvars | Adds a public IP, subnet, NIC association, and VM config for a dedicated jumpbox with SSH NSG rule, and slightly reorders AKS CLI optional parameters. |
| modules/terraform/azure/network/network-security-rule/variables.tf | Loosens NSG rule priority validation from [120, 4096] to [100, 4096] so rules like the jumpbox SSH rule at priority 100 are accepted. |
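
For reference, the loosened priority check in the last row can be sketched in Python as below. This is only an illustration of the inclusive range described above, not the Terraform validation block from the PR; the function names and the rule dictionaries are hypothetical.

```python
from typing import Dict, List

# Inclusive bounds taken from the table row above (lower bound was 120 before this PR).
MIN_PRIORITY = 100
MAX_PRIORITY = 4096


def is_valid_priority(priority: int) -> bool:
    """Return True when the priority lies inside the accepted inclusive range."""
    return MIN_PRIORITY <= priority <= MAX_PRIORITY


def invalid_rule_names(rules: List[Dict]) -> List[str]:
    """Return the names of rules whose priority falls outside the range.

    Each rule is a hypothetical dict with "name" and "priority" keys.
    """
    return [rule["name"] for rule in rules if not is_valid_priority(rule["priority"])]
```

With the old lower bound of 120, a rule at priority 100 (such as the jumpbox SSH rule) would have been reported as invalid; with the bound at 100 it passes.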
b4dfb83 to 334deab
When AKS private clusters are enabled, our agents cannot access the cluster due to network isolation. This PR introduces a new topology for running Karpenter in private clusters and updates the related engine to support autoscaling benchmarking in private environments.
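
As a rough illustration of the jumpbox approach, the sketch below builds an `ssh` invocation that is skipped when `JUMPBOX_HOST` is unset, mirroring the gating described for the SSH helper step. It is not the pipeline template from this PR: the function name, the `azureuser` login, and the key path are assumptions made for the example.

```python
import os
from typing import List, Optional


def build_ssh_command(
    remote_command: str,
    user: str = "azureuser",          # hypothetical login name
    key_path: str = "~/.ssh/id_rsa",  # hypothetical private key location
) -> Optional[List[str]]:
    """Build an ssh argv that runs remote_command on the jumpbox.

    Returns None when JUMPBOX_HOST is not set, so callers can skip the
    remote step (for example, on public clusters).
    """
    host = os.environ.get("JUMPBOX_HOST")
    if not host:
        return None
    return [
        "ssh",
        "-i", os.path.expanduser(key_path),
        "-o", "StrictHostKeyChecking=no",
        f"{user}@{host}",
        remote_command,
    ]
```

A caller could pass the returned list to `subprocess.run(cmd, check=True)` to execute, say, `kubectl get nodes` from inside the private network, and fall back to a local `kubectl` call when the function returns `None`.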