Skip to content

Add new tooling to cri.py to support modifying kubelet using a daemonset with predefined custom kubelet flags#1034

Open
pdamianov-dev wants to merge 33 commits intomainfrom
pdamianov-dev/k8s-pressure
Open

Add new tooling to cri.py to support modifying kubelet using a daemonset with predefined custom kubelet flags#1034
pdamianov-dev wants to merge 33 commits intomainfrom
pdamianov-dev/k8s-pressure

Conversation

@pdamianov-dev
Copy link
Contributor

This is related to the creation of a suitable framework to run pressure tests against node images as part of Node Hardening efforts.

Supporting documents:

Copy link
Contributor

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

Adds a new “CRI pressure” execution path that can modify kubelet flags via a privileged DaemonSet before running ClusterLoader2, enabling node pressure testing for node hardening work.

Changes:

  • Switches the k8s-resource-pressure topology to use a new cri-pressure engine template.
  • Adds a new pipeline step template that applies a kubelet-modifying DaemonSet before running the benchmark.
  • Extends cri.py with a modify-kubelet subcommand and adds DaemonSet-manifest generation in clusterloader2/utils.py.

Reviewed changes

Copilot reviewed 5 out of 5 changed files in this pull request and generated 6 comments.

File Description
steps/topology/k8s-resource-pressure/execute-clusterloader2.yml Routes the topology to the new cri-pressure execution template.
steps/engine/clusterloader2/cri-pressure/execute.yml Adds a pre-benchmark step to apply a kubelet config updater DaemonSet, then runs CL2.
modules/python/clusterloader2/utils.py Adds DaemonSet YAML generation for updating kubelet flags (but also modifies imports/initialization).
modules/python/clusterloader2/cri/cri.py Adds modify-kubelet CLI command and wires it to the DaemonSet generator.

Comment on lines 390 to 392
parser_modify_kubelet.add_argument(
"--custom_kubelet_config", type=str, help="Custom kubelet flags in string format"
)
Copy link

Copilot AI Jan 23, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

--custom_kubelet_config is optional, but modify-kubelet depends on it to generate the DaemonSet. Make this argument required (or validate non-empty and exit with a clear error) to avoid applying a manifest with an empty/None flags string.

Copilot uses AI. Check for mistakes.
Comment on lines 386 to 390
# Sub-command for modify-kubelet
parser_modify_kubelet = subparsers.add_parser(
"modify-kubelet", help="Add custom flags to kubelet and apply via daemonset"
)
parser_modify_kubelet.add_argument(
Copy link

Copilot AI Jan 23, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

There are existing unit tests covering main() subcommands in modules/python/tests/test_cri.py, but the new modify-kubelet subcommand isn’t covered. Add a test that exercises arg parsing and verifies it invokes the right handler (and ideally validates the manifest output path).

Copilot uses AI. Check for mistakes.
Comment on lines 1 to 5
from xml.dom import minidom
import json
import os
import docker
from clients.docker_client import DockerClient
from utils.logger_config import get_logger, setup_logging

setup_logging()
logger = get_logger(__name__)

Copy link

Copilot AI Jan 23, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The imports / module-level initialization for Docker + logging were removed, but this module still references DockerClient, docker.errors.ContainerError, and logger (e.g., in run_cl2_command and process_cl2_reports). This will raise NameError at runtime when running clusterloader2. Re-introduce the needed imports and define logger (or switch these functions to use logging.getLogger(__name__) and ensure docker/DockerClient are imported).

Copilot uses AI. Check for mistakes.
Comment on lines 209 to 213
labels:
app: kubelet-config-updater
spec:
hostPID: true
containers:
Copy link

Copilot AI Jan 23, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This DaemonSet is privileged with host mounts but has no node scoping. As written it can run on every node and modify kubelet configuration cluster-wide. Add a nodeSelector/affinity (e.g., cri-resource-consume=true) and at minimum constrain to Linux nodes to reduce blast radius.

Copilot uses AI. Check for mistakes.
"--registry_info", type=str, help="Container registry information scraped",
)

# Sub-command for modify-kubelet
Copy link
Contributor

@vittoriasalim vittoriasalim Feb 3, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

if you need to apply daemonset before benchmarking start, you can apply it at validation stage rather than in the engine e.g : https://github.com/Azure/telescope/pull/1045/changes
https://github.com/Azure/telescope/blob/main/steps/topology/karpenter/validate-resources.yml

I am not exactly sure what is this for
if you need to modify the node image, can do something like this https://github.com/Azure/telescope/pull/1044/changes

kind: DaemonSet
metadata:
name: kubelet-config-updater
namespace: kube-system
Copy link
Contributor

@vittoriasalim vittoriasalim Feb 3, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

can we move this to validation stage , e.g: https://github.com/Azure/telescope/pull/1045/changes

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yea I can take a look at this, my only real concern is how to pass the parameters in as the yaml itself, and will let you know if I can have the same effect

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We have this example where it patch the VM_SIZE to yaml template

if [ -n "${VM_SIZE:-}" ]; then

@@ -1,8 +1,34 @@
trigger: none

parameters:
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We do not need parameters , simply enable them in your test

            n1-p60-memory-managed:
              node_count: 1
              max_pods: 60
              repeats: 1
              operation_timeout: 3m
              load_type: memory
              pod_startup_latency_threshold: 23s
              kubernetes_version: "1.34"
              k8s_os_disk_type: Managed
              scrape_kubelets: True
              enable_custom_kubelet: false
              kubelet_config_type: "eviction-hard" # eviction-soft, eviction-soft-grace-period

in validate-resources

can call
$KUBELET_CONFIG_TYPE

if KUBELET_CONFIG_TYPE == ëviction-hard :

  • override the memory, nodefs and pid
    else if
    else if
    so on and so on

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What if we want to pass in different flags with each run without changing the pipeline definition?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Our intention may not be to schedule this pipeline, we will most likely do it manually or have a separate pipeline trigger

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

what if i set a queue time variable instead?

Copy link
Contributor

@vittoriasalim vittoriasalim left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

remove parameters and only enable those variable inside your test

rename use_custom_kubelet to enable_custom_kubelet

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants