Add image pull 10 nodes scenario #963

Merged: jasminetMSFT merged 32 commits into main from jasminet/image_pull_prototype on Jan 12, 2026

@jasminetMSFT (Contributor) commented Dec 8, 2025

Summary

Adds containerd image pull metrics collection to the existing CRI module using a scrape_containerd toggle.

Approach

Instead of creating a new module, this PR extends the existing CRI infrastructure:

  • Adds scrape_containerd parameter to enable containerd metrics collection
  • Reuses cri-resource-consume topology
  • Adds containerd-measurements.yaml config to CRI module
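
The toggle described above can be pictured with a minimal sketch. It is not the PR's actual code: the helper names `parse_toggle` and `select_measurement_configs` are hypothetical, and the assumption that the CRI module already loads a `kubelet-measurement.yaml` file is made only for illustration. Only `scrape_containerd` and `containerd-measurements.yaml` come from the description.

```python
from typing import List

# Assumption: the CRI module already loads some baseline measurement config.
# The file name below is illustrative, not confirmed for the CRI module.
BASE_MEASUREMENTS = ["kubelet-measurement.yaml"]

# File name taken from the PR description.
CONTAINERD_MEASUREMENTS = "containerd-measurements.yaml"


def parse_toggle(value: str) -> bool:
    """Interpret a pipeline string such as 'True' or 'False' as a boolean."""
    return value.strip().lower() in ("true", "1", "yes")


def select_measurement_configs(scrape_containerd: bool) -> List[str]:
    """Return the measurement configs to load for one test run.

    The containerd config is appended only when the toggle is on, so runs
    with scrape_containerd=False keep the pre-existing behaviour.
    """
    configs = list(BASE_MEASUREMENTS)
    if scrape_containerd:
        configs.append(CONTAINERD_MEASUREMENTS)
    return configs
```

Keeping the containerd file as an optional extra, rather than a separate module, is what allows the same topology to run with the toggle on or off.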

Metrics Collected

ContainerdCriImagePullingThroughput

Image pull throughput (MB/s) with the following aggregations:

| Metric | Description |
| --- | --- |
| Avg | Weighted average throughput per image pull |
| AvgPerNode | Unweighted average - each node contributes equally |
| Count | Total number of successful image pulls |
| Perc50 | 50th percentile (median) throughput across nodes |
| Perc90 | 90th percentile throughput across nodes |
| Perc99 | 99th percentile throughput across nodes |
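
To make the difference between these aggregations concrete, here is a small plain-Python sketch. It is one reading of the table, not the queries the PR actually runs: it assumes "weighted" means every pull counts once (so busier nodes weigh more), that percentiles are taken over per-node averages, and that quantiles use linear interpolation. The function names and the sample data are hypothetical.

```python
from statistics import mean
from typing import Dict, List


def quantile(values: List[float], q: float) -> float:
    """Linear-interpolation quantile for q in [0, 1]."""
    ordered = sorted(values)
    if not ordered:
        raise ValueError("quantile() needs at least one value")
    pos = q * (len(ordered) - 1)
    lower = int(pos)
    upper = min(lower + 1, len(ordered) - 1)
    return ordered[lower] + (ordered[upper] - ordered[lower]) * (pos - lower)


def aggregate(per_node: Dict[str, List[float]]) -> Dict[str, float]:
    """Aggregate per-pull throughput samples (MB/s), grouped by node."""
    all_pulls = [t for pulls in per_node.values() for t in pulls]
    node_means = [mean(pulls) for pulls in per_node.values() if pulls]
    return {
        "Avg": mean(all_pulls),          # every pull counts once
        "AvgPerNode": mean(node_means),  # every node counts once
        "Count": len(all_pulls),
        "Perc50": quantile(node_means, 0.50),
        "Perc90": quantile(node_means, 0.90),
        "Perc99": quantile(node_means, 0.99),
    }


# Illustrative data: node-a did three slow pulls, node-b one fast pull.
sample = {"node-a": [10.0, 10.0, 10.0], "node-b": [40.0]}
result = aggregate(sample)
```

With this sample, `Avg` is 17.5 MB/s because the three slow pulls dominate, while `AvgPerNode` is 25.0 MB/s because each node contributes equally.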

Results

  1. scrape_containerd=True
    Pipeline: #20260106.12 • Add 20s pod_startup_latency_threshold
    Json Result: perf-eval/image-pull-n10/image_pull_prototype/47996-233c2bdc-3a73-570f-5588-ad30a97c6402.json
  2. scrape_containerd=False
    Pipeline: #20260106.27 • test with scrape_containerd=False
    Json Result: perf-eval/image-pull-n10/image_pull_prototype/48089-233c2bdc-3a73-570f-5588-ad30a97c6402.json

@jasminetMSFT jasminetMSFT marked this pull request as ready for review December 14, 2025 23:42
Copilot AI review requested due to automatic review settings December 14, 2025 23:42
Copilot AI left a comment

Pull request overview

This PR adds a new ClusterLoader2-based performance test scenario to measure container image pull performance on AKS clusters. The scenario deploys 10 small deployments with 1 replica each (totaling 10 pods) using imagePullPolicy: Always to force image pulls, then collects detailed metrics from kubelet and containerd via Prometheus.
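
As a rough illustration of the workload shape described above (10 deployments, 1 replica each, `imagePullPolicy: Always`), the sketch below builds the manifests as Python dictionaries. It is not the PR's `deployment_template.yaml`: the deployment names, the container name, and the image reference are hypothetical placeholders.

```python
from typing import Any, Dict, List


def build_deployment(index: int, image: str) -> Dict[str, Any]:
    """Build one single-replica Deployment that always pulls its image."""
    name = f"image-pull-{index}"  # hypothetical naming scheme
    labels = {"app": name}
    return {
        "apiVersion": "apps/v1",
        "kind": "Deployment",
        "metadata": {"name": name, "labels": labels},
        "spec": {
            "replicas": 1,
            "selector": {"matchLabels": labels},
            "template": {
                "metadata": {"labels": labels},
                "spec": {
                    "containers": [
                        {
                            "name": "app",
                            "image": image,
                            # Always pull, so a cached image never hides
                            # the pull cost being measured.
                            "imagePullPolicy": "Always",
                        }
                    ]
                },
            },
        },
    }


def build_scenario(image: str, deployments: int = 10) -> List[Dict[str, Any]]:
    """Build the full set of deployments for the scenario."""
    return [build_deployment(i, image) for i in range(deployments)]


# The image reference is a placeholder, not the one used in the PR.
manifests = build_scenario("registry.example.com/sample:latest")
total_pods = sum(m["spec"]["replicas"] for m in manifests)
```

Here `total_pods` matches the 10 pods mentioned in the overview, one per deployment.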

Key Changes

  • New image-pull-n10 scenario with terraform configuration for 3-node default pool, 1-node Prometheus pool, and 10-node user pool
  • ClusterLoader2 test configuration that measures image pulling throughput, kubelet runtime operations, and pod startup latency
  • Python module with execute/collect functions following established patterns from cri and other modules
  • Comprehensive unit tests for the new Python module
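
The execute/collect split mentioned in the list above might look roughly like the following command-line skeleton. This is a sketch under assumptions, not the module's real interface: every flag name (`--cl2-config-dir`, `--scrape-containerd`, `--report-dir`, `--result-file`) is hypothetical, and the handlers only return a description of what they would do instead of running ClusterLoader2.

```python
import argparse
from typing import List, Optional


def build_parser() -> argparse.ArgumentParser:
    """Create a parser with hypothetical 'execute' and 'collect' subcommands."""
    parser = argparse.ArgumentParser(description="Image pull benchmark (sketch)")
    sub = parser.add_subparsers(dest="command", required=True)

    execute = sub.add_parser("execute", help="run the test")
    execute.add_argument("--cl2-config-dir", required=True)
    execute.add_argument("--scrape-containerd", action="store_true")

    collect = sub.add_parser("collect", help="gather results into one file")
    collect.add_argument("--report-dir", required=True)
    collect.add_argument("--result-file", required=True)
    return parser


def main(argv: Optional[List[str]] = None) -> str:
    """Dispatch to a subcommand and describe the action it would take."""
    args = build_parser().parse_args(argv)
    if args.command == "execute":
        return (
            f"execute config_dir={args.cl2_config_dir} "
            f"scrape_containerd={args.scrape_containerd}"
        )
    return f"collect report_dir={args.report_dir} result_file={args.result_file}"
```

For example, `main(["execute", "--cl2-config-dir", "cfg", "--scrape-containerd"])` selects the execute path with the toggle enabled.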

Reviewed changes

Copilot reviewed 15 out of 15 changed files in this pull request and generated 4 comments.

| File | Description |
| --- | --- |
| steps/topology/image-pull/*.yml | Topology step templates for validation, execution, and collection |
| steps/engine/clusterloader2/image_pull/*.yml | Engine-level execute and collect steps with environment configuration |
| modules/python/clusterloader2/image_pull/image_pull.py | Main Python module implementing CL2 config override, execution, and results collection |
| modules/python/tests/test_image_pull.py | Unit tests covering execute, collect, and CLI functions |
| modules/python/clusterloader2/image_pull/config/image-pull.yaml | Main CL2 test configuration with pod deployment and measurement steps |
| modules/python/clusterloader2/image_pull/config/deployment_template.yaml | Kubernetes deployment template with imagePullPolicy: Always |
| modules/python/clusterloader2/image_pull/config/kubelet-measurement.yaml | Kubelet runtime operation duration measurements |
| modules/python/clusterloader2/image_pull/config/containerd-measurements.yaml | Containerd CRI metrics for image pulling throughput and network operations |
| scenarios/perf-eval/image-pull-n10/terraform-inputs/azure.tfvars | Infrastructure configuration for Azure AKS cluster with 3 node pools |
| scenarios/perf-eval/image-pull-n10/terraform-test-inputs/azure.json | Test input parameters for scenario |
| scenarios/perf-eval/image-pull-n10/README.md | Documentation of scenario purpose, infrastructure, and usage |

@vittoriasalim (Contributor)

We already have an image pull benchmark in the cri module. Is it possible to use the same module (adding the extra metrics there) instead of creating a new module?

@wonderyl (Collaborator)

Do you have a test run result?

@jasminetMSFT jasminetMSFT marked this pull request as draft December 15, 2025 05:15
@johnsonshi johnsonshi left a comment

Looks good so far. Please address comments.

jikuma previously requested changes Dec 15, 2025
@jasminetMSFT jasminetMSFT marked this pull request as ready for review December 21, 2025 22:38
@jasminetMSFT jasminetMSFT changed the title from "Add image pull scenario" to "Add image pull 10 nodes scenario" Jan 6, 2026
@jasminetMSFT jasminetMSFT requested a review from jikuma January 6, 2026 03:05
xgugeng previously approved these changes Jan 6, 2026
@xgugeng xgugeng left a comment

LGTM, please request approval from the maintainers.

@vittoriasalim vittoriasalim left a comment

Please change the VM size.

vittoriasalim previously approved these changes Jan 8, 2026
@jasminetMSFT jasminetMSFT dismissed jikuma’s stale review January 12, 2026 03:18

Addressed all requested changes in the latest commits. Dismissing to unblock merge due to dependent work.

@jasminetMSFT jasminetMSFT merged commit 2310ba7 into main Jan 12, 2026
6 checks passed
@jasminetMSFT jasminetMSFT deleted the jasminet/image_pull_prototype branch January 12, 2026 03:20
7 participants