Add StatefulSet workload support to CRUD benchmarking framework #1132
Draft
diamondpowell wants to merge 10 commits into
Add create_statefulset() to NodePoolCRUD that deploys K8s StatefulSets onto node pools after provisioning. Follows the same pattern as create_deployment: multi-doc YAML manifest parsing, configurable replica count, and per-statefulset readiness validation via wait_for_condition.
- Add 'statefulset' subcommand to handle_workload_operations() in main.py with --number-of-statefulsets and --replicas args
- Add statefulset.yml workload template with configurable replicas and node affinity via label_selector
- Add _is_statefulset_ready and _check_statefulset_condition to kubernetes_client.py for readiness polling
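The readiness polling described in this commit can be sketched roughly as follows. The PR's actual _is_statefulset_ready and wait_for_condition are not shown in this conversation, so everything below is an assumption: the names is_statefulset_ready and wait_until are hypothetical, the input is a plain dict shaped like a StatefulSet object rather than a Kubernetes client model, and "ready" is taken to mean that status.readyReplicas has reached spec.replicas.

```python
import time


def is_statefulset_ready(statefulset: dict) -> bool:
    """Return True when every desired replica reports ready.

    Hypothetical stand-in for _is_statefulset_ready; operates on a plain dict.
    """
    # spec.replicas defaults to 1 when it is not set on a StatefulSet.
    desired = statefulset.get("spec", {}).get("replicas", 1)
    # readyReplicas may be absent or None before any pod becomes ready.
    ready = statefulset.get("status", {}).get("readyReplicas") or 0
    return ready >= desired


def wait_until(fetch, predicate, timeout_s=300.0, interval_s=5.0, sleep=time.sleep):
    """Poll fetch() until predicate(result) is true or the timeout elapses."""
    deadline = time.monotonic() + timeout_s
    while True:
        if predicate(fetch()):
            return True
        if time.monotonic() >= deadline:
            return False
        sleep(interval_s)
```

In real use, fetch would wrap a call that reads the StatefulSet from the cluster; the sleep parameter is injectable only so that the loop can be exercised in tests without waiting.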
Add statefulset execution step to the k8s CRUD engine pipeline between deployment and scale-down. Parameters (number_of_statefulsets, replicas) flow from pipeline matrix → topology → engine step → main.py.
- Add statefulset script block to steps/engine/crud/k8s/execute.yml
- Pass number_of_statefulsets through topology execute-crud.yml
Add test coverage for create_statefulset and statefulset wait_for_condition:
- test_create_statefulset_success: single statefulset with readiness check
- test_create_statefulset_failure: statefulset fails to become ready
- test_create_statefulset_partial_success: continues on individual failures
- test_create_statefulset_no_client: returns early when k8s client unavailable
- test_statefulset_wait_for_condition: validates _is_statefulset_ready and _check_statefulset_condition polling logic
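The partial-success behaviour these tests describe (keep going after an individual failure, but report overall failure) can be illustrated with a small sketch. This is not the PR's code: create_all and apply_one are hypothetical names, and unittest.mock stands in for whatever applies and validates a single StatefulSet.

```python
from unittest import mock


def create_all(names, apply_one):
    """Apply every item, continuing past failures; True only if all succeed."""
    all_ok = True
    for name in names:
        try:
            if not apply_one(name):
                all_ok = False
        except Exception:
            # One failing StatefulSet should not prevent the rest from running.
            all_ok = False
    return all_ok


def demo_partial_success():
    # The second call fails; the third must still be attempted.
    apply_one = mock.Mock(side_effect=[True, False, True])
    result = create_all(["sts-1", "sts-2", "sts-3"], apply_one)
    return result, apply_one.call_count
```

A test in this style would assert both the False return value and the call count, so that an early exit on the first failure is caught.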
- Extract _apply_statefulset helper (matches _apply_deployment pattern)
- Use os.path for default template path instead of hardcoded string
- Use per-statefulset labels to avoid selector collision
- Remove redundant outer try/except
- Use workload_common_parser for shared args (--count, --replicas, etc.)
- Add hasattr guard for cloud provider compatibility
- Use args.count instead of args.number_of_statefulsets
- Update pipeline YAML to use count parameter
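Two of the items above, the shared argument parser and the hasattr guard, can be sketched as follows. This is a minimal illustration under stated assumptions, not the PR's main.py: build_parser and dispatch are hypothetical names, the defaults are invented, and the keyword arguments passed to create_statefulset are a guess at its signature. The argparse features used (parents=, add_help=False, add_subparsers) are standard library.

```python
import argparse


def build_parser():
    # Shared flags live on a parent parser with add_help=False so that each
    # subcommand can inherit them without a duplicate -h/--help option.
    common = argparse.ArgumentParser(add_help=False)
    common.add_argument("--node-pool-name", required=True)
    common.add_argument("--count", type=int, default=1)
    common.add_argument("--replicas", type=int, default=1)
    common.add_argument("--manifest-dir", default=None)

    parser = argparse.ArgumentParser(prog="main.py")
    sub = parser.add_subparsers(dest="command", required=True)
    sub.add_parser("deployment", parents=[common])
    sub.add_parser("statefulset", parents=[common])
    return parser


def dispatch(args, node_pool_crud):
    if args.command == "statefulset":
        # hasattr guard: providers whose CRUD class lacks the method are skipped.
        if not hasattr(node_pool_crud, "create_statefulset"):
            return False
        # Keyword names here are assumptions, not the PR's actual signature.
        return node_pool_crud.create_statefulset(
            node_pool_name=args.node_pool_name,
            count=args.count,
            replicas=args.replicas,
            manifest_dir=args.manifest_dir,
        )
    raise ValueError(f"unsupported command: {args.command}")
```

Keeping the shared flags on one parent parser means a new workload subcommand picks up --count and --replicas without redefining them.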
…-dir
- Wrap statefulset pipeline step inside Azure cloud gate (matches deployment)
- Use ${MANIFEST_DIR:+--manifest-dir} conditional (matches deployment pattern)
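For readers less familiar with the shell idiom: ${MANIFEST_DIR:+--manifest-dir} expands to --manifest-dir only when MANIFEST_DIR is set and non-empty, and to nothing otherwise. The same logic, written in Python purely for illustration, looks like the sketch below. build_statefulset_command is a hypothetical helper, and the exact argument order used by the pipeline step is an assumption.

```python
def build_statefulset_command(node_pool_name, count, replicas, manifest_dir=""):
    """Build the argv for the statefulset step, adding --manifest-dir only if set."""
    cmd = [
        "python3", "main.py", "statefulset",
        "--node-pool-name", node_pool_name,
        "--count", str(count),
        "--replicas", str(replicas),
    ]
    # Mirrors ${MANIFEST_DIR:+...}: an unset or empty value adds no flag at all.
    if manifest_dir:
        cmd += ["--manifest-dir", manifest_dir]
    return cmd
```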
Summary
Adds StatefulSet workload support to the CRUD benchmarking framework, the second of three planned workload methods (deployment, statefulset, jobs). Measures K8s StatefulSet create/verify latency on AKS node pools.

Branch cleanup note: Rebased and squashed for reviewability. All commits are logically grouped. Also includes a fix for the gpu_profile driver setting that caused node pool creation failure on non-GPU pools.

Changes
modules/python/crud/workload_templates/statefulset.yml
- Headless Service (clusterIP: None) for stable pod DNS
- STATEFULSET_REPLICAS placeholder, configurable node affinity via label_selector
modules/python/crud/azure/node_pool_crud.py
- create_statefulset(): same loop pattern as create_deployment
- Uses the ready condition (not available), since StatefulSets don't support the available condition type
modules/python/crud/main.py
- statefulset subparser with --node-pool-name, --number-of-statefulsets, --replicas, --manifest-dir
- elif command == "statefulset" routing in handle_workload_operations
modules/python/clients/kubernetes_client.py
- _is_statefulset_ready and _check_statefulset_condition for readiness polling
- wait_for_condition updated to support the StatefulSet resource type
steps/engine/crud/k8s/execute.yml
- statefulset script block calling python3 main.py statefulset
- number_of_statefulsets parameter
steps/topology/k8s-crud-gpu/execute-crud.yml
- Passes number_of_statefulsets through to the engine step
modules/python/clients/aks_client.py
- Sets gpu_profile driver to "None" for non-GPU node pools (was incorrectly set to "Install", causing creation failures)

Tests
test_azure_node_pool_crud.py:
- test_create_statefulset_success: happy path
- test_create_statefulset_failure: all fail to become ready
- test_create_statefulset_no_client: returns early when k8s client unavailable
- test_create_statefulset_partial_success: continues on failures, returns False
test_kubernetes_client.py:
- test_wait_for_condition_statefulset_success
- test_wait_for_condition_statefulset_timeout
- test_wait_for_condition_statefulset_not_found

Dependencies
Based on test-refactor (PR #879); must merge first.
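To make the statefulset.yml template listed under Changes more concrete, here is a rough sketch of how a multi-document template with a STATEFULSET_REPLICAS placeholder might be rendered and split. The template contents are not shown in this conversation, so the manifest below is illustrative only: STATEFULSET_NAME is a hypothetical placeholder (standing in for the per-statefulset labels), the container image is arbitrary, node affinity is omitted, and render_manifests is not the PR's function.

```python
TEMPLATE = """\
apiVersion: v1
kind: Service
metadata:
  name: STATEFULSET_NAME
spec:
  clusterIP: None
  selector:
    app: STATEFULSET_NAME
---
apiVersion: apps/v1
kind: StatefulSet
metadata:
  name: STATEFULSET_NAME
spec:
  serviceName: STATEFULSET_NAME
  replicas: STATEFULSET_REPLICAS
  selector:
    matchLabels:
      app: STATEFULSET_NAME
  template:
    metadata:
      labels:
        app: STATEFULSET_NAME
    spec:
      containers:
        - name: pause
          image: registry.k8s.io/pause:3.9
"""


def render_manifests(template: str, name: str, replicas: int) -> list:
    """Fill in the placeholders, then split the text into one string per YAML document."""
    rendered = (
        template.replace("STATEFULSET_REPLICAS", str(replicas))
        .replace("STATEFULSET_NAME", name)
    )
    docs, current = [], []
    for line in rendered.splitlines():
        if line.strip() == "---":  # YAML document separator
            docs.append("\n".join(current))
            current = []
        else:
            current.append(line)
    docs.append("\n".join(current))
    # Drop empty documents, e.g. from a leading or trailing separator.
    return [doc for doc in docs if doc.strip()]
```

Giving each StatefulSet its own name and app label keeps the selectors of different StatefulSets, and of their headless Services, from matching each other's pods.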