
CA DRA: stop depending on NodeGroup.TemplateNodeInfo() in DraCustomResourcesProcessor #8881

@towca

Description


Which component are you using?:

/area cluster-autoscaler
/area core-autoscaler

Is your feature request designed to solve a problem? If so describe the problem this feature should solve.:

Nodes with DRA Devices run into the same problem as GPU Nodes using the device plugin - they appear as Ready in the K8s API before the GPU/DRA Devices are actually exposed to Pods. Without special handling, Cluster Autoscaler would do another scale-up once a DRA Node from a previous scale-up becomes Ready - because the pending Pods still can't schedule on it until the Devices are exposed.

DraCustomResourcesProcessor is responsible for hacking such Nodes to be not-Ready until all their expected DRA Devices are published by the relevant DRA Drivers.
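
For illustration, here's a minimal sketch of that readiness hack. All the types and names below (`Node`, `filterUnreadyDRANode`, the device maps) are simplified stand-ins, not the actual DraCustomResourcesProcessor interface:

```go
package main

import "fmt"

// Simplified stand-in for a cluster Node; not the real Cluster Autoscaler type.
type Node struct {
	Name  string
	Ready bool
}

// filterUnreadyDRANode flips a Node that the API reports as Ready back to
// not-Ready while any expected DRA Device is still missing from what the
// driver has published. This is the conceptual job of
// DraCustomResourcesProcessor, not its real signature.
func filterUnreadyDRANode(node *Node, expected, published map[string]bool) {
	for dev := range expected {
		if !published[dev] {
			// An expected Device is missing: keep the Node not-Ready so CA
			// keeps waiting instead of scaling up again for the pending Pods.
			node.Ready = false
			return
		}
	}
}

func main() {
	n := &Node{Name: "dra-node-1", Ready: true}
	expected := map[string]bool{"driver.example.com/gpu-0": true}
	published := map[string]bool{} // driver hasn't published its ResourceSlices yet
	filterUnreadyDRANode(n, expected, published)
	fmt.Println(n.Ready) // false
}
```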

The processor uses NodeGroup.TemplateNodeInfo() to get the set of Devices it needs to wait on. This only works for DRA Drivers that are explicitly integrated into NodeGroup.TemplateNodeInfo() for a given CloudProvider integration.

When a "custom" (i.e. not integrated with CA NodeGroup.TemplateNodeInfo()) DRA Driver is used, NodeGroup.TemplateNodeInfo() will return 0 DRA Devices to wait on. This in turn causes DraCustomResourcesProcessor to not hack the Node readiness, and CA to do unnecessary scale-ups explained above.

Describe the solution you'd like.:

If DraCustomResourcesProcessor had access to the template NodeInfos computed by TemplateNodeInfoProvider, the problem would be solved. The default MixedTemplateNodeInfoProvider implementation sanitizes real Nodes into templates and only falls back to NodeGroup.TemplateNodeInfo() if there are no good candidate Nodes in a NodeGroup. So as long as there's at least one Node in a NodeGroup, DraCustomResourcesProcessor would be able to know which DRA Devices to wait on - the same ones as on that real Node.
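
Roughly, the selection order described above looks like this (illustrative only; the real MixedTemplateNodeInfoProvider does quite a bit more, e.g. sanitizing the copied Node):

```go
// Minimal stand-in for a template NodeInfo, carrying only what matters here:
// the DRA Devices it exposes.
type templateNodeInfo struct {
	Devices []string
}

// templateForGroup prefers a (sanitized copy of a) real Node from the group,
// because that Node already carries whatever Devices its DRA Drivers have
// published - including Drivers the CloudProvider integration knows nothing
// about. Only an empty NodeGroup falls back to NodeGroup.TemplateNodeInfo().
func templateForGroup(realNodes []templateNodeInfo, cloudProviderTemplate templateNodeInfo) templateNodeInfo {
	if len(realNodes) > 0 {
		return realNodes[0]
	}
	return cloudProviderTemplate
}
```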

#8882 tracks making template NodeInfos computed by TemplateNodeInfoProvider accessible from all Cluster Autoscaler logic via a new component. Once that happens, DraCustomResourcesProcessor can be trivially migrated to use that new component to get the template instead of the current NodeGroup.TemplateNodeInfo().
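
The migrated lookup could then be as small as something like this, reusing the templateNodeInfo stand-in from the sketch above. TemplateNodeInfoLookup is a made-up name for whatever the component from #8882 ends up being called:

```go
// Hypothetical interface exposing the templates already computed by
// TemplateNodeInfoProvider.
type TemplateNodeInfoLookup interface {
	TemplateFor(nodeGroupID string) (templateNodeInfo, bool)
}

// expectedDevicesFromProvider replaces the NodeGroup.TemplateNodeInfo() call:
// the processor asks the shared component for the template and reads the
// expected Devices off it.
func expectedDevicesFromProvider(lookup TemplateNodeInfoLookup, nodeGroupID string) map[string]bool {
	expected := map[string]bool{}
	if tmpl, ok := lookup.TemplateFor(nodeGroupID); ok {
		for _, dev := range tmpl.Devices {
			expected[dev] = true
		}
	}
	return expected
}
```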
