Summary
We need to support node partitioning without coupling scheduling rules directly to course definitions or accelerator definitions.
Today, accelerator configuration already carries hardware-oriented nodeSelector constraints. That works for hardware placement, but it is not a good abstraction for broader scheduling partitions such as teaching pools, internal pools, or other placement domains. Encoding those concerns directly in courses or accelerators would make the model non-orthogonal and hard to extend.
This issue introduces a separate scheduling policy abstraction that resources/courses can reference.
Proposed scope
- Add a new top-level schedulingPolicies configuration section
- Allow each resource metadata entry to reference one schedulingPolicy
- In v1, a scheduling policy only encapsulates nodeSelector
- Compute the final pod nodeSelector by merging:
  - global singleuser.nodeSelector
  - accelerator nodeSelector
  - scheduling policy nodeSelector
- Detect conflicting selector keys with different values and fail fast during configuration/startup
- Keep this as an admin-only configuration feature for v1
- Do not expose scheduling policy selection in the frontend yet
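
The merge and fail-fast behaviour described in the scope could look roughly like the sketch below. It is only an illustration: the function name merge_node_selectors, the SelectorConflictError class, and the example label keys are hypothetical and not part of any existing code. The sketch treats a key that appears in several layers with the same value as harmless and only rejects differing values.

```python
from typing import Mapping, Optional


class SelectorConflictError(ValueError):
    """Two layers set the same selector key to different values (hypothetical class)."""


def merge_node_selectors(
    layers: Mapping[str, Optional[Mapping[str, str]]],
) -> dict[str, str]:
    """Merge nodeSelector layers in order, rejecting conflicting values.

    `layers` maps a layer name (used only in error messages) to that
    layer's nodeSelector, which may be None or empty.
    """
    merged: dict[str, str] = {}
    origin: dict[str, str] = {}  # selector key -> layer that first set it
    for layer_name, selector in layers.items():
        for key, value in (selector or {}).items():
            if key in merged and merged[key] != value:
                raise SelectorConflictError(
                    f"nodeSelector key {key!r} is {merged[key]!r} in "
                    f"{origin[key]} but {value!r} in {layer_name}"
                )
            merged.setdefault(key, value)
            origin.setdefault(key, layer_name)
    return merged


# Example with made-up labels for the three layers named in this issue.
final_selector = merge_node_selectors(
    {
        "singleuser.nodeSelector": {"kubernetes.io/os": "linux"},
        "accelerator": {"example.com/gpu": "a100"},        # hypothetical label
        "schedulingPolicy": {"example.com/pool": "teaching"},  # hypothetical label
    }
)
```

Running this check once at configuration load, rather than at spawn time, is what would make the failure "fast": a conflicting policy would be reported before any user pod is requested.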
Acceptance criteria
- A resource can reference a named scheduling policy
- A scheduling policy can contribute nodeSelector entries to spawned user pods
- Final nodeSelector is merged from global, accelerator, and scheduling policy layers
- Conflicting selector values are rejected with a clear configuration error
- Existing deployments without schedulingPolicies continue to work unchanged
- No frontend changes are required in this phase
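
The first and fifth criteria (named references, and unchanged behaviour when schedulingPolicies is absent) might be checked along these lines. This is a sketch under assumptions: the configuration is modelled as a plain dict, the top-level "resources" key and the function name resolve_policy_selector are hypothetical, and treating an unknown policy name as a configuration error is an assumption that this issue does not state explicitly.

```python
from typing import Any, Mapping


def resolve_policy_selector(
    config: Mapping[str, Any], resource_name: str
) -> dict[str, str]:
    """Return the nodeSelector contributed by a resource's scheduling policy."""
    # A missing or empty schedulingPolicies section is valid (existing deployments).
    policies = config.get("schedulingPolicies") or {}
    # "resources" is a hypothetical key standing in for the resource metadata.
    resource = (config.get("resources") or {}).get(resource_name) or {}
    policy_name = resource.get("schedulingPolicy")
    if policy_name is None:
        return {}  # no policy referenced: contribute nothing
    if policy_name not in policies:
        # Assumption: a dangling reference is rejected like any other config error.
        raise ValueError(
            f"resource {resource_name!r} references unknown "
            f"schedulingPolicy {policy_name!r}"
        )
    return dict(policies[policy_name].get("nodeSelector") or {})


# Example configuration with made-up resource and policy names.
example_config = {
    "schedulingPolicies": {
        "teaching": {"nodeSelector": {"example.com/pool": "teaching"}},
    },
    "resources": {
        "course-a": {"schedulingPolicy": "teaching"},
        "course-b": {},  # no policy: behaves as before
    },
}
```

A resource without a schedulingPolicy entry, or a configuration without the section at all, yields an empty selector here, so the merged result would be the same as today.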