Commit 9db2d40

Merge pull request #97613 from kquinn1204/TELCODOCS-2270
Telcodocs 2270: topology-aware scheduler: HA by default
2 parents bfe5dbb + 43dcd47 commit 9db2d40

5 files changed

+174
-0
lines changed

Lines changed: 31 additions & 0 deletions
@@ -0,0 +1,31 @@
1+
// Module included in the following assemblies:
2+
//
3+
// *scalability_and_performance/cnf-numa-aware-scheduling.adoc
4+
5+
:_mod-docs-content-type: PROCEDURE
6+
[id="customizing-scheduler-replicas_{context}"]
7+
= Customizing scheduler replicas
8+
9+
Set a specific number of scheduler replicas by updating the `spec.replicas` field in the `NUMAResourcesScheduler` custom resource. This overrides the default HA behavior.
10+
11+
.Procedure
12+
13+
. Save the following YAML in a file, for example `custom-ha.yaml`. This `NUMAResourcesScheduler` CR sets the number of replicas to `2`:
14+
+
15+
[source,yaml,subs="attributes+"]
16+
----
17+
apiVersion: nodetopology.openshift.io/v1
18+
kind: NUMAResourcesScheduler
19+
metadata:
20+
name: example-custom
21+
spec:
22+
imageSpec: 'registry.redhat.io/openshift4/noderesourcetopology-scheduler-rhel9:v{product-version}'
23+
replicas: 2
24+
----
25+
26+
. Deploy the NUMA-aware pod scheduler by running the following command:
27+
+
28+
[source,terminal]
29+
----
30+
$ oc apply -f custom-ha.yaml
31+
----
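
The manifest in this procedure can also be generated for an arbitrary replica count. The following is a minimal sketch, not part of the product: the `render_scheduler_cr` function name and its three positional arguments (CR name, scheduler image reference, replica count) are assumptions made for illustration.

```shell
#!/bin/sh
# Hypothetical helper, not shipped with the NUMA Resources Operator.
# Prints a NUMAResourcesScheduler manifest to standard output.
#   $1: CR name
#   $2: scheduler image reference
#   $3: number of replicas to set in spec.replicas
render_scheduler_cr() {
  cat <<EOF
apiVersion: nodetopology.openshift.io/v1
kind: NUMAResourcesScheduler
metadata:
  name: $1
spec:
  imageSpec: '$2'
  replicas: $3
EOF
}

# Example use (requires cluster access; replace <scheduler_image> with your image):
#   render_scheduler_cr example-custom "<scheduler_image>" 2 | oc apply -f -
```

Piping the output to `oc apply -f -` applies the manifest from standard input, so no intermediate file is needed.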
Lines changed: 31 additions & 0 deletions
@@ -0,0 +1,31 @@
1+
// Module included in the following assemblies:
2+
//
3+
// *scalability_and_performance/cnf-numa-aware-scheduling.adoc
4+
5+
:_mod-docs-content-type: PROCEDURE
6+
[id="disabling-numa-aware-scheduling_{context}"]
7+
= Disabling NUMA-aware scheduling
8+
9+
Disable the NUMA-aware scheduler, stopping all running scheduler pods and preventing new ones from starting.
10+
11+
.Procedure
12+
13+
. Save the following minimal YAML in the `nro-disable-scheduler.yaml` file. The manifest disables the scheduler by setting the `spec.replicas` field to `0`:
14+
+
15+
[source,yaml,subs="attributes+"]
16+
----
17+
apiVersion: nodetopology.openshift.io/v1
18+
kind: NUMAResourcesScheduler
19+
metadata:
20+
name: example-disable
21+
spec:
22+
imageSpec: 'registry.redhat.io/openshift4/noderesourcetopology-scheduler-rhel9:v{product-version}'
23+
replicas: 0
24+
----
25+
26+
. Disable the NUMA-aware pod scheduler by running the following command:
27+
+
28+
[source,terminal]
29+
----
30+
$ oc apply -f nro-disable-scheduler.yaml
31+
----
Lines changed: 33 additions & 0 deletions
@@ -0,0 +1,33 @@
1+
// Module included in the following assemblies:
2+
//
3+
// *scalability_and_performance/cnf-numa-aware-scheduling.adoc
4+
:_mod-docs-content-type: CONCEPT
5+
[id="cnf-managing-ha-nrop_{context}"]
6+
= Managing high availability (HA) for the NUMA-aware scheduler
7+
8+
--
9+
:FeatureName: Managing high availability
10+
include::snippets/technology-preview.adoc[]
11+
--
12+
13+
The NUMA Resources Operator manages the high availability of the NUMA-aware secondary scheduler based on the `spec.replicas` field in the `NUMAResourcesScheduler` custom resource (CR). By default, the NUMA Resources Operator automatically enables HA mode by creating one scheduler replica for each control plane node, with a maximum of three replicas.
14+
15+
The following manifest demonstrates this default behavior. To enable automatic replica detection, omit the `replicas` field.
16+
17+
[source,yaml,subs="attributes+"]
18+
----
19+
apiVersion: nodetopology.openshift.io/v1
20+
kind: NUMAResourcesScheduler
21+
metadata:
22+
name: example-auto-ha
23+
spec:
24+
imageSpec: 'registry.redhat.io/openshift4/noderesourcetopology-scheduler-rhel9:v{product-version}'
25+
# The 'replicas' field is not included, enabling auto-detection.
26+
----
27+
28+
You can control scheduler behavior by using one of the following options:
29+
30+
* Customizing the number of replicas.
31+
* Disabling NUMA-aware scheduling.
32+
33+
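
The default replica rule described in this module, one replica for each control plane node with a maximum of three, can be sketched as a small shell function. This is an illustration of the documented rule only, not the Operator's implementation. The `default_scheduler_replicas` function name is hypothetical, and the sketch assumes the control plane node count is the only input.

```shell
#!/bin/sh
# Hypothetical helper, not part of the NUMA Resources Operator.
# Prints the replica count expected when spec.replicas is omitted:
# one replica per control plane node, capped at three.
#   $1: number of control plane nodes
default_scheduler_replicas() {
  nodes="$1"
  max=3
  if [ "$nodes" -gt "$max" ]; then
    echo "$max"
  else
    echo "$nodes"
  fi
}
```

For example, `default_scheduler_replicas 5` prints `3`, and `default_scheduler_replicas 1` prints `1`.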
Lines changed: 71 additions & 0 deletions
@@ -0,0 +1,71 @@
1+
// Module included in the following assemblies:
2+
//
3+
// *scalability_and_performance/cnf-numa-aware-scheduling.adoc
4+
5+
:_mod-docs-content-type: PROCEDURE
6+
[id="verifying-scheduler-ha-status_{context}"]
7+
= Verifying scheduler high availability (HA) status
8+
9+
Verify the status of the NUMA-aware scheduler to ensure it is running with the expected number of replicas based on your configuration.
10+
11+
.Procedure
12+
13+
. List only the scheduler pods by running the following command:
14+
+
15+
[source,terminal]
16+
----
17+
$ oc get pods -n openshift-numaresources -l app=secondary-scheduler
18+
----
19+
+
20+
.Expected output
21+
22+
In the default HA mode, the number of pods equals the number of control plane nodes, up to a maximum of three. A standard HA {product-title} cluster typically has three control plane nodes and therefore displays three pods:
23+
+
24+
[source,terminal]
25+
----
26+
NAME READY STATUS RESTARTS AGE
27+
secondary-scheduler-5b8c9d479d-2r4p5 1/1 Running 0 5m
28+
secondary-scheduler-5b8c9d479d-k2f3p 1/1 Running 0 5m
29+
secondary-scheduler-5b8c9d479d-q8c7b 1/1 Running 0 5m
30+
----
31+
+
32+
* If you **customized the replicas**, the number of pods matches the value you set.
33+
* If you **disabled the scheduler**, there are no running pods with this label.
34+
+
35+
[NOTE]
36+
====
37+
A limit of 3 replicas is enforced for the NUMA-aware scheduler. On a hosted control planes cluster, the scheduler pods run on the worker nodes of the hosted cluster.
38+
====
39+
40+
. Verify the number of replicas and their status by running the following command:
41+
+
42+
[source,terminal]
43+
----
44+
$ oc get deployment secondary-scheduler -n openshift-numaresources
45+
----
46+
+
47+
.Example output
48+
+
49+
[source,terminal]
50+
----
51+
NAME READY UP-TO-DATE AVAILABLE AGE
52+
secondary-scheduler 3/3 3 3 5m
53+
----
54+
+
55+
In this output, `3/3` means that 3 of the 3 expected replicas are ready.
56+
57+
. For more detailed information, run the following command:
58+
+
59+
[source,terminal]
60+
----
61+
$ oc describe deployment secondary-scheduler -n openshift-numaresources
62+
----
63+
+
64+
.Example output
65+
66+
The `Replicas` line shows a deployment configured for 3 replicas, with all 3 updated and available.
67+
+
68+
[source,terminal]
69+
----
70+
Replicas: 3 desired | 3 updated | 3 total | 3 available | 0 unavailable
71+
----
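
The `READY` comparison in this procedure can be scripted. The following is a minimal sketch under stated assumptions: the `scheduler_is_ready` function name is hypothetical, and the input is assumed to be a `ready/desired` string such as the `3/3` value shown in the example output.

```shell
#!/bin/sh
# Hypothetical helper, not part of the product.
# Succeeds (exit status 0) only when a READY value such as "3/3"
# reports as many ready replicas as desired replicas.
#   $1: string in the form <ready>/<desired>
scheduler_is_ready() {
  ready="${1%%/*}"      # text before the slash
  desired="${1##*/}"    # text after the slash
  # An empty field, for example when no ready replicas are reported, counts as 0.
  [ "${ready:-0}" -eq "${desired:-0}" ]
}

# Example use (requires cluster access). The jsonpath expression is an assumption
# about which deployment fields to compare:
#   status="$(oc get deployment secondary-scheduler -n openshift-numaresources \
#     -o jsonpath='{.status.readyReplicas}/{.spec.replicas}')"
#   scheduler_is_ready "$status" && echo "all scheduler replicas are ready"
```

A disabled scheduler reports zero desired and zero ready replicas, so this check also succeeds in that case. It confirms only that the deployment matches its configured replica count.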

scalability_and_performance/cnf-numa-aware-scheduling.adoc

Lines changed: 8 additions & 0 deletions
@@ -40,6 +40,14 @@ include::modules/cnf-deploying-the-numa-aware-scheduler.adoc[leveloffset=+2]
40 40

41 41
include::modules/cnf-configuring-single-numa-policy.adoc[leveloffset=+2]
42 42

43+
include::modules/cnf-managing-ha-nrop-scheduler.adoc[leveloffset=+2]
44+
45+
include::modules/cnf-customizing-schedulder-ha-nro.adoc[leveloffset=+3]
46+
47+
include::modules/cnf-disabling-schedulder-ha-nro.adoc[leveloffset=+3]
48+
49+
include::modules/cnf-verifying-schedulder-ha-status.adoc[leveloffset=+3]
50+
43 51
[role="_additional-resources"]
44 52
.Additional resources
45 53
