
Commit 7910f08

Merge pull request #97281 from dfitzmau/OSDOCS-15459
OSDOCS-15459: Completed CQA actions for Using the vSphere Problem Det…
2 parents 20fe050 + e594e0e commit 7910f08

8 files changed: +57, -66 lines changed

installing/installing_vsphere/using-vsphere-problem-detector-operator.adoc

Lines changed: 7 additions & 0 deletions
@@ -6,6 +6,10 @@ include::_attributes/common-attributes.adoc[]
66

77
toc::[]
88

9+
:operator-name: vSphere Problem Detector Operator
10+
11+
You can use the {operator-name} to check a cluster that you deployed on {vmw-full} for common installation and misconfiguration issues that relate to storage.
12+
913
// About the operator
1014
include::modules/vsphere-problem-detector-about.adoc[leveloffset=+1]
1115

@@ -31,3 +35,6 @@ include::modules/vsphere-problem-detector-metrics.adoc[leveloffset=+1]
3135
== Additional resources
3236

3337
* xref:../../observability/monitoring/about-ocp-monitoring/about-ocp-monitoring.adoc#about-ocp-monitoring[About {product-title} monitoring]
38+
39+
// Clear temporary attributes
40+
:!operator-name:

modules/vsphere-problem-detector-about.adoc

Lines changed: 3 additions & 7 deletions
@@ -2,15 +2,13 @@
22
//
33
// * installing/installing_vsphere/using-vsphere-problem-detector-operator.adoc
44

5-
:operator-name: vSphere Problem Detector Operator
6-
75
:_mod-docs-content-type: CONCEPT
86
[id="vsphere-problem-detector-about_{context}"]
97
= About the {operator-name}
108

11-
The {operator-name} checks clusters that are deployed on vSphere for common installation and misconfiguration issues that are related to storage.
9+
The {operator-name} checks a cluster that you deployed on {vmw-full} for common installation and configuration issues that relate to storage.
1210

13-
The Operator runs in the `openshift-cluster-storage-operator` namespace and is started by the Cluster Storage Operator when the Cluster Storage Operator detects that the cluster is deployed on vSphere. The {operator-name} communicates with the vSphere vCenter Server to determine the virtual machines in the cluster, the default datastore, and other information about the vSphere vCenter Server configuration. The Operator uses the credentials from the Cloud Credential Operator to connect to vSphere.
11+
After the Cluster Storage Operator starts and determines that a cluster runs on {vmw-full}, the Cluster Storage Operator launches the {operator-name}. When the {operator-name} starts, the Operator immediately runs the checks. The {operator-name} communicates with the {vmw-short} vCenter Server to find the virtual machines in the cluster, the default datastore, and other information about the {vmw-short} vCenter Server configuration. The Operator uses the credentials from the Cloud Credential Operator to connect to {vmw-short}.
1412

1513
The Operator runs the checks according to the following schedule:
1614

@@ -20,7 +18,5 @@ The Operator runs the checks according to the following schedule:
2018
2119
* When all checks pass, the schedule returns to an hour interval.
2220
23-
The Operator increases the frequency of the checks after a failure so that the Operator can report success quickly after the failure condition is remedied. You can run the Operator manually for immediate troubleshooting information.
21+
After a failure, the Operator increases its check frequency so that it can report success soon after the failure condition is resolved. You can run the Operator manually for immediate troubleshooting information.
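The retry behavior can be pictured as a simple backoff loop. The following Python sketch is illustrative only: the initial retry interval, the doubling factor, and the cap are hypothetical values, because this section states only that the checks run every hour and run more often after a failure.

```python
from datetime import timedelta
from typing import List, Optional, Tuple

# Stated in the text: checks run every hour while everything passes.
NORMAL_INTERVAL = timedelta(hours=1)
# Hypothetical values for illustration only; not taken from the Operator.
INITIAL_RETRY = timedelta(minutes=1)
MAX_RETRY = timedelta(hours=1)


def next_wait(
    all_passed: bool, retry: Optional[timedelta]
) -> Tuple[timedelta, Optional[timedelta]]:
    """Return (time until the next run, retry state to carry forward)."""
    if all_passed:
        # Success resets the schedule to the hourly interval.
        return NORMAL_INTERVAL, None
    # The first failure starts at INITIAL_RETRY; later failures double the
    # wait but never exceed MAX_RETRY.
    retry = INITIAL_RETRY if retry is None else min(retry * 2, MAX_RETRY)
    return retry, retry


def simulate(results: List[bool]) -> List[timedelta]:
    """Compute the wait after each run for a sequence of pass/fail results."""
    waits: List[timedelta] = []
    retry: Optional[timedelta] = None
    for passed in results:
        wait, retry = next_wait(passed, retry)
        waits.append(wait)
    return waits


if __name__ == "__main__":
    for wait in simulate([True, False, False, False, True]):
        print(wait)
```

A shorter interval right after a failure means that a fix is confirmed quickly, and returning to the hourly interval on success keeps the steady-state load low.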
2422

25-
// Clear temporary attributes
26-
:!operator-name:

modules/vsphere-problem-detector-config-checks.adoc

Lines changed: 13 additions & 16 deletions
@@ -2,8 +2,7 @@
22
//
33
// * installing/installing_vsphere/using-vsphere-problem-detector-operator.adoc
44

5-
:operator-name: vSphere Problem Detector Operator
6-
5+
:_mod-docs-content-type: REFERENCE
76
[id="vsphere-problem-detector-config-checks_{context}"]
87
= Configuration checks run by the {operator-name}
98

@@ -16,33 +15,33 @@ The following tables identify the configuration checks that the {operator-name}
1615
|Description
1716

1817
|`CheckDefaultDatastore`
19-
|Verifies that the default datastore name in the vSphere configuration is short enough for use with dynamic provisioning.
18+
|Verifies that the default datastore name in the {vmw-full} configuration is short enough for use with dynamic provisioning.
2019

2120
If this check fails, you can expect the following:
2221

2322
* `systemd` logs errors to the journal such as `Failed to set up mount unit: Invalid argument`.
2423
25-
* `systemd` does not unmount volumes if the virtual machine is shut down or rebooted without draining all the pods from the node.
24+
* `systemd` does not unmount volumes if the virtual machine shuts down or reboots without draining all the pods from the node.
2625
27-
If this check fails, reconfigure vSphere with a shorter name for the default datastore.
26+
If this check fails, reconfigure {vmw-short} with a shorter name for the default datastore.
2827

2928
|`CheckFolderPermissions`
30-
|Verifies the permission to list volumes in the default datastore. This permission is required to create volumes. The Operator verifies the permission by listing the `/` and `/kubevols` directories. The root directory must exist. It is acceptable if the `/kubevols` directory does not exist when the check runs. The `/kubevols` directory is created when the datastore is used with dynamic provisioning if the directory does not already exist.
29+
|Verifies the permission to list volumes in the default datastore. Creating volumes requires this permission. The Operator verifies the permission by listing the `/` and `/kubevols` directories. When the Operator performs the check, the root directory must exist. The `/kubevols` directory might not exist at the time of the check. If the `/kubevols` directory does not exist, it is created when the datastore is used with dynamic provisioning.
3130

32-
If this check fails, review the required permissions for the vCenter account that was specified during the {product-title} installation.
31+
If this check fails, review the required permissions for the vCenter account that you specified during the {product-title} installation.
3332

3433
|`CheckStorageClasses`
3534
|Verifies the following:
3635

37-
* The fully qualified path to each persistent volume that is provisioned by this storage class is less than 255 characters.
36+
* The fully qualified path to each persistent volume that the storage class provisions is less than 255 characters.
3837
39-
* If a storage class uses a storage policy, the storage class must use one policy only and that policy must be defined.
38+
* If the storage class uses a storage policy, the storage class uses only one policy and that policy is defined.
4039
4140
|`CheckTaskPermissions`
4241
|Verifies the permission to list recent tasks and datastores.
4342

4443
|`ClusterInfo`
45-
|Collects the cluster version and UUID from vSphere vCenter.
44+
|Collects the cluster version and UUID from {vmw-short} vCenter.
4645
|===
4746

4847
.Node configuration checks
@@ -52,19 +51,19 @@ If this check fails, review the required permissions for the vCenter account tha
5251
|Description
5352

5453
|`CheckNodeDiskUUID`
55-
|Verifies that all the vSphere virtual machines are configured with `disk.enableUUID=TRUE`.
54+
|Verifies that all the {vmw-short} virtual machines include the `disk.enableUUID=TRUE` configuration.
5655

57-
If this check fails, see the link:https://access.redhat.com/solutions/4606201[How to check 'disk.EnableUUID' parameter from VM in vSphere] Red Hat Knowledgebase solution.
56+
If this check fails, see the link:https://access.redhat.com/solutions/4606201[How to check `disk.EnableUUID` parameter from VM in vSphere] Red Hat Knowledgebase solution.
5857

5958
|`CheckNodeProviderID`
60-
|Verifies that all nodes are configured with the `ProviderID` from vSphere vCenter. This check fails when the output from the following command does not include a provider ID for each node.
59+
|Verifies that all nodes have the `ProviderID` configuration from {vmw-short} vCenter. This check fails when the output from the following command does not include a provider ID for each node.
6160

6261
[source,terminal]
6362
----
6463
$ oc get nodes -o custom-columns=NAME:.metadata.name,PROVIDER_ID:.spec.providerID,UUID:.status.nodeInfo.systemUUID
6564
----
6665

67-
If this check fails, refer to the vSphere product documentation for information about setting the provider ID for each node in the cluster.
66+
If this check fails, refer to the {vmw-short} product documentation for information about setting the provider ID for each node in the cluster.
6867

6968
|`CollectNodeESXiVersion`
7069
|Reports the version of the ESXi hosts that run nodes.
@@ -73,5 +72,3 @@ If this check fails, refer to the vSphere product documentation for information
7372
|Reports the virtual machine hardware version for a node.
7473
|===
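The `CheckNodeProviderID` condition can be reproduced offline against saved output from the `oc get nodes -o custom-columns=...` command in the preceding table. The following Python sketch assumes that `oc` prints `<none>` for an empty `PROVIDER_ID` column; the function name and any sample node names are hypothetical.

```python
from typing import List


def nodes_missing_provider_id(output: str) -> List[str]:
    """Return node names whose PROVIDER_ID column is empty or <none>.

    `output` is the text printed by:
    oc get nodes -o custom-columns=NAME:.metadata.name,PROVIDER_ID:.spec.providerID,UUID:.status.nodeInfo.systemUUID
    """
    lines = [line for line in output.splitlines() if line.strip()]
    if not lines:
        return []
    # Locate the columns by header name instead of by fixed position.
    header = lines[0].split()
    name_col = header.index("NAME")
    pid_col = header.index("PROVIDER_ID")
    missing = []
    for line in lines[1:]:
        cols = line.split()
        provider_id = cols[pid_col] if len(cols) > pid_col else ""
        if provider_id in ("", "<none>"):
            missing.append(cols[name_col])
    return missing
```

For example, passing output in which `worker-1` shows `<none>` in the `PROVIDER_ID` column returns `["worker-1"]`.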
7574

76-
// Clear temporary attributes
77-
:!operator-name:

modules/vsphere-problem-detector-metrics.adoc

Lines changed: 5 additions & 8 deletions
@@ -2,8 +2,7 @@
22
//
33
// * installing/installing_vsphere/using-vsphere-problem-detector-operator.adoc
44

5-
:operator-name: vSphere Problem Detector Operator
6-
5+
:_mod-docs-content-type: REFERENCE
76
[id="vsphere-problem-detector-operator-metrics_{context}"]
87
= Metrics for the {operator-name}
98

@@ -21,20 +20,18 @@ The {operator-name} exposes the following metrics for use by the {product-title}
2120
|Number of failed cluster-level checks that the {operator-name} performed. For example, a value of `1` indicates that one cluster-level check failed.
2221

2322
|`vsphere_esxi_version_total`
24-
|Number of ESXi hosts with a specific version. Be aware that if a host runs more than one node, the host is counted only once.
23+
|Counts the number of ESXi hosts with a specific version. Note that if a host runs more than one node, the {operator-name} counts the host only once.
2524

2625
|`vsphere_node_check_total`
2726
|Cumulative number of node-level checks that the {operator-name} performed. This count includes both successes and failures.
2827

2928
|`vsphere_node_check_errors`
30-
|Number of failed node-level checks that the {operator-name} performed. For example, a value of `1` indicates that one node-level check failed.
29+
|Counts the number of failed node-level checks that the {operator-name} performed. For example, a value of `1` indicates that one node-level check failed.
3130

3231
|`vsphere_node_hw_version_total`
33-
|Number of vSphere nodes with a specific hardware version.
32+
|Number of {vmw-short} nodes with a specific hardware version.
3433

3534
|`vsphere_vcenter_info`
36-
|Information about the vSphere vCenter Server.
35+
|Information about the {vmw-short} vCenter Server.
3736
|===
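To read the node-level counters outside of the web console, you can parse the Prometheus text exposition format. The following Python sketch sums `vsphere_node_check_total` and `vsphere_node_check_errors` across all label sets. The label names that a sample might carry (for example `check` or `node`) are assumptions, and the simple regular expression does not handle label values that contain `}`.

```python
import re
from typing import Dict

# Matches "metric_name{labels} value" or "metric_name value".
_SAMPLE = re.compile(
    r"^(?P<name>[a-zA-Z_:][a-zA-Z0-9_:]*)(?:\{[^}]*\})?\s+(?P<value>\S+)"
)
NODE_METRICS = ("vsphere_node_check_total", "vsphere_node_check_errors")


def summarize_node_checks(text: str) -> Dict[str, float]:
    """Sum the node-level check counters found in Prometheus text format."""
    sums = {name: 0.0 for name in NODE_METRICS}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and HELP/TYPE comments
        match = _SAMPLE.match(line)
        if match and match["name"] in sums:
            sums[match["name"]] += float(match["value"])
    return sums
```

Because `vsphere_node_check_total` includes both successes and failures, comparing it with `vsphere_node_check_errors` gives a rough failure ratio.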
3837

39-
// Clear temporary attributes
40-
:!operator-name:

modules/vsphere-problem-detector-running.adoc

Lines changed: 6 additions & 6 deletions
@@ -2,15 +2,13 @@
22
//
33
// * installing/installing_vsphere/using-vsphere-problem-detector-operator.adoc
44

5-
:operator-name: vSphere Problem Detector Operator
6-
75
:_mod-docs-content-type: PROCEDURE
86
[id="vsphere-problem-detector-running_{context}"]
97
= Running the {operator-name} checks
108

119
You can override the schedule for running the {operator-name} checks and run the checks immediately.
1210

13-
The {operator-name} automatically runs the checks every hour. However, when the Operator starts, it runs the checks immediately. The Operator is started by the Cluster Storage Operator when the Cluster Storage Operator starts and determines that the cluster is running on vSphere. To run the checks immediately, you can scale the {operator-name} to `0` and back to `1` so that it restarts the {operator-name}.
11+
The {operator-name} automatically runs the checks every hour. When the Operator starts, it runs the checks immediately. After the Cluster Storage Operator starts and determines that a cluster runs on {vmw-full}, the Cluster Storage Operator starts the {operator-name}. To run the checks immediately, you can scale the {operator-name} to `0` and back to `1` so that the {operator-name} restarts.
1412

1513
.Prerequisites
1614

@@ -42,7 +40,9 @@ NAME READY STATUS RESTARTS
4240
vsphere-problem-detector-operator-77486bd645-9ntpb 1/1 Running 0 11s
4341
----
4442
+
45-
The `AGE` field must indicate that the pod is restarted.
43+
The `AGE` field must indicate that the pod restarted.
44+
45+
.Next steps
4646

47-
// Clear temporary attributes
48-
:!operator-name:
47+
* Viewing the events from the {operator-name}
48+
* Viewing the logs from the {operator-name}
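To confirm the restart from saved `oc get pods` output, you can compare the pod's `AGE` value against a threshold. The following Python sketch assumes that `AGE` is the last column, as in the example output of the running procedure, and uses a hypothetical two-minute threshold. The age parser covers the `s`, `m`, `h`, `d`, and `y` units that `oc` prints.

```python
import re

_UNIT_SECONDS = {"s": 1, "m": 60, "h": 3600, "d": 86400, "y": 365 * 86400}


def age_to_seconds(age: str) -> int:
    """Convert an AGE value such as '11s', '5m30s', or '2d3h' to seconds."""
    if not re.fullmatch(r"(\d+[smhdy])+", age):
        raise ValueError(f"unrecognized AGE value: {age!r}")
    return sum(
        int(amount) * _UNIT_SECONDS[unit]
        for amount, unit in re.findall(r"(\d+)([smhdy])", age)
    )


def restarted_recently(
    pods_output: str,
    prefix: str = "vsphere-problem-detector-operator",
    max_age_seconds: int = 120,  # hypothetical threshold
) -> bool:
    """True if the first pod whose name starts with `prefix` is younger than the threshold."""
    for line in pods_output.splitlines()[1:]:  # skip the header row
        cols = line.split()
        # AGE is the last column; RESTARTS can contain spaces, e.g. "1 (5m ago)".
        if cols and cols[0].startswith(prefix):
            return age_to_seconds(cols[-1]) < max_age_seconds
    return False
```

With the example pod line, which shows an `AGE` of `11s`, `restarted_recently` returns `True`.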

modules/vsphere-problem-detector-storage-class-config-check.adoc

Lines changed: 5 additions & 12 deletions
@@ -2,17 +2,13 @@
22
//
33
// * installing/installing_vsphere/using-vsphere-problem-detector-operator.adoc
44

5-
:operator-name: vSphere Problem Detector Operator
6-
75
:_mod-docs-content-type: CONCEPT
86
[id="vsphere-problem-detector-storage-class-config-check_{context}"]
97
= About the storage class configuration check
108

11-
The names for persistent volumes that use vSphere storage are related to the datastore name and cluster ID.
12-
13-
When a persistent volume is created, `systemd` creates a mount unit for the persistent volume. The `systemd` process has a 255 character limit for the length of the fully qualified path to the VDMK file that is used for the persistent volume.
9+
The names for persistent volumes that use {vmw-full} storage relate to the datastore name and cluster ID. When a persistent volume is created, `systemd` creates a mount unit for the persistent volume.
1410

15-
The fully qualified path is based on the naming conventions for `systemd` and vSphere. The naming conventions use the following pattern:
11+
The `systemd` process has a 255-character limit for the length of the fully qualified path to the virtual machine disk (VMDK) file. This path follows the naming conventions for `systemd` and {vmw-short}. The naming conventions use the following example pattern:
1612

1713
[source,text]
1814
----
@@ -21,11 +17,8 @@ The fully qualified path is based on the naming conventions for `systemd` and vS
2117

2218
* The naming conventions require 205 characters of the 255-character limit.
2319
24-
* The datastore name and the cluster ID are determined from the deployment.
25-
26-
* The datastore name and cluster ID are substituted into the preceding pattern. Then the path is processed with the `systemd-escape` command to escape special characters. For example, a hyphen character uses four characters after it is escaped. The escaped value is `\x2d`.
20+
* The deployment determines the datastore name and the cluster ID.
2721
28-
* After processing with `systemd-escape` to ensure that `systemd` can access the fully qualified path to the VDMK file, the length of the path must be less than 255 characters.
22+
* The datastore name and cluster ID are substituted into the example pattern. The fully qualified path is then processed with the `systemd-escape` command to escape special characters. For example, after escaping, a hyphen character uses four characters: `\x2d`.
2923
30-
// Clear temporary attributes
31-
:!operator-name:
24+
* After the `systemd-escape` command processes the VMDK file path, the length of the path must be less than 255 characters. This requirement ensures that the `systemd` process can access the fully qualified VMDK file path.
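To check a candidate path before you create persistent volumes, you can approximate `systemd-escape --path` locally. The following Python sketch follows the systemd unit-name escaping rules: `/` becomes `-`; ASCII letters, digits, `:`, `_`, and `.` stay unchanged (except a leading `.`); and every other byte, including `-`, becomes `\xNN`. It is a sketch, not a replacement for the real command. It does not normalize `.` or `..` path segments, and the 255-character comparison mirrors the limit stated in this section.

```python
import string

# Characters that systemd keeps unescaped in unit names. "-" and "\" are
# always escaped, and a leading "." is escaped as well.
_SAFE = frozenset(string.ascii_letters + string.digits + ":_.")
PATH_LIMIT = 255  # limit stated in this section


def systemd_escape_path(path: str) -> str:
    """Approximate `systemd-escape --path PATH` for illustration."""
    # Drop empty segments so duplicate, leading, and trailing slashes vanish.
    segments = [seg for seg in path.split("/") if seg]
    if not segments:
        return "-"  # systemd escapes the root path "/" as "-"
    raw = "/".join(segments).encode("utf-8")
    out = []
    for index, byte in enumerate(raw):
        char = chr(byte)
        if char == "/":
            out.append("-")
        elif char in _SAFE and not (index == 0 and char == "."):
            out.append(char)
        else:
            out.append("\\x%02x" % byte)  # for example, "-" becomes \x2d
    return "".join(out)


def path_fits(path: str, limit: int = PATH_LIMIT) -> bool:
    """True if the escaped path is shorter than `limit` characters."""
    return len(systemd_escape_path(path)) < limit


if __name__ == "__main__":
    print(systemd_escape_path("/var/lib/foo-bar"))  # var-lib-foo\x2dbar
```

Because each hyphen expands to four characters, a datastore name with two hyphens uses eight characters of the budget after escaping instead of two.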

modules/vsphere-problem-detector-viewing-events.adoc

Lines changed: 6 additions & 7 deletions
@@ -2,17 +2,19 @@
22
//
33
// * installing/installing_vsphere/using-vsphere-problem-detector-operator.adoc
44

5-
:operator-name: vSphere Problem Detector Operator
6-
75
:_mod-docs-content-type: PROCEDURE
86
[id="vsphere-problem-detector-viewing-events_{context}"]
97
= Viewing the events from the {operator-name}
108

11-
After the {operator-name} runs and performs the configuration checks, it creates events that can be viewed from the command line or from the {product-title} web console.
9+
After the {operator-name} runs and performs the configuration checks, the Operator creates events that you can view from the command-line interface (CLI) or from the {product-title} web console.
10+
11+
.Prerequisites
12+
13+
* The {operator-name} ran checks on your cluster.
1214
1315
.Procedure
1416

15-
* To view the events by using the command line, run the following command:
17+
* To view the events by using the CLI, run the following command:
1618
+
1719
[source,terminal]
1820
----
@@ -29,6 +31,3 @@ $ oc get event -n openshift-cluster-storage-operator \
2931
----
3032
3133
* To view the events by using the {product-title} web console, navigate to *Home* -> *Events* and select `openshift-cluster-storage-operator` from the *Project* menu.
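If you save the events as JSON, for example with `oc get event -n openshift-cluster-storage-operator -o json`, you can filter and order them offline. The following Python sketch reads standard Kubernetes Event fields (`involvedObject`, `reason`, `message`, `lastTimestamp`, `metadata.creationTimestamp`). Filtering on an `involvedObject.name` prefix of `vsphere-problem-detector` is an assumption about which object the Operator attaches its events to.

```python
import json
from typing import List, Tuple


def detector_events(
    events_json: str, name_prefix: str = "vsphere-problem-detector"
) -> List[Tuple[str, str, str]]:
    """Return (timestamp, reason, message) tuples, oldest first."""
    items = json.loads(events_json).get("items", [])
    rows = []
    for event in items:
        obj_name = event.get("involvedObject", {}).get("name", "")
        if not obj_name.startswith(name_prefix):  # assumed filter
            continue
        # lastTimestamp can be empty; fall back to the creation time.
        stamp = event.get("lastTimestamp") or event.get("metadata", {}).get(
            "creationTimestamp", ""
        )
        rows.append((stamp, event.get("reason", ""), event.get("message", "")))
    # RFC 3339 timestamps in the same UTC format sort correctly as strings.
    rows.sort(key=lambda row: row[0])
    return rows
```

If the Operator attaches its events to a differently named object, pass that name as `name_prefix`.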
32-
33-
// Clear temporary attributes
34-
:!operator-name:

modules/vsphere-problem-detector-viewing-logs.adoc

Lines changed: 12 additions & 10 deletions
@@ -2,17 +2,21 @@
22
//
33
// * installing/installing_vsphere/using-vsphere-problem-detector-operator.adoc
44

5-
:operator-name: vSphere Problem Detector Operator
6-
75
:_mod-docs-content-type: PROCEDURE
86
[id="vsphere-problem-detector-viewing-logs_{context}"]
97
= Viewing the logs from the {operator-name}
108

11-
After the {operator-name} runs and performs the configuration checks, it creates log records that can be viewed from the command line or from the {product-title} web console.
9+
After the {operator-name} runs and performs the configuration checks, the Operator creates log records that you can view from the command-line interface (CLI) or from the {product-title} web console. A log line that indicates `passed` means that you do not need to perform any action.
10+
11+
The ideal output for a log line indicates `passed` or `0 problems`. If a log line indicates `failure` or one or more problems, see the information in the "Configuration checks run by the {operator-name}" section.
12+
13+
.Prerequisites
14+
15+
* The {operator-name} ran checks on your cluster.
1216
1317
.Procedure
1418

15-
* To view the logs by using the command line, run the following command:
19+
* To view the logs by using the CLI, run the following command. If a log line in the output does not indicate `passed` or `0 problems`, analyze the log output and resolve the issue.
1620
+
1721
[source,terminal]
1822
----
@@ -32,14 +36,12 @@ I0108 08:32:28.480685 1 operator.go:271] CheckNodeProviderID:<host_name> p
3236
----
3337
3438
* To view the Operator logs with the {product-title} web console, perform the following steps:
35-
39+
+
3640
.. Navigate to *Workloads* -> *Pods*.
37-
41+
+
3842
.. Select `openshift-cluster-storage-operator` from the *Projects* menu.
39-
43+
+
4044
.. Click the link for the `vsphere-problem-detector-operator` pod.
41-
45+
+
4246
.. Click the *Logs* tab on the *Pod details* page to view the logs.
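To triage saved log output, you can classify each line by its result text. The following Python sketch assumes the klog line format shown in the example output (`I0108 08:32:28.480685 1 operator.go:271] CheckNodeProviderID:<host_name> passed`). It treats `passed` or `0 problems` as success and any mention of a failure or a nonzero problem count as needing attention. The exact wording of failure messages is an assumption.

```python
import re
from typing import Optional

# klog header: severity + MMDD, time, thread ID, file:line], then the message.
_KLOG = re.compile(r"^[IWEF]\d{4}\s+\S+\s+\d+\s+\S+\]\s(?P<msg>.*)$")


def classify_log_line(line: str) -> Optional[str]:
    """Return 'passed', 'attention', 'other', or None for non-klog lines."""
    match = _KLOG.match(line.strip())
    if not match:
        return None
    msg = match["msg"]
    # Failure wording is checked first, so a line that mentions both is flagged.
    if re.search(r"\bfail", msg, re.IGNORECASE) or re.search(
        r"\b[1-9]\d* problems?\b", msg
    ):
        return "attention"
    if msg.endswith("passed") or re.search(r"\b0 problems\b", msg):
        return "passed"
    return "other"
```

Lines classified as `attention` are the ones to compare against the configuration checks tables.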
4347
44-
// Clear temporary attributes
45-
:!operator-name:
