158219: server: add vCPU count to diagnostics and status r=dhartunian a=dhartunian
Previously, the reported vCPU count was just the OS CPU count as reported by the runtime. This change adds a `GetVCPUs` function that is cgroup-aware and uses it to compute a new `num_vcpus` column on the node status protobuf. This new column is used to display stats on the overview page.
Resolves: CRDB-54703
Epic: None
Release note (ui change): Previously, we would incorrectly report operating system CPU counts on the DB Console overview page even though the column was labeled `vCPUs`. This change fixes the reporting to measure and report vCPUs correctly using cgroups. This should now reflect reserved compute in Kubernetes and other virtualized environments.
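As a rough illustration of what a cgroup-aware vCPU count involves, the Go sketch below derives a fractional CPU count from a cgroup v2 `cpu.max` file and falls back to the OS CPU count when no limit is set. The helper name `vcpusFromCPUMax`, the `/sys/fs/cgroup/cpu.max` path, and the cgroup-v2-only handling are assumptions made for this example. It is not the `GetVCPUs` implementation from this change, which may differ (for instance in how it handles cgroup v1).

```go
package main

import (
	"fmt"
	"os"
	"runtime"
	"strconv"
	"strings"
)

// vcpusFromCPUMax is a hypothetical helper, not CockroachDB's GetVCPUs.
// It interprets the contents of a cgroup v2 cpu.max file, which holds
// "<quota> <period>" in microseconds, or "max <period>" when no limit is set.
// It returns quota/period, or fallback when no usable limit is found.
func vcpusFromCPUMax(contents string, fallback int) float64 {
	fields := strings.Fields(contents)
	if len(fields) != 2 || fields[0] == "max" {
		return float64(fallback)
	}
	quota, err := strconv.ParseInt(fields[0], 10, 64)
	if err != nil || quota <= 0 {
		return float64(fallback)
	}
	period, err := strconv.ParseInt(fields[1], 10, 64)
	if err != nil || period <= 0 {
		return float64(fallback)
	}
	return float64(quota) / float64(period)
}

func main() {
	// The OS-level logical CPU count, used when no cgroup limit applies.
	hostCPUs := runtime.NumCPU()
	vcpus := float64(hostCPUs)

	// Assumed cgroup v2 mount point; the file is absent on other systems,
	// in which case the OS CPU count is kept.
	if data, err := os.ReadFile("/sys/fs/cgroup/cpu.max"); err == nil {
		vcpus = vcpusFromCPUMax(string(data), hostCPUs)
	}
	fmt.Printf("num_cpus=%d num_vcpus=%.2f\n", hostCPUs, vcpus)
}
```

With a quota of 150000 µs per 100000 µs period the helper returns 1.5. When the quota is `max`, or the file cannot be parsed, it returns the fallback count.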
158364: ui/jobs: show pause reasons in jobs UI as warnings r=dt a=dt
Previously we only showed 'error' for paused jobs, but pause reasons don't show up in error (at least not since 24.x).
This extends the UI to look for them in `running_status`, and to treat pause statuses with added advisory reasons as attention-worthy (yellow) warnings instead of normal grey pauses.
158388: cli/demo: start job scheduler right away r=dt a=dt
Previously we started it quickly, but only in the system tenant. Now we start it quickly in all tenants, and we also run it more often.
Release note: none.
Epic: none.
158518: cloud/azure: default enable caching SDK clients r=dt a=dt
This mirrors the default in s3 client handling.
Release note: none.
Epic: none.
Co-authored-by: David Hartunian <davidh@cockroachlabs.com>
Co-authored-by: David Taylor <davidt@davidt.io>
docs/generated/http/full.md (6 additions & 2 deletions)
@@ -349,7 +349,8 @@ NodeStatus records the most recent values of metrics for a node.
| latencies |[NodeStatus.LatenciesEntry](#cockroach.server.serverpb.NodesResponse-cockroach.server.status.statuspb.NodeStatus.LatenciesEntry)| repeated | latencies is a map of nodeIDs to nanoseconds which is the latency between this node and the other node.<br><br>NOTE: this is deprecated and is only set if the min supported cluster version is >= VersionRPCNetworkStats. |[reserved](#support-status)|
| activity |[NodeStatus.ActivityEntry](#cockroach.server.serverpb.NodesResponse-cockroach.server.status.statuspb.NodeStatus.ActivityEntry)| repeated | activity is a map of nodeIDs to network statistics from this node to other nodes. |[reserved](#support-status)|
| total_system_memory |[int64](#cockroach.server.serverpb.NodesResponse-int64)|| total_system_memory is the total RAM available to the system (or, if detected, the memory available to the cgroup this process is in) in bytes. |[alpha](#support-status)|
-| num_cpus |[int32](#cockroach.server.serverpb.NodesResponse-int32)|| num_cpus is the number of logical CPUs as reported by the operating system on the host where the `cockroach` process is running. Note that this does not report the number of CPUs actually used by `cockroach`; this parameter is controlled separately. |[alpha](#support-status)|
+| num_cpus |[int32](#cockroach.server.serverpb.NodesResponse-int32)|| num_cpus is the number of logical CPUs as reported by the operating system on the host where the `cockroach` process is running. This reflects the physical CPU count and does not account for container/cgroup limits. See num_vcpus for container-aware CPU allocation. |[alpha](#support-status)|
+| num_vcpus |[double](#cockroach.server.serverpb.NodesResponse-double)|| num_vcpus is the number of vCPUs allocated to the process by the container orchestrator (e.g., Kubernetes, Docker) based on cgroup CPU quota/period. This represents the platform CPU allocation and is independent of GOMAXPROCS runtime tuning. Falls back to num_cpus if no container limits are configured. Supports fractional values (e.g., 1.5 for Kubernetes CPU limits like "1500m"). |[alpha](#support-status)|
@@ -501,7 +502,8 @@ NodeStatus records the most recent values of metrics for a node.
| latencies |[NodeStatus.LatenciesEntry](#cockroach.server.status.statuspb.NodeStatus-cockroach.server.status.statuspb.NodeStatus.LatenciesEntry)| repeated | latencies is a map of nodeIDs to nanoseconds which is the latency between this node and the other node.<br><br>NOTE: this is deprecated and is only set if the min supported cluster version is >= VersionRPCNetworkStats. |[reserved](#support-status)|
| activity |[NodeStatus.ActivityEntry](#cockroach.server.status.statuspb.NodeStatus-cockroach.server.status.statuspb.NodeStatus.ActivityEntry)| repeated | activity is a map of nodeIDs to network statistics from this node to other nodes. |[reserved](#support-status)|
| total_system_memory |[int64](#cockroach.server.status.statuspb.NodeStatus-int64)|| total_system_memory is the total RAM available to the system (or, if detected, the memory available to the cgroup this process is in) in bytes. |[alpha](#support-status)|
-| num_cpus |[int32](#cockroach.server.status.statuspb.NodeStatus-int32)|| num_cpus is the number of logical CPUs as reported by the operating system on the host where the `cockroach` process is running. Note that this does not report the number of CPUs actually used by `cockroach`; this parameter is controlled separately. |[alpha](#support-status)|
+| num_cpus |[int32](#cockroach.server.status.statuspb.NodeStatus-int32)|| num_cpus is the number of logical CPUs as reported by the operating system on the host where the `cockroach` process is running. This reflects the physical CPU count and does not account for container/cgroup limits. See num_vcpus for container-aware CPU allocation. |[alpha](#support-status)|
+| num_vcpus |[double](#cockroach.server.status.statuspb.NodeStatus-double)|| num_vcpus is the number of vCPUs allocated to the process by the container orchestrator (e.g., Kubernetes, Docker) based on cgroup CPU quota/period. This represents the platform CPU allocation and is independent of GOMAXPROCS runtime tuning. Falls back to num_cpus if no container limits are configured. Supports fractional values (e.g., 1.5 for Kubernetes CPU limits like "1500m"). |[alpha](#support-status)|
@@ -656,6 +658,7 @@ NodeStatus records the most recent values of metrics for a node.
| activity |[NodeResponse.ActivityEntry](#cockroach.server.serverpb.NodesResponseExternal-cockroach.server.serverpb.NodeResponse.ActivityEntry)| repeated | activity is a map of nodeIDs to network statistics from this node to other nodes. |[reserved](#support-status)|
| total_system_memory |[int64](#cockroach.server.serverpb.NodesResponseExternal-int64)|| total_system_memory is the total RAM available to the system (or, if detected, the memory available to the cgroup this process is in) in bytes. |[alpha](#support-status)|
| num_cpus |[int32](#cockroach.server.serverpb.NodesResponseExternal-int32)|| num_cpus is the number of logical CPUs as reported by the operating system on the host where the `cockroach` process is running. Note that this does not report the number of CPUs actually used by `cockroach`; this parameter is controlled separately. |[alpha](#support-status)|
+| num_vcpus |[double](#cockroach.server.serverpb.NodesResponseExternal-double)|| num_vcpus is the number of provisioned vCPUs as reported by cgroups or the operating system. |[reserved](#support-status)|
@@ -914,6 +917,7 @@ NodeStatus records the most recent values of metrics for a node.
| activity |[NodeResponse.ActivityEntry](#cockroach.server.serverpb.NodeResponse-cockroach.server.serverpb.NodeResponse.ActivityEntry)| repeated | activity is a map of nodeIDs to network statistics from this node to other nodes. |[reserved](#support-status)|
| total_system_memory |[int64](#cockroach.server.serverpb.NodeResponse-int64)|| total_system_memory is the total RAM available to the system (or, if detected, the memory available to the cgroup this process is in) in bytes. |[alpha](#support-status)|
| num_cpus |[int32](#cockroach.server.serverpb.NodeResponse-int32)|| num_cpus is the number of logical CPUs as reported by the operating system on the host where the `cockroach` process is running. Note that this does not report the number of CPUs actually used by `cockroach`; this parameter is controlled separately. |[alpha](#support-status)|
+| num_vcpus |[double](#cockroach.server.serverpb.NodeResponse-double)|| num_vcpus is the number of provisioned vCPUs as reported by cgroups or the operating system. |[reserved](#support-status)|
docs/generated/http/nodes-other.md (2 additions & 1 deletion)
@@ -21,7 +21,8 @@ Support status: [alpha](#support-status)
| latencies |[NodeStatus.LatenciesEntry](#cockroach.server.status.statuspb.NodeStatus.LatenciesEntry)| repeated | latencies is a map of nodeIDs to nanoseconds which is the latency between this node and the other node.<br><br>NOTE: this is deprecated and is only set if the min supported cluster version is >= VersionRPCNetworkStats. |[reserved](#support-status)|
| activity |[NodeStatus.ActivityEntry](#cockroach.server.status.statuspb.NodeStatus.ActivityEntry)| repeated | activity is a map of nodeIDs to network statistics from this node to other nodes. |[reserved](#support-status)|
| total_system_memory |[int64](#int64)|| total_system_memory is the total RAM available to the system (or, if detected, the memory available to the cgroup this process is in) in bytes. |[alpha](#support-status)|
-| num_cpus |[int32](#int32)|| num_cpus is the number of logical CPUs as reported by the operating system on the host where the `cockroach` process is running. Note that this does not report the number of CPUs actually used by `cockroach`; this parameter is controlled separately. |[alpha](#support-status)|
+| num_cpus |[int32](#int32)|| num_cpus is the number of logical CPUs as reported by the operating system on the host where the `cockroach` process is running. This reflects the physical CPU count and does not account for container/cgroup limits. See num_vcpus for container-aware CPU allocation. |[alpha](#support-status)|
+| num_vcpus |[double](#double)|| num_vcpus is the number of vCPUs allocated to the process by the container orchestrator (e.g., Kubernetes, Docker) based on cgroup CPU quota/period. This represents the platform CPU allocation and is independent of GOMAXPROCS runtime tuning. Falls back to num_cpus if no container limits are configured. Supports fractional values (e.g., 1.5 for Kubernetes CPU limits like "1500m"). |[alpha](#support-status)|