feat(orchestrator): enable per-build CPU/memory tracking for build sandboxes #1968
@@ -179,7 +179,7 @@ func (so *SandboxObserver) startObserving() (metric.Registration, error) {
	return err
sandbox_type is an empty string for regular sandboxes: sbx.Runtime.SandboxType is only populated for build sandboxes (set in layer/create_sandbox.go and layer/resume_sandbox.go). For regular API-created sandboxes it remains "", so this emits sandbox_type="" into OTel metrics and therefore into sandbox_metrics_gauge. That bypasses the column's DEFAULT 'sandbox': the materialized view explicitly selects Attributes['sandbox_type'], so the default is never used.
This is inconsistent with hoststats.go, which falls back to SandboxTypeSandbox when runtime.SandboxType is empty. As a result, WHERE sandbox_type = 'sandbox' queries on sandbox_metrics_gauge will miss all regular sandbox rows.
Fix: apply the same fallback here, or set SandboxType: SandboxTypeSandbox in CreateSandbox/ResumeSandbox for the non-build path.
Fixed: added a fallback in the OTel observer that defaults empty SandboxType to sandbox.SandboxTypeSandbox before emitting the attribute, consistent with hoststats.go.
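A minimal sketch of the fallback described above, not the actual change in this PR. The helper name sandboxTypeOrDefault is hypothetical, and the constant value "sandbox" is assumed from the column default mentioned in the review; the real SandboxTypeSandbox is defined in the sandbox package.

```go
package main

import "fmt"

// SandboxTypeSandbox stands in for sandbox.SandboxTypeSandbox.
// Assumption: its value matches the ClickHouse column default 'sandbox'.
const SandboxTypeSandbox = "sandbox"

// sandboxTypeOrDefault is a hypothetical helper. It returns the type carried
// in the runtime metadata, or SandboxTypeSandbox when the field was never set
// (the case for regular API-created sandboxes).
func sandboxTypeOrDefault(t string) string {
	if t == "" {
		return SandboxTypeSandbox
	}
	return t
}

func main() {
	// Regular sandbox: SandboxType was left empty, so the fallback applies.
	fmt.Println(sandboxTypeOrDefault(""))
	// Build sandbox: the explicit value is kept unchanged.
	fmt.Println(sandboxTypeOrDefault("build"))
}
```

Applying the fallback at the point where the attribute is emitted keeps the OTel path and hoststats.go in agreement, so a filter on sandbox_type = 'sandbox' matches regular sandbox rows in both tables.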
bf294e8 to 6cfb7ed
…ndboxes

Add sandbox_type column to sandbox_host_stats and sandbox_metrics_gauge ClickHouse tables to explicitly distinguish build vs runtime metrics. Thread BuildID and SandboxType through RuntimeMetadata so build sandboxes are properly attributed to the overall build.

Key changes:
- Add cgroup setup to Factory.CreateSandbox (unified with ResumeSandbox)
- Initialize HostStatsCollector in CreateSandbox (feature-flag-gated)
- Enable envd metrics (MetricsWriteFlag) for CreateSandbox
- Add build_id + sandbox_type OTel attributes for sandbox_metrics_gauge
- Update materialized view to extract new attributes
- Pass TeamID, BuildID, SandboxType from build layer executors
6cfb7ed to ac31320
Summary
- Adds sandbox_type column (LowCardinality(String), default 'sandbox') to both sandbox_host_stats and sandbox_metrics_gauge ClickHouse tables to explicitly distinguish build vs runtime metrics
- Threads BuildID and SandboxType through RuntimeMetadata so build sandboxes are properly attributed to the overall build (not per-layer ephemeral IDs)
- Unifies cgroup setup across Factory.CreateSandbox and Factory.ResumeSandbox, so build sandboxes now get full cgroup v2 CPU/memory accounting
- Initializes HostStatsCollector in CreateSandbox (feature-flag-gated via HostStatsEnabled)
- Enables envd metrics for CreateSandbox via the MetricsWriteFlag feature flag
- Adds build_id and sandbox_type OTel attributes to the sandbox metrics observer and updates the materialized view to extract them

Files changed
New:
- packages/clickhouse/migrations/20260223120000_add_sandbox_type_to_host_stats.sql
- packages/clickhouse/migrations/20260223120001_add_build_columns_to_metrics_gauge.sql

Modified:
- packages/clickhouse/pkg/hoststats/hoststats.go: SandboxType field on SandboxHostStat
- packages/clickhouse/pkg/hoststats/delivery.go: sandbox_type in INSERT + Append
- packages/orchestrator/internal/sandbox/hoststats_collector.go: SandboxType on metadata + CollectSample
- packages/orchestrator/internal/sandbox/hoststats.go: reads SandboxType from RuntimeMetadata
- packages/orchestrator/internal/sandbox/fc/process.go: cgroupFD param on Process.Create()
- packages/orchestrator/internal/sandbox/sandbox.go: BuildID/SandboxType on RuntimeMetadata, cgroup + collector init in CreateSandbox, MetricsWriteFlag for CreateSandbox, updated ResumeSandbox
- packages/orchestrator/internal/metrics/sandboxes.go: build_id + sandbox_type OTel attributes
- packages/orchestrator/internal/template/build/layer/create_sandbox.go: passes TeamID/BuildID/SandboxType
- packages/orchestrator/internal/template/build/layer/resume_sandbox.go: passes TeamID/BuildID/SandboxType

Test plan
- sandbox_host_stats rows appear with sandbox_type = 'build' and the correct sandbox_build_id
- Regular sandbox rows report sandbox_type = 'sandbox' (or default)
- sandbox_metrics_gauge rows for build sandboxes have build_id and sandbox_type populated
- HostStatsEnabled and MetricsWriteFlag feature flags gate the new behavior correctly