feat(orchestrator): enable per-build CPU/memory tracking for build sandboxes#1968

Draft
dobrac wants to merge 1 commit into main from feat/per-build-cpu-tracking

@dobrac dobrac commented Feb 23, 2026

Summary

  • Adds sandbox_type column (LowCardinality(String), default 'sandbox') to both sandbox_host_stats and sandbox_metrics_gauge ClickHouse tables to explicitly distinguish build vs runtime metrics
  • Threads BuildID and SandboxType through RuntimeMetadata so build sandboxes are properly attributed to the overall build (not per-layer ephemeral IDs)
  • Unifies cgroup setup across Factory.CreateSandbox and Factory.ResumeSandbox — build sandboxes now get full cgroup v2 CPU/memory accounting
  • Initializes HostStatsCollector in CreateSandbox (feature-flag-gated via HostStatsEnabled)
  • Enables envd metrics collection for CreateSandbox via MetricsWriteFlag feature flag
  • Adds build_id and sandbox_type OTel attributes to the sandbox metrics observer and updates the materialized view to extract them
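
To illustrate why attributing samples to the overall build matters, here is a minimal sketch, not the PR's code. The `cpuSample` type, its fields, and `cpuByBuild` are hypothetical names chosen for illustration. It assumes each layer of a build runs in its own short-lived sandbox and that every sample carries the shared build ID.

```go
package main

import "fmt"

// cpuSample is a hypothetical, simplified stand-in for one host stats row.
type cpuSample struct {
	SandboxID string // per-layer ephemeral sandbox ID
	BuildID   string // ID of the overall template build
	CPUMillis int64  // CPU time consumed in this sample, in milliseconds
}

// cpuByBuild sums CPU time per build ID, regardless of which
// layer sandbox produced each sample.
func cpuByBuild(samples []cpuSample) map[string]int64 {
	totals := make(map[string]int64)
	for _, s := range samples {
		totals[s.BuildID] += s.CPUMillis
	}
	return totals
}

func main() {
	samples := []cpuSample{
		{SandboxID: "layer-sbx-1", BuildID: "build-a", CPUMillis: 1200},
		{SandboxID: "layer-sbx-2", BuildID: "build-a", CPUMillis: 800},
		{SandboxID: "layer-sbx-3", BuildID: "build-b", CPUMillis: 500},
	}
	for id, total := range cpuByBuild(samples) {
		fmt.Printf("%s: %d ms\n", id, total)
	}
}
```

Grouping by the sandbox ID instead would split one build's usage across as many keys as it has layers, which is presumably what the build ID on RuntimeMetadata is meant to avoid.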

Files changed

New:

  • packages/clickhouse/migrations/20260223120000_add_sandbox_type_to_host_stats.sql
  • packages/clickhouse/migrations/20260223120001_add_build_columns_to_metrics_gauge.sql

Modified:

  • packages/clickhouse/pkg/hoststats/hoststats.go: SandboxType field on SandboxHostStat
  • packages/clickhouse/pkg/hoststats/delivery.go: sandbox_type in INSERT + Append
  • packages/orchestrator/internal/sandbox/hoststats_collector.go: SandboxType on metadata + CollectSample
  • packages/orchestrator/internal/sandbox/hoststats.go: reads SandboxType from RuntimeMetadata
  • packages/orchestrator/internal/sandbox/fc/process.go: cgroupFD param on Process.Create()
  • packages/orchestrator/internal/sandbox/sandbox.go: BuildID/SandboxType on RuntimeMetadata, cgroup + collector init in CreateSandbox, MetricsWriteFlag for CreateSandbox, updated ResumeSandbox
  • packages/orchestrator/internal/metrics/sandboxes.go: build_id + sandbox_type OTel attributes
  • packages/orchestrator/internal/template/build/layer/create_sandbox.go: passes TeamID/BuildID/SandboxType
  • packages/orchestrator/internal/template/build/layer/resume_sandbox.go: passes TeamID/BuildID/SandboxType

Test plan

  • Verify ClickHouse migrations apply cleanly (staging)
  • Trigger a template build and confirm sandbox_host_stats rows appear with sandbox_type = 'build' and the correct sandbox_build_id
  • Confirm regular sandbox metrics still have sandbox_type = 'sandbox' (or default)
  • Verify sandbox_metrics_gauge rows for build sandboxes have build_id and sandbox_type populated
  • Check that the HostStatsEnabled and MetricsWriteFlag feature flags gate the new behavior correctly
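
The row-level checks in the plan above could be expressed roughly as follows. This is a sketch under assumptions: `gaugeRow` and `checkRow` are hypothetical names, and the rows are assumed to have already been fetched from sandbox_metrics_gauge by whatever client the team uses.

```go
package main

import (
	"errors"
	"fmt"
)

// gaugeRow is a hypothetical, trimmed-down view of one sandbox_metrics_gauge row.
type gaugeRow struct {
	SandboxType string
	BuildID     string
}

// checkRow validates the invariants from the test plan:
// build rows carry a build ID, and the type is never empty or unknown.
func checkRow(r gaugeRow) error {
	switch r.SandboxType {
	case "build":
		if r.BuildID == "" {
			return errors.New("build row is missing build_id")
		}
		return nil
	case "sandbox":
		return nil
	default:
		return fmt.Errorf("unexpected sandbox_type %q", r.SandboxType)
	}
}

func main() {
	rows := []gaugeRow{
		{SandboxType: "build", BuildID: "build-a"},
		{SandboxType: "sandbox"},
		{SandboxType: ""}, // would indicate the column default was bypassed
	}
	for i, r := range rows {
		if err := checkRow(r); err != nil {
			fmt.Printf("row %d: %v\n", i, err)
		}
	}
}
```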

@@ -179,7 +179,7 @@ func (so *SandboxObserver) startObserving() (metric.Registration, error) {
return err

sandbox_type is an empty string for regular sandboxes: sbx.Runtime.SandboxType is only populated for build sandboxes (set in layer/create_sandbox.go and layer/resume_sandbox.go). For regular API-created sandboxes it remains "", so this emits sandbox_type="" into OTel metrics and therefore into sandbox_metrics_gauge, bypassing the column's DEFAULT 'sandbox' (the materialized view explicitly selects Attributes['sandbox_type'], so the default is never applied).

This is inconsistent with hoststats.go, which falls back to SandboxTypeSandbox when runtime.SandboxType is empty. As a result, WHERE sandbox_type = 'sandbox' queries on sandbox_metrics_gauge will miss all regular sandbox rows.

Fix: apply the same fallback here, or set SandboxType: SandboxTypeSandbox in CreateSandbox/ResumeSandbox for the non-build path.
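
A minimal sketch of such a fallback, assuming a string-backed type. Only SandboxTypeSandbox is named in this thread; the `SandboxType` type definition, the `SandboxTypeBuild` constant, and the `normalizeSandboxType` helper are hypothetical and shown for illustration only.

```go
package main

import "fmt"

// SandboxType is assumed here to be a string-backed type.
type SandboxType string

const (
	SandboxTypeSandbox SandboxType = "sandbox"
	SandboxTypeBuild   SandboxType = "build" // hypothetical constant name
)

// normalizeSandboxType returns the regular-sandbox type when the
// value was never set, so an empty attribute is never emitted.
func normalizeSandboxType(t SandboxType) SandboxType {
	if t == "" {
		return SandboxTypeSandbox
	}
	return t
}

func main() {
	fmt.Println(normalizeSandboxType(""))               // unset: falls back
	fmt.Println(normalizeSandboxType(SandboxTypeBuild)) // set: passed through
}
```

Calling one helper from both the host stats path and the OTel observer would keep the two tables consistent, though setting the type at sandbox creation would achieve the same result.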

Contributor Author

Fixed: added a fallback in the OTel observer that defaults empty SandboxType to sandbox.SandboxTypeSandbox before emitting the attribute, consistent with hoststats.go.

@dobrac dobrac force-pushed the feat/per-build-cpu-tracking branch from bf294e8 to 6cfb7ed Compare February 23, 2026 21:11
…ndboxes

Add sandbox_type column to sandbox_host_stats and sandbox_metrics_gauge
ClickHouse tables to explicitly distinguish build vs runtime metrics.
Thread BuildID and SandboxType through RuntimeMetadata so build sandboxes
are properly attributed to the overall build.

Key changes:
- Add cgroup setup to Factory.CreateSandbox (unified with ResumeSandbox)
- Initialize HostStatsCollector in CreateSandbox (feature-flag-gated)
- Enable envd metrics (MetricsWriteFlag) for CreateSandbox
- Add build_id + sandbox_type OTel attributes for sandbox_metrics_gauge
- Update materialized view to extract new attributes
- Pass TeamID, BuildID, SandboxType from build layer executors
@dobrac dobrac force-pushed the feat/per-build-cpu-tracking branch from 6cfb7ed to ac31320 Compare February 23, 2026 21:44
@dobrac dobrac assigned ValentaTomas and unassigned ValentaTomas Feb 27, 2026