Skip to content

fix(engine): respect cgroup memory limits in low-memory detection#1373

Merged
miguel-heygen merged 1 commit into
mainfrom
fix/cgroup-memory-limit
Jun 12, 2026
Merged

fix(engine): respect cgroup memory limits in low-memory detection#1373
miguel-heygen merged 1 commit into
mainfrom
fix/cgroup-memory-limit

Conversation

@miguel-heygen

Copy link
Copy Markdown
Collaborator

Problem

getSystemTotalMb() returned os.totalmem() — the host's physical RAM, not the container limit. A 4 GB Docker container on a 32 GB host never auto-flagged as low-memory, so the low-memory render profile (frame-cache sizing, Chrome heap flags, worker count) failed to activate exactly where it's needed most: Docker, the documented fallback when host rendering fails. The blind spot was already documented in the module's own comment; it produced three identical Docker-timeout reports in a single day (#1193, #1194, #1195) plus #1236, with PRODUCER_LOW_MEMORY_MODE as the only escape hatch.

Fix

  • Read the cgroup v2 limit (/sys/fs/cgroup/memory.max) with a v1 fallback (memory.limit_in_bytes, its no-limit sentinel ignored via a 2^60-byte cutoff), parse via a pure exported function, and return min(host, cgroup).
  • Best-effort probe: any unreadable cgroup file degrades to host-RAM detection — it can never fail the render pipeline. Non-Linux platforms never touch /sys.
  • Public API unchanged; config.ts, browserManager.ts, and parallelCoordinator.ts inherit the fix through getSystemTotalMb(). The now-outdated caveat comment is updated, keeping the PRODUCER_LOW_MEMORY_MODE guidance for environments without readable cgroup files.

Tests

Full matrix: v2 numeric limit honored, v2 max ignored, v1 numeric honored, v1 sentinel ignored, absent files → host value, malformed content → host value, limit larger than host → host wins, v2 precedence over v1, and non-Linux platforms never read /sys (asserted via a throwing mock). The headline repro: 4 GiB cgroup limit on a 32 GiB host now flags low-memory.

Engine suite 715/715, typecheck clean, oxlint/oxfmt clean.

Refs #1193, #1194, #1195, #1236

@james-russo-rames-d-jusso james-russo-rames-d-jusso left a comment

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Solid fix, clean tests, full cgroup v1/v2 sentinel matrix covered. Low-memory render profile will finally activate inside the documented Docker fallback path. No blockers. Two concerns worth a tweak before merge (observability gap on cgroup-read failure, and getSystemTotalMb() now does sync FS reads on every call with no caching across ~7 production call sites), plus a few small nits.

Concerns (non-blocking but worth addressing):

  • packages/engine/src/services/systemMemory.ts:80-88readCgroupFile() swallows every error path silently. The PR body emphasizes "best-effort probe… can never fail the render pipeline" which is correct, but the rubric on this file is "if detection degrades, we should know." Today: a misconfigured container where /sys/fs/cgroup/memory.max exists but is EACCES → silently falls back to os.totalmem(), identical observable behavior to a host with no cgroup at all. The repro from the PR body (4 GiB container on 32 GiB host) would silently not engage low-memory mode and we'd be back to the #1193/#1194/#1195 timeout shape with no log signal. Suggest a one-shot log.warn (debug-level fine) on first failure, gated to fire at most once per process — keeps the never-fail-pipeline guarantee while giving oncall a needle to grep.

  • packages/engine/src/services/systemMemory.ts:14-20 (getSystemTotalMb) — no module-level cache. Production call sites I traced: config.ts:242, 249, 283 (via isLowMemorySystem default arg), browserManager.ts:520, 527, renderOrchestrator.ts:1702 — that's 5-6 sync readFileSync calls per resolveConfig() + per-job log. Sysfs is fast (~µs each) so this isn't a render-perf regression, but the cgroup limit is invariant for the process lifetime — a let cached: number | null = null memo at module scope would eliminate the repeated reads with zero behavior change. Same shape probeNvidiaVramMb already uses via _cachedVramMb at browserManager.ts:507, so there's a precedent in the codebase.

Nits:

  • packages/engine/src/services/systemMemory.ts:16CGROUP_V1_NO_LIMIT_CUTOFF_BYTES = 2n ** 60n (~1.15 EiB) is a clever sentinel-catcher (handles 9223372036854771712, LONG_MAX, and unsigned -1 all at once), but the rationale isn't in a comment. A future reader will wonder why 2^60 specifically vs the literal sentinel. One-liner: // Catches the canonical v1 "no limit" sentinel (~8 EiB) plus LONG_MAX / unsigned -1 variants; well above any plausible real container limit.

  • PR body says parallelCoordinator.ts… inherits the fix through getSystemTotalMb(). I grepped both packages/engine/src/services/parallelCoordinator.ts and packages/producer/src/services/parallelCoordinator.ts — neither imports systemMemory or calls getSystemTotalMb/isLowMemorySystem directly. They presumably get it via cfg.lowMemoryMode flowing through, which is fine but the body's phrasing implies a direct call. Trivial, but worth fixing for the next person reading the PR for context.

  • systemMemory.ts:43-47 (the updated caveat comment) — "should set PRODUCER_LOW_MEMORY_MODE explicitly rather than relying on auto-detection" still tells Lambda users to set the env var, which is now only true for the no-readable-cgroup case (your fix). Reads slightly stale — Lambda actually does expose cgroup-v2 these days. Minor wording cleanup, optional.

Things NOT in the diff worth checking:

  • No log emission anywhere when cgroup IS read successfully. When the fix engages (cgroup limit < host RAM, low-memory mode activates), the existing renderOrchestrator.ts:1697-1703 log fires with totalMemMb — but that just shows the final number, not where it came from. Adding source: "cgroup_v2" | "cgroup_v1" | "host" would make the next "why didn't low-memory engage on this container?" investigation trivial. Especially important since the whole motivation here is "we couldn't tell which detection path was running."

  • Edge case not in the test matrix: container with cgroup file present but 0 bytes content. Some sandboxes / chroots create the file but never populate it. The current parsePositiveByteLimitMb treats empty string as malformed (!/^\d+$/.test("") → returns null → falls back to host). Good behavior, but it's only tested via the malformed-"" parametrized case at systemMemory.test.ts:204-219, not the full getSystemTotalMb flow with both files-empty. Belt-and-suspenders nice-to-have, not a real gap.

  • Memory-cgroup nesting via systemd-managed user slices. On dev laptops running rootless Docker via systemd-user, the path becomes /sys/fs/cgroup/user.slice/user-1000.slice/.../memory.max, not the canonical paths the diff reads. The fix correctly falls back to host RAM (which is what dev laptops want anyway, since the threshold catches them), so this isn't a regression — but if anyone later tries to "improve" this to read /proc/self/cgroup to find the right path, that's a classic foot-gun (the v1 controller line vs v2 unified line format differ, format changes across kernel versions). Worth a code comment near CGROUP_V2_MEMORY_MAX_PATH saying "we deliberately read the well-known paths, not /proc/self/cgroup-derived paths, to avoid the parsing complexity."

  • Failing CI shard (regression, regression-shards (shard-5)) — both look unrelated to this diff (touches no rendering / regression-suite code), but worth a glance at the shard-5 log to confirm before merge so it doesn't get retconned into "memory detection broke regression."

  • Determinism contract — verified, no concern. All 7 production callers are config-resolution / per-job startup. Nothing render-hot calls getSystemTotalMb, so the new sync FS read can't leak into render-time and doesn't violate the no-IO-in-render-path contract.

Review by Rames D Jusso

@miguel-heygen miguel-heygen force-pushed the fix/cgroup-memory-limit branch from ccda96a to c22c8b8 Compare June 12, 2026 15:32
@miguel-heygen

Copy link
Copy Markdown
Collaborator Author

Both reviews addressed:

  • parallelCoordinator: confirmed the direct os.totalmem() reads — calculateOptimalWorkers and getSystemResources didn't inherit anything via cfg, so worker sizing was still host-RAM-based in containers. Both now use getSystemTotalMb(). CLI telemetry reports the effective total too (the cli→engine edge already existed; cli/tsconfig.json gains the gcp-cloud-run/sdk source alias matching the existing producer/aws-lambda entries so cli typechecks from source). A repo-wide grep confirms the only remaining totalmem() call sites live inside systemMemory.ts.
  • Probe detectability + cost: the cgroup result is now cached for the process lifetime (mirroring the _cachedVramMb pattern, with a test reset hook), and a present-but-unreadable cgroup file warns once per process instead of silently falling back — absence (ENOENT) stays silent. Added the 2^60 cutoff rationale comment and refreshed the stale serverless caveat.

Regression shard failures were the org-wide buildx cache outage (image build died before any render ran); the force-push retriggers a clean run.

@james-russo-rames-d-jusso james-russo-rames-d-jusso left a comment

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

R2 — ready to stamp. Both R1 concerns resolved (warn-once + cached probe), three of five nits/should-checks landed, the two CI failures at HEAD are pre-existing main-branch breakage from an unrelated studio PR (not from this diff). Two cosmetic items still open (#6 source log on success path, #8 /proc/self/cgroup foot-gun comment) — both genuinely optional, not blocking. Bonus: you pulled in a real worker-sizing fix I didn't catch in R1.

R1 carry-over status:

  1. EACCES log emissionsystemMemory.ts:117-124 warnCgroupReadFailure(): once-per-process gated via _warnedCgroupReadFailure, only fires on non-ENOENT/ENOTDIR codes (absence stays silent — exactly the rubric), test at systemMemory.test.ts:255-275 verifies once-only + EACCES surfaced + ENOENT silent.
  2. Module-level cachesystemMemory.ts:21 _cachedCgroupLimitMb + getCgroupLimitMb() short-circuit at :77. Same shape as _cachedVramMb. Test-reset hook _resetCgroupLimitCacheForTests exposed at :71. Subtle but correct: undefined = not yet probed, null = probed/no limit, number = cached limit.
  3. 2^60 rationale commentsystemMemory.ts:18 now reads // Kernel no-limit sentinel is page-rounded 2^63-1 (~9223372036854771712); >= 2^60 is implausible as a real limit.
  4. PR body fixedparallelCoordinator.ts now genuinely does call getSystemTotalMb() directly (parallelCoordinator.ts:29,157,433). Went one better than my suggestion: instead of correcting the misclaim, made it true. calculateOptimalWorkers + getSystemResources now respect container limits — that was a real worker-sizing bug in containers I missed in R1.
  5. Lambda caveat refreshsystemMemory.ts:157-163 now reads "Lambda tiers with readable cgroup ceilings, inherit the tighter container limit" and reserves the env-var guidance for environments that hide cgroup files. Accurate.
  6. Source-of-truth log (source: "cgroup_v2" | "cgroup_v1" | "host" on the renderOrchestrator success path) — not added. renderOrchestrator.ts:1697-1703 untouched. Non-blocking; can be a follow-up if it bites again.
  7. Empty-bytes edge casesystemMemory.test.ts:280-310 it.each(["", "garbage", "-1", "0"]) now runs the full getSystemTotalMb flow with file content mocked, not just the parser directly. Belt-and-suspenders covered.
  8. /proc/self/cgroup foot-gun comment — not added near CGROUP_V2_MEMORY_MAX_PATH (:16). Pure cosmetic, leave it.
  9. CI shard failures at R1 — diagnosed as org-wide buildx cache outage, force-push retriggered. The remaining failures at HEAD aren't from this PR (see below).
  10. N/A — determinism contract, no concern, already verified in R1.

Bonus catch: cli/src/telemetry/system.ts now also reports getSystemTotalMb() instead of totalmem(), and cli/tsconfig.json gains the gcp-cloud-run/sdk source alias (matching existing producer/aws-lambda entries) so a fresh checkout cli-typecheck resolves from source. Repo-wide grep confirms totalmem() now only lives inside systemMemory.ts itself — single source of truth.

CI status at HEAD c22c8b8:

  • Engine Test ✅, Typecheck ✅, Build ✅, CLI smoke (required) ✅, Smoke: global install ✅, runtime + studio + SDK smokes all ✅, CodeQL all ✅.
  • regression / preview-regression jobs ❌ at HEAD — but the Preflight (lint + format) sub-jobs failed (not the actual regression shards), which means the failures cascade from:
  • CI / Lint ❌ — packages/studio/src/components/editor/DomEditOverlay.tsx: oxlint says onSelectElementById declared but never used. Untouched by this PR.
  • CI / Format ❌ — same file DomEditOverlay.tsx has an oxfmt formatting issue. Untouched by this PR.
  • Verdict: both CI failures are pre-existing main-branch breakage from a separate PR's studio work. Nothing in #1373 touches packages/studio/. Heads-up to merge-main once that's fixed; otherwise the regression shards stay red and the green state is masked.

Anything NEW worth flagging:

  • parallelCoordinator.ts:157 (calculateOptimalWorkers) and :433 (getSystemResources) now correctly use getSystemTotalMb() — this is meaningful beyond what the original PR scope advertised. Workers-per-container will be sized off the cgroup limit, not host RAM. Worth verifying once in a real 4 GiB container that recommendedWorkers drops accordingly — but no smoke gap to block on, MEMORY_PER_WORKER_MB-based math is well-tested.

R2 review by Rames D Jusso

getSystemTotalMb returned os.totalmem() — the host's physical RAM — so a
4GB Docker container on a 32GB host never auto-flagged as low-memory and
the low-memory render profile didn't activate exactly where it's needed
most. Read the cgroup v2 limit (/sys/fs/cgroup/memory.max, with the v1
fallback and its no-limit sentinel handled) and use min(host, cgroup).
The probe is best-effort and non-Linux platforms never touch /sys.

Review follow-ups: worker sizing (calculateOptimalWorkers) and the
getSystemResources diagnostics previously read os.totalmem() directly
and now use getSystemTotalMb(), so container limits actually govern
parallel spawn decisions; CLI telemetry reports the effective total as
well. The cgroup probe result is cached for the process lifetime (the
limit is immutable per process) with a test reset hook; a detected limit
logs once so operators can see which source governs, and a
present-but-unreadable cgroup file warns once instead of failing
silently — absence stays silent. The root-path-vs-/proc/self/cgroup
trade-off is documented at the path constants. cli/tsconfig.json gains
the gcp-cloud-run/sdk source alias (matching the existing producer and
aws-lambda entries) so the cli typecheck resolves from source in a
fresh checkout.

Refs #1193, #1194, #1195, #1236
@miguel-heygen miguel-heygen force-pushed the fix/cgroup-memory-limit branch from 233b467 to 7d1bd99 Compare June 12, 2026 15:56
@miguel-heygen

Copy link
Copy Markdown
Collaborator Author

R2 follow-ups: rebased onto current main (past the DomEditOverlay lint breakage), plus the two cosmetic items — a detected cgroup limit now logs once on the success path so operators can see which source governs memory-adaptive behaviour, and the root-paths-vs-/proc/self/cgroup trade-off is documented at the path constants (container namespace root is the case this probe exists for; nested systemd slices on bare hosts intentionally fall back to total-RAM detection).

@jrusso1020 jrusso1020 left a comment

Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM.

@james-russo-rames-d-jusso james-russo-rames-d-jusso left a comment

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

R3 — green to merge. Item #8 fully addressed, item #6 addressed via a different (legitimate) approach, all required CI green at HEAD 7d1bd99.

R2 open-item status:

  • Item #8 (foot-gun comment)packages/engine/src/services/systemMemory.ts:16-22. Multi-line comment above CGROUP_V2_MEMORY_MAX_PATH explicitly warns future readers off /proc/self/cgroup: explains namespace-root mount semantics, calls out the nested-slice case on systemd hosts, and justifies why bare-host nesting is acceptable not-covered. Exactly what was asked.
  • ⚠️ Item #6 (source-of-truth log) — partially-addressed, alternative approach. The [Render] Low-memory render profile active … log at producer/src/services/renderOrchestrator.ts:1697-1703 still emits only {totalMemMb, thresholdMb} — no source: cgroup_v2 | cgroup_v1 | host field. Instead added a one-shot detection log at systemMemory.ts:96-100: [SystemMemory] cgroup memory limit detected: N MiB — it governs memory-adaptive render behaviour instead of host RAM. Operationally this gives oncalls the same signal (cgroup engaged vs silent fallback to host), gated by _cachedCgroupLimitMb !== null so it fires once per process. Minor gap: the message doesn't disambiguate v1 vs v2 ("detected" is the same string for both paths). Non-blocking — the value the source-tag would unlock (knowing which detection path engaged) is mostly there.

CI status post-rebase:

All 7 required checks pass at HEAD 7d1bd99: Build, Lint, Format, Typecheck, Test, Test: runtime contract, Render/Tests on windows-latest. The R2 DomEditOverlay.tsx Lint/Format failure is gone — fix landed independently as #1377 (fix(studio): remove unused onSelectElementById after indicator removal) and the rebase pulled it in. The 8 regression-shards are still in-progress but not in the required-merge set.

Anything NEW since R2:

  • Single rebased commit 7d1bd99. PR scope unchanged: 5 files (cli/telemetry/system.ts, cli/tsconfig.json, engine/parallelCoordinator.ts, engine/systemMemory.ts, engine/systemMemory.test.ts). No drive-by additions, no scope creep.
  • Nice net-new touch: getCgroupLimitMb() now logs once on first success AND warnCgroupReadFailure() warns once on present-but-unreadable files (gated by _warnedCgroupReadFailure), with absence staying silent. The dual-flag pattern means a misconfigured container surfaces a one-line WARN rather than silently falling back. Good defensive choice.

R3 review by Rames D Jusso

@vanceingalls vanceingalls left a comment

Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

@miguel-heygen miguel-heygen merged commit 2ce5b42 into main Jun 12, 2026
42 checks passed
@miguel-heygen miguel-heygen deleted the fix/cgroup-memory-limit branch June 12, 2026 16:21
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

4 participants