fix(engine): respect cgroup memory limits in low-memory detection#1373
Conversation
james-russo-rames-d-jusso
left a comment
There was a problem hiding this comment.
Solid fix, clean tests, full cgroup v1/v2 sentinel matrix covered. Low-memory render profile will finally activate inside the documented Docker fallback path. No blockers. Two concerns worth a tweak before merge (observability gap on cgroup-read failure, and getSystemTotalMb() now does sync FS reads on every call with no caching across ~7 production call sites), plus a few small nits.
Concerns (non-blocking but worth addressing):
-
packages/engine/src/services/systemMemory.ts:80-88—readCgroupFile()swallows every error path silently. The PR body emphasizes "best-effort probe… can never fail the render pipeline" which is correct, but the rubric on this file is "if detection degrades, we should know." Today: a misconfigured container where/sys/fs/cgroup/memory.maxexists but is EACCES → silently falls back toos.totalmem(), identical observable behavior to a host with no cgroup at all. The repro from the PR body (4 GiB container on 32 GiB host) would silently not engage low-memory mode and we'd be back to the #1193/#1194/#1195 timeout shape with no log signal. Suggest a one-shotlog.warn(debug-level fine) on first failure, gated to fire at most once per process — keeps the never-fail-pipeline guarantee while giving oncall a needle to grep. -
packages/engine/src/services/systemMemory.ts:14-20(getSystemTotalMb) — no module-level cache. Production call sites I traced:config.ts:242, 249, 283(viaisLowMemorySystemdefault arg),browserManager.ts:520, 527,renderOrchestrator.ts:1702— that's 5-6 syncreadFileSynccalls perresolveConfig()+ per-job log. Sysfs is fast (~µs each) so this isn't a render-perf regression, but the cgroup limit is invariant for the process lifetime — alet cached: number | null = nullmemo at module scope would eliminate the repeated reads with zero behavior change. Same shapeprobeNvidiaVramMbalready uses via_cachedVramMbatbrowserManager.ts:507, so there's a precedent in the codebase.
Nits:
-
packages/engine/src/services/systemMemory.ts:16—CGROUP_V1_NO_LIMIT_CUTOFF_BYTES = 2n ** 60n(~1.15 EiB) is a clever sentinel-catcher (handles9223372036854771712,LONG_MAX, and unsigned-1all at once), but the rationale isn't in a comment. A future reader will wonder why 2^60 specifically vs the literal sentinel. One-liner:// Catches the canonical v1 "no limit" sentinel (~8 EiB) plus LONG_MAX / unsigned -1 variants; well above any plausible real container limit. -
PR body says
parallelCoordinator.ts… inherits the fix through getSystemTotalMb(). I grepped bothpackages/engine/src/services/parallelCoordinator.tsandpackages/producer/src/services/parallelCoordinator.ts— neither importssystemMemoryor callsgetSystemTotalMb/isLowMemorySystemdirectly. They presumably get it viacfg.lowMemoryModeflowing through, which is fine but the body's phrasing implies a direct call. Trivial, but worth fixing for the next person reading the PR for context. -
systemMemory.ts:43-47(the updated caveat comment) — "should setPRODUCER_LOW_MEMORY_MODEexplicitly rather than relying on auto-detection" still tells Lambda users to set the env var, which is now only true for the no-readable-cgroup case (your fix). Reads slightly stale — Lambda actually does expose cgroup-v2 these days. Minor wording cleanup, optional.
Things NOT in the diff worth checking:
-
No log emission anywhere when cgroup IS read successfully. When the fix engages (cgroup limit < host RAM, low-memory mode activates), the existing
renderOrchestrator.ts:1697-1703log fires withtotalMemMb— but that just shows the final number, not where it came from. Addingsource: "cgroup_v2" | "cgroup_v1" | "host"would make the next "why didn't low-memory engage on this container?" investigation trivial. Especially important since the whole motivation here is "we couldn't tell which detection path was running." -
Edge case not in the test matrix: container with cgroup file present but
0 bytescontent. Some sandboxes / chroots create the file but never populate it. The currentparsePositiveByteLimitMbtreats empty string as malformed (!/^\d+$/.test("")→ returns null → falls back to host). Good behavior, but it's only tested via the malformed-""parametrized case atsystemMemory.test.ts:204-219, not the fullgetSystemTotalMbflow with both files-empty. Belt-and-suspenders nice-to-have, not a real gap. -
Memory-cgroup nesting via systemd-managed user slices. On dev laptops running rootless Docker via systemd-user, the path becomes
/sys/fs/cgroup/user.slice/user-1000.slice/.../memory.max, not the canonical paths the diff reads. The fix correctly falls back to host RAM (which is what dev laptops want anyway, since the threshold catches them), so this isn't a regression — but if anyone later tries to "improve" this to read/proc/self/cgroupto find the right path, that's a classic foot-gun (the v1 controller line vs v2 unified line format differ, format changes across kernel versions). Worth a code comment nearCGROUP_V2_MEMORY_MAX_PATHsaying "we deliberately read the well-known paths, not/proc/self/cgroup-derived paths, to avoid the parsing complexity." -
Failing CI shard (
regression,regression-shards (shard-5)) — both look unrelated to this diff (touches no rendering / regression-suite code), but worth a glance at the shard-5 log to confirm before merge so it doesn't get retconned into "memory detection broke regression." -
Determinism contract — verified, no concern. All 7 production callers are config-resolution / per-job startup. Nothing render-hot calls
getSystemTotalMb, so the new sync FS read can't leak into render-time and doesn't violate the no-IO-in-render-path contract.
Review by Rames D Jusso
ccda96a to
c22c8b8
Compare
|
Both reviews addressed:
Regression shard failures were the org-wide buildx cache outage (image build died before any render ran); the force-push retriggers a clean run. |
james-russo-rames-d-jusso
left a comment
There was a problem hiding this comment.
R2 — ready to stamp. Both R1 concerns resolved (warn-once + cached probe), three of five nits/should-checks landed, the two CI failures at HEAD are pre-existing main-branch breakage from an unrelated studio PR (not from this diff). Two cosmetic items still open (#6 source log on success path, #8 /proc/self/cgroup foot-gun comment) — both genuinely optional, not blocking. Bonus: you pulled in a real worker-sizing fix I didn't catch in R1.
R1 carry-over status:
- ✅ EACCES log emission —
systemMemory.ts:117-124warnCgroupReadFailure(): once-per-process gated via_warnedCgroupReadFailure, only fires on non-ENOENT/ENOTDIRcodes (absence stays silent — exactly the rubric), test atsystemMemory.test.ts:255-275verifies once-only + EACCES surfaced + ENOENT silent. - ✅ Module-level cache —
systemMemory.ts:21_cachedCgroupLimitMb+getCgroupLimitMb()short-circuit at:77. Same shape as_cachedVramMb. Test-reset hook_resetCgroupLimitCacheForTestsexposed at:71. Subtle but correct:undefined= not yet probed,null= probed/no limit,number= cached limit. - ✅ 2^60 rationale comment —
systemMemory.ts:18now reads// Kernel no-limit sentinel is page-rounded 2^63-1 (~9223372036854771712); >= 2^60 is implausible as a real limit. - ✅ PR body fixed —
parallelCoordinator.tsnow genuinely does callgetSystemTotalMb()directly (parallelCoordinator.ts:29,157,433). Went one better than my suggestion: instead of correcting the misclaim, made it true.calculateOptimalWorkers+getSystemResourcesnow respect container limits — that was a real worker-sizing bug in containers I missed in R1. - ✅ Lambda caveat refresh —
systemMemory.ts:157-163now reads "Lambda tiers with readable cgroup ceilings, inherit the tighter container limit" and reserves the env-var guidance for environments that hide cgroup files. Accurate. - ❌ Source-of-truth log (
source: "cgroup_v2" | "cgroup_v1" | "host"on the renderOrchestrator success path) — not added.renderOrchestrator.ts:1697-1703untouched. Non-blocking; can be a follow-up if it bites again. - ✅ Empty-bytes edge case —
systemMemory.test.ts:280-310it.each(["", "garbage", "-1", "0"])now runs the fullgetSystemTotalMbflow with file content mocked, not just the parser directly. Belt-and-suspenders covered. - ❌
/proc/self/cgroupfoot-gun comment — not added nearCGROUP_V2_MEMORY_MAX_PATH(:16). Pure cosmetic, leave it. - ✅ CI shard failures at R1 — diagnosed as org-wide buildx cache outage, force-push retriggered. The remaining failures at HEAD aren't from this PR (see below).
- N/A — determinism contract, no concern, already verified in R1.
Bonus catch: cli/src/telemetry/system.ts now also reports getSystemTotalMb() instead of totalmem(), and cli/tsconfig.json gains the gcp-cloud-run/sdk source alias (matching existing producer/aws-lambda entries) so a fresh checkout cli-typecheck resolves from source. Repo-wide grep confirms totalmem() now only lives inside systemMemory.ts itself — single source of truth.
CI status at HEAD c22c8b8:
- Engine
Test✅,Typecheck✅,Build✅,CLI smoke (required)✅,Smoke: global install✅, runtime + studio + SDK smokes all ✅, CodeQL all ✅. regression/preview-regressionjobs ❌ at HEAD — but thePreflight (lint + format)sub-jobs failed (not the actual regression shards), which means the failures cascade from:CI / Lint❌ —packages/studio/src/components/editor/DomEditOverlay.tsx: oxlint saysonSelectElementByIddeclared but never used. Untouched by this PR.CI / Format❌ — same fileDomEditOverlay.tsxhas an oxfmt formatting issue. Untouched by this PR.- Verdict: both CI failures are pre-existing main-branch breakage from a separate PR's studio work. Nothing in #1373 touches
packages/studio/. Heads-up to merge-main once that's fixed; otherwise the regression shards stay red and the green state is masked.
Anything NEW worth flagging:
parallelCoordinator.ts:157(calculateOptimalWorkers) and:433(getSystemResources) now correctly usegetSystemTotalMb()— this is meaningful beyond what the original PR scope advertised. Workers-per-container will be sized off the cgroup limit, not host RAM. Worth verifying once in a real 4 GiB container thatrecommendedWorkersdrops accordingly — but no smoke gap to block on,MEMORY_PER_WORKER_MB-based math is well-tested.
R2 review by Rames D Jusso
c22c8b8 to
233b467
Compare
getSystemTotalMb returned os.totalmem() — the host's physical RAM — so a 4GB Docker container on a 32GB host never auto-flagged as low-memory and the low-memory render profile didn't activate exactly where it's needed most. Read the cgroup v2 limit (/sys/fs/cgroup/memory.max, with the v1 fallback and its no-limit sentinel handled) and use min(host, cgroup). The probe is best-effort and non-Linux platforms never touch /sys. Review follow-ups: worker sizing (calculateOptimalWorkers) and the getSystemResources diagnostics previously read os.totalmem() directly and now use getSystemTotalMb(), so container limits actually govern parallel spawn decisions; CLI telemetry reports the effective total as well. The cgroup probe result is cached for the process lifetime (the limit is immutable per process) with a test reset hook; a detected limit logs once so operators can see which source governs, and a present-but-unreadable cgroup file warns once instead of failing silently — absence stays silent. The root-path-vs-/proc/self/cgroup trade-off is documented at the path constants. cli/tsconfig.json gains the gcp-cloud-run/sdk source alias (matching the existing producer and aws-lambda entries) so the cli typecheck resolves from source in a fresh checkout. Refs #1193, #1194, #1195, #1236
233b467 to
7d1bd99
Compare
|
R2 follow-ups: rebased onto current main (past the DomEditOverlay lint breakage), plus the two cosmetic items — a detected cgroup limit now logs once on the success path so operators can see which source governs memory-adaptive behaviour, and the root-paths-vs- |
james-russo-rames-d-jusso
left a comment
There was a problem hiding this comment.
R3 — green to merge. Item #8 fully addressed, item #6 addressed via a different (legitimate) approach, all required CI green at HEAD 7d1bd99.
R2 open-item status:
- ✅ Item #8 (foot-gun comment) —
packages/engine/src/services/systemMemory.ts:16-22. Multi-line comment aboveCGROUP_V2_MEMORY_MAX_PATHexplicitly warns future readers off/proc/self/cgroup: explains namespace-root mount semantics, calls out the nested-slice case on systemd hosts, and justifies why bare-host nesting is acceptable not-covered. Exactly what was asked. ⚠️ Item #6 (source-of-truth log) — partially-addressed, alternative approach. The[Render] Low-memory render profile active …log atproducer/src/services/renderOrchestrator.ts:1697-1703still emits only{totalMemMb, thresholdMb}— nosource: cgroup_v2 | cgroup_v1 | hostfield. Instead added a one-shot detection log atsystemMemory.ts:96-100:[SystemMemory] cgroup memory limit detected: N MiB — it governs memory-adaptive render behaviour instead of host RAM.Operationally this gives oncalls the same signal (cgroup engaged vs silent fallback to host), gated by_cachedCgroupLimitMb !== nullso it fires once per process. Minor gap: the message doesn't disambiguate v1 vs v2 ("detected" is the same string for both paths). Non-blocking — the value the source-tag would unlock (knowing which detection path engaged) is mostly there.
CI status post-rebase:
All 7 required checks pass at HEAD 7d1bd99: Build, Lint, Format, Typecheck, Test, Test: runtime contract, Render/Tests on windows-latest. The R2 DomEditOverlay.tsx Lint/Format failure is gone — fix landed independently as #1377 (fix(studio): remove unused onSelectElementById after indicator removal) and the rebase pulled it in. The 8 regression-shards are still in-progress but not in the required-merge set.
Anything NEW since R2:
- Single rebased commit
7d1bd99. PR scope unchanged: 5 files (cli/telemetry/system.ts,cli/tsconfig.json,engine/parallelCoordinator.ts,engine/systemMemory.ts,engine/systemMemory.test.ts). No drive-by additions, no scope creep. - Nice net-new touch:
getCgroupLimitMb()now logs once on first success ANDwarnCgroupReadFailure()warns once on present-but-unreadable files (gated by_warnedCgroupReadFailure), with absence staying silent. The dual-flag pattern means a misconfigured container surfaces a one-line WARN rather than silently falling back. Good defensive choice.
R3 review by Rames D Jusso
Problem
getSystemTotalMb()returnedos.totalmem()— the host's physical RAM, not the container limit. A 4 GB Docker container on a 32 GB host never auto-flagged as low-memory, so the low-memory render profile (frame-cache sizing, Chrome heap flags, worker count) failed to activate exactly where it's needed most: Docker, the documented fallback when host rendering fails. The blind spot was already documented in the module's own comment; it produced three identical Docker-timeout reports in a single day (#1193, #1194, #1195) plus #1236, withPRODUCER_LOW_MEMORY_MODEas the only escape hatch.Fix
/sys/fs/cgroup/memory.max) with a v1 fallback (memory.limit_in_bytes, its no-limit sentinel ignored via a 2^60-byte cutoff), parse via a pure exported function, and returnmin(host, cgroup)./sys.config.ts,browserManager.ts, andparallelCoordinator.tsinherit the fix throughgetSystemTotalMb(). The now-outdated caveat comment is updated, keeping thePRODUCER_LOW_MEMORY_MODEguidance for environments without readable cgroup files.Tests
Full matrix: v2 numeric limit honored, v2
maxignored, v1 numeric honored, v1 sentinel ignored, absent files → host value, malformed content → host value, limit larger than host → host wins, v2 precedence over v1, and non-Linux platforms never read/sys(asserted via a throwing mock). The headline repro: 4 GiB cgroup limit on a 32 GiB host now flags low-memory.Engine suite 715/715, typecheck clean, oxlint/oxfmt clean.
Refs #1193, #1194, #1195, #1236