ai-assisted=yes
Current behavior
Thresholder runs at garden pre-start and computes clean.threshold_bytes from the disk's available space minus the reserved space. It never checks the actual capacity of the XFS store filesystem.
In the default configuration (modernCalculator), threshold_bytes equals store_size_bytes. However, XFS metadata overhead means the filesystem's actual reportable capacity is always slightly less than the backing file size. The GC trigger condition (committedQuota + totalVolumesSize >= threshold_bytes) therefore becomes mathematically unsatisfiable, and GC never fires.
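The sketch below illustrates the trigger condition with hypothetical numbers. It assumes, as this issue does, that store usage can never exceed the filesystem's reportable capacity, and treats the ~11 MB metadata overhead as 11 MiB for illustration. The function name gcShouldRun is hypothetical and is not grootfs's actual API; only the condition itself comes from the issue.

```go
package main

import "fmt"

// gcShouldRun mirrors the trigger condition quoted in the issue:
// committedQuota + totalVolumesSize >= threshold_bytes.
// Hypothetical name; not the real grootfs function.
func gcShouldRun(committedQuota, totalVolumesSize, thresholdBytes uint64) bool {
	return committedQuota+totalVolumesSize >= thresholdBytes
}

func main() {
	const mib = uint64(1 << 20)

	// Hypothetical 10 GiB store; the default config sets
	// threshold_bytes = store_size_bytes.
	storeSizeBytes := 10 * 1024 * mib
	thresholdBytes := storeSizeBytes

	// Assumed reportable XFS capacity: backing file size minus ~11 MiB metadata.
	xfsCapacity := storeSizeBytes - 11*mib

	// Even with the store completely full, usage tops out at xfsCapacity,
	// which is below the threshold, so GC does not fire.
	fmt.Println(gcShouldRun(xfsCapacity, 0, thresholdBytes)) // false

	// With the threshold capped at capacity, a full store does trigger GC.
	fmt.Println(gcShouldRun(xfsCapacity, 0, xfsCapacity)) // true
}
```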
Over time, orphaned layers from app push/delete cycles accumulate. The store eventually fills → ENOSPC → all container creation fails on that cell.
Root cause
```go
// main.go: threshold computed from disk, never validated against the store
config.Clean.ThresholdBytes = calc.CalculateGCThreshold()
```
The XFS store at config.StorePath is a fixed-size loop mount. Its actual filesystem capacity (as reported by statfs(2)) is always less than the computed threshold due to:
- XFS metadata overhead (~11 MB): Even in the default config, threshold_bytes = store_size_bytes, which is larger than the actual XFS capacity, so the GC condition is unreachable on every cell.
- Store/threshold mismatch (observed in production): In some cases the store is significantly smaller than the threshold (e.g. 35 GB store vs 268 GB threshold on a 284 GB disk). The exact cause of the undersized store has not been determined, but the fix handles it regardless.
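To check whether a given cell is affected, one can compare the configured threshold with the store's capacity as statfs(2) reports it. The following is a minimal diagnostic sketch, assuming a Linux host and Go's standard syscall.Statfs; the helper name storeCapacityBytes and the command-line interface are hypothetical, and the caller supplies the store mount point (config.StorePath) and the configured threshold_bytes.

```go
package main

import (
	"fmt"
	"os"
	"strconv"
	"syscall"
)

// storeCapacityBytes returns the total size of the filesystem mounted at
// path, computed as Blocks * Bsize from statfs(2).
// Hypothetical helper name; Linux-oriented (uses syscall.Statfs).
func storeCapacityBytes(path string) (uint64, error) {
	var st syscall.Statfs_t
	if err := syscall.Statfs(path, &st); err != nil {
		return 0, fmt.Errorf("statfs %s: %w", path, err)
	}
	// Bsize's integer type differs between platforms, so convert explicitly.
	return st.Blocks * uint64(st.Bsize), nil
}

func main() {
	if len(os.Args) != 3 {
		fmt.Fprintln(os.Stderr, "usage: checkthreshold <store-path> <threshold-bytes>")
		os.Exit(2)
	}
	threshold, err := strconv.ParseUint(os.Args[2], 10, 64)
	if err != nil {
		fmt.Fprintln(os.Stderr, "invalid threshold:", err)
		os.Exit(2)
	}
	capacity, err := storeCapacityBytes(os.Args[1])
	if err != nil {
		fmt.Fprintln(os.Stderr, err)
		os.Exit(1)
	}
	if threshold > capacity {
		// The situation described above: the GC condition cannot be reached.
		fmt.Printf("threshold %d exceeds store capacity %d by %d bytes\n",
			threshold, capacity, threshold-capacity)
		os.Exit(1)
	}
	fmt.Printf("threshold %d fits within store capacity %d\n", threshold, capacity)
}
```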
Impact
Every landscape running with grootfs.routine_gc=false (the default) is affected. Cells degrade over time as orphaned layers accumulate with no GC to reclaim space.
Desired behavior
Cap threshold_bytes at the store filesystem's total capacity (statfs(2) → Blocks * Bsize) so GC can always fire before the store is full.
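A minimal sketch of the capping step, assuming the capacity has already been read from statfs(2) as Blocks * Bsize. The helper name capThreshold is hypothetical and is not taken from the linked PR; the example values reuse the 268 GB threshold and 35 GB store reported above.

```go
package main

import "fmt"

// capThreshold clamps the GC threshold to the store filesystem's capacity
// (statfs Blocks * Bsize). Hypothetical helper, not the PR's code.
func capThreshold(thresholdBytes, storeCapacityBytes uint64) uint64 {
	if thresholdBytes > storeCapacityBytes {
		return storeCapacityBytes
	}
	return thresholdBytes
}

func main() {
	const gb = uint64(1_000_000_000)

	// Production mismatch from the issue: 268 GB threshold, 35 GB store.
	fmt.Println(capThreshold(268*gb, 35*gb) == 35*gb) // true

	// A threshold already below capacity is left unchanged.
	fmt.Println(capThreshold(20*gb, 35*gb) == 20*gb) // true
}
```

Clamping, rather than rejecting the configuration, keeps garden starting while ensuring the threshold never exceeds what statfs reports for the store.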
Affected Version
garden-runc-release v1.86.0+ (the pattern has existed since the thresholder was introduced).
Fix
cloudfoundry/garden-runc-release#394