Summary
Add platform-owned aggregate safety limits for evaluation logs and challenge run manifests. This is not required for the small-scale MVP launch, but should be tracked before broader public usage.
Background
The runner already enforces a mandatory per-container Docker log cap and truncates collected container logs before persistence. By design, log_limit_bytes should not be exposed as a challenge-owner setting, because log caps are platform safety policy rather than benchmark semantics.
The remaining gap is aggregate size: one evaluation can involve multiple setup/build/run/scorer/prepare containers, and challenge-owned static or prepared run manifests can contain many runs. Even with per-container caps, total persisted runner.log size and total invocation count can grow with the number of containers/runs.
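To make the growth concrete, here is a rough worst-case estimate, assuming every container fills its per-container cap. The function name, the split into fixed and per-run containers, and all numbers in the usage note are hypothetical and for illustration only; they are not actual platform values.

```python
def worst_case_log_bytes(
    per_container_cap_bytes: int,
    containers_per_run: int,
    run_count: int,
    fixed_containers: int = 0,
) -> int:
    """Upper bound on persisted log bytes for one evaluation.

    Assumes (hypothetically) that each run starts `containers_per_run`
    containers, that `fixed_containers` (e.g. setup/build/prepare) run once
    per evaluation, and that every container hits its log cap.
    """
    total_containers = fixed_containers + containers_per_run * run_count
    return per_container_cap_bytes * total_containers
```

For example, with an assumed 1,000,000-byte per-container cap, 3 one-off containers, and 500 runs of 2 containers each, worst_case_log_bytes(1_000_000, 2, 500, 3) gives 1,003,000,000 bytes, so the per-container cap alone does not bound the evaluation total.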
Proposed work
- Add a platform-owned maximum run count for static and prepared run manifests.
- Add an aggregate per-evaluation persisted log cap.
- Keep per-container Docker log caps mandatory and platform-owned.
- Make validation/prepared-manifest errors clear when limits are exceeded.
- Document the limits in operations and solution protocol docs.
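A minimal sketch of how the two new limits could be enforced, not a description of the existing runner. All names here (PlatformLimits, ManifestLimitError, validate_run_count, AggregateLogBudget) are hypothetical, and the limit values would come from admin/platform configuration rather than challenge-owner configs.

```python
from dataclasses import dataclass


class ManifestLimitError(ValueError):
    """Raised when a run manifest exceeds a platform-owned limit."""


@dataclass(frozen=True)
class PlatformLimits:
    # Hypothetical platform-owned settings; not exposed to challenge owners.
    max_runs_per_manifest: int
    max_evaluation_log_bytes: int


def validate_run_count(run_count: int, limits: PlatformLimits, manifest_kind: str) -> None:
    """Reject a static or prepared manifest with too many runs, with a clear message."""
    if run_count > limits.max_runs_per_manifest:
        raise ManifestLimitError(
            f"{manifest_kind} run manifest has {run_count} runs; "
            f"the platform limit is {limits.max_runs_per_manifest}"
        )


class AggregateLogBudget:
    """Tracks the persisted log bytes remaining for one evaluation."""

    def __init__(self, max_bytes: int) -> None:
        self.remaining = max_bytes
        self.truncated = False

    def admit(self, chunk: bytes) -> bytes:
        """Return the part of `chunk` that still fits within the budget."""
        if len(chunk) <= self.remaining:
            self.remaining -= len(chunk)
            return chunk
        kept = chunk[: self.remaining]
        self.remaining = 0
        self.truncated = True  # caller can append a truncation marker once
        return kept
```

In this sketch the runner would pass each already-capped container log through admit() before appending it to runner.log, so the per-container cap stays in place and the aggregate cap is applied on top of it.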
Non-goals
- Do not add log_limit_bytes or similar knobs to challenge-owner configs.
- Do not make these limits benchmark semantics. They should remain admin/platform policy.
Priority
Post-MVP hardening. Small-scale MVP publish can proceed without this, but the issue should be resolved before broader public launch.