fix(envd): eliminate OOM score and nice priority race in child processes#2028
fix(envd): eliminate OOM score and nice priority race in child processes#2028
Conversation
The previous approach set the child's oom_score_adj after cmd.Start(), leaving a race window where grandchildren could spawn and inherit the parent's protected OOM score (-1000) before it was corrected to 100. Replace the post-start adjustOomScore call with an inline /bin/sh wrapper that writes oom_score_adj before exec-ing the user command, ensuring all descendants inherit the correct score from the start.
…rocesses Set Nice=-20 in envd.service to improve responsiveness under load. Reset nice to the default level (0) for user processes, PTY processes, and socat port-forwarding processes so only envd itself benefits from the elevated priority.
Error messages and log entries were showing the full wrapped command (sh -c 'echo 100 > /proc/...' ...) instead of the user's actual command. Extract userCommand() helper to format the original cmd+args for logging.
There was a problem hiding this comment.
💡 Codex Review
Here are some automated review suggestions for this pull request.
Reviewed commit: 93360127dc
ℹ️ About Codex in GitHub
Codex has been enabled to automatically review pull requests in this repo. Reviews are triggered when you
- Open a pull request for review
- Mark a draft as ready
- Comment "@codex review".
If Codex has suggestions, it will comment; otherwise it will react with 👍.
When you sign up for Codex through ChatGPT, Codex can also answer questions or update the PR, like "@codex address that feedback".
| // Wrap the command in a shell that sets the OOM score and nice value before exec-ing the actual command. | ||
| // This eliminates the race window where grandchildren could inherit the parent's protected OOM score (-1000) | ||
| // or high CPU priority (nice -20) before the post-start calls had a chance to correct them. | ||
| oomWrapperScript := fmt.Sprintf(`echo %d > /proc/$$/oom_score_adj && exec nice -n %d "${@}"`, defaultOomScore, defaultNice) |
There was a problem hiding this comment.
If writing to /proc/$$/oom_score_adj fails (e.g., capability restrictions in certain environments), the && causes the user's command to silently not run at all. The previous adjustOomScore logged the error and continued executing. Consider using ; instead of &&, or redirecting stderr and ignoring the write failure, to preserve the old behavior where an OOM adjustment failure is non-fatal.
There was a problem hiding this comment.
Cursor Bugbot has reviewed your changes and found 1 potential issue.
Bugbot Autofix is OFF. To automatically fix reported issues with cloud agents, enable autofix in the Cursor dashboard.
nice(1) applies a relative adjustment, not absolute. Since envd runs at nice -20, 'nice -n 0' was a no-op. Read the current nice via Getpriority and compute the delta to defaultNice, so child processes are correctly reset to nice 0. Also use absolute path /usr/bin/nice to avoid PATH issues.
Summary
oom_score_adjand resets nice prioritybefore
exec, eliminating the race where grandchildren could inherit envd'sprotected OOM score (-1000) or elevated CPU priority (nice -20)
Nice=-20on the envd systemd service for improved daemon responsiveness under loadand structured logs
Motivation
Previously,
adjustOomScoreandresetNicewere called aftercmd.Start(),leaving a race window where the child process (and any grandchildren it spawns quickly)
could run with envd's inherited OOM score of -1000 and nice of -20. Under specific
scenarios — e.g. shells that immediately fork subprocesses — grandchildren would be
born before the parent's values were corrected, making them effectively immune to the
OOM killer and running at elevated CPU priority.
The fix uses an inline
/bin/shwrapper:This sets the OOM score and nice value atomically before
exec-ing the user's command,so all descendants inherit the correct values from the very first instruction.
Note on the logging commit
The commit that strips the wrapper from logged commands (
cf306862e) can be excludedfrom this PR if desired. It is included because without it, users would see confusing
log output like
/bin/sh -c echo 100 > /proc/$$/oom_score_adj && exec nice -n 0 ...instead of their actual command. The tradeoff is that if the command fails inside the
wrapper (e.g.
/bin/shitself errors), the error message won't show the full wrappedcommand. I believe the cleaner user experience is worth it in practice.
Obsoletes
Obsoletes #2003 — this PR subsumes the nice priority feature from that PR and fixes
the race condition in its
resetNiceapproach.Closes: #2003
Note
Cursor Bugbot is generating a summary for commit 9336012. Configure here.