fix: prevent Linux gateway mis-kill and calibration data resurrection #330

Closed
cursor[bot] wants to merge 1 commit into main from cursor/critical-bug-investigation-a75a

cursor[bot] commented Jun 14, 2026

Contributor

Summary

Daily critical-bug scan found two high-severity correctness issues and fixes them with minimal, targeted changes.

Bug 1: Linux gateway zombie cleanup kills unrelated processes

Impact: On Linux, cleanup_zombie_gateway_processes used fuser + kill -9 on every PID bound to the gateway port, with no command-line or /health checks. During gateway restart timeouts or stop failures, this could terminate unrelated services sharing the port, or kill a healthy Gateway that was still starting up.

Root cause: The Linux implementation was added as a blunt port-based kill. Windows was later hardened (#244, #446) with cmdline verification, /health retries, and adoption of healthy external instances, but Linux was never updated.

Fix: Port the Windows safety strategy to Linux: find listeners via ss/lsof, verify openclaw gateway cmdline from /proc, retry /health before killing, and adopt healthy external PIDs.
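
The decision flow described in this fix can be sketched roughly as follows. This is a minimal illustration, not the code in src-tauri/src/commands/service.rs: the names `Action`, `parse_cmdline`, `read_cmdline`, `looks_like_gateway`, `health_with_retries`, and `decide` are all hypothetical, the cmdline match rule is an assumption, and the /health probe is passed in as a closure so the sketch stays free of HTTP dependencies.

```rust
use std::fs;

/// What to do with a PID found listening on the gateway port (hypothetical type).
#[derive(Debug, PartialEq)]
enum Action {
    Skip,  // not an openclaw gateway: leave it alone
    Adopt, // a gateway that answers /health: keep it
    Kill,  // a gateway that never answers /health: treat as a zombie
}

/// /proc/<pid>/cmdline stores the arguments separated by NUL bytes.
fn parse_cmdline(raw: &[u8]) -> Vec<String> {
    raw.split(|b| *b == 0)
        .filter(|part| !part.is_empty())
        .map(|part| String::from_utf8_lossy(part).into_owned())
        .collect()
}

/// Reads a process's arguments from /proc; None if the PID is gone or unreadable.
fn read_cmdline(pid: u32) -> Option<Vec<String>> {
    fs::read(format!("/proc/{}/cmdline", pid))
        .ok()
        .map(|raw| parse_cmdline(&raw))
}

/// Assumed match rule: some argument mentions "openclaw" and one is exactly "gateway".
fn looks_like_gateway(args: &[String]) -> bool {
    args.iter().any(|a| a.contains("openclaw")) && args.iter().any(|a| a == "gateway")
}

/// Calls `probe` up to `attempts` times and stops at the first success.
/// A real implementation would likely sleep between attempts.
fn health_with_retries(attempts: u32, mut probe: impl FnMut() -> bool) -> bool {
    (0..attempts).any(|_| probe())
}

/// Only a verified gateway that stays unhealthy is eligible for kill.
fn decide(args: &[String], healthy: bool) -> Action {
    if !looks_like_gateway(args) {
        Action::Skip
    } else if healthy {
        Action::Adopt
    } else {
        Action::Kill
    }
}
```

Under these assumptions, a caller would take each PID reported by ss or lsof, pass it through `read_cmdline`, and send a signal only when `decide` returns `Action::Kill`. An unrelated service on the same port is then never touched.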

Bug 2: Calibration inherit resurrects deleted config from .bak

Impact: Running inherit-mode calibration after intentionally removing providers/channels could restore deleted settings, silently undoing user edits.

Root cause: select_calibration_source preferred whichever of openclaw.json or openclaw.json.bak had a higher "richness" score. Every write_openclaw_config copies the previous file to .bak, so after a slimming edit the backup is always richer than current.

Fix: Prefer current whenever it is non-empty; only fall back to backup when current is effectively empty (score 0). Mirrored in scripts/dev-api.js.
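
The selection rule can be illustrated with a small sketch. It is not the actual select_calibration_source: the `Config` struct, the `Source` enum, the `select_source` name, and the `richness` scoring (a plain count of providers and channels) are assumptions made for illustration, since the real scoring is not shown in this PR.

```rust
/// Hypothetical, simplified stand-in for a parsed openclaw.json.
#[derive(Debug, Clone, Default, PartialEq)]
struct Config {
    providers: Vec<String>,
    channels: Vec<String>,
}

/// Which file calibration should inherit from (hypothetical type).
#[derive(Debug, PartialEq)]
enum Source {
    Current,
    Backup,
}

/// Assumed scoring: number of configured entries; 0 means effectively empty.
fn richness(cfg: &Config) -> usize {
    cfg.providers.len() + cfg.channels.len()
}

/// Prefer the current file whenever it is non-empty, even if the backup is richer.
/// Fall back to the backup only when current scores 0.
fn select_source(current: Option<&Config>, backup: Option<&Config>) -> Option<Source> {
    let current_score = current.map(richness).unwrap_or(0);
    let backup_score = backup.map(richness).unwrap_or(0);

    if current_score > 0 {
        Some(Source::Current)
    } else if backup_score > 0 {
        Some(Source::Backup)
    } else {
        None // neither file carries usable settings
    }
}
```

The old behavior corresponds to comparing `backup_score > current_score`. After a slimming edit that comparison always favors the backup, which is how deleted providers came back.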

Validation

  • Added Rust regression tests for select_calibration_source (richer-backup vs empty-current cases)
  • Full cargo test not run in this environment (missing GTK system deps); logic mirrors proven Windows gateway cleanup path

Files changed

  • src-tauri/src/commands/service.rs
  • src-tauri/src/commands/config.rs
  • scripts/dev-api.js

Linux cleanup_zombie_gateway_processes previously used fuser + kill -9 on
every PID bound to the gateway port, with no cmdline or /health checks.
This could terminate unrelated services or healthy Gateways during restart
timeouts. Align Linux with the Windows strategy: verify openclaw gateway
cmdline, retry /health, only kill unresponsive zombies, and adopt healthy
external instances.

Calibration inherit mode preferred the richer openclaw.json.bak over the
current file. Because every write copies the previous config to .bak,
intentional removals (providers/channels) could be resurrected on the next
calibration. Prefer current whenever it is non-empty; only fall back to
backup when current is effectively empty.

Add regression tests for calibration source selection and mirror the fix
in dev-api.js.

Co-authored-by: 晴天 <1186258278@users.noreply.github.com>
@1186258278

Contributor

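Absorbed into main through our own commit and released: 0a65ea7 / v0.18.5.

What it covers:

  • Linux Gateway cleanup now verifies the process command line and /health first, avoiding a blind fuser kill
  • Calibration inherit prefers the current non-empty config, so .bak no longer restores config the user deliberately deleted
  • Regression tests added on both the Web and Rust sides

CI and the release workflow have both passed. Release page: https://github.com/qingchencloud/clawpanel/releases/tag/v0.18.5

Keeping this draft PR open would cause duplication, so closing it for now.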

@1186258278

Contributor

Already absorbed and released on main as 0a65ea7 / v0.18.5; closing the duplicate draft PR.

@1186258278 1186258278 closed this Jun 15, 2026