fix: detect half-open WebSocket connections via read-idle watchdog #22
Merged
Add a read-idle watchdog to the streaming WS loop: any inbound frame (data, a server Ping, or the Pong replying to our 30s keepalive ping) resets a 90s deadline; if it elapses, the socket is treated as half-open (a zombie connection where the OS still reports ESTABLISHED but the peer is silently gone) and a reconnect is forced. This closes the gap where a silently dropped connection, common on mobile/Android with WiFi power-save or CGNAT, was never detected, because the read side blocked forever and pongs were swallowed by a catch-all arm.

Also jitter the reconnect backoff (Equal Jitter, [backoff/2, backoff]) to de-sync reconnects across accounts while keeping a floor, so a fast-failing connect can't spin into a hot retry loop.

Refs hitalin/notedeck#506, hitalin/notedeck#507

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
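The deadline bookkeeping behind the read-idle watchdog can be sketched without any async runtime. This is a minimal illustration, not the code in `src/streaming.rs`: the `ReadIdleWatchdog` type and its methods are hypothetical names, and only the two constants and their values come from this PR. In the real loop, the expiry check would presumably be driven by a timer racing against `read.next()`.

```rust
use std::time::{Duration, Instant};

// Values taken from the PR description.
const WS_PING_INTERVAL: Duration = Duration::from_secs(30);
const WS_READ_IDLE_TIMEOUT: Duration = Duration::from_secs(90);

// Compile-time check of the invariant the PR tests: timeout >= 3 x ping interval.
const _: () = assert!(WS_READ_IDLE_TIMEOUT.as_secs() >= 3 * WS_PING_INTERVAL.as_secs());

/// Hypothetical helper: tracks the instant by which some inbound frame must arrive.
struct ReadIdleWatchdog {
    deadline: Instant,
}

impl ReadIdleWatchdog {
    fn new(now: Instant) -> Self {
        Self { deadline: now + WS_READ_IDLE_TIMEOUT }
    }

    /// Call for every inbound frame (data, server Ping, or Pong): pushes the deadline out.
    fn on_frame(&mut self, now: Instant) {
        self.deadline = now + WS_READ_IDLE_TIMEOUT;
    }

    /// True once the deadline has been reached with no inbound frame in between.
    fn is_expired(&self, now: Instant) -> bool {
        now >= self.deadline
    }
}

fn main() {
    let start = Instant::now();
    let mut watchdog = ReadIdleWatchdog::new(start);

    // A Pong arriving 60s in moves the deadline from 90s to 150s.
    watchdog.on_frame(start + Duration::from_secs(60));
    assert!(!watchdog.is_expired(start + Duration::from_secs(100)));

    // No further frames: the connection counts as half-open from 150s on,
    // at which point the caller would force a reconnect.
    assert!(watchdog.is_expired(start + Duration::from_secs(150)));
    println!("watchdog sketch ok");
}
```

Passing `now` in explicitly keeps the logic deterministic and easy to test; resetting on any frame rather than only on Pong means ordinary traffic also keeps the connection alive.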
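Overview
Foundational fix for the problem of persistent WebSocket connections dropping on Android (hitalin/notedeck#506 / #507). Design details are in hitalin/notedeck#640 §B / §E.

Root cause
The read side of `ws_loop` had no liveness detection, so half-open (zombie) connections could not be detected:
- Inbound Pongs were discarded (swallowed by the catch-all `_ => {}`).
- `read.next()` blocked indefinitely → even when the server silently stopped sending, nothing surfaced until the OS TCP retransmission timeout (on the order of minutes).

Browsers and PWAs hide this because their built-in WebSocket handles liveness for them, but notecli manages a raw WebSocket itself, so the problem surfaced on Android (WiFi power-save / CGNAT).

Changes (`src/streaming.rs` only)
- Read-idle watchdog: `Some(Ok(_))` (data / server Ping / the Pong answering our own ping), that is, any inbound frame, resets a 90s deadline. When the deadline elapses, the loop returns `WsExitReason::Disconnected` and joins the existing exponential-backoff reconnect path. The catch-all `_ => {}` is kept.
- `WS_READ_IDLE_TIMEOUT = 90s` (3× `WS_PING_INTERVAL = 30s`; tolerates up to two missed Pongs).
- This relies on the assumption that the server side (`ws` library autoPong) answers a client ping with a Pong. Because the deadline is reset on every `Some(Ok(_))`, normal traffic also extends it, which makes false disconnects unlikely.
- Reconnect backoff jitter: the delay is randomized within `[backoff/2, backoff]` (`backoff_secs` itself is not written back). This de-syncs simultaneous reconnects across multiple accounts while the floor prevents a hot retry loop on fast-fail.

Tests
- The jittered delay stays within `[backoff/2, backoff]` (1000 samples × 6 backoff levels; this also guards against a `from_secs_f64` panic).
- The invariant `WS_READ_IDLE_TIMEOUT >= WS_PING_INTERVAL * 3` holds.

build / clippy `-D warnings` / test were verified locally under the same conditions as CI (`--no-default-features`).

Notes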
Refs hitalin/notedeck#506, hitalin/notedeck#507
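
The Equal Jitter rule and its range test can be illustrated as follows. This is a sketch under stated assumptions rather than the PR's implementation: `equal_jitter`, `XorShift64`, and the doubling schedule 1s to 32s are hypothetical, and the random source is a tiny inline generator so the example needs no external crate. The clamp reflects the `from_secs_f64` panic concern mentioned in the tests, since that constructor panics on negative or non-finite input.

```rust
use std::time::Duration;

/// Hypothetical helper: Equal Jitter over [backoff/2, backoff].
/// `unit` is a random sample expected in [0.0, 1.0]; it is sanitized so that
/// `Duration::from_secs_f64` never receives a negative or non-finite value.
fn equal_jitter(backoff_secs: u64, unit: f64) -> Duration {
    let unit = if unit.is_finite() { unit.clamp(0.0, 1.0) } else { 1.0 };
    let half = backoff_secs as f64 / 2.0;
    Duration::from_secs_f64(half + half * unit)
}

/// Minimal xorshift64 generator, used here only to avoid a dependency.
/// The seed must be non-zero.
struct XorShift64(u64);

impl XorShift64 {
    /// Returns a sample in [0.0, 1.0).
    fn next_unit(&mut self) -> f64 {
        self.0 ^= self.0 << 13;
        self.0 ^= self.0 >> 7;
        self.0 ^= self.0 << 17;
        (self.0 >> 11) as f64 / (1u64 << 53) as f64
    }
}

fn main() {
    let mut rng = XorShift64(0x9E37_79B9_7F4A_7C15);
    // Assumed doubling schedule with six levels: 1, 2, 4, 8, 16, 32 seconds.
    let mut backoff_secs: u64 = 1;
    for _ in 0..6 {
        for _ in 0..1000 {
            // Only the sleep delay is jittered; backoff_secs is left untouched.
            let delay = equal_jitter(backoff_secs, rng.next_unit());
            assert!(delay >= Duration::from_secs_f64(backoff_secs as f64 / 2.0));
            assert!(delay <= Duration::from_secs(backoff_secs));
        }
        backoff_secs *= 2;
    }
    println!("all samples within [backoff/2, backoff]");
}
```

Keeping the lower bound at `backoff/2` rather than 0 is what preserves the floor: even the unluckiest sample still waits half the nominal backoff, so a connect that fails immediately cannot retry in a tight loop.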