fix(socket): follow HTTP 3xx during WebSocket handshake (OPENHUMAN-TAURI-9X)#1547
Conversation
|
No actionable comments were generated in the recent review. 🎉 ℹ️ Recent review info⚙️ Run configurationConfiguration used: Organization UI Review profile: CHILL Plan: Pro Run ID: 📒 Files selected for processing (1)
🚧 Files skipped from review as they are similar to previous changes (1)
📝 WalkthroughWalkthroughThis change introduces HTTP redirect following during WebSocket upgrade and persistent error state tracking. The socket manager now stores user-visible connection warnings, exposing them via state queries and clearing them on connect/disconnect lifecycle events. WebSocket upgrade follows HTTP 301/302/307/308 redirects up to a configured hop limit, records a one-shot warning for followed permanent redirects, and pins the resolved URL for subsequent reconnect attempts. ChangesWebSocket Redirect and Error Handling
🎯 3 (Moderate) | ⏱️ ~20 minutes
🚥 Pre-merge checks | ✅ 5✅ Passed checks (5 passed)
✏️ Tip: You can configure your own custom pre-merge checks in the settings. Tip 💬 Introducing Slack Agent: The best way for teams to turn conversations into code.Slack Agent is built on CodeRabbit's deep understanding of your code, so your team can collaborate across the entire SDLC without losing context.
Built for teams:
One agent for your entire SDLC. Right inside Slack. Comment |
There was a problem hiding this comment.
Actionable comments posted: 1
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
Inline comments:
In `@src/openhuman/socket/ws_loop.rs`:
- Around line 191-192: The code clears shared.error immediately after
connect_async returns, which can drop a persisted warning before the EIO/SIO
handshake completes; instead, remove the early assignment that sets shared.error
to None right after connect_async and only clear shared.error in the
established/connected-state block (the existing place later where shared.error
is set to None after the handshake succeeds) so errors persist until the full
handshake completes; locate uses of shared.error and the connect_async result in
ws_loop.rs and delete/move the early clear to the
connected-state/handshake-success section.
🪄 Autofix (Beta)
Fix all unresolved CodeRabbit comments on this PR:
- Push a commit to this branch (recommended)
- Create a new PR with the fixes
ℹ️ Review info
⚙️ Run configuration
Configuration used: Organization UI
Review profile: CHILL
Plan: Pro
Run ID: dfc37142-e77a-4363-a808-e53da0e916bd
📒 Files selected for processing (5)
src/openhuman/socket/event_handlers.rssrc/openhuman/socket/manager.rssrc/openhuman/socket/types.rssrc/openhuman/socket/ws_loop.rssrc/openhuman/socket/ws_loop_tests.rs
…#1547) `tokio_tungstenite::connect_async` returns an error on any non-101 upgrade response, so an HTTP 3xx (typical when `BACKEND_URL` is plain `http://...` and Cloudflare 301s to `https://`) used to be fatal — the reconnect loop hammered the same dead URL at error level forever (OPENHUMAN-TAURI-9X, 20 occurrences in 4 minutes). Resolve the bug instead of muting it: - Track a mutable `ws_url` across the reconnect loop so the resolved target survives backoff iterations. - New `connect_with_redirects` walks up to 3 HTTP 3xx hops per attempt: resolves `Location` against the current URL (handles relative paths), upgrades `http`→`ws` / `https`→`wss`, and retries. Non-redirect failures fall through to the existing backoff path unchanged. - Surface a one-shot warning through `SharedState.error` → `SocketState.error` so the UI can prompt the user to update `BACKEND_URL` to the resolved target. Closes OPENHUMAN-TAURI-9X. Refs tinyhumansai#1472. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Adds focused unit coverage for the new helpers in `ws_loop.rs` (absolute / relative / unsupported Location handling, scheme upgrade, followable-status filter, one-shot warning) plus a driver-level test that stands up an HTTP 301 server in front of the existing Engine.IO mock and asserts the loop follows the redirect, completes the SIO handshake against the redirect target, and records the user-actionable warning in `SharedState.error`. A separate test exercises the no-Location failure mode. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
a6f5188 to
c1724c1
Compare
There was a problem hiding this comment.
Actionable comments posted: 3
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
Inline comments:
In `@src/openhuman/socket/ws_loop_tests.rs`:
- Around line 625-629: The test currently evaluates matches!(err,
WsError::Http(_)) as a no-op; change it to assert the variant explicitly by
asserting that err matches WsError::Http(_) (e.g., use assert!(matches!(err,
WsError::Http(_))) and include err in the assertion message for debugging).
Locate the call site around connect_with_redirects and the err binding in
ws_loop_tests.rs and replace the no-op matches! expression with an assert that
verifies the error is WsError::Http(_) (and prints err on failure).
In `@src/openhuman/socket/ws_loop.rs`:
- Around line 488-495: Only emit the stale-BACKEND_URL warning and call
record_redirect_warning when the redirect is permanent (301 or 308). Modify the
redirect-handling block around ws_url and response.status() so it checks for
permanent redirects (e.g., response.status() == StatusCode::MOVED_PERMANENTLY ||
response.status() == StatusCode::PERMANENT_REDIRECT or compare as_u16() ==
301/308) and only then log the "Server redirected ... instruct users to update
config" warning and call record_redirect_warning(shared, &original, &next_url);
for non-permanent statuses (302/307) still follow the redirect and update
*ws_url = next_url but do not emit the stale-BACKEND_URL warning.
- Around line 133-136: The code currently flattens a WsError to a string when
connect_with_redirects fails; instead return a structured failure so callers can
inspect retryability/severity. Modify the match to return a ConnectionOutcome
variant that carries the original WsError or a new ConnectionFailure struct
(e.g., change ConnectionOutcome::Failed(String) to
ConnectionOutcome::Failed(ConnectionFailure) or
ConnectionOutcome::Failed(WsError)), and add a helper
classify_connection_failure(err: WsError) -> ConnectionFailure that maps
WsError::{Http, Io, Tls, Url, Protocol, ...} into typed metadata (retryable,
severity, code). Update call sites that construct or pattern-match
ConnectionOutcome accordingly and use classify_connection_failure(connect_error)
rather than formatting the error to a string; keep connect_with_redirects and
WsError types intact.
🪄 Autofix (Beta)
Fix all unresolved CodeRabbit comments on this PR:
- Push a commit to this branch (recommended)
- Create a new PR with the fixes
ℹ️ Review info
⚙️ Run configuration
Configuration used: Organization UI
Review profile: CHILL
Plan: Pro
Run ID: 5a57fb8c-526a-481d-9b80-b06d7c2e816e
📒 Files selected for processing (4)
src/openhuman/socket/event_handlers.rssrc/openhuman/socket/manager.rssrc/openhuman/socket/ws_loop.rssrc/openhuman/socket/ws_loop_tests.rs
✅ Files skipped from review due to trivial changes (1)
- src/openhuman/socket/event_handlers.rs
🚧 Files skipped from review as they are similar to previous changes (1)
- src/openhuman/socket/manager.rs
| let ws_stream = match connect_with_redirects(ws_url, shared).await { | ||
| Ok(stream) => stream, | ||
| Err(e) => return ConnectionOutcome::Failed(format!("WebSocket connect: {e}")), | ||
| }; |
There was a problem hiding this comment.
Preserve structured connect failure metadata instead of flattening to string.
At Line 135, WsError is converted to a plain string immediately, which blocks downstream retryable/non-retryable severity handling from using concrete error shape.
💡 Suggested direction
- let ws_stream = match connect_with_redirects(ws_url, shared).await {
+ let ws_stream = match connect_with_redirects(ws_url, shared).await {
Ok(stream) => stream,
- Err(e) => return ConnectionOutcome::Failed(format!("WebSocket connect: {e}")),
+ Err(e) => return ConnectionOutcome::Failed(classify_connection_failure(e)),
};// Example: keep typed metadata for log-level/routing decisions
fn classify_connection_failure(err: WsError) -> ConnectionFailure {
// map WsError::{Http,Io,Tls,Url,Protocol,...} -> typed failure + retryability
}🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
In `@src/openhuman/socket/ws_loop.rs` around lines 133 - 136, The code currently
flattens a WsError to a string when connect_with_redirects fails; instead return
a structured failure so callers can inspect retryability/severity. Modify the
match to return a ConnectionOutcome variant that carries the original WsError or
a new ConnectionFailure struct (e.g., change ConnectionOutcome::Failed(String)
to ConnectionOutcome::Failed(ConnectionFailure) or
ConnectionOutcome::Failed(WsError)), and add a helper
classify_connection_failure(err: WsError) -> ConnectionFailure that maps
WsError::{Http, Io, Tls, Url, Protocol, ...} into typed metadata (retryable,
severity, code). Update call sites that construct or pattern-match
ConnectionOutcome accordingly and use classify_connection_failure(connect_error)
rather than formatting the error to a string; keep connect_with_redirects and
WsError types intact.
…I-9X, -7N) Per CodeRabbit minor review on PR tinyhumansai#1547: the bare `matches!()` expression at ws_loop_tests.rs:628 evaluated to a discarded bool — the test would silently pass if the redirect-without-Location path ever started returning a different error variant. Wrap in `assert!(...)` so the expected `WsError::Http(_)` branch is enforced. `cargo test --lib openhuman::socket::ws_loop` — 27/27 pass. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
graycyrus
left a comment
There was a problem hiding this comment.
PR Review — fix(socket): follow HTTP 3xx during WebSocket handshake
Overall: Well-targeted fix for a real Sentry regression. The redirect-following approach is correct, the test coverage is solid (9 new tests + E2E), and the SharedState.error lifecycle is properly wired. A few items below worth addressing before merge — see inline comments.
Verified / looks good
SharedState.errorbehindparking_lot::RwLock— no TOCTOU, correct concurrent accessresolve_redirect_targetusesurl::Url::joinfor RFC-compliant relative Location handlingis_redirect_statuscorrectly limits to 301/302/307/308extract_location_headergracefully drops non-UTF-8 valueserrorfield cleared on bothconnect()anddisconnect()— no stale errors- All new tests are deterministic (local TCP listeners, no real network)
MAX_REDIRECT_HOPS = 3is sensible for the documented Cloudflare single-hop case
| next_url | ||
| ); | ||
| record_redirect_warning(shared, &original, &next_url); | ||
| *ws_url = next_url; |
There was a problem hiding this comment.
[major] 302/307 redirects should not pin the resolved URL.
*ws_url = next_url runs for ALL redirect types (301/302/307/308), but RFC semantics say 302/307 are temporary — the client should re-resolve each attempt. Only 301/308 should mutate ws_url in place.
This also interacts with the CodeRabbit comment about only emitting the stale-BACKEND_URL warning for permanent redirects — both the URL pinning and the warning should be gated on permanent status codes.
// suggestion: only pin permanent redirects
let is_permanent = matches!(
response.status(),
StatusCode::MOVED_PERMANENTLY | StatusCode::PERMANENT_REDIRECT
);
if is_permanent {
*ws_url = next_url.clone();
}
record_redirect_warning(shared, &original, &next_url);
// For 302/307, use next_url for this hop but do not mutate ws_url| break; | ||
| } | ||
| tokio::time::sleep(tokio::time::Duration::from_millis(50)).await; | ||
| } |
There was a problem hiding this comment.
[major] Sleep-loop polling can produce false failures under load.
If the mock server takes longer than 2.5s (50 × 50ms), the loop exits without hitting Connected and the assert fires a confusing failure. Better to assert inside the loop:
let deadline = tokio::time::Instant::now() + tokio::time::Duration::from_secs(5);
loop {
if *shared.status.read() == ConnectionStatus::Connected {
break;
}
assert!(
tokio::time::Instant::now() < deadline,
"timed out waiting for ConnectionStatus::Connected after redirect follow"
);
tokio::time::sleep(tokio::time::Duration::from_millis(50)).await;
}| "[socket] Exceeded {MAX_REDIRECT_HOPS} redirect hops starting from {original}; giving up" | ||
| ); | ||
| return Err(WsError::Http(response)); | ||
| } |
There was a problem hiding this comment.
[minor] unreachable!() correctness is non-obvious to readers.
The reasoning is correct (the hop == MAX_REDIRECT_HOPS guard returns before the loop exits), but a brief inline proof would help future editors:
// The for-loop covers 0..=MAX_REDIRECT_HOPS (inclusive). On the final
// iteration (hop == MAX_REDIRECT_HOPS) the redirect branch returns Err
// above. Non-redirect errors and Ok(_) also return immediately.
unreachable!("connect_with_redirects exited loop without returning")| shared: &Arc<SharedState>, | ||
| ) -> Result<WsStream, WsError> { | ||
| let original = ws_url.clone(); | ||
| for hop in 0..=MAX_REDIRECT_HOPS { |
There was a problem hiding this comment.
[minor] No debug log on successful redirect-follow path.
Every error/warn path logs, but when connect_async succeeds after following hops, there is no confirmation that we connected to the resolved URL. Makes "why did the socket connect to a different host than configured" hard to diagnose.
Ok((stream, _response)) => {
if *ws_url != original {
log::debug!(
"[socket] Connected to redirect target {} (original: {})",
ws_url, original
);
}
return Ok(stream);
}…cts (OPENHUMAN-TAURI-9X) Per CodeRabbit major review on PR tinyhumansai#1547: the previous code called record_redirect_warning unconditionally for every followed redirect (301/302/307/308). The warning instructs users to update their BACKEND_URL config, which is correct for 301 / 308 (Permanent / Permanent Redirect) but misleading for 302 / 307 (Found / Temporary Redirect) — those say "this time, route elsewhere" and don't imply the configured URL is stale. Gate `record_redirect_warning` on the permanent statuses only. The inline `log::warn!` that records the redirect itself still fires for all four statuses so the operator can still see what happened. `cargo test --lib openhuman::socket` — 70/70 pass. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Summary
tokio_tungstenite::connect_asyncdoesn't follow HTTP redirects, so a stalehttp://BACKEND_URLbehind a Cloudflare-style edge that 301s tohttps://used to be permanently fatal — the reconnect loop hammered the same dead URL at error level forever (20 occurrences in 4 minutes against a single user).Location, upgrades the scheme (http→ws,https→wss), and reconnects, pinning the resolved URL across reconnects so subsequent backoff iterations skip the redirect round-trip.SocketState.errorso the UI/observer can flag thatBACKEND_URLis stale — keeps the actionable signal without the recurring error noise.log::error!+ exponential backoff unchanged — nothing is silenced.Problem
OPENHUMAN-TAURI-9X(WebSocket connect: HTTP error: 301 Moved Permanently, 20 events in 4 minutes) was emitted fromsrc/openhuman/socket/ws_loop.rs:62every retry attempt.api.tinyhumans.aidirectly:curl -I http://api.tinyhumans.ai/socket.io/...returnsHTTP/1.1 301 Moved Permanently → Location: https://api.tinyhumans.ai:443/socket.io/?EIO=4&transport=websocket. Any client whoseBACKEND_URLishttp://(or whose desktop config drifted there) is gated on this 301 forever — Cloudflare advertises an HTTPS-only socket endpoint.tokio-tungstenite0.24'sconnect_asyncreturnsWsError::Httpfor any non-101 upgrade response; there is no built-in redirect-follower. The loop's only knob is exponential backoff, which never recovers from a permanent redirect.IO error: Operation timed out, 1 event) is a true transient — the existing backoff path handles it. No demotion needed; it'll fall back toerror!only when retries exhaust, which is correct.Solution
ws_loopnow holds a mutablews_url: Stringthat survives backoff iterations. After a successful redirect-follow on attempt N, attempt N+1 connects directly to the resolved URL with no extra round-trip.connect_with_redirectshelper inws_loop.rs:MAX_REDIRECT_HOPS = 3per attempt.Locationagainst the current URL viaurl::Url::joinso relative paths (RFC 7230) work the same as absolute ones.http→ws,https→wss; existingws/wsspass through; anything else surfaces an error).WsStreamon success or the underlyingWsErroron failure — non-redirect errors fall through unchanged.SharedStategrows anerror: RwLock<Option<String>>field exposed viaSocketState.error(was previously hard-coded toNone). Recorded once per connect attempt the first time we follow a redirect, cleared onconnect()entry and ondisconnect(), persisted across successful handshakes so the user still sees the "your config is stale" hint after the loop recovers.main.Submission Checklist
#[test]s covering scheme upgrade, relative Location, unsupported scheme, status-filter, header extraction, one-shot warning recording, full end-to-end 301-then-handshake via a local mock, and the missing-Location failure mode.diff-cover) meet the gate enforced by.github/workflows/coverage.yml. Runpnpm test:coverageandpnpm test:rustlocally; PRs below 80% on changed lines will not merge.## Related— N/A: no feature matrix ID applies.docs/RELEASE-MANUAL-SMOKE.md) — N/A: observability-only core socket change, no manual smoke flow.Closes #NNNin the## Relatedsection — Sentry issue ID below; meta-tracker Track and fix active Sentry issues #1472 remains open for the broader sweep.Impact
BACKEND_URLis configured (or has drifted to)http://now connect successfully on the first attempt instead of looping at error level.SocketState.errorwill carry a "Backend redirected … — update BACKEND_URL" warning until nextconnect()/disconnect(), so observers (UI / Sentry breadcrumbs / state subscribers) can surface the stale-config signal.error!and feed the existing backoff — nothing is silenced.Related
AI Authored PR Metadata (required for Codex/Linear PRs)
Linear Issue
Commit & Branch
fix/ws-transient-noisec1724c18712c19a08496cd055efb011299684dfaValidation Run
pnpm --filter openhuman-app format:check— N/A: no frontend changes.pnpm typecheck— N/A: no frontend changes.cargo test --lib openhuman::socket— 70 tests passed (was 60 onmain; +10 covering new redirect behavior + 1 fixture-error test onSocketManager).cargo fmt --checkclean on changed files;cargo check --libclean (only pre-existing unrelated warnings).Validation Blocked
command:cargo clippy --workspace -- -D warningserror:80 pre-existing clippy errors onmainacross unrelated modules (rand deprecations, type_complexity, derivable_impls, manual_is_multiple_of, redundant_guards, etc.). Filtering clippy output to the changed socket files surfaces zero issues.impact:Socket files compile clean under clippy; full-workspace-D warningsgate is pre-existing breakage.Behavior Changes
SocketState.error.BACKEND_URL(e.g.http://instead ofhttps://) now connect successfully on first try instead of looping at error level forever;SocketState.errorreports "Backend redirected … — update BACKEND_URL" until next disconnect.Parity Contract
error!-level logging for actionable failures (URL/protocol/4xx/TLS).MAX_REDIRECT_HOPScaps loop length; non-redirect errors fall through to existing code paths; missing-Location case is rejected and surfaces the underlying HTTP error to the reconnect loop.Duplicate / Superseded PR Handling
Summary by CodeRabbit
New Features
Tests