fix: stabilize minitun registration liveness + v5 query protocol#14
Conversation
… TCP passthrough
Wire protocol v5 adds Intent::Query (5) for agent-side status queries,
eliminating the need to open a tunneled TCP connection just to fetch
Minecraft server list JSON.
Server -> Agent: QueryRequestV5 (kind=6)
session[32] + protocol_version[i32] + server_address + target
Agent -> Server: Intent::Query (AgentHello with query payload)
session[32] + json_len[u16] + json[bytes]
Agent flow:
1. Receive QueryRequestV5 from server via listener channel
2. Open TCP to backend target, send MC Handshake (status) +
StatusRequest
3. Decode StatusResponse JSON
4. Open new connection back to ingress, send Intent::Query hello
with session + JSON
5. Server matches session via pending_query map, returns JSON to
proxy fast-path
Server fallback:
Agents < v5 use the existing tunnel TCP status path unchanged
(ForwardRequest + Connect + TargetAddr round-trip). The
try_tunnel_status_query_fast_path() method returns None for
legacy agents, falling through to open_tunnel_status_connection().
Key invariants:
- No TCP passthrough for status queries on v5 agents
- Ping/pong handled locally at proxy (not forwarded)
- Query session tokens are separate from tunnel session tokens
- HMAC validation on Query intent uses session for replay auth
|
@CodeRabbit review |
✅ Actions performedReview triggered.
|
|
ℹ️ Recent review info⚙️ Run configurationConfiguration used: Organization UI Review profile: CHILL Plan: Pro Plus Run ID: 📒 Files selected for processing (1)
🚧 Files skipped from review as they are similar to previous changes (1)
📝 WalkthroughWalkthroughWire protocol bumped to v5 with Intent::Query and query payloads; registry gains pending-query state and APIs; proxy adds a v5+ fast path to request status JSON from compatible agents; agents can perform backend status queries and reply with AgentHello(Intent::Query, json). Tests and binaries adjusted accordingly, nya~. ChangesTunnel Status Query Protocol v5
🎯 4 (Complex) | ⏱️ ~45 minutes Sequence Diagram(s) sequenceDiagram
participant Client
participant Proxy
participant Registry
participant Agent
participant Backend
Client->>Proxy: status request (route uses tunnel)
Proxy->>Registry: request_query_from_agent(session,...)
Registry->>Agent: ServerMsg::QueryRequestV5
Agent->>Backend: Handshake(status) / StatusRequest
Backend-->>Agent: StatusResponseS2c(json)
Agent->>Registry: AgentHello(Intent::Query, json)
Registry-->>Proxy: TunnelStatusResponse(json)
Proxy-->>Client: status JSON
Possibly related PRs
🚥 Pre-merge checks | ✅ 4 | ❌ 1❌ Failed checks (1 warning)
✅ Passed checks (4 passed)
✏️ Tip: You can configure your own custom pre-merge checks in the settings. ✨ Finishing Touches📝 Generate docstrings
🧪 Generate unit tests (beta)
Tip 💬 Introducing Slack Agent: The best way for teams to turn conversations into code.Slack Agent is built on CodeRabbit's deep understanding of your code, so your team can collaborate across the entire SDLC without losing context.
Built for teams:
One agent for your entire SDLC. Right inside Slack. Comment |
There was a problem hiding this comment.
Actionable comments posted: 7
Caution
Some comments are outside the diff and can’t be posted inline due to platform limitations.
⚠️ Outside diff range comments (1)
tun/src/lib.rs (1)
173-193:⚠️ Potential issue | 🟠 Major | ⚡ Quick winReject
Intent::Queryon pre-v5 frames.Nuh-uh — right now
version = 4plusintent = Querystill decodes, and the encoder never rejects that combo either. That breaks the v5 compatibility boundary and can make fallback behavior lie. Please gateQuerybehindversion >= VERSIONin both decode and encode paths.Possible fix
if version < V3_VERSION && matches!(intent, Intent::Forward) { return Err(TunnelError::UnsupportedVersion(version)); } if version < V4_VERSION && matches!(intent, Intent::Beacon) { return Err(TunnelError::UnsupportedVersion(version)); } + if version < VERSION && matches!(intent, Intent::Query) { + return Err(TunnelError::UnsupportedVersion(version)); + }Intent::Query => { + if hello.version < VERSION { + return Err(TunnelError::UnsupportedVersion(hello.version)); + } if hello.session.is_some() || hello.forward.is_some() || hello.query.is_none() { return Err(TunnelError::InvalidIntent(hello.intent as u8)); } }Also applies to: 297-324
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@tun/src/lib.rs` around lines 173 - 193, The decode path currently allows Intent::Query for pre-v5 frames — add a version gate like the existing checks for Intent::Forward and Intent::Beacon: after Intent::from_u8(buf[5]) check if version < VERSION and matches!(intent, Intent::Query) and return Err(TunnelError::UnsupportedVersion(version)); also apply the same guard in the encode path (the code around the opposite branch ~lines 297-324) so that the encoder rejects writing Intent::Query when version < VERSION; reference Intent::Query, VERSION, and TunnelError::UnsupportedVersion when making the checks.
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
Inline comments:
In `@src/proxy/mod.rs`:
- Around line 1240-1250: The Intent::Query branch currently trusts hello.query
and calls self.tunnels.accept_query_response(SessionToken(query.session), ...)
without authenticating hello.hmac; validate the HMAC (using hello.key_id,
hello.timestamp, and hello.hmac) against the query payload/session before
calling accept_query_response. Locate the Intent::Query match arm handling
hello.query and perform the same HMAC/key/timestamp verification routine used
elsewhere (or extract that logic into a helper) to verify the signature for
query.session, and only on successful verification proceed to call
accept_query_response; return an error (anyhow::bail or propagate the
verification error) if validation fails.
- Around line 1077-1085: The fast-path uses tunnel_forward_addr() when calling
tunnels.request_query_from_agent, which sends the proxy socket address instead
of the original requested host/port; change the call so QueryRequestV5 (via
request_query_from_agent) receives the original host string and port from
handshake.server_address and handshake.server_port (or construct the host:port
string the agent expects) instead of tunnel_forward_addr(); update the argument
passed to request_query_from_agent and any code constructing QueryRequestV5 so
it preserves handshake.server_address/handshake.server_port for
hostname-routed/status-personalized backends.
In `@tun/src/bin/tun.rs`:
- Around line 47-49: The before_send_log closure passed to SentryLogger filters
only LogLevel::Fatal which never occurs, silencing all logs; change the filter
in before_send_log (the Arc::new(|log| ... ) closure) to allow Error (and
optionally Warn) levels instead of Fatal so Sentry receives real errors—e.g.,
match on LogLevel::Error (or LogLevel::Error | LogLevel::Warn) and return
Some(log) for those cases, None otherwise.
- Around line 24-26: Replace the eprintln call that prints the error from
tun::client::run_cli() with a call to sentry::capture_error(&err) so Sentry
receives the full anyhow::Error chain and backtrace; specifically, inside the
Err(err) branch where you currently call eprintln!("{err}") and then
drop(sentry), call sentry::capture_error(&err) (then drop the sentry guard as
before) to preserve full error context in reporting.
In `@tun/src/client/runtime.rs`:
- Around line 325-413: The handle_query task can hang indefinitely; wrap the
entire backend interaction and subsequent agent send in a timeout (e.g., using
tokio::time::timeout) so the function fails fast on slow backends. Concretely,
put the async sequence that includes net::sock::LureConnection::connect(...),
the handshake/StatusRequest encode/write, the read loop that calls
target_conn.read_chunk(...), and the agent send flow (crate::connect_agent(...)
and send_agent_hello(...)) inside a timeout future and return an error if it
elapses; prefer a configurable duration from TunConfig (add a field like
query_timeout if needed) and ensure you propagate a clear anyhow::Error on
timeout from handle_query.
In `@tun/src/lib.rs`:
- Around line 352-358: In the Intent::Query handling (the block that reads
hello.query and calls u16::try_from on query.json.len()), remove the
expect/panic paths and instead validate the JSON length against u16::MAX (or use
u16::try_from and map the Err to a protocol error), returning an appropriate
protocol error Result rather than panicking; do the same for the other similar
block around the 401-407 range so oversized query.json payloads produce a clean
protocol error response instead of crashing the task.
---
Outside diff comments:
In `@tun/src/lib.rs`:
- Around line 173-193: The decode path currently allows Intent::Query for pre-v5
frames — add a version gate like the existing checks for Intent::Forward and
Intent::Beacon: after Intent::from_u8(buf[5]) check if version < VERSION and
matches!(intent, Intent::Query) and return
Err(TunnelError::UnsupportedVersion(version)); also apply the same guard in the
encode path (the code around the opposite branch ~lines 297-324) so that the
encoder rejects writing Intent::Query when version < VERSION; reference
Intent::Query, VERSION, and TunnelError::UnsupportedVersion when making the
checks.
🪄 Autofix (Beta)
Fix all unresolved CodeRabbit comments on this PR:
- Push a commit to this branch (recommended)
- Create a new PR with the fixes
ℹ️ Review info
⚙️ Run configuration
Configuration used: Organization UI
Review profile: CHILL
Plan: Pro Plus
Run ID: 6f42ba0d-c2d1-4cfb-9727-c50ebc7edf14
📒 Files selected for processing (9)
src/bin/instance_tcp_test.rssrc/proxy/mod.rssrc/tunnel/mod.rssrc/tunnel/registry.rstests/tunnel_integration.rstun/src/bin/tun.rstun/src/client/commands.rstun/src/client/runtime.rstun/src/lib.rs
| pub async fn request_query_from_agent( | ||
| &self, | ||
| key_id: TokenKeyId, | ||
| session: SessionToken, | ||
| protocol_version: i32, | ||
| server_address: SocketAddr, | ||
| target: SocketAddr, | ||
| ) -> anyhow::Result<oneshot::Receiver<TunnelStatusResponse>> { | ||
| { | ||
| let tokens = self.tokens.read().await; | ||
| if !tokens.contains_key(&key_id) { | ||
| anyhow::bail!("tunnel token not registered for key_id"); | ||
| } | ||
| } | ||
|
|
||
| let agent_tx = { self.agents.read().await.get(&key_id).map(|a| a.tx.clone()) }; | ||
| let Some(agent_tx) = agent_tx else { | ||
| anyhow::bail!("no active tunnel agent registered for key_id"); | ||
| }; | ||
|
|
||
| let (tx, rx) = oneshot::channel(); | ||
| self.pending_query.write().await.insert(session, tx); | ||
|
|
||
| if agent_tx | ||
| .send(TunnelCommand::QueryRequest { | ||
| session, | ||
| protocol_version, | ||
| server_address, | ||
| target, | ||
| }) | ||
| .await | ||
| .is_ok() | ||
| { | ||
| Ok(rx) | ||
| } else { | ||
| self.pending_query.write().await.remove(&session); | ||
| anyhow::bail!("failed to notify tunnel agent for query") | ||
| } | ||
| } |
There was a problem hiding this comment.
Timed-out queries never leave pending_query.
These entries are only removed on a successful accept_query_response(). If the proxy times out first or the agent-side task dies, the sender just sits there forever — yep, that's an unbounded leak on repeated query failures. Please add a cancel/expiry path for pending_query and use it on timeout/drop.
Also applies to: 469-481
| if let Err(err) = tun::client::run_cli() { | ||
| capture_sentry_error("minitun_fatal", "tunnel_client", &err); | ||
| error!("{err}"); | ||
| eprintln!("{err}"); | ||
| drop(sentry); |
There was a problem hiding this comment.
🧩 Analysis chain
🌐 Web query:
For sentry crate 0.48.0, what is the recommended API to report a CLI runtime error object (std::error::Error/anyhow::Error) before process exit?
💡 Result:
For the sentry crate version 0.48.0, the recommended API to report a CLI runtime error object implementing std::error::Error (including anyhow::Error) before process exit is to use sentry::capture_error(&err) for std::error::Error or sentry_anyhow::capture_anyhow(&err) for anyhow::Error [1][2][3][4]. This captures the error as a Sentry event, including the full error chain [3]. To ensure the event is sent before process exit (critical for short-lived CLI processes), initialize Sentry with let _guard = sentry::init((dsn, sentry::ClientOptions { .. })); at the start of main. The guard's Drop implementation flushes pending events with a default 2-second timeout on drop [5][6][7]. Keep the guard in scope until after error reporting and just before std::process::exit (or let it drop naturally at end of main), allowing time for transmission. Example for std::error::Error: fn main() { let _guard = sentry::init("your_dsn"); // CLI logic if let Err(err) = run_cli() { eprintln!("Error: {}", err); sentry::capture_error(&err); // Optional: explicit flush if needed if let Some(client) = sentry::Hub::current().client() { client.close(Some(std::time::Duration::from_secs(2))); } std::process::exit(1); } } For anyhow::Error, enable the "anyhow" feature and use capture_anyhow [8][4]. The sentry-anyhow crate is at v0.48.0/0.48.1, compatible with sentry 0.48.0 [8]. Explicit close provides more control but disables the client afterward [7]. The guard's automatic flush is usually sufficient for CLIs [5].
Citations:
- 1: https://docs.sentry.io/platforms/rust/usage/
- 2: https://docs.sentry.io/platforms/rust/usage
- 3: https://docs.rs/sentry/latest/sentry/fn.capture_error.html
- 4: https://docs.rs/sentry-anyhow/latest/sentry_anyhow/
- 5: https://docs.rs/crate/sentry/0.46.0
- 6: https://crates.io/crates/sentry/0.47.0
- 7: https://docs.sentry.io/platforms/rust/configuration/draining/
- 8: https://crates.io/crates/sentry-anyhow
🏁 Script executed:
# First, find the return type of tun::client::run_cli()
fd -t f "\.rs$" tun/src | head -20Repository: illnet/mach
Length of output: 211
🏁 Script executed:
# Check Cargo.toml for sentry dependencies
find . -name "Cargo.toml" -type f | xargs grep -l "sentry"Repository: illnet/mach
Length of output: 85
🏁 Script executed:
# Check the tun.rs main function and surrounding context
cat -n tun/src/bin/tun.rs | head -50Repository: illnet/mach
Length of output: 2013
🏁 Script executed:
# Find the run_cli function signature
rg -A 5 "fn run_cli" tun/src/Repository: illnet/mach
Length of output: 365
🏁 Script executed:
# Check tun/Cargo.toml for dependencies
cat tun/Cargo.tomlRepository: illnet/mach
Length of output: 867
🏁 Script executed:
# Check if sentry::capture_error is used elsewhere in codebase
rg "capture_error|capture_message" tun/src/ --type rustRepository: illnet/mach
Length of output: 37
🏁 Script executed:
# Check sentry::Error or anyhow in the code
rg "sentry::" tun/src/ --type rust -A 2 | head -40Repository: illnet/mach
Length of output: 1563
Use sentry::capture_error(&err) instead—it preserves the full error chain.
Yeah, you caught it. Line 25 just yeets the error to stderr without telling Sentry, which is a visibility regression in production. The suggested capture_message approach works, but it's throwing away the error chain by converting to a string. Since anyhow::Error implements std::error::Error, you can call sentry::capture_error(&err) directly with the current sentry 0.48.0 setup to capture the whole backtrace and context—no additional dependencies needed.
Better fix
if let Err(err) = tun::client::run_cli() {
+ sentry::capture_error(&err);
eprintln!("{err}");
drop(sentry);
std::process::exit(1);
}The guard is already dropped after, so flushing timing is fine.
📝 Committable suggestion
‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.
| if let Err(err) = tun::client::run_cli() { | |
| capture_sentry_error("minitun_fatal", "tunnel_client", &err); | |
| error!("{err}"); | |
| eprintln!("{err}"); | |
| drop(sentry); | |
| if let Err(err) = tun::client::run_cli() { | |
| sentry::capture_error(&err); | |
| eprintln!("{err}"); | |
| drop(sentry); | |
| std::process::exit(1); | |
| } |
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
In `@tun/src/bin/tun.rs` around lines 24 - 26, Replace the eprintln call that
prints the error from tun::client::run_cli() with a call to
sentry::capture_error(&err) so Sentry receives the full anyhow::Error chain and
backtrace; specifically, inside the Err(err) branch where you currently call
eprintln!("{err}") and then drop(sentry), call sentry::capture_error(&err) (then
drop the sentry guard as before) to preserve full error context in reporting.
| Intent::Query => { | ||
| let query = hello.query.as_ref().expect("validated above"); | ||
| out.extend_from_slice(&query.session); | ||
| let json = query.json.as_bytes(); | ||
| let len = u16::try_from(json.len()).expect("query json too long"); | ||
| out.extend_from_slice(&len.to_be_bytes()); | ||
| out.extend_from_slice(json); |
There was a problem hiding this comment.
Don't panic on oversized query JSON.
This JSON comes from an external MC status response, so it isn't something you get to trust, got it? If it exceeds u16::MAX, these expects panic the task instead of failing the request cleanly. Return a protocol error here so one huge status payload doesn't turn into a runtime crash.
Also applies to: 401-407
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
In `@tun/src/lib.rs` around lines 352 - 358, In the Intent::Query handling (the
block that reads hello.query and calls u16::try_from on query.json.len()),
remove the expect/panic paths and instead validate the JSON length against
u16::MAX (or use u16::try_from and map the Err to a protocol error), returning
an appropriate protocol error Result rather than panicking; do the same for the
other similar block around the 401-407 range so oversized query.json payloads
produce a clean protocol error response instead of crashing the task.
- HMAC validation on Intent::Query (accept_query_response) - fast-path uses original handshake server_address, not proxy addr - Sentry log filter: Error|Warning instead of Fatal - sentry::capture_error in error path - handle_query timeout (10s) - TunnelError::PayloadTooLarge variant, no panics on json.len() - version gate for Intent::Query in encode/decode - is_code_error filter: skip TunnelError/io::Error in Sentry
There was a problem hiding this comment.
Actionable comments posted: 1
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
Inline comments:
In `@tun/src/lib.rs`:
- Around line 119-124: QueryResponseV5Msg's json field is not currently covered
by the Intent::Query MAC, so include the response payload bytes in the
authenticated data: modify the signing path that builds the Intent::Query MAC to
append the utf-8 bytes (or canonical serialized bytes) of
QueryResponseV5Msg.json to the MAC input and update the verification in
accept_query_response() to verify the MAC over the same ordered fields plus
response.json; ensure you use a canonical serialization (exact byte order) for
json before MACing and update any constructors, serializers, and tests that
build or verify QueryResponseV5Msg/Intent::Query to use the new MAC input.
🪄 Autofix (Beta)
Fix all unresolved CodeRabbit comments on this PR:
- Push a commit to this branch (recommended)
- Create a new PR with the fixes
ℹ️ Review info
⚙️ Run configuration
Configuration used: Organization UI
Review profile: CHILL
Plan: Pro Plus
Run ID: 30f267f3-8c1e-447c-9407-df415549d890
📒 Files selected for processing (5)
src/proxy/mod.rssrc/tunnel/registry.rstun/src/bin/tun.rstun/src/client/runtime.rstun/src/lib.rs
🚧 Files skipped from review as they are similar to previous changes (1)
- tun/src/client/runtime.rs
| #[derive(Debug, Clone, PartialEq, Eq)] | ||
| /// Version 5 query response carrying backend status JSON. | ||
| pub struct QueryResponseV5Msg { | ||
| pub session: [u8; 32], | ||
| pub json: String, | ||
| } |
There was a problem hiding this comment.
Authenticate the query payload too.
This v5 response adds mutable json, but the authenticated fields still stop at the session/token metadata. So the server can verify who answered, y’know, but not whether query.json was tampered with on the wire. Right now a MITM or stale relay can swap the status body and still pass accept_query_response() because the HMAC never covers the JSON bytes. Please bind the response payload into the Intent::Query MAC (or move this path onto an authenticated transport) before shipping.
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
In `@tun/src/lib.rs` around lines 119 - 124, QueryResponseV5Msg's json field is
not currently covered by the Intent::Query MAC, so include the response payload
bytes in the authenticated data: modify the signing path that builds the
Intent::Query MAC to append the utf-8 bytes (or canonical serialized bytes) of
QueryResponseV5Msg.json to the MAC input and update the verification in
accept_query_response() to verify the MAC over the same ordered fields plus
response.json; ensure you use a canonical serialization (exact byte order) for
json before MACing and update any constructors, serializers, and tests that
build or verify QueryResponseV5Msg/Intent::Query to use the new MAC input.
|
@coderabbitai review |
✅ Actions performedReview triggered.
|
Summary
record_beacon()now correctly rejects beacons for unregisteredkey_idinstead of silently returningOk(())TunnelCommandqueue with safe drain + final affirmation on unregisterAgentRecord.taskfield and direct keepalive write path (now routed through the command queue)QueryRequestV5to agents, agent performs local MC status handshake+query against backend, returns JSON directly without opening a tunneled TCP streamProtocol changes (v5 breaking)
QueryRequestV5(kind=6)session[32] + protocol_version[i32] + server_address + targetIntent::Query(5)session[32] + json_len[u16] + json[bytes]Agents < v5 fall back to the existing tunnel TCP status path unchanged.
Files changed
src/tunnel/registry.rs— beacon validation, agent version getter, query request/response APIsrc/tunnel/mod.rs—TunnelCommand::QueryRequest,pending_querymap,TunnelStatusResponsesrc/proxy/mod.rs—try_tunnel_status_query_fast_path(),open_tunnel_status_connection()rebuild,Intent::Queryingress handlertun/src/lib.rs— v5 types, version bump, encode/decodetun/src/client/runtime.rs—handle_query(),QueryRequestV5dispatch, beacon reconnectiontun/src/client/commands.rs—Intent::Querymatch armTesting
cargo test --lib: 39 passedcargo test --lib -p tun: 54 passedcargo build: cleanSummary by CodeRabbit
New Features
Improvements
Bug Fixes
Tests