fix: make one-shot enumerate() retry transport-agnostic so Unifying partial drains recover#287
Conversation
The one-shot `enumerate()` free fn (CLI `list`/`diag`) retried only while a probed node reported `healthy == false`. A Unifying receiver defines `healthy = pairing_count.is_some()`, which stays true even when a device's arrival event lands after the 1.5s drain window and the device is dropped, so the retry never fired and the CLI returned a short device set. Separate the one-shot retry signal from per-node ledger health: add a `complete` flag reported alongside `healthy`. For Unifying, `complete` requires `paired.len() == pairing_count`, so a partial drain now triggers a retry while ledger replay (`healthy`) is unchanged. The retry loop stops on completeness, an unchanged inventory between consecutive attempts (so a chronically-short set stabilizes instead of burning every attempt), or the attempt cap. Add `PartialEq`/`Eq` derives to the inventory device structs so successive attempts can be diffed for the unchanged-stop fallback. Refs AprilNEA#277
Greptile SummaryThis PR fixes a silent device-drop in the one-shot
Confidence Score: 4/5Safe to merge; the core fix is correct and the watcher path is untouched. Core fix is correct and watcher path is untouched; the only concern is that arrival-event ordering for Unifying receivers is not canonicalized before the stability comparison, which can prevent the early exit from firing and cause the CLI to burn the full retry budget unnecessarily. crates/openlogi-hid/src/inventory.rs — specifically probe_unifying_receiver and the paired Vec ordering relative to the inventory equality check. Important Files Changed
Reviews (1): Last reviewed commit: "fix(hid): make one-shot enumerate retry ..." | Re-trigger Greptile |
| @@ -632,6 +688,7 @@ async fn probe_unifying_receiver( | |||
| paired, | |||
| }), | |||
| healthy, | |||
| complete, | |||
| outcomes, | |||
| } | |||
There was a problem hiding this comment.
Arrival-event ordering makes the stability check non-deterministic for Unifying
drain_device_arrival_unifying pushes events into out in the order they arrive over the HID channel; there is no sort. If a receiver sends events in a different slot order on two consecutive calls — e.g., slot 2 before slot 1 on the retry — the resulting paired vecs have the same devices but a different element order. Vec's PartialEq is element-wise, so previous == current returns false and the "stable healthy-but-short" early exit never fires even though the device set is identical. The loop then burns the full ONESHOT_ATTEMPTS budget instead of stopping after two passes, adding up to ~900 ms of unnecessary latency for every one-shot CLI call made against a Unifying receiver with offline paired devices. Sorting paired by slot in probe_unifying_receiver — or in enumerate_reporting_completeness before storing/comparing inventories — would make the comparison deterministic.
Summary
The one-shot
enumerate()free fn (used by the CLIlist/diagpaths) retries through a transient probe miss, but its early-exit never tripped for a Unifying receiver, so a late-arriving device was silently dropped and the CLI returned a short device set.The retry loop stopped on
all_healthy, the AND of each probed node'sprobe.healthy. The two receiver kinds definehealthydifferently: a Bolt receiver setshealthy = paired.len() == pairing_count(a missing device makes it false and the retry fires), while a Unifying receiver setshealthy = pairing_count.is_some()because offline Unifying devices aren't enumerable yet. A Unifying device whose arrival event lands just after the ~1.5s drain window is dropped, yetpairing_count.is_some()stays true, soall_healthystays true and the retry never fires. The continuous GUI/agent watcher is unaffected because it re-enumerates every ~2s.Fixes #277.
Why this matters
openlogi listandopenlogi diagare the CLI's primary read paths, and a one-shot caller builds a freshEnumeratorwith an empty ledger, so it has no prior snapshot to replay a transient miss against. The retry loop was added precisely to recover from that, but for a Unifying receiver it was a no-op: a device whose arrival event lands after the drain window is dropped whilepairing_count.is_some()keepsall_healthytrue, so the loop returns a short device set on the first pass. Users on Unifying hardware intermittently see fewer devices than are actually paired, with no recovery, while Bolt users (whosehealthyrides on the count) are unaffected.Approach
Make the one-shot early-exit transport-agnostic by separating the retry signal from per-node ledger health:
completeflag toNodeProbe, reported alongside the existinghealthy. For Unifying,completerequirespaired.len() == pairing_count, so a partial drain now reportscomplete = falseand the retry fires.healthy(which drives the ledger's per-node replay) is unchanged, so the watcher and the#218replay behavior are preserved.all_complete, on an unchanged inventory between two consecutive healthy passes (the expected stable Unifying offline-drain case, so a chronically-short set stabilizes instead of burning every attempt), or on the attempt cap. The unchanged-inventory shortcut is gated onall_healthyand the previous snapshot is only retained from a healthy pass, so a failed/timed-out/open-failed probe keeps using the full retry budget to reopen its channel and recover.PartialEq/Eqderives to the inventory device structs (BatteryInfo,ReceiverInfo,DeviceModelInfo,DeviceTransports,PairedDevice,DeviceInventory) so successive attempts can be diffed. These are additive derives only; field order is untouched, so the bincode wire format is unchanged (openlogi-agent-core/tests/wire_format.rsstill passes).As a side benefit, this also stops a chronically-unhealthy node (e.g. an asleep BT-direct device that never answers) from forcing every one-shot call to burn all attempts.
Tests
Added unit tests in
openlogi-core(structural equality across nested device fields) andopenlogi-hid:cargo fmt,cargo clippy, andcargo testpass across the workspace.