Skip to content

fix: make one-shot enumerate() retry transport-agnostic so Unifying partial drains recover#287

Open
mvanhorn wants to merge 1 commit into
AprilNEA:masterfrom
mvanhorn:fix/277-oneshot-enumerate-transport-agnostic
Open

fix: make one-shot enumerate() retry transport-agnostic so Unifying partial drains recover#287
mvanhorn wants to merge 1 commit into
AprilNEA:masterfrom
mvanhorn:fix/277-oneshot-enumerate-transport-agnostic

Conversation

@mvanhorn

Copy link
Copy Markdown
Contributor

Summary

The one-shot enumerate() free fn (used by the CLI list / diag paths) retries through a transient probe miss, but its early-exit never tripped for a Unifying receiver, so a late-arriving device was silently dropped and the CLI returned a short device set.

The retry loop stopped on all_healthy, the AND of each probed node's probe.healthy. The two receiver kinds define healthy differently: a Bolt receiver sets healthy = paired.len() == pairing_count (a missing device makes it false and the retry fires), while a Unifying receiver sets healthy = pairing_count.is_some() because offline Unifying devices aren't enumerable yet. A Unifying device whose arrival event lands just after the ~1.5s drain window is dropped, yet pairing_count.is_some() stays true, so all_healthy stays true and the retry never fires. The continuous GUI/agent watcher is unaffected because it re-enumerates every ~2s.

Fixes #277.

Why this matters

openlogi list and openlogi diag are the CLI's primary read paths, and a one-shot caller builds a fresh Enumerator with an empty ledger, so it has no prior snapshot to replay a transient miss against. The retry loop was added precisely to recover from that, but for a Unifying receiver it was a no-op: a device whose arrival event lands after the drain window is dropped while pairing_count.is_some() keeps all_healthy true, so the loop returns a short device set on the first pass. Users on Unifying hardware intermittently see fewer devices than are actually paired, with no recovery, while Bolt users (whose healthy rides on the count) are unaffected.

Approach

Make the one-shot early-exit transport-agnostic by separating the retry signal from per-node ledger health:

  • Add a complete flag to NodeProbe, reported alongside the existing healthy. For Unifying, complete requires paired.len() == pairing_count, so a partial drain now reports complete = false and the retry fires. healthy (which drives the ledger's per-node replay) is unchanged, so the watcher and the #218 replay behavior are preserved.
  • The retry loop stops on all_complete, on an unchanged inventory between two consecutive healthy passes (the expected stable Unifying offline-drain case, so a chronically-short set stabilizes instead of burning every attempt), or on the attempt cap. The unchanged-inventory shortcut is gated on all_healthy and the previous snapshot is only retained from a healthy pass, so a failed/timed-out/open-failed probe keeps using the full retry budget to reopen its channel and recover.
  • Add PartialEq / Eq derives to the inventory device structs (BatteryInfo, ReceiverInfo, DeviceModelInfo, DeviceTransports, PairedDevice, DeviceInventory) so successive attempts can be diffed. These are additive derives only; field order is untouched, so the bincode wire format is unchanged (openlogi-agent-core/tests/wire_format.rs still passes).

As a side benefit, this also stops a chronically-unhealthy node (e.g. an asleep BT-direct device that never answers) from forcing every one-shot call to burn all attempts.

Tests

Added unit tests in openlogi-core (structural equality across nested device fields) and openlogi-hid:

  • Happy path: a complete inventory on attempt 1 returns after a single pass.
  • Healthy partial drain stabilizes: the loop keeps retrying while the set grows, then stops once it stabilizes.
  • Stabilization stop: a stable healthy-but-short set stops on "unchanged" rather than burning every attempt.
  • Failed probe keeps retrying: an unchanged replay from an unhealthy pass does not stop early (regression guard).
  • Bounded retries: a set that keeps changing still exits at the attempt cap.

cargo fmt, cargo clippy, and cargo test pass across the workspace.

The one-shot `enumerate()` free fn (CLI `list`/`diag`) retried only while
a probed node reported `healthy == false`. A Unifying receiver defines
`healthy = pairing_count.is_some()`, which stays true even when a device's
arrival event lands after the 1.5s drain window and the device is dropped,
so the retry never fired and the CLI returned a short device set.

Separate the one-shot retry signal from per-node ledger health: add a
`complete` flag reported alongside `healthy`. For Unifying, `complete`
requires `paired.len() == pairing_count`, so a partial drain now triggers
a retry while ledger replay (`healthy`) is unchanged. The retry loop stops
on completeness, an unchanged inventory between consecutive attempts (so a
chronically-short set stabilizes instead of burning every attempt), or the
attempt cap.

Add `PartialEq`/`Eq` derives to the inventory device structs so successive
attempts can be diffed for the unchanged-stop fallback.

Refs AprilNEA#277
@greptile-apps

greptile-apps Bot commented Jun 18, 2026

Copy link
Copy Markdown

Greptile Summary

This PR fixes a silent device-drop in the one-shot enumerate() free function used by openlogi list / openlogi diag: a Unifying receiver's pairing_count.is_some() kept all_healthy true even when a late-arriving device missed the drain window, so the retry loop exited on the first pass with a short device set.

  • Adds a complete flag to NodeProbe (orthogonal to healthy) that requires paired.len() == pairing_count; Unifying now reports complete = false for partial drains, firing the retry where it was previously a no-op. healthy is unchanged, so the watcher path and ledger-replay behavior are unaffected.
  • The retry loop gains an "unchanged inventory across two consecutive healthy passes" early exit, so a chronically-offline Unifying device stops burning the full attempt budget once the partial set has stabilized.
  • PartialEq / Eq are added to the six inventory structs needed by the stability comparison; all field types already implement these traits and field order is untouched, so the bincode wire format is unchanged.

Confidence Score: 4/5

Safe to merge; the core fix is correct and the watcher path is untouched.

Core fix is correct and watcher path is untouched; the only concern is that arrival-event ordering for Unifying receivers is not canonicalized before the stability comparison, which can prevent the early exit from firing and cause the CLI to burn the full retry budget unnecessarily.

crates/openlogi-hid/src/inventory.rs — specifically probe_unifying_receiver and the paired Vec ordering relative to the inventory equality check.

Important Files Changed

Filename Overview
crates/openlogi-core/src/device.rs Adds PartialEq + Eq derives to BatteryInfo, ReceiverInfo, DeviceModelInfo, DeviceTransports, PairedDevice, and DeviceInventory; all field types already implement these traits, so the derives are safe and wire-format-neutral. New unit tests exercise deep structural equality including nested battery, slot, and WPID changes.
crates/openlogi-hid/src/inventory.rs Introduces a complete flag on NodeProbe (separate from healthy) and propagates it through enumerate_reporting_completeness; the one-shot retry loop now stops on all_complete, on a stable healthy-but-short pass, or at the attempt cap. Bolt/direct/failed paths correctly alias complete to healthy; Unifying correctly sets complete = paired.len() == pairing_count while keeping healthy = pairing_count.is_some(). The stability check uses Vec PartialEq on paired, which is sensitive to arrival-event ordering for Unifying receivers.

Fix All in Codex Fix All in Claude Code

Reviews (1): Last reviewed commit: "fix(hid): make one-shot enumerate retry ..." | Re-trigger Greptile

Comment on lines 680 to 693
@@ -632,6 +688,7 @@ async fn probe_unifying_receiver(
paired,
}),
healthy,
complete,
outcomes,
}

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

P2 Arrival-event ordering makes the stability check non-deterministic for Unifying

drain_device_arrival_unifying pushes events into out in the order they arrive over the HID channel; there is no sort. If a receiver sends events in a different slot order on two consecutive calls — e.g., slot 2 before slot 1 on the retry — the resulting paired vecs have the same devices but a different element order. Vec's PartialEq is element-wise, so previous == current returns false and the "stable healthy-but-short" early exit never fires even though the device set is identical. The loop then burns the full ONESHOT_ATTEMPTS budget instead of stopping after two passes, adding up to ~900 ms of unnecessary latency for every one-shot CLI call made against a Unifying receiver with offline paired devices. Sorting paired by slot in probe_unifying_receiver — or in enumerate_reporting_completeness before storing/comparing inventories — would make the comparison deterministic.

Fix in Codex Fix in Claude Code

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

One-shot enumerate() retry is a no-op for Unifying partial arrival-drain misses

1 participant