fix: make one-shot enumerate() retry transport-agnostic so Unifying partial drains recover by mvanhorn · Pull Request #287 · AprilNEA/OpenLogi

mvanhorn · 2026-06-18T08:24:30Z

Summary

The one-shot enumerate() free fn (used by the CLI list / diag paths) retries through a transient probe miss, but its early-exit never tripped for a Unifying receiver, so a late-arriving device was silently dropped and the CLI returned a short device set.

The retry loop stopped on all_healthy, the AND of each probed node's probe.healthy. The two receiver kinds define healthy differently: a Bolt receiver sets healthy = paired.len() == pairing_count (a missing device makes it false and the retry fires), while a Unifying receiver sets healthy = pairing_count.is_some() because offline Unifying devices aren't enumerable yet. A Unifying device whose arrival event lands just after the ~1.5s drain window is dropped, yet pairing_count.is_some() stays true, so all_healthy stays true and the retry never fires. The continuous GUI/agent watcher is unaffected because it re-enumerates every ~2s.

Fixes #277.

Why this matters

openlogi list and openlogi diag are the CLI's primary read paths, and a one-shot caller builds a fresh Enumerator with an empty ledger, so it has no prior snapshot to replay a transient miss against. The retry loop was added precisely to recover from that, but for a Unifying receiver it was a no-op: a device whose arrival event lands after the drain window is dropped while pairing_count.is_some() keeps all_healthy true, so the loop returns a short device set on the first pass. Users on Unifying hardware intermittently see fewer devices than are actually paired, with no recovery, while Bolt users (whose healthy rides on the count) are unaffected.

Approach

Make the one-shot early-exit transport-agnostic by separating the retry signal from per-node ledger health:

Add a complete flag to NodeProbe, reported alongside the existing healthy. For Unifying, complete requires paired.len() == pairing_count, so a partial drain now reports complete = false and the retry fires. healthy (which drives the ledger's per-node replay) is unchanged, so the watcher and the #218 replay behavior are preserved.
The retry loop stops on all_complete, on an unchanged inventory between two consecutive healthy passes (the expected stable Unifying offline-drain case, so a chronically-short set stabilizes instead of burning every attempt), or on the attempt cap. The unchanged-inventory shortcut is gated on all_healthy and the previous snapshot is only retained from a healthy pass, so a failed/timed-out/open-failed probe keeps using the full retry budget to reopen its channel and recover.
Add PartialEq / Eq derives to the inventory device structs (BatteryInfo, ReceiverInfo, DeviceModelInfo, DeviceTransports, PairedDevice, DeviceInventory) so successive attempts can be diffed. These are additive derives only; field order is untouched, so the bincode wire format is unchanged (openlogi-agent-core/tests/wire_format.rs still passes).

As a side benefit, this also stops a chronically-unhealthy node (e.g. an asleep BT-direct device that never answers) from forcing every one-shot call to burn all attempts.

Tests

Added unit tests in openlogi-core (structural equality across nested device fields) and openlogi-hid:

Happy path: a complete inventory on attempt 1 returns after a single pass.
Healthy partial drain stabilizes: the loop keeps retrying while the set grows, then stops once it stabilizes.
Stabilization stop: a stable healthy-but-short set stops on "unchanged" rather than burning every attempt.
Failed probe keeps retrying: an unchanged replay from an unhealthy pass does not stop early (regression guard).
Bounded retries: a set that keeps changing still exits at the attempt cap.

cargo fmt, cargo clippy, and cargo test pass across the workspace.

The one-shot `enumerate()` free fn (CLI `list`/`diag`) retried only while a probed node reported `healthy == false`. A Unifying receiver defines `healthy = pairing_count.is_some()`, which stays true even when a device's arrival event lands after the 1.5s drain window and the device is dropped, so the retry never fired and the CLI returned a short device set. Separate the one-shot retry signal from per-node ledger health: add a `complete` flag reported alongside `healthy`. For Unifying, `complete` requires `paired.len() == pairing_count`, so a partial drain now triggers a retry while ledger replay (`healthy`) is unchanged. The retry loop stops on completeness, an unchanged inventory between consecutive attempts (so a chronically-short set stabilizes instead of burning every attempt), or the attempt cap. Add `PartialEq`/`Eq` derives to the inventory device structs so successive attempts can be diffed for the unchanged-stop fallback. Refs AprilNEA#277

greptile-apps · 2026-06-18T08:30:14Z

Greptile Summary

This PR fixes a silent device-drop in the one-shot enumerate() free function used by openlogi list / openlogi diag: a Unifying receiver's pairing_count.is_some() kept all_healthy true even when a late-arriving device missed the drain window, so the retry loop exited on the first pass with a short device set.

Adds a complete flag to NodeProbe (orthogonal to healthy) that requires paired.len() == pairing_count; Unifying now reports complete = false for partial drains, firing the retry where it was previously a no-op. healthy is unchanged, so the watcher path and ledger-replay behavior are unaffected.
The retry loop gains an "unchanged inventory across two consecutive healthy passes" early exit, so a chronically-offline Unifying device stops burning the full attempt budget once the partial set has stabilized.
PartialEq / Eq are added to the six inventory structs needed by the stability comparison; all field types already implement these traits and field order is untouched, so the bincode wire format is unchanged.

Confidence Score: 4/5

Safe to merge; the core fix is correct and the watcher path is untouched.

Core fix is correct and watcher path is untouched; the only concern is that arrival-event ordering for Unifying receivers is not canonicalized before the stability comparison, which can prevent the early exit from firing and cause the CLI to burn the full retry budget unnecessarily.

crates/openlogi-hid/src/inventory.rs — specifically probe_unifying_receiver and the paired Vec ordering relative to the inventory equality check.

Important Files Changed

Filename	Overview
crates/openlogi-core/src/device.rs	Adds PartialEq + Eq derives to BatteryInfo, ReceiverInfo, DeviceModelInfo, DeviceTransports, PairedDevice, and DeviceInventory; all field types already implement these traits, so the derives are safe and wire-format-neutral. New unit tests exercise deep structural equality including nested battery, slot, and WPID changes.
crates/openlogi-hid/src/inventory.rs	Introduces a complete flag on NodeProbe (separate from healthy) and propagates it through enumerate_reporting_completeness; the one-shot retry loop now stops on all_complete, on a stable healthy-but-short pass, or at the attempt cap. Bolt/direct/failed paths correctly alias complete to healthy; Unifying correctly sets complete = paired.len() == pairing_count while keeping healthy = pairing_count.is_some(). The stability check uses Vec PartialEq on paired, which is sensitive to arrival-event ordering for Unifying receivers.

_{Reviews (1): Last reviewed commit: "fix(hid): make one-shot enumerate retry ..." | Re-trigger Greptile}

greptile-apps · 2026-06-18T08:30:18Z

@@ -632,6 +688,7 @@ async fn probe_unifying_receiver(
            paired,
        }),
        healthy,
+        complete,
        outcomes,
    }


Arrival-event ordering makes the stability check non-deterministic for Unifying

drain_device_arrival_unifying pushes events into out in the order they arrive over the HID channel; there is no sort. If a receiver sends events in a different slot order on two consecutive calls — e.g., slot 2 before slot 1 on the retry — the resulting paired vecs have the same devices but a different element order. Vec's PartialEq is element-wise, so previous == current returns false and the "stable healthy-but-short" early exit never fires even though the device set is identical. The loop then burns the full ONESHOT_ATTEMPTS budget instead of stopping after two passes, adding up to ~900 ms of unnecessary latency for every one-shot CLI call made against a Unifying receiver with offline paired devices. Sorting paired by slot in probe_unifying_receiver — or in enumerate_reporting_completeness before storing/comparing inventories — would make the comparison deterministic.

mvanhorn mentioned this pull request Jun 18, 2026

One-shot enumerate() retry is a no-op for Unifying partial arrival-drain misses #277

Open

greptile-apps Bot reviewed Jun 18, 2026

View reviewed changes

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Uh oh!

fix: make one-shot enumerate() retry transport-agnostic so Unifying partial drains recover#287

fix: make one-shot enumerate() retry transport-agnostic so Unifying partial drains recover#287
mvanhorn wants to merge 1 commit into
AprilNEA:masterfrom
mvanhorn:fix/277-oneshot-enumerate-transport-agnostic

mvanhorn commented Jun 18, 2026

Uh oh!

greptile-apps Bot commented Jun 18, 2026

Uh oh!

greptile-apps Bot Jun 18, 2026

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

1 participant

Uh oh!

Conversation

mvanhorn commented Jun 18, 2026

Summary

Why this matters

Approach

Tests

Uh oh!

greptile-apps Bot commented Jun 18, 2026

Greptile Summary

Confidence Score: 4/5

Important Files Changed

Uh oh!

greptile-apps Bot Jun 18, 2026

Choose a reason for hiding this comment

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

1 participant