
agent+transport: 5.6× flash throughput over rack pod via baud switching #89

Merged
widgetii merged 1 commit into master from agent-baud-over-rack on May 11, 2026
Summary

Three coordinated changes that make the flash agent's high-speed-UART mode (DEFAULT_FAST_BAUD = 921600) actually work over rack-pod WiFi-bridged links. Previously the host-side baud switch path (port.baudrate = baud — pyserial) silently failed on SocketTransport, so defib agent {read,write,scan} over tcp://<pod>:9000 fell back to FALLBACK_BAUD = 115200 and ran at ~10 KB/s.

Live result on the prototype

256 KB sustained flash read at 0x14000000 through the agent over rack://10.216.128.69:

| Path | Rate | Speedup |
|---|---|---|
| 115200 (fallback) | 11.1 KB/s | 1.0× |
| 921600 (rack baud switch) | 61.9 KB/s | 5.57× |

That is ~70 % of the theoretical 8× ceiling (921600 / 115200). The rest is COBS + windowed-ACK protocol overhead, which is the same as on the on-serial path.
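The ceiling and efficiency figures can be re-derived from the two baud constants and the measured rates in the table. A minimal sketch of that arithmetic, assuming the line rate is the only thing that changes between the two runs:

```python
# Baud constants named in the summary.
FALLBACK_BAUD = 115200
DEFAULT_FAST_BAUD = 921600

# Measured sustained read rates from the table, in KB/s.
measured_slow = 11.1
measured_fast = 61.9

# Ideal speedup if throughput scaled perfectly with the line rate.
ceiling = DEFAULT_FAST_BAUD / FALLBACK_BAUD

# Observed speedup and the fraction of the ceiling actually reached.
speedup = measured_fast / measured_slow
efficiency = speedup / ceiling

print(f"ceiling {ceiling:.0f}x, measured {speedup:.1f}x, efficiency {efficiency:.0%}")
```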

What changed

1. Transport.set_baudrate(baud) abstraction

New method on defib.transport.base.Transport. Default raises NotImplementedError. Overrides:

  • SerialTransport — sets self._port.baudrate (was inlined in FlashAgentClient.set_baud).
  • Rfc2217Transport — already had set_baudrate from PR Add OpenIPC Vectis support (RFC 2217 transport) #64 (Vectis); just exposed through the ABC.
  • New RackTransport(SocketTransport) — captures the pod's HTTP base URL at construction; set_baudrate POSTs {"rate": baud} to /uart/baud. New rack://host[:bridge_port][?api=http_port] URL scheme in serial_platform.create_transport (defaults 9000 / 8080).
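For illustration, the `rack://host[:bridge_port][?api=http_port]` scheme could be parsed roughly as below. This is a sketch only: `parse_rack_url` and the two `DEFAULT_*` names are hypothetical, and the real logic lives in `serial_platform.create_transport`, which may differ.

```python
from urllib.parse import parse_qs, urlsplit

# Defaults taken from the PR text (9000 for the bridge, 8080 for the HTTP API).
DEFAULT_BRIDGE_PORT = 9000
DEFAULT_API_PORT = 8080


def parse_rack_url(url: str) -> tuple[str, int, int]:
    """Return (host, bridge_port, api_port) for a rack:// URL (hypothetical helper)."""
    parts = urlsplit(url)
    if parts.scheme != "rack":
        raise ValueError(f"not a rack:// URL: {url!r}")
    host = parts.hostname
    if not host:
        # Mirrors the "reject missing host" test case.
        raise ValueError("rack:// URL is missing a host")
    bridge_port = parts.port or DEFAULT_BRIDGE_PORT
    api_values = parse_qs(parts.query).get("api")
    api_port = int(api_values[0]) if api_values else DEFAULT_API_PORT
    return host, bridge_port, api_port
```

With these assumptions, `rack://10.216.128.69` resolves to bridge port 9000 and API port 8080, while `rack://pod:9100?api=8081` overrides both.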

FlashAgentClient.set_baud now awaits transport.set_baudrate(baud). This works across all four transport flavours, and set_baud cleanly returns False when the transport refuses (was: raw AttributeError).
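The shape of the abstraction can be sketched as follows. This is not the defib source: the classes are simplified stand-ins, the use of `urllib` and `asyncio.to_thread` is an assumption, the free function `set_baud` stands in for `FlashAgentClient.set_baud`, and the `CMD_SET_BAUD` exchange with the agent is omitted.

```python
import asyncio
import json
import urllib.request


class TransportError(Exception):
    """Stand-in for defib's transport error type."""


class Transport:
    async def set_baudrate(self, baud: int) -> None:
        # Default: this transport cannot change its baud rate.
        raise NotImplementedError


class RackTransport(Transport):
    def __init__(self, host: str, api_port: int = 8080) -> None:
        # The pod's HTTP base URL is captured once, at construction.
        self._api_base = f"http://{host}:{api_port}"

    def _post_baud(self, baud: int) -> None:
        req = urllib.request.Request(
            self._api_base + "/uart/baud",
            data=json.dumps({"rate": baud}).encode(),
            headers={"Content-Type": "application/json"},
            method="POST",
        )
        try:
            with urllib.request.urlopen(req, timeout=5) as resp:
                resp.read()
        except OSError as exc:  # URLError and HTTPError both derive from OSError
            raise TransportError(f"baud switch failed: {exc}") from exc

    async def set_baudrate(self, baud: int) -> None:
        # Run the blocking HTTP call off the event loop (assumed approach).
        await asyncio.to_thread(self._post_baud, baud)


async def set_baud(transport: Transport, baud: int) -> bool:
    """Host-side switch only; returns False instead of raising on refusal."""
    try:
        await transport.set_baudrate(baud)
    except (NotImplementedError, TransportError):
        # Treating TransportError as a refusal is an assumption of this sketch.
        return False
    return True
```

A transport that does not override `set_baudrate` makes `set_baud` return `False`, so the caller can stay at the fallback rate.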

2. Agent: stop auto-reverting on the post-switch verification window

handle_set_baud used to switch the UART, then call proto_recv(timeout=3000) to wait for a verification packet from the host, reverting to 115200 if nothing arrived. The "3000 ms" budget is a CPU-speed-dependent busy-wait, for (volatile int d=25; d>0; d--) {} × timeout_ms*100 iterations, and on a fast Cortex-A7 the actual window collapses to ~300 ms.

Over a rack pod the host's POST /uart/baud itself takes ~1 s (WiFi RTT + httpd dispatch), so the agent reverted to 115200 long before any verification packet could land. Result: agent at 115200, bridge at 921600, host reading 35 bytes of misclocked 0x80 0x00 … garbage forever.
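The race can be put in numbers. The sketch below models the busy-wait as a fixed iteration count whose wall-clock duration shrinks with CPU speed. The 10× speed factor is inferred from the 3000 ms versus ~300 ms figures above rather than measured separately, and the function names are hypothetical.

```python
NOMINAL_TIMEOUT_MS = 3000   # value passed to proto_recv
OBSERVED_WINDOW_MS = 300    # approximate real window on a fast Cortex-A7
POST_LATENCY_MS = 1000      # approximate duration of POST /uart/baud over the rack pod


def busy_wait_iterations(timeout_ms: int, inner: int = 25, outer_per_ms: int = 100) -> int:
    """Total inner-loop decrements: fixed by the code, independent of CPU speed."""
    return timeout_ms * outer_per_ms * inner


def real_window_ms(timeout_ms: float, speed_factor: float) -> float:
    """Wall-clock window when the CPU runs speed_factor times faster than assumed."""
    return timeout_ms / speed_factor


def agent_reverts(window_ms: float, host_switch_ms: float) -> bool:
    """The old agent reverted if the host was not at the new rate within the window."""
    return host_switch_ms > window_ms


# Inferred, not measured: how much faster the CPU is than the loop constants assume.
speed_factor = NOMINAL_TIMEOUT_MS / OBSERVED_WINDOW_MS
window = real_window_ms(NOMINAL_TIMEOUT_MS, speed_factor)

print(busy_wait_iterations(NOMINAL_TIMEOUT_MS), "iterations,", window, "ms window")
print("rack pod reverts:", agent_reverts(window, POST_LATENCY_MS))
```

A host-side switch that completes in well under the window, as a local pyserial assignment does, would make `agent_reverts` return `False`.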

Fix: drop the verification window. The agent stays at whatever baud the last CMD_SET_BAUD selected. If the new rate doesn't work the agent is unreachable until the next power-cycle / fastboot — both of which the rack pod and RouterOS trivially provide.

(This also matches the local-UART experience: defib has been using the same set_baud against MikroTik+pyserial-attached cameras successfully because pyserial's port.baudrate= is microsecond-fast, easily landing within the agent's collapsed ~300 ms window. The bug only surfaces when the host-side switch is on the wrong side of a high-RTT control plane.)

3. Pod firmware (rack repo, local-only — uart-bridge-flush-rx-on-accept branch)

Defensive UART hygiene around /uart/baud: drain the TX FIFO at the old rate before uart_set_baudrate, and read back the actual divisor via uart_get_baudrate. Belt and braces: even with the agent fix, in-flight bytes left over from the old rate would be clocked out at the new rate and corrupt the agent's RX window.

Tests

7 new tests in tests/test_transport_rack.py:

  • set_baudrate POSTs correct URL + body
  • HTTP / URL errors surface as TransportError
  • rack:// URL parsing with default + custom + ?api= query
  • Reject missing host

Suite: 468 passed / 2 skipped; ruff + mypy clean.

Test plan

  • uv run pytest tests/ -x -v --ignore=tests/fuzz
  • uv run ruff check src/defib/ tests/
  • uv run mypy src/defib/ --ignore-missing-imports
  • Regression: agent baud switch on local serial (pyserial path) — same protocol changes, just no RackTransport. Confirm reading 256 KB at 921600 still works on a USB-serial-attached camera.

🤖 Generated with Claude Code

Three coordinated changes that make the flash agent's high-speed-UART
mode (`DEFAULT_FAST_BAUD = 921600`) work over rack-pod WiFi-bridged
links — previously the host-side `_port.baudrate = baud` path was
serial-only, so `defib agent {read,write,scan}` over `tcp://<pod>:9000`
silently fell back to `FALLBACK_BAUD = 115200` and ran at ~10 KB/s.

### 1. Host: `Transport.set_baudrate(baud)` abstraction

New method on `defib.transport.base.Transport`. Default raises
`NotImplementedError`. Overrides:

- **`SerialTransport`** — sets `self._port.baudrate` (was inlined in
  `FlashAgentClient.set_baud`).
- **`Rfc2217Transport`** — already had `set_baudrate` from PR #64
  (Vectis), now exposed through the ABC.
- **New `RackTransport(SocketTransport)`** with the pod's HTTP base
  URL captured at construction; `set_baudrate` POSTs to
  `/uart/baud {"rate": baud}`. New `rack://host[:bridge_port][?api=http_port]`
  URL scheme in `serial_platform.create_transport` (defaults 9000 / 8080).

`FlashAgentClient.set_baud` now awaits `transport.set_baudrate(baud)`
instead of poking `_port.baudrate`. This works across all four transport
flavours, and `set_baud` cleanly returns `False` if the transport doesn't
support baud changes (was: raw `AttributeError`).

### 2. Agent: stop auto-reverting on the post-switch verification window

`handle_set_baud` used to switch UART, then `proto_recv(timeout=3000)`
for a verification packet from the host, reverting to 115200 if
nothing arrived. The "3000 ms" budget is a CPU-speed-dependent
busy-wait — `for (volatile int d=25; d>0; d--) {}` × `timeout_ms*100`
iterations — and on a fast Cortex-A7 the actual window collapses to
~300 ms.

Over a rack pod the host's `POST /uart/baud` itself takes ~1 s
(WiFi RTT + httpd dispatch), so the agent reverted to 115200 long
before any verification packet could land. Result: agent at 115200,
bridge at 921600, host reading 35 bytes of misclocked `0x80 0x00 …`
garbage forever.

Fix: drop the verification window. The agent stays at whatever baud
the last `CMD_SET_BAUD` selected. If the new rate doesn't work the
agent is unreachable until the next power-cycle / fastboot — both of
which the rack pod and RouterOS trivially provide.

### 3. Pod: defensive UART hygiene around `/uart/baud`

`uart_bridge_set_baud` (rack repo, local-only): drain the TX FIFO at
the old rate before calling `uart_set_baudrate`, and read back the
actual divisor via `uart_get_baudrate`. Belt and braces: even with the
agent fix, in-flight bytes left over from the old rate would be clocked
out at the new rate and corrupt the agent's RX window. (See companion
rack-firmware commit on the `uart-bridge-flush-rx-on-accept` branch.)

### Live verification

Against the rack-pod prototype at 10.216.128.69 (hi3516ev300, 16 MB
W25Q128). 256 KB sustained flash read through the agent:

| Path | Rate | Speedup |
|---|---|---|
| 115200 (fallback) | 11.1 KB/s | 1.0× |
| 921600 (rack baud switch) | 61.9 KB/s | **5.57×** |

7 new transport tests (`tests/test_transport_rack.py`):
- `set_baudrate` POSTs correct URL + body
- HTTP / URL errors surface as `TransportError`
- `rack://` URL parsing with default + custom + ?api= query

Suite: **468 passed / 2 skipped**; ruff + mypy clean.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
widgetii merged commit 3f7f3b1 into master on May 11, 2026 (13 checks passed)
widgetii deleted the agent-baud-over-rack branch on May 11, 2026 19:54