restore: route through pod fastboot + pod TFTP when power=rack #94

Merged
widgetii merged 1 commit into master from restore-via-pod-tftp
May 12, 2026

@widgetii
Member

Summary

Brings defib restore to parity with defib install (#88 + #93) for rack-controlled cameras. Three pieces:

Phase 1 — fastboot when power=rack

The previous host-side frame-blast race (power-off → open serial → start session → power-on) is RouterOS-only. Rack pods don't expose independent power_off/power_on and don't need to — the pod's /fastboot endpoint does the whole sequence locally with microsecond ACK latency. Drop the hard-coded "restore needs RouterOSController only" reject — RackController is now an accepted alternative. Vectis stays rejected.

Phase 5 — --tftp-via=auto|pod|host (default auto)

Same flag as install. Auto → pod when power=rack, host otherwise. Pod path stages every partition via RackController.tftp_put, sets serverip=192.168.1.1 (the pod), and unifies the UBI rootfs file-swap through _replace_in_tftp(name, data).
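The auto rule above can be sketched as a small resolver. This is only an illustration: the function name `resolve_tftp_via` and the exact error wording are assumptions, not the code in `src/defib/cli/app.py`, and whether `host` is permitted together with power=rack is also assumed here.

```python
def resolve_tftp_via(tftp_via: str, power_type: str) -> str:
    """Resolve a --tftp-via choice to a concrete backend: "pod" or "host".

    Hypothetical helper; names and messages are illustrative only.
    """
    if tftp_via not in ("auto", "pod", "host"):
        raise ValueError(f"unknown --tftp-via value: {tftp_via!r}")
    if tftp_via == "auto":
        # auto: pod when power=rack, host otherwise
        return "pod" if power_type == "rack" else "host"
    if tftp_via == "pod" and power_type != "rack":
        # pod staging needs a rack pod to stage on, so fail early and clearly
        raise ValueError("--tftp-via pod requires power=rack")
    return tftp_via
```

Resolving the choice once, before Phase 5, keeps the rest of the restore path free of repeated `power_type` checks.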

Two robustness improvements:

  • tftp_clear BEFORE staging. A prior aborted run leaves pod PSRAM occupied, and the next run then fails to allocate the 4 MB rootfs (OOM with only 256 KB largest-free). Wiping first avoids this.
  • try/finally around Phase 5 + 6. A mid-loop write failure skipped __aexit__ and leaked ~7 MB of pod PSRAM until the next install. The try/finally (with the cleanup hooks pre-registered on the AsyncExitStack) makes cleanup unconditional.
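A minimal sketch of the two improvements together, assuming a stand-in `FakePod` class in place of the real `RackController` (only the method names `tftp_clear` and `tftp_put` come from this PR; everything else is hypothetical). The cleanup is registered on the `AsyncExitStack` before any upload, and `try/finally` guarantees the stack is closed even when a write raises.

```python
import asyncio
from contextlib import AsyncExitStack
from typing import Dict, Optional, Tuple


class FakePod:
    """Hypothetical stand-in for a rack pod's TFTP staging area."""

    def __init__(self) -> None:
        # Pretend a prior aborted run left a file behind.
        self.staged: Dict[str, bytes] = {"stale.bin": b"x"}

    async def tftp_clear(self) -> None:
        self.staged.clear()

    async def tftp_put(self, name: str, data: bytes) -> None:
        self.staged[name] = data


async def restore(
    pod: FakePod, partitions: Dict[str, bytes], fail_on: Optional[str] = None
) -> None:
    stack = AsyncExitStack()
    await pod.tftp_clear()                     # wipe leftovers BEFORE staging
    stack.push_async_callback(pod.tftp_clear)  # pre-register the cleanup hook
    try:
        for name, data in partitions.items():
            await pod.tftp_put(name, data)     # stage every partition
        for name in partitions:
            if name == fail_on:                # simulate a mid-loop write failure
                raise RuntimeError(f"write failed for {name}")
    finally:
        await stack.aclose()                   # cleanup runs on success and failure


def run_demo() -> Tuple[bool, int]:
    pod = FakePod()
    failed = False
    try:
        asyncio.run(restore(pod, {"mtd1": b"a", "mtd2": b"b"}, fail_on="mtd2"))
    except RuntimeError:
        failed = True
    return failed, len(pod.staged)
```

In this sketch `run_demo()` reports that the write failed and that nothing remains staged afterwards, which is the property the `try/finally` is meant to guarantee.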

Live verification on rack pod 10.216.128.69 (hi3516ev300)

Synthetic dump dir at /tmp/cam_dump/ (mtd0..3 sized to match the 16 MB NOR layout):

$ DEFIB_POWER_TYPE=rack DEFIB_RACK_HOST=10.216.128.69 \
  defib restore -c hi3516ev300 -i /tmp/cam_dump/ \
                -p rack://10.216.128.69 --power-cycle --flash-type nor

Power: rack pod HTTP API
Phase 1: Loading U-Boot to RAM
  Pod-side fastboot in progress…
Phase 4: Network setup — Network OK (attempt 1)
Phase 5: Writing flash
  Staging 7664 KB in pod PSRAM via POST /tftp/<name>...
  Pod TFTP ready on 192.168.1.1:69
  mtd1: 64KB    → 0x40000     Written (7.5 s)
  mtd2: 3072KB  → 0x50000     Written (11.7 s)
  mtd3: 4272KB  → 0x350000    Written (15.7 s)
  mtd0: 256KB   → 0x0         Written (8.3 s)
Restore complete!

Camera reaches openipc-hi3516ev300 login: cleanly. exit=0.
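As a cross-check of the log, the sizes and offsets are consistent with the four partitions being packed back to back from address 0, and they sum to the 7664 KB staged in PSRAM. The sketch below only redoes that arithmetic; the contiguous-layout rule is inferred from the numbers in the log, not taken from the defib source.

```python
KB = 1024

# Partition sizes in KB, in flash order, as printed in the log.
SIZES_KB = {"mtd0": 256, "mtd1": 64, "mtd2": 3072, "mtd3": 4272}


def layout(sizes_kb):
    """Return ({name: byte offset}, total bytes) for a contiguous layout."""
    offsets = {}
    cursor = 0
    for name, size_kb in sizes_kb.items():
        offsets[name] = cursor
        cursor += size_kb * KB
    return offsets, cursor


offsets, total = layout(SIZES_KB)
for name, off in offsets.items():
    print(f"{name}: {SIZES_KB[name]}KB -> {off:#x}")
print(f"total staged: {total // KB} KB")
# mtd0: 256KB -> 0x0
# mtd1: 64KB -> 0x40000
# mtd2: 3072KB -> 0x50000
# mtd3: 4272KB -> 0x350000
# total staged: 7664 KB
```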

Companion rack-firmware change (local-only)

UART_IDLE_TIMEOUT_S 60 → 600. The 60-second idle timer was killing the bridge socket mid-staging — ~50 s of HTTP /tftp uploads counts as "idle" to the bridge (no host→pod UART traffic during that window). 600 s comfortably covers full installs and restores.

Test plan

  • uv run pytest tests/ -x -v --ignore=tests/fuzz — 486 passed / 2 skipped (no new unit tests; _restore_async is integration-only)
  • uv run ruff check src/defib/cli/app.py — clean
  • uv run mypy src/defib/cli/app.py --ignore-missing-imports — clean
  • Regression: defib restore --tftp-via host … still works on existing RouterOS+host-TFTP setups — host branch is byte-identical except for being inside the shared AsyncExitStack.
  • --tftp-via pod without DEFIB_POWER_TYPE=rack → clean error message.

🤖 Generated with Claude Code

Brings restore to parity with install (#88 + #93) for rack-controlled
cameras:

* Phase 1 — when power=rack, drive the bring-up via
  `run_rack_fastboot()` (which handles its own power-cycle and locks
  UART to the pod for the upload).  The previous host-side
  frame-blast race (power-off → open serial → start session → power-on)
  is RouterOS-only; rack pods don't expose independent power_off/on
  and don't need it — the pod's `/fastboot` does the whole sequence
  locally with microsecond ACK latency.

* Phase 5 — add `--tftp-via=auto|pod|host` (default auto: pod when
  power=rack, host otherwise) and pick the TFTP backend with the same
  `AsyncExitStack` pattern install uses.  Pod path stages every
  partition via `RackController.tftp_put`; sets `serverip=192.168.1.1`
  (the pod itself).  `tftp_clear` is called BEFORE staging too, so a
  prior aborted run can't OOM the next one.

* `_replace_in_tftp(name, data)` unifies the UBI rootfs swap — pod
  re-POSTs to /tftp/<name>; host reassigns the in-memory dict.

* Wrap the partition-write loop + final reset in `try/finally` so the
  pod TFTP cleanup (or host UDP-socket close) always fires, even on
  a mid-loop failure.  Without the wrap a Phase-5 raise would skip
  __aexit__ and leak ~7 MB of pod PSRAM until the next install.

* Drop the hard-coded "restore needs RouterOSController only" reject
  in the power-controller setup — RackController is now an accepted
  alternative.  Vectis stays rejected (no independent off/on, no
  /fastboot equivalent).
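
A rough sketch of the `_replace_in_tftp(name, data)` idea from the list above, assuming a stand-in `FakePod` and a plain dict for the host backend. The signature, the keyword arguments, and the file name `rootfs.ubi` are hypothetical; only the behaviour (pod re-uploads, host reassigns the dict entry) is taken from this message.

```python
import asyncio
from typing import Dict, Optional, Tuple


class FakePod:
    """Hypothetical stand-in that records uploads instead of doing HTTP."""

    def __init__(self) -> None:
        self.files: Dict[str, bytes] = {}

    async def tftp_put(self, name: str, data: bytes) -> None:
        # Real code would POST the payload to /tftp/<name> on the pod.
        self.files[name] = data


async def replace_in_tftp(
    name: str,
    data: bytes,
    *,
    pod: Optional[FakePod] = None,
    host_files: Optional[Dict[str, bytes]] = None,
) -> None:
    """Swap one staged file, whichever backend is active."""
    if pod is not None:
        await pod.tftp_put(name, data)   # pod path: re-upload under the same name
    elif host_files is not None:
        host_files[name] = data          # host path: reassign the in-memory entry
    else:
        raise ValueError("no TFTP backend configured")


async def demo() -> Tuple[Dict[str, bytes], Dict[str, bytes]]:
    pod = FakePod()
    await pod.tftp_put("rootfs.ubi", b"old")
    await replace_in_tftp("rootfs.ubi", b"new", pod=pod)

    host_files = {"rootfs.ubi": b"old"}
    await replace_in_tftp("rootfs.ubi", b"new", host_files=host_files)
    return pod.files, host_files
```

With one entry point for both backends, the UBI rootfs swap in the write loop does not need to know which TFTP server the camera is fetching from.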

### Live verification on rack pod 10.216.128.69

Synthetic dump dir at /tmp/cam_dump/ (mtd0..3 sized to match the
16 MB NOR layout):

  $ DEFIB_POWER_TYPE=rack DEFIB_RACK_HOST=10.216.128.69 \
    defib restore -c hi3516ev300 -i /tmp/cam_dump/ \
                  -p rack://10.216.128.69 --power-cycle --flash-type nor

  Power: rack pod HTTP API
  Phase 1: Loading U-Boot to RAM
    Pod-side fastboot in progress…
  Phase 4: Network setup — Network OK (attempt 1)
  Phase 5: Writing flash
    Staging 7664 KB in pod PSRAM via POST /tftp/<name>...
    Pod TFTP ready on 192.168.1.1:69
    mtd1: 64KB    → 0x40000     Written (7.5s)
    mtd2: 3072KB  → 0x50000     Written (11.7s)
    mtd3: 4272KB  → 0x350000    Written (15.7s)
    mtd0: 256KB   → 0x0         Written (8.3s)
  Restore complete!

Camera reaches `openipc-hi3516ev300 login:` cleanly. exit=0.

Companion rack-firmware bump (local-only): UART_IDLE_TIMEOUT_S 60 → 600.
The 60-second idle timer was killing the bridge socket mid-staging
(~50 s of HTTP /tftp uploads with zero UART traffic counts as "idle"
to the bridge); 600 s comfortably covers full installs and restores.

Suite: 486 passed / 2 skipped; ruff + mypy clean.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
@widgetii widgetii merged commit a5ae454 into master May 12, 2026
13 checks passed
@widgetii widgetii deleted the restore-via-pod-tftp branch May 12, 2026 05:40