agent: post-erase verify must use register-mode read past 1 MB#91

Merged
widgetii merged 1 commit into master from agent-verify-erased-past-1mb on May 11, 2026

Conversation

@widgetii
Member

Summary

flash_verify_erased (the post-erase smoke test) read its sample bytes directly from FLASH_MEM, the memory-mapped window. On hi3516ev300 that window wraps at 1 MB, and the same apparently holds on every SoC where flash_read_full already takes the register-mode read path (its comment reads "boot mode memory window wraps at 1 MB on some SoCs"). For any sector at flash offset ≥ 0x100000, the verify read therefore returned bytes from (offset % 0x100000) instead of the just-erased sector. Those bytes weren't 0xFF, so the smoke test reported ACK_FLASH_ERROR (0x02) even though the erase had completed cleanly.
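
The aliasing described above can be worked through with a few lines of C. This is only an illustrative sketch: `window_alias` and `WINDOW_SIZE` are hypothetical names that do not appear in the agent source, and the 1 MB window size is taken from the description in this PR.

```c
#include <stdint.h>

/* Assumed size of the boot-mode memory window (the 1 MB wrap described above). */
#define WINDOW_SIZE 0x100000u

/* Hypothetical helper: the flash offset a memory-mapped read actually
 * lands on when the window wraps at WINDOW_SIZE. */
uint32_t window_alias(uint32_t flash_offset)
{
    return flash_offset % WINDOW_SIZE;
}
```

Under that assumption, a verify read aimed at 0x110000 lands on 0x10000, and one aimed at 0x350000 lands on 0x50000, while offsets below 1 MB map to themselves.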

Visible on W25Q128 (16 MB NOR): 12 sectors of a kernel write completed, then sector 13 at flash offset 0x110000 failed. The same chip programmed fine via U-Boot's sf write, and the agent's higher-level CRC32 verify (which uses flash_read() indirectly) also succeeded when the smoke test was bypassed, so the bug was localised to this one read path.

Fix: route the verify reads through flash_read(), the same register-mode SPI READ path flash_read_full has used since the 1 MB window workaround originally landed.
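
The difference between the two read paths can be reproduced on a host with a small simulation. The sketch below is not the agent's code: every `sim_*` name is hypothetical, `sim_flash_read` merely stands in for flash_read() (whose real signature is not shown in this PR), and the 64 KB sector size and 4 KB sampling stride are assumptions made for illustration.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define SIM_FLASH_SIZE  0x1000000u /* 16 MB, matching the W25Q128 */
#define SIM_WINDOW_SIZE 0x100000u  /* 1 MB wrap described in this PR */
#define SIM_SECTOR_SIZE 0x10000u   /* assumption: 64 KB erase sector */
#define SIM_STRIDE      0x1000u    /* assumption: sample one byte every 4 KB */

static uint8_t sim_flash[SIM_FLASH_SIZE];

/* Fill the whole simulated chip with non-0xFF "programmed" data. */
void sim_fill_programmed(void)
{
    memset(sim_flash, 0xA5, sizeof sim_flash);
}

/* Erase one sector; off must be sector-aligned and inside the chip. */
void sim_erase_sector(uint32_t off)
{
    memset(&sim_flash[off], 0xFF, SIM_SECTOR_SIZE);
}

/* Memory-mapped read: the address wraps at the window size. */
uint8_t sim_window_read(uint32_t off)
{
    return sim_flash[off % SIM_WINDOW_SIZE];
}

/* Stand-in for the register-mode read: uses the full flash offset.
 * The caller must keep off + len within SIM_FLASH_SIZE. */
void sim_flash_read(uint32_t off, uint8_t *buf, size_t len)
{
    memcpy(buf, &sim_flash[off], len);
}

/* Old behaviour: sample the sector through the wrapping window. */
int sim_verify_erased_window(uint32_t off)
{
    for (uint32_t i = 0; i < SIM_SECTOR_SIZE; i += SIM_STRIDE) {
        if (sim_window_read(off + i) != 0xFF)
            return 0;
    }
    return 1;
}

/* Fixed behaviour: sample the sector through the register-mode stand-in. */
int sim_verify_erased_regmode(uint32_t off)
{
    for (uint32_t i = 0; i < SIM_SECTOR_SIZE; i += SIM_STRIDE) {
        uint8_t b;
        sim_flash_read(off + i, &b, 1);
        if (b != 0xFF)
            return 0;
    }
    return 1;
}
```

After `sim_fill_programmed()` and `sim_erase_sector(0x110000)`, the window-based check rejects the sector because it samples the still-programmed data at 0x10000, while the register-mode check accepts it. Below 1 MB both checks agree, which matches the before/after pattern in the hardware results.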

Verification on rack pod 10.216.128.69 (hi3516ev300 + W25Q128)

Before fix:
  0x00050000: OK in 6.4s  CRC match=True   ← <1 MB
  0x000C0000: OK in 6.8s  CRC match=True   ← <1 MB
  0x00110000: FAIL in 6.1s                  ← =1 MB + 0x10000
  0x00350000: FAIL in 6.1s                  ← 3.3 MB
  0x00F00000: FAIL in 6.1s                  ← 15 MB

After fix:
  0x00050000: OK in 6.4s  CRC match=True
  0x000C0000: OK in 6.3s  CRC match=True
  0x00110000: OK in 6.2s  CRC match=True   ✓
  0x00350000: OK in 6.3s  CRC match=True   ✓
  0x00F00000: OK in 6.3s  CRC match=True   ✓

Full OpenIPC nor-neo install through the agent (kernel 2.0 MB + rootfs 4.2 MB) now completes end-to-end in 92 s at 81 KB/s sustained, and Linux boots to the openipc-hi3516ev300 login: prompt.

Test plan

  • uv run pytest tests/ -x -v --ignore=tests/fuzz — 480 passed / 2 skipped
  • make -C agent test HOST_CC=gcc — 5406/5406 agent C tests pass
  • Regression: smaller writes (<1 MB) continue to work — verified on 0x50000 and 0xC0000 offsets.

🤖 Generated with Claude Code

flash_verify_erased read sample bytes directly from FLASH_MEM (the
memory-mapped window), which on hi3516ev300 wraps at 1 MB.  For any
sector at offset ≥ 0x100000 the verify read returned bytes from
sector (offset % 0x100000) instead of the actual just-erased sector,
so the smoke test saw non-0xFF data and reported ACK_FLASH_ERROR
even though the erase succeeded.

Effect: write_flash to any offset past 1 MB on hi3516ev300 (and any
other SoC where the boot-mode memory window wraps at 1 MB) failed
spuriously. Visible on W25Q128 (16 MB NOR): 12 sectors of a kernel
write completed, then sector 13 at flash offset 0x110000 failed with
ACK_FLASH_ERROR (0x02). The same chip programmed cleanly via U-Boot's
`sf write`, and the agent's CRC32-based higher-level verify also
passed when the smoke test was bypassed, so the bug was localised to
the post-erase smoke test.

Fix: route the verify reads through flash_read() (register-mode SPI
READ via FMC normal-mode), the same path flash_read_full has used
since the 1 MB-window workaround landed.  The same window-wrap hazard
applies to the verify path, for the same reason.

Confirmed on hardware against rack pod 10.216.128.69
(hi3516ev300 + W25Q128):

  Before fix:
    0x00050000: OK in 6.4s  CRC match=True   ← <1 MB
    0x000C0000: OK in 6.8s  CRC match=True   ← <1 MB
    0x00110000: FAIL in 6.1s                  ← =1 MB+0x10000
    0x00350000: FAIL in 6.1s                  ← 3.3 MB
    0x00F00000: FAIL in 6.1s                  ← 15 MB

  After fix:
    0x00050000: OK in 6.4s  CRC match=True
    0x000C0000: OK in 6.3s  CRC match=True
    0x00110000: OK in 6.2s  CRC match=True   ✓
    0x00350000: OK in 6.3s  CRC match=True   ✓
    0x00F00000: OK in 6.3s  CRC match=True   ✓

Full nor-neo install through the agent (kernel 2.0 MB + rootfs 4.2 MB)
now completes end-to-end in 92 s at 81 KB/s sustained, and Linux boots
to `openipc-hi3516ev300 login:`.

Suite: 480 passed / 2 skipped; agent C tests: 5406/5406 passed.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
@widgetii widgetii merged commit 6913b01 into master May 11, 2026
13 checks passed
@widgetii widgetii deleted the agent-verify-erased-past-1mb branch May 11, 2026 20:44