profiles: per-board variant overrides; loud diagnostic when agent stays silent #102
Merged
DDR init (PRESTEP0/DDRSTEP0) in defib's chip profiles is calibrated for
ONE board variant per chip. Two hi3516av300 cameras side-by-side: same
profile, same boot protocol, same TAIL ACKs — agent at 0x81000000
runs on the SPI NOR variant, hangs silently on the eMMC variant because
DDR isn't backed there. Bootrom source (OpenIPC/openhisilicon
bootrom/hi3516av300/re/bootloader.c:uart0_recv_payload) confirms the
bootrom faithfully calls the HEAD's load address after every ACKed
TAIL — the protocol is fine, the per-board DDR setup isn't.
This wires up the infrastructure to declare per-board overrides without
duplicating whole profile files, and replaces the "Agent not responding"
one-liner with a diagnostic that names the actual cause and a fix.
* `parse_chip_variant("hi3516av300:emmc") → ("hi3516av300", "emmc")` —
colon syntax, accepted by `--chip` consistently.
* Profile JSON gains an optional `variants` map keyed by variant name;
entries override matching top-level keys (typically DDRSTEP0,
PRESTEP0). Schema itself stays variant-unaware — variants are popped
before pydantic validation and merged in via dict.update().
* `list_variants(chip)` for surfacing options to humans.
* `get_agent_binary`, `firmware_url`, `get_cached_path`, `download_firmware`
strip the variant suffix — those resources are per-chip, not per-board.
* New CLI helper `_agent_not_responding_message(chip, uboot_address)`
names the DDR-mismatch root cause first, lists declared variants if
any (with a concrete `-c chip:variant` next-step), and includes the
vendor-U-Boot loady fallback (loady → ymodem → go). Used by both
`defib agent upload` and `defib agent flash`.
* 21 new tests in test_profiles.py and test_cli.py covering parsing,
override merging, alias-chain transparency, unknown-variant errors,
variant stripping in chip-keyed lookups, and the diagnostic content.
No variant data is shipped yet — the eMMC av300 variant needs DDR init
bytes extracted from a working vendor U-Boot on that board, which is a
follow-up.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
widgetii added a commit that referenced this pull request on May 15, 2026
## Why

We have two hi3516av300 cameras on the bench: one with SPI NOR flash (ether8), one with eMMC (ether1). They use different DDR chips, and the OpenIPC U-Boot for hi3516av300 ships an SPL targeting SPI NOR boards. On the eMMC board, defib's boot protocol completes every stage with ACKs from the bootrom, and the bootrom faithfully calls the agent at 0x81000000, but DDR isn't backed there, the CPU fetches garbage, and the link goes silent (0 bytes for 30s, no READY).

Two pieces here. The first builds on #102's variant infrastructure to carry an SPL blob; the second is the actual extracted-vendor variant.

## What

### `SPL_BLOB` schema addition

Optional profile field naming a binary file (resolved relative to the profile JSON's directory). The loader reads it into `profile.spl_data`. The agent-upload CLI prefers `profile.spl_data` over the downloaded U-Boot when set:

```python
if profile.spl_data is not None:
    spl_data = profile.spl_data  # variant SPL takes precedence
else:
    spl_data = cached_fw.read_bytes()  # fall back to OpenIPC U-Boot first 20K
```

Variant declaration looks like:

```json
{
  "name": "hi3516av300",
  "...": "...",
  "variants": {
    "emmc": { "SPL_BLOB": "hi3516av300-emmc-spl.bin" }
  }
}
```

### `hi3516av300:emmc` variant

20480 bytes extracted from a working eMMC av300 board's vendor U-Boot (eMMC offset 0, truncated at the gzip boundary at 0x5000). Lives at `src/defib/profiles/data/hi3516av300-emmc-spl.bin`.

End-to-end verified on real hardware:

| Camera | `--chip` | Result |
|---|---|---|
| SPI NOR av300 | `hi3516av300` | agent READY at t=0.3s |
| eMMC av300 | `hi3516av300` | 0 bytes for 30s (pre-existing failure mode) |
| eMMC av300 | `hi3516av300:emmc` | **agent READY at t=0.3s** |

### Failure-diagnostic content update

The diagnostic message from #102 now actually has a real variant to suggest:

```
Known board variants for hi3516av300: emmc
Try: defib agent upload -c hi3516av300:emmc ...
```

## Extraction recipe

Captured in kaeru `hi3516av300-emmc-variant-shipped-2026-05-15` for the next board family that hits this:

1. Catch vendor U-Boot prompt (^C bombardment)
2. `mmc dev 0` then `mmc read 0 0x82000000 0 0x40`. Note: this U-Boot 2016.11 wants `mmc read DEV addr blk# cnt`, not `mmc read addr blk# cnt`
3. `loady 0x81000000` the defib agent, then `go 0x81000000`
4. `agent.read_memory(0x82000000, 0x6000)` to pull the bytes back
5. Truncate at the byte before the `\x1f\x8b\x08` gzip signature (0x5000 here) to drop the gzipped U-Boot tail
6. Drop into `src/defib/profiles/data/<chip>-<variant>-spl.bin` and add a variant block

## Test plan

- [x] `uv run pytest tests/ -x --ignore=tests/fuzz`: **522 passed, 2 skipped** (5 new tests in test_profiles.py covering blob resolution, missing-blob error path, blob-via-variant, real av300:emmc, real av300 base)
- [x] `uv run ruff check` + `mypy` on changed files: clean
- [x] Real-hardware: eMMC av300 reaches READY at t=0.3s with `--chip hi3516av300:emmc` (was 0 bytes for 30s before this PR)
- [x] Real-hardware: SPI NOR av300 still reaches READY at t=0.3s with the base `--chip hi3516av300` (no regression)
- [x] `*.bin` gitignore got a negation rule for `src/defib/profiles/data/*.bin` so SPL blobs don't get hidden

## Aside

Found a separate routeros power-controller bug while iterating: `power_off → power_on` over a port that was already off restores it to "off" because `power_off` saves the current state (off) and `power_on` restores it. Worked around in test scripts via `_set_poe(port, 'forced-on')`. Worth fixing separately.

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-authored-by: Dmitry Ilyin <widgetii@users.noreply.github.com>
Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
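The truncation step of the extraction recipe (cutting the dump just before the gzip signature) could be scripted along these lines. This is a minimal sketch, not code from the repository: the function name `truncate_at_gzip` is hypothetical, and it assumes the first occurrence of the three signature bytes really is the start of the gzipped U-Boot, which is worth sanity-checking against the expected offset (0x5000 on this board).

```python
GZIP_MAGIC = b"\x1f\x8b\x08"  # gzip ID1, ID2, and CM=8 (deflate)


def truncate_at_gzip(dump: bytes) -> bytes:
    """Return the bytes that precede the first gzip signature in `dump`.

    Hypothetical helper: assumes the first match marks the gzipped
    U-Boot tail rather than a chance byte sequence inside the SPL.
    """
    offset = dump.find(GZIP_MAGIC)
    if offset < 0:
        raise ValueError("no gzip signature found in dump")
    return dump[:offset]
```

For a 0x6000-byte dump whose gzip signature sits at 0x5000, the result is 0x5000 (20480) bytes long, matching the blob size quoted above.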
## Why
Two hi3516av300 cameras side-by-side. Same chip silicon, same defib code, same boot protocol, same TAIL ACKs from the bootrom. The agent uploaded to 0x81000000 ran cleanly on the SPI NOR variant and hung silently on the eMMC variant. The bootrom source (`OpenIPC/openhisilicon: bootrom/hi3516av300/re/bootloader.c:uart0_recv_payload`) confirmed the protocol is fine — `((foreign_fn)frame[6])()` is called after every ACKed TAIL. What differed was `DDRSTEP0` — defib's per-chip DDR init is calibrated for ONE board and doesn't bring DDR up on the other variant, so the bootrom faithfully jumps to 0x81000000 but the CPU fetches garbage there and hangs without writing a single byte to UART. Full investigation captured in kaeru `av300-ddr-init-is-per-board-not-per-chip-2026-05-15`.
## What
Two things, both keyed off the realisation that DDR init is per-board, not per-chip.
### Per-board variant support in profiles
Optional `variants` map in profile JSON. Each variant overrides matching top-level keys (typically `DDRSTEP0`, sometimes `PRESTEP0`):
```json
{
"name": "hi3516av300",
"DDRSTEP0": [...],
...
"variants": {
"emmc": { "DDRSTEP0": [...board-specific DDR init...] }
}
}
```
CLI accepts `--chip hi3516av300:emmc` consistently. Variant suffix gets stripped in chip-keyed lookups (`firmware_url`, `get_cached_path`, `get_agent_binary`) — those resources are per-chip. Profile loader pops `variants` before pydantic validation and merges in via `dict.update()`, so the `SoCProfile` model itself stays variant-unaware. Aliases still work transparently (`hi3516dv300_alias:emmc` resolves the alias chain and then applies the variant on the final target).
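The parse-then-merge flow described above can be sketched roughly as follows. This is an illustration under assumptions, not defib's actual loader: `parse_chip_variant` is named in this PR, but its body here is a guess, and the `apply_variant` helper and the `ValueError` for unknown variants are hypothetical.

```python
from typing import Any, Optional


def parse_chip_variant(spec: str) -> tuple[str, Optional[str]]:
    """Split "chip:variant" into (chip, variant); variant is None if absent."""
    chip, sep, variant = spec.partition(":")
    return chip, (variant if sep else None)


def apply_variant(raw: dict[str, Any], variant: Optional[str]) -> dict[str, Any]:
    """Hypothetical helper: pop `variants`, then overlay the chosen entry.

    The returned dict no longer contains `variants`, so a schema model
    validating it never needs to know variants exist.
    """
    merged = dict(raw)  # shallow copy so the caller's dict is untouched
    variants = merged.pop("variants", {})
    if variant is not None:
        if variant not in variants:
            raise ValueError(f"unknown variant {variant!r}; known: {sorted(variants)}")
        merged.update(variants[variant])  # variant keys win over top-level keys
    return merged
```

Stripping the suffix for chip-keyed lookups then amounts to using only the first element of `parse_chip_variant(spec)`.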
### Loud diagnostic when agent stays silent
`defib agent upload` / `defib agent flash` used to print one red line when the agent never returned READY:
```
Agent not responding
```
Now it explains what's happening:
```
Agent not responding
Boot-protocol upload completed but the agent never sent READY.
Most common cause: the chip profile's DDR init (PRESTEP0/DDRSTEP0)
doesn't match this board's DDR layout. The bootrom faithfully calls
the agent at 0x81000000, but DDR isn't backed there, so the CPU
fetches garbage and hangs silently (no UART output).
No board variants declared for hi3516av300.
Manual workaround (vendor U-Boot must be intact in flash):
```
When variants are declared, the message names them and suggests `defib agent upload -c hi3516av300:` as the next step. JSON output mode wraps the same text under a `diagnostic` key.
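A rough sketch of how the variant hint and the JSON wrapping might be assembled. The function names and the `variants` parameter are hypothetical (the real helper is `_agent_not_responding_message(chip, uboot_address)`), the full explanatory text is omitted, and joining multiple variants with a comma is an assumption. Only the message lines quoted in this PR and the referenced commit are reused.

```python
def agent_not_responding_message(chip: str, variants: list[str]) -> str:
    """Hypothetical builder for the variant-hint part of the diagnostic."""
    base = chip.split(":", 1)[0]  # hints are keyed by chip, not chip:variant
    lines = ["Agent not responding"]
    if variants:
        # Assumption: several variants would be listed comma-separated.
        lines.append(f"Known board variants for {base}: {', '.join(variants)}")
        lines.append(f"Try: defib agent upload -c {base}:{variants[0]} ...")
    else:
        lines.append(f"No board variants declared for {base}.")
    return "\n".join(lines)


def as_json_payload(message: str) -> dict[str, str]:
    """JSON output mode carries the same text under a `diagnostic` key."""
    return {"diagnostic": message}
```

Keeping the text builder separate from the output mode means the human-readable and JSON paths cannot drift apart.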
## Not in scope
The actual `hi3516av300:emmc` variant DATA — we don't have working DDR init bytes for the eMMC board. Extracting them from that board's vendor U-Boot is the follow-up task. This PR ships only the infrastructure plus the diagnostic.
## Test plan
🤖 Generated with Claude Code