Merged
149 changes: 149 additions & 0 deletions openspec/changes/2026-06-09-packed-hdr-decode/proposal.md
@@ -0,0 +1,149 @@
# Proposal: packed HDR format decode for remote texture export

## Motivation

Issue #236 named R11G11B10_FLOAT and R9G9B9E5_SHAREDEXP as formats to support.
PR #237 (621528f) scoped them out: `_decode_texture_png` rejects all non-Regular
`ResourceFormatType`s with `-32002 "format not supported for remote decode"`.

Both formats are common HDR render-target and light-probe formats:
- R11G11B10_FLOAT is the standard G-buffer emission/radiance target in UE5, Unity HDRP,
and most modern engines. It packs three unsigned floats (no sign bit) into 32 bits per
pixel, which makes it a compact HDR render target.
- R9G9B9E5_SHAREDEXP appears as HDR skybox / IBL texture storage.

Both are closed-form bit-unpackable in numpy with no GPU round-trip, so they can follow
the same local-decode path already used for Regular Float formats.

## Design

### Entry point

`_decode_texture_png` currently has a hard gate at the top:

```python
if fmt.type != rd.ResourceFormatType.Regular:
    return None
```

The fix adds two explicit branches **before** this gate, keyed on
`fmt.type == rd.ResourceFormatType.R11G11B10` and
`fmt.type == rd.ResourceFormatType.R9G9B9E5`. Each branch:
1. Rejects the payload (returns `None`) if `len(raw) != width * height * depth_lvl * 4` (4 bytes/pixel, fixed).
2. Reinterprets `raw` as `uint32` LE and extracts float32 RGB via numpy bitops.
3. Feeds the result into the existing Float display path: `nan_to_num`, `clip(0,1)`,
`_srgb_encode`, alpha=255 opaque, output RGBA PNG.

The Regular gate is unchanged; every other non-Regular format still returns `None`.

### Unpack functions

Two private helpers (in `_helpers.py` alongside the existing helpers):

**`_unpack_r11g11b10(words: np.ndarray) -> np.ndarray`**

Input: uint32 array shape `(N,)`. Output: float32 array shape `(N, 3)` — R, G, B.

Bit extraction (all shifts on the uint32 word):
- R 11-bit: `words & 0x7FF` (bits [0:11))
- G 11-bit: `(words >> 11) & 0x7FF` (bits [11:22))
- B 10-bit: `(words >> 22) & 0x3FF` (bits [22:32))

For 11-bit component `x` (exp=5 bits, mant=6 bits, no sign):
- `exp = x >> 6`, `mant = x & 0x3F`
- exp == 0 → subnormal: `value = (mant / 64.0) * 2**-14`
- exp == 31 → Inf/NaN (handled by nan_to_num downstream)
- else → normal: `value = (1.0 + mant / 64.0) * 2**(exp - 15)`

For 10-bit component `x` (exp=5 bits, mant=5 bits, no sign):
- `exp = x >> 5`, `mant = x & 0x1F`
- exp == 0 → subnormal: `value = (mant / 32.0) * 2**-14`
- exp == 31 → Inf/NaN
- else → normal: `value = (1.0 + mant / 32.0) * 2**(exp - 15)`

Vectorised implementation: build `exp` and `mant` arrays, then apply numpy `where` for the
three cases (subnormal / inf-nan / normal). The inf-nan case should emit `np.inf` when
`mant == 0` and `np.nan` otherwise: `nan_to_num` in the display path maps them differently
(`posinf=1.0`, `nan=0.0`), and the test plan expects Inf to render white and NaN black.

**`_unpack_r9g9b9e5(words: np.ndarray) -> np.ndarray`**

Input: uint32 array shape `(N,)`. Output: float32 array shape `(N, 3)`.

Bit extraction:
- R mantissa 9-bit: `words & 0x1FF` (bits [0:9))
- G mantissa 9-bit: `(words >> 9) & 0x1FF` (bits [9:18))
- B mantissa 9-bit: `(words >> 18) & 0x1FF` (bits [18:27))
- Shared exponent 5-bit: `(words >> 27) & 0x1F` (bits [27:32))

Decode: `value_c = mant_c * 2.0**(exp - 24)` (equivalent to `mant_c / 512.0 * 2^(exp-15)`).
No Inf/NaN possible (the exponent has no reserved value in this format); shared exponent
E=31 is valid and just produces large values which clip to 1 in the display path.

### Integration into `_decode_texture_png`

IMPORTANT (ordering): in the current code the `ResourceFormatType.Regular` gate
(`if fmt.type != rd.ResourceFormatType.Regular: return None`) comes FIRST, and the MSAA
guard (`if getattr(tex, "msSamp", 1) > 1: return None`) comes AFTER it. Packed formats are
non-Regular, so they are rejected by the Regular gate before ever reaching the MSAA guard.
The packed branch MUST therefore be inserted **before** the Regular gate, and it MUST:
(a) perform its own MSAA check (the existing guard is below the Regular gate and is
unreachable for non-Regular formats), and (b) compute `width`/`height`/`depth_lvl`
locally, because those locals are not yet defined this early in the function.

Insert immediately after `fmt = tex.format` (and after the `if not raw: return None`
check), before the Regular gate:

```python
# Packed HDR formats: 4 bytes/pixel, closed-form numpy decode.
if fmt.type in (rd.ResourceFormatType.R11G11B10, rd.ResourceFormatType.R9G9B9E5):
    if getattr(tex, "msSamp", 1) > 1:
        return None
    width = max(1, tex.width >> mip)
    height = max(1, tex.height >> mip)
    depth_lvl = max(1, getattr(tex, "depth", 1) >> mip)
    if len(raw) != width * height * depth_lvl * 4:
        return None
    words = np.frombuffer(raw, dtype=np.dtype("<u4")).reshape((depth_lvl * height, width))
    flat = words.ravel()
    if fmt.type == rd.ResourceFormatType.R11G11B10:
        rgb = _unpack_r11g11b10(flat)
    else:
        rgb = _unpack_r9g9b9e5(flat)
    rgb_img = rgb.reshape((depth_lvl * height, width, 3))
    # Reuse Float display path.
    sanitized = np.nan_to_num(rgb_img, nan=0.0, posinf=1.0, neginf=0.0)
    f = np.clip(sanitized, 0.0, 1.0)
    alpha = np.full((depth_lvl * height, width, 1), 255, np.uint8)
    rgb8 = (_srgb_encode(f) * 255.0).round().astype(np.uint8)
    out = np.concatenate([rgb8, alpha], axis=2)
    buf = io.BytesIO()
    Image.fromarray(out, mode="RGBA").save(buf, format="PNG")
    return buf.getvalue()
```

Notes:
- `depth_lvl * height` matches the 3D tiling logic already in the Regular path.
- `BGRAOrder()` does not apply (R11G11B10 and R9G9B9E5 have no BGRA variant).
- `is_depth` never applies (these are color formats; callers do not set it for HDR RTs).
- MSAA is rejected by the explicit `msSamp` check inside this branch (the function's other
MSAA guard sits below the Regular gate and never sees non-Regular formats).
- The length check uses `* 4` not `* cc * cbw` because `compCount`/`compByteWidth` are
not meaningful for packed formats — only `ElementSize()` (which equals 4) matters.
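
The length check described in the notes above can be isolated as a small pure function, which makes the mip and 3D cases easy to reason about. `packed_level_size` is a hypothetical name used only for this sketch; the proposal inlines the same arithmetic in the branch.

```python
def packed_level_size(width: int, height: int, depth: int = 1, mip: int = 0) -> int:
    """Expected byte count of one mip level of a 4-bytes-per-pixel packed texture.

    Hypothetical helper for illustration; mirrors the inline arithmetic in the branch.
    """
    w = max(1, width >> mip)   # each dimension halves per mip, never below 1
    h = max(1, height >> mip)
    d = max(1, depth >> mip)   # 3D textures shrink in depth as well
    return w * h * d * 4       # fixed element size for R11G11B10 and R9G9B9E5


# A 2x2 texture needs 16 bytes, so a 4-byte payload is rejected.
payload = b"\x00" * 4
is_valid = len(payload) == packed_level_size(2, 2)
```

Keeping the element size as a literal `4` avoids depending on `compCount`/`compByteWidth`, which the notes above describe as not meaningful for packed formats.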

### What is NOT changed

- Local mode `SaveTexture` path: unchanged.
- The `_decode_dtype` table: unchanged (packed formats never reach it).
- All other non-Regular formats: still rejected via the existing gate.
- `rt_overlay` guard: still blocked.
- `_export_remote` and call sites in `texture.py`: no change needed; they already pass
`raw` to `_decode_texture_png` and propagate `None` → `-32002`.

## Risks

| Risk | Mitigation |
|------|------------|
| Bit-extraction off-by-one | Hand-computed known-value unit tests with exact uint32 words. |
| numpy `where` for subnormal/normal wrong | Explicit test case for subnormal (exp=0, mant=1) — value must be ~9.5e-7, not zero. |
| Inf/NaN leaked to Image.fromarray | `nan_to_num` is applied before clip; verified by existing NaN test pattern on Float path. |
| Length check wrong (using cc*cbw) | Spec explicitly mandates `* 4`; length test covers wrong-size rejection. |
| Regression on existing Regular formats | New branches are fully guarded by `fmt.type`; Regular path code is untouched. |
46 changes: 46 additions & 0 deletions openspec/changes/2026-06-09-packed-hdr-decode/tasks.md
@@ -0,0 +1,46 @@
# Tasks: packed-hdr-decode

## Phase A: unpack helpers

- [x] Add `_unpack_r11g11b10(words: np.ndarray) -> np.ndarray` to
`src/rdc/handlers/_helpers.py` (place adjacent to `_decode_dtype`).
Vectorised numpy: extract R/G 11-bit and B 10-bit fields; apply subnormal /
normal / inf-nan cases via `np.where`; return float32 shape `(N, 3)`.

- [x] Add `_unpack_r9g9b9e5(words: np.ndarray) -> np.ndarray` to the same file.
Extract R/G/B 9-bit mantissas and 5-bit shared exponent; decode as
`mant * 2.0**(exp - 24)`; return float32 shape `(N, 3)`.

## Phase B: hook into `_decode_texture_png`

- [x] In `_decode_texture_png`, insert the packed-HDR branch **before** the
`ResourceFormatType.Regular` gate (the existing MSAA guard sits below that gate and is
unreachable for non-Regular formats, so it cannot be relied on):
- Guard on `fmt.type in (rd.ResourceFormatType.R11G11B10, rd.ResourceFormatType.R9G9B9E5)`
- Own MSAA check: `if getattr(tex, "msSamp", 1) > 1: return None`
- Compute `width`/`height`/`depth_lvl` locally (those locals are defined only after the
Regular gate in the current code)
- Length check: `len(raw) != width * height * depth_lvl * 4` → return None
- Reinterpret as `uint32` LE, reshape to `(depth_lvl * height, width)`, ravel, call the
appropriate unpack helper, reshape back to `(depth_lvl * height, width, 3)`
- Apply Float display path: `nan_to_num`, `clip`, `_srgb_encode`, alpha=255, RGBA PNG

## Phase C: unit tests

- [x] Add TC-1 through TC-14 (from test-plan.md) to
`tests/unit/test_tex_stats_handler.py`, following the `_remote_state` / `_handle_request`
pattern used by the existing remote decode tests.
- Use `struct.pack("<I", <word>)` to construct raw bytes for each test vector.
- Pixel assertions use `img.getpixel((0, 0))` on the decoded PNG.
- [x] TC-15 (MANDATORY): repurpose the existing
`test_tex_export_remote_packed_format_rejected` — it currently asserts R11G11B10
(`type=13`) is rejected with `-32002`, which this change breaks. Swap its fixture to a
still-unsupported non-Regular packed type (e.g. `R5G6B5` type=14 or `R10G10B10A2`
type=12), keep the `-32002 "not supported"` assertion, and rename the test.

## Phase D: verification

- [x] Run `pixi run lint` — no new lint errors.
- [x] Run `pixi run test` — all existing tests pass; new TC-1 through TC-14 pass.
- [ ] Real-GPU verify step per test-plan.md section "Manual / real-GPU verification"
(or mark DEFERRED with a tracking comment if no suitable capture is available).
185 changes: 185 additions & 0 deletions openspec/changes/2026-06-09-packed-hdr-decode/test-plan.md
@@ -0,0 +1,185 @@
# Test plan: packed HDR format decode

All tests follow the pattern in `tests/unit/test_tex_stats_handler.py`:
`_remote_state(tex, raw, tmp_path)` + `_handle_request(rpc_request("tex_export", {...}), state)`.
Format fields use `rd.ResourceFormat(type=..., compByteWidth=4, compCount=3, compType=1)`.
- `rd.ResourceFormatType.R11G11B10 = 13`
- `rd.ResourceFormatType.R9G9B9E5 = 16`

---

## Bit-vector construction reference

### R11G11B10_FLOAT

Per-pixel layout in a little-endian uint32:
- R 11-bit: bits [0:11) — 5-bit exponent (bits 6-10), 6-bit mantissa (bits 0-5), no sign.
- G 11-bit: bits [11:22).
- B 10-bit: bits [22:32) — 5-bit exponent (bits 27-31 of the full word), 5-bit mantissa.

Decode of an 11-bit component `x`:
- exp = x >> 6, mant = x & 0x3F
- exp == 0: value = (mant / 64) * 2^-14 (subnormal)
- exp == 31: Inf (mant==0) or NaN (mant!=0)
- else: value = (1 + mant/64) * 2^(exp-15)

Decode of the 10-bit B component `x`:
- exp = x >> 5, mant = x & 0x1F
- exp == 0: value = (mant / 32) * 2^-14
- exp == 31: Inf/NaN
- else: value = (1 + mant/32) * 2^(exp-15)

**Known-value uint32 words (LE):**

| Color (R, G, B) | uint32 word | LE bytes | Notes |
|-----------------|-------------|----------|-------|
| (1.0, 0.5, 0.25) | `0x681C03C0` | `[0xC0,0x03,0x1C,0x68]` | R: exp=15 mant=0; G: exp=14 mant=0; B: exp=13 mant=0 |
| max finite (all ch) | `0xF7FDFFBF` | `[0xBF,0xFF,0xFD,0xF7]` | R,G: exp=30 mant=63; B: exp=30 mant=31 |
| Inf (all ch) | `0xF83E07C0` | `[0xC0,0x07,0x3E,0xF8]` | R,G: exp=31 mant=0; B: exp=31 mant=0 |
| NaN (all ch) | `0xF87E0FC1` | `[0xC1,0x0F,0x7E,0xF8]` | R,G: exp=31 mant=1; B: exp=31 mant=1 |
| subnormal (mant=1 all) | `0x00400801` | `[0x01,0x08,0x40,0x00]` | R,G: exp=0 mant=1; B: exp=0 mant=1 |
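
The words in the table can be rebuilt from their `(exp, mant)` pairs with a small packing helper, which is handy when adding further test vectors. `pack_r11g11b10` is a hypothetical name introduced for this sketch and is not part of the change.

```python
import struct


def pack_r11g11b10(r: tuple[int, int], g: tuple[int, int], b: tuple[int, int]) -> int:
    """Build a uint32 word from (exp, mant) pairs.

    R and G carry 6 mantissa bits, B carries 5. Hypothetical test helper.
    """
    r_bits = (r[0] << 6) | r[1]   # 11-bit field, bits [0:11)
    g_bits = (g[0] << 6) | g[1]   # 11-bit field, bits [11:22)
    b_bits = (b[0] << 5) | b[1]   # 10-bit field, bits [22:32)
    return r_bits | (g_bits << 11) | (b_bits << 22)


# (1.0, 0.5, 0.25): exponents 15, 14, 13 with zero mantissas.
word = pack_r11g11b10((15, 0), (14, 0), (13, 0))
raw = struct.pack("<I", word)  # little-endian bytes, as the handler receives them
```

Building vectors this way, instead of typing hex literals by hand, keeps the field boundaries explicit and guards against the bit-extraction off-by-one risk called out in the proposal.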

### R9G9B9E5_SHAREDEXP

Per-pixel layout in a little-endian uint32:
- R mantissa 9-bit: bits [0:9)
- G mantissa 9-bit: bits [9:18)
- B mantissa 9-bit: bits [18:27)
- Shared exponent 5-bit: bits [27:32)

Decode: `value_c = mant_c * 2.0^(exp - 24)` (= `mant_c / 512 * 2^(exp-15)`).
No reserved exponent values; no Inf/NaN possible.

**Known-value uint32 words (LE):**

| Color (R, G, B) | uint32 word | LE bytes | Build (E, rm, gm, bm) |
|-----------------|-------------|----------|-----------------------|
| (1.0, 1.0, 1.0) | `0xC0040201` | `[0x01,0x02,0x04,0xC0]` | E=24, m=1 each: `1 * 2^0 = 1.0` |
| (1.0, 0.5, 0.25) | `0xB0040404` | `[0x04,0x04,0x04,0xB0]` | E=22, rm=4, gm=2, bm=1: `4*2^-2=1, 2*2^-2=0.5, 1*2^-2=0.25` |
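
The same approach works for the shared-exponent format. `pack_r9g9b9e5` is a hypothetical helper for building test vectors from the `(E, rm, gm, bm)` column; it is not part of the change.

```python
import struct


def pack_r9g9b9e5(exp: int, rm: int, gm: int, bm: int) -> int:
    """Build a uint32 word from a 5-bit shared exponent and three 9-bit mantissas."""
    return rm | (gm << 9) | (bm << 18) | (exp << 27)


white = pack_r9g9b9e5(24, 1, 1, 1)   # 1 * 2**(24 - 24) = 1.0 per channel
ramp = pack_r9g9b9e5(22, 4, 2, 1)    # 4, 2, 1 scaled by 2**(22 - 24) = 0.25
raw = struct.pack("<2I", white, ramp)  # two little-endian pixels, 8 bytes
```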

**Expected sRGB output bytes** (after clip + `_srgb_encode`):
- 1.0 → 255, 0.5 → 188, 0.25 → 137, 0.0 → 0
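
The expected bytes above can be reproduced with a scalar sketch of the encode step. This assumes `_srgb_encode` implements the standard piecewise sRGB transfer function (its source is not shown in this change); `srgb_byte` is a hypothetical name used only here.

```python
def srgb_byte(linear: float) -> int:
    """Clip to [0, 1], apply the standard sRGB transfer function, quantise to 8 bits.

    Assumption: this matches what `_srgb_encode` followed by `* 255` and rounding does.
    """
    c = min(max(linear, 0.0), 1.0)
    if c <= 0.0031308:
        encoded = 12.92 * c                          # linear toe segment
    else:
        encoded = 1.055 * c ** (1.0 / 2.4) - 0.055   # power-law segment
    return round(encoded * 255.0)


expected = [srgb_byte(v) for v in (1.0, 0.5, 0.25, 0.0)]
```

The value for 0.5 lands close to the 187.5 rounding boundary, which is presumably why the pixel assertions for 188 and 137 allow a ±2 tolerance.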

---

## R11G11B10_FLOAT unit tests

**TC-1: happy path (1.0, 0.5, 0.25)**
- `fmt`: type=13, compByteWidth=4, compCount=3, compType=1, name="R11G11B10_FLOAT"
- `tex`: 1×1, msSamp=1
- `raw`: `struct.pack("<I", 0x681C03C0)` (4 bytes)
- `rpc`: `tex_export`, id=<tex_id>
- Assert: `resp["result"]` present; PNG RGBA; pixel[0,0][0] == 255, pixel[0,0][1] ≈ 188 (±2), pixel[0,0][2] ≈ 137 (±2), alpha == 255

**TC-2: Inf clips to white**
- `raw`: `struct.pack("<I", 0xF83E07C0)` (all-Inf)
- Assert: pixel[0,0] == (255, 255, 255, 255)

**TC-3: NaN renders black**
- `raw`: `struct.pack("<I", 0xF87E0FC1)` (all-NaN)
- Assert: pixel[0,0][0] == 0, pixel[0,0][1] == 0, pixel[0,0][2] == 0, alpha == 255

**TC-4: subnormal is non-negative and very small**
- `raw`: `struct.pack("<I", 0x00400801)` (exp=0 mant=1 for R,G,B)
- Assert: `resp["result"]` present; pixel[0,0] == (0, 0, 0, 255) (the linear values, ~9.5e-7 for R/G and ~1.9e-6 for B, sRGB-encode to 0); no error

**TC-5: wrong length rejected**
- `tex`: 2×2
- `raw`: `b"\x00" * 4` (should be 16 bytes)
- Assert: `resp["error"]["code"] == -32002`

**TC-6: MSAA rejected**
- `tex`: 1×1, msSamp=4
- `raw`: `struct.pack("<I", 0x681C03C0)`
- Assert: `resp["error"]["code"] == -32002`

**TC-7: 3D tiled (depth=2)**
- `tex`: 1×1, depth=2
- `raw`: `struct.pack("<2I", 0x681C03C0, 0x00000000)` (8 bytes = 2 slices)
- Assert: `resp["result"]` present; PNG size == (1, 2); pixel[0,0] ≈ (255, 188, 137, 255); pixel[0,1] == (0, 0, 0, 255)

---

## R9G9B9E5_SHAREDEXP unit tests

**TC-8: happy path (1.0, 1.0, 1.0)**
- `fmt`: type=16, compByteWidth=4, compCount=3, compType=1, name="R9G9B9E5_SHAREDEXP"
- `tex`: 1×1
- `raw`: `struct.pack("<I", 0xC0040201)` (4 bytes)
- Assert: `resp["result"]` present; pixel[0,0] == (255, 255, 255, 255)

**TC-9: happy path (1.0, 0.5, 0.25)**
- `raw`: `struct.pack("<I", 0xB0040404)`
- Assert: pixel[0,0][0] == 255, pixel[0,0][1] ≈ 188 (±2), pixel[0,0][2] ≈ 137 (±2), alpha == 255

**TC-10: zero value**
- `raw`: `struct.pack("<I", 0x00000000)` (E=0, all m=0)
- Assert: pixel[0,0] == (0, 0, 0, 255) — `0 * 2^(0-24) = 0`

**TC-11: wrong length rejected**
- `tex`: 2×2
- `raw`: `b"\x00" * 4`
- Assert: `resp["error"]["code"] == -32002`

**TC-12: 3D tiled (depth=2)**
- `tex`: 1×1, depth=2
- `raw`: `struct.pack("<2I", 0xC0040201, 0x00000000)` (8 bytes)
- Assert: PNG size == (1, 2); pixel[0,0] == (255, 255, 255, 255); pixel[0,1] == (0, 0, 0, 255)

**TC-12b: max shared exponent (E=31, mantissa=511) clips to white**
- `raw`: `struct.pack("<I", 0xFFFFFFFF)` (E=31, all mantissas=511 → each ch = 65408.0)
- Assert: pixel[0,0] == (255, 255, 255, 255). Confirms E=31 is a valid (non-reserved)
exponent that produces large finite values clipped to 1, not Inf/NaN.

---

## Regression guard

**TC-13: existing Regular Float format still works**
- `fmt`: type=0 (Regular), compByteWidth=4, compCount=4, compType=1 (R32G32B32A32_FLOAT)
- Verify that an existing test (e.g. `test_tex_export_remote_rgba32f_hdr_clip`) still passes
unchanged — confirms the new branches do not interfere with the Regular path.

**TC-14: BC1 (block-compressed) still rejected**
- `fmt`: type=2 (BC1), compByteWidth=0, compCount=4
- Assert: `-32002 "not supported"` — confirms the non-Regular gate is intact for BC.

**TC-15: repurpose the existing rejection test (MANDATORY — currently broken by this change)**
- The existing `test_tex_export_remote_packed_format_rejected` (in
`tests/unit/test_tex_stats_handler.py`) builds an R11G11B10 (`type=13`) texture and
asserts `-32002 "not supported"`. After this change R11G11B10 **decodes**, so that test
WILL FAIL as written.
- Required action: repurpose it. Replace the `type=13` fixture with a still-unsupported
non-Regular format that retains the rejection-test role — e.g. `R5G6B5` (`type=14`) or
`R10G10B10A2` (`type=12`), both present in the mock enum and not decoded by this change.
Keep the `-32002 "not supported"` assertion. Rename the test accordingly (e.g.
`test_tex_export_remote_unsupported_packed_format_rejected`).
- Note: TC-14 (BC1) already guards the block-compressed path; TC-15 specifically preserves
coverage of a still-unsupported *packed* non-Regular type after R11G11B10/R9G9B9E5 became
decodable.

---

## Manual / real-GPU verification

1. Find or create a RenderDoc capture that contains R11G11B10_FLOAT or R9G9B9E5_SHAREDEXP
render targets. Any modern engine HDR G-buffer pass or light probe capture works.
If no capture is available locally, generate one from a Vulkan sample (e.g. Sascha Willems
`hdr` sample) with `rdc capture`.

2. Open the capture in remote-replay mode: `rdc open capture.rdc --proxy host:port`.

3. Run `rdc rt <eid> -o /tmp/hdr_rt.png` for a draw event whose primary RT has one
of the packed formats. Verify:
- Command exits 0.
- The PNG file exists and opens in an image viewer showing a plausible HDR scene
(bright highlights clipped to white, not garbled noise).
- `file /tmp/hdr_rt.png` reports PNG, `identify /tmp/hdr_rt.png` (ImageMagick) reports
geometry matching the RT dimensions.

4. Cross-check: use `SaveTexture` in local mode on the same event/resource. Compare the
two PNGs visually; they should be perceptually similar (same content, slight gamma
difference acceptable since local mode may use a different display mapping).

5. If a capture is unavailable: fallback is unit vectors only (TC-1 through TC-12 above).
Mark the real-GPU step as DEFERRED and file a tracking comment in the PR.