DWARF / source-map remapping for fused output (witness MC/DC integration) #130

## Summary

`meld fuse` produces a core wasm module whose code section is fully rewritten compared to the inputs, but DWARF custom sections (`.debug_info`, `.debug_line`, `.debug_str`, …) carried by the input components are passed through *byte-for-byte*. Every DWARF address inside those passed-through sections is a byte offset into the **original** per-component code section and is therefore wrong against the merged code section.

The downstream consumer is `pulseengine/witness` (sibling repo). Witness uses `gimli` to build a `(code-section byte offset) -> (file, line)` map so it can attribute MC/DC `br_if` decisions to source lines. After meld fusion every offset is wrong, so witness produces incorrect coverage attribution for fused modules.

This issue tracks a phased fix.

## Discovery (Phase 1, this issue)

Audit of `meld-core/src/` on `main` (commit `c52eb1b`):

1. **Default custom-section policy is `Merge`, not `Drop`** — see `meld-core/src/lib.rs:110`. The earlier assumption in CLI/agent docs was that DWARF was being dropped; in practice it is being passed through.
2. **`merger.rs:2010-2012` naively concatenates** every input core module's `custom_sections` vec into the merged module's vec without dedup, ordering, or address rewriting.
3. **`lib.rs::encode_output` (lines 1345-1356) emits them all** at the end of the output module unless `CustomSectionHandling::Drop` is set.
4. No code in `meld-core/` parses, rewrites, or reconciles `.debug_*` sections. `parser.rs` ignores component-level custom sections entirely (lines 1082-1084) and only stores core-module custom sections.
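
To make items 2 and 3 concrete, here is a minimal model of the current behavior: per-module custom sections are concatenated in input order, then emitted verbatim unless the policy is `Drop`. The `CustomSection` and `CoreModule` types and both function names are hypothetical stand-ins, not meld-core's actual definitions; only the `Merge`/`Drop` variant names come from the audit.

```rust
/// Hypothetical stand-in for a parsed wasm custom section.
#[derive(Clone, Debug, PartialEq)]
pub struct CustomSection {
    pub name: String,
    pub data: Vec<u8>,
}

/// Hypothetical stand-in for one input core module.
pub struct CoreModule {
    pub custom_sections: Vec<CustomSection>,
}

/// Mirrors the two variants named in the audit.
#[derive(Clone, Copy, PartialEq)]
pub enum CustomSectionHandling {
    Merge,
    Drop,
}

/// Models the merger today: every module's custom sections are appended in
/// input order, with no dedup, no ordering, and no address rewriting.
pub fn merge_custom_sections(modules: &[CoreModule]) -> Vec<CustomSection> {
    let mut merged = Vec::new();
    for m in modules {
        merged.extend(m.custom_sections.iter().cloned());
    }
    merged
}

/// Models the encoder: sections are emitted verbatim unless policy is Drop.
pub fn sections_to_emit(
    merged: &[CustomSection],
    policy: CustomSectionHandling,
) -> Vec<&CustomSection> {
    match policy {
        CustomSectionHandling::Drop => Vec::new(),
        CustomSectionHandling::Merge => merged.iter().collect(),
    }
}
```

With two inputs that each carry `.debug_info`, the merged list holds two `.debug_info` entries, which is exactly the duplication the discovery test observes.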

Concrete data from the new discovery test on `tests/wit_bindgen/fixtures/lists.wasm` (P2 component embedding 2 core modules):

```text
DWARF sections at top level of fused module: {
    ".debug_abbrev": 2,
    ".debug_info": 2,
    ".debug_line": 2,
    ".debug_loc": 2,
    ".debug_ranges": 2,
    ".debug_str": 2,
}
input code section (sum across embedded modules): 231531 bytes
fused code section:                                213242 bytes
```

So today's fused module:
- carries duplicate DWARF sections (one set per input core module),
- has a code section of a different length (so addresses can't be coincidentally valid),
- feeds `gimli` input that is both ambiguous (duplicate sections) and wrong (stale addresses).
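
The per-name counts in the discovery output can be reproduced with a small check like the one below. The helper names and the `.debug_` prefix rule are assumptions for illustration; the actual `dwarf_passthrough.rs` tests may be structured differently.

```rust
use std::collections::BTreeMap;

/// Count top-level custom sections whose names start with `.debug_`.
/// A BTreeMap keeps the result sorted by name, like the listing above.
pub fn count_dwarf_sections<'a, I>(names: I) -> BTreeMap<String, usize>
where
    I: IntoIterator<Item = &'a str>,
{
    let mut counts = BTreeMap::new();
    for name in names {
        if name.starts_with(".debug_") {
            *counts.entry(name.to_string()).or_insert(0) += 1;
        }
    }
    counts
}

/// Names that appear more than once. A DWARF reader expects one section per
/// name, so each of these is ambiguous input.
pub fn duplicated(counts: &BTreeMap<String, usize>) -> Vec<&str> {
    counts
        .iter()
        .filter(|(_, n)| **n > 1)
        .map(|(name, _)| name.as_str())
        .collect()
}
```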

## Cross-repo dependency: witness

`pulseengine/witness` v0.11.x reads DWARF via:
- `crates/witness-core/src/decisions.rs::extract_dwarf_sections` — looks for `.debug_abbrev`, `.debug_info`, `.debug_line`, `.debug_str`, `.debug_line_str`, `.debug_str_offsets`, `.debug_addr`, `.debug_rnglists`, `.debug_loclists`.
- `build_line_map` — uses gimli to compute `(code-section byte offset) -> (file, line)`.
- `reconstruct_decisions` — attributes each `br_if` byte offset to a source line.
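
The contract above boils down to a sorted map from code-section offset to source position, queried with "the closest row at or before this offset". The std-only sketch below shows that lookup shape; it is not witness's actual `build_line_map`, and the `LineMap` type is hypothetical. It also shows why stale offsets are harmful: the lookup does not fail loudly, it returns whichever row happens to precede the wrong address.

```rust
use std::collections::BTreeMap;

/// Hypothetical line map: code-section byte offset of a line-table row
/// mapped to (file, line). Rows are kept sorted by address.
pub struct LineMap {
    rows: BTreeMap<u64, (String, u32)>,
}

impl LineMap {
    pub fn new(rows: impl IntoIterator<Item = (u64, String, u32)>) -> Self {
        let rows = rows
            .into_iter()
            .map(|(addr, file, line)| (addr, (file, line)))
            .collect();
        LineMap { rows }
    }

    /// Attribute an offset (for example a `br_if`) to the closest row at or
    /// before it. Returns None only if the offset precedes every row.
    /// A real lookup would also honor end-of-sequence rows; this one does not.
    pub fn lookup(&self, offset: u64) -> Option<(&str, u32)> {
        self.rows
            .range(..=offset)
            .next_back()
            .map(|(_, (file, line))| (file.as_str(), *line))
    }
}
```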

Witness is intentionally NOT a meld-core dependency (it pulls `wasmtime`, `walrus`, `gimli` and lives in a separate workspace). End-to-end verification ("run witness on a meld-fused module, assert > X% of branches got source attribution") therefore has to live cross-repo, likely in `pulseengine/wasm-component-examples` release evidence, or as a scripted smoke check in CI.
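
A scripted smoke check could gate on a simple ratio. In this sketch the per-branch results are passed in as `Option`s (`None` meaning no source line), the threshold is a parameter because the issue leaves X open, and both function names are hypothetical. A ratio alone only catches *missing* attribution: stale DWARF can still yield a line for many offsets, so a stronger check would also compare against a known-good mapping.

```rust
/// Fraction of branches that received a source attribution, in [0.0, 1.0].
/// An empty input yields 0.0 so an empty module never passes by accident.
pub fn attribution_ratio<T>(attributions: &[Option<T>]) -> f64 {
    if attributions.is_empty() {
        return 0.0;
    }
    let hits = attributions.iter().filter(|a| a.is_some()).count();
    hits as f64 / attributions.len() as f64
}

/// Hypothetical CI gate: pass when strictly more than `min_ratio` of the
/// branches were attributed (the issue's "> X%").
pub fn smoke_check_passes<T>(attributions: &[Option<T>], min_ratio: f64) -> bool {
    attribution_ratio(attributions) > min_ratio
}
```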

## Phased plan

### Phase 1 — discovery (THIS ISSUE)

- [x] Audit current DWARF handling in meld-core
- [x] Identify witness as the consumer + document its DWARF contract
- [x] Add `meld-core/tests/dwarf_passthrough.rs` pinning the lossy behavior (5 tests, all green today; they flip as later phases land)
- [ ] Open draft PR with the discovery and this issue link

### Phase 1.5 — explicit policy

Document and surface the choice: today, debug-info-bearing components produce a fused module with **wrong** DWARF, which is worse than no DWARF. Options:

- Keep `Merge` default: silent corruption, witness gives wrong source lines.
- Switch default to `Drop` for `.debug_*` sections specifically: witness gives no attribution but no wrong attribution.
- Add a CLI flag `--debug-info {drop,passthrough,remap}` so users opt in.

Recommendation: **add a `.debug_*`-aware policy distinct from generic custom-section handling**, default to `drop` for `.debug_*` until Phase 2 ships, keep `Merge` for non-DWARF custom sections. Out of scope for the discovery PR; tracked here.
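
One possible shape for the recommended policy, kept separate from the generic custom-section setting: `.debug_*` sections follow a `DebugInfoPolicy`, and everything else keeps its existing handling. The enum, its parsing of the proposed `--debug-info` values, and `keep_section` are all hypothetical. `Remap` is treated as "keep" here only because the actual rewriting is Phase 2 work.

```rust
use std::str::FromStr;

/// Proposed policy for `.debug_*` sections only (hypothetical names).
#[derive(Clone, Copy, Debug, PartialEq, Default)]
pub enum DebugInfoPolicy {
    /// Safe default until Phase 2: no DWARF beats wrong DWARF.
    #[default]
    Drop,
    /// Today's behavior: copy bytes verbatim (addresses stay stale).
    Passthrough,
    /// Phase 2: rewrite addresses against the merged code section.
    Remap,
}

impl FromStr for DebugInfoPolicy {
    type Err = String;
    fn from_str(s: &str) -> Result<Self, Self::Err> {
        match s {
            "drop" => Ok(Self::Drop),
            "passthrough" => Ok(Self::Passthrough),
            "remap" => Ok(Self::Remap),
            other => Err(format!("unknown --debug-info value: {other}")),
        }
    }
}

fn is_dwarf(name: &str) -> bool {
    name.starts_with(".debug_")
}

/// Decide whether a custom section survives into the fused output.
/// `merge_other` stands in for the generic (non-DWARF) Merge/Drop setting.
pub fn keep_section(name: &str, debug: DebugInfoPolicy, merge_other: bool) -> bool {
    if is_dwarf(name) {
        // Passthrough and Remap both keep the section; Remap would later rewrite it.
        debug != DebugInfoPolicy::Drop
    } else {
        merge_other
    }
}
```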

### Phase 2 — DWARF remap

For each `.debug_line` line program, rewrite addresses from per-input code-section offsets to merged-code-section offsets using the function-body relocation map the merger already builds. A single pass through each line program is enough: the addresses are sequential, and the merger preserves byte offsets within each rewritten function body, modulo index re-encoding.
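
The per-address rewrite can be sketched as an interval lookup: find the input function body containing the old offset, then add the offset's distance from the body start to that body's new start. `FuncReloc` and `RelocMap` are an assumed shape for the relocation map, not the merger's actual types. Because index re-encoding can change LEB128 widths inside a body, this sketch only trusts the linear shift when the body length is unchanged and returns `None` otherwise.

```rust
use std::collections::BTreeMap;

/// Assumed relocation record for one function body (hypothetical shape).
#[derive(Clone, Copy, Debug)]
pub struct FuncReloc {
    pub old_start: u64, // body start in the input module's code section
    pub old_len: u64,
    pub new_start: u64, // body start in the merged code section
    pub new_len: u64,
}

/// Relocations for one input module, keyed by old body start.
pub struct RelocMap {
    by_old_start: BTreeMap<u64, FuncReloc>,
}

impl RelocMap {
    pub fn new(relocs: impl IntoIterator<Item = FuncReloc>) -> Self {
        let by_old_start = relocs.into_iter().map(|r| (r.old_start, r)).collect();
        RelocMap { by_old_start }
    }

    /// Map an input code-section offset to a merged code-section offset.
    /// Returns None if the offset is outside every body, or if the body was
    /// resized (e.g. by index re-encoding) so a linear shift is unsafe.
    pub fn remap(&self, old: u64) -> Option<u64> {
        let (_, r) = self.by_old_start.range(..=old).next_back()?;
        if old >= r.old_start + r.old_len || r.old_len != r.new_len {
            return None;
        }
        Some(r.new_start + (old - r.old_start))
    }
}
```

End-of-sequence addresses point one past the last byte of a body, so they fall outside every body here; a caller would map them as `remap(end - 1)` plus one.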

`.debug_info` DIEs that reference code addresses (`DW_AT_low_pc`, `DW_AT_high_pc`, `DW_AT_ranges`) need the same remap. `.debug_ranges` / `.debug_rnglists` need entry-by-entry rewriting. `.debug_str` and `.debug_abbrev` are address-free and can pass through.
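
Ranges need slightly more care than single addresses: an input range that spans several adjacent function bodies may not stay contiguous if the merger reorders or drops bodies, so one input range can become several output ranges. The sketch below uses a hypothetical `Body` record, assumes body lengths are unchanged, clips the range to each overlapping body, and shifts each piece independently. Note also that since DWARF 4, `DW_AT_high_pc` may be encoded as a length relative to `DW_AT_low_pc` rather than as an address, so a writer must check the attribute form before rewriting it.

```rust
/// Hypothetical relocation record: body start in the input and merged code
/// sections, plus its (assumed unchanged) length.
#[derive(Clone, Copy, Debug)]
pub struct Body {
    pub old_start: u64,
    pub new_start: u64,
    pub len: u64,
}

/// Rewrite a half-open input range [begin, end) into merged-code-section
/// ranges, one piece per overlapping body. `bodies` must be sorted by
/// `old_start` and non-overlapping. Pieces that end up adjacent are coalesced.
pub fn remap_range(begin: u64, end: u64, bodies: &[Body]) -> Vec<(u64, u64)> {
    let mut out: Vec<(u64, u64)> = Vec::new();
    for b in bodies {
        let lo = begin.max(b.old_start);
        let hi = end.min(b.old_start + b.len);
        if lo >= hi {
            continue; // this body does not overlap the range
        }
        let piece = (b.new_start + (lo - b.old_start), b.new_start + (hi - b.old_start));
        if let Some(last) = out.last_mut() {
            // Bodies that are still adjacent in the output merge back together.
            if last.1 == piece.0 {
                last.1 = piece.1;
                continue;
            }
        }
        out.push(piece);
    }
    out
}
```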

Multi-input dedup: `.debug_info` becomes the concatenation of input compile units, with offset adjustments. `.debug_str` needs string-pool dedup. `.debug_abbrev` either merges (if abbreviation tables are byte-equal) or stays per-CU.
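
For `.debug_str`, dedup amounts to interning NUL-terminated strings from every input while recording, per input, how each old offset maps to a new one; every `DW_FORM_strp` reference in that input's `.debug_info` would then be rewritten through its map. A std-only sketch follows; `merge_debug_str` is a hypothetical name, and the sketch assumes well-formed, NUL-terminated inputs.

```rust
use std::collections::HashMap;

/// Merge several `.debug_str` sections into one deduplicated pool.
/// Returns the new pool and, for each input, a map from the old offset of
/// each string start to its offset in the new pool. Offsets that point into
/// the middle of a string (tail sharing) are not covered by this sketch.
pub fn merge_debug_str(inputs: &[&[u8]]) -> (Vec<u8>, Vec<HashMap<u64, u64>>) {
    let mut pool: Vec<u8> = Vec::new();
    let mut interned: HashMap<Vec<u8>, u64> = HashMap::new();
    let mut maps = Vec::with_capacity(inputs.len());

    for section in inputs {
        let mut map = HashMap::new();
        let mut old = 0usize;
        // Each string ends at a NUL byte; an unterminated tail is ignored.
        while let Some(nul) = section[old..].iter().position(|&b| b == 0) {
            let s = &section[old..old + nul];
            let new = *interned.entry(s.to_vec()).or_insert_with(|| {
                let off = pool.len() as u64;
                pool.extend_from_slice(s);
                pool.push(0);
                off
            });
            map.insert(old as u64, new);
            old += nul + 1;
        }
        maps.push(map);
    }
    (pool, maps)
}
```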

This is real DWARF surgery. Feasible with `gimli` (read) + a write path. Witness already has the `gimli` dep; meld-core would need to add it (or fork the writer-side logic into a thin standalone helper).

### Phase 3 — adapters and inlined code

The merger generates new function bodies for cross-component adapters (`memory.copy` and `cabi_realloc` trampolines). These have NO source. Options:

- Synthesize DIEs that point at a placeholder file `"<meld-adapter>"` line N, where N = the adapter's role (memory copy, realloc, lift, lower). Witness's truth-table view becomes correct: adapter `br_if`s show up as "adapter" branches that don't need source-level MC/DC coverage.
- Accept the gap: leave adapter ranges out of the line map. Witness already has a strict-per-`br_if` fallback when DWARF is absent or sparse, so this degrades gracefully.
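
The first option could be as small as emitting one line-table row per adapter body pointing at the placeholder file, plus an end-of-sequence row one past its last byte. The `AdapterRole` numbering (1 to 4), the `LineRow` struct, and `adapter_rows` are hypothetical; the issue only says N encodes the role.

```rust
/// Adapter roles named in the issue; the numeric line values are an
/// assumption, chosen only to be stable and distinct.
#[derive(Clone, Copy, Debug, PartialEq)]
pub enum AdapterRole {
    MemoryCopy = 1,
    Realloc = 2,
    Lift = 3,
    Lower = 4,
}

pub const ADAPTER_FILE: &str = "<meld-adapter>";

/// One synthetic line-table row (hypothetical shape).
#[derive(Clone, Debug, PartialEq)]
pub struct LineRow {
    pub address: u64, // offset in the merged code section
    pub file: &'static str,
    pub line: u32,
    pub end_sequence: bool,
}

/// For each generated body `(new_start, len, role)`, emit a row at its start
/// and an end-of-sequence row at one past its last byte, so any offset inside
/// the body resolves to the placeholder file and role line.
pub fn adapter_rows(adapters: &[(u64, u64, AdapterRole)]) -> Vec<LineRow> {
    let mut rows = Vec::new();
    for &(start, len, role) in adapters {
        let line = role as u32;
        rows.push(LineRow { address: start, file: ADAPTER_FILE, line, end_sequence: false });
        rows.push(LineRow { address: start + len, file: ADAPTER_FILE, line, end_sequence: true });
    }
    rows
}
```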

Variable-level debug info (`.debug_loc`, `.debug_loclists`) is **explicitly out of scope** for this epic — instruction → source mapping only.

## Done criteria for THIS issue (discovery)

- [x] Audit findings posted (this comment)
- [x] Failing-but-documenting test fixture: `meld-core/tests/dwarf_passthrough.rs`
- [x] Witness DWARF contract documented in test docstring
- [ ] Draft PR linked

## References

- `meld-core/src/lib.rs:110` — default `CustomSectionHandling::Merge`
- `meld-core/src/lib.rs:1345-1356` — encoder writes custom sections verbatim
- `meld-core/src/merger.rs:2010-2012` — naive per-module custom-section accumulation
- `meld-core/src/parser.rs:1279-1283` — core-module custom sections collected
- `meld-core/src/parser.rs:1082-1084` — component-level custom sections explicitly dropped
- `pulseengine/witness:crates/witness-core/src/decisions.rs:83-140` — DWARF section extraction contract
- `pulseengine/witness:crates/witness-core/src/decisions.rs:142-160` — DWARF address semantics ("byte offsets into the *Code section*")
