DWARF / source-map remapping for fused output (witness MC/DC integration) #130

@avrabe

Summary

meld fuse produces a core wasm module whose code section is fully rewritten compared to the inputs, but DWARF custom sections (.debug_info, .debug_line, .debug_str, …) carried by the input components are passed through byte-for-byte. Every DWARF address inside those passed-through sections is a byte offset into the original per-component code section and is therefore wrong against the merged code section.

The downstream consumer is pulseengine/witness (sibling repo). Witness uses gimli to build a (code-section byte offset) -> (file, line) map so it can attribute MC/DC br_if decisions to source lines. After meld fusion every offset is wrong, so witness produces incorrect coverage attribution for fused modules.

This issue tracks a phased fix.

Discovery (Phase 1, this issue)

Audit of meld-core/src/ on main (commit c52eb1b):

  1. Default custom-section policy is Merge, not Drop — see meld-core/src/lib.rs:110. The earlier assumption in CLI/agent docs was that DWARF was being dropped; in practice it is being passed through.
  2. merger.rs:2010-2012 naively concatenates every input core module's custom_sections vec into the merged module's vec without dedup, ordering, or address rewriting.
  3. lib.rs::encode_output (lines 1345-1356) emits them all at the end of the output module unless CustomSectionHandling::Drop is set.
  4. No code in meld-core/ parses, rewrites, or reconciles .debug_* sections. parser.rs ignores component-level custom sections entirely (lines 1082-1084) and only stores core-module custom sections.

Concrete data from the new discovery test on tests/wit_bindgen/fixtures/lists.wasm (P2 component embedding 2 core modules):

DWARF sections at top level of fused module: {
    ".debug_abbrev": 2,
    ".debug_info": 2,
    ".debug_line": 2,
    ".debug_loc": 2,
    ".debug_ranges": 2,
    ".debug_str": 2,
}
input code section (sum across embedded modules): 231531 bytes
fused code section:                                213242 bytes

So today's fused module:

  • carries duplicate DWARF sections (one set per input core module),
  • has a code section of a different length (so addresses can't be coincidentally valid),
  • gives gimli input that is both ambiguous (two copies of each section) and wrong (stale addresses).
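The duplicate-section count above can be reproduced without any wasm tooling. The following is a minimal sketch, not the code in dwarf_passthrough.rs: it assumes the input is a core module (not a component), and the function names `read_uleb` and `count_debug_sections` are made up for illustration.

```rust
use std::collections::BTreeMap;

/// Decode an unsigned LEB128 u32 at `*pos`, advancing `*pos` past it.
fn read_uleb(bytes: &[u8], pos: &mut usize) -> Option<u32> {
    let mut result: u32 = 0;
    let mut shift = 0;
    loop {
        if shift >= 32 {
            return None; // more than 5 bytes: not a valid u32
        }
        let byte = *bytes.get(*pos)?;
        *pos += 1;
        result |= u32::from(byte & 0x7f) << shift;
        if byte & 0x80 == 0 {
            return Some(result);
        }
        shift += 7;
    }
}

/// Count top-level custom sections whose name starts with ".debug_".
/// Returns None if the bytes are not a well-formed core module.
fn count_debug_sections(module: &[u8]) -> Option<BTreeMap<String, usize>> {
    // Core module preamble: "\0asm" magic followed by version 1.
    if module.len() < 8 || &module[..8] != b"\0asm\x01\0\0\0" {
        return None;
    }
    let mut counts = BTreeMap::new();
    let mut pos = 8;
    while pos < module.len() {
        let id = module[pos];
        pos += 1;
        let size = read_uleb(module, &mut pos)? as usize;
        let end = pos.checked_add(size)?;
        if end > module.len() {
            return None;
        }
        if id == 0 {
            // Custom section: length-prefixed UTF-8 name, then payload.
            let mut p = pos;
            let name_len = read_uleb(module, &mut p)? as usize;
            let name_end = p.checked_add(name_len)?;
            if name_end > end {
                return None;
            }
            let name = std::str::from_utf8(&module[p..name_end]).ok()?;
            if name.starts_with(".debug_") {
                *counts.entry(name.to_string()).or_insert(0) += 1;
            }
        }
        pos = end;
    }
    Some(counts)
}
```

Run against a fused module, any count greater than 1 for a given section name indicates that DWARF from more than one input was passed through.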

Cross-repo dependency: witness

pulseengine/witness v0.11.x reads DWARF via:

  • crates/witness-core/src/decisions.rs::extract_dwarf_sections — looks for .debug_abbrev, .debug_info, .debug_line, .debug_str, .debug_line_str, .debug_str_offsets, .debug_addr, .debug_rnglists, .debug_loclists.
  • build_line_map — uses gimli to compute (code-section byte offset) -> (file, line).
  • reconstruct_decisions — attributes each br_if byte offset to a source line.

Witness is intentionally NOT a meld-core dependency (it pulls wasmtime, walrus, gimli and lives in a separate workspace). End-to-end verification ("run witness on a meld-fused module, assert > X% of branches got source attribution") therefore has to live cross-repo — likely in pulseengine/wasm-component-examples release evidence, or as a scripted smoke check in CI.

Phased plan

Phase 1 — discovery (THIS ISSUE)

  • Audit current DWARF handling in meld-core
  • Identify witness as the consumer + document its DWARF contract
  • Add meld-core/tests/dwarf_passthrough.rs pinning the lossy behavior (5 tests, all green today, flip when phases land)
  • Open draft PR with the discovery and this issue link

Phase 1.5 — explicit policy

Document and surface the choice: today, debug-info-bearing components produce a fused module with wrong DWARF, which is worse than no DWARF. Options:

  • Keep Merge default: silent corruption, witness gives wrong source lines.
  • Switch default to Drop for .debug_* sections specifically: witness gives no attribution but no wrong attribution.
  • Add a CLI flag --debug-info {drop,passthrough,remap} so users opt in.

Recommendation: add a .debug_*-aware policy distinct from generic custom-section handling, default to drop for .debug_* until Phase 2 ships, keep Merge for non-DWARF custom sections. Out of scope for the discovery PR; tracked here.
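One possible shape for that policy split, as a sketch only. Every type and function name here (`DebugInfoPolicy`, `CustomSectionPolicy`, `SectionAction`, `decide`) is hypothetical and does not exist in meld-core; `CustomSectionPolicy` is a stand-in for the existing generic handling, and the sketch assumes the debug policy is decided independently of it.

```rust
/// Hypothetical policy mirroring the proposed
/// `--debug-info {drop,passthrough,remap}` flag.
#[derive(Clone, Copy, Debug, PartialEq, Eq)]
enum DebugInfoPolicy {
    Drop,
    Passthrough,
    Remap,
}

/// Hypothetical stand-in for the generic custom-section handling.
#[derive(Clone, Copy, Debug, PartialEq, Eq)]
enum CustomSectionPolicy {
    Merge,
    Drop,
}

/// What the encoder should do with one custom section.
#[derive(Debug, PartialEq, Eq)]
enum SectionAction {
    Emit,
    Skip,
    Rewrite,
}

fn is_dwarf_section(name: &str) -> bool {
    name.starts_with(".debug_")
}

/// `.debug_*` sections follow the debug policy; everything else
/// follows the generic policy.
fn decide(name: &str, generic: CustomSectionPolicy, debug: DebugInfoPolicy) -> SectionAction {
    if is_dwarf_section(name) {
        match debug {
            DebugInfoPolicy::Drop => SectionAction::Skip,
            DebugInfoPolicy::Passthrough => SectionAction::Emit,
            DebugInfoPolicy::Remap => SectionAction::Rewrite,
        }
    } else {
        match generic {
            CustomSectionPolicy::Merge => SectionAction::Emit,
            CustomSectionPolicy::Drop => SectionAction::Skip,
        }
    }
}

impl std::str::FromStr for DebugInfoPolicy {
    type Err = String;

    /// Parse the flag value as it would appear on the command line.
    fn from_str(s: &str) -> Result<Self, Self::Err> {
        match s {
            "drop" => Ok(Self::Drop),
            "passthrough" => Ok(Self::Passthrough),
            "remap" => Ok(Self::Remap),
            other => Err(format!("unknown --debug-info value: {}", other)),
        }
    }
}
```

With `DebugInfoPolicy::Drop` as the interim default, `decide(".debug_line", CustomSectionPolicy::Merge, DebugInfoPolicy::Drop)` skips the section while a non-DWARF section such as `producers` is still emitted.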

Phase 2 — DWARF remap

For each .debug_line line program, rewrite addresses from per-input code-section offsets to merged-code-section offsets using the function-body relocation map the merger already builds. A single pass through each line program should be enough: addresses increase monotonically within a line-number sequence, and the merger preserves byte offsets within each rewritten function body except where index re-encoding changes an instruction's length.
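The address translation itself could look roughly like the sketch below. `BodyReloc` and `RelocMap` are hypothetical types, not the merger's actual relocation map, and there would be one such map per input core module. The sketch assumes intra-body offsets are unchanged; where re-encoding changes instruction lengths, a finer per-instruction map would be needed instead.

```rust
/// One relocated function body: where it sat in an input code section
/// and where it starts in the merged code section. Hypothetical type.
#[derive(Clone, Copy, Debug)]
struct BodyReloc {
    old_start: u32,
    old_len: u32,
    new_start: u32,
}

/// Relocation entries for one input module, sorted by `old_start`.
struct RelocMap {
    entries: Vec<BodyReloc>,
}

impl RelocMap {
    fn new(mut entries: Vec<BodyReloc>) -> Self {
        entries.sort_by_key(|e| e.old_start);
        RelocMap { entries }
    }

    /// Translate an input code-section offset into a merged one.
    /// Returns None if the address falls outside every known body.
    fn remap(&self, old_addr: u32) -> Option<u32> {
        // Number of entries whose start is <= old_addr; the last of
        // those is the only body that can contain the address.
        let idx = self.entries.partition_point(|e| e.old_start <= old_addr);
        let entry = self.entries.get(idx.checked_sub(1)?)?;
        let delta = old_addr - entry.old_start;
        if delta < entry.old_len {
            Some(entry.new_start + delta)
        } else {
            None
        }
    }
}
```

Returning `None` for addresses outside any body keeps the caller honest: a line-program row that cannot be remapped is better dropped than emitted with a stale address.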

.debug_info DIEs that reference code addresses (DW_AT_low_pc, DW_AT_high_pc, DW_AT_ranges) need the same remap. .debug_ranges / .debug_rnglists need entry-by-entry rewriting. .debug_str and .debug_abbrev contain no code addresses, so for a single input they can pass through unchanged.

Multi-input dedup: .debug_info becomes the concatenation of input compile units, with offset adjustments. .debug_str needs string-pool dedup. .debug_abbrev either merges (if abbreviation tables are byte-equal) or stays per-CU.
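The string-pool part of that dedup can be sketched as follows. `StrPool` is a hypothetical helper, not existing meld-core code. It treats each input .debug_str payload as a run of NUL-terminated strings and records an old-offset to new-offset map that string references in .debug_info would then be rewritten with. It maps only offsets that point at the start of a string; a producer that shares string tails would need extra handling.

```rust
use std::collections::HashMap;

/// Merged `.debug_str` pool with deduplication. Hypothetical helper.
#[derive(Default)]
struct StrPool {
    /// The merged section payload: NUL-terminated strings back to back.
    bytes: Vec<u8>,
    /// String contents (without the NUL) -> offset in `bytes`.
    index: HashMap<Vec<u8>, u32>,
}

impl StrPool {
    /// Intern one string and return its offset in the merged pool.
    fn intern(&mut self, s: &[u8]) -> u32 {
        if let Some(&off) = self.index.get(s) {
            return off;
        }
        let off = self.bytes.len() as u32;
        self.bytes.extend_from_slice(s);
        self.bytes.push(0);
        self.index.insert(s.to_vec(), off);
        off
    }

    /// Merge one input `.debug_str` payload. Returns a map from each
    /// string's offset in the input to its offset in the merged pool.
    /// A trailing fragment without a NUL terminator is ignored.
    fn merge(&mut self, input: &[u8]) -> HashMap<u32, u32> {
        let mut remap = HashMap::new();
        let mut start = 0usize;
        for (i, &b) in input.iter().enumerate() {
            if b == 0 {
                let new_off = self.intern(&input[start..i]);
                remap.insert(start as u32, new_off);
                start = i + 1;
            }
        }
        remap
    }
}
```

Calling `merge` once per input module yields one remap table per module, which matches the per-CU offset adjustment described above.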

This is real DWARF surgery. Feasible with gimli (read) + a write path. Witness already has the gimli dep; meld-core would need to add it (or fork the writer-side logic into a thin standalone helper).

Phase 3 — adapters and inlined code

The merger generates new function bodies for cross-component adapters (memory.copy + cabi_realloc trampolines). These have NO source. Options:

  • Synthesize DIEs that point at a placeholder file "<meld-adapter>" line N where N = the adapter's role (memory copy, realloc, lift, lower). Witness's truth-table view becomes correct: adapter br_ifs show up as "adapter" branches that don't need source-level MC/DC coverage.
  • Accept the gap: leave adapter ranges out of the line map. Witness already has a strict-per-br_if fallback when DWARF is absent or sparse, so this degrades gracefully.
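For the first option, the synthesized line-map rows might be modelled like this. It is a sketch under assumptions: the role-to-line numbering (1 to 4) is invented here, since the issue only says N encodes the role, and `AdapterRole`, `LineRow`, `adapter_row`, and `is_adapter` are hypothetical names.

```rust
/// Placeholder file name proposed for adapter code.
const ADAPTER_FILE: &str = "<meld-adapter>";

/// The adapter roles listed in the issue. Hypothetical type.
#[derive(Clone, Copy, Debug, PartialEq, Eq)]
enum AdapterRole {
    MemoryCopy,
    Realloc,
    Lift,
    Lower,
}

impl AdapterRole {
    /// Assumed numbering; any stable mapping would do.
    fn line(self) -> u32 {
        match self {
            AdapterRole::MemoryCopy => 1,
            AdapterRole::Realloc => 2,
            AdapterRole::Lift => 3,
            AdapterRole::Lower => 4,
        }
    }
}

/// One row of a (code-section byte range) -> (file, line) map.
#[derive(Debug, PartialEq, Eq)]
struct LineRow {
    start: u32,
    end: u32, // exclusive
    file: String,
    line: u32,
}

/// Build the synthetic row covering one generated adapter body.
fn adapter_row(start: u32, len: u32, role: AdapterRole) -> LineRow {
    LineRow {
        start,
        end: start + len,
        file: ADAPTER_FILE.to_string(),
        line: role.line(),
    }
}

/// A consumer can use this to exclude adapter branches from
/// source-level coverage requirements.
fn is_adapter(row: &LineRow) -> bool {
    row.file == ADAPTER_FILE
}
```

The second option needs no such rows: adapter ranges are simply absent from the map and the consumer falls back to its existing behavior.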

Variable-level debug info (.debug_loc, .debug_loclists) is explicitly out of scope for this epic — instruction → source mapping only.

Done criteria for THIS issue (discovery)

  • Audit findings posted (this comment)
  • Test fixture pinning the current lossy behavior (green today, flips when the phases land): meld-core/tests/dwarf_passthrough.rs
  • Witness DWARF contract documented in test docstring
  • Draft PR linked

References

  • meld-core/src/lib.rs:110 — default CustomSectionHandling::Merge
  • meld-core/src/lib.rs:1345-1356 — encoder writes custom sections verbatim
  • meld-core/src/merger.rs:2010-2012 — naive per-module custom-section accumulation
  • meld-core/src/parser.rs:1279-1283 — core-module custom sections collected
  • meld-core/src/parser.rs:1082-1084 — component-level custom sections explicitly dropped
  • pulseengine/witness:crates/witness-core/src/decisions.rs:83-140 — DWARF section extraction contract
  • pulseengine/witness:crates/witness-core/src/decisions.rs:142-160 — DWARF address semantics ("byte offsets into the Code section")
