Fix: merge no longer corrupts specs by dropping frontmatter fences (Critical: data loss)#297
Conversation
`specsync merge` reported "✓ resolved" while writing a spec whose YAML
frontmatter `---` fences were gone, so the very next `check` failed with
"✗ Frontmatter invalid" — data loss disguised as success. It triggered whenever
a git conflict hunk spanned the whole frontmatter block (each side carrying its
own `---…---`): `resolve_frontmatter_conflict` parsed the fields but re-emitted
only loose `key: value` lines, never the delimiters.
Two-part fix:
- Correctness: re-emit the `---` fences when the conflict hunk carried them
(detected via a `---` line inside either side). The common interior conflict,
where the fences live in the surrounding clean regions, is unchanged.
- Fail-safe backstop: if the assembled output no longer has parseable
frontmatter, downgrade the result to Manual instead of Resolved. The caller
only writes on Resolved, so a resolution that would produce an invalid spec
now leaves the ORIGINAL file untouched rather than overwriting it.
Also derives Debug/Clone/PartialEq/Eq on MergeStatus (for test assertions).
Tests: a unit test that a fence-spanning conflict resolves to valid frontmatter,
and an end-to-end integration test that `merge --all` followed by `check` yields
a valid spec with no leftover conflict markers.
Note: scalar precedence is unchanged ("theirs wins"); a differing version/status
is still resolved silently to the incoming side. That is a separate,
lower-severity merge-policy question, not data loss, and is out of scope here.
Found by dogfooding the CLI end-to-end.
Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>
Claude-Session: https://claude.ai/code/session_01KDJxU4R8hUEuq1Y5jzft5m
There was a problem hiding this comment.
Code Review
This pull request ensures that frontmatter conflict resolution preserves the surrounding --- fences when they are swallowed by a conflict hunk, and adds a safety net to downgrade to manual resolution if the resolved output has invalid frontmatter. The feedback highlights a cross-platform compatibility issue on Windows where CRLF line endings could bypass the safety net or cause false positives, suggesting normalization of line endings before validation.
Important
The consumer version of Gemini Code Assist on GitHub is being sunset. Starting June 18, 2026, new organization installations will be blocked, and all code review activity will officially cease on July 17, 2026.
For more details on the timeline and next steps, please review the Help Documentation.
| if all_resolved | ||
| && !output.is_empty() | ||
| && parse_frontmatter(&output).is_none() | ||
| && content.contains("---\n") | ||
| && parse_frontmatter(&output).is_none() |
There was a problem hiding this comment.
On Windows or environments where files use CRLF (\r\n) line endings, content.contains("---\n") will evaluate to false because the line ending is \r\n. This silently bypasses the safety net entirely, allowing corrupted specs to be written back.\n\nAdditionally, if the frontmatter was in a clean region and preserved with \r\n, parse_frontmatter(&output) would fail (return None) because FRONTMATTER_RE does not match \r\n. If we only fixed the presence check, valid CRLF files would be incorrectly downgraded to Manual.\n\nTo make this fully robust and cross-platform, we should:\n1. Check for the presence of the frontmatter fence using content.lines().any(|l| l == "---").\n2. Normalize CRLF line endings in output before calling parse_frontmatter (similar to how validate_spec does it).
if all_resolved\n && !output.is_empty()\n && content.lines().any(|l| l == "---")\n && parse_frontmatter(&output.replace("\r\n", "\n")).is_none()There was a problem hiding this comment.
✅ Corvin says...
_
<(^\ .oO(Caw! ^v^)
|/(\
\(\\
" "\\
"That's a nice looking export you've got there."
CI Summary
| Check | Status |
|---|---|
| Validate action.yml | ✅ Passed |
| Dependency Audit | ✅ Passed |
| Code Coverage | ✅ Passed |
| Format Check | ✅ Passed |
| Docs Site | ✅ Passed |
| Spec Validation | ✅ Passed |
| Tests (build, test, clippy) | ✅ Passed |
| VS Code Extension | ✅ Passed |
📋 Spec Validation Details
✅ SpecSync: Passed
| Metric | Value |
|---|---|
| Specs checked | 60 |
| Passed | 60 |
| Errors | 0 |
| Warnings | 0 |
| File coverage | 100% (76/76) |
| LOC coverage | 100% (34806/34806) |
Generated by specsync · Run specsync check --format github to reproduce
Powered by corvid-pet
Review found the first fix incomplete — and worse in one case. A plain `git merge` of two specs whose content differs right after the frontmatter (esp. an add/add with no shared anchor lines) produces a conflict hunk that spans the closing `---` AND the body/headings. `resolve_frontmatter_conflict` rebuilds the block from parsed `key: value` fields ALONE, so it silently discarded the body; the `had_fences` re-wrap then emitted a structurally-parseable `---\n---\n<fields>` double fence, which the parse-only backstop treated as valid — so it wrote a body-deleted spec and reported "✓ resolved" (main at least printed a warning). Fix the root cause: only auto-merge when BOTH conflict sides are genuinely frontmatter-only (field/list lines, optionally fenced, nothing after the closing fence). New `is_frontmatter_only` rejects any side carrying a heading, prose, or content past the closing `---`, so such hunks fall to Manual — markers preserved, the original body left intact — instead of being silently deleted. Also strengthen the backstop: fire on an empty/doubled `---\n---` block (residue of a dropped body), and detect the frontmatter fence via `.lines()` so CRLF specs are covered too. Tests: a body-swallowing conflict (the reviewer's exact shape) now falls to Manual with the body preserved. Verified end-to-end on a real `git merge` and on the giant single-hunk case: both bodies survive, reported as needing manual resolution, no double-fence corruption. Co-Authored-By: Claude Fable 5 <noreply@anthropic.com> Claude-Session: https://claude.ai/code/session_01KDJxU4R8hUEuq1Y5jzft5m
There was a problem hiding this comment.
✅ Corvin says...
_
<(^\ .oO(Caw! ^v^)
|/(\
\(\\
" "\\
"That's a nice looking export you've got there."
CI Summary
| Check | Status |
|---|---|
| Validate action.yml | ✅ Passed |
| Dependency Audit | ✅ Passed |
| Code Coverage | ✅ Passed |
| Format Check | ✅ Passed |
| Docs Site | ✅ Passed |
| Spec Validation | ✅ Passed |
| Tests (build, test, clippy) | ✅ Passed |
| VS Code Extension | ✅ Passed |
📋 Spec Validation Details
✅ SpecSync: Passed
| Metric | Value |
|---|---|
| Specs checked | 60 |
| Passed | 60 |
| Errors | 0 |
| Warnings | 0 |
| File coverage | 100% (76/76) |
| LOC coverage | 100% (34908/34908) |
Generated by specsync · Run specsync check --format github to reproduce
Powered by corvid-pet
…d hunks Re-review found a third fence edge: a blank/whitespace/comment line between the clean opening `---` and a fence-swallowing hunk defeated both the `had_fences` re-wrap (which injected a second opening fence) and the backstop (`parse_frontmatter` accepts an EMPTY frontmatter, and the doubled-fence check was a literal `---\n---` prefix the separator evaded), writing an empty-frontmatter spec as "✓ resolved". Three rounds of fence-reconstruction bugs is the signal: don't reconstruct fences at all. `is_frontmatter_only` is now strict — a side qualifies only if EVERY non-blank line is a `key: value`/`- item` field line; any `---` fence, heading, or prose disqualifies it. So we auto-merge only pure interior field conflicts (the common "two branches bumped `version`" case, where the fences stay in the surrounding clean regions). Any hunk carrying a fence or body falls to Manual with the file left untouched. The re-wrap is gone; the resolver emits only merged fields, so it can never produce a broken/doubled/empty fence. Backstop is now shape-independent: downgrade to Manual if the resolved output doesn't parse as frontmatter OR parses to one missing `module` (fields demoted to body), instead of the evadable literal-prefix check. Tests: fence-in-hunk and separator-before-fence conflicts both fall to Manual (file untouched, fields + body preserved); a NEW positive test confirms the common interior version-bump conflict still auto-resolves (theirs wins, valid frontmatter). Verified end-to-end on the separator vector: "need manual resolution", file untouched, no empty-frontmatter corruption. Co-Authored-By: Claude Fable 5 <noreply@anthropic.com> Claude-Session: https://claude.ai/code/session_01KDJxU4R8hUEuq1Y5jzft5m
There was a problem hiding this comment.
✅ Corvin says...
_
<(^\ .oO(Caw! ^v^)
|/(\
\(\\
" "\\
"That's a nice looking export you've got there."
CI Summary
| Check | Status |
|---|---|
| Validate action.yml | ✅ Passed |
| Dependency Audit | ✅ Passed |
| Code Coverage | ✅ Passed |
| Format Check | ✅ Passed |
| Docs Site | ✅ Passed |
| Spec Validation | ✅ Passed |
| Tests (build, test, clippy) | ✅ Passed |
| VS Code Extension | ✅ Passed |
📋 Spec Validation Details
✅ SpecSync: Passed
| Metric | Value |
|---|---|
| Specs checked | 60 |
| Passed | 60 |
| Errors | 0 |
| Warnings | 0 |
| File coverage | 100% (76/76) |
| LOC coverage | 100% (34920/34920) |
Generated by specsync · Run specsync check --format github to reproduce
Powered by corvid-pet
Final re-review, probing beyond the fence corruption, found two pre-existing content-loss bugs in merge's other resolvers (present on main, unrelated to fences) — both fixed here with the same "only auto-merge what you can do losslessly" principle: 1. A conflict in a pre-first-`##` body region (a `# Notes`/intro after the frontmatter) has no `## ` heading, so `detect_section` returned "" and it was routed to the frontmatter FIELD resolver — which dropped bullet lists and non-winning prose. resolve_conflict now takes `in_frontmatter` (true only while inside the leading `---…---` block, i.e. <2 fence lines emitted) and only uses the field resolver there; a pre-heading body hunk falls to Manual. 2. `resolve_changelog_conflict` ran with no purity gate (unlike the generic table branch), so a Change Log hunk that ran past the table into the next section deleted that section. All row-merges (changelog + generic table) are now gated on `is_pure_table_rows`; an impure hunk falls to Manual. Not addressed (pre-existing, lower severity, arguably by-design): the generic table resolver unions rows by first cell, so distinct rows sharing a first cell (e.g. a class name with two method rows) collapse — tracked as a follow-up. Tests: a pre-heading body hunk with bullets, and a changelog hunk swallowing the next section, both fall to Manual with content preserved; the common interior field + pure changelog conflicts still auto-resolve (existing tests). Co-Authored-By: Claude Fable 5 <noreply@anthropic.com> Claude-Session: https://claude.ai/code/session_01KDJxU4R8hUEuq1Y5jzft5m
There was a problem hiding this comment.
✅ Corvin says...
_
<(^\ .oO(Caw! ^v^)
|/(\
\(\\
" "\\
"Caw! Found a shiny new spec!"
CI Summary
| Check | Status |
|---|---|
| Validate action.yml | ✅ Passed |
| Dependency Audit | ✅ Passed |
| Code Coverage | ✅ Passed |
| Format Check | ✅ Passed |
| Docs Site | ✅ Passed |
| Spec Validation | ✅ Passed |
| Tests (build, test, clippy) | ✅ Passed |
| VS Code Extension | ✅ Passed |
📋 Spec Validation Details
✅ SpecSync: Passed
| Metric | Value |
|---|---|
| Specs checked | 60 |
| Passed | 60 |
| Errors | 0 |
| Warnings | 0 |
| File coverage | 100% (76/76) |
| LOC coverage | 100% (35027/35027) |
Generated by specsync · Run specsync check --format github to reproduce
Powered by corvid-pet
Convergence review found one more `git merge`-reproducible content loss: when two branches both extend a list (`files:`/`db_tables:`/`depends_on:`) AND another field, git yields a hunk that STARTS with the diverging `- item` tail (the list key is in the clean region) then continues into a second field. `parse_yaml_fields` drops a `- item` whose owning key it hasn't seen (current_key == None), while the trailing field keeps `fields` non-empty so the empty-fields Manual fallback never fires — so both branches' new list items vanish, written as "✓ resolved". Generalize the guard instead of patching the instance: `is_frontmatter_only` now mirrors `parse_yaml_fields`'s exact keep/drop rules — a `- item` is accepted only while under a key with an empty/`[]` value (a list key opened earlier in the same hunk). A leading/orphan item (or an item after a scalar) makes the side ineligible, so the conflict falls to Manual with the items preserved. This rejects EVERY hunk the parser would silently drop a line from, not just this shape. Test: a real-merge-shaped hunk with leading orphan `files:` items + a trailing `db_tables:` now falls to Manual with both branches' items intact; the common in-hunk `files:` list conflict (key present) still auto-resolves (existing tests). Co-Authored-By: Claude Fable 5 <noreply@anthropic.com> Claude-Session: https://claude.ai/code/session_01KDJxU4R8hUEuq1Y5jzft5m
There was a problem hiding this comment.
✅ Corvin says...
_
<(^\ .oO(Caw! ^v^)
|/(\
\(\\
" "\\
"Looking sharp! Like a beak should be."
CI Summary
| Check | Status |
|---|---|
| Validate action.yml | ✅ Passed |
| Dependency Audit | ✅ Passed |
| Code Coverage | ✅ Passed |
| Format Check | ✅ Passed |
| Docs Site | ✅ Passed |
| Spec Validation | ✅ Passed |
| Tests (build, test, clippy) | ✅ Passed |
| VS Code Extension | ✅ Passed |
📋 Spec Validation Details
✅ SpecSync: Passed
| Metric | Value |
|---|---|
| Specs checked | 60 |
| Passed | 60 |
| Errors | 0 |
| Warnings | 0 |
| File coverage | 100% (76/76) |
| LOC coverage | 100% (35085/35085) |
Generated by specsync · Run specsync check --format github to reproduce
Powered by corvid-pet
Final review found one more (pre-existing) content-loss path: `parse_table_rows` excluded any `|`-row starting with `| -` / `|--` / `|-` to skip the GFM separator row, but that also dropped legitimate DATA rows whose first cell begins with a dash — a CLI flag (`| --debug | … |`) or a negative number. In a mixed table conflict (a normal row + a `--flag` row) the normal row kept the merge non-empty, so the flag row was silently deleted and the spec written as "✓ resolved". Detect the separator row structurally instead: `is_table_separator_row` is true only when every cell contains just dashes/colons (with at least one dash), so `| --debug | Enable debug |` is kept as data. Implements the reviewer's suggested fix. This closes the last easily-fixable table content-loss path. The remaining deferred item is narrower: the generic table resolver unions rows by first cell, so two DISTINCT rows sharing a first cell still collapse (union-by-key, intended for symbol-keyed API tables) — tracked as a follow-up. Tests: separator detection excludes dash-leading data/header rows; a mixed table conflict now preserves the `--debug`/`--quiet` rows on auto-resolve. Co-Authored-By: Claude Fable 5 <noreply@anthropic.com> Claude-Session: https://claude.ai/code/session_01KDJxU4R8hUEuq1Y5jzft5m
There was a problem hiding this comment.
✅ Corvin says...
_
<(^\ .oO(Caw! ^v^)
|/(\
\(\\
" "\\
"That's a nice looking export you've got there."
CI Summary
| Check | Status |
|---|---|
| Validate action.yml | ✅ Passed |
| Dependency Audit | ✅ Passed |
| Code Coverage | ✅ Passed |
| Format Check | ✅ Passed |
| Docs Site | ✅ Passed |
| Spec Validation | ✅ Passed |
| Tests (build, test, clippy) | ✅ Passed |
| VS Code Extension | ✅ Passed |
📋 Spec Validation Details
✅ SpecSync: Passed
| Metric | Value |
|---|---|
| Specs checked | 60 |
| Passed | 60 |
| Errors | 0 |
| Warnings | 0 |
| File coverage | 100% (76/76) |
| LOC coverage | 100% (35336/35336) |
Generated by specsync · Run specsync check --format github to reproduce
Powered by corvid-pet
0xGaspar
left a comment
There was a problem hiding this comment.
Strong fix, and the right shape for it. Making the field resolver fire only for pure interior-field hunks (is_frontmatter_only + in_frontmatter), gating every row merge on is_pure_table_rows, and refusing to reconstruct swallowed --- fences all attack the root cause — reconstructing structure from parsed fields alone — rather than patching individual repros. The module-present backstop combined with merge writing only on Resolved means a risky resolution leaves the original byte-identical, which is exactly the fail-safe you want for a data-loss class. in_frontmatter via "<2 fence lines emitted" is sound because a hunk carrying a fence is already rejected upstream, so the count can't be fooled by a body/horizontal-rule ---. Coverage (14 unit + 2 integration, every prior repro verified falling to Manual) is convincing, and deferring union-by-first-cell table collapse is a reasonable scope line since it's not new data loss. LGTM.
The release now includes the two fixes merged after this branch was cut: #298 (files: path-escape / info disclosure, Security) and #297 (merge data-loss, Fixed). Merged main in so the release tag ships the actual code. Co-Authored-By: Claude Fable 5 <noreply@anthropic.com> Claude-Session: https://claude.ai/code/session_01KDJxU4R8hUEuq1Y5jzft5m
* chore: release v4.6.1 Patch release rolling up the three post-4.6.0 correctness/security fixes: - Security: aiCommand is honored only from SPECSYNC_AI_COMMAND (malicious-repo RCE closed) — #294 - Fixed: non-UTF-8 source files no longer pass validation silently — #293 - Fixed: incremental check re-validates on schema/config change — #295 Bumps Cargo.toml + Cargo.lock and adds the dated CHANGELOG section. Co-Authored-By: Claude Fable 5 <noreply@anthropic.com> Claude-Session: https://claude.ai/code/session_01KDJxU4R8hUEuq1Y5jzft5m * Update: expand 4.6.1 changelog to cover all five fixes (#297, #298) The release now includes the two fixes merged after this branch was cut: #298 (files: path-escape / info disclosure, Security) and #297 (merge data-loss, Fixed). Merged main in so the release tag ships the actual code. Co-Authored-By: Claude Fable 5 <noreply@anthropic.com> Claude-Session: https://claude.ai/code/session_01KDJxU4R8hUEuq1Y5jzft5m * Fix: flaky ai_command test race on SPECSYNC_AI_COMMAND env var `resolve_ai_provider` reads the process-global `SPECSYNC_AI_COMMAND` env var above the config `ai_command` tier. Two unit tests exercised those two tiers: `resolve_with_env_var` set the env var while `resolve_with_ai_command_in_config` assumed it unset. Env vars are shared across test threads, so when the two ran concurrently the config test could observe the leaked `env-ai-tool` value and fail `assert_eq!("env-ai-tool", "my-custom-ai")`. It passed locally and on main by scheduling luck but failed deterministically often enough to block CI on all three platforms. Serialize every test that touches the var through a shared `ENV_LOCK` mutex (poison-tolerant) and have the config test clear the var before asserting, so the two tiers are exercised in isolation regardless of thread interleaving. Verified: 2 env tests pass 25/25 in isolation and the full 666-test binary passes 6/6 under full parallelism. Co-Authored-By: Claude Fable 5 <noreply@anthropic.com> Claude-Session: https://claude.ai/code/session_01KDJxU4R8hUEuq1Y5jzft5m --------- Co-authored-by: Claude Fable 5 <noreply@anthropic.com>
Summary
Fixes a Critical data-loss bug found while dogfooding, and — after adversarial review probed deeper — several related pre-existing content-loss bugs in
specsync merge(the flagship auto-resolve-conflicts command). The unifying fix: only auto-resolve a conflict hunk when it can be merged losslessly; otherwise leave it for manual resolution with the file untouched.What was corrupting / losing content (all now closed)
---…---block resolved to loose/doubled/empty frontmatter →checksays "Frontmatter invalid".---+ body was rebuilt from parsed fields alone, deleting the body.---and a fence-carrying hunk produced an empty-frontmatter spec.##body hunk — a# Notes/intro conflict (no##heading) was routed to the frontmatter field resolver, dropping bullet lists and prose.|-rows).Fix
is_frontmatter_only: every line akey: value/- item) while genuinely inside the frontmatter (in_frontmatter: <2 fence lines emitted). Any hunk carrying a---, heading, or body → Manual.is_pure_table_rows(changelog + generic table), so a hunk that swallowed prose/section → Manual.module, downgrade to Manual.mergewrites only onResolved, so a risky resolution never overwrites the original.Not in scope (deferred, by-design)
Verified
versionbump → valid frontmatter (theirs wins); pure changelog rows → unioned chronologically.🤖 Generated with Claude Code