feat: chapter-based narrative + drop log/encyclopedia (ADR-015, PR 1) by ditvor · Pull Request #58 · ditvor/trailstory

ditvor · 2026-05-27T19:16:55Z

Summary

PR 1 of the ADR-015 rollout — migrates the narrative data model from the flat paragraphs + selected_photo_indices shape to a six-chapter envelope so the four upcoming Trailpath layouts (Zine / Sunday / Postcard / Album) can consume their native input.

BREAKING: NarrativeOutput.chapters: list[Chapter] (exactly 6) replaces flat paragraphs and selected_photo_indices as canonical state. schema_version bumps from 3 to 4. Each chapter binds one photo via photo_index and carries its own id + time + place + lat/lon + tri-lingual title + sentence-leveled body. paragraphs_as_localized() becomes a computed view over chapters[*].body; selected_photo_indices survives as a read-only @property.
Style.log and Style.encyclopedia removed — never surfaced in the picker, never shipped to users. v0 product decision is to ship the five Trailpath styles (Letter + the four new ones in subsequent renderer PRs).
narrative_max_tokens default raised 4096 → 8192 to fit the new chapter JSON envelope (case 04 was truncating at 4096).
Writer prompt rewritten with the chapter envelope; previous Phase 4 skeleton preserved as a dated comment.
Eval rubric refit: drops indices_valid, adds chapter_count_is_six + chapter_photo_binding_valid, and loosens the paragraph_count_3_to_5_each_lang bound to 3-6 (check name unchanged) to fit six chapter bodies.
All four eval cases refreshed via make eval-update-golden (paid); follow-up make eval-live confirms judge non-regression (threshold 1.0) across every axis on every case.

Letter (editorial) template walks chapter bodies with no visible-output change vs the PR-57 baseline. Phase 0 of ADR-015 docs.
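
To make the chapter envelope described above concrete, here is a minimal sketch using stdlib dataclasses. It is not the code in `trailstory/models.py`: the field types (string `time`, `dict` keyed by language, body as a list of sentences) are assumptions for illustration, and only the names `Chapter`, `NarrativeOutput`, `chapters`, `photo_index`, `selected_photo_indices`, `paragraphs_as_localized`, `CHAPTER_COUNT` and `schema_version` come from this PR.

```python
from dataclasses import dataclass

CHAPTER_COUNT = 6
LANGS = ("en", "ru", "de")  # the three languages the rubric checks


@dataclass(frozen=True)
class Chapter:
    id: str
    time: str                    # assumed: timestamp kept as a string
    place: str
    lat: float
    lon: float
    title: dict[str, str]        # assumed shape: lang -> title
    body: dict[str, list[str]]   # assumed shape: lang -> list of sentences
    photo_index: int             # the one photo this chapter binds


@dataclass(frozen=True)
class NarrativeOutput:
    chapters: list[Chapter]
    schema_version: int = 4

    def __post_init__(self) -> None:
        # The envelope is exactly six chapters, never fewer or more.
        if len(self.chapters) != CHAPTER_COUNT:
            raise ValueError(
                f"expected {CHAPTER_COUNT} chapters, got {len(self.chapters)}"
            )

    @property
    def selected_photo_indices(self) -> list[int]:
        # Read-only view derived from the per-chapter bindings.
        return [c.photo_index for c in self.chapters]

    def paragraphs_as_localized(self) -> dict[str, list[str]]:
        # Computed view: one paragraph per chapter body, per language.
        return {
            lang: [" ".join(c.body[lang]) for c in self.chapters]
            for lang in LANGS
        }
```

Because both legacy accessors are derived from `chapters`, there is a single source of truth and they cannot drift out of sync with it.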

Eval score tables

All four cases pass the rubric and judge non-regression against the fresh ADR-015 goldens (threshold 1.0). Score deltas reflect single-run model noise rather than real regression: the comparison is against goldens written one minute earlier.

Rubric (every check on every case)

```
schema_validates PASS
chapter_count_is_six PASS (6 chapters)
chapter_photo_binding_valid PASS (6 unique indices, all in range)
paragraph_count_3_to_5_each_lang PASS (en=6, ru=6, de=6)
russian_actually_cyrillic PASS
word_count_ratio_en_ru_in_0_7_to_1_4 PASS (~0.79-0.85)
word_count_ratio_en_de_in_0_7_to_1_4 PASS (~0.92-1.03)
title_under_60_chars PASS
subtitle_under_90_chars PASS
milestone_under_30_chars PASS
pull_quote_drawn_from_body PASS (overlap >= 0.82)
```
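
The two new checks could be sketched roughly as below. This is an illustration under assumptions, not the code in `tests/eval/rubric.py`: the `(passed, detail)` return shape, the dict-based chapter input and the `photo_count` parameter are hypothetical; the conditions (exactly six chapters, unique indices, all in range) are taken from the rubric output above.

```python
CHAPTER_COUNT = 6


def chapter_count_is_six(chapters: list[dict]) -> tuple[bool, str]:
    """Pass only when the narrative has exactly six chapters."""
    n = len(chapters)
    return n == CHAPTER_COUNT, f"{n} chapters"


def chapter_photo_binding_valid(
    chapters: list[dict], photo_count: int
) -> tuple[bool, str]:
    """Pass when every chapter binds a distinct photo that exists."""
    indices = [c["photo_index"] for c in chapters]
    unique = len(set(indices)) == len(indices)
    in_range = all(0 <= i < photo_count for i in indices)
    if not unique:
        return False, "duplicate photo indices"
    if not in_range:
        return False, f"index out of range 0..{photo_count - 1}"
    return True, f"{len(indices)} unique indices, all in range"
```

For example, six chapters bound to `[0, 2, 4, 6, 8, 10]` with eleven photos available pass both checks.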

Judge (eval-live vs the freshly-written goldens; 0-5; Δ from same-run noise)

| Case | warmth | narrative_arc | russian_fidelity | photo_selection | faithfulness |
|---|---|---|---|---|---|
| 01-fixture-baseline | 4.50 (+0.50) | 4.50 (0.00) | 4.50 (0.00) | 3.50 (0.00) | 2.08 (-0.24) |
| 02-joyful-summit | 4.00 (-0.50) | 4.00 (-0.50) | 4.50 (0.00) | 3.00 (-0.50) | 2.67 (-0.16) |
| 03-exhausted-foggy | 4.50 (0.00) | 4.50 (0.00) | 4.50 (0.00) | 3.00 (-0.50) | 3.03 (+0.24) |
| 04-bad-tolz-family | 4.00 (0.00) | 4.00 (0.00) | 4.50 (0.00) | 4.00 (-0.50) | 3.42 (-0.44) |

All cases: judge non-regressing (threshold 1.00).
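
As a rough illustration of what a non-regression gate with threshold 1.0 could mean, the sketch below flags any axis whose live score falls more than the threshold below its golden score. The function name and the exact comparison are assumptions, not the project's eval code. The golden values for case 02 are derived from the table by assuming Δ = live − golden.

```python
def regressed_axes(
    golden: dict[str, float], live: dict[str, float], threshold: float = 1.0
) -> list[str]:
    """Return axes where the live score dropped more than `threshold`
    below the golden score (hypothetical definition of a regression)."""
    return [axis for axis in golden if golden[axis] - live[axis] > threshold]


# Case 02-joyful-summit: live scores from the table, goldens derived
# under the assumption that each delta is live minus golden.
golden_02 = {
    "warmth": 4.50,
    "narrative_arc": 4.50,
    "russian_fidelity": 4.50,
    "photo_selection": 3.50,
    "faithfulness": 2.83,
}
live_02 = {
    "warmth": 4.00,
    "narrative_arc": 4.00,
    "russian_fidelity": 4.50,
    "photo_selection": 3.00,
    "faithfulness": 2.67,
}

assert regressed_axes(golden_02, live_02) == []
```

Under this reading, drops of 0.50 or less stay well inside the 1.0 threshold, which matches the "non-regressing" verdict above.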

The judge consistently notes that the writer's photo binding is "evenly-spaced and mechanical" (e.g. [0,2,4,6,8,10]) — a follow-up tuning opportunity for ADR-006's Option C (per-style prompt suffix) or a chapter-aware photo-selection rubric. Not a PR 1 gate.

Test plan

`make ci` clean: ruff + ruff format + mypy + pytest (353 pass, 92% coverage)
Editorial visible output unchanged: `tests/golden/test-render-editorial.html` regenerated; structural assertions (`@font-face`, oklch tokens, display+eyebrow+margin+quote+elevation classes, all three `data-lang` buttons) hold
Carousel still works for the new fixed-6 binding (8 slides: 1 title + 6 photos + 1 quote)
Paid eval refresh: `make eval-update-golden` (writer × 4 + judge × 4) + `make eval-live` (writer × 4 + judge × 4), both pass
Reviewer: skim ADR-015 — the load-bearing decisions (A2 chapters canonical, A-writer single pass, six chapters exact, no rev-geocoder for `place`) are documented there

Out of scope for this PR

The four new style templates (Zine / Sunday / Postcard / Album) — separate PRs per ADR-015 rollout sequence.
Reverse geocoding for chapter `place` — v0 falls back to `HikeInput.location_name`. Follow-up ADR.
Builder-side chapter edit mode (per-chapter accept/edit/remove) — Phase 4.1 / 4.2 of the faithfulness initiative.
Per-style register tuning (ADR-006 Option C escape hatch) — wait for real usage signal first.

🤖 Generated with Claude Code

…lopedia (ADR-015)

BREAKING: NarrativeOutput.paragraphs + selected_photo_indices are replaced by a six-Chapter envelope. Each chapter binds one photo and carries its own id, time, place, lat/lon, tri-lingual title, and sentence-leveled body. Schema bumps 3->4; every prior cache entry invalidates.

The four upcoming Trailpath layouts (Zine, Sunday, Postcard, Album) consume this shape natively in subsequent renderer PRs; the Letter (editorial) template walks chapter bodies with no visible-output change vs PR-57.

The legacy Style.log and Style.encyclopedia values are removed: they never surfaced in the picker and the v0 product decision is to ship the five Trailpath styles only.

- trailstory/models.py: new Chapter; NarrativeOutput.chapters (exactly CHAPTER_COUNT=6) replaces flat paragraphs + selected_photo_indices. paragraphs_as_localized() becomes a computed view over chapters[*].body; selected_photo_indices survives as a read-only @property. Style enum collapses to editorial only. schema_version=4.
- trailstory/llm/prompts.py: writer JSON skeleton rewritten for the six-chapter contract; previous (Phase 4) skeleton preserved as a dated comment.
- trailstory/llm/narrative.py: verifier walks chapters[*].body.
- trailstory/config.py: narrative_max_tokens 4096->8192 (longer output for the new shape).
- tests/eval/rubric.py: drops indices_valid; adds chapter_count_is_six + chapter_photo_binding_valid; loosens paragraph_count_3_to_5_each_lang to 3-6 to fit the computed view.
- tests/eval/golden/*.json: refreshed under the new shape via make eval-update-golden; eval-live confirms judge non-regression (threshold 1.0) across every axis on every case.
- templates/styles/editorial.html.j2: walks narrative.chapters and emits per-chapter body with sentence-level provenance spans.
- templates/styles/log.html.j2 + encyclopedia.html.j2: deleted.
- tests/golden/test-render-editorial.html: regenerated.
- web/copy.py, web/pipeline.py, web/routes.py, web/dev.py, Makefile: drop log + encyclopedia references; web/dev.py fake narrative rebuilt as six chapter envelopes.
- CHANGELOG.md, CLAUDE.md: BREAKING entry + decision register 15.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

ditvor wants to merge 1 commit into develop from claude/flamboyant-chaplygin-6763f9
