Digest auto-detects research-style vs project-context (no new flag) #22
Open
ranjiao wants to merge 1 commit into
User context: 'I need to reference academic papers + industry tech reports periodically. Is the current /pmo digest suitable for this?'

Analysis: current digest's schema (TL;DR + Key facts + Open questions + What PMO must remember + Section map) was designed for project-context docs (term sheets, internal constraints, screenshots). It's underspec'd for academic content where the reuse value lives in Method / Limitations / Reproducibility / Applicability, not in 'key facts.'

User direction: do NOT add a --paper flag. Have the agent auto-detect the document type from signals in the document itself.

Implementation: classify into 2 types at digest time, weighted-signal heuristic with explicit fallback.

New: `## Document-type auto-detection` section in digests.md
- 6 weighted signals (length, structural markers, citations density, file source / URL, writing style, section structure)
- Threshold: ≥3 signals → high-confidence research-style; ≤1 → high-confidence project-context; =2 → ambiguous, AskUserQuestion
- Logged in digest front-matter `Doc type:` field for audit + later manual override via --refresh

New schema: `## Digest schemas (two variants share lifecycle)`
- Project-context schema (existing) → state/digest_TEMPLATE.md unchanged
- Research-style schema (new) → state/digest_paper_TEMPLATE.md
  - TL;DR (plain language, not the abstract verbatim)
  - Method (sample / setup / baseline / eval; judgeable + transferable)
  - Key results (number + context, not raw claim)
  - Limitations (author-acknowledged + reader-observed)
  - Reproducibility (data / code / clarity / barriers)
  - Applicable to this project? (verdict: use / partially / don't, + reason)
  - Followup citations (2-3 max with one-line reason each; NOT dump)
  - Open questions + What PMO must remember (carried over from existing schema)
  - Extra front-matter: Authors, Published, Source venue (arXiv:NNNN.NNNNN etc.)

Plus a `### Why a different schema for research` paragraph explaining the rationale per section, so future contributors don't add Method back to the project-context schema or strip Method from research.

Shared across both schemas (unchanged):
- inputs/ → knowledge/<topic>/ flow
- Status: active/archived/eternal/superseded lifecycle
- archive_inactive_days candidate detection
- knowledge/INDEX.md registration
- <source>-digest.md naming convention

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Summary
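User feedback: 'I need to reference academic papers / industry technical reports from time to time. Is the current digest suited to handling this kind of material?'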
Analysis: the current digest schema (TL;DR + Key facts + Open questions + What PMO must remember + Section map) was tuned for project-context docs: term sheets, constraints, screenshots, internal notes. It underserves academic / industry research content, where the reuse value lives in Method / Limitations / Reproducibility / Applicability, not in 'key facts.'
User direction: NO new flag. Have the agent auto-detect the document type from signals in the source itself, and fall back to one AskUserQuestion when ambiguous.
Auto-detection logic (new § in digests.md)
Two classes:
- `project-context`: existing default; term sheets, internal docs, constraints, screenshots
- `research-style`: papers, industry tech reports, white papers, formal technical notes

Six weighted signals computed at digest time:
`research-style` if … `<author>-<year>-<topic>.pdf` pattern

Threshold: ≥ 3 signals = high-confidence `research-style` (proceed silently). ≤ 1 = high-confidence `project-context` (proceed silently). = 2 = ambiguous → one `AskUserQuestion` with the user picking once.

Classification logged in digest's `Doc type:` front-matter for audit + manual override via `--refresh`.

Two schemas
Both share: front-matter convention, placement under `knowledge/<topic>/`, `Status: active | archived | eternal | superseded` lifecycle, archive-candidate detection, INDEX registration.

They differ in body sections only.
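
Which of the two schemas applies is decided by the signal count described under Auto-detection logic. A minimal Python sketch of that threshold rule may help reviewers check the boundaries. It is illustrative only: the names `DocType` and `classify` are hypothetical, and since the weighting of the six signals is not spelled out here, the sketch assumes a plain count of matched signals.

```python
from enum import Enum


class DocType(Enum):
    # Values mirror the two class names used in the Doc type: front-matter.
    RESEARCH_STYLE = "research-style"
    PROJECT_CONTEXT = "project-context"
    # Hypothetical label for the case where the agent asks the user once.
    AMBIGUOUS = "ambiguous"


def classify(signals_matched: int) -> DocType:
    """Map the number of matched research-style signals (0 to 6) to a class.

    Assumption: signals are counted, not weighted.
    """
    if not 0 <= signals_matched <= 6:
        raise ValueError("expected a signal count between 0 and 6")
    if signals_matched >= 3:
        return DocType.RESEARCH_STYLE   # high confidence, proceed silently
    if signals_matched <= 1:
        return DocType.PROJECT_CONTEXT  # high confidence, proceed silently
    return DocType.AMBIGUOUS            # exactly 2: one AskUserQuestion
```

Under this reading, exactly one count (2) triggers a question to the user; every other count resolves silently to one of the two schemas.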
Project-context (unchanged)
`state/digest_TEMPLATE.md`: TL;DR + Key facts + Open questions + What PMO must remember + Section map.

Research-style (NEW)
`state/digest_paper_TEMPLATE.md`:
- … `use` / `partially` / `don't`) + reason + landing point

Each section has a stated rationale in a `### Why a different schema for research` paragraph, so future edits don't drift the boundary in either direction.

Why this is better than --paper flag
- … `inputs/`, run `/pmo digest <path>`, agent does the right thing

Files changed
- `pmo/reference/digests.md`: +144 lines (auto-detection signals, two-schema description, rationale)
- `pmo/state/digest_paper_TEMPLATE.md`: new file, 76 lines
- `pmo/state/digest_TEMPLATE.md`: unchanged (still the project-context template)
- … `/pmo digest`; type-detection is a reference-internal concern)

Test plan
- … `arxiv` in URL, 12 pages, abstract+references) → confirm agent silently picks research-style and writes the 6-section schema.
- … `Doc type:` front-matter field is set correctly.

🤖 Generated with Claude Code