Commit 8e2f222

Simplify logic
1 parent 880cb0d commit 8e2f222

24 files changed, +790 -1486 lines changed

.claude-plugin/marketplace.json

Lines changed: 1 addition & 1 deletion
@@ -13,7 +13,7 @@
 "name": "bibtools",
 "source": "./",
 "description": "A bibliography toolkit for LaTeX",
-"version": "1.7.4",
+"version": "1.7.5",
 "keywords": ["bibtex", "bibliography", "latex", "overleaf", "academic", "reference", "citation"],
 "category": "academic",
 "license": "MIT"

.claude-plugin/plugin.json

Lines changed: 1 addition & 1 deletion
@@ -1,7 +1,7 @@
 {
 "name": "bibtools",
 "description": "A bibliography toolkit for LaTeX",
-"version": "1.7.4",
+"version": "1.7.5",
 "author": {
 "name": "Yunguan Fu"
 },

README.md

Lines changed: 32 additions & 2 deletions
@@ -63,8 +63,38 @@ Or in Claude Code, use the slash command: `/bibtidy refs.bib`
 
 bibtidy verifies each entry against [Google Scholar](https://scholar.google.com/) and [CrossRef](https://search.crossref.org/), fixes errors, and upgrades stale preprints to published versions. Every change includes the original entry commented out above so you can compare or revert, plus one or more `% bibtidy:` URL lines for verification. We recommend using git to track changes. If using [Overleaf](https://www.overleaf.com/), this can be done with [git sync](https://docs.overleaf.com/integrations-and-add-ons/git-integration-and-github-synchronization). To remove bibtidy comments after review, ask your agent to remove all `bibtidy` comments from the file.
 
-Note that bibtidy assumes standard brace-style BibTeX like `@article{...}`. Parenthesized forms like `@article(...)` are not supported; convert them to brace style first.
+Note that bibtidy assumes standard brace-style BibTeX like `@article{...}`. Parenthesized forms like `@article(...)` are not supported. Special blocks such as `@string`, `@preamble`, and `@comment` are ignored by the parser.
+
+### How it works
+
+bibtidy walks each entry through a bounded state machine. Every entry has a **web-search budget of 1**, spent at most once across two possible waves:
+
+```mermaid
+flowchart TD
+P1["Phase 1: duplicates.py (exact/subset, lossless)"]
+P2["Phase 2: compare.py fetches CrossRef candidates"]
+HAS{"candidates?"}
+WA["Wave A web search<br/>(mandatory, budget spent)"]
+P3{"Phase 3: agent decides per entry"}
+BUDGET{"budget spent?"}
+WB["Wave B web search<br/>(budget spent)"]
+DECIDE2["decide again with combined info"]
+REVIEW["add '% bibtidy: REVIEW' comment<br/>with URLs, bib entry unchanged"]
+PATCH["build fix patch (or no-op)"]
+P4["Phase 4: duplicates.py (post-fix)<br/>+ manual near-duplicate review"]
+
+P1 --> P2 --> HAS
+HAS -- yes --> P3
+HAS -- no --> WA --> P3
+P3 -- confident --> PATCH
+P3 -- not confident --> BUDGET
+BUDGET -- no --> WB --> DECIDE2 --> PATCH
+BUDGET -- yes --> REVIEW
+PATCH --> P4
+REVIEW --> P4
+```
 
+Each entry ends in one of four states: **Clean** (no change, no comment), **Fix** (patch applied with URLs + explanation), **Not found** (hallucinated, entry commented out), or **Review** (budget spent, entry unchanged, comment added for human attention).
 
 ### Examples
 
@@ -390,7 +420,7 @@ You shouldn't, and that's by design. The point of bibtidy is to surface potentia
 
 **Why does bibtidy flag so many page number errors?**
 
-Google Scholar extracts metadata by scraping PDFs rather than querying publisher databases, so page numbers are frequently incorrect. Even official sources can disagree, for example, the same CVPR 2020 paper "Momentum Contrast for Unsupervised Visual Representation Learning" has pages 9729--9738 on [CVF Open Access](https://openaccess.thecvf.com/content_CVPR_2020/html/He_Momentum_Contrast_for_Unsupervised_Visual_Representation_Learning_CVPR_2020_paper.html) but pages 9726--9735 on [IEEE Xplore](https://ieeexplore.ieee.org/document/9157636), because IEEE re-paginates when compiling the full proceedings volume. bibtidy uses CrossRef as the authoritative source for page numbers. CrossRef gets metadata directly from publishers via DOI registration, so for IEEE/CVF conferences it returns the IEEE Xplore pagination (9726--9735 in the example above). When sources conflict, bibtidy applies the DOI-linked version and flags the entry with `% bibtidy: REVIEW` so you can verify.
+Google Scholar extracts metadata by scraping PDFs rather than querying publisher databases, so page numbers are frequently incorrect. Even official sources can disagree, for example, the same CVPR 2020 paper "Momentum Contrast for Unsupervised Visual Representation Learning" has pages 9729--9738 on [CVF Open Access](https://openaccess.thecvf.com/content_CVPR_2020/html/He_Momentum_Contrast_for_Unsupervised_Visual_Representation_Learning_CVPR_2020_paper.html) but pages 9726--9735 on [IEEE Xplore](https://ieeexplore.ieee.org/document/9157636), because IEEE re-paginates when compiling the full proceedings volume. bibtidy uses CrossRef as the authoritative source for page numbers. CrossRef gets metadata directly from publishers via DOI registration, so for IEEE/CVF conferences it returns the IEEE Xplore pagination (9726--9735 in the example above). bibtidy applies the DOI-linked version; you can verify via the DOI URL included in the `% bibtidy:` comments.
 
 ## License
 
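
The per-entry flow added to the README in the hunk above can be read as a small decision procedure. The following is a minimal Python sketch of that flow, not bibtidy's implementation: `resolve_entry`, its `search_web` and `decide` callbacks, and the `Outcome` enum are hypothetical names introduced here. Only the four outcome names and the budget-of-one rule come from the README text. The flowchart does not show what happens when the agent is still not confident after Wave B, so the sketch assumes that case also ends in Review.

```python
from enum import Enum
from typing import Callable, Optional


class Outcome(Enum):
    CLEAN = "clean"
    FIX = "fix"
    NOT_FOUND = "not found"
    REVIEW = "review"


def resolve_entry(
    candidates: list[str],
    search_web: Callable[[], list[str]],
    decide: Callable[[list[str]], Optional[Outcome]],
) -> tuple[Outcome, int]:
    """Return (outcome, searches_used) for one entry.

    `decide` returns an Outcome when confident, or None when not confident.
    """
    searches_used = 0
    evidence = list(candidates)

    # Wave A: mandatory search when CrossRef returned no candidates.
    if not evidence:
        evidence += search_web()
        searches_used += 1

    outcome = decide(evidence)
    if outcome is not None:
        return outcome, searches_used

    # Not confident: Wave B is allowed only if the budget is still unspent.
    if searches_used == 0:
        evidence += search_web()
        searches_used += 1
        outcome = decide(evidence)
        if outcome is not None:
            return outcome, searches_used

    # Budget spent and still not confident (assumed to mean Review).
    return Outcome.REVIEW, searches_used
```

Whatever path an entry takes, `searches_used` never exceeds 1, which is the property the README calls a bounded state machine.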
docs/build.py

Lines changed: 14 additions & 18 deletions
@@ -35,8 +35,8 @@ def parse_entries(text: str) -> list[dict]:
     while i < len(lines):
         line = lines[i]
         stripped = line.strip()
-        # Skip @string, @preamble, and blank lines
-        if not stripped or re.match(r"^@(string|preamble)", stripped, re.IGNORECASE):
+        # Skip @string, @preamble, @comment, and blank lines
+        if not stripped or re.match(r"^@(string|preamble|comment)\b", stripped, re.IGNORECASE):
             i += 1
             continue
 
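
Besides adding `comment`, the new pattern ends in `\b`. Without a word boundary, the old pattern would also skip any entry type that merely starts with one of the special names. A small check of the difference, as a sketch: `SKIP_OLD`, `SKIP_NEW`, and `is_skipped` are names introduced here, and `@stringent` is a made-up entry type used only to exercise the prefix case.

```python
import re

# Patterns copied from the removed and added lines of the hunk above.
SKIP_OLD = re.compile(r"^@(string|preamble)", re.IGNORECASE)
SKIP_NEW = re.compile(r"^@(string|preamble|comment)\b", re.IGNORECASE)


def is_skipped(pattern: re.Pattern, line: str) -> bool:
    """Mirror the parser's check: strip the line, then match at its start."""
    return pattern.match(line.strip()) is not None


for line in ["@comment{ignored}", "@stringent{key,", "@article{key,"]:
    print(line, is_skipped(SKIP_OLD, line), is_skipped(SKIP_NEW, line))
```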
@@ -134,7 +134,7 @@ def classify_entry(bibtidy_comments: list[str], diff: list[tuple[str, str]]) ->
     return "badge-ok", "unchanged"
 
 
-_URL_RE = re.compile(r"(https?://[^\s,;)\"'&{}]+)")
+_URL_RE = re.compile(r"https?://[^\s,;)\"'{}<]+")
 
 
 def escape_html(s: str) -> str:
@@ -143,10 +143,17 @@ def escape_html(s: str) -> str:
 
 def linkify(s: str) -> str:
     """Escape HTML and convert URLs to clickable links."""
-    escaped = escape_html(s)
-    return _URL_RE.sub(
-        r'<a href="\1" target="_blank" rel="noopener" style="color:inherit;text-decoration:underline">\1</a>', escaped
-    )
+    parts = []
+    last = 0
+    for match in _URL_RE.finditer(s):
+        parts.append(escape_html(s[last : match.start()]))
+        url = escape_html(match.group(0))
+        parts.append(
+            f'<a href="{url}" target="_blank" rel="noopener" style="color:inherit;text-decoration:underline">{url}</a>'
+        )
+        last = match.end()
+    parts.append(escape_html(s[last:]))
+    return "".join(parts)
 
 
 def render_diff_row(typ: str, line: str) -> str:
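
The reordering in `linkify` (match URLs on the raw string, then escape each piece) appears to target URLs that contain `&`: in the old version the text was escaped first, and the pattern excluded `&`, so a link stopped at the first query separator. The sketch below compares the two versions under stated assumptions: the body of `escape_html` is not shown in this diff, so `html.escape` stands in for it, and the `target`, `rel`, and `style` attributes are dropped for brevity. `linkify_old` and `linkify_new` are names introduced here.

```python
import html
import re

OLD_URL_RE = re.compile(r"(https?://[^\s,;)\"'&{}]+)")
NEW_URL_RE = re.compile(r"https?://[^\s,;)\"'{}<]+")


def escape_html(s: str) -> str:
    # Stand-in for the project's helper, which this diff does not show.
    return html.escape(s, quote=True)


def linkify_old(s: str) -> str:
    # Escape first, then match: "&" is already "&amp;" and the pattern
    # stops at "&", so the link is cut at the first query separator.
    return OLD_URL_RE.sub(r'<a href="\1">\1</a>', escape_html(s))


def linkify_new(s: str) -> str:
    # Match on the raw text, then escape the URL and the text around it.
    parts = []
    last = 0
    for match in NEW_URL_RE.finditer(s):
        parts.append(escape_html(s[last : match.start()]))
        url = escape_html(match.group(0))
        parts.append(f'<a href="{url}">{url}</a>')
        last = match.end()
    parts.append(escape_html(s[last:]))
    return "".join(parts)


text = "see https://example.org/search?q=a&page=2 <draft>"
print(linkify_old(text))
print(linkify_new(text))
```

With this input, the old version links only up to `q=a` and leaves `&amp;page=2` as plain text, while the new version keeps the whole query string inside the link and still escapes `<draft>`.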
@@ -599,17 +606,6 @@ def main() -> None:
     for key, inp in input_entries.items():
         if key in seen_keys:
             continue
-        # Find bibtidy comments for this key in expected text
-        bibtidy_comments = []
-        for line in expected_text.splitlines():
-            stripped = line.strip()
-            if (
-                stripped.startswith("% bibtidy:")
-                and key in expected_text[expected_text.index(stripped) : expected_text.index(stripped) + 500]
-            ):
-                bibtidy_comments.append(stripped)
-                break
-        # Simpler approach: scan expected text for bibtidy comments near the commented-out entry
         bibtidy_comments = []
         exp_lines = expected_text.splitlines()
         for idx, line in enumerate(exp_lines):

pyproject.toml

Lines changed: 1 addition & 1 deletion
@@ -1,6 +1,6 @@
 [project]
 name = "bibtools"
-version = "1.7.4"
+version = "1.7.5"
 description = "A bibliography toolkit for LaTeX, built as agent skills"
 requires-python = ">=3.10"
 license = "MIT"
