fix: 8 Round 11 audit findings (3 HIGH + 5 MED)#93
Merged
Conversation
Replace the direct ``with open(...) as f: w.write(f)`` loop with ``self._atomic_pdf_write(w, out_path, sources=[pdf_path])`` so the existing tempfile + os.replace path is reused and the same-source guard rejects an output path that collides with the input PDF. Without this, a split row whose output name matched the basename of the source (e.g. input ``part_1.pdf`` split into ``part_1.pdf``) would truncate the input mid-flight while pypdf still held lazy reads against it. Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
The drop handler globbed both ``*.pdf`` and ``*.PDF``; on Windows and macOS HFS+ those patterns match the same files, so every PDF was opened twice. Switch to a single ``os.listdir`` pass with an extension filter so the walk yields each file once regardless of filesystem case sensitivity. Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
Loading a PDF with sticky notes mirrors each one into
``self._pending`` (tagged ``_existing=True``) so the canvas can
render their bubbles. Without filtering, every gate that watches
``_pending`` mistook those mirrored entries for real user edits:
* ``closeEvent`` (in window.py) prompted "discard unsaved
changes?" the moment the user opened any PDF with notes
* ``_run`` skipped the no-pending guard and re-saved a byte-
identical copy
* The Forms-mode "has pending edits" warning fired on a
freshly-opened PDF
* The ``_pending_list`` UI listed pre-existing notes as if they
were unsaved edits
Add a ``_user_pending`` property that filters ``_existing`` entries
(while keeping user-driven ``delete_annot`` actions, which have no
flag) and route every gate through it. Also stop pushing existing
notes into ``_pending_list`` — that widget is for THIS session's
unapplied edits.
Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
Garbage input like ``abc`` or ``1-`` previously bubbled up the raw
``ValueError("invalid literal for int() with base 10: …")``, which
``show_error`` then masked behind "Something unexpected went wrong"
— hiding the actionable cause from the user.
Wrap the two ``int()`` calls in a ``try``/``except ValueError`` and
re-raise with the new translated ``tool.err.bad_page_input`` key
(localised across all 8 supported languages). The happy path,
out-of-range message, and dedupe/sort contract are unchanged.
Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
The previous implementation collected every per-page Tesseract PDF output in ``pdf_pages = []`` and then walked the list to append each into a writer. For a 100-page job at 300 dpi this kept ~200– 500 MB of intermediate bytes resident before the final write — enough to OOM on machines tight on RAM. Append each page directly into the writer as soon as Tesseract returns it, then drop ``pix``, ``img``, and the per-page ``page_bytes`` locals. The writer's internal copies retain only what it actually needs. Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
``_clear_recent`` wrote ``recent_files: []`` to disk but did not notify any open viewer, so the placeholder pane kept showing the stale entries until the next reload. Mirror the ``_load_and_track`` pattern: iterate ``self._viewers`` and invoke ``_refresh_recents`` on each (wrapped in ``contextlib.suppress`` so a deleted viewer doesn't abort the rest). Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
``os.path.isfile`` on a OneDrive Files-On-Demand placeholder triggers a placeholder rehydration, blocking the UI for seconds the first time the app opens. The recents validation runs at startup, so the freeze is the first thing the user sees. Switch the per-entry check to ``os.path.lexists``, which does not follow symlinks or force a download. Broken-symlink entries are acceptable here — they get pruned the moment the user actually tries to open them. When entries are dropped during the scan, write the trimmed list back through ``_update_config`` so the next startup has less to re-check. Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
``PdfReader(self._doc_path)`` held an internal stream pinned by the reader object until garbage collection. On Windows that blocked any subsequent tool from renaming the same file (file lock still active), surfacing as cryptic "file in use" errors when chaining Forms apply with another tool. Wrap the source open in an explicit ``with open(...) as _src`` block and pass the stream to ``PdfReader``. All writer work (``append``, ``update_page_form_field_values``, ``write``, ``os.replace``) happens INSIDE the with-block so pypdf's lazy reads still see a live stream when they fire. On scope exit the file handle is released deterministically. Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
22 tests covering the 8 PR-G bugs:
* split.py atomic write + same-source rejection
* Drop-folder dedup on case-insensitive filesystems
* Editor _user_pending filter (closeEvent, _run, Forms warning,
_pending_list UI)
* parse_pages friendly error + happy-path regression guard
* OCR streaming writer (no more pdf_pages = [] accumulation)
* _clear_recent open-viewer refresh
* get_recent_files lexists + writeback (behavioral, monkeypatched
config path)
* _apply_forms with-open + writer-inside-block
Plus an i18n parity assertion that the new ``tool.err.bad_page_input``
key exists with a ``{text}`` placeholder in all 8 supported languages.
Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
Round 11 review found that the _user_pending filter excluded all edits with _existing=True. But delete_annot edits for existing notes are also registered with _existing=True (PR-D fix #3 path), so they were silently dropped from save processing and closeEvent prompts -- defeating both PR-D and PR-F's note deletion persistence fixes. Now the filter excludes _existing edits EXCEPT for delete_annot, which represents the user's explicit intent to remove. Added two regression tests covering the bug surface. Also addresses minor review nits: removed unverified 200-500MB memory claim from OCR comment; clarified UNC SMB caveat for lexists in get_recent_files. Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
| os.replace(tmp, out) | ||
| except Exception: | ||
| try: os.unlink(tmp) | ||
| except OSError: pass |
| if len(valid) != len([p for p in recents if isinstance(p, str)]): | ||
| try: | ||
| _update_config(lambda c: c.__setitem__("recent_files", valid)) | ||
| except Exception: |
|
|
||
| import json | ||
| import os | ||
| import tempfile |
| import os | ||
| import tempfile | ||
| from pathlib import Path | ||
| from unittest.mock import patch |
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.This suggestion is invalid because no changes were made to the code.Suggestions cannot be applied while the pull request is closed.Suggestions cannot be applied while viewing a subset of changes.Only one suggestion per line can be applied in a batch.Add this suggestion to a batch that can be applied as a single commit.Applying suggestions on deleted lines is not supported.You must change the existing code in this line in order to create a valid suggestion.Outdated suggestions cannot be applied.This suggestion has been applied or marked resolved.Suggestions cannot be applied from pending reviews.Suggestions cannot be applied on multi-line comments.Suggestions cannot be applied while the pull request is queued to merge.Suggestion cannot be applied right now. Please check back later.
Summary
Closes 8 bugs from audit Round 11, focused on dataloss prevention and UX consistency. Discovered and fixed a critical regression masking note deletions during adversarial review.
HIGH
with open(..., "wb") as f: w.write(f). Ifout_dir == dirname(input)and a split name collided with the input, the input was truncated before PdfWriter finished its lazy reads. Now uses_atomic_pdf_write(w, out, sources=[pdf_path])with same-source rejection.glob("*.pdf") + glob("*.PDF")returned identical entries (case-insensitive FS), opening each PDF twice. Replaced with singleos.listdir+ case-insensitive filter._existing=Truetriggered closeEvent prompts and_pending_listUI rows even when the user hadn't edited anything. New_user_pendingproperty filters them out for those checks, with one critical exception:delete_annotedits stay even when_existing=Truebecause they represent the user's explicit intent to remove an existing note (review caught a regression that would have silently dropped these deletions).MEDIUM
ValueError("invalid literal for int()")was surfaced raw viashow_erroras "Algo inesperado correu mal". Now catches and re-raises with translatedtool.err.bad_page_input(8 languages).pdf_pages = [] ; ... ; for each page accumulate bytes ; final writewas replaced with per-pagewriter.append(...). Reduces peak memory for long PDFs by avoiding the simultaneous-buffer pile-up (exact savings depend on pypdf's internal copy semantics).self._viewerscalling_refresh_recents.os.path.isfilehydrated OneDrive Files-On-Demand placeholders on every startup, freezing UI. Switched toos.path.lexists. Also persists the cleaned list (_update_config) so disappeared entries don't re-trigger the disk check on next launch. UNC SMB caveat documented in comment.PdfReader(self._doc_path)left an open file handle until GC, blocking subsequent operations on Windows. Wrapped inwith open(..., "rb") as _srcso the handle is released deterministically.i18n
1 new key x 8 languages:
tool.err.bad_page_input. Parity verified by test.Tests
tests/test_audit_r11.py— 24 tests (22 from initial implementation + 2 added during review for the critical regression). Mix of source-level and behavioral coverage.Adversarial review notes
The review caught a critical regression in commit 229b4e3 (editor _user_pending): the filter would have silently dropped
delete_annotedits for existing notes, defeating both PR-D and PR-F's annotation deletion persistence fixes. Fixed in commit c08c1dc with two new regression tests covering the bug surface. Also addressed minor cosmetic notes (unverified OCR memory claim, UNC SMB caveat).Validation
APP_VERSIONbump