Add anonymizer tool #166

sfraczek · 2025-11-24T22:23:38Z

Introduce a Rust tool for detecting and replacing personally identifiable information (PII) in PDF files. The tool supports both detection and replacement modes while ensuring the original stream size is preserved. Documentation and usage instructions are included.

add use std::error::Error; return Err instead of Ok after read_pdf

jczaja · 2025-12-08T07:56:49Z

src/anonymizer/anonymizer.rs

+
+/// Parse arguments and dispatch to detect / replace logic. Returns Ok even
+/// for usage errors (prints help) to keep CLI simple.
+pub fn run(args: Vec<String>) -> Result<(), Box<dyn Error>> {


For consistency with the rest of the project and ease of maintenance, I would suggest using clap (ideally with the derive feature) to handle the command line. I have a branch where clap is bumped to the most recent version, so it can be used as an example.

Change the logic so that the search for each AnchorOffset starts from the beginning. This makes the algorithm simpler, allows for easy configuration changes, and is more maintainable.
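A minimal sketch of the idea, assuming the extracted texts are held in a slice and that each anchor names a marker string plus a distance to the sensitive value. The `AnchorOffset` fields and the `detect` signature here are hypothetical; the PR's actual definitions are not shown in this conversation.

```rust
/// Hypothetical stand-in for the PR's `AnchorOffset`; the real fields may differ.
#[derive(Debug, Clone, Copy)]
pub struct AnchorOffset<'a> {
    /// Text that marks a known position among the extracted strings.
    pub anchor: &'a str,
    /// How many entries after the anchor the sensitive value sits.
    pub offset: usize,
}

/// For every configured anchor, scan `texts` from index 0 and return the
/// index of the entry `offset` positions past the first match.
/// Anchors that are missing or point past the end are skipped.
pub fn detect(texts: &[&str], anchors: &[AnchorOffset<'_>]) -> Vec<usize> {
    anchors
        .iter()
        .filter_map(|a| {
            // Each search restarts at the beginning, so the result does not
            // depend on the order in which anchors are configured.
            let pos = texts.iter().position(|t| *t == a.anchor)?;
            let target = pos + a.offset;
            (target < texts.len()).then_some(target)
        })
        .collect()
}
```

Because no search state is carried from one anchor to the next, anchors can be added, removed, or reordered in the configuration without touching the loop.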

Updated regex pattern to properly capture strings containing escaped parentheses (\( and \)) in PDF streams. Implemented PDF 1.3 spec-compliant unescape function supporting all escape sequences (\n, \r, \t, \b, \f, \(, \), \\, and \ddd octal). Now extracts 618 texts instead of 529, including previously missing 'NET CREDITS/(DEBITS)' and other parenthesized strings like dollar amounts and date ranges. Added comprehensive unit tests.
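A minimal sketch of what such an unescape step could look like, using only the standard library. The function name `unescape_pdf_string` and the byte-slice interface are assumptions for illustration, not the code in this PR. It covers the escape sequences named in the commit message plus the backslash-newline line continuation, and drops the backslash before any other character.

```rust
/// Unescape the body of a PDF literal string (the bytes between the outer
/// parentheses). Hypothetical helper; the PR's own function may differ.
pub fn unescape_pdf_string(input: &[u8]) -> Vec<u8> {
    let mut out = Vec::with_capacity(input.len());
    let mut i = 0;
    while i < input.len() {
        let byte = input[i];
        i += 1;
        if byte != b'\\' {
            out.push(byte);
            continue;
        }
        // A lone backslash at the very end has nothing to escape: drop it.
        if i >= input.len() {
            break;
        }
        let next = input[i];
        match next {
            b'n' => { out.push(b'\n'); i += 1; }
            b'r' => { out.push(b'\r'); i += 1; }
            b't' => { out.push(b'\t'); i += 1; }
            b'b' => { out.push(0x08); i += 1; }
            b'f' => { out.push(0x0C); i += 1; }
            b'(' | b')' | b'\\' => { out.push(next); i += 1; }
            b'0'..=b'7' => {
                // One to three octal digits; overflow above 0xFF is discarded.
                let mut value: u32 = 0;
                let mut digits = 0;
                while digits < 3 && i < input.len() && (b'0'..=b'7').contains(&input[i]) {
                    value = value * 8 + u32::from(input[i] - b'0');
                    i += 1;
                    digits += 1;
                }
                out.push((value & 0xFF) as u8);
            }
            // Backslash followed by an end-of-line is a line continuation.
            b'\r' => {
                i += 1;
                if input.get(i) == Some(&b'\n') {
                    i += 1;
                }
            }
            b'\n' => { i += 1; }
            // Unknown escape: keep the character, ignore the backslash.
            _ => { out.push(next); i += 1; }
        }
    }
    out
}
```

Working on bytes rather than `str` avoids assuming that the string is valid UTF-8, which PDF content streams do not guarantee.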

Add anonymizer tool

dc6239d

sfraczek requested a review from jczaja November 24, 2025 22:24

sfraczek added the enhancement (New feature or request) label Nov 24, 2025

sfraczek marked this pull request as draft November 24, 2025 22:25

sfraczek added 2 commits November 29, 2025 22:49

also detect #id

d863116

add list mode

d24aa57

jczaja reviewed Dec 8, 2025

sfraczek added 5 commits December 10, 2025 22:29

omit forgotten std::error in detect.rs

df1e0ba

extract code to fn find_putput_path

63bcdee

extract path modification function to anonymizer/path.rs

4664a76

use path ref broadly

33611d8

switch to clap

7514747

sfraczek force-pushed the sfraczek/anonymizer branch from 775c0da to 7514747 December 10, 2025 22:04

sfraczek added 8 commits December 12, 2025 21:26

use Path and PathBuf and minor improvements

4606fc3

add anchor for recipient data

6aa3dba

rename anchors

c194f7b

nicely format anchors

bd9e5bb

Use just AnchorOffset struct

090829c

Cargo.lock

ed5e43e

add todo to readme. remove ref to screenshots

d78ccaf

bring back accidentally removed line from REUSE.toml

816af4d

sfraczek force-pushed the sfraczek/anonymizer branch from c7fdef8 to 816af4d December 13, 2025 18:12

sfraczek added 5 commits December 13, 2025 20:21

simplifying detect logic

295d856

refactor detect logic for improved maintainability and configurability

8d0f370

update license headers and enhance module documentation across the anonymizer

a0dc3f4

rustfmt

45fa804


3 participants
