
@sfraczek
Collaborator

Introduce a Rust tool for detecting and replacing personally identifiable information (PII) in PDF files. The tool supports both detection and replacement modes while ensuring the original stream size is preserved. Documentation and usage instructions are included.
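As a rough illustration of what "preserving the original stream size" can mean in replace mode, here is a minimal sketch. It is not the PR's implementation: the function name `replace_preserving_len` and the pad-with-spaces / truncate policy are assumptions made for this example, and it assumes the bytes passed in are already-decoded stream content.

```rust
/// Hypothetical helper (not taken from the PR): overwrite every occurrence of
/// `needle` in `stream` with `replacement`, forced to exactly `needle.len()`
/// bytes so the stream keeps its original length.
/// Assumed policy: shorter replacements are padded with spaces, longer ones
/// are truncated (truncation works on bytes, so it may split a multi-byte
/// character).
/// Returns the number of occurrences that were overwritten.
fn replace_preserving_len(stream: &mut [u8], needle: &[u8], replacement: &[u8]) -> usize {
    if needle.is_empty() || stream.len() < needle.len() {
        return 0;
    }

    // Build a replacement with exactly the same byte length as the needle.
    let mut fixed = vec![b' '; needle.len()];
    let n = replacement.len().min(needle.len());
    fixed[..n].copy_from_slice(&replacement[..n]);

    let mut count = 0;
    let mut i = 0;
    while i + needle.len() <= stream.len() {
        if &stream[i..i + needle.len()] == needle {
            // Overwrite in place; nothing is inserted or removed.
            stream[i..i + needle.len()].copy_from_slice(&fixed);
            i += needle.len();
            count += 1;
        } else {
            i += 1;
        }
    }
    count
}
```

Because the buffer is only overwritten in place, any length recorded for the stream elsewhere in the file stays valid under this policy.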

@sfraczek sfraczek requested a review from jczaja November 24, 2025 22:24
@sfraczek sfraczek added the enhancement New feature or request label Nov 24, 2025
@sfraczek sfraczek marked this pull request as draft November 24, 2025 22:25
add use std::error::Error

return Err instead of Ok after read_pdf

/// Parse arguments and dispatch to detect / replace logic. Returns Ok even
/// for usage errors (prints help) to keep CLI simple.
pub fn run(args: Vec<String>) -> Result<(), Box<dyn Error>> {
Collaborator


For consistency with the rest of the project and ease of maintenance, I would suggest using clap (ideally with the derive feature) to handle the command line. I have a branch where clap is bumped to the most recent version, so it can be used as an example.

@sfraczek sfraczek force-pushed the sfraczek/anonymizer branch from 775c0da to 7514747 Compare December 10, 2025 22:04
@sfraczek sfraczek force-pushed the sfraczek/anonymizer branch from c7fdef8 to 816af4d Compare December 13, 2025 18:12
Change the logic so that the search for each AnchorOffset starts from the beginning. This keeps the algorithm simple, allows for easy configuration changes, and is more maintainable.
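The idea of restarting each search from the beginning could look roughly like the sketch below. The fields of `AnchorOffset`, the `resolve` function, and the "anchor text plus a fixed offset" interpretation are assumptions made for illustration; the actual type in the PR is not shown in this conversation.

```rust
/// Hypothetical shape of a rule: locate the text equal to `anchor`, then take
/// the text `offset` positions after it.
struct AnchorOffset {
    anchor: &'static str,
    offset: usize,
}

/// Resolve every rule independently. Each search starts at index 0, so the
/// result does not depend on the order in which the rules are configured.
fn resolve<'a>(texts: &'a [String], rules: &[AnchorOffset]) -> Vec<Option<&'a str>> {
    rules
        .iter()
        .map(|rule| {
            texts
                .iter()
                // Always scan from the first extracted text.
                .position(|t| t.as_str() == rule.anchor)
                // A missing anchor or an out-of-range offset yields None.
                .and_then(|i| texts.get(i + rule.offset))
                .map(|s| s.as_str())
        })
        .collect()
}
```

Under this reading, reordering or adding rules in the configuration cannot change what the other rules match, at the cost of rescanning the text list once per rule.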
Updated regex pattern to properly capture strings containing escaped
parentheses (\( and \)) in PDF streams. Implemented PDF 1.3 spec-compliant
unescape function supporting all escape sequences (\n, \r, \t, \b, \f, \(,
\), \\, and \ddd octal).

Now extracts 618 texts instead of 529, including previously missing
'NET CREDITS/(DEBITS)' and other parenthesized strings like dollar
amounts and date ranges. Added comprehensive unit tests.
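For readers unfamiliar with PDF literal strings, the sketch below shows the escape handling described in this commit message. It is not the PR's code: the PR uses a regex for extraction, whereas this example swaps in a small hand-written scanner so that it needs only the standard library. The function names are made up for illustration, and the scanner ignores other PDF syntax such as hex strings and comments.

```rust
/// Decode the body of a PDF literal string (the bytes between the outer
/// parentheses), handling \n, \r, \t, \b, \f, \(, \), \\ and \ddd octal.
fn unescape_pdf_string(raw: &[u8]) -> Vec<u8> {
    let mut out = Vec::with_capacity(raw.len());
    let mut i = 0;
    while i < raw.len() {
        if raw[i] != b'\\' {
            out.push(raw[i]);
            i += 1;
            continue;
        }
        i += 1; // skip the backslash
        let Some(&c) = raw.get(i) else { break }; // trailing backslash is dropped
        i += 1;
        match c {
            b'n' => out.push(b'\n'),
            b'r' => out.push(b'\r'),
            b't' => out.push(b'\t'),
            b'b' => out.push(0x08),
            b'f' => out.push(0x0C),
            b'(' | b')' | b'\\' => out.push(c),
            b'0'..=b'7' => {
                // One to three octal digits; bits above the low byte are discarded.
                let mut value = u32::from(c - b'0');
                let mut digits = 1;
                while digits < 3 && i < raw.len() && (b'0'..=b'7').contains(&raw[i]) {
                    value = value * 8 + u32::from(raw[i] - b'0');
                    i += 1;
                    digits += 1;
                }
                out.push((value & 0xFF) as u8);
            }
            // Backslash followed by an end-of-line is a line continuation.
            b'\r' => {
                if raw.get(i) == Some(&b'\n') {
                    i += 1;
                }
            }
            b'\n' => {}
            // Unknown escape: the backslash is ignored and the byte is kept.
            other => out.push(other),
        }
    }
    out
}

/// Collect the raw bodies of all literal strings `( ... )` in `data`,
/// honouring backslash escapes and balanced nested parentheses.
fn extract_literal_strings(data: &[u8]) -> Vec<&[u8]> {
    let mut found = Vec::new();
    let mut i = 0;
    while i < data.len() {
        if data[i] != b'(' {
            i += 1;
            continue;
        }
        let start = i + 1;
        let mut depth = 1;
        i += 1;
        while i < data.len() && depth > 0 {
            match data[i] {
                b'\\' => i += 1, // skip the escaped byte as well
                b'(' => depth += 1,
                b')' => depth -= 1,
                _ => {}
            }
            i += 1;
        }
        if depth == 0 {
            found.push(&data[start..i - 1]);
        }
    }
    found
}
```

With these two helpers, a content-stream fragment such as `(NET CREDITS/\(DEBITS\)) Tj` yields one string body, which decodes to `NET CREDITS/(DEBITS)`; a pattern that stops at the first `)` would cut it short or miss it.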


3 participants