@vanbroup vanbroup commented Jan 3, 2026

This pull request modernizes the PDF library's codebase, hardens its security, improves performance, and expands testing and CI. Key changes include a major update to the documentation, new benchmark and security corpus tests, and a GitHub Actions CI workflow. Several new test files cover the correctness and robustness of the encryption and filter logic.

Documentation and Project Metadata Updates:

  • Major rewrite of README.md to document the library's new high-performance, zero-allocation AST, improved security features (AES-128/256 support), robust error handling, memory efficiency, and benchmark results.
  • Added go.mod to define the module path as github.com/digitorus/pdf and set Go version to 1.23.

Continuous Integration and Testing:

  • Introduced a new GitHub Actions workflow (.github/workflows/ci.yml) to automate testing, security corpus validation, and cross-platform builds for multiple OS/architectures.

Benchmarking and Security Testing:

  • Added benchmark_test.go to provide benchmarks for object resolution and full object parsing, enabling performance regression tracking.
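
As a rough illustration of what an object-resolution benchmark can look like, here is a minimal sketch. The `objectID` and `resolver` types are hypothetical stand-ins invented for this example; they are not the library's actual types, and the real benchmark_test.go may be structured differently.

```go
package main

import "testing"

// objectID is a hypothetical stand-in for a PDF indirect object reference.
type objectID struct {
	Num, Gen uint32
}

// resolver is a hypothetical cache-backed lookup, not the library's own type.
type resolver struct {
	cache map[objectID][]byte
}

// resolve returns the cached bytes for id, if present.
func (r *resolver) resolve(id objectID) ([]byte, bool) {
	v, ok := r.cache[id]
	return v, ok
}

// BenchmarkResolve measures repeated lookups against a pre-filled cache.
func BenchmarkResolve(b *testing.B) {
	r := &resolver{cache: make(map[objectID][]byte)}
	for i := uint32(0); i < 1000; i++ {
		r.cache[objectID{Num: i}] = []byte("obj")
	}
	b.ReportAllocs() // include allocations per operation in the report
	b.ResetTimer()   // exclude the setup above from the measurement
	for i := 0; i < b.N; i++ {
		if _, ok := r.resolve(objectID{Num: uint32(i % 1000)}); !ok {
			b.Fatal("object missing from cache")
		}
	}
}
```

Placed in a `_test.go` file, a benchmark of this shape runs with `go test -bench=. -benchmem`; comparing the reported ns/op and allocs/op between commits is one way to track regressions.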
  • Added corpus_test.go to enable large-scale security and robustness testing using PDF Association corpora, with support for automatic corpus downloading and panic recovery so that malformed input surfaces as an error instead of crashing the library.

Unit Testing for Core Features:

  • Added encryption_test.go with comprehensive tests for cryptographic key derivation, RC4/AES decryption, PDF 2.0 authentication, and stream decryption logic.
  • Added filter_test.go to verify the correctness of core PDF stream filters (ASCIIHexDecode, ASCII85Decode, FlateDecode).
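
To illustrate the kind of check a filter test can perform, the sketch below round-trips data through the Go standard library codecs that correspond to the three filters. This is an assumption-laden example, not the library's filter code: it uses `encoding/hex`, `encoding/ascii85`, and `compress/zlib` directly, and it ignores PDF-specific details such as the `>` and `~>` end-of-data markers. The helper names are hypothetical.

```go
package main

import (
	"bytes"
	"compress/zlib"
	"encoding/ascii85"
	"encoding/hex"
	"fmt"
	"io"
)

// roundTripHex encodes data as hex digits and decodes it again.
func roundTripHex(data []byte) ([]byte, error) {
	return hex.DecodeString(hex.EncodeToString(data))
}

// roundTripASCII85 encodes data as ASCII85 and decodes it again.
func roundTripASCII85(data []byte) ([]byte, error) {
	var buf bytes.Buffer
	enc := ascii85.NewEncoder(&buf)
	if _, err := enc.Write(data); err != nil {
		return nil, err
	}
	if err := enc.Close(); err != nil { // flushes a partial final group
		return nil, err
	}
	return io.ReadAll(ascii85.NewDecoder(&buf))
}

// roundTripFlate compresses data with zlib and decompresses it again.
func roundTripFlate(data []byte) ([]byte, error) {
	var buf bytes.Buffer
	zw := zlib.NewWriter(&buf)
	if _, err := zw.Write(data); err != nil {
		return nil, err
	}
	if err := zw.Close(); err != nil {
		return nil, err
	}
	zr, err := zlib.NewReader(&buf)
	if err != nil {
		return nil, err
	}
	defer zr.Close()
	return io.ReadAll(zr)
}

// checkRoundTrips reports the first codec whose output differs from its input.
func checkRoundTrips(data []byte) error {
	codecs := []struct {
		name string
		fn   func([]byte) ([]byte, error)
	}{
		{"ASCIIHexDecode", roundTripHex},
		{"ASCII85Decode", roundTripASCII85},
		{"FlateDecode", roundTripFlate},
	}
	for _, c := range codecs {
		got, err := c.fn(data)
		if err != nil {
			return fmt.Errorf("%s: %w", c.name, err)
		}
		if !bytes.Equal(got, data) {
			return fmt.Errorf("%s: round trip mismatch", c.name)
		}
	}
	return nil
}
```

A real filter test would feed encoded bytes to the library's own decoders; the round trip here only shows the expected relationship between encoded and decoded data.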

@vanbroup vanbroup force-pushed the feature/v2-refactor-performance branch from c8e9221 to 9b720ff Compare January 3, 2026 20:05
…t coverage

## Performance Improvements
- Optimized object caching and resolution
- Improved xref stream parsing efficiency
- Added benchmark tests for performance tracking

## Security Hardening
- Added panic recovery to Page.Content() for malformed content streams
- Tested against 2,700+ PDFs from PDF Association corpora with no crashes
- Malicious/malformed inputs return errors instead of panicking
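
The general pattern behind returning errors instead of panicking can be sketched with a deferred `recover`. The names `parseFunc`, `safeParse`, and `fragileParse` are hypothetical and exist only for this illustration; how Page.Content() implements its recovery is not shown in this pull request description.

```go
package main

import "fmt"

// parseFunc is a hypothetical parser signature used only for illustration.
type parseFunc func(data []byte) (string, error)

// safeParse runs parse and converts any panic into an ordinary error.
func safeParse(parse parseFunc, data []byte) (out string, err error) {
	defer func() {
		if r := recover(); r != nil {
			out = ""
			err = fmt.Errorf("malformed input: %v", r)
		}
	}()
	return parse(data)
}

// fragileParse mimics a parser bug: it indexes past the end of short input.
func fragileParse(data []byte) (string, error) {
	_ = data[3] // panics with an index-out-of-range error if len(data) < 4
	return string(data[:4]), nil
}
```

With this wrapper, `safeParse(fragileParse, []byte("ab"))` yields a non-nil error rather than crashing the process, while well-formed input passes through unchanged. The named return values are what allow the deferred function to overwrite the result after a panic.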

## Test Coverage (50% → 77%)
- Added comprehensive unit tests for all core modules
- Added corpus security test with on-demand download from:
  - veraPDF corpus (2,694 files)
  - BFO PDF/A test suite (24 files)
  - PDF Cabinet of Horrors (24 files)

## CI/CD
- Added GitHub Actions workflow with test, corpus-test, and build jobs
- Cross-platform builds (linux/darwin/windows × amd64/arm64)

## API Changes
- Extracted types.go for cleaner Value/Object API
- Added GetObject() method for direct object access
- Added Xref() method to expose cross-reference table
@vanbroup vanbroup force-pushed the feature/v2-refactor-performance branch from 9b720ff to 11f414e Compare January 3, 2026 20:06
@vanbroup vanbroup merged commit c52ca1c into main Jan 3, 2026
8 checks passed