Releases: pageseeder/diffx
Release 1.3.2
Highlights
This release strengthens XML parsing security (notably around XXE/entity expansion), improves loader extensibility via custom TextTokenizer support, and includes internal refactors for cleaner, more maintainable loader implementations.
New Features
- Consistent token sourcing in matching logic: matching tokens are now consistently taken from the `to` sequence (instead of `from`) to ensure predictable behavior.
- Custom `TextTokenizer` support across all loaders: all loader implementations can now be configured with a custom `TextTokenizer`, allowing each loader to generate different `TextToken` instances depending on its requirements.
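To illustrate what a pluggable tokenizer makes possible, here is a minimal sketch. The `SimpleTokenizer` interface and the `WordTokenizer`, `CharTokenizer` and `TextLoader` classes are hypothetical stand-ins, not diffx's actual `TextTokenizer` API (check the Javadoc for its real signature). The point is only that a loader which delegates to an injected tokenizer can emit word-level or character-level tokens without any change to the loader itself.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class TokenizerSketch {

  /** Hypothetical stand-in for a pluggable tokenizer; this is NOT diffx's TextTokenizer interface. */
  interface SimpleTokenizer {
    List<String> tokenize(CharSequence text);
  }

  /** Splits text into runs of word characters, runs of whitespace, and single punctuation marks. */
  static final class WordTokenizer implements SimpleTokenizer {
    private static final Pattern TOKEN =
        Pattern.compile("\\w+|\\s+|[^\\w\\s]", Pattern.UNICODE_CHARACTER_CLASS);

    @Override
    public List<String> tokenize(CharSequence text) {
      List<String> tokens = new ArrayList<>();
      Matcher m = TOKEN.matcher(text);
      while (m.find()) {
        tokens.add(m.group());
      }
      return tokens;
    }
  }

  /** Emits one token per Unicode code point, for character-level comparisons. */
  static final class CharTokenizer implements SimpleTokenizer {
    @Override
    public List<String> tokenize(CharSequence text) {
      List<String> tokens = new ArrayList<>();
      text.codePoints().forEach(cp -> tokens.add(new String(Character.toChars(cp))));
      return tokens;
    }
  }

  /** Hypothetical loader that delegates all text splitting to the tokenizer it was given. */
  static final class TextLoader {
    private final SimpleTokenizer tokenizer;

    TextLoader(SimpleTokenizer tokenizer) {
      this.tokenizer = tokenizer;
    }

    List<String> load(String text) {
      return tokenizer.tokenize(text);
    }
  }

  public static void main(String[] args) {
    // Same loader class, different granularity depending on the injected tokenizer.
    System.out.println(new TextLoader(new WordTokenizer()).load("Hello, world"));
    System.out.println(new TextLoader(new CharTokenizer()).load("abc"));
  }
}
```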
Security / Hardening
- XXE and entity expansion protections:
  - `SAXLoader` hardening: improved default `XMLReader` factory behavior and XXE prevention.
  - `DOMLoader`: prevents entity expansion (note: this may change behavior for some XML inputs).
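The kind of hardening described above can be sketched with the standard JAXP APIs. This is not diffx's `SAXLoader` or `DOMLoader` code: it simply applies settings recommended by the OWASP XML External Entity Prevention Cheat Sheet, and the helper names (`newSecureXMLReader`, `newSecureDocumentBuilderFactory`, `rejects`) are hypothetical. The feature URIs are those understood by the JDK's built-in Xerces-based parser; other parsers may not support all of them.

```java
import java.io.IOException;
import java.io.StringReader;
import javax.xml.XMLConstants;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;
import org.xml.sax.XMLReader;
import org.xml.sax.helpers.DefaultHandler;

public class SecureXmlFactories {

  /** Creates an XMLReader configured along the lines of the OWASP XXE prevention cheat sheet. */
  static XMLReader newSecureXMLReader() {
    try {
      SAXParserFactory factory = SAXParserFactory.newInstance();
      factory.setNamespaceAware(true);
      // Reject any DOCTYPE outright: this blocks both XXE and entity-expansion attacks.
      factory.setFeature("http://apache.org/xml/features/disallow-doctype-decl", true);
      // Defence in depth in case DOCTYPEs are ever allowed again.
      factory.setFeature("http://xml.org/sax/features/external-general-entities", false);
      factory.setFeature("http://xml.org/sax/features/external-parameter-entities", false);
      factory.setFeature("http://apache.org/xml/features/nonvalidating/load-external-dtd", false);
      factory.setXIncludeAware(false);
      return factory.newSAXParser().getXMLReader();
    } catch (ParserConfigurationException | SAXException ex) {
      throw new IllegalStateException("Parser does not support the secure configuration", ex);
    }
  }

  /** Creates a DocumentBuilderFactory that refuses DOCTYPEs and does not expand entity references. */
  static DocumentBuilderFactory newSecureDocumentBuilderFactory() {
    try {
      DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
      factory.setNamespaceAware(true);
      factory.setFeature(XMLConstants.FEATURE_SECURE_PROCESSING, true);
      factory.setFeature("http://apache.org/xml/features/disallow-doctype-decl", true);
      // Keep entity references as nodes instead of replacing them with their expansion.
      factory.setExpandEntityReferences(false);
      factory.setXIncludeAware(false);
      return factory;
    } catch (ParserConfigurationException ex) {
      throw new IllegalStateException("Parser does not support the secure configuration", ex);
    }
  }

  /** Returns true if the reader refuses to parse the given document. */
  static boolean rejects(XMLReader reader, String xml) {
    DefaultHandler handler = new DefaultHandler(); // its fatalError() rethrows the exception
    reader.setContentHandler(handler);
    reader.setErrorHandler(handler);
    try {
      reader.parse(new InputSource(new StringReader(xml)));
      return false;
    } catch (SAXException | IOException ex) {
      return true;
    }
  }

  public static void main(String[] args) {
    String xxe = "<!DOCTYPE a [<!ENTITY x SYSTEM \"file:///etc/passwd\">]><a>&x;</a>";
    System.out.println(rejects(newSecureXMLReader(), xxe));         // true
    System.out.println(rejects(newSecureXMLReader(), "<a>ok</a>")); // false
  }
}
```

Disallowing the DOCTYPE entirely is the strictest option; it is also why documents that genuinely need a DTD require an explicit opt-in.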
Improvements
- Documentation updates: improved `XMLLoader` Javadoc with clearer guidance on usage and thread-safety expectations.
Refactoring & Maintenance
- Extracted `LoadSession` from `DOMLoader` to improve separation of concerns and modularity.
- `XMLEventLoader` now reuses an existing `textTokenizer` when available.
- Added support for a custom `XMLReader` factory in `SAXLoader`.
Potential Breaking Changes / Migration Notes
- `DOMLoader` entity expansion disabled
  - If you previously relied on expanded entities during DOM loading, behavior may differ. Consider adjusting the input XML or parser configuration accordingly.
- Matching tokens now sourced from `to`
  - If downstream logic implicitly depended on the old `from`-sourced behavior, validate matching results after upgrading.
Full Changelog: v1.3.1...v1.3.2
Release 1.3.1
Highlights
- Equality-aware diffing: Algorithms can now accept a custom equality function, enabling comparisons that go beyond default element equality (e.g., case-insensitive matching or domain-specific equivalence).
- Improved test coverage: Expanded and unified equality-based testing across algorithms to ensure consistent behavior.
Added
- Algorithm support for an equality function to customize how items are considered equal during comparisons.
- New unit tests for `Actions` to increase coverage and validate behavior.
- Additional unit tests in `BasicEqualityAlgorithmTest`, integrating equality-focused test cases across all algorithms.
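As an illustration of equality-aware comparison, the sketch below runs a classic longest-common-subsequence diff with a pluggable `BiPredicate` standing in for the equality function. The class and method names are hypothetical and this is not diffx's algorithm API; it only shows how swapping `String::equals` for `String::equalsIgnoreCase` changes which tokens are considered a match.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.List;
import java.util.function.BiPredicate;

public class EqualityAwareDiff {

  /** Fills the LCS table, using the supplied equality test instead of equals(). */
  private static <T> int[][] table(List<T> a, List<T> b, BiPredicate<? super T, ? super T> eq) {
    int[][] dp = new int[a.size() + 1][b.size() + 1];
    for (int i = 1; i <= a.size(); i++) {
      for (int j = 1; j <= b.size(); j++) {
        dp[i][j] = eq.test(a.get(i - 1), b.get(j - 1))
            ? dp[i - 1][j - 1] + 1
            : Math.max(dp[i - 1][j], dp[i][j - 1]);
      }
    }
    return dp;
  }

  /** Length of the longest common subsequence under the given equality. */
  static <T> int lcsLength(List<T> a, List<T> b, BiPredicate<? super T, ? super T> eq) {
    return table(a, b, eq)[a.size()][b.size()];
  }

  /** Edit script: "=x" for a match, "-x" for a deletion from a, "+x" for an insertion from b. */
  static <T> List<String> diff(List<T> a, List<T> b, BiPredicate<? super T, ? super T> eq) {
    int[][] dp = table(a, b, eq);
    Deque<String> ops = new ArrayDeque<>();
    int i = a.size();
    int j = b.size();
    while (i > 0 || j > 0) {
      if (i > 0 && j > 0 && eq.test(a.get(i - 1), b.get(j - 1))) {
        ops.push("=" + b.get(j - 1)); // the matched token is taken from the second list
        i--;
        j--;
      } else if (j > 0 && (i == 0 || dp[i][j - 1] >= dp[i - 1][j])) {
        ops.push("+" + b.get(j - 1));
        j--;
      } else {
        ops.push("-" + a.get(i - 1));
        i--;
      }
    }
    return new ArrayList<>(ops); // built back to front with push, so the head is the first step
  }

  public static void main(String[] args) {
    List<String> from = List.of("The", "Quick", "fox");
    List<String> to = List.of("the", "quick", "brown", "fox");
    System.out.println(lcsLength(from, to, String::equals));       // 1
    System.out.println(diff(from, to, String::equalsIgnoreCase)); // [=the, =quick, +brown, =fox]
  }
}
```

With strict equality only `fox` matches, so the capitalized words appear as a deletion plus an insertion; with case-insensitive equality they are reported as matches.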
Changed
- `MyersGreedyAlgorithm` constructor is now public, making it easier to instantiate and use directly in external code.
- `ActionsBuffer` now uses generics, simplifying type usage and reducing boilerplate; related tests were simplified accordingly.
- `BasicEqualityAlgorithmTest` now uses case-insensitive equality, improving robustness for text comparisons where case should be ignored.
Fixed / Maintenance
- Removed unused imports to keep the codebase clean and reduce warnings:
  - `TextOnlyProcessor` from `Profilers`
  - `IOException` from `FormatComparisonTest`
  - `Arrays` from `AttributeXMLTokenTest`
Notes for Upgraders
- If you instantiate `MyersGreedyAlgorithm` outside its package, the now-public constructor removes the previous access restriction.
Full Changelog: v1.3.0...v1.3.1
Release 1.3.0
Breaking changes
- Removed deprecated `getEvents` method (BREAKING).
- Removed deprecated `getOpenElement` method (BREAKING).
- Aligned `getValue` behavior with the `XMLToken` contract: removed `getValue` methods that returned `null`, which was inconsistent with the `XMLToken` interface (BREAKING).
Bug fixes
- Fixed an inconsistency where some `getValue` implementations returned `null` despite the `XMLToken` contract.
Improvements / Maintenance
Dependency & tooling updates
- Bumped `pso-xmlwriter` to 1.1.1.
- Updated JUnit and JReleaser versions.
Nullability & annotations
- Replaced the jetbrains-annotations dependency with jspecify.
- Adopted jspecify `@Nullable` annotations across the algorithm, action, and token packages.
- Removed the unused `@ApiStatus.Experimental` annotation from `XMLEventBalancer`.
API / docs
- Added default getValue implementations for ElementToken.
- Clarified usage documentation, fixed missing Javadoc and applied minor Javadoc improvements.
Code quality & refactoring
- Simplified code and cleaned up warnings (including suppressions in Main and KumarRanganAlgorithm to match source-paper naming).
- Replaced legacy collection helpers with modern factory methods
- Test suite cleanup: improved naming consistency, removed unused imports, reduced redundant modifiers, and made constants static where appropriate.
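The notes do not say which legacy helpers were replaced. Assuming the usual `Collections.unmodifiableList(Arrays.asList(...))` pattern, the change looks roughly like the sketch below (class and constant names are hypothetical). Note the behavioral differences: the Java 9+ factory methods return immutable collections and reject `null` elements.

```java
import java.util.Arrays;
import java.util.Collections;
import java.util.List;
import java.util.Map;
import java.util.Set;

public class CollectionFactories {

  // Legacy style (pre-Java 9): wrap a fixed-size list to make it read-only.
  static final List<String> LEGACY_NAMES =
      Collections.unmodifiableList(Arrays.asList("start", "end", "text"));

  // Java 9+ factory methods: shorter and immutable, but null elements are not allowed.
  static final List<String> NAMES = List.of("start", "end", "text");
  static final Set<String> EMPTY_SET = Set.of();
  static final Map<String, Integer> WEIGHTS = Map.of("start", 1, "end", 1);

  /** Returns true if List.of rejects a null element, which Arrays.asList would accept. */
  static boolean listOfRejectsNull() {
    try {
      List.of("a", null);
      return false;
    } catch (NullPointerException ex) {
      return true;
    }
  }

  public static void main(String[] args) {
    System.out.println(LEGACY_NAMES.equals(NAMES)); // true: same elements, same order
    System.out.println(listOfRejectsNull());       // true
  }
}
```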
Migration notes (for consumers)
- If you used `getEvents()` or `getOpenElement()`, migrate to the supported replacements (the deprecated methods are now removed).
- If you relied on `getValue()` returning `null`, update call sites to match the non-null, contract-consistent behavior (or handle absence via the new recommended approach in your API usage).
Full Changelog: v1.2.4...v1.3.0
Release 1.2.4
This release includes a bugfix for whitespace handling, improvements to the configuration handling, and several code quality enhancements.
Bugfixes
- Whitespace Handling: Fixed an issue where leading spaces were not being stripped correctly in certain contexts within the component. This ensures proper handling of mixed content with inline elements.
  - Improved the whitespace context management with a more robust implementation of the `replaceByTrailing` method
  - Added test cases to validate the fix for mixed content scenarios
Improvements
- Configuration Handling: Enhanced the `DiffConfig` class with proper `equals()` and `hashCode()` methods
  - Fixed the `equals()` method to correctly compare all configuration properties
  - Improved the `hashCode()` implementation for better performance and correctness
  - Added support for the `allowDoctypeDeclaration` property in equality checks
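The sketch below shows the general shape of a configuration class whose `equals()` and `hashCode()` cover every property. It is a hypothetical `ConfigSketch`, not diffx's `DiffConfig`: only `allowDoctypeDeclaration` comes from these notes, while `namespaceAware` and `WhitespaceMode` are invented for illustration.

```java
import java.util.Objects;

/** Hypothetical configuration class; only allowDoctypeDeclaration comes from the release notes. */
public final class ConfigSketch {

  enum WhitespaceMode { PRESERVE, COMPARE, IGNORE } // hypothetical

  private final boolean namespaceAware;    // hypothetical
  private final WhitespaceMode whitespace; // hypothetical
  private final boolean allowDoctypeDeclaration;

  ConfigSketch(boolean namespaceAware, WhitespaceMode whitespace, boolean allowDoctypeDeclaration) {
    this.namespaceAware = namespaceAware;
    this.whitespace = Objects.requireNonNull(whitespace);
    this.allowDoctypeDeclaration = allowDoctypeDeclaration;
  }

  @Override
  public boolean equals(Object o) {
    if (this == o) return true;
    if (!(o instanceof ConfigSketch)) return false;
    ConfigSketch other = (ConfigSketch) o;
    // Compare every property; leaving one out (e.g. allowDoctypeDeclaration) breaks equality.
    return namespaceAware == other.namespaceAware
        && whitespace == other.whitespace
        && allowDoctypeDeclaration == other.allowDoctypeDeclaration;
  }

  @Override
  public int hashCode() {
    // Uses exactly the fields compared in equals(), so equal objects get equal hash codes.
    int result = Boolean.hashCode(namespaceAware);
    result = 31 * result + whitespace.hashCode();
    result = 31 * result + Boolean.hashCode(allowDoctypeDeclaration);
    return result;
  }

  public static void main(String[] args) {
    ConfigSketch a = new ConfigSketch(true, WhitespaceMode.COMPARE, false);
    ConfigSketch b = new ConfigSketch(true, WhitespaceMode.COMPARE, true);
    System.out.println(a.equals(b)); // false: allowDoctypeDeclaration differs
  }
}
```

Computing the hash by hand avoids the varargs array and boxing that `Objects.hash` incurs, which is one plausible way to get the "better performance" mentioned above.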
Full Changelog: v1.2.3...v1.2.4
Release 1.2.2
New Features
XML Balance Checking
- XMLBalanceCheckFilter: Added a new filter to ensure that XML tokens are properly balanced in the DiffHandler processing pipeline
  - Tracks start and end element pairs to verify they match correctly
  - Detects and reports XML structure imbalances, including extra or missing elements
  - Provides diagnostics through the `isBalanced()` method and detailed error reporting
  - Useful for validating XML integrity during difference operations
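The stack-based idea behind a balance check can be sketched as follows. `BalanceChecker` and its `Tag` type are hypothetical and much simpler than diffx's `XMLBalanceCheckFilter` and `XMLToken`; only the `isBalanced()` name is taken from the notes. Start elements are pushed, end elements must match the top of the stack, and anything still open at the end is reported as missing. Popping on a mismatch is just one simple recovery strategy.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.List;

public class BalanceChecker {

  /** Hypothetical minimal token: a start or end tag with a name (not diffx's XMLToken). */
  static final class Tag {
    final boolean isStart;
    final String name;

    private Tag(boolean isStart, String name) {
      this.isStart = isStart;
      this.name = name;
    }

    static Tag start(String name) { return new Tag(true, name); }

    static Tag end(String name) { return new Tag(false, name); }
  }

  private final Deque<String> open = new ArrayDeque<>();
  private final List<String> errors = new ArrayList<>();

  /** Feeds one tag through the checker, recording any mismatch. */
  void accept(Tag tag) {
    if (tag.isStart) {
      open.push(tag.name);
    } else if (open.isEmpty()) {
      errors.add("Unexpected end element </" + tag.name + ">");
    } else if (!open.peek().equals(tag.name)) {
      errors.add("Expected </" + open.peek() + "> but found </" + tag.name + ">");
      open.pop(); // simple recovery: treat the expected element as closed
    } else {
      open.pop();
    }
  }

  /** Balanced when no mismatch was seen and every start element has been closed. */
  boolean isBalanced() {
    return errors.isEmpty() && open.isEmpty();
  }

  /** Recorded mismatches, followed by elements still open (innermost first). */
  List<String> errors() {
    List<String> all = new ArrayList<>(errors);
    for (String name : open) {
      all.add("Missing end element </" + name + ">");
    }
    return all;
  }

  public static void main(String[] args) {
    BalanceChecker checker = new BalanceChecker();
    checker.accept(Tag.start("section"));
    checker.accept(Tag.start("p"));
    checker.accept(Tag.end("p"));
    System.out.println(checker.isBalanced()); // false: <section> is still open
    System.out.println(checker.errors());     // [Missing end element </section>]
  }
}
```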
API Improvements
Enhanced XMLToken Interface
- Improved Null Safety: Added `@NotNull` annotations throughout the API to help prevent null pointer exceptions
- Standardized Documentation: Enhanced Javadoc clarity with consistent descriptions and improved parameter/return value documentation
- New Default Methods: Added the `isWhitespace()` method to simplify whitespace detection across token types
- Clarified Contract: Improved documentation on equals/hashCode implementation requirements for better performance
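A default interface method lets every token type share one whitespace test while still allowing overrides. The `Token` interface and `StartElement` class below are hypothetical simplifications, not diffx's `XMLToken` hierarchy, and treating an empty value as non-whitespace is a design assumption of this sketch.

```java
public class WhitespaceTokens {

  /** Hypothetical token type; only the isWhitespace() idea comes from the release notes. */
  interface Token {
    String getValue();

    /** Shared default: non-empty and made only of whitespace (as defined by Character.isWhitespace). */
    default boolean isWhitespace() {
      String value = getValue();
      return !value.isEmpty() && value.isBlank();
    }
  }

  /** Element tokens override the default: a tag is never whitespace, so skip the string check. */
  static final class StartElement implements Token {
    private final String name;

    StartElement(String name) {
      this.name = name;
    }

    @Override
    public String getValue() {
      return "<" + name + ">";
    }

    @Override
    public boolean isWhitespace() {
      return false;
    }
  }

  public static void main(String[] args) {
    Token spaces = () -> " \n\t";
    Token word = () -> "diff";
    System.out.println(spaces.isWhitespace());                // true
    System.out.println(word.isWhitespace());                  // false
    System.out.println(new StartElement("p").isWhitespace()); // false
  }
}
```

`String.isBlank()` (Java 11) relies on `Character.isWhitespace`, so a non-breaking space (U+00A0) is not treated as whitespace here; a token type that needs different rules would override the default.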
Whitespace Handling
- Simplified Whitespace Detection: Replaced custom `isWhiteSpace` methods with the standardized `token.isWhitespace()` approach in both `WhitespaceStripper` and `ExtendedWhitespaceStripper`
- Enhanced Processing Logic: Improved the whitespace processing algorithm for more consistent results
- Better Edge Case Handling: Added test cases for complex mixed content scenarios
- Fixed Context Management: Implemented more robust context tracking for accurate whitespace preservation
Full Changelog: v1.2.1...v1.2.2
Release 1.2.1
New Features
XML Processing Improvements
- XMLEventBalancer (Beta): Added new implementation to ensure balanced XML structure in DiffHandler operations. This experimental component ensures well-formed XML during diff operations by maintaining properly paired start and end elements.
Core Functionality
- NoOpFilter: Added implementation for transparent operation forwarding in DiffHandler. This filter passes operations through without modifications, providing a clean way to chain handlers.
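A no-op filter is typically used as a base class: it forwards every callback unchanged, so a subclass overrides only what it cares about. The `Handler` interface and the `NoOpFilter`, `ChangesOnlyFilter` and `Recorder` classes below are hypothetical and much smaller than diffx's `DiffHandler`; the `'='`, `'+'`, `'-'` operator characters are also an assumption made for the example.

```java
import java.util.ArrayList;
import java.util.List;

public class FilterChain {

  /** Hypothetical stand-in for a diff handler; NOT diffx's DiffHandler interface. */
  interface Handler {
    void start();

    void handle(char operator, String token);

    void end();
  }

  /** Forwards every call unchanged to the wrapped handler. */
  static class NoOpFilter implements Handler {
    protected final Handler target;

    NoOpFilter(Handler target) {
      this.target = target;
    }

    @Override public void start() { target.start(); }

    @Override public void handle(char operator, String token) { target.handle(operator, token); }

    @Override public void end() { target.end(); }
  }

  /** Example filter: overrides one callback to drop unchanged ('=') tokens, inherits the rest. */
  static final class ChangesOnlyFilter extends NoOpFilter {
    ChangesOnlyFilter(Handler target) {
      super(target);
    }

    @Override
    public void handle(char operator, String token) {
      if (operator != '=') {
        target.handle(operator, token);
      }
    }
  }

  /** Terminal handler that records what reaches the end of the chain. */
  static final class Recorder implements Handler {
    final List<String> log = new ArrayList<>();

    @Override public void start() { log.add("start"); }

    @Override public void handle(char operator, String token) { log.add(operator + token); }

    @Override public void end() { log.add("end"); }
  }

  public static void main(String[] args) {
    Recorder recorder = new Recorder();
    Handler chain = new NoOpFilter(new ChangesOnlyFilter(recorder));
    chain.start();
    chain.handle('=', "a");
    chain.handle('+', "b");
    chain.handle('-', "c");
    chain.end();
    System.out.println(recorder.log); // [start, +b, -c, end]
  }
}
```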
Maintenance and Improvements
Build System Enhancements
- Replaced Maven publishing scripts with JReleaser configuration
- Refactored build scripts to use centralized dependencies management
- Updated wrapper scripts for compliance and robustness
- Added SPDX license headers to script files
- Improved JAVA_HOME validation in scripts
Documentation
- Enhanced Javadoc for key methods and classes across the project
- Added detailed parameter annotations to improve code clarity
Testing
- Added unit tests for `ExtendedWhitespaceStripper` to verify various whitespace handling scenarios
Code Quality
- Improved Maven publishing configuration to use assignment syntax for task descriptions and credentials
- Removed unused import statements
Compatibility
This release maintains compatibility with Java 11 and later versions. The library continues to provide efficient differencing algorithms specifically optimized for XML structures.
Notes
The XMLEventBalancer is currently marked as experimental (beta) and subject to change in future releases.
Release 1.2.0
Breaking changes
Now requires Java 11
New Features
- Document Tokens: Added `StartDocumentToken` and `EndDocumentToken` classes to represent XML document boundaries
- Sequence Processing: Introduced the `SequenceProcessor` interface with an `ExtendedWhitespaceStripper` implementation for configurable whitespace handling in XML sequences
- Similarity Metrics: Added new similarity measurement capabilities:
  - Implemented the `XMLElementSimilarity` class with length-based boosting and child stream similarity
  - Added the `StreamSimilarity` interface with Edit, Jaccard, and Cosine similarity implementations
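For intuition about the Jaccard and Cosine measures, here is a minimal sketch over token streams represented as lists of strings. The class and method names are hypothetical and unrelated to diffx's `StreamSimilarity` interface: Jaccard compares the sets of distinct tokens, while cosine compares token-frequency vectors, so repeated tokens only matter for the latter.

```java
import java.util.HashMap;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

public class StreamSimilaritySketch {

  /** Jaccard index: |A intersect B| / |A union B| over the distinct tokens of each stream. */
  static double jaccard(List<String> a, List<String> b) {
    Set<String> union = new HashSet<>(a);
    union.addAll(b);
    if (union.isEmpty()) return 1.0; // two empty streams are treated as identical
    Set<String> intersection = new HashSet<>(a);
    intersection.retainAll(new HashSet<>(b));
    return (double) intersection.size() / union.size();
  }

  /** Cosine similarity between the token-frequency vectors of each stream. */
  static double cosine(List<String> a, List<String> b) {
    Map<String, Integer> fa = frequencies(a);
    Map<String, Integer> fb = frequencies(b);
    double dot = 0;
    for (Map.Entry<String, Integer> e : fa.entrySet()) {
      dot += e.getValue() * (double) fb.getOrDefault(e.getKey(), 0);
    }
    double norms = norm(fa) * norm(fb);
    if (norms == 0) return a.isEmpty() && b.isEmpty() ? 1.0 : 0.0;
    return dot / norms;
  }

  private static Map<String, Integer> frequencies(List<String> tokens) {
    Map<String, Integer> counts = new HashMap<>();
    for (String t : tokens) {
      counts.merge(t, 1, Integer::sum);
    }
    return counts;
  }

  private static double norm(Map<String, Integer> freq) {
    double sum = 0;
    for (int c : freq.values()) {
      sum += (double) c * c;
    }
    return Math.sqrt(sum);
  }

  public static void main(String[] args) {
    List<String> a = List.of("the", "quick", "fox");
    List<String> b = List.of("the", "lazy", "fox");
    System.out.println(jaccard(a, b)); // 0.5: 2 shared tokens out of 4 distinct
    System.out.println(cosine(a, b));  // approximately 0.667
  }
}
```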
Code Improvements
- Dependency Updates:
  - Upgraded to Java 11 and configured toolchain for compatibility
  - Updated to Gradle 8.13 with improved distribution validation
  - Updated JUnit dependencies to use BOM (Bill of Materials) for version alignment
- Refactoring:
  - Refactored `SAXLoader` for improved XML reader handling
  - Refactored `XMLElement` for better content handling
  - Renamed `XMLElementSimilarity` to `ElementSimilarity` with improved method names
  - Replaced `SimilarityFunction` with the `Similarity` interface (old interface deprecated)
- API Changes:
  - Deprecated the `getChildren` method in `XMLElement` in favor of `getContent`
  - Deprecated the `setXMLReaderClass` method
  - Removed debug flags for a cleaner codebase
- Null Safety:
  - Added `@NotNull` annotations to `toXML` method parameters
  - Added `@NotNull` annotations to `NamespaceSet.add` method parameters
  - Enhanced exception handling throughout the codebase
Release 1.1.2
New Features
- Wagner-Fischer Algorithm: Added similarity-based diffing using the Wagner-Fischer algorithm for improved text comparison capabilities
- Whitespace Handling Utility: Added a new utility class to strip whitespace from a specified list of elements
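The Wagner-Fischer algorithm computes the edit (Levenshtein) distance with dynamic programming. The sketch below applies it to token lists and keeps only two rows of the table; the `similarity` normalization (one minus distance over the longer length) is a common choice but an assumption here, not necessarily how diffx's `SimilarityWagnerFischerAlgorithm` scores matches.

```java
import java.util.List;

public class WagnerFischer {

  /** Levenshtein distance between two token sequences (Wagner-Fischer, two-row variant). */
  static <T> int distance(List<T> a, List<T> b) {
    int[] prev = new int[b.size() + 1];
    int[] curr = new int[b.size() + 1];
    for (int j = 0; j <= b.size(); j++) {
      prev[j] = j; // cost of inserting the first j tokens of b
    }
    for (int i = 1; i <= a.size(); i++) {
      curr[0] = i; // cost of deleting the first i tokens of a
      for (int j = 1; j <= b.size(); j++) {
        int substitution = prev[j - 1] + (a.get(i - 1).equals(b.get(j - 1)) ? 0 : 1);
        int deletion = prev[j] + 1;
        int insertion = curr[j - 1] + 1;
        curr[j] = Math.min(substitution, Math.min(deletion, insertion));
      }
      int[] tmp = prev; // the row just computed becomes the previous row
      prev = curr;
      curr = tmp;
    }
    return prev[b.size()];
  }

  /** One common normalization: 1 - distance / length of the longer sequence. */
  static <T> double similarity(List<T> a, List<T> b) {
    int longest = Math.max(a.size(), b.size());
    return longest == 0 ? 1.0 : 1.0 - (double) distance(a, b) / longest;
  }

  public static void main(String[] args) {
    List<Character> kitten = List.of('k', 'i', 't', 't', 'e', 'n');
    List<Character> sitting = List.of('s', 'i', 't', 't', 'i', 'n', 'g');
    System.out.println(distance(kitten, sitting)); // 3
  }
}
```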
Bug Fixes
- Fixed constructor to properly respect the namespace-aware parameter
- Fixed potential bug in KumarRanganAlgorithm implementation
Code Improvements
- Refactored NilToken to use a singleton pattern for better memory efficiency
- Improved ElementToken and its default implementation
- Added constructor and enhanced annotations in Sequence class
- Refactored stack usage with Deque for better performance in isWellFormed method
- Standardized static final field declarations across the codebase
- Renamed "open" to "start" in XML element handling logic for improved clarity
- Added private constructor to Actions utility class to prevent instantiation
- Simplified empty checks by utilizing the isEmpty method
Documentation and Testing
- Improved overall documentation with better comments and explanations
- Added `@version` and `@since` tags for better version tracking
- Added comprehensive unit tests for `SimilarityWagnerFischerAlgorithm`
- Enhanced code with `@Override` and `@NotNull` annotations for better type safety
Full Changelog: 1.1.1...v1.1.2
Release 1.1.1
- Fixed issues in `PostXMLFixer` where some elements could be left unclosed.
- Improved support for Unicode characters in `TokenizerBySpaceWord`.
Release 1.1.0
The focus of this release was to address a number of security vulnerabilities, in particular XML eXternal Entity injection (XXE) in the code.
Although XXE issues could easily be mitigated by filtering the XML input before submitting it to diffx, we changed the default configuration to be secure by default, disabling the loading of external entities and DTDs as outlined in https://cheatsheetseries.owasp.org/cheatsheets/XML_External_Entity_Prevention_Cheat_Sheet.html
If, for some reason, you need to use the DTD, you can set the `allowDoctypeDeclaration` boolean option to `true` in the `DiffConfig`.