Releases: pageseeder/diffx

Release 1.3.2

16 Feb 10:59

Highlights

This release strengthens XML parsing security (notably around XXE/entity expansion), improves loader extensibility via custom TextTokenizer support, and includes internal refactors for cleaner, more maintainable loader implementations.

New Features

  • Consistent token sourcing in matching logic: matching tokens are now always taken from the to sequence (rather than the from sequence) to ensure predictable behavior.
  • Custom TextTokenizer support across all loaders: every loader implementation can now be configured with a custom TextTokenizer, so that loaders can generate different TextToken instances depending on their requirements.

Security / Hardening

  • XXE and entity expansion protections
    • SAXLoader hardening: improved default XMLReader factory behavior and XXE prevention.
    • DOMLoader: prevents entity expansion (note: this may change behavior for some XML inputs).
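
To illustrate the kind of hardening described above, here is a minimal sketch that configures a standard JAXP DocumentBuilderFactory to reject DOCTYPE declarations and to leave entity references unexpanded. It uses only JDK APIs. The class HardenedDomExample is hypothetical and is not the DOMLoader implementation, whose exact settings are not listed in these notes.

```java
import java.io.IOException;
import java.io.StringReader;
import java.io.UncheckedIOException;
import javax.xml.XMLConstants;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import org.w3c.dom.Document;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;

// Hypothetical example class; not part of diffx.
class HardenedDomExample {

  /** Creates a DOM builder configured along the lines of the OWASP XXE guidance. */
  static DocumentBuilder newHardenedBuilder() {
    try {
      DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
      factory.setNamespaceAware(true);
      factory.setFeature(XMLConstants.FEATURE_SECURE_PROCESSING, true);
      // Reject any document that carries a DOCTYPE declaration.
      factory.setFeature("http://apache.org/xml/features/disallow-doctype-decl", true);
      factory.setXIncludeAware(false);
      // Do not expand entity references into their replacement text.
      factory.setExpandEntityReferences(false);
      return factory.newDocumentBuilder();
    } catch (ParserConfigurationException e) {
      throw new IllegalStateException("Unable to configure DOM parser", e);
    }
  }

  /** Returns true if the XML parses, false if the hardened parser rejects it. */
  static boolean parses(String xml) {
    try {
      Document doc = newHardenedBuilder().parse(new InputSource(new StringReader(xml)));
      return doc.getDocumentElement() != null;
    } catch (SAXException rejected) {
      return false;
    } catch (IOException e) {
      throw new UncheckedIOException(e);
    }
  }

  public static void main(String[] args) {
    System.out.println(parses("<a>text</a>"));
    System.out.println(parses("<!DOCTYPE a [<!ENTITY e \"x\">]><a>&e;</a>"));
  }
}
```

With this configuration, input that declares a DOCTYPE is rejected outright. Upgraders who relied on expanded entities may see a comparable change in behavior.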

Improvements

  • Documentation updates: improved XMLLoader Javadoc with clearer guidance on usage and thread-safety expectations.

Refactoring & Maintenance

  • Extracted LoadSession from DOMLoader to improve separation of concerns and modularity.
  • XMLEventLoader now reuses an existing textTokenizer when available.
  • Added support for a custom XMLReader factory in SAXLoader.
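
As a rough sketch of what a custom XMLReader factory might produce, the following builds a namespace-aware reader with external entity loading switched off, using only JDK APIs. The class and method names are hypothetical. The factory type that SAXLoader actually accepts is not shown in these notes, so any wiring into diffx is left out.

```java
import javax.xml.XMLConstants;
import javax.xml.parsers.ParserConfigurationException;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.SAXException;
import org.xml.sax.XMLReader;

// Hypothetical example class; not part of diffx.
class SecureReaderFactoryExample {
  static final String EXTERNAL_GENERAL =
      "http://xml.org/sax/features/external-general-entities";
  static final String EXTERNAL_PARAMETER =
      "http://xml.org/sax/features/external-parameter-entities";
  static final String LOAD_EXTERNAL_DTD =
      "http://apache.org/xml/features/nonvalidating/load-external-dtd";

  /** Creates an XMLReader that does not resolve external entities or external DTDs. */
  static XMLReader newSecureReader() {
    try {
      SAXParserFactory factory = SAXParserFactory.newInstance();
      factory.setNamespaceAware(true);
      factory.setFeature(XMLConstants.FEATURE_SECURE_PROCESSING, true);
      factory.setFeature(EXTERNAL_GENERAL, false);
      factory.setFeature(EXTERNAL_PARAMETER, false);
      factory.setFeature(LOAD_EXTERNAL_DTD, false);
      return factory.newSAXParser().getXMLReader();
    } catch (ParserConfigurationException | SAXException e) {
      throw new IllegalStateException("Unable to configure XML reader", e);
    }
  }

  /** Reads a feature flag from the reader, wrapping checked SAX exceptions. */
  static boolean isEnabled(XMLReader reader, String feature) {
    try {
      return reader.getFeature(feature);
    } catch (SAXException e) {
      throw new IllegalStateException("Unknown feature: " + feature, e);
    }
  }

  public static void main(String[] args) {
    XMLReader reader = newSecureReader();
    System.out.println(isEnabled(reader, EXTERNAL_GENERAL));
  }
}
```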

Potential Breaking Changes / Migration Notes

  • DOMLoader entity expansion disabled
    • If you previously relied on expanded entities during DOM loading, behavior may differ. Consider adjusting input XML or parser configuration accordingly.
  • Matching tokens now sourced from to
    • If downstream logic implicitly depended on the old from-sourced behavior, validate matching results after upgrading.

Full Changelog: v1.3.1...v1.3.2

Release 1.3.1

16 Feb 10:28

Highlights

  • Equality-aware diffing: Algorithms can now accept a custom equality function, enabling comparisons that go beyond default element equality (e.g., case-insensitive matching or domain-specific equivalence).
  • Improved test coverage: Expanded and unified equality-based testing across algorithms to ensure consistent behavior.
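
To show why a pluggable equality function matters, the sketch below computes the length of a longest common subsequence under a caller-supplied BiPredicate. It is a generic illustration, not the diffx algorithm API: the class EqualityLcsExample and its method signature are assumptions.

```java
import java.util.List;
import java.util.function.BiPredicate;

// Hypothetical illustration; diffx's own algorithm classes and signatures may differ.
class EqualityLcsExample {

  /** Length of the longest common subsequence under the supplied equality function. */
  static <T> int lcsLength(List<T> from, List<T> to, BiPredicate<T, T> equality) {
    int[][] table = new int[from.size() + 1][to.size() + 1];
    for (int i = 1; i <= from.size(); i++) {
      for (int j = 1; j <= to.size(); j++) {
        if (equality.test(from.get(i - 1), to.get(j - 1))) {
          // Items match: extend the subsequence found for the two prefixes.
          table[i][j] = table[i - 1][j - 1] + 1;
        } else {
          // No match: carry over the best result that skips one item.
          table[i][j] = Math.max(table[i - 1][j], table[i][j - 1]);
        }
      }
    }
    return table[from.size()][to.size()];
  }

  public static void main(String[] args) {
    List<String> from = List.of("The", "quick", "Fox");
    List<String> to = List.of("the", "quick", "fox");
    System.out.println(lcsLength(from, to, String::equals));
    System.out.println(lcsLength(from, to, String::equalsIgnoreCase));
  }
}
```

With default equality only "quick" matches, so the common subsequence has length 1. With case-insensitive equality all three words match and the length is 3.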

Added

  • Algorithm support for an equality function to customize how items are considered equal during comparisons.
  • New unit tests for Actions to increase coverage and validate behavior.
  • Additional unit tests in BasicEqualityAlgorithmTest, integrating equality-focused test cases across all algorithms.

Changed

  • MyersGreedyAlgorithm constructor is now public, making it easier to instantiate and use directly in external code.
  • ActionsBuffer now uses generics, simplifying type usage and reducing boilerplate; related tests were simplified accordingly.
  • BasicEqualityAlgorithmTest now uses case-insensitive equality, improving robustness for text comparisons where case should be ignored.

Fixed / Maintenance

  • Removed unused imports to keep the codebase clean and reduce warnings:
    • TextOnlyProcessor from Profilers
    • IOException from FormatComparisonTest
    • Arrays from AttributeXMLTokenTest

Notes for Upgraders

  • If you instantiate MyersGreedyAlgorithm outside the package, the public constructor should remove previous access barriers.

Full Changelog: v1.3.0...v1.3.1

Release 1.3.0

16 Feb 10:23

Breaking changes

  • Removed deprecated getEvents method (BREAKING).
  • Removed deprecated getOpenElement method (BREAKING).
  • Aligned getValue behavior with XMLToken contract: removed getValue methods that returned null, which was inconsistent with the XMLToken interface (BREAKING).

Bug fixes

  • Fixed inconsistency where some getValue implementations returned null despite the XMLToken contract.

Improvements / Maintenance

Dependency & tooling updates

  • Bumped pso-xmlwriter to 1.1.1.
  • Updated JUnit and JReleaser versions.

Nullability & annotations

  • Replaced the jetbrains-annotations dependency with jspecify.
  • Adopted jspecify @Nullable annotations across algorithm, action, and token packages.
  • Removed unused @ApiStatus.Experimental annotation from XMLEventBalancer.

API / docs

  • Added default getValue implementations for ElementToken.
  • Clarified usage documentation, fixed missing Javadoc and applied minor Javadoc improvements.

Code quality & refactoring

  • Simplified code and cleaned up warnings (including suppressions in Main and KumarRanganAlgorithm to match source-paper naming).
  • Replaced legacy collection helpers with modern factory methods.
  • Test suite cleanup: improved naming consistency, removed unused imports, reduced redundant modifiers, and made constants static where appropriate.

Migration notes (for consumers)

  • If you used getEvents() or getOpenElement(), migrate to the supported replacements (the deprecated methods are now removed).
  • If you relied on getValue() returning null, update call sites to match the non-null/contract-consistent behavior (or handle absence via the new recommended approach in your API usage).

Full Changelog: v1.2.4...v1.3.0

Release 1.2.4

14 Jul 07:58

This release includes a bugfix for whitespace handling, improvements to the configuration handling, and several code quality enhancements.

Bugfixes

  • Whitespace Handling: Fixed an issue where leading spaces were not being stripped correctly in certain contexts within the component. This ensures proper handling of mixed content with inline elements.
    • Improved whitespace context management with a more robust implementation of the replaceByTrailing method
    • Added additional test cases to validate the fix for mixed content scenarios

Improvements

  • Configuration Handling: Enhanced the DiffConfig class with proper equality and hashcode methods
    • Fixed the equals() method to correctly compare all configuration properties
    • Improved the hashCode() implementation for better performance and correctness
    • Added support for the allowDoctypeDeclaration property in equality checks
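
The equals()/hashCode() contract described above can be sketched as follows. ExampleConfig is a hypothetical stand-in for DiffConfig. Only the allowDoctypeDeclaration property is taken from these notes; the other fields are invented for illustration.

```java
import java.util.Objects;

// Hypothetical configuration class; it is not diffx's DiffConfig.
final class ExampleConfig {
  private final boolean namespaceAware;          // assumed property
  private final boolean allowDoctypeDeclaration; // property named in the release notes
  private final String granularity;              // assumed property

  ExampleConfig(boolean namespaceAware, boolean allowDoctypeDeclaration, String granularity) {
    this.namespaceAware = namespaceAware;
    this.allowDoctypeDeclaration = allowDoctypeDeclaration;
    this.granularity = granularity;
  }

  @Override
  public boolean equals(Object o) {
    if (this == o) return true;
    if (!(o instanceof ExampleConfig)) return false;
    ExampleConfig other = (ExampleConfig) o;
    // Every property takes part in the comparison, including allowDoctypeDeclaration.
    return namespaceAware == other.namespaceAware
        && allowDoctypeDeclaration == other.allowDoctypeDeclaration
        && Objects.equals(granularity, other.granularity);
  }

  @Override
  public int hashCode() {
    // Uses the same set of properties as equals(), so equal objects share a hash code.
    return Objects.hash(namespaceAware, allowDoctypeDeclaration, granularity);
  }

  public static void main(String[] args) {
    ExampleConfig a = new ExampleConfig(true, false, "word");
    ExampleConfig b = new ExampleConfig(true, false, "word");
    System.out.println(a.equals(b) && a.hashCode() == b.hashCode());
  }
}
```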

Full Changelog: v1.2.3...v1.2.4

Release 1.2.2

14 Jul 07:49

New Features

XML Balance Checking

  • XMLBalanceCheckFilter: Added a new filter to ensure that XML tokens are properly balanced in the DiffHandler processing pipeline
    • Tracks start and end element pairs to verify they match correctly
    • Detects and reports XML structure imbalances including extra or missing elements
    • Provides diagnostics through the isBalanced() method and detailed error reporting
    • Useful for validating XML integrity during difference operations
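
The pairing check described in the list above can be sketched with a stack of open element names. This is a simplified illustration that works on plain strings. The real XMLBalanceCheckFilter operates on diffx tokens inside the DiffHandler pipeline, and apart from isBalanced() every name below is hypothetical.

```java
import java.util.ArrayDeque;
import java.util.Deque;

// Hypothetical sketch; not the XMLBalanceCheckFilter implementation.
class BalanceCheckExample {
  private final Deque<String> open = new ArrayDeque<>();
  private boolean mismatch = false;

  /** Records a start element. */
  void startElement(String name) {
    open.push(name);
  }

  /** Records an end element, which must match the most recently opened element. */
  void endElement(String name) {
    if (open.isEmpty() || !open.pop().equals(name)) {
      mismatch = true;
    }
  }

  /** True when no mismatch was seen and every start element has been closed. */
  boolean isBalanced() {
    return !mismatch && open.isEmpty();
  }

  public static void main(String[] args) {
    BalanceCheckExample check = new BalanceCheckExample();
    check.startElement("p");
    check.startElement("b");
    check.endElement("b");
    check.endElement("p");
    System.out.println(check.isBalanced());
  }
}
```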

API Improvements

Enhanced XMLToken Interface

  • Improved Null Safety: Added @NotNull annotations throughout the API to prevent null pointer exceptions
  • Standardized Documentation: Enhanced Javadoc clarity with consistent descriptions and improved parameter/return value documentation
  • New Default Methods: Added the isWhitespace() method to simplify whitespace detection across token types
  • Clarified Contract: Improved documentation on equals/hashCode implementation requirements for better performance
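
A default isWhitespace() method on a token interface could look roughly like the sketch below. ExampleToken is a hypothetical interface. The actual XMLToken declaration and the exact rule it applies (for instance, how an empty value is treated) are not given in these notes, so the rule here is an assumption.

```java
// Hypothetical token interface for illustration; XMLToken's real declaration may differ.
interface ExampleToken {

  /** Returns the text value of this token, never null. */
  String getValue();

  /**
   * Assumed rule: a token counts as whitespace when its value is non-empty
   * and consists only of whitespace characters.
   */
  default boolean isWhitespace() {
    String value = getValue();
    return !value.isEmpty() && value.chars().allMatch(Character::isWhitespace);
  }
}

class WhitespaceDemo {
  public static void main(String[] args) {
    ExampleToken spaces = () -> " \n\t";
    ExampleToken words = () -> "a b";
    System.out.println(spaces.isWhitespace());
    System.out.println(words.isWhitespace());
  }
}
```

Because the method is a default, each token type inherits it and callers can rely on token.isWhitespace() instead of a separate helper per stripper class.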

Whitespace Handling

  • Simplified Whitespace Detection: Replaced custom isWhiteSpace methods with the standardized token.isWhitespace() approach in both WhitespaceStripper and ExtendedWhitespaceStripper
  • Enhanced Processing Logic: Improved the whitespace processing algorithm for more consistent results
  • Better Edge Case Handling: Added test cases for complex mixed content scenarios
  • Fixed Context Management: Implemented more robust context tracking for accurate whitespace preservation

Full Changelog: v1.2.1...v1.2.2

Release 1.2.1

27 Jun 02:19

New Features

XML Processing Improvements

  • XMLEventBalancer (Beta): Added new implementation to ensure balanced XML structure in DiffHandler operations. This experimental component ensures well-formed XML during diff operations by maintaining properly paired start and end elements.

Core Functionality

  • NoOpFilter: Added implementation for transparent operation forwarding in DiffHandler. This filter passes operations through without modifications, providing a clean way to chain handlers.
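
A pass-through filter of this kind can be sketched in a few lines. The ExampleHandler interface below is a hypothetical simplification that takes an operator and a token as strings. The real DiffHandler interface has its own signature, which these notes do not show.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical handler interface; diffx's DiffHandler is defined differently.
interface ExampleHandler {
  void handle(String operator, String token);
}

/** Forwards every operation unchanged to the next handler in the chain. */
class PassThroughFilter implements ExampleHandler {
  private final ExampleHandler target;

  PassThroughFilter(ExampleHandler target) {
    this.target = target;
  }

  @Override
  public void handle(String operator, String token) {
    target.handle(operator, token);
  }
}

class PassThroughDemo {
  public static void main(String[] args) {
    List<String> received = new ArrayList<>();
    ExampleHandler sink = (operator, token) -> received.add(operator + token);
    // Filters can be stacked; each one hands the operation to the next.
    ExampleHandler chain = new PassThroughFilter(new PassThroughFilter(sink));
    chain.handle("+", "a");
    chain.handle("-", "b");
    System.out.println(received);
  }
}
```

A filter like this is mostly useful as a neutral base: a subclass can override only the behavior it needs while the rest of the chain keeps working.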

Maintenance and Improvements

Build System Enhancements

  • Replaced Maven publishing scripts with JReleaser configuration
  • Refactored build scripts to use centralized dependencies management
  • Updated wrapper scripts for compliance and robustness
  • Added SPDX license headers to script files
  • Improved JAVA_HOME validation in scripts

Documentation

  • Enhanced Javadoc for key methods and classes across the project
  • Added detailed parameter annotations to improve code clarity

Testing

  • Added unit tests for ExtendedWhitespaceStripper to verify various whitespace handling scenarios

Code Quality

  • Improved Maven publishing configuration to use assignment syntax for task descriptions and credentials
  • Removed unused import statements

Compatibility

This release maintains compatibility with Java 11 and later versions. The library continues to provide efficient differencing algorithms specifically optimized for XML structures.

Notes

The XMLEventBalancer is currently marked as experimental (beta) and subject to change in future releases.

Release 1.2.0

27 Jun 04:56

Breaking changes

Now requires Java 11

New Features

  • Document Tokens: Added StartDocumentToken and EndDocumentToken classes to represent XML document boundaries
  • Sequence Processing: Introduced SequenceProcessor interface with ExtendedWhitespaceStripper implementation for configurable whitespace handling in XML sequences
  • Similarity Metrics: Added new similarity measurement capabilities:
    • Implemented XMLElementSimilarity class with length-based boosting and child stream similarity
    • Added StreamSimilarity interface with Edit, Jaccard, and Cosine similarity implementations
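
For readers unfamiliar with the metrics named above, here is a minimal sketch of Jaccard similarity over two token lists: the size of the intersection divided by the size of the union, with duplicates ignored. The class JaccardExample is hypothetical and does not reproduce the StreamSimilarity interface. Returning 1.0 for two empty inputs is a convention chosen for this sketch.

```java
import java.util.HashSet;
import java.util.List;
import java.util.Set;

// Hypothetical illustration; not diffx's StreamSimilarity implementation.
class JaccardExample {

  /** Jaccard similarity: |A ∩ B| / |A ∪ B|, treating each list as a set. */
  static <T> double jaccard(List<T> a, List<T> b) {
    Set<T> union = new HashSet<>(a);
    union.addAll(b);
    if (union.isEmpty()) {
      return 1.0; // convention for this sketch: two empty inputs are identical
    }
    Set<T> intersection = new HashSet<>(a);
    intersection.retainAll(new HashSet<>(b));
    return (double) intersection.size() / union.size();
  }

  public static void main(String[] args) {
    System.out.println(jaccard(List.of("a", "b", "c"), List.of("b", "c", "d")));
  }
}
```

For the lists [a, b, c] and [b, c, d] the intersection has two items and the union has four, giving 0.5.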

Code Improvements

  • Dependency Updates:

    • Upgraded to Java 11 and configured toolchain for compatibility
    • Updated to Gradle 8.13 with improved distribution validation
    • Updated JUnit dependencies to use BOM (Bill of Materials) for version alignment
  • Refactoring:

    • Refactored SAXLoader for improved XML reader handling
    • Refactored XMLElement for better content handling
    • Renamed XMLElementSimilarity to ElementSimilarity with improved method names
    • Replaced SimilarityFunction with Similarity interface (old interface deprecated)
  • API Changes:

    • Deprecated getChildren method in XMLElement in favor of getContent
    • Deprecated setXMLReaderClass method
    • Removed debug flags for cleaner codebase
  • Null Safety:

    • Added @NotNull annotations to toXML method parameters
    • Added @NotNull annotations to NamespaceSet.add method parameters
    • Enhanced exception handling throughout the codebase

v1.1.2

13 Jun 22:55

Release 1.1.2

New Features

  • Wagner-Fischer Algorithm: Added similarity-based diffing using the Wagner-Fischer algorithm for improved text comparison capabilities
  • Whitespace Handling Utility: Added a new utility class to strip whitespace from specified list of elements
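
As background, the classic Wagner-Fischer algorithm fills a dynamic-programming table whose cell (i, j) holds the edit distance between the first i items of one sequence and the first j items of the other. The sketch below shows the textbook version on strings with unit costs. It is not the SimilarityWagnerFischerAlgorithm class, which per these notes is similarity-based, and the class name is hypothetical.

```java
// Textbook Wagner-Fischer edit distance; hypothetical example class, not part of diffx.
class WagnerFischerExample {

  /** Minimum number of single-character insertions, deletions, and substitutions. */
  static int editDistance(String from, String to) {
    int[][] d = new int[from.length() + 1][to.length() + 1];
    // Turning a prefix into the empty string takes one deletion per character.
    for (int i = 0; i <= from.length(); i++) {
      d[i][0] = i;
    }
    // Building a prefix from the empty string takes one insertion per character.
    for (int j = 0; j <= to.length(); j++) {
      d[0][j] = j;
    }
    for (int i = 1; i <= from.length(); i++) {
      for (int j = 1; j <= to.length(); j++) {
        int substitution = from.charAt(i - 1) == to.charAt(j - 1) ? 0 : 1;
        d[i][j] = Math.min(
            Math.min(d[i - 1][j] + 1, d[i][j - 1] + 1),
            d[i - 1][j - 1] + substitution);
      }
    }
    return d[from.length()][to.length()];
  }

  public static void main(String[] args) {
    System.out.println(editDistance("kitten", "sitting"));
  }
}
```

The distance between "kitten" and "sitting" is 3: two substitutions and one insertion.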

Bug Fixes

  • Fixed constructor to properly respect the namespace-aware parameter
  • Fixed potential bug in KumarRanganAlgorithm implementation

Code Improvements

  • Refactored NilToken to use a singleton pattern for better memory efficiency
  • Improved ElementToken and its default implementation
  • Added constructor and enhanced annotations in Sequence class
  • Refactored stack usage with Deque for better performance in isWellFormed method
  • Standardized static final field declarations across the codebase
  • Renamed "open" to "start" in XML element handling logic for improved clarity
  • Added private constructor to Actions utility class to prevent instantiation
  • Simplified empty checks by utilizing the isEmpty method

Documentation and Testing

  • Improved overall documentation with better comments and explanations
  • Added @version and @since tags for better version tracking
  • Added comprehensive unit tests for SimilarityWagnerFischerAlgorithm
  • Enhanced code with @Override and @NotNull annotations for better type safety

Full Changelog: 1.1.1...v1.1.2

Release 1.1.1

01 Mar 05:28

  • Fixed issues in PostXMLFixer where some elements could be left unclosed.
  • Improved support for Unicode characters in TokenizerBySpaceWord.

Release 1.1.0

01 Mar 05:22

The focus of this release was to address a number of security vulnerabilities, in particular XML eXternal Entity injection (XXE) in the code.

Although XXE issues could easily be mitigated by filtering the XML input before submitting it to diffx, we changed the default configuration to be secure by default and disabled the loading of external entities and DTDs, as outlined in https://cheatsheetseries.owasp.org/cheatsheets/XML_External_Entity_Prevention_Cheat_Sheet.html

If, for some reason, you need to use a DTD, you can set the allowDoctypeDeclaration boolean option to true in the DiffConfig.