Compare and diff two tabular datasets (CSV, Excel, JSON) in Java with zero runtime dependencies.
- ✅ Zero runtime dependencies — pure Java 11, no transitive classpath pollution
- 📄 Multi-format — CSV, TSV, XLSX, XLS (Excel via optional Apache POI), and JSON array inputs
- 🔑 Key-column row matching — diff by business key, not row position
- 🔢 Numeric tolerance — flag 0.5% price differences, ignore floating-point noise
- 📅 Date normalization — treats `2024-01-15` and `01/15/2024` as equal when enabled
- 🔤 Case-insensitive comparison option
- 📊 HTML reports — beautiful inline-CSS diff report, zero dependencies
- 🗄️ JSON reports — machine-readable output for CI pipelines
- 🚀 Streaming mode — 100K-row files under 50MB heap
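Key-column matching, numeric tolerance, date normalization and case-insensitivity together decide what counts as "the same row" and "the same value". The standalone sketch below shows one way to implement those ideas in plain Java 11. It is not SchemaMatch's implementation, and every name in it (`KeyedDiffSketch`, `normalizeDate`, `valuesEqual`, `diffByKey`) is hypothetical. It assumes rows are already parsed into `Map<String, String>`, that tolerance is relative to the larger of the two magnitudes, and that only the two date formats from the feature list need normalizing.

```java
import java.time.LocalDate;
import java.time.format.DateTimeFormatter;
import java.time.format.DateTimeParseException;
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical illustration of the comparison rules; not the SchemaMatch API.
final class KeyedDiffSketch {

    // Date formats treated as equivalent when normalization is on (assumed set).
    private static final List<DateTimeFormatter> DATE_FORMATS = List.of(
            DateTimeFormatter.ISO_LOCAL_DATE,           // 2024-01-15
            DateTimeFormatter.ofPattern("MM/dd/uuuu")); // 01/15/2024

    // Returns the ISO-8601 form of a recognized date, or the input unchanged.
    static String normalizeDate(String s) {
        for (DateTimeFormatter format : DATE_FORMATS) {
            try {
                return LocalDate.parse(s.trim(), format).toString();
            } catch (DateTimeParseException notThisFormat) {
                // try the next format
            }
        }
        return s;
    }

    // Numbers compare with a relative tolerance; anything else compares as text,
    // optionally after date normalization and/or ignoring case.
    static boolean valuesEqual(String a, String b, double tolerance,
                               boolean ignoreCase, boolean normalizeDates) {
        if (a == null || b == null) return a == null && b == null;
        try {
            double x = Double.parseDouble(a.trim());
            double y = Double.parseDouble(b.trim());
            // A tolerance of 0.01 accepts differences up to 1% of the larger magnitude.
            return Math.abs(x - y) <= tolerance * Math.max(Math.abs(x), Math.abs(y));
        } catch (NumberFormatException notNumeric) {
            String left = normalizeDates ? normalizeDate(a) : a;
            String right = normalizeDates ? normalizeDate(b) : b;
            return ignoreCase ? left.equalsIgnoreCase(right) : left.equals(right);
        }
    }

    // Matches rows on keyColumn (not position) and reports ADDED, REMOVED and CHANGED rows.
    static List<String> diffByKey(List<Map<String, String>> baseline,
                                  List<Map<String, String>> actual, String keyColumn,
                                  double tolerance, boolean ignoreCase, boolean normalizeDates) {
        Map<String, Map<String, String>> baselineByKey = new LinkedHashMap<>();
        for (Map<String, String> row : baseline) baselineByKey.put(row.get(keyColumn), row);

        List<String> changes = new ArrayList<>();
        for (Map<String, String> row : actual) {
            String key = row.get(keyColumn);
            Map<String, String> before = baselineByKey.remove(key);
            if (before == null) {
                changes.add("ADDED " + key);
                continue;
            }
            // Only columns present in the actual row are checked in this sketch.
            for (Map.Entry<String, String> cell : row.entrySet()) {
                String old = before.get(cell.getKey());
                if (!valuesEqual(old, cell.getValue(), tolerance, ignoreCase, normalizeDates)) {
                    changes.add("CHANGED " + key + " " + cell.getKey() + ": "
                            + old + " -> " + cell.getValue());
                }
            }
        }
        // Baseline keys never seen in the actual data were removed.
        for (String key : baselineByKey.keySet()) changes.add("REMOVED " + key);
        return changes;
    }

    public static void main(String[] args) {
        List<Map<String, String>> v1 = List.of(
                Map.of("customer_id", "1", "name", "Alice", "balance", "100.00", "joined", "2024-01-15"),
                Map.of("customer_id", "2", "name", "Bob", "balance", "50.00", "joined", "2023-06-01"));
        List<Map<String, String>> v2 = List.of(
                Map.of("customer_id", "2", "name", "BOB", "balance", "60.00", "joined", "06/01/2023"),
                Map.of("customer_id", "1", "name", "alice", "balance", "100.40", "joined", "01/15/2024"),
                Map.of("customer_id", "3", "name", "Carol", "balance", "10.00", "joined", "02/29/2024"));
        // Prints "CHANGED 2 balance: 50.00 -> 60.00" and then "ADDED 3".
        diffByKey(v1, v2, "customer_id", 0.01, true, true).forEach(System.out::println);
    }
}
```

Indexing the baseline by key makes matching independent of row order (the shuffled `v2` still lines up), at the cost of keeping one side's rows in memory.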
Install with Maven:

```xml
<dependency>
    <groupId>io.github.chitralabs.schemamatch</groupId>
    <artifactId>schemamatch-core</artifactId>
    <version>1.0.0</version>
</dependency>
```

or Gradle:

```groovy
implementation 'io.github.chitralabs.schemamatch:schemamatch-core:1.0.0'
```

Compare two files:

```java
DiffResult result = SchemaMatcher.diff("baseline.csv", "actual.csv");
System.out.println(result.isIdentical());     // false
System.out.println(result.getRowDiffCount()); // 3
```

Write an HTML report:

```java
SchemaMatcher.diff("before.xlsx", "after.xlsx")
        .report("diff-report.html");
```

Configure matching and inspect the differences:

```java
DiffResult r = SchemaMatcher.options()
        .keyColumn("customer_id") // match rows by key, not position
        .tolerance(0.01)          // 1% numeric tolerance
        .ignoreCase(true)         // case-insensitive string comparison
        .diff("v1.csv", "v2.csv");

// Inspect column changes
r.getColumnDiffs().forEach(cd ->
        System.out.println(cd.getChangeType() + ": " + cd.getActualColumnName()));

// Inspect row changes
r.getRowDiffs().forEach(rd -> {
    System.out.println("Row " + rd.getRowIndex() + " [" + rd.getChangeType() + "]");
    rd.getChangedValues().forEach(vd ->
            System.out.println("  " + vd.getColumnName() + ": "
                    + vd.getBaselineValue() + " → " + vd.getActualValue()));
});
```

Write a JSON report for CI:

```java
SchemaMatcher.diff("expected.csv", "actual.csv").report("diff.json");
// In CI, fail the build when any rows differ (jq -e exits 1 on a false result):
//   jq -e '.rowDiffCount == 0' diff.json
```

Supported input formats:

| Format | Extension | Notes |
|---|---|---|
| CSV | `.csv` | RFC 4180: quoted fields, embedded commas |
| TSV | `.tsv` | Tab-delimited variant |
| Excel | `.xlsx` | Requires Apache POI on the classpath |
| Excel | `.xls` | Legacy format; requires Apache POI |
| JSON | `.json` | Top-level array of objects |
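The CSV notes mention RFC 4180 quoting. As a rough illustration of what that involves, the sketch below splits a single record into fields, honoring quoted fields, delimiters inside quotes, and doubled quotes (`""`) as an escaped quote character. It is not SchemaMatch's parser; `CsvLineSketch` and `parseLine` are hypothetical names. It assumes each record sits on one line (RFC 4180 also allows line breaks inside quoted fields, which this sketch does not handle) and it does not reject malformed input. The delimiter is a parameter, so `'\t'` gives a tab-delimited variant.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical illustration only; not SchemaMatch's CSV reader.
final class CsvLineSketch {

    // Splits one record into fields using RFC 4180 quoting rules:
    // a field may be wrapped in double quotes, a quoted field may contain the
    // delimiter, and "" inside a quoted field stands for one literal quote.
    // Line breaks inside quoted fields are NOT handled (one record = one line here).
    static List<String> parseLine(String line, char delimiter) {
        List<String> fields = new ArrayList<>();
        StringBuilder field = new StringBuilder();
        boolean inQuotes = false;
        for (int i = 0; i < line.length(); i++) {
            char c = line.charAt(i);
            if (inQuotes) {
                if (c != '"') {
                    field.append(c);                // ordinary character, delimiter included
                } else if (i + 1 < line.length() && line.charAt(i + 1) == '"') {
                    field.append('"');              // "" -> literal quote
                    i++;                            // skip the second quote
                } else {
                    inQuotes = false;               // closing quote
                }
            } else if (c == '"') {
                inQuotes = true;                    // opening quote
            } else if (c == delimiter) {
                fields.add(field.toString());       // end of the current field
                field.setLength(0);
            } else {
                field.append(c);
            }
        }
        fields.add(field.toString());               // last field (may be empty)
        return fields;
    }

    public static void main(String[] args) {
        String record = "42,\"Smith, Jane\",\"She said \"\"hi\"\"\",";
        // Prints [42], [Smith, Jane], [She said "hi"] and [] on separate lines.
        parseLine(record, ',').forEach(f -> System.out.println("[" + f + "]"));
    }
}
```

A character-by-character state machine is used instead of `String.split(",")` because a plain split would break `"Smith, Jane"` into two fields.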
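For Excel support, add Apache POI to your own `pom.xml`; it is not pulled in transitively. (`poi-ooxml` also brings in the core `poi` artifact, which reads `.xls`.)

```xml
<dependency>
    <groupId>org.apache.poi</groupId>
    <artifactId>poi-ooxml</artifactId>
    <version>5.2.5</version>
</dependency>
```

Performance:

| File size | Mode | Heap | Time |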
|---|---|---|---|
| 10K rows | Standard | < 20MB | < 1s |
| 100K rows | Streaming | < 50MB | ~3s |
| 1M rows | Streaming | < 50MB | ~25s |
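Flat heap usage across input sizes usually comes from never holding a whole file in memory. The sketch below shows that idea in its simplest form: both inputs are read line by line in lockstep, compared by position, and each difference is handed to a callback immediately instead of being collected. It is a hypothetical illustration (`StreamingDiffSketch` and `diffLines` are made-up names), not SchemaMatch's streaming algorithm, and it does not show what the `streaming(5000)` argument controls.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.Reader;
import java.io.StringReader;
import java.io.UncheckedIOException;
import java.util.function.Consumer;

// Hypothetical illustration only; not SchemaMatch's streaming implementation.
final class StreamingDiffSketch {

    // Reads both inputs one line at a time and compares them by position, handing
    // each difference to `sink` right away. Memory use stays at roughly one line
    // per input, independent of file size. Returns the number of differences.
    static int diffLines(Reader baseline, Reader actual, Consumer<String> sink) {
        int differences = 0;
        try (BufferedReader left = new BufferedReader(baseline);
             BufferedReader right = new BufferedReader(actual)) {
            for (int line = 1; ; line++) {
                String a = left.readLine();
                String b = right.readLine();
                if (a == null && b == null) break; // both inputs exhausted
                String diff = null;
                if (a == null) diff = "line " + line + " ADDED: " + b;
                else if (b == null) diff = "line " + line + " REMOVED: " + a;
                else if (!a.equals(b)) diff = "line " + line + " CHANGED: " + a + " -> " + b;
                if (diff != null) {
                    sink.accept(diff);
                    differences++;
                }
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return differences;
    }

    public static void main(String[] args) {
        String v1 = "id,qty\n1,10\n2,20\n";
        String v2 = "id,qty\n1,10\n2,25\n3,30\n";
        int count = diffLines(new StringReader(v1), new StringReader(v2), System.out::println);
        System.out.println(count + " difference(s)");
    }
}
```

For real files, pass `Files.newBufferedReader(Path.of("huge.csv"))` instead of a `StringReader`. This lockstep approach only works for positional comparison; matching by key column on unsorted inputs needs either an in-memory index of one side or pre-sorted files.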
Enable streaming for large files:
```java
SchemaMatcher.options().streaming(5000).diff("huge.csv", "huge2.csv");
```

Related projects:

- chitralabs/sheetz — Excel/CSV processing library
- schemamatch-examples — runnable demos
- schemamatch-benchmarks — JMH benchmarks
Apache License 2.0 — see LICENSE.
© 2026 Chitrapradha Ganesan — github.com/chitralabs