33 commits
6dacef3 Rewrite XML parser with tokenizer and XPath (joshday, Mar 5, 2026)
97384c3 remove dead code (joshday, Mar 5, 2026)
1844b16 more test files (joshday, Mar 5, 2026)
b6f4d47 Add validation tests and remove legacy DTD/raw code (joshday, Mar 5, 2026)
21f647d Update CI actions and add validation tests (joshday, Mar 5, 2026)
c673427 update ci (joshday, Mar 5, 2026)
46c5a31 Add XMark benchmark generator and expand benchmarks (joshday, Mar 5, 2026)
33bcf35 Add LazyNode type and StringViews extension (joshday, Mar 6, 2026)
d011424 Refactor simple_value checks and use direct attrs iteration (joshday, Mar 6, 2026)
754f8fa Refactor tokenizer into XMLTokenizer and add LazyNode (joshday, Mar 6, 2026)
8483fed Add benchmarks, StringViews tests, simplify XML module (joshday, Mar 7, 2026)
eb5caeb Add GC.gc before tmpfile cleanup for Windows (joshday, Mar 7, 2026)
b914bfe Bump version to v0.4.0 (joshday, Mar 7, 2026)
d76c484 Use mktempdir for temp file cleanup in StringViews tests (joshday, Mar 7, 2026)
41836ae Remove StringViews extension and simplify tokenizer (joshday, Mar 8, 2026)
b670267 Replace printstyled with print in show methods (joshday, Mar 8, 2026)
4a728ee Revamp benchmarks and expand test suite (joshday, Mar 9, 2026)
2f71f9a Add Attributes type and performance optimizations (joshday, Apr 2, 2026)
6c4e8f3 Add sourcetext, write, eachchildnode for LazyNode (joshday, Apr 9, 2026)
45137a8 Rename tokenizer internals and fix DOCTYPE comment underflow (joshday, Apr 23, 2026)
5133cd9 Return SubString views from LazyNode accessors (joshday, Apr 23, 2026)
4e08bf3 Pretty-print only when children are pure elements (joshday, Apr 23, 2026)
e7e21a7 Refresh benchmark result snapshots on Julia 1.12.6 (joshday, Apr 23, 2026)
60725db Namespace token kinds and document API (joshday, May 15, 2026)
9d129b8 Add LazyNode perf APIs and XLSX-pattern benchmarks (joshday, May 15, 2026)
fb583c4 Refresh XLSX-pattern benchmark snapshot (joshday, May 15, 2026)
cfc1f81 Add AbstractTrees package extension (joshday, May 15, 2026)
895e994 Use byte-level Base.write in XML serializer (joshday, May 15, 2026)
ff84960 Skip unescape scan when tokenizer saw no entities (joshday, May 15, 2026)
18d88b1 Use findnext for tokenizer text/attr scans (joshday, May 15, 2026)
b790e85 Refresh benchmark snapshot and README bars (joshday, May 15, 2026)
a93b9a0 Wire Token.has_entities into LazyNode read path (joshday, May 15, 2026)
e532a28 Add end-to-end XLSX hot-loop benchmarks (joshday, May 15, 2026)
10 changes: 7 additions & 3 deletions .github/workflows/CI.yml
@@ -26,7 +26,7 @@ jobs:
- os: macOS-latest
arch: x86
steps:
- uses: actions/checkout@v2
- uses: actions/checkout@v4
- uses: julia-actions/setup-julia@v2
with:
version: ${{ matrix.version }}
@@ -41,9 +41,13 @@ jobs:
${{ runner.os }}-test-${{ env.cache-name }}-
${{ runner.os }}-test-
${{ runner.os }}-
- uses: actions/cache@v4
with:
path: test/data/w3c
key: w3c-xmlconf-v20130923
- uses: julia-actions/julia-buildpkg@v1
- uses: julia-actions/julia-runtest@v1
- uses: julia-actions/julia-processcoverage@v1
- uses: codecov/codecov-action@v1
- uses: codecov/codecov-action@v5
with:
file: lcov.info
files: lcov.info
6 changes: 4 additions & 2 deletions .gitignore
@@ -1,5 +1,7 @@
*Manifest.toml
*generated_xsd.jl
*.xml
*.gz
*.tar
*.DS_Store
*.claude
test/data/w3c/
benchmarks/data/
158 changes: 158 additions & 0 deletions CHANGELOG.md
@@ -0,0 +1,158 @@
# Changelog

All notable changes to XML.jl will be documented in this file.

The format is based on [Keep a Changelog](https://keepachangelog.com/en/1.1.0/),
and this project adheres to [Semantic Versioning](https://semver.org/spec/v2.0.0.html).

## [Unreleased]

### Added
- New streaming tokenizer (`XMLTokenizer` module) for fine-grained XML token iteration.
- XPath support via `xpath(node, path)`.
- `test/test_libxml2_testcases.jl`: 243 test cases borrowed from the [libxml2](https://github.com/GNOME/libxml2) test suite covering CDATA, comments, processing instructions, attributes, namespaces, DTD internal subsets, entity references, whitespace handling, Unicode, error cases, and real-world document patterns.
- `AbstractTrees` package extension: loading both `XML` and `AbstractTrees` enables `print_tree`, `PreOrderDFS`, `Leaves`, etc. on `Node` and `LazyNode`.

### Fixed
- **Tokenizer: multi-byte UTF-8 in attribute values** — Parsing attribute values containing multi-byte UTF-8 characters (e.g., `<doc city="東京"/>`) could produce a `StringIndexError` because `attr_value()` used byte arithmetic (`ncodeunits - 1`) instead of `prevind` to strip quotes. The same issue existed in `_read_attr_value!`.
- **Tokenizer: quotes inside DTD comments** — A `"` or `'` character inside a `<!-- -->` comment within a DTD internal subset caused the tokenizer to misinterpret it as a quoted string delimiter, leading to an "Unterminated quoted string" error. The DOCTYPE body parser now correctly skips comment content.
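The multi-byte attribute fix can be illustrated in plain Julia. The two helpers below, `strip_quotes_bytes` and `strip_quotes_chars`, are hypothetical names chosen for this sketch, not functions from XML.jl; they only contrast slicing with `ncodeunits(s) - 1` against slicing with `prevind`. The byte-arithmetic version fails only when the character just before the closing quote occupies more than one byte, because Julia rejects a string range whose end index falls inside a character.

```julia
# Hypothetical helpers (not part of XML.jl) contrasting two ways of dropping
# the surrounding quotes from a raw attribute value such as "\"東京\"".

# Byte arithmetic: assumes the character before the closing quote is one byte.
strip_quotes_bytes(s::AbstractString) = s[2:ncodeunits(s) - 1]

# Character-aware: `prevind` steps back to the start of the last character
# before the closing quote, however many bytes that character occupies.
strip_quotes_chars(s::AbstractString) = s[nextind(s, 1):prevind(s, lastindex(s))]

raw = "\"東京\""                 # 8 bytes: quote, 3 + 3 bytes of CJK, quote
strip_quotes_chars(raw)          # "東京"
strip_quotes_bytes("\"abc\"")    # "abc" (all-ASCII values are unaffected)
# strip_quotes_bytes(raw) throws a StringIndexError: byte 7 lies inside '京'
```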

## [0.3.8]

### Fixed
- `XML.write` now respects `xml:space="preserve"` and suppresses indentation for elements with this attribute ([#49]).

## [0.3.7]

### Fixed
- Resolved remaining issues from [#45] and fixed [#46] (whitespace preservation edge cases) ([#47]).

## [0.3.6]

### Added
- `XML.write` respects `xml:space="preserve"` on elements, suppressing automatic indentation ([#45]).

### Fixed
- `String` type ambiguity on Julia nightly resolved ([#38]).

## [0.3.5]

### Fixed
- `depth` and `parent` functions corrected to work properly with the DOM tree API ([#37]).
- `escape` is no longer idempotent: every `&` is now escaped, including one that already starts an entity reference such as `&amp;`, matching spec behavior ([#32], addressing [#31]).
- `pushfirst!` support added for `Node` children ([#29]).
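What "no longer idempotent" means in practice can be shown with a short sketch. `escape_sketch` and `XML_ESCAPES` are hypothetical names, not XML.jl's implementation; the sketch relies on Julia's multi-pair `replace` (Julia 1.7 and later), which applies all patterns in a single left-to-right pass, so an `&` produced by one replacement is not re-escaped within the same call, while a second call does escape it again.

```julia
# Hypothetical sketch of a non-idempotent XML escape; not XML.jl's source.
# Multi-pair `replace` scans the input once and never rewrites its own output.
const XML_ESCAPES = ("&" => "&amp;", "<" => "&lt;", ">" => "&gt;",
                     "\"" => "&quot;", "'" => "&apos;")

escape_sketch(s::AbstractString) = replace(s, XML_ESCAPES...)

escape_sketch("a < b & c")       # "a &lt; b &amp; c"
escape_sketch("&amp;")           # "&amp;amp;" (an existing entity is escaped again)
escape_sketch(escape_sketch("<")) # "&amp;lt;"
```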

## [0.3.4]

### Fixed
- Fixed [#26].
- CI updated to use `julia-actions/cache@v4` and `lts` Julia version.

## [0.3.3]

### Added
- `h` constructor for concise element creation (e.g., `h.div("hello"; class="main")`).

### Fixed
- Path definition error in README example ([#20]).

## [0.3.2]

### Fixed
- Minor typos.

## [0.3.1]

### Added
- Julia 1.6 compatibility ([#16]).

### Changed
- Smarter escaping logic.

## [0.3.0]

### Changed
- Attribute internal representation changed from `Dict` to `OrderedDict` (later reverted to `Vector{Pair}`).

## [0.2.3]

### Fixed
- Parse method fix.

## [0.2.2]

### Added
- DTD parsing via `parse_dtd`.
- `is_simple` and `simple_value` exports.
- `setindex!` methods for modifying attributes.
- `unescape` function.

### Fixed
- DOCTYPE parsing made case-insensitive.

## [0.2.1]

### Fixed
- Write output fixes.

## [0.2.0]

### Changed
- Major rewrite: introduced `NodeType` enum, `Node{S}` parametric struct, callable `NodeType` constructors, and `XML.write`.
- Processing instruction support.
- Benchmarks added.

## [0.1.3]

### Changed
- Improved print output for `AbstractXMLNode`.

## [0.1.2]

### Added
- AbstractTrees 0.4 compatibility ([#5]).

## [0.1.1]

### Added
- `Node` implementation with `print_tree`.
- Color output in REPL display.
- Stopped stripping whitespace from text nodes.

## [0.1.0]

- Initial release.

[Unreleased]: https://github.com/JuliaComputing/XML.jl/compare/v0.3.8...HEAD
[0.3.8]: https://github.com/JuliaComputing/XML.jl/compare/v0.3.7...v0.3.8
[0.3.7]: https://github.com/JuliaComputing/XML.jl/compare/v0.3.6...v0.3.7
[0.3.6]: https://github.com/JuliaComputing/XML.jl/compare/v0.3.5...v0.3.6
[0.3.5]: https://github.com/JuliaComputing/XML.jl/compare/v0.3.4...v0.3.5
[0.3.4]: https://github.com/JuliaComputing/XML.jl/compare/v0.3.3...v0.3.4
[0.3.3]: https://github.com/JuliaComputing/XML.jl/compare/v0.3.2...v0.3.3
[0.3.2]: https://github.com/JuliaComputing/XML.jl/compare/v0.3.1...v0.3.2
[0.3.1]: https://github.com/JuliaComputing/XML.jl/compare/v0.3.0...v0.3.1
[0.3.0]: https://github.com/JuliaComputing/XML.jl/compare/v0.2.3...v0.3.0
[0.2.3]: https://github.com/JuliaComputing/XML.jl/compare/v0.2.2...v0.2.3
[0.2.2]: https://github.com/JuliaComputing/XML.jl/compare/v0.2.1...v0.2.2
[0.2.1]: https://github.com/JuliaComputing/XML.jl/compare/v0.2.0...v0.2.1
[0.2.0]: https://github.com/JuliaComputing/XML.jl/compare/v0.1.3...v0.2.0
[0.1.3]: https://github.com/JuliaComputing/XML.jl/compare/v0.1.2...v0.1.3
[0.1.2]: https://github.com/JuliaComputing/XML.jl/compare/v0.1.1...v0.1.2
[0.1.1]: https://github.com/JuliaComputing/XML.jl/compare/v0.1.0...v0.1.1
[0.1.0]: https://github.com/JuliaComputing/XML.jl/releases/tag/v0.1.0

[#5]: https://github.com/JuliaComputing/XML.jl/pull/5
[#16]: https://github.com/JuliaComputing/XML.jl/pull/16
[#20]: https://github.com/JuliaComputing/XML.jl/pull/20
[#26]: https://github.com/JuliaComputing/XML.jl/issues/26
[#29]: https://github.com/JuliaComputing/XML.jl/pull/29
[#31]: https://github.com/JuliaComputing/XML.jl/issues/31
[#32]: https://github.com/JuliaComputing/XML.jl/pull/32
[#37]: https://github.com/JuliaComputing/XML.jl/pull/37
[#38]: https://github.com/JuliaComputing/XML.jl/pull/38
[#43]: https://github.com/JuliaComputing/XML.jl/issues/43
[#45]: https://github.com/JuliaComputing/XML.jl/pull/45
[#46]: https://github.com/JuliaComputing/XML.jl/issues/46
[#47]: https://github.com/JuliaComputing/XML.jl/pull/47
[#49]: https://github.com/JuliaComputing/XML.jl/pull/49
14 changes: 8 additions & 6 deletions Project.toml
@@ -1,12 +1,14 @@
name = "XML"
uuid = "72c71f33-b9b6-44de-8c94-c961784809e2"
version = "0.4.0"
authors = ["Josh Day <emailjoshday@gmail.com> and contributors"]
version = "0.3.8"

[deps]
Mmap = "a63ad114-7e13-5084-954f-fe012c677804"
OrderedCollections = "bac558e1-5e72-5ebc-8fee-abe8a469f55d"
[weakdeps]
AbstractTrees = "1520ce14-60c1-5f80-bbc7-55ef81b5835c"

[extensions]
XMLAbstractTreesExt = "AbstractTrees"

[compat]
OrderedCollections = "1.4, 1.5"
julia = "1.6"
AbstractTrees = "0.4"
julia = "1.9"